Stochastic gradient descent: Background




Both statistical estimation and machine learning consider the problem of minimizing an objective function that has the form of a sum:







q(w) = \frac{1}{n} \sum_{i=1}^{n} q_i(w),



where the parameter $w$ that minimizes $q(w)$ is to be estimated. Each summand function $q_i$ is typically associated with the $i$-th observation in the data set (used for training).
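As a concrete instance (an illustrative assumption, not stated in the text above): in linear least squares with observations $(x_i, y_i)$, each summand is the squared residual of a single observation,

q_i(w) = \left( y_i - x_i^{\top} w \right)^{2}, \qquad q(w) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - x_i^{\top} w \right)^{2}.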


In classical statistics, sum-minimization problems arise in least squares and in maximum-likelihood estimation (for independent observations). The general class of estimators that arise as minimizers of sums is called M-estimators. However, in statistics, it has long been recognized that requiring even local minimization is too restrictive for some problems of maximum-likelihood estimation. Therefore, contemporary statistical theorists often consider stationary points of the likelihood function (or zeros of its derivative, the score function, and other estimating equations).


The sum-minimization problem also arises for empirical risk minimization. In that case, $q_i(w)$ is the value of the loss function at the $i$-th example, and $q(w)$ is the empirical risk.
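A minimal sketch in Python of the empirical risk as the average of per-example losses (the function names and the squared-error loss below are illustrative assumptions, not part of the original text):

import numpy as np

def empirical_risk(loss_i, w, X, y):
    # q(w): average of the per-example losses q_i(w) over the n training examples
    return np.mean([loss_i(w, X[i], y[i]) for i in range(len(X))])

# Illustrative per-example loss, the least-squares instance mentioned above:
# q_i(w) = (y_i - x_i^T w)^2
def squared_loss(w, x_i, y_i):
    return (y_i - x_i @ w) ** 2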


When used to minimize the above function, a standard (or "batch") gradient descent method would perform the following iterations:







w := w - \eta \, \nabla q(w) = w - \eta \sum_{i=1}^{n} \nabla q_i(w) / n,



where $\eta$ is a step size (sometimes called the learning rate in machine learning).
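A minimal sketch of this batch update in Python (the function names, the least-squares gradient, and the fixed iteration count are illustrative assumptions, not part of the original text):

import numpy as np

def batch_gradient_descent(grad_qi, w0, X, y, eta=0.01, n_iters=100):
    # Full-batch gradient descent: every step uses the gradient
    # averaged over all n training examples.
    w = np.asarray(w0, dtype=float)
    n = len(X)
    for _ in range(n_iters):
        # sum of the per-example gradients grad q_i(w), divided by n
        grad = sum(grad_qi(w, X[i], y[i]) for i in range(n)) / n
        w = w - eta * grad
    return w

# Illustrative per-example gradient for the least-squares loss q_i(w) = (y_i - x_i^T w)^2:
def grad_qi(w, x_i, y_i):
    return -2.0 * (y_i - x_i @ w) * x_i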


In many cases, the summand functions have a simple form that enables inexpensive evaluations of the sum-function and the sum-gradient. For example, in statistics, one-parameter exponential families allow economical function evaluations and gradient evaluations.


However, in other cases, evaluating the sum-gradient may require expensive evaluations of the gradients of all summand functions. When the training set is enormous and no simple formulas exist, evaluating the sums of gradients becomes very expensive, because evaluating the gradient requires evaluating all of the summand functions' gradients. To economize on the computational cost at every iteration, stochastic gradient descent samples a subset of the summand functions at every step. This is very effective in the case of large-scale machine learning problems.
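A minimal stochastic gradient descent sketch in Python, under the same illustrative assumptions as the batch example above (here the sampled subset is a single example per step; small minibatches of several examples are a common variant):

import numpy as np

def sgd(grad_qi, w0, X, y, eta=0.01, n_epochs=10):
    # Stochastic gradient descent: each step uses the gradient of one
    # randomly chosen summand q_i instead of the full sum over all n.
    # grad_qi has the same signature as in the batch sketch above.
    w = np.asarray(w0, dtype=float)
    n = len(X)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(n):  # visit the examples in a random order
            w = w - eta * grad_qi(w, X[i], y[i])
    return w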







