TDSM 9.15


Based on the textbook, the concept would be:

1) randomize the order of the training examples once,

2) march through the training examples in batches of the given size,

3) before each weight update, compute the average loss over the current batch,

4) after several iterations, these recorded loss values indicate how the stochastic gradient descent is performing and whether it is converging (see the sketch below).

Credit: [1]
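
As a rough illustration of this procedure, here is a minimal mini-batch SGD sketch. The toy linear-regression objective, data, learning rate, and batch size are my own assumptions for illustration, not taken from the textbook.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=20, seed=0):
    # Assumed setup: least-squares loss on a linear model (illustrative only).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)

    # 1) Randomize the order of the training examples once.
    order = rng.permutation(n)
    X, y = X[order], y[order]

    batch_losses = []
    for epoch in range(epochs):
        # 2) March through the training examples in batches of the given size.
        for start in range(0, n, batch_size):
            Xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]

            # 3) Before updating the weights, record the average loss on this batch.
            residual = Xb @ w - yb
            batch_losses.append(np.mean(residual ** 2))

            # Gradient of the mean squared error on this batch, then the update.
            grad = 2.0 * Xb.T @ residual / len(yb)
            w -= lr * grad

    # 4) The recorded losses show how SGD is progressing and whether it converges.
    return w, batch_losses

# Toy usage: y is roughly 3*x1 - 2*x2 plus noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([3.0, -2.0]) + 0.1 * rng.normal(size=500)
w, losses = minibatch_sgd(X, y)
print("learned weights:", w)
print("first vs. last batch loss:", losses[0], losses[-1])
```

In practice the individual batch losses are noisy, so one would typically look at a running average (or the per-epoch mean) of these values to judge convergence.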