Stochastic Gradient Descent

An optimization algorithm commonly used to train neural networks.

Published May 8, 2024

Stochastic Gradient Descent (SGD) is an optimization algorithm used to find the weights and biases that minimize a network's loss function.

It iteratively adjusts the weights and biases in the direction that reduces the loss most rapidly, i.e. against the gradient; this is the "descent." Backpropagation supplies those gradients, and repeating the update step drives the loss toward a minimum.
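To make the update concrete, here is a minimal sketch of one descent step for a simple linear model with mean squared error loss. The function name `sgd_step` and the `learning_rate` default are illustrative assumptions for this example, not part of any particular library.

```python
import numpy as np

def sgd_step(w, b, X, y, learning_rate=0.01):
    """One gradient descent update for a linear model y_pred = X @ w + b."""
    y_pred = X @ w + b
    error = y_pred - y                      # shape: (n_samples,)
    # Gradients of the mean squared error loss with respect to w and b.
    # In a deeper network, backpropagation computes these gradients.
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * error.mean()
    # Move against the gradient: the "descent" in gradient descent.
    w = w - learning_rate * grad_w
    b = b - learning_rate * grad_b
    return w, b
```

Calling this repeatedly on the training data lowers the loss a little each time, which is all "descent" means in practice.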

Why "Stochastic"?

The "Stochastic" refers to the method's use of randomly selected subsets of the data to compute the gradient of the loss function during the optimization process, instead of using the entire dataset at once.

This random sampling makes each gradient estimate noisy and slightly different on every step, but it is far cheaper per update than computing the gradient over the full dataset.
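As a rough illustration, the loop below builds on the `sgd_step` sketch above: each epoch it shuffles the training data and updates the parameters on random mini-batches rather than the whole dataset. The `epochs` and `batch_size` values are arbitrary choices for the example.

```python
def train_sgd(w, b, X, y, epochs=10, batch_size=32, learning_rate=0.01):
    """Mini-batch SGD: each update uses a random subset of the data."""
    n = len(y)
    for _ in range(epochs):
        indices = np.random.permutation(n)      # new random order each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            # The gradient computed on this small batch is a noisy but
            # cheap estimate of the gradient over the full dataset.
            w, b = sgd_step(w, b, X[batch], y[batch], learning_rate)
    return w, b
```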
