AI Glossary: What Is SGD With Momentum (SGD-M)? Definition & Meaning

SGD with Momentum (Stochastic Gradient Descent with Momentum) is an optimization algorithm used to train machine learning models. It builds on the basic stochastic gradient descent (SGD) method by adding a momentum term, which helps improve the convergence speed and stability of the training process.

In standard SGD, the model’s parameters are updated based solely on the gradient of the loss function with respect to the parameters. This means that each update is influenced only by the most recent data point or mini-batch. While this approach can be effective, it often leads to oscillations and slower convergence, particularly where the loss surface is much steeper in some directions than in others or where the gradient estimates are noisy.

Momentum addresses this issue by accumulating a velocity vector from past gradients. Essentially, it lets the model build up speed toward the optimal solution by taking past gradients into account when updating the weights. The update rule for SGD with momentum can be expressed mathematically as:

v_t = beta * v_{t-1} + (1 - beta) * gradient

theta_t = theta_{t-1} - learning_rate * v_t

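Here, v_t is the velocity at step t, v_{t-1} is the velocity from the previous step (usually initialized to zero), beta is the momentum coefficient (commonly set to around 0.9, with values between 0.5 and 0.99 also used), gradient is the gradient of the loss evaluated at the current parameters theta_{t-1}, learning_rate is the step size, and theta_t represents the updated parameters.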
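As a minimal sketch of the update rule above, the following pure-Python snippet applies it to a toy quadratic loss. The function name sgd_momentum_step, the default hyperparameters, and the toy loss are illustrative assumptions, not part of any particular library.

```python
def sgd_momentum_step(theta, velocity, grad, learning_rate=0.1, beta=0.9):
    """Apply one SGD-with-momentum update and return (new_theta, new_velocity).

    Hypothetical helper for illustration; parameters are plain lists of floats.
    """
    # v_t = beta * v_{t-1} + (1 - beta) * gradient
    new_velocity = [beta * v + (1.0 - beta) * g for v, g in zip(velocity, grad)]
    # theta_t = theta_{t-1} - learning_rate * v_t
    new_theta = [p - learning_rate * v for p, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


# Toy usage (assumed example): minimize f(theta) = theta_0^2 + theta_1^2,
# whose gradient is 2 * theta.
theta = [1.0, -2.0]
velocity = [0.0, 0.0]  # velocity starts at zero
for _ in range(200):
    grad = [2.0 * p for p in theta]
    theta, velocity = sgd_momentum_step(theta, velocity, grad)

print(theta)  # both parameters end up close to the minimum at 0
```

Some frameworks use a variant of the velocity update without the (1 - beta) factor, which amounts to a different scaling of the learning rate, so it is worth checking a library's convention before reusing hyperparameters.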

By incorporating momentum, the algorithm can make faster progress in flat regions of the loss landscape while damping oscillations. This typically results in faster convergence and improves the overall efficiency of the training process, especially for deep learning models.
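The damping effect can be illustrated with a small sketch that feeds two artificial gradient sequences through the velocity update given earlier: one that always points the same way, and one that flips sign every step. The helper name run_velocity and the two sequences are assumptions made up for this illustration.

```python
def run_velocity(gradients, beta=0.9):
    """Return the final velocity after feeding a sequence of scalar gradients.

    Hypothetical helper; uses v_t = beta * v_{t-1} + (1 - beta) * gradient.
    """
    velocity = 0.0
    for g in gradients:
        velocity = beta * velocity + (1.0 - beta) * g
    return velocity


steps = 200
consistent = [1.0] * steps                         # gradient always points the same way
alternating = [(-1.0) ** i for i in range(steps)]  # gradient flips sign every step

v_consistent = run_velocity(consistent)
v_alternating = run_velocity(alternating)

print(f"consistent gradients:  |v| = {abs(v_consistent):.4f}")
print(f"alternating gradients: |v| = {abs(v_alternating):.4f}")
```

With beta = 0.9, the velocity for the consistent sequence settles near the gradient itself, while the velocity for the alternating sequence shrinks to roughly (1 - beta) / (1 + beta), about 5% of the gradient magnitude. Directions in which the gradient keeps changing sign are therefore largely cancelled out, while consistent directions are preserved.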