AI Glossary: What Is SGD With Momentum (SGD-M)? Definition & Meaning

SGD with Momentum (Stochastic Gradient Descent with Momentum) is an optimization algorithm used to train machine learning models. It builds on basic stochastic gradient descent (SGD) by adding momentum, which improves the convergence speed and stability of training.

In standard SGD, the model's parameters are updated based solely on the gradient of the loss function with respect to the parameters. Each update is therefore influenced only by the most recent data point or mini-batch. This approach can be effective, but it often leads to oscillations and slower convergence, particularly where the loss surface is steep or the gradient estimates are noisy.
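As a point of reference, the plain SGD update described above can be sketched as follows. This is a minimal illustration, not a production implementation: the names sgd_step and sample_grad, the one-weight model, the toy data, and the learning rate of 0.1 are all assumptions made for this example.

```python
import random


def sgd_step(theta, grad, learning_rate=0.1):
    # Plain SGD: step against the current gradient only; no memory of past steps.
    return [p - learning_rate * g for p, g in zip(theta, grad)]


def sample_grad(theta, x, y):
    # Gradient of the squared error 0.5 * (w * x - y) ** 2 for a one-weight model.
    w = theta[0]
    return [(w * x - y) * x]


random.seed(0)
# Hypothetical toy data generated from y = 2 * x, so the true weight is 2.0.
data = [(x, 2.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

theta = [0.0]
for _ in range(200):
    x, y = random.choice(data)  # one randomly drawn example per update
    theta = sgd_step(theta, sample_grad(theta, x, y))
```

Each update here depends only on the single example drawn at that step, which is the property that momentum later relaxes.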

Momentum addresses this issue by accumulating a velocity vector from past gradients. In effect, the model builds up speed in directions where successive gradients agree, such as the direction of the optimum, because past gradients also contribute to each weight update. The update rule for SGD with momentum can be expressed mathematically as:

v_t = beta * v_{t-1} + (1 - beta) * gradient

theta_t = theta_{t-1} - learning_rate * v_t

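Here, v_t is the velocity (typically initialized to zero), beta is the momentum coefficient (commonly 0.9, with values from 0.5 up to 0.99 also used), gradient is the gradient of the loss evaluated at theta_{t-1}, and theta_t represents the updated parameters. This is the exponential-moving-average form of momentum; the classical form omits the (1 - beta) factor, which changes the effective learning rate but not the idea.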
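The update rule above translates almost line for line into code. The sketch below is one possible rendering, assuming parameters are stored as plain Python lists of floats; the function name sgd_momentum_step, the toy quadratic loss, and the default values are illustrative choices, not part of any particular library.

```python
def sgd_momentum_step(theta, velocity, grad, learning_rate=0.1, beta=0.9):
    """Apply one step of the rule above, element-wise over lists of floats."""
    # v_t = beta * v_{t-1} + (1 - beta) * gradient
    new_velocity = [beta * v + (1.0 - beta) * g for v, g in zip(velocity, grad)]
    # theta_t = theta_{t-1} - learning_rate * v_t
    new_theta = [p - learning_rate * v for p, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


def quadratic_grad(theta):
    # Gradient of the toy loss 0.5 * sum(p ** 2), chosen only for illustration.
    return list(theta)


theta = [1.0, -2.0]
velocity = [0.0, 0.0]  # velocity starts at zero
for _ in range(300):
    theta, velocity = sgd_momentum_step(theta, velocity, quadratic_grad(theta))
```

The function returns the new velocity alongside the new parameters because the velocity is state that must be carried from one step to the next.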

By incorporating momentum, the algorithm can make more significant progress in flat regions of the loss landscape while avoiding excessive oscillation. This results in faster convergence and improves the overall efficiency of training.
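The damping effect can be seen on a deliberately simple toy problem. The sketch below is an illustration under hand-picked assumptions, not a benchmark: it uses a one-dimensional loss 0.5 * curvature * theta ** 2 with a curvature of 25 and a learning rate of 0.09, which is above the largest step (2 / 25 = 0.08) at which plain gradient steps remain stable on this loss. The function names are hypothetical.

```python
def run_plain_sgd(theta, curvature, learning_rate, steps):
    # Loss 0.5 * curvature * theta ** 2 has gradient curvature * theta.
    for _ in range(steps):
        theta = theta - learning_rate * curvature * theta
    return theta


def run_sgd_momentum(theta, curvature, learning_rate, beta, steps):
    velocity = 0.0
    for _ in range(steps):
        grad = curvature * theta
        # Same rule as in the text: an averaged velocity replaces the raw gradient.
        velocity = beta * velocity + (1.0 - beta) * grad
        theta = theta - learning_rate * velocity
    return theta


plain = run_plain_sgd(1.0, curvature=25.0, learning_rate=0.09, steps=100)
with_momentum = run_sgd_momentum(
    1.0, curvature=25.0, learning_rate=0.09, beta=0.9, steps=100
)
print(f"plain SGD:     |theta| = {abs(plain):.3e}")
print(f"with momentum: |theta| = {abs(with_momentum):.3e}")
```

With these settings the plain updates overshoot the minimum by a growing amount on every step, while the averaged velocity damps the overshoot and the iterate settles toward zero. Different curvatures, learning rates, or noisy gradients can change the picture, so this should be read as a demonstration of the mechanism only.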