AI Glossary: What Is Momentum Update (MU)? Definition & Meaning

Momentum Update is a technique used in gradient-based optimization for training machine learning models, particularly neural networks. It aims to speed up convergence by borrowing the idea of momentum from physics: the parameters behave like a ball rolling downhill, keeping part of their previous velocity instead of reacting only to the current slope.

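In traditional gradient descent, model parameters are updated by moving them in the direction of the negative gradient of the loss function. This can converge slowly in two situations. The first is when the loss surface curves much more steeply in some directions than in others (narrow ravines), so the updates zigzag across the steep direction while creeping along the shallow one. The second is when the gradients are noisy, as with mini-batch estimates. Momentum Update addresses this by maintaining an exponentially decaying running average of past gradients. Steps then build up along directions where successive gradients agree and partially cancel along directions where they alternate in sign.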
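To make the slow-convergence problem concrete, here is a minimal pure-Python sketch of plain gradient descent (no momentum) on a two-parameter quadratic. The loss L(w) = 0.5 * (100 * w[0]**2 + w[1]**2), the helper `quad_grad`, and the function name `gradient_descent` are hypothetical choices for illustration. The loss curves 100 times more steeply along w[0] than along w[1], which mimics a narrow ravine.

```python
def gradient_descent(grad_fn, w0, learning_rate, steps):
    """Plain gradient descent: w <- w - learning_rate * grad(w), with no momentum term."""
    w = list(w0)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical ill-conditioned quadratic loss L(w) = 0.5 * (100 * w[0]**2 + w[1]**2):
# curvature 100 along w[0], curvature 1 along w[1].
def quad_grad(w):
    return [100.0 * w[0], 1.0 * w[1]]


# learning_rate must stay below 2 / 100 = 0.02, or the steep coordinate diverges.
w1 = gradient_descent(quad_grad, [1.0, 1.0], learning_rate=0.019, steps=1)
w100 = gradient_descent(quad_grad, [1.0, 1.0], learning_rate=0.019, steps=100)
# w1[0] is about -0.9: the steep coordinate overshoots and flips sign every step.
# w100[1] is still about 0.147: the flat coordinate shrinks by only 0.981 per step.
```

The largest stable learning rate is set by the steepest direction. As a result, the shallow direction is forced to take tiny steps, which is the situation momentum is designed to improve.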

The core idea is to introduce a momentum term: an exponentially weighted sum of previous gradients, in which older gradients count for progressively less. This term smooths the updates, allowing faster movement along flat, consistent directions and damping oscillations across steep ones. In one common formulation, the velocity is updated as:

v(t) = beta * v(t-1) + (1 - beta) * g(t)

where v(t) is the velocity (the accumulated gradient, usually initialized to zero), beta is the momentum coefficient (between 0 and 1, commonly around 0.9), and g(t) is the gradient of the loss with respect to the parameters at step t, evaluated at w(t-1). The parameters are then updated using:

w(t) = w(t-1) - learning_rate * v(t)
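The two update equations above translate almost line for line into code. The following is a minimal pure-Python sketch, assuming zero initial velocity and a hypothetical ill-conditioned quadratic loss L(w) = 0.5 * (100 * w[0]**2 + w[1]**2). The names `momentum_step`, `momentum_descent`, and `quad_grad` are illustrative and do not belong to any library.

```python
def momentum_step(w, v, g, learning_rate, beta):
    """One Momentum Update step in the moving-average form:
    v(t) = beta * v(t-1) + (1 - beta) * g(t)
    w(t) = w(t-1) - learning_rate * v(t)
    """
    v_new = [beta * vi + (1.0 - beta) * gi for vi, gi in zip(v, g)]
    w_new = [wi - learning_rate * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def momentum_descent(grad_fn, w0, learning_rate, beta, steps):
    """Run Momentum Update from w0, starting from zero velocity (an assumed, common choice)."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)  # g(t) is evaluated at the current parameters w(t-1)
        w, v = momentum_step(w, v, g, learning_rate, beta)
    return w


# Hypothetical ill-conditioned quadratic loss L(w) = 0.5 * (100 * w[0]**2 + w[1]**2):
# curvature 100 along w[0], curvature 1 along w[1].
def quad_grad(w):
    return [100.0 * w[0], 1.0 * w[1]]


w = momentum_descent(quad_grad, [1.0, 1.0], learning_rate=0.19, beta=0.9, steps=100)
```

Because each fresh gradient enters v(t) scaled by (1 - beta), learning_rate = 0.19 with beta = 0.9 gives an effective step of 0.19 * (1 - 0.9) = 0.019 on each new gradient. On this quadratic, plain gradient descent with learning_rate = 0.019 multiplies w[1] by 0.981 per step and leaves it near 0.147 after 100 steps. By contrast, this momentum run brings both coordinates below 0.01 in the same number of steps.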

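In practice this approach often speeds up convergence. The accumulated velocity can also carry the parameters through small bumps, plateaus, and saddle regions, although it does not guarantee escape from local minima. These properties make it a popular choice for training deep learning models. Variants such as Nesterov Accelerated Gradient (NAG) build on the same idea with a look-ahead mechanism: the gradient is evaluated at the point the momentum is about to carry the parameters to, rather than at the current parameters. This lets the update correct course before it overshoots.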
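The look-ahead idea can be sketched by changing only where the gradient is evaluated. This is one way to write a Nesterov-style update in the same moving-average form as v(t) and w(t). Other equivalent and non-equivalent formulations exist, so treat this as an illustrative sketch rather than a specific library's implementation. The function name `nesterov_descent`, the zero initial velocity, and the quadratic loss behind `quad_grad` are assumptions.

```python
def nesterov_descent(grad_fn, w0, learning_rate, beta, steps):
    """Nesterov-style look-ahead in the moving-average momentum form.

    The gradient is taken at w - learning_rate * beta * v: the point the parameters
    would reach if only the momentum part of the next update were applied.
    """
    w = list(w0)
    v = [0.0] * len(w)  # assumed zero initial velocity
    for _ in range(steps):
        lookahead = [wi - learning_rate * beta * vi for wi, vi in zip(w, v)]
        g = grad_fn(lookahead)  # gradient at the look-ahead point, not at w
        v = [beta * vi + (1.0 - beta) * gi for vi, gi in zip(v, g)]
        w = [wi - learning_rate * vi for wi, vi in zip(w, v)]
    return w


# Hypothetical ill-conditioned quadratic loss L(w) = 0.5 * (100 * w[0]**2 + w[1]**2).
def quad_grad(w):
    return [100.0 * w[0], 1.0 * w[1]]


w = nesterov_descent(quad_grad, [1.0, 1.0], learning_rate=0.12, beta=0.9, steps=100)
```

On the first step the velocity is zero, so the look-ahead point equals w and the update matches plain momentum. After that, the gradient reflects where the velocity is heading. The learning rate is set lower here (0.12) because, on this quadratic with beta = 0.9, this look-ahead form becomes unstable along the steep coordinate at the larger rate of 0.19.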