AI Glossary: What Is Gradient Variance (GV)? Definition & Meaning

勾配分散 refers to the variability or spread of the gradients computed during the training of 機械学習 models, particularly in the context of 最適化アルゴリズム such as stochastic 勾配降下法（SGD）などの

In 深層学習, the training process involves adjusting the weights of the model to minimize the 損失関数. This is done by calculating the gradients of the loss with respect to the model parameters. However, when using mini-batches of data, the gradients can vary significantly from one batch to another due to the randomness in the data selection. This variability is referred to as gradient variance.

高い勾配分散度は、訓練プロセスの不安定さにつながる可能性があります。例えば、勾配が大きく変動すると、モデルが適切に収束しない場合があり、性能が最適でなくなることがあります。一方、低い勾配分散度は、モデルのパラメータの更新がより一貫して行われることを示し、より速い収束を促進します。

To address the issues caused by high gradient variance, practitioners often implement techniques such as 勾配クリッピング, which limits the size of the gradients, or use more advanced optimizers that adapt the learning rate based on the variance of the gradients. Understanding and managing gradient variance is crucial for effectively training deep learning models and achieving desired performance levels.