AI Glossary: What Is AdamW? Definition & Meaning

AdamW é um avançado algoritmo de otimização commonly used in training aprendizado profundo models. It builds on the popular otimizador Adam, which combines the benefits of two other extensions of stochastic gradiente descendente (SGD): AdaGrad and RMSProp. AdamW addresses a specific issue related to weight decay, a regularization technique that helps prevent overfitting by penalizing large weights.

In traditional Adam, weight decay is implemented by simply adding a penalty term to the função de perda during optimization. This can lead to suboptimal results because it affects the adaptive learning rates of the parameters. AdamW, introduced by Loshchilov and Hutter in 2017, modifies this approach by decoupling weight decay from the gradient updates. Instead of incorporating weight decay into the loss, AdamW applies it directly to the weights after the gradient update. This decoupling allows for more effective optimization, as the learning rates can be adjusted independently of the weight decay.

As a result, AdamW often leads to better generalization performance and faster convergence in training deep learning models, especially in tasks such as image classification and natural language processing. It is widely used in various frameworks like TensorFlow and PyTorch, making it a go-to choice for many practitioners in the campo de inteligência artificial.