AI Glossary: What Is GELU? Definition & Meaning

L'unité linéaire d'erreur gaussienne (GELU) est un fonction d'activation commonly used in apprentissage profond architectures, particularly in transformer models like BERT and GPT. It offers a smoother alternative to traditional fonctions d'activation telles que ReLU (Unité Linéaire Rectifiée) et sigmoïde.

The mathematical formulation of GELU combines both linear and nonlinear elements, allowing it to model complex motifs dans les données plus efficacement. La fonction peut être exprimée comme :

GELU(x) = x * P(X ≤ x) = x * 0.5 * (1 + erf(x / √2))

Ici, erf denotes the error function, and P(X ≤ x) represents the fonction de distribution cumulative of the standard normal distribution. This formulation means that GELU outputs a value that is zero for negative inputs and gradually increases for positive inputs, which helps in maintaining a flow of gradients during backpropagation.

One of the key advantages of GELU over ReLU is its ability to retain negative values, which can lead to better dynamiques d'apprentissage and more nuanced representations in the model. This characteristic helps reduce the risk of dead neurons, a common issue with ReLU where neurons can become inactive and stop learning.

Due to its mathematical properties and performance benefits, GELU has gained popularity in state-of-the-art models, contributing to advancements in traitement du langage naturel NLP, vision par ordinateur, et autres applications de l'IA.