La Unidad de Error Gaussiana Lineal (GELU) es un función de activación commonly used in aprendizaje profundo architectures, particularly in transformer models like BERT and GPT. It offers a smoother alternative to traditional funciones de activación como ReLU (Unidad Lineal Rectificada) y sigmoide.
The mathematical formulation of GELU combines both linear and nonlinear elements, allowing it to model complex patrones en los datos de manera más efectiva. La función puede expresarse como:
GELU(x) = x * P(X ≤ x) = x * 0.5 * (1 + erf(x / √2))
Aquí, erf denotes the error function, and P(X ≤ x) represents the función de distribución acumulada of the standard normal distribution. This formulation means that GELU outputs a value that is zero for negative inputs and gradually increases for positive inputs, which helps in maintaining a flow of gradients during backpropagation.
One of the key advantages of GELU over ReLU is its ability to retain negative values, which can lead to better dinámicas de aprendizaje and more nuanced representations in the model. This characteristic helps reduce the risk of dead neurons, a common issue with ReLU where neurons can become inactive and stop learning.
Due to its mathematical properties and performance benefits, GELU has gained popularity in state-of-the-art models, contributing to advancements in procesamiento de lenguaje natural (PLN), visión por computadora y otras aplicaciones de IA.