AI Glossary: What Is GELU? Definition & Meaning

The Gaussian Error Linear Unit (GELU) is an activation function commonly used in deep learning architectures, particularly in transformer models like BERT and GPT. It offers a smoother alternative to traditional activation functions such as ReLU (Rectified Linear Unit) and sigmoid.

The mathematical formulation of GELU combines both linear and nonlinear elements, allowing it to model complex patterns in data more effectively. The function can be expressed as:

GELU(x) = x * P(X ≤ x) = x * 0.5 * (1 + erf(x / √2))

Here, erf denotes the error function, and P(X ≤ x) represents the cumulative distribution function of the standard normal distribution. This formulation means that GELU outputs a value that is zero for negative inputs and gradually increases for positive inputs, which helps in maintaining a flow of gradients during backpropagation.

One of the key advantages of GELU over ReLU is its ability to retain negative values, which can lead to better learning dynamics and more nuanced representations in the model. This characteristic helps reduce the risk of dead neurons, a common issue with ReLU where neurons can become inactive and stop learning.

Due to its mathematical properties and performance benefits, GELU has gained popularity in state-of-the-art models, contributing to advancements in natural language processing (NLP), computer vision, and other AI applications.