AI Glossary: What Is GELU? Definition & Meaning

Die Gaußsche Fehler-Linear-Einheit (GELU) ist eine Aktivierungsfunktion commonly used in Deep Learning architectures, particularly in transformer models like BERT and GPT. It offers a smoother alternative to traditional Aktivierungsfunktionen wie ReLU (Rectified Linear Unit) und Sigmoid.

The mathematical formulation of GELU combines both linear and nonlinear elements, allowing it to model complex Muster in Daten effektiver erkennen. Die Funktion kann ausgedrückt werden als:

GELU(x) = x * P(X ≤ x) = x * 0,5 * (1 + erf(x / √2))

Hier ist erf denotes the error function, and P(X ≤ x) represents the kumulative Verteilungsfunktion of the standard normal distribution. This formulation means that GELU outputs a value that is zero for negative inputs and gradually increases for positive inputs, which helps in maintaining a flow of gradients during backpropagation.

One of the key advantages of GELU over ReLU is its ability to retain negative values, which can lead to better Lern-Dynamik and more nuanced representations in the model. This characteristic helps reduce the risk of dead neurons, a common issue with ReLU where neurons can become inactive and stop learning.

Due to its mathematical properties and performance benefits, GELU has gained popularity in state-of-the-art models, contributing to advancements in der Verarbeitung natürlicher Sprache (NLP), Computer Vision und andere KI-Anwendungen.