AI Glossary: What Is GELU? Definition & Meaning

A Unidade Linear de Erro Gaussiana (GELU) é uma função de ativação commonly used in aprendizado profundo architectures, particularly in transformer models like BERT and GPT. It offers a smoother alternative to traditional funções de ativação como ReLU (Unidade Linear Retificada) e sigmoid.

The mathematical formulation of GELU combines both linear and nonlinear elements, allowing it to model complex padrões em dados de forma mais eficaz. A função pode ser expressa como:

GELU(x) = x * P(X ≤ x) = x * 0.5 * (1 + erf(x / √2))

Aqui, erf denotes the error function, and P(X ≤ x) represents the função de distribuição acumulada of the standard normal distribution. This formulation means that GELU outputs a value that is zero for negative inputs and gradually increases for positive inputs, which helps in maintaining a flow of gradients during backpropagation.

One of the key advantages of GELU over ReLU is its ability to retain negative values, which can lead to better dinâmicas de aprendizado and more nuanced representations in the model. This characteristic helps reduce the risk of dead neurons, a common issue with ReLU where neurons can become inactive and stop learning.

Due to its mathematical properties and performance benefits, GELU has gained popularity in state-of-the-art models, contributing to advancements in processamento de linguagem natural (NLP), visão computacional e outras aplicações de IA.