AI Glossary: What Is Bahdanau Attention? Definition & Meaning

Attention Bahdanau

L'attention Bahdanau, également connue sous le nom d'attention additive, est un mécanisme utilisé dans réseaux neuronaux, especially in tâches de traitement du langage naturel like machine translation. It was introduced by Dzmitry Bahdanau and his colleagues in their 2014 paper, which aimed to improve the performance of sequence-to-sequence models.

The core idea behind Bahdanau Attention is to allow the model to dynamically focus on different parts of the input sequence at each step of the output generation process. Traditional sequence models, like réseaux neuronaux récurrents (RNNs), encode the entire input sequence into a fixed-size context vector. This can lead to difficulties in capturing long-range dependencies, as important information may be lost.

Bahdanau Attention addresses this issue by calculating a set of attention scores for the input sequence. For each jeton de sortie being generated, the model computes a context vector based on the relevant parts of the input sequence. This involves three main components: a scoring function that evaluates how well each input element aligns with the current output state, a softmax function that normalizes these scores to create a probability distribution, and a weighted sum of the input features based on these scores to create the context vector.

Ce vecteur de contexte est ensuite utilisé avec l'état actuel du decoder to produce the next output token. By doing so, the model can selectively focus on the most relevant parts of the input, which significantly enhances its performance, especially in tasks involving long sequences.

Dans l'ensemble, l'attention Bahdanau a joué un rôle crucial dans l'avancement des capacités des réseaux neuronaux dans diverses applications, notamment la traduction, le résumé, et plus encore.