AI Glossary: What Is Self-Attention (SA)? Definition & Meaning

L'auto-attention est un mécanisme essentiel utilisé dans réseaux neuronaux, particularly in traitement du langage naturel (NLP) and vision par ordinateur. It enables a model to analyze and process input data by assessing the relationships and dependencies between various elements within the same input sequence. Unlike traditional methods that may focus on local context or fixed-length contexts, self-attention considers the entire input, allowing the model to dynamically focus on the most relevant parts.

The self-attention process works by transforming input data into three distinct vectors: queries, keys, and values. Each input element is represented as these three vectors, which are derived from the original input embeddings through learned linear transformations. The model calculates a score that measures how much attention each element should pay to every other element based on the produit scalaire des vecteurs de requête et de clé.

After obtaining these scores, they are normalized using a softmax function, turning them into attention weights that sum to one. These weights are then applied to the value vectors, creating a weighted sum that reflects the most relevant information for that particular input element. This process allows the model to capture long-range dependencies and contextual relationships, making it especially powerful in tasks like la traduction de langues, text summarization, and more.

One of the key advantages of self-attention is its parallelization capability, which allows for efficient computation, especially when handling long sequences. This has led to its adoption in various state-of-the-art architectures, such as the Transformer model, which has significantly advanced the field of apprentissage profond.