AI Glossary: What Is Multi-Head Attention (MHA)? Definition & Meaning

Multi-Head Attention

Multi-Head Attention (MHA) is a core building block of the Transformer architecture and the models built on it, which are widely used in natural language processing and computer vision. It lets a model attend to several parts of the input at once, which improves its ability to learn complex relationships within the data.

The core idea behind Multi-Head Attention is to run multiple attention mechanisms, or ‘heads,’ in parallel. Each head computes its own attention scores using its own learned parameters, so different heads can capture different aspects of the input. An attention score indicates how much weight a given word or element in a sequence should receive when the model makes a prediction or generates an output.

In a typical attention mechanism, a set of queries, keys, and values is derived from the input data, usually through learned linear projections. The attention scores are calculated by taking the dot product of each query with every key; in the standard scaled dot-product form, the result is divided by the square root of the key dimension. A softmax operation then turns the scores into weights that sum to 1. These weights are used to compute a weighted sum of the values, producing an output that reflects the most relevant parts of the input.
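The steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it uses only the standard library, the function names (`softmax`, `attention`) and the toy vectors are made up for this example, and it assumes the common scaled dot-product form, in which scores are divided by the square root of the key dimension. Real models apply learned projections first and run on an array library instead of plain lists.

```python
import math

def softmax(row):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output feature at a time.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy inputs (hypothetical, for illustration only): 2 queries, 3 key/value pairs.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

outputs, weights = attention(queries, keys, values)
```

Each row of `weights` sums to 1, and each row of `outputs` is a blend of the value vectors that leans toward the values whose keys best match the query.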

With Multi-Head Attention, this process runs once per head, with each head using its own learned projections of the queries, keys, and values. The outputs of all heads are then concatenated and passed through a final linear layer. This allows the model to combine multiple perspectives on the same input, which helps on tasks such as translation, summarization, and image captioning.
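As a rough sketch of how the heads fit together, the example below runs two heads over the same input, concatenates their outputs, and applies a final linear layer. Everything here is an assumption made for illustration: the helper names (`matmul`, `rand_matrix`, `multi_head_attention`), the tiny dimensions, and the random, untrained weight matrices that stand in for learned parameters. It applies self-attention (queries, keys, and values all come from the same input `x`) and omits details such as biases, masking, and batching.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def multi_head_attention(x, heads, w_o):
    """heads holds one (w_q, w_k, w_v) triple of projection matrices per head."""
    head_outputs = [
        attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        for w_q, w_k, w_v in heads
    ]
    # Concatenate the per-head outputs along the feature dimension.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    # Final linear layer mixes information across heads.
    return matmul(concat, w_o)

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random stand-in for a learned weight matrix (untrained)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

seq_len, d_model, num_heads = 3, 4, 2
d_head = d_model // num_heads  # each head works in a smaller subspace

x = rand_matrix(seq_len, d_model)  # hypothetical input: 3 tokens, 4 features each
heads = [
    (rand_matrix(d_model, d_head), rand_matrix(d_model, d_head), rand_matrix(d_model, d_head))
    for _ in range(num_heads)
]
w_o = rand_matrix(num_heads * d_head, d_model)

output = multi_head_attention(x, heads, w_o)
```

The result has one row per input token and `d_model` features per row, the same shape as `x`. Giving each head `d_model // num_heads` dimensions is a common choice that keeps the total cost close to that of a single full-width head.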

Overall, Multi-Head Attention increases the expressiveness of a neural network: by combining several attention patterns computed in parallel, the model can represent relationships that a single attention head would have to average together.