非局所ブロック
A 非局所ブロック is a type of ニューラルネットワーク層 designed to capture long-range dependencies in data, particularly in tasks involving sequences or images. Unlike conventional convolutional layers that focus on local regions, non-local blocks compute relationships between all positions in the input data.
In a typical non-local operation, each element in the input feature map is compared with every other element, allowing the model to learn contextual relationships across the entire 特徴空間. This is achieved through a mechanism that calculates attention scores, which weigh the contribution of each element based on its relevance to others.
非局所ブロックの基本構造は、主に三つの要素に分けられます。
- Query、Key、Value(QKV): The input is transformed into three separate representations. Queries represent the element currently being processed, while keys and values represent other elements within the same feature map.
- アテンションスコア 計算: The attention score between each query and key is computed, typically using a dot product followed by a softmax operation to normalize the scores across the feature dimensions.
- 出力生成: The output is generated by a weighted sum of the values, where the weights are determined by the attention scores.
Non-local blocks have become popular in various applications, including image classification, object detection, and 自然言語処理, as they enhance the model’s ability to consider global contextual information. However, they also come with a computational cost due to their quadratic complexity in relation to the input size, which can lead to increased memory usage and longer training times. Researchers continue to explore ways to optimize these blocks for efficiency.