Gated Linear Unit (GLU)
The Gated Linear Unit (GLU) is a neural-network activation function designed to improve a model's ability to capture complex relationships in data by incorporating a gating mechanism. Introduced by Yann N. Dauphin et al. in 2017 in "Language Modeling with Gated Convolutional Networks," GLUs have been used to improve the performance of deep learning architectures, particularly in natural language processing (NLP) and other tasks on sequential data.
A GLU works by combining a linear transformation with a gate that controls the flow of information. Its basic form is:
GLU(x) = (W_1 * x) ⊗ σ(W_2 * x)
In this formula, W_1 and W_2 are weight matrices, x is the input, σ is the sigmoid function, and ⊗ denotes element-wise multiplication. The original formulation also includes bias terms, which are omitted here. The output of the GLU is the element-wise product of a linear transformation and a gate whose values lie between 0 and 1. This lets the model learn which features to pass through and which to suppress.
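The formula can be made concrete with a minimal pure-Python sketch. It assumes a single input vector, weight matrices given as lists of rows, and no bias terms, matching the formula above. The helper names `sigmoid`, `matvec`, and `glu` are illustrative, not taken from any library.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    """Plain matrix-vector product W x, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def glu(x: list[float], W1: list[list[float]], W2: list[list[float]]) -> list[float]:
    """GLU(x) = (W1 x) ⊗ σ(W2 x), element-wise; bias terms omitted."""
    linear = matvec(W1, x)                      # linear path: W_1 x
    gate = [sigmoid(z) for z in matvec(W2, x)]  # gate: σ(W_2 x), each value in (0, 1)
    return [a * g for a, g in zip(linear, gate)]

if __name__ == "__main__":
    x = [1.0, 2.0]
    W1 = [[1.0, 0.0], [0.0, 1.0]]   # identity: the linear path simply copies x
    W2 = [[0.0, 0.0], [10.0, 0.0]]  # gate pre-activations become 0 and 10
    out = glu(x, W1, W2)
    print([round(v, 4) for v in out])  # [0.5, 1.9999]
```

In the example, the gate halves the first feature (σ(0) = 0.5) and lets the second pass almost unchanged (σ(10) ≈ 1). In practice, frameworks often compute both projections with a single layer and split the result in half. PyTorch's `torch.nn.functional.glu`, for example, splits its input along a chosen dimension into halves a and b and returns a ⊗ σ(b).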
The gate controls how much of each input-derived feature is passed forward through the network. This supports more effective learning and better gradient flow, because gradients along the linear path are scaled only by the gate value rather than shrunk by the derivative of a saturating nonlinearity. This matters most in deep networks, where vanishing gradients can be a serious problem.
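The gradient-flow argument can be checked by differentiating GLU(x) = (W_1 x) ⊗ σ(W_2 x) with respect to x. The derivation below uses the same symbols as the formula above; diag(·) turns a vector into a diagonal matrix.

```latex
\frac{\partial\,\mathrm{GLU}(x)}{\partial x}
  = \operatorname{diag}\!\big(\sigma(W_2 x)\big)\,W_1
  + \operatorname{diag}\!\big((W_1 x) \otimes \sigma'(W_2 x)\big)\,W_2,
\qquad \sigma'(z) = \sigma(z)\,\big(1 - \sigma(z)\big)
```

The first term is the linear path. Its gradient is multiplied only by the gate value σ(W_2 x) and carries no derivative of a nonlinearity, so it does not vanish when the gate is open. Dauphin et al. contrast this with tanh-based gating, where both terms include a derivative factor that shrinks toward zero as units saturate.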
GLUs are often combined with other neural-network layers, such as convolutional or recurrent layers, to improve their performance. They can be viewed as an evolution of traditional activation functions such as ReLU. Instead of applying a fixed nonlinearity, a GLU learns an input-dependent gate, which has proved useful in applications including language modeling and machine translation.
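Combining a GLU with a convolutional layer, as Dauphin et al. do for sequences, can be sketched in pure Python. This is a hypothetical single-channel version written in the spirit of their gated convolutional layer, (x * w + b) ⊗ σ(x * v + c).

Several details are assumptions made for illustration:

- The helper names `causal_conv1d` and `gated_conv1d` are invented here.
- Positions before the start of the sequence are treated as zeros, so each output sees only the current and earlier inputs.

Real models use multi-channel convolutions from a deep learning framework.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def causal_conv1d(x: list[float], w: list[float], bias: float) -> list[float]:
    """y[t] = bias + sum_j w[j] * x[t - j]; positions before the start count as 0."""
    return [
        bias + sum(w[j] * x[t - j] for j in range(len(w)) if t - j >= 0)
        for t in range(len(x))
    ]

def gated_conv1d(x: list[float], w: list[float], b: float,
                 v: list[float], c: float) -> list[float]:
    """Single-channel gated convolution: (x * w + b) ⊗ σ(x * v + c)."""
    linear = causal_conv1d(x, w, b)  # linear path: a convolution with no nonlinearity
    gate = causal_conv1d(x, v, c)    # gate pre-activations from a second convolution
    return [a * sigmoid(g) for a, g in zip(linear, gate)]

if __name__ == "__main__":
    # w sums the current and previous inputs; v = 0, c = 0 gives a constant gate of 0.5.
    print(gated_conv1d([1.0, 2.0, 3.0], w=[1.0, 1.0], b=0.0, v=[0.0, 0.0], c=0.0))
    # [0.5, 1.5, 2.5]
```

Because the convolution only looks backward, changing a later input never affects earlier outputs. That is the property a language model needs so that it does not see future tokens.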