Xavier Initialization, also known as Glorot Initialization, is a technique used to set the initial weights of artificial neural networks. Developed by Xavier Glorot and Yoshua Bengio, this method aims to address the problem of vanishing and exploding gradients, which can occur during the training of deep networks.
The core idea behind Xavier Initialization is to keep the variance of the activations, and of the gradients flowing backward, roughly constant from layer to layer. When a neural network is initialized with weights that are too small, the signals shrink as they propagate through the layers, leading to vanishing gradients. Conversely, if the weights are too large, the signals grow with every layer and can explode, resulting in instability during training.
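The effect of weight scale on signal propagation can be illustrated with a small experiment. The sketch below is a simplified illustration rather than a training setup: it assumes purely linear layers (no activation function), a square layer width of 64, and Gaussian weights. The helper name `final_rms` and all parameter values are chosen for this example only. It pushes a random input vector through ten layers and reports the root-mean-square magnitude of the output for three weight scales, including the Xavier-scaled standard deviation sqrt(2 / (fan_in + fan_out)).

```python
import math
import random


def final_rms(weight_std, width=64, depth=10, seed=0):
    """Push a random vector through `depth` linear layers and return the
    root-mean-square magnitude of the final output.

    Assumptions for this illustration: square layers of size `width`,
    zero-mean Gaussian weights with standard deviation `weight_std`,
    and no activation function or bias.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # One weight matrix per layer, stored as a list of rows.
        w = [[rng.gauss(0.0, weight_std) for _ in range(width)]
             for _ in range(width)]
        # Matrix-vector product: each output unit sums over all inputs.
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
    return math.sqrt(sum(v * v for v in x) / width)


width = 64
# Xavier-scaled standard deviation for fan_in = fan_out = width.
xavier_std = math.sqrt(2.0 / (width + width))

for name, std in [("too small", 0.01), ("xavier", xavier_std), ("too large", 1.0)]:
    print(f"{name:>9}: weight std={std:.3f}, output RMS={final_rms(std):.3e}")
```

Under these assumptions, each layer multiplies the signal variance by roughly width × weight_std², so the small-weight output collapses toward zero, the large-weight output grows by many orders of magnitude, and the Xavier-scaled output stays near its initial magnitude.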
To implement Xavier Initialization, weights are sampled from a zero-mean distribution (either uniform or normal) with variance 2 / (fan_in + fan_out), where fan_in is the number of input units in the weight tensor and fan_out is the number of output units. A common choice is to draw weights from a uniform distribution within the range of [-sqrt(6 / (fan_in + fan_out)), sqrt(6 / (fan_in + fan_out))]; since a uniform distribution on [-a, a] has variance a^2 / 3, this range yields exactly the target variance. The normal variant uses a standard deviation of sqrt(2 / (fan_in + fan_out)). Either way, the weights are scaled according to the layer’s size, which helps keep the signal flowing through the network at a manageable level.
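The sampling rule can be sketched in a few lines of plain Python. This is a minimal illustration using only the standard library, not the implementation of any particular framework. The function names `xavier_uniform` and `xavier_normal`, the `seed` parameter, and the list-of-rows matrix layout are assumptions made for this example.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out weight matrix (list of rows) sampled from
    U(-limit, limit), where limit = sqrt(6 / (fan_in + fan_out))."""
    rng = random.Random(seed)
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def xavier_normal(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out weight matrix sampled from a zero-mean
    normal distribution with std = sqrt(2 / (fan_in + fan_out))."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def sample_variance(matrix):
    """Population variance of all entries in a list-of-rows matrix."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Example: a layer with 300 inputs and 200 outputs.
weights = xavier_uniform(300, 200, seed=0)
target_variance = 2.0 / (300 + 200)
print(f"target variance:   {target_variance:.5f}")
print(f"uniform variant:   {sample_variance(weights):.5f}")
print(f"normal variant:    {sample_variance(xavier_normal(300, 200, seed=0)):.5f}")
```

Both variants should print an empirical variance close to the target of 2 / (fan_in + fan_out), which is the property that the two sampling schemes share.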
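Xavier Initialization is particularly effective with activation functions like the hyperbolic tangent (tanh) or logistic sigmoid, which are sensitive to the scale of the input: inputs that are too large push these functions into their flat, saturated regions, where gradients are close to zero. By starting with well-scaled weights, networks are more likely to converge quickly and reliably during training, which tends to reduce training time and improve final performance.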