AI Glossary: What Is Mini-Batch Gradient Descent? Definition & Meaning

Mini-batch gradient descent is a variant of the gradient descent optimization algorithm widely used in machine learning and deep learning. Full-batch gradient descent computes the gradient of the loss function over the entire training dataset before each update, while stochastic gradient descent (SGD) updates after every single example. Mini-batch gradient descent sits between the two: it divides the dataset into small batches and updates the model’s weights after each one, balancing the stability of full-batch updates against the low per-update cost of SGD.
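To make the update loop concrete, here is a minimal sketch in Python with NumPy. It assumes a linear regression model trained with mean squared error; the function name `minibatch_gd`, its default arguments, and the synthetic data are illustrative choices, not part of any particular library.

```python
import numpy as np

def minibatch_gd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Fit linear regression weights with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Reshuffle once per epoch so the batches differ between epochs
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Gradient of the mean squared error on this mini-batch only
            grad = 2.0 * X_b.T @ (X_b @ w - y_b) / len(idx)
            # Weights are updated after every mini-batch, not once per epoch
            w -= lr * grad
    return w

# Synthetic data, purely illustrative
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=1000)

w_hat = minibatch_gd(X, y)
print(w_hat)
```

Setting `batch_size` to the number of samples turns this loop into full-batch gradient descent, and setting it to 1 gives per-example SGD, which is why mini-batch gradient descent is usually described as the middle ground between the two.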

This approach speeds up training because the gradient for a whole batch can be computed with vectorized operations, which map well onto the parallel hardware of GPUs and similar accelerators. Mini-batch gradient descent also introduces some noise into training, because each batch gives only an estimate of the full gradient. That noise can help the model escape poor local minima and may lead to better generalization on unseen data.
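The role of vectorization can be illustrated with a small NumPy sketch, again assuming a linear model with mean squared error (an assumption made only for this example). It computes the gradient for one mini-batch twice: once with a per-example loop and once as a single matrix expression. Both give the same result, but the matrix form is the kind of operation that optimized numerical libraries and GPUs execute efficiently.

```python
import numpy as np

rng = np.random.default_rng(1)
X_b = rng.normal(size=(64, 10))  # one mini-batch of 64 examples, 10 features
y_b = rng.normal(size=64)
w = rng.normal(size=10)

# Per-example loop: accumulate one small gradient per example, then average
grad_loop = np.zeros_like(w)
for x_i, y_i in zip(X_b, y_b):
    grad_loop += 2.0 * (x_i @ w - y_i) * x_i
grad_loop /= len(y_b)

# Vectorized: the same quantity as a single matrix expression
grad_vec = 2.0 * X_b.T @ (X_b @ w - y_b) / len(y_b)

print(np.allclose(grad_loop, grad_vec))
```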

The mini-batch size is a hyperparameter that can significantly affect both training behavior and the model’s final performance. Smaller batches give noisier updates, which may help with exploration of the loss landscape, while larger batches give a more accurate estimate of the full gradient at a higher cost per update. A common practice is to train with several different batch sizes and keep the configuration that performs best.
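The trade-off between batch size and gradient accuracy can be checked empirically. The sketch below assumes a linear model with mean squared error and synthetic data; the helper names `mse_grad` and `mean_grad_error` and the batch sizes 8, 64, and 512 are arbitrary illustrative choices. It measures how far a mini-batch gradient typically lies from the full-dataset gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 5))
y = X @ rng.normal(size=5) + rng.normal(size=4096)
w = np.zeros(5)  # evaluate all gradients at the same fixed point

def mse_grad(X_b, y_b, w):
    """Gradient of mean squared error for a linear model on the given rows."""
    return 2.0 * X_b.T @ (X_b @ w - y_b) / len(y_b)

full_grad = mse_grad(X, y, w)

def mean_grad_error(batch_size, trials=200):
    """Average distance between a random mini-batch gradient and the full gradient."""
    errors = []
    for _ in range(trials):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        errors.append(np.linalg.norm(mse_grad(X[idx], y[idx], w) - full_grad))
    return float(np.mean(errors))

errors = {b: mean_grad_error(b) for b in (8, 64, 512)}
for b, e in errors.items():
    print(f"batch_size={b:4d}  mean gradient error={e:.3f}")
```

In this setup the error shrinks as the batch grows, which matches the description above: larger batches approximate the full gradient more closely, while smaller batches produce noisier updates.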