In machine learning, a mini-batch is a small, randomly selected subset of the full training dataset that is used to update the model’s parameters during training. Mini-batch training sits between two extremes: full-batch gradient descent, which computes the gradient over the entire dataset for every update, and purely stochastic gradient descent (SGD), which updates on a single example at a time. Computing each update over the entire dataset can be computationally expensive and slow, so mini-batch training processes only a fraction of the data per update.
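As an illustration of the update loop described above, here is a minimal sketch of mini-batch gradient descent. It assumes a linear regression model with a mean-squared-error loss and synthetic data; the function name `minibatch_sgd` and its parameters (`batch_size`, `lr`, `epochs`) are hypothetical choices made for this example, not part of any particular library.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Fit linear-regression weights with mini-batch gradient descent (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        # Reshuffle each epoch so every mini-batch is a random subset.
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Gradient of the mean squared error over this mini-batch only.
            grad = 2.0 * X_b.T @ (X_b @ w - y_b) / len(idx)
            w -= lr * grad
    return w

# Synthetic data (an assumption for the example): y = X @ true_w + small noise.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.01 * data_rng.normal(size=1000)

w_hat = minibatch_sgd(X, y)
print(w_hat)  # should be close to true_w
```

Each parameter update here touches only `batch_size` rows of `X`, which is the defining property of mini-batch training; setting `batch_size` to the dataset size would recover full-batch gradient descent, and setting it to 1 would recover single-example SGD.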
Mini-batches typically range in size from a few examples to several hundred, and sometimes thousands in large-scale training, depending on the dataset, the model, and the available memory. The choice of mini-batch size can significantly affect the training dynamics, convergence speed, and final model performance. A smaller mini-batch produces noisier gradient estimates, which can make training more exploratory, while a larger mini-batch produces more stable updates but yields fewer updates per pass over the data and, in some settings, may lead to a solution that generalizes less well.
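The relationship between batch size and update noise can be checked empirically. The sketch below, again assuming a linear model with a mean-squared-error loss on synthetic data, measures how far the mini-batch gradient deviates on average from the full-dataset gradient for several batch sizes. The helper `gradient_noise` and the chosen batch sizes are hypothetical and only for illustration.

```python
import numpy as np

def gradient_noise(X, y, w, batch_size, n_trials=500, seed=0):
    """Average distance between a mini-batch gradient and the full-dataset gradient."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    full_grad = 2.0 * X.T @ (X @ w - y) / n
    errors = []
    for _ in range(n_trials):
        # Draw one random mini-batch without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        g = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        errors.append(np.linalg.norm(g - full_grad))
    return float(np.mean(errors))

# Synthetic regression problem (an assumption for the example).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(2000, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.5 * data_rng.normal(size=2000)
w = np.zeros(3)  # evaluate the gradients at an arbitrary starting point

noise_by_size = {b: gradient_noise(X, y, w, b) for b in (8, 64, 512)}
for b, err in noise_by_size.items():
    print(f"batch_size={b:4d}  mean gradient error={err:.3f}")
```

In this setup the measured error shrinks as the batch size grows, which matches the intuition that averaging over more examples gives a lower-variance gradient estimate. It says nothing by itself about which batch size generalizes best; that depends on the model and task.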
Using mini-batches also helps leverage the parallel processing capabilities of modern hardware, such as GPUs, making the training process faster and more efficient. Additionally, mini-batch training facilitates online learning scenarios where data arrives in streams, allowing the model to update frequently without needing the entire dataset at once.
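For the streaming case, one simple way to form mini-batches is to group incoming examples into fixed-size chunks as they arrive, so the model can be updated after each chunk without holding the full dataset in memory. The generator below is a minimal sketch of that idea; the name `stream_batches` is hypothetical, and the `range(7)` input merely stands in for a real data stream.

```python
from itertools import islice

def stream_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from any iterable."""
    it = iter(stream)
    while True:
        # Pull the next batch_size items; the final batch may be shorter.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for a data stream (an assumption for the example).
batches = list(stream_batches(range(7), 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

In an actual training loop, each yielded batch would be passed to a parameter update step instead of being collected into a list.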
Overall, mini-batch training is a widely used technique in deep learning and other areas of machine learning, balancing computational efficiency with effective learning.