Bootstrap Aggregating (Bagging)
Bootstrap aggregating, commonly referred to as bagging, is an ensemble machine learning method designed to improve the accuracy and robustness of predictive models. The core idea is to reduce a model's variance by combining the predictions of multiple models, each trained on a different random sample of the training data.
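The variance argument can be made concrete with a standard identity. As a sketch, assume B models whose predictions at a point x each have variance σ² and pairwise correlation ρ (these symbols are introduced here for illustration and are not part of the original text):

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

With uncorrelated models (ρ = 0) the variance falls to σ²/B. In practice bootstrap samples overlap, so ρ > 0 and the first term sets a floor that adding more models cannot remove.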
The process begins by creating multiple samples of the original training dataset. Each sample is generated by a method called ‘bootstrapping’, which draws data points at random with replacement, typically until the sample is the same size as the original dataset. As a result, some data points appear several times in a given sample while others are left out entirely. The samples therefore differ from one another, and each gives a slightly different view of the underlying data distribution.
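Sampling with replacement can be sketched in a few lines using only the Python standard library. The function names `bootstrap_sample` and `make_bootstrap_samples`, the toy dataset, and the seed are illustrative assumptions, not part of any particular library.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return rng.choices(data, k=len(data))

def make_bootstrap_samples(data, n_samples, seed=0):
    """Return n_samples bootstrap samples drawn with a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(n_samples)]

# Toy dataset (assumption): ten integer "data points".
data = list(range(10))
samples = make_bootstrap_samples(data, n_samples=3)

for s in samples:
    # Points never drawn for this sample are the ones "omitted" from it.
    omitted = sorted(set(data) - set(s))
    print(sorted(s), "omitted:", omitted)
```

For a dataset of n points, the expected fraction of distinct points in one bootstrap sample is 1 - (1 - 1/n)^n, which approaches roughly 63% as n grows, so about a third of the data is left out of each sample on average.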
Once the samples are created, a model of the same type (often a decision tree) is trained independently on each one. Predictions are then made by aggregating the outputs of all the trained models. In regression, the final prediction is typically the average of the individual predictions. In classification, the most common approach is majority voting, where the class predicted by the largest number of models is chosen as the final output.
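The two aggregation rules are simple enough to show directly. This is a minimal sketch with hypothetical function names; it assumes each model's prediction for a single input has already been collected into a list.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(predictions):
    """Average the numeric predictions of all models."""
    return mean(predictions)

def aggregate_classification(predictions):
    """Return the most frequently predicted class label.

    Assumption: ties go to the label that appears first in the list,
    which is how Counter.most_common orders equal counts.
    """
    return Counter(predictions).most_common(1)[0][0]

# Predictions from three hypothetical models for one input.
print(aggregate_regression([2.0, 3.0, 4.0]))
print(aggregate_classification(["cat", "dog", "cat"]))
```

Using an odd number of models avoids ties in binary classification; with more classes, some explicit tie-breaking rule is still needed.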
Bagging is particularly effective for high-variance models such as decision trees, because averaging stabilizes their predictions. This often improves accuracy and reduces overfitting, so the ensemble generalizes better to unseen data. A well-known example of bagging in practice is the Random Forest algorithm, which trains many decision trees on bootstrap samples, additionally restricts each split to a random subset of the features, and combines the trees' predictions.
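Putting the steps together, the sketch below bags a deliberately tiny base model: a one-split regression tree (a "stump") on one-dimensional inputs, used here as a stand-in for a full decision tree. The names `fit_stump` and `fit_bagged`, the toy step-function data, and the default of 25 models are all assumptions made for illustration. This is plain bagging, not Random Forest, since there is no feature subsampling.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by least squares."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values identical: fall back to predicting the mean.
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples and average their outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = rng.choices(range(n), k=n)  # sample indices with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)

# Toy data (assumption): a step function that jumps from 0 to 1 at x = 1.0.
xs = [i / 10 for i in range(20)]
ys = [0.0 if x < 1.0 else 1.0 for x in xs]

predict = fit_bagged(xs, ys)
print(predict(0.2), predict(1.8))
```

Because each stump sees a different bootstrap sample, the split thresholds vary slightly from model to model, and the averaged prediction changes more smoothly near the step than any single stump would.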