Label imbalance is a phenomenon in machine learning and artificial intelligence in which the classes in a dataset are not represented equally. It arises most often in classification tasks, where one class may have far more examples than the others. For instance, in a dataset used to train a model for detecting fraudulent transactions, there may be thousands of legitimate transactions for every fraudulent one. This imbalance can severely degrade model performance: the model may become biased toward the majority class and fail to predict the minority class accurately.
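The effect of such a skew can be illustrated with a minimal sketch. The label names, the 1000-to-1 ratio, and the helper functions `class_distribution` and `majority_baseline` below are hypothetical and chosen only for illustration. The sketch measures each class's share of the data and scores a trivial classifier that always predicts the most common class.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset as a fraction in [0, 1]."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def majority_baseline(labels):
    """Score a classifier that always predicts the most common class."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    accuracy = majority_count / len(labels)
    # Every other class is never predicted, so its recall is 0.
    minority_recall = {
        label: 0.0 for label in counts if label != majority_label
    }
    return majority_label, accuracy, minority_recall


# Hypothetical toy data: 1000 legitimate transactions for one fraudulent one.
labels = ["legit"] * 1000 + ["fraud"] * 1

distribution = class_distribution(labels)
majority_label, accuracy, minority_recall = majority_baseline(labels)

print(distribution)
print(majority_label, accuracy, minority_recall)
```

On this toy data the baseline is correct on 1000 of 1001 examples yet never detects the fraudulent one, which is why overall accuracy alone can be a misleading measure on imbalanced data.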
The consequences of label imbalance include poor predictive performance on the minority class (often hidden behind a high overall accuracy), more false negatives for the minority class, and poor generalization to real-world settings whose class distribution differs from that of the training data. Techniques to mitigate label imbalance include resampling, either by oversampling the minority class or by undersampling the majority class; synthetic data generation methods such as SMOTE (Synthetic Minority Over-sampling Technique); and algorithms designed specifically to handle imbalanced datasets.
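The two resampling strategies can be sketched with the Python standard library alone. This is an illustrative sketch, not a reference implementation: the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the toy data are assumptions. Random oversampling duplicates existing minority examples, whereas SMOTE synthesizes new ones by interpolating between neighbors, which this sketch does not attempt.

```python
import random
from collections import Counter, defaultdict


def _group_by_class(samples, labels):
    """Collect samples into one list per class label."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    return by_class


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(group + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw without replacement so no kept sample is repeated.
        out_samples.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Hypothetical toy data: 95 legitimate and 5 fraudulent transactions,
# each represented by a single made-up numeric feature.
samples = [[float(i)] for i in range(100)]
labels = ["legit"] * 95 + ["fraud"] * 5

over_samples, over_labels = random_oversample(samples, labels)
under_samples, under_labels = random_undersample(samples, labels)

print(Counter(over_labels))
print(Counter(under_labels))
```

Resampling is normally applied to the training split only, so that evaluation data keeps the original class distribution. Oversampling preserves all the data at the cost of repeated examples, while undersampling discards majority examples and may lose information.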
Addressing label imbalance is crucial for developing robust AI systems, especially in fields such as healthcare, fraud detection, and risk assessment, where the consequences of misclassification can be significant.