AI Glossary: What Is Label Leakage? Definition & Meaning

Label leakage refers to a common issue in machine learning and AI model training where information about the target labels unintentionally influences the training process. This leads to overly optimistic performance metrics during model evaluation and, ultimately, to poor generalization to unseen data.

Label leakage often occurs when the training dataset has features that are derived from the labels themselves or when the training and test datasets are not properly separated. For example, if a model is trained on data that includes future outcomes or derived metrics that correlate strongly with the labels, the model may learn to rely on this information rather than the true underlying patterns in the data.
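One rough way to spot a feature derived from the label is to check how strongly each feature correlates with the target. The sketch below is a minimal illustration, not a complete leakage detector: the feature names (`age`, `refund_issued`), the synthetic data, and the 0.95 threshold are all assumptions made up for this example. A high score is only a prompt to ask whether the feature would really be available at prediction time.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / math.sqrt(var_x * var_y)


def flag_suspicious_features(features, labels, threshold=0.95):
    """Return names of features whose |correlation| with the labels
    meets the (assumed) threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Synthetic data, for illustration only.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
features = {
    # Unrelated to the label by construction.
    "age": [rng.uniform(18, 80) for _ in labels],
    # Hypothetical leaky feature: recorded after the outcome is known,
    # so it simply mirrors the label.
    "refund_issued": [float(y) for y in labels],
}

suspicious = flag_suspicious_features(features, labels)
print(suspicious)
```

A correlation check only catches simple, linear leakage; features that encode the label indirectly still need a manual review of when and how each column was produced.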

To avoid label leakage, it is crucial to ensure that the training and test datasets are completely independent. This involves proper data preprocessing, including feature selection and engineering, to ensure that no information about the labels is inadvertently included in the features used for training. Techniques such as cross-validation can also help identify potential leakage by evaluating model performance across different subsets of the data.
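As one illustration of keeping label information out of the features, the sketch below splits the data first and only then computes a label-based feature from the training rows. Target (mean) encoding is used here purely as an assumed example of a preprocessing step that reads the labels; the helper names and the toy rows are hypothetical. The same "fit on training data only" rule applies inside each fold when cross-validating.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle and split rows into (train, test) before any preprocessing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_target_encoding(train_rows):
    """Compute the mean label per category using training rows only."""
    sums, counts = {}, {}
    for category, label in train_rows:
        sums[category] = sums.get(category, 0.0) + label
        counts[category] = counts.get(category, 0) + 1
    global_mean = sum(label for _, label in train_rows) / len(train_rows)
    per_category = {c: sums[c] / counts[c] for c in sums}
    return per_category, global_mean


def apply_target_encoding(rows, per_category, global_mean):
    """Encode rows with the training statistics; categories not seen in
    training fall back to the global training mean."""
    return [per_category.get(category, global_mean) for category, _ in rows]


# Toy (category, label) rows, for illustration only.
rows = [
    ("a", 1), ("a", 1), ("a", 0),
    ("b", 0), ("b", 0), ("b", 1),
    ("c", 1), ("c", 0),
]

train_rows, test_rows = split_rows(rows)
per_category, global_mean = fit_target_encoding(train_rows)

# Test labels are never read when building the test features.
train_encoded = apply_target_encoding(train_rows, per_category, global_mean)
test_encoded = apply_target_encoding(test_rows, per_category, global_mean)
```

Computing the encoding on all rows before splitting would let the test labels shape the test features, which is the kind of contamination that inflates evaluation scores.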

Ultimately, understanding and preventing label leakage is vital for building robust and reliable AI models that perform well in real-world applications.