AI Glossary: What Is Model Leakage? Definition & Meaning

Model leakage refers to a situation in machine learning and artificial intelligence where information from outside the training dataset is inadvertently used in the model training process. This can lead to overly optimistic performance metrics: the model appears to perform well during validation or testing, but fails to generalize when applied to unseen data.

Model leakage can occur in several ways, such as:

Data contamination: This happens when the training dataset includes information that should have been kept separate, such as future data or labels that are not available in real-world scenarios.
Feature leakage: This occurs when features used in the model are derived from data that will not be available at prediction time, giving the model an unfair advantage.
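
Both kinds of leakage can often be caught with simple checks before training. The following is a minimal sketch, not a complete audit: the helper names `find_overlap` and `drop_unavailable_features`, the toy rows, and the feature names (including `discharge_code` as a field assumed to be recorded only after the outcome) are all hypothetical and chosen for illustration.

```python
def find_overlap(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training set."""
    train_set = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in train_set]


def drop_unavailable_features(record, available_at_prediction):
    """Keep only the features that exist at the moment a prediction is made."""
    return {k: v for k, v in record.items() if k in available_at_prediction}


# Data contamination check: one row is shared between the two sets.
train = [(1.0, 2.0, 0), (3.0, 4.0, 1), (5.0, 6.0, 0)]
test = [(3.0, 4.0, 1), (7.0, 8.0, 1)]
overlap = find_overlap(train, test)

# Feature leakage check: "discharge_code" is assumed (for this example only)
# to be recorded after the outcome, so it is excluded from the model inputs.
record = {"age": 54, "blood_pressure": 130, "discharge_code": "D12"}
safe = drop_unavailable_features(record, {"age", "blood_pressure"})

print(overlap)
print(safe)
```

Exact-match comparison only finds verbatim duplicates; near-duplicates or records from the same entity split across sets would need a grouping key suited to the data.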

For example, if a model is trained to predict whether a patient will develop a disease based on medical history, but the training set includes outcomes that were only recorded in the future, the model can learn from this future information and produce skewed results.
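
One common way to guard against this kind of future-information leakage is to split the data chronologically rather than at random. The sketch below assumes each record carries a date; the field names (`patient_id`, `visit_date`, `developed_disease`), the function `temporal_split`, and the sample records are hypothetical.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Put records strictly before the cutoff in train, the rest in test."""
    train = [r for r in records if r["visit_date"] < cutoff]
    test = [r for r in records if r["visit_date"] >= cutoff]
    return train, test


# Hypothetical patient records, deliberately out of chronological order.
records = [
    {"patient_id": 1, "visit_date": date(2020, 3, 1), "developed_disease": 0},
    {"patient_id": 2, "visit_date": date(2021, 6, 15), "developed_disease": 1},
    {"patient_id": 3, "visit_date": date(2020, 11, 20), "developed_disease": 1},
    {"patient_id": 4, "visit_date": date(2021, 2, 5), "developed_disease": 0},
]

cutoff = date(2021, 1, 1)
train, test = temporal_split(records, cutoff)

print([r["patient_id"] for r in train])
print([r["patient_id"] for r in test])
```

With this split, the model is evaluated only on records dated on or after the cutoff, which is closer to how it would be used in practice.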

To avoid model leakage, practitioners should ensure strict separation of training, validation, and test datasets, adhere to proper data handling protocols, and check thoroughly for potential contamination in the data. Effective strategies include cross-validation and careful feature selection, so that the model is trained only on information that would legitimately be available. Understanding and managing model leakage is essential for developing robust AI systems that perform reliably in real-world applications.
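
Cross-validation only protects against leakage if every data-dependent preprocessing step is fitted on the training portion of each fold. The following is a minimal sketch of that idea using the standard library only; the helpers `k_fold_indices`, `fit_scaler`, and `transform` are hypothetical names, the folds are contiguous and unshuffled for simplicity, and the single toy feature stands in for a real dataset.

```python
from statistics import mean, pstdev


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def fit_scaler(values):
    """Compute mean and standard deviation from the training values only."""
    sigma = pstdev(values) or 1.0  # avoid division by zero on constant data
    return mean(values), sigma


def transform(values, scaler):
    """Standardize values with statistics that were fitted elsewhere."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = []
for train_idx, val_idx in k_fold_indices(len(feature), k=3):
    # The scaler never sees validation values, so no statistics leak across.
    scaler = fit_scaler([feature[i] for i in train_idx])
    train_scaled = transform([feature[i] for i in train_idx], scaler)
    val_scaled = transform([feature[i] for i in val_idx], scaler)
    folds.append((train_idx, val_idx, train_scaled, val_scaled))
```

Fitting the scaler once on the full dataset before splitting would be the leaky variant: the validation values would then influence the mean and standard deviation used during training.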