Model leakage refers to a situation in machine learning and artificial intelligence where information from outside the training dataset is inadvertently used in the model training process. This can lead to overly optimistic performance metrics: the model appears to perform well during validation or testing but fails to generalize when applied to unseen data.
Model leakage can occur in several ways, for example:
- Data contamination: This happens when the training dataset includes information that should have been kept separate, such as future data or labels that are not available in real-world scenarios.
- Feature leakage: This occurs when features used in the model are derived from data that will not be available at prediction time, which gives the model an unfair advantage.
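
One way to guard against features that will not be available at prediction time is to record when each feature value becomes known and to drop anything that arrives later than the moment of prediction. The following is a minimal sketch under that assumption; the feature names, the dates, and the `available_features` helper are hypothetical and serve only as illustration.

```python
from datetime import date


def available_features(feature_known_on, prediction_date):
    """Return the names of features whose values are known on or before prediction_date."""
    return sorted(
        name
        for name, known_on in feature_known_on.items()
        if known_on <= prediction_date
    )


# Hypothetical feature catalogue: feature name -> date on which its value becomes known.
feature_known_on = {
    "age": date(2023, 1, 1),
    "prior_diagnoses": date(2023, 1, 1),
    "discharge_code": date(2023, 3, 15),  # recorded only after the outcome is known
}

# Predictions are assumed to be made on 2023-02-01, so later features are excluded.
safe = available_features(feature_known_on, date(2023, 2, 1))
print(safe)
```

In this sketch `discharge_code` is filtered out because it only becomes known after the assumed prediction date, so a model trained on it would have information it could not have in practice.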
For example, if a model is trained to predict whether a patient will develop a disease based on medical history, but the training set includes outcomes from future patients, the model might learn from this future information, leading to skewed results.
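
For time-dependent data like the patient example, one common safeguard is a chronological split, in which every training record precedes a cutoff date and evaluation uses only later records. The sketch below assumes each record carries a `visit_date` field; the field names, the sample records, and the `chronological_split` helper are hypothetical.

```python
from datetime import date


def chronological_split(records, cutoff):
    """Put records dated before the cutoff in train and all others in test."""
    train = [r for r in records if r["visit_date"] < cutoff]
    test = [r for r in records if r["visit_date"] >= cutoff]
    return train, test


# Hypothetical patient records, for illustration only.
records = [
    {"patient_id": 1, "visit_date": date(2021, 5, 1), "developed_disease": False},
    {"patient_id": 2, "visit_date": date(2021, 11, 20), "developed_disease": True},
    {"patient_id": 3, "visit_date": date(2022, 3, 8), "developed_disease": False},
    {"patient_id": 4, "visit_date": date(2022, 9, 30), "developed_disease": True},
]

cutoff = date(2022, 1, 1)
train, test = chronological_split(records, cutoff)
print(len(train), len(test))
```

A random shuffle of the same records could place a 2022 outcome in the training set while testing on a 2021 patient, which is the kind of future information the example warns about.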
To avoid model leakage, practitioners should ensure strict separation of training, validation, and test datasets, adhere to proper data handling protocols, and perform thorough checks for any potential contamination in the data. Effective strategies include using techniques such as cross-validation and careful feature selection to ensure that the model is trained on valid information only. Proper understanding and management of model leakage are essential for developing robust AI systems that can perform reliably in real-world applications.
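
Two of these checks can be sketched in a few lines: verifying that no record identifier appears in both the training and test sets, and computing preprocessing statistics from the training data alone before applying them to the test data. The helper names and sample values below are hypothetical, and standardization stands in for any preprocessing step that is fitted to data.

```python
from statistics import mean, pstdev


def shared_ids(train_ids, test_ids):
    """Return identifiers that appear in both splits; a non-empty result signals contamination."""
    return set(train_ids) & set(test_ids)


def fit_scaler(train_values):
    """Compute standardization parameters from training values only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # fall back to 1.0 if all values are identical
    return mu, sigma


def apply_scaler(values, mu, sigma):
    """Standardize values using parameters that were fitted elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical identifiers and feature values, for illustration only.
overlap = shared_ids([1, 2, 3], [3, 4])
print(overlap)  # record 3 is in both splits and should be removed from one of them

train_values = [2.0, 4.0, 6.0]
test_values = [10.0]

mu, sigma = fit_scaler(train_values)              # the test data is never used here
scaled_test = apply_scaler(test_values, mu, sigma)
```

Fitting the scaler on the combined data would let test-set statistics influence training. The same reasoning applies inside cross-validation, where each fold should refit its preprocessing on that fold's training portion only.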