AI Glossary: What Is In-Distribution Data? Definition & Meaning

In-distribution data is a term used in machine learning and artificial intelligence to describe data that is drawn from the same distribution as the dataset used to train a model. This concept is crucial for evaluating the performance and reliability of AI models, because they make predictions based on the patterns learned from their training data.

When a model is trained, it learns to recognize patterns, features, and relationships within the training dataset. Its predictions are most likely to be accurate on in-distribution data, because those inputs share the statistical properties of the examples it learned from. For instance, if a model is trained on images of cats and dogs from a specific set of environments, it is expected to perform well when presented with new images of cats and dogs from similar environments. Those new images are in-distribution data.

Conversely, data that falls outside the training distribution is called out-of-distribution (OOD) data. Models often struggle with out-of-distribution data because they have not encountered these scenarios during training. As a result, predictions made on OOD data may be less reliable, leading to potential errors or misclassifications.
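
The gap between in-distribution and out-of-distribution performance can be illustrated with a small synthetic experiment. The sketch below is only an illustration under stated assumptions: the two Gaussian classes, the nearest-centroid classifier, and the shift of 3.0 units are all hypothetical choices made for this example, not a standard benchmark.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Draw n points per class from two 2-D Gaussians.

    `shift` (an assumption of this sketch) translates both classes along
    the first axis to simulate data from a different distribution.
    """
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]]) + np.array([shift, 0.0])
    X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])
    y = np.repeat([0, 1], n)
    return X, y

# "Train" a nearest-centroid classifier: store one mean vector per class.
X_train, y_train = sample(500)
centroids = np.stack([X_train[y_train == k].mean(axis=0) for k in (0, 1)])

def accuracy(X, y):
    """Fraction of points whose nearest centroid matches their label."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float((dists.argmin(axis=1) == y).mean())

# In-distribution test set: same generating process as the training data.
X_in, y_in = sample(500)
# Out-of-distribution test set: same classes, but shifted inputs.
X_ood, y_ood = sample(500, shift=3.0)

acc_in = accuracy(X_in, y_in)
acc_ood = accuracy(X_ood, y_ood)
print(f"in-distribution accuracy:     {acc_in:.3f}")
print(f"out-of-distribution accuracy: {acc_ood:.3f}")
```

With this setup the shifted test set should score noticeably lower than the unshifted one, because the decision boundary learned in training no longer separates the classes where they now lie.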

Understanding the distinction between in-distribution and out-of-distribution data is vital for AI practitioners, as it influences model evaluation, robustness, and generalization capabilities. Techniques such as domain adaptation or transfer learning are often employed to improve model performance on OOD data by bridging the gap between different data distributions.
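
As a rough illustration of what bridging the gap between distributions can mean, the sketch below applies mean alignment, one of the simplest forms of unsupervised feature alignment. It is a toy example, not a general domain adaptation method: the synthetic source and target domains, the pure translation between them, and the helper names `make_domain`, `fit_centroids`, and `predict` are all assumptions made here. It also relies on both domains having the same class balance.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_domain(n, offset):
    """Two Gaussian classes in 2-D; `offset` translates the whole domain."""
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]]) + offset
    X = np.vstack([rng.normal(c, 1.0, size=(n, 2)) for c in centers])
    y = np.repeat([0, 1], n)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class."""
    return np.stack([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(centroids, X):
    """Assign each point to the class of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Source domain (training distribution) and a shifted target domain (OOD).
X_src, y_src = make_domain(500, offset=np.array([0.0, 0.0]))
X_tgt, y_tgt = make_domain(500, offset=np.array([3.0, 0.0]))

centroids = fit_centroids(X_src, y_src)

# Baseline: apply the source-trained model directly to the target data.
acc_raw = float((predict(centroids, X_tgt) == y_tgt).mean())

# Mean alignment: translate the target features so that their overall mean
# matches the source mean. Only unlabeled target inputs are used here;
# y_tgt is needed solely to score the result.
X_tgt_aligned = X_tgt - X_tgt.mean(axis=0) + X_src.mean(axis=0)
acc_adapted = float((predict(centroids, X_tgt_aligned) == y_tgt).mean())

print(f"target accuracy without adaptation: {acc_raw:.3f}")
print(f"target accuracy with mean alignment: {acc_adapted:.3f}")
```

Mean alignment works in this example only because the shift is a simple translation. Real distribution shifts are usually more complex, which is why practitioners turn to richer domain adaptation and transfer learning methods.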