An out-of-distribution (OOD) sample is a data point that falls outside the distribution of data a model was trained on. In machine learning and artificial intelligence, a model is typically trained on data drawn from a specific distribution, so it learns to make predictions from the patterns observed in that data. When the model is then presented with a sample that does not fit these learned patterns, often because its characteristics or features differ from those seen in training, the sample is considered out-of-distribution.
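As a rough illustration of this definition, the sketch below flags a value as out-of-distribution when it lies far from the training data, measured in standard deviations. The one-dimensional training set, the function name `is_out_of_distribution`, and the threshold of 4 are all assumptions chosen for the example; real systems usually work with high-dimensional features and more careful criteria.

```python
import random
import statistics

random.seed(0)

# Hypothetical training data: a single feature drawn from a normal distribution.
train = [random.gauss(5.0, 1.0) for _ in range(1000)]

# Summarize the training distribution by its mean and standard deviation.
mean = statistics.fmean(train)
std = statistics.stdev(train)


def is_out_of_distribution(x, threshold=4.0):
    """Flag x if it lies more than `threshold` standard deviations from the training mean."""
    return abs(x - mean) / std > threshold


typical = is_out_of_distribution(5.3)    # a value similar to the training data
unusual = is_out_of_distribution(25.0)   # a value far outside the training range
print(typical, unusual)
```

A distance check of this kind only captures one simple notion of "outside the training distribution"; it is a sketch of the idea rather than a recommended detector.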
Out-of-distribution samples can pose significant challenges for AI models, particularly in fields like image recognition or natural language processing. For example, if a model is trained only on images of dogs from a specific breed and then encounters an image of a cat, that image is out-of-distribution. In such cases the model may struggle to make accurate predictions, or it may produce completely erroneous outputs.
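One reason such outputs can be erroneous is that a standard classifier must always choose among the labels it was trained on. The sketch below assumes a hypothetical classifier whose only labels are three dog breeds, and uses made-up logits for a cat image; neither the label set nor the numbers come from a real model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(value - m) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical label set: the model has no "cat" or "unknown" class.
classes = ["labrador", "poodle", "beagle"]

# Hypothetical logits the model might produce for an image of a cat.
cat_logits = [2.0, 0.5, 0.1]

probs = softmax(cat_logits)
prediction = classes[probs.index(max(probs))]
print(prediction, [round(p, 3) for p in probs])
```

Whatever the input, the prediction is one of the known breeds, which is why an out-of-distribution image can receive a confident but meaningless label.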
To address the problems caused by out-of-distribution samples, researchers and practitioners can apply various strategies, such as:
- Data augmentation: Enhancing the training dataset by introducing variations that mimic potential out-of-distribution scenarios.
- Domain adaptation: Techniques that allow models to adapt to new distributions without extensive retraining or additional labeled data.
- Adversarial training: Training models with adversarial examples, which can help improve their robustness to unexpected inputs.
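Two of these strategies can be sketched in a few lines. The example below assumes a toy setting: feature vectors are plain lists of floats, augmentation is Gaussian jitter, and the adversarial example is produced with a fast-gradient-sign step against a logistic-regression model whose weights are made up for illustration. The helper names `augment` and `fgsm_perturb` are hypothetical, and the actual training loop that would consume these samples is omitted.

```python
import math
import random


def augment(sample, n_copies=3, noise_std=0.1, rng=None):
    """Return n_copies noisy variants of a feature vector (Gaussian jitter)."""
    rng = rng or random.Random(0)
    return [[v + rng.gauss(0.0, noise_std) for v in sample] for _ in range(n_copies)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(g):
    """Return -1, 0, or 1 depending on the sign of g."""
    return (g > 0) - (g < 0)


def fgsm_perturb(x, y, weights, bias, epsilon=0.1):
    """Fast-gradient-sign perturbation of x for a logistic-regression model.

    For cross-entropy loss, the gradient with respect to the input is (p - y) * w,
    so stepping along its sign increases the loss on the true label y.
    """
    p = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
    grad = [(p - y) * w for w in weights]
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]


x = [1.0, 2.0]                      # hypothetical training sample with label 1
augmented = augment(x)              # extra samples for data augmentation
adversarial = fgsm_perturb(x, y=1, weights=[0.5, -0.25], bias=0.0)
print(augmented)
print(adversarial)
```

In adversarial training, perturbed samples like `adversarial` would be added to the training set with their original labels. Domain adaptation is not shown here because it depends heavily on the model and the target data.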
Understanding and mitigating the impact of out-of-distribution samples is crucial for developing reliable and effective AI systems that can operate in real-world environments, where the data encountered may not always match the training data.