AI Glossary: What Is an Out-of-Distribution Sample? Definition & Meaning

An out-of-distribution sample is a data point that falls outside the distribution of data a model was trained on. In machine learning and artificial intelligence, models are typically trained on data drawn from a specific distribution, so they learn to make predictions based on patterns observed within that data. When a model is then given a sample that does not fit these learned patterns, often because its characteristics or features differ from those of the training data, that sample is considered out-of-distribution.
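The idea of "falling outside the training distribution" can be made concrete with a deliberately simple heuristic. The sketch below assumes tabular inputs given as plain lists of numbers. It records each feature's mean and standard deviation on the training set, then flags a new sample if any feature lies more than a chosen number of standard deviations from its training mean. The function names and the threshold of 3.0 are hypothetical choices for illustration. Practical detectors for deep models usually work on learned representations rather than raw features, so treat this only as an illustration of the concept.

```python
from statistics import mean, pstdev


def fit_feature_stats(train_rows):
    """Return (mean, population std dev) for each feature column of the training data."""
    columns = list(zip(*train_rows))
    return [(mean(col), pstdev(col)) for col in columns]


def is_out_of_distribution(sample, stats, threshold=3.0):
    """Flag a sample if any feature's z-score exceeds the (assumed) threshold."""
    for value, (mu, sigma) in zip(sample, stats):
        if sigma == 0:
            # A constant training feature: any different value was never seen in training.
            if value != mu:
                return True
            continue
        if abs(value - mu) / sigma > threshold:
            return True
    return False


# Usage with a tiny hypothetical training set of two features.
train = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [2.0, 11.0]]
stats = fit_feature_stats(train)
print(is_out_of_distribution([2.5, 11.5], stats))  # False: close to the training data
print(is_out_of_distribution([2.0, 30.0], stats))  # True: second feature is far outside
```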

Out-of-distribution samples can pose significant challenges for AI models, particularly in fields such as image recognition and natural language processing. For example, suppose a model learns to recognize dogs only from pictures of a single breed. If it then encounters an image of a cat, that image would be considered out-of-distribution. In such cases, the model may make inaccurate predictions or produce completely erroneous outputs, sometimes with high confidence.
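One widely used baseline for noticing such inputs at prediction time looks at the classifier's confidence. The idea is to take the largest softmax probability over the model's output classes and treat a low value as a sign that the input may be out-of-distribution. The sketch below assumes a classifier that returns raw logits, for example one score per dog breed. The function names, the example logits, and the 0.7 threshold are hypothetical. Because models can still assign high confidence to out-of-distribution inputs, this is a heuristic signal and not a guarantee.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_score(logits):
    """Confidence score: the highest class probability."""
    return max(softmax(logits))


def flag_ood(logits, threshold=0.7):
    """Flag the input as possibly out-of-distribution when confidence is below the (assumed) threshold."""
    return max_softmax_score(logits) < threshold


# Hypothetical logits from a three-class dog-breed classifier.
confident = [6.0, 1.0, 0.5]   # clearly favors one breed
uncertain = [1.2, 1.0, 0.9]   # no breed stands out, e.g. a cat photo
print(flag_ood(confident))  # False
print(flag_ood(uncertain))  # True
```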

To address the problems caused by out-of-distribution samples, researchers and practitioners can apply several strategies, such as:

Data Augmentation: Expanding the training dataset with variations that mimic potential out-of-distribution scenarios.
Domain Adaptation: Techniques that allow models to adapt to new distributions without extensive retraining or additional labeled data.
Adversarial Training: Training models on adversarial examples (inputs deliberately perturbed to mislead the model) to improve their robustness against unexpected inputs.
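Of the strategies above, adversarial training is the easiest to show end to end in a few lines. The sketch below assumes a logistic-regression model on plain feature lists. It generates adversarial examples with the fast gradient sign method (FGSM), one common choice among several, which nudges every feature by a small amount `epsilon` in the direction that increases the loss. Each training step then updates the model on both the clean sample and its perturbed copy. The function names, the toy dataset, `epsilon`, the learning rate, and the epoch count are hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y):
    p = predict(w, b, x)
    tiny = 1e-12  # avoids log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))


def fgsm(w, b, x, y, epsilon):
    """FGSM: for logistic regression, d(loss)/dx = (p - y) * w; step by epsilon along its sign."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + epsilon * math.copysign(1.0, g) if g != 0 else xi
            for xi, g in zip(x, grad)]


def adversarial_train(data, epsilon=0.1, lr=0.5, epochs=200):
    """Plain SGD on log loss, training on each clean sample and its FGSM counterpart."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = fgsm(w, b, x, y, epsilon)
            for sample in (x, x_adv):
                err = predict(w, b, sample) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, sample)]
                b -= lr * err
    return w, b


# Usage on a hypothetical, linearly separable toy dataset.
data = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 0.8], 1)]
w, b = adversarial_train(data, epsilon=0.05)
print(predict(w, b, [1.0, 1.0]), predict(w, b, [0.0, 0.0]))
```

Deep-learning frameworks compute the input gradient automatically, but the loop has the same shape: perturb each input, then train on both the clean and the perturbed versions.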

Understanding and mitigating the impact of out-of-distribution samples is crucial for building reliable AI systems that can operate in real-world environments, where the data encountered may not always match the training data.