AI Glossary: What Is Out-of-Sample Error? Definition & Meaning

Out-of-sample error refers to the error rate of a predictive model when applied to new, unseen data that was not part of the data used to train the model. This metric is crucial in evaluating a model’s ability to generalize beyond the training set. In machine learning and statistics, the distinction between in-sample and out-of-sample error is vital for understanding a model’s reliability and performance.

When a model is trained, it learns patterns and relationships within the training dataset. However, if the model performs well only on this training data but poorly on new data, it may be overfitting, meaning it has learned noise or random fluctuations rather than the underlying data distribution. Assessing out-of-sample error therefore allows practitioners to check whether the model can make accurate predictions on data it has not encountered before.
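The gap between the two errors can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the synthetic dataset, the 20% label noise, and the choice of a 1-nearest-neighbor classifier, which memorizes its training set and is therefore a convenient example of a model that fits noise.

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw n points: x uniform in [0, 1], label 1 if x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a memorizing model will also learn
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def error_rate(train, data):
    """Fraction of points in `data` misclassified by the 1-NN model built on `train`."""
    wrong = sum(predict_1nn(train, x) != y for x, y in data)
    return wrong / len(data)

rng = random.Random(0)
train = make_data(200, rng)    # data the model is built from
unseen = make_data(2000, rng)  # fresh data from the same distribution

in_sample_error = error_rate(train, train)
out_of_sample_error = error_rate(train, unseen)
print(f"in-sample error:     {in_sample_error:.3f}")
print(f"out-of-sample error: {out_of_sample_error:.3f}")
```

Because each training point is its own nearest neighbor, the in-sample error here is zero, while the error on unseen data is noticeably higher. That difference is the symptom of overfitting described above.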

Common methods for estimating out-of-sample error include holdout validation, where a portion of the data is reserved for testing after the model is trained on the remainder, and cross-validation, which repeats this split across several partitions of the data and averages the results. The out-of-sample error is then estimated from the model’s performance on the held-out data, providing insight into its predictive power and robustness in real-world applications.
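Holdout validation and k-fold cross-validation can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the synthetic data, the nearest-centroid classifier, and the helper names (`holdout_error`, `cross_validation_error`) are all assumptions introduced for the example.

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw n points: x uniform in [0, 1], label 1 if x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def fit_centroids(train):
    """Compute the mean x of each class; assumes both classes appear in `train`."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict(means, x):
    """Predict the class whose centroid is closest to x."""
    return min(means, key=lambda label: abs(means[label] - x))

def error_rate(means, data):
    """Fraction of points in `data` that the fitted model misclassifies."""
    return sum(predict(means, x) != y for x, y in data) / len(data)

def holdout_error(data, test_fraction=0.25):
    """Reserve the first `test_fraction` of the data for testing, train on the rest."""
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    return error_rate(fit_centroids(train), test)

def cross_validation_error(data, k=5):
    """Average the test error over k folds, each fold serving once as the test set."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors.append(error_rate(fit_centroids(train), test))
    return sum(errors) / k

rng = random.Random(0)
data = make_data(400, rng)  # generated in random order, so no extra shuffle is needed

holdout = holdout_error(data)
cv = cross_validation_error(data)
print(f"holdout estimate:          {holdout:.3f}")
print(f"5-fold cross-validation:   {cv:.3f}")
```

In both cases the model is never evaluated on points it was trained on. Cross-validation uses every point for testing exactly once, so its estimate tends to vary less than a single holdout split, at the cost of fitting the model k times.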