O

Surapprentissage

Le surapprentissage est une erreur de modélisation où un modèle d'apprentissage automatique apprend le bruit au lieu du motif sous-jacent.

Surapprentissage

Dans le contexte de l’apprentissage automatique et modélisation statistique, overfitting refers to a scenario where a model learns not only the underlying patterns in the training data but also the noise and fluctuations that do not generalize to unseen data. This can lead to a model that performs exceptionally well on the training dataset but fails to make accurate predictions on new, unseen data.

Overfitting occurs when a model is too complex relative to the amount of training data available. For example, a model with a high number of parameters or layers can capture intricate details and subtle variations in the training data. However, if it captures too much of the noise, it loses its ability to generalize effectively.

Les symptômes courants du surapprentissage incluent :

  • Formation élevée accuracy mais une précision de validation/test faible : The model performs well on the training set but poorly on validation or test sets.
  • Modèles complexes : Models that are overly complex (like high-degree polynomial regression or deep réseaux neuronaux sans régularisation) sont plus sujets au surapprentissage.

Pour atténuer le surapprentissage, plusieurs techniques peuvent être employées :

  • Régularisation : Adding a penalty for complexity in the model (e.g., L1 or Régularisation L2) helps constrain the model’s capacity.
  • Validation croisée : Using techniques like k-fold cross-validation to ensure the model performs well across different subsets of the data.
  • Élagage: In decision trees and similar models, removing parts of the model that have little importance can help reduce overfitting.
  • Arrêt précoce: Monitoring the model’s performance on a validation set during training and stopping when performance begins to decline.

En fin de compte, l'objectif dans la formation de modèles is to find a balance between underfitting (too simple a model) and overfitting, achieving a model that generalizes well to new data.

oEmbed (JSON) + /