Le sous-apprentissage est un problème courant dans apprentissage automatique and modélisation statistique where a model is too simplistic to accurately represent the data it is trying to learn from. This happens when the model has insufficient complexity, meaning it cannot capture the relationships and patterns inherent in the dataset.
Par exemple, si vous essayez d'ajuster un modèle linéaire to data that actually follows a complex, non-linear trend, the model will fail to learn the true structure of the data. This results in poor performance both on the training dataset and on new, unseen data. Essentially, an underfitted model has high bias and low variance, leading to generalized errors and inadequate predictions.
Il existe plusieurs raisons pour lesquelles le sous-apprentissage peut se produire :
- Complexité insuffisante du modèle : Using a model that is not complex enough, such as a régression linéaire modèle pour un problème non linéaire.
- Caractéristiques inadéquates : Not including enough relevant features in the model can prevent it from capturing essential patterns.
- Excessif Régularisation: Applying too much regularization can overly constrain the model, making it too simple.
To address underfitting, one can try to increase the complexity of the model by selecting a more sophisticated algorithm, adding relevant features, or reducing the regularization. Diagnosing underfitting typically involves evaluating the model’s métriques de performance, such as accuracy, precision, or loss, on both the training and validation datasets. If the model performs poorly on both, underfitting is likely the cause.