Semi-supervised learning
Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training. It sits between supervised learning, where all training data is labeled, and unsupervised learning, where no labeled data is used.
In many real-world scenarios, acquiring labeled data is expensive and time-consuming, while unlabeled data is often abundant. Semi-supervised learning aims to leverage this unlabeled data to improve model accuracy. The fundamental idea is that even though unlabeled data does not provide explicit target values, it still contains valuable information about the underlying structure of the data distribution.
A common approach in semi-supervised learning is to use a model trained on the labeled data to make predictions on the unlabeled data. These predicted labels, often called pseudo-labels, can then be used to refine the model further. Self-training and co-training both build on this idea, while graph-based methods instead propagate labels along a similarity graph that connects labeled and unlabeled points. For instance, self-training iteratively adds high-confidence predictions for unlabeled data to the training set, while co-training trains two models on different views of the data and lets each one supply labels for the other.
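The self-training loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a standard implementation: the base model is a simple nearest-centroid classifier, confidence is taken as a softmax over negative distances to the class centroids, and the names `self_train`, `threshold`, and `max_rounds`, the value 0.8, and the toy data are all hypothetical.

```python
import math

def centroids(points, labels):
    """Compute the mean point of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        s = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict_with_confidence(cents, p):
    """Return (label, confidence); confidence is a softmax over
    negative Euclidean distances (an illustrative choice)."""
    weights = {y: math.exp(-math.dist(c, p)) for y, c in cents.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())

def self_train(labeled, labels, unlabeled, threshold=0.8, max_rounds=10):
    """Iteratively move high-confidence predictions into the training set."""
    points, ys = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, ys)  # refit on the current training set
        confident, remaining = [], []
        for p in pool:
            label, conf = predict_with_confidence(cents, p)
            (confident if conf >= threshold else remaining).append((p, label))
        if not confident:
            break  # nothing new to add, so stop early
        for p, label in confident:
            points.append(p)
            ys.append(label)  # the pseudo-label is treated as a true label
        pool = [p for p, _ in remaining]
    return points, ys, pool

# Toy data: two labeled points and five unlabeled ones.
labeled = [(0.0, 0.0), (4.0, 4.0)]
labels = ["a", "b"]
unlabeled = [(0.5, 0.2), (1.0, 1.0), (3.5, 3.8), (3.0, 3.0), (2.0, 2.1)]

train_points, train_labels, leftover = self_train(labeled, labels, unlabeled)
final_centroids = centroids(train_points, train_labels)
```

In this toy run, the points lying close to a labeled example are absorbed into the training set, while the point roughly midway between the two classes stays in `leftover` because its confidence never reaches the threshold. The threshold controls the trade-off: a lower value adds more pseudo-labels but increases the risk that wrong labels reinforce themselves in later rounds.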
Semi-supervised learning has been widely used in a variety of applications, including natural language processing, image classification, and speech recognition, where labeled data is scarce but unlabeled data is plentiful. By using both types of data effectively, it can improve the performance and generalization of machine learning models.