Semi-supervised learning
Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training. It sits between supervised learning, where all training data is labeled, and unsupervised learning, where no labeled data is used.
In many real-world scenarios, acquiring labeled data is expensive and time-consuming, while unlabeled data is often abundant. Semi-supervised learning aims to use this unlabeled data to improve model accuracy. The fundamental idea is that although unlabeled data provides no explicit target values, it still carries information about the underlying structure of the data distribution.
A common approach in semi-supervised learning is to use a model trained on the labeled data to make predictions on the unlabeled data. These predicted labels, often called pseudo-labels, can then be used to refine the model further. Techniques such as self-training, co-training, and graph-based methods are often employed for this purpose. For instance, self-training iteratively adds high-confidence predictions for unlabeled data to the training set, while co-training trains two models on different views of the data so that each can teach the other. Graph-based methods propagate labels from labeled to unlabeled points across a graph that connects similar examples.
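The self-training loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the base model is a simple nearest-centroid classifier on 2D points, confidence is taken as a softmax over negative distances to the class centroids, and the function names (`centroids`, `predict_with_confidence`, `self_train`) as well as the 0.9 threshold are hypothetical choices made for this example.

```python
import math


def centroids(points, labels):
    """Compute the mean 2D point of each class (the 'model' in this sketch)."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}


def predict_with_confidence(cents, point):
    """Return (label, confidence) for one point.

    Assumption: confidence is a softmax over negative Euclidean distances
    to the class centroids, so it lies between 0 and 1.
    """
    weights = {lab: math.exp(-math.dist(point, c)) for lab, c in cents.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total


def self_train(labeled, labels, unlabeled, threshold=0.9, max_rounds=10):
    """Iteratively move high-confidence pseudo-labeled points into the training set.

    Returns the final centroids and the points that never reached the threshold.
    """
    points, labs = list(labeled), list(labels)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(points, labs)
        confident, remaining = [], []
        for p in pool:
            lab, conf = predict_with_confidence(cents, p)
            (confident if conf >= threshold else remaining).append((p, lab))
        if not confident:
            break  # nothing new to learn from; stop early
        for p, lab in confident:
            points.append(p)
            labs.append(lab)
        pool = [p for p, _ in remaining]
    return centroids(points, labs), pool


if __name__ == "__main__":
    labeled = [(0, 0), (10, 10)]
    labels = ["a", "b"]
    unlabeled = [(1, 1), (9, 9), (0, 1), (10, 9), (5, 5)]
    model, leftover = self_train(labeled, labels, unlabeled)
    print(model)
    print(leftover)
```

In this toy run, the points near each labeled example are absorbed as pseudo-labels, while the point halfway between the two classes stays in the unlabeled pool because neither class reaches the confidence threshold. The threshold controls the usual trade-off in self-training: a lower value adds more data per round but raises the risk of reinforcing wrong pseudo-labels.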
This approach has been widely applied in areas including natural language processing, image classification, and speech recognition, where labeled data can be scarce but unlabeled data is plentiful. By using both types of data effectively, semi-supervised learning can improve the performance and generalization of machine learning models.