AI Glossary: What Is Semi-Supervised Learning (SSL)? Definition & Meaning

Semi-Supervised Learning

Semi-supervised learning is a fundamental machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data during training. It sits between supervised learning, where all training data is labeled, and unsupervised learning, where no labeled data is used.

In many real-world scenarios, acquiring labeled data is expensive and time-consuming, while unlabeled data is often abundant. Semi-supervised learning aims to use this unlabeled data to improve model accuracy. The fundamental idea is that even though unlabeled data provides no explicit target values, it still carries information about the underlying structure of the data distribution, such as where clusters lie and where the low-density regions between them fall.

A common approach in semi-supervised learning is to train a model on the labeled data and use it to predict labels for the unlabeled data. These predicted labels, often called pseudo-labels, are then used to refine the model. Self-training, co-training, and graph-based methods are among the techniques most often used. Self-training iteratively adds high-confidence predictions on unlabeled data to the training set and retrains. Co-training trains two models on different views (feature subsets) of the data, so that each model supplies confident labels for the other. Graph-based methods connect similar examples in a graph and propagate the known labels along its edges to unlabeled neighbors.
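The self-training loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the nearest-centroid classifier, the distance-based confidence score, the 0.9 threshold, and the synthetic two-cluster data are all choices made for this example, and the function names (fit_centroids, predict_with_confidence, self_train) are hypothetical.

```python
import math
import random


def fit_centroids(X, y):
    """Return the mean point of each class (a nearest-centroid 'model')."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def predict_with_confidence(centroids, point):
    """Softmax over negative distances: a crude stand-in for class probability."""
    scores = {label: math.exp(-math.dist(point, c))
              for label, c in centroids.items()}
    total = sum(scores.values())
    label = max(scores, key=scores.get)
    return label, scores[label] / total


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly move confidently predicted points into the training set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)  # copy inputs
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)  # retrain on current labels
        remaining, added = [], 0
        for point in pool:
            label, confidence = predict_with_confidence(centroids, point)
            if confidence >= threshold:
                X_lab.append(point)   # accept the pseudo-label
                y_lab.append(label)
                added += 1
            else:
                remaining.append(point)  # keep for a later round
        pool = remaining
        if added == 0 or not pool:  # nothing new learned, or nothing left
            break
    return X_lab, y_lab, pool


# Assumed toy data: four labeled points and 80 unlabeled points in two clusters.
rng = random.Random(0)
X_lab = [(0.0, 0.5), (-0.5, 0.0), (6.0, 5.5), (6.5, 6.0)]
y_lab = [0, 0, 1, 1]
X_unlab = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X_unlab += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(40)]

X_final, y_final, leftover = self_train(X_lab, y_lab, X_unlab)
model = fit_centroids(X_final, y_final)
print(len(X_final), "training points after self-training;",
      len(leftover), "left unlabeled")
print(predict_with_confidence(model, (0.3, -0.2)))
```

The confidence threshold is the main design choice here. Set too low, early mistakes are added to the training set and reinforced in later rounds; set too high, few unlabeled points are ever used.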

This approach has been widely used in applications including natural language processing, image classification, and speech recognition, where labeled data is scarce but unlabeled data is plentiful. By making use of both types of data, semi-supervised learning can improve the performance and generalization of machine learning models.