El co-entrenamiento es una técnica de aprendizaje automático that falls under the category of aprendizaje semi-supervisado. It leverages multiple views of data to enhance the classification performance of models. In co-training, two or more classifiers are trained on different subsets or ‘views’ of the same data. Each view contains different features that provide complementary information about the data.
El proceso generalmente comienza con una pequeña cantidad de datos etiquetados and a larger pool of unlabeled data. Initially, each classifier is trained on its respective view of the labeled data. Once trained, the classifiers can then be used to label additional unlabeled data. This newly labeled data is subsequently added to the training set of each classifier, allowing them to iteratively improve their performance.
Co-training is particularly effective when the views are conditionally independent given the class label, meaning that each view provides unique information that complements the others. For example, in text classification, one view might utilize the textual content while another might focus on the metadata associated with the document. This technique has been shown to work well in various applications, such as procesamiento de lenguaje natural and computer vision, where different representations of the same underlying data can yield better predictive performance.