Co-training is a machine learning technique that falls under the category of semi-supervised learning. It uses multiple views of the data to improve classification performance. In co-training, two or more classifiers are trained on different feature subsets, or ‘views’, of the same data. Each view contains different features that provide complementary information about the data.
The process typically begins with a small amount of labeled data and a larger pool of unlabeled data. Initially, each classifier is trained on its own view of the labeled data. Once trained, the classifiers are used to label some of the unlabeled data, typically the examples they predict most confidently. The newly labeled examples are then added to the training set of each classifier, and the cycle repeats so that the classifiers improve iteratively.
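The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `CentroidClassifier` base learner, the `co_train` function, its `rounds` and `per_round` parameters, and the toy data are all hypothetical names and values chosen for the example. Any classifier that exposes a prediction and a confidence score could stand in for the base learner.

```python
class CentroidClassifier:
    """Tiny nearest-centroid classifier, used here only as a stand-in base learner."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def _distances(self, features):
        # Euclidean distance from the example to each class centroid.
        return {
            c: sum((a - b) ** 2 for a, b in zip(features, centre)) ** 0.5
            for c, centre in self.centroids.items()
        }

    def predict(self, features):
        d = self._distances(features)
        return min(d, key=d.get)

    def confidence(self, features):
        # Margin between the two nearest centroids: larger means more confident.
        d = sorted(self._distances(features).values())
        return d[1] - d[0] if len(d) > 1 else 0.0


def co_train(view1, view2, labels, rounds=5, per_round=2):
    """Co-training sketch. labels[i] is a class label, or None if unlabeled."""
    labels = list(labels)  # work on a copy; the caller's list is left untouched
    clf1, clf2 = CentroidClassifier(), CentroidClassifier()

    def fit_both():
        # Each classifier sees only its own view of the currently labeled examples.
        labeled = [i for i, y in enumerate(labels) if y is not None]
        clf1.fit([view1[i] for i in labeled], [labels[i] for i in labeled])
        clf2.fit([view2[i] for i in labeled], [labels[i] for i in labeled])

    for _ in range(rounds):
        fit_both()
        if all(y is not None for y in labels):
            break
        # Each classifier labels the unlabeled examples it is most confident about;
        # those labels go into the shared pool used by both classifiers.
        for clf, view in ((clf1, view1), (clf2, view2)):
            candidates = [i for i, y in enumerate(labels) if y is None]
            candidates.sort(key=lambda i: clf.confidence(view[i]), reverse=True)
            for i in candidates[:per_round]:
                labels[i] = clf.predict(view[i])
    fit_both()  # final refit on everything labeled so far
    return clf1, clf2, labels


# Toy data: two views of eight examples, only two of which start out labeled.
view1 = [[0.0], [0.2], [0.1], [0.3], [5.0], [5.2], [4.9], [5.1]]
view2 = [[1.0], [1.1], [0.9], [1.2], [9.0], [9.1], [8.8], [9.2]]
labels = [0, None, None, None, 1, None, None, None]

clf1, clf2, final_labels = co_train(view1, view2, labels)
print(final_labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

In this sketch both classifiers draw from and write to a single shared pool of labels, which is one simple way to realize "added to the training set of each classifier". Other variants keep separate training sets per classifier or cap the number of examples added per class.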
Co-training is particularly effective when the views are conditionally independent given the class label, meaning that once the class is known, the features in one view carry no further information about the features in another. Under this condition the classifiers tend to make different errors, so each can supply useful labels to the others. For example, in text classification, one view might use the textual content while another uses the metadata associated with the document. The technique has been shown to work well in applications such as natural language processing and computer vision, where different representations of the same underlying data can yield better predictive performance.
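The conditional independence condition can be written compactly. The symbols here are assumptions introduced for illustration: $x_1$ and $x_2$ denote the feature vectors of an example in the first and second view, and $y$ denotes its class label.

```latex
P(x_1, x_2 \mid y) = P(x_1 \mid y)\, P(x_2 \mid y)
```

In the text classification example, this would mean that among documents of the same class, knowing the textual content tells you nothing extra about the metadata. Real data rarely satisfies this exactly, so it is best read as an idealized condition.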