AI Glossary: What Is Co-training? Definition & Meaning

Co-training is a semi-supervised machine learning technique that uses multiple views of the data to improve classification performance. In co-training, two or more classifiers are trained on different ‘views’ of the same examples, where each view is a distinct subset of the features. Because each view captures a different aspect of the data, the views provide complementary information about it.

The process typically starts with a small set of labeled examples and a much larger pool of unlabeled examples. Each classifier is first trained on its own view of the labeled data. Each classifier then predicts labels for the unlabeled examples, and the predictions it is most confident about are added, with their predicted labels, to the shared training set. When the classifiers are retrained on this enlarged set, each one learns from confident predictions that the other made using a different view. The cycle repeats for a fixed number of rounds or until the unlabeled pool is used up.
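The loop described above can be sketched in plain Python. This is a minimal, hypothetical illustration rather than a reference implementation: it assumes exactly two views, binary features, and a tiny hand-written Bernoulli naive Bayes classifier. The names `BernoulliNB`, `train_views`, `co_train`, and `predict`, and the parameters `rounds` and `per_round`, are all invented for this example. For simplicity, each classifier adds its `per_round` most confident predictions per round, without the class balancing or smaller candidate pool that some formulations use.

```python
import math


class BernoulliNB:
    """Hypothetical toy Bernoulli naive Bayes over binary feature tuples."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            # Laplace-smoothed estimate of P(feature j == 1 | class c).
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(X[0]))]
        return self

    def predict_proba(self, x):
        logs = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for xj, pj in zip(x, self.p[c]):
                s += math.log(pj if xj else 1 - pj)
            logs[c] = s
        # Normalize log-scores into posteriors (subtract max for stability).
        top = max(logs.values())
        unnorm = {c: math.exp(s - top) for c, s in logs.items()}
        total = sum(unnorm.values())
        return {c: v / total for c, v in unnorm.items()}


def train_views(labeled):
    """Fit one classifier per view on the current labeled set."""
    ys = [y for _, y in labeled]
    h1 = BernoulliNB().fit([x[0] for x, _ in labeled], ys)
    h2 = BernoulliNB().fit([x[1] for x, _ in labeled], ys)
    return h1, h2


def co_train(labeled, unlabeled, rounds=10, per_round=2):
    """labeled: [((view1, view2), label)]; unlabeled: [(view1, view2)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        h1, h2 = train_views(labeled)
        newly = {}  # pool index -> pseudo-label
        for h, view in ((h1, 0), (h2, 1)):
            scored = []
            for i, x in enumerate(pool):
                if i in newly:
                    continue
                probs = h.predict_proba(x[view])
                label = max(probs, key=probs.get)
                scored.append((probs[label], i, label))
            # Each classifier contributes only its most confident predictions.
            scored.sort(reverse=True)
            for _, i, label in scored[:per_round]:
                newly[i] = label
        # Pseudo-labeled examples join the shared training set for both views.
        labeled += [(pool[i], label) for i, label in newly.items()]
        pool = [x for i, x in enumerate(pool) if i not in newly]
    h1, h2 = train_views(labeled)
    return h1, h2, labeled


def predict(h1, h2, x):
    """Combine the two views by multiplying their class posteriors."""
    p1, p2 = h1.predict_proba(x[0]), h2.predict_proba(x[1])
    return max(p1, key=lambda c: p1[c] * p2[c])


# Hypothetical toy data: each example is ((view1 features), (view2 features)).
labeled = [
    (((1, 1, 0), (1, 0, 1)), 1),
    (((1, 1, 1), (1, 1, 1)), 1),
    (((0, 0, 0), (0, 0, 0)), 0),
    (((0, 0, 1), (0, 1, 0)), 0),
]
unlabeled = [((1, 0, 1), (1, 1, 1)), ((0, 1, 0), (0, 0, 0)),
             ((1, 1, 1), (1, 0, 1)), ((0, 0, 0), (0, 1, 0)),
             ((1, 1, 0), (1, 1, 0)), ((0, 0, 1), (0, 0, 1))]
h1, h2, grown = co_train(labeled, unlabeled)
print(len(grown))                                   # 10
print(predict(h1, h2, ((1, 1, 1), (1, 1, 1))))      # 1
print(predict(h1, h2, ((0, 0, 0), (0, 0, 0))))      # 0
```

In the toy data, several unlabeled examples are ambiguous in one view but clear in the other, which is the situation co-training exploits: one classifier's confident label gives the other a training example it could not have labeled reliably on its own. Multiplying the two posteriors in `predict` is one simple way to combine the views; averaging them is another.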

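Co-training works best when two conditions hold: each view is, on its own, sufficient to predict the class label, and the views are conditionally independent given the class label. Conditional independence means that once the class is known, the features in one view carry no further information about the features in the other. Under this condition, an example that one classifier labels confidently looks like a fresh, informative training example to the other classifier rather than a near-duplicate of what it already knows. For example, in text classification, one view might use a document's textual content while the other uses its associated metadata. Co-training has been reported to work well in applications such as natural language processing and computer vision, where the same underlying data often has several distinct representations.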
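The conditional-independence assumption can be illustrated with synthetic data. The sketch below is hypothetical: it invents one binary feature per view (the comments suggest a body-text feature and a metadata feature, but both are made up) and draws each feature independently once the label is fixed. The two features end up correlated across the whole dataset, because both track the label, yet are roughly uncorrelated within each class, which is exactly what "conditionally independent given the class label" means.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def sample(n, seed=0):
    """Hypothetical two-view data: each view is drawn independently given y."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.random() < 0.5
        v1 = int(rng.random() < (0.8 if y else 0.2))  # e.g. a body-text feature
        v2 = int(rng.random() < (0.7 if y else 0.3))  # e.g. a metadata feature
        data.append((v1, v2, int(y)))
    return data


data = sample(4000)
overall = pearson([d[0] for d in data], [d[1] for d in data])
within = [pearson([d[0] for d in data if d[2] == c],
                  [d[1] for d in data if d[2] == c])
          for c in (0, 1)]
print(f"overall correlation: {overall:.2f}")  # clearly positive: both views track y
print("within-class correlations:", [round(r, 2) for r in within])  # near 0
```

Real feature views rarely satisfy this assumption exactly; the synthetic data simply shows what the idealized condition looks like.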