AI Glossary: AI Training Data Terms & Definitions

Curriculum Poisoning

Curriculum poisoning involves manipulating training data to degrade AI model performance.

Data annotation services provide labeled data for training AI models; such data is essential for tasks like image recognition and natural language processing.

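A data augmentation pipeline expands a training dataset by applying a sequence of transformations (for example, flipping, cropping, or adding noise to images) to existing examples, producing new variants that can improve a model's robustness and generalization.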
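A minimal sketch of such a pipeline in plain Python, assuming images are small grayscale grids of floats in [0, 1]. The transform names (`horizontal_flip`, `add_noise`) and the `augment` helper are hypothetical illustrations, not any particular library's API.

```python
import random
from typing import Callable, List

# Hypothetical representation: a grayscale image as rows of pixel values in [0, 1].
Image = List[List[float]]
Transform = Callable[[Image, random.Random], Image]


def horizontal_flip(img: Image, rng: random.Random) -> Image:
    # Mirror each row left to right (rng is unused but keeps the signature uniform).
    return [list(reversed(row)) for row in img]


def add_noise(img: Image, rng: random.Random) -> Image:
    # Add small uniform noise to every pixel, then clip back into [0, 1].
    return [[min(1.0, max(0.0, p + rng.uniform(-0.05, 0.05))) for p in row] for row in img]


def augment(dataset: List[Image], transforms: List[Transform], copies: int, seed: int = 0) -> List[Image]:
    """Return the original images followed by `copies` augmented variants of each."""
    rng = random.Random(seed)
    result = list(dataset)
    for img in dataset:
        for _ in range(copies):
            variant = img
            for transform in transforms:
                # Apply each transform with probability 0.5 so variants differ.
                if rng.random() < 0.5:
                    variant = transform(variant, rng)
            result.append(variant)
    return result


original = [[0.0, 0.5], [1.0, 0.2]]
augmented = augment([original], [horizontal_flip, add_noise], copies=3)
print(len(augmented))  # 4: the original plus three variants
```

Applying each transform randomly, rather than always, is one simple way to get varied copies from a fixed list of transformations.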

The Gutenberg Corpus (GC) is a collection of public-domain texts from Project Gutenberg used for natural language processing and AI training.

Input space is the set of all possible inputs that an AI model can accept and process.

An input vector is a numerical representation of a single data point, typically an ordered list of feature values, that is fed into a machine learning model.
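To make this concrete, here is a small Python sketch that turns a record with numeric and categorical fields into an input vector, using one-hot encoding for the categories. The field names and the `to_input_vector` helper are hypothetical; the point is that every record maps to the same feature order, so each position in the vector always means the same thing to the model.

```python
from typing import Any, Dict, List, Sequence


def to_input_vector(
    record: Dict[str, Any],
    numeric: Sequence[str],
    categorical: Dict[str, Sequence[str]],
) -> List[float]:
    """Encode one record as a flat list of floats in a fixed feature order."""
    vector: List[float] = []
    # Numeric features are copied over as floats.
    for name in numeric:
        vector.append(float(record[name]))
    # Each categorical feature becomes a one-hot block: 1.0 for the matching category, 0.0 elsewhere.
    for name, categories in categorical.items():
        vector.extend(1.0 if record[name] == c else 0.0 for c in categories)
    return vector


house = {"area_m2": 85, "rooms": 3, "city": "Oslo"}
vector = to_input_vector(
    house,
    numeric=["area_m2", "rooms"],
    categorical={"city": ["Bergen", "Oslo", "Trondheim"]},
)
print(vector)  # [85.0, 3.0, 0.0, 1.0, 0.0]
```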

Label bias refers to systematic errors in data labels, such as annotators consistently favoring one class, that can skew what an AI model learns and degrade its performance.

Label uncertainty refers to ambiguity in the labels of training data, for example when annotators disagree or when an example could reasonably belong to more than one class.
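One common way to quantify label uncertainty for a single example is the entropy of the labels that different annotators assigned to it: 0 when everyone agrees, higher as the votes spread across classes. The sketch below assumes several annotators labeled each item; `label_entropy` is a hypothetical helper, not a library function.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def label_entropy(votes: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the annotators' label distribution for one item."""
    counts = Counter(votes)
    total = len(votes)
    # Sum of p * log2(1/p) over the observed labels; 0.0 means full agreement.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(label_entropy(["cat", "cat", "cat"]))            # 0.0 (unanimous)
print(label_entropy(["cat", "dog"]))                   # 1.0 (evenly split between two labels)
print(round(label_entropy(["cat", "cat", "dog"]), 3))  # 0.918
```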

Labeled data is annotated information used to train machine learning models, allowing them to learn patterns and make predictions.

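Labeling functions are programmatic heuristics that assign labels, often noisy ones, to unlabeled data in machine learning tasks; their outputs are typically combined to produce training labels.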
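A minimal weak-supervision sketch, assuming a spam-detection task: each labeling function encodes one heuristic and either returns a label or abstains, and a simple majority vote combines their outputs. The functions and the `majority_label` combiner are hypothetical; frameworks such as Snorkel can replace the majority vote with a learned label model, and this code does not use that library.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN: Optional[int] = None  # a labeling function returns this when its rule does not apply
HAM, SPAM = 0, 1


def lf_contains_link(text: str) -> Optional[int]:
    # Heuristic: messages containing a URL are likely spam.
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text: str) -> Optional[int]:
    # Heuristic: the word "prize" suggests spam.
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> Optional[int]:
    # Heuristic: very short messages (three words or fewer) are usually ordinary replies.
    return HAM if len(text.split()) <= 3 else ABSTAIN


LABELING_FUNCTIONS: List[Callable[[str], Optional[int]]] = [
    lf_contains_link,
    lf_mentions_prize,
    lf_short_reply,
]


def majority_label(text: str) -> Optional[int]:
    """Combine labeling-function votes by majority, abstaining on no votes or a tie."""
    votes = [label for label in (lf(text) for lf in LABELING_FUNCTIONS) if label is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tied vote: no reliable label
    return ranked[0][0]


print(majority_label("Claim your prize at https://example.com"))  # 1 (SPAM)
print(majority_label("ok thanks"))                                 # 0 (HAM)
print(majority_label("See you at the meeting tomorrow"))           # None (all abstain)
```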

Manual annotation is the process of labeling training data by hand, typically by human annotators, with the goal of producing accurate and precise datasets.

Model input refers to the data fed into an AI model for processing and prediction.

A negative sample is a data point used in machine learning to represent an instance of the non-target class.
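In settings like recommendation or word-embedding training, negative samples are often drawn at random from items not known to be positive. The sketch below assumes, as such methods commonly do, that any item a user has not interacted with can be treated as a negative; `sample_negatives` and the item names are hypothetical.

```python
import random
from typing import List, Sequence, Set


def sample_negatives(positives: Set[str], catalog: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k distinct items from the catalog that are not in the positive set."""
    # Candidates are all items not labeled as positive (the non-target class).
    candidates = [item for item in catalog if item not in positives]
    if k > len(candidates):
        raise ValueError("not enough non-target items to sample from")
    rng = random.Random(seed)
    # Sampling without replacement keeps the negatives distinct.
    return rng.sample(candidates, k)


catalog = ["book", "lamp", "mug", "pen", "sofa", "tent"]
clicked = {"book", "pen"}
negatives = sample_negatives(clicked, catalog, k=3)
print(negatives)  # three distinct items, none of them "book" or "pen"
```

Treating unobserved items as negatives is a simplifying assumption: some of them may be items the user would have liked but never saw.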

Network training is the process of adjusting a neural network's parameters so that it learns patterns in data, typically by iteratively minimizing a loss function with gradient-based optimization.
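The iterative loop at the heart of network training can be shown with the smallest possible "network": a single linear neuron trained by gradient descent on mean squared error. This is a simplified sketch under that assumption; real networks stack many units with nonlinear activations and compute gradients with backpropagation, usually through a framework. `train_linear_neuron` and its default settings are hypothetical.

```python
import random
from typing import List, Tuple


def train_linear_neuron(
    data: List[Tuple[float, float]],
    epochs: int = 1000,
    lr: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Fit y ~ w * x + b by full-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), 0.0  # random initial weight, zero bias
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = (w * x + b) - y  # prediction minus target
            # Partial derivatives of (1/n) * sum(error**2) with respect to w and b.
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        # Step each parameter against its gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1; training should recover w ~ 2, b ~ 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0]]
w, b = train_linear_neuron(data)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```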

Observed data is information collected through direct measurement or observation, as opposed to data that is simulated or synthetically generated.