Explore 24 AI terms in Datasets
BoolQ is a dataset for evaluating machine learning models on yes/no questions based on passages.
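The C4 Dataset (Colossal Clean Crawled Corpus) is a large-scale dataset for training language models, built by cleaning and filtering Common Crawl web text.
CIFAR refers to CIFAR-10 and CIFAR-100, datasets of small 32x32 labeled color images widely used for training and benchmarking image classification models.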
COCO is a large-scale dataset for image recognition, segmentation, and captioning in AI applications.
CoNLL 2003 is a dataset used for evaluating named entity recognition systems in natural language processing.
DROP (Discrete Reasoning Over Paragraphs) is a reading comprehension benchmark whose questions require discrete reasoning over a passage, such as counting, addition, or sorting.
DuReader is a large-scale Chinese reading comprehension dataset designed for training AI models.
HotpotQA is a benchmark dataset for evaluating AI models on multi-hop question answering tasks.
JaQuAD (Japanese Question Answering Dataset) is a Japanese-language dataset for training and evaluating question answering systems.
KorQuAD is a Korean language dataset for question-answering tasks in natural language processing.
LAION-400M is a large-scale dataset containing 400 million image-text pairs for AI training and research.
LAION-5B is a large-scale dataset for training AI models, consisting of 5 billion image-text pairs.
The LFW Dataset (Labeled Faces in the Wild) is a collection of labeled face images used for face recognition and verification research.
MNIST is a dataset of handwritten digits used for training image processing systems.
MNIST Digit refers to handwritten digits in a standard dataset used for training image processing systems.
MS COCO (Microsoft Common Objects in Context) is the full name of the COCO dataset, a large-scale dataset for image recognition, segmentation, and captioning in AI research.
The MUMFORD Dataset is a collection of annotated images for evaluating machine learning models in computer vision tasks.
The Open Images Dataset is a large collection of annotated images for training computer vision models.
OpenWebText is an open-source recreation of OpenAI's WebText corpus, built from web pages linked on Reddit, for training AI language models.
The RACE Dataset is a large-scale reading comprehension dataset built from English exams for Chinese middle and high school students.
The Pile is a large, diverse dataset for training AI language models, combining many smaller corpora such as academic papers, books, code, and web text.
TriviaQA is a large-scale dataset for training AI models on open-domain question answering using trivia questions.
Visual Genome is a large-scale dataset for training AI on image understanding and visual reasoning.
Waymo Open Dataset is a large-scale dataset for autonomous vehicle research, featuring diverse sensor data and labeled scenarios.