AI Glossary: Benchmarking Terms & Definitions

ARC Benchmark

ARC Benchmark is a suite of tasks for evaluating the reasoning and comprehension abilities of AI models.

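GLUE

GLUE (General Language Understanding Evaluation) is a benchmark for evaluating natural language understanding models across a collection of tasks, such as sentiment analysis, paraphrase detection, and natural language inference.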
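
GLUE results are usually summarized as a single overall score. The sketch below assumes that score is an unweighted mean of per-task scores, where a task reporting two metrics (for example, MRPC's F1 and accuracy) is first reduced to the mean of those two. The numbers are invented for illustration and only four of the tasks are shown.

```python
from statistics import mean

# Hypothetical per-task scores on a 0-100 scale; illustrative only, not real results.
# Tasks that report two metrics list both; they are averaged within the task first.
task_scores = {
    "CoLA": [50.0],
    "SST-2": [90.0],
    "MRPC": [85.0, 80.0],
    "STS-B": [88.0, 87.0],
}

def overall_score(scores: dict[str, list[float]]) -> float:
    """Unweighted mean over tasks, after averaging each task's own metrics."""
    per_task = [mean(metrics) for metrics in scores.values()]
    return mean(per_task)

print(overall_score(task_scores))
```

Averaging within each task first keeps a task with two metrics from counting twice in the overall number.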

MMLU

MMLU (Massive Multitask Language Understanding) is a benchmark for evaluating AI language models with multiple-choice questions spanning 57 subjects, from elementary mathematics to law and medicine.
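
Because MMLU spans many subjects, a single accuracy figure can be computed in two ways: pooled over every question (micro average) or as the mean of per-subject accuracies (macro average). A minimal sketch, assuming each record holds a subject name, the gold answer letter, and the model's predicted letter; the records are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy records: (subject, gold answer letter, predicted letter).
records = [
    ("astronomy", "A", "A"),
    ("astronomy", "C", "B"),
    ("college_physics", "D", "D"),
    ("college_physics", "B", "B"),
    ("college_physics", "A", "C"),
]

def accuracy(pairs):
    """Fraction of (gold, pred) pairs where the prediction matches the gold letter."""
    pairs = list(pairs)
    return sum(gold == pred for gold, pred in pairs) / len(pairs)

def per_subject_accuracy(rows):
    """Group records by subject and compute accuracy within each group."""
    by_subject = defaultdict(list)
    for subject, gold, pred in rows:
        by_subject[subject].append((gold, pred))
    return {subject: accuracy(pairs) for subject, pairs in by_subject.items()}

micro = accuracy((gold, pred) for _, gold, pred in records)  # pooled over all questions
per_subject = per_subject_accuracy(records)
macro = sum(per_subject.values()) / len(per_subject)  # mean of subject accuracies
print(micro, macro)
```

The two averages diverge when subjects have different numbers of questions, so it is worth checking which one a reported score uses.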

TruthfulQA

TruthfulQA is a benchmark for evaluating whether AI language models give truthful answers, using questions that some humans would answer falsely because of common misconceptions.
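
TruthfulQA also has a multiple-choice format. In its single-correct-answer setting (often called MC1), the model's choice is taken to be the candidate answer it assigns the highest log-probability. The sketch below illustrates that scoring rule under simplifying assumptions. The log-probabilities are hypothetical, and the function name `mc1_correct` is illustrative rather than part of any official tooling.

```python
# Hypothetical log-probabilities a model assigns to each candidate answer,
# plus the index of the single true answer; illustrative data only.
items = [
    {"logprobs": [-2.1, -0.7, -3.4], "true_index": 1},
    {"logprobs": [-0.5, -1.9, -2.2], "true_index": 2},
]

def mc1_correct(logprobs: list[float], true_index: int) -> bool:
    """Correct if the highest-scoring candidate is the true answer."""
    best = max(range(len(logprobs)), key=lambda i: logprobs[i])
    return best == true_index

score = sum(mc1_correct(it["logprobs"], it["true_index"]) for it in items) / len(items)
print(score)
```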