AI Glossary: Evaluation Metrics Terms & Definitions

BLEU Score

BLEU

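BLEU (Bilingual Evaluation Understudy) is a metric for evaluating machine-generated text, most commonly machine translation. It measures n-gram overlap between the generated text and one or more reference translations, with a penalty for output that is too short. Scores range from 0 to 1, and higher is better.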
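As a rough illustration of how a BLEU-style score can be computed, the sketch below implements clipped n-gram precision and a brevity penalty for a single tokenized sentence. The function names (`ngrams`, `modified_precision`, `bleu`) are hypothetical, no smoothing is applied, and production toolkits differ in tokenization and corpus-level aggregation, so treat it as a minimal sketch rather than a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU (hypothetical helper, for illustration)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
print(bleu("the cat sat on the mat".split(), [reference]))
print(modified_precision("the the the the".split(), [reference], 1))
```

The second call shows why clipping matters: repeating a common word cannot inflate the unigram precision beyond the number of times that word appears in the reference.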

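CIDEr

CIDEr (Consensus-based Image Description Evaluation) is a metric for evaluating image captions. It scores a generated caption by its TF-IDF-weighted n-gram similarity to a set of human-written reference captions.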

GIFA

GIFA Loss is a metric used to evaluate generative models based on their ability to generate realistic samples.

IoU

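Intersection over Union (IoU) measures the overlap between two regions, typically a predicted bounding box and a ground-truth box in object detection. It is the area of their intersection divided by the area of their union, so it ranges from 0 (no overlap) to 1 (identical boxes).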
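A minimal sketch of IoU for two axis-aligned boxes follows. It assumes each box is given as `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`; that coordinate convention and the function name `iou` are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```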

PP

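Perplexity measures how well a language model predicts a sequence of text. It is the exponential of the average negative log-likelihood per token, so lower perplexity means the model assigns higher probability to the text.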
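Perplexity is commonly computed as the exponential of the average negative log-probability the model assigns to each token. The sketch below assumes the per-token probabilities are already available as a list; the function name `perplexity` and the input format are illustrative assumptions.

```python
import math


def perplexity(token_probs):
    """Perplexity from the probabilities a model assigned to each observed token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_likelihood)


# A model that is uniformly unsure among 4 choices at every step
# has a perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A perplexity of k can be read as the model being, on average, as uncertain as if it were choosing uniformly among k tokens.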

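Precision

Precision is the fraction of a model's positive predictions that are actually correct: true positives divided by the sum of true positives and false positives. High precision means few false alarms.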
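In classification, precision is conventionally computed as TP / (TP + FP), where TP counts correct positive predictions and FP counts incorrect ones. The sketch below assumes binary labels with `1` as the positive class; the function name and the `positive` parameter are hypothetical.

```python
def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)
    predicted_pos = true_pos + false_pos
    # Returning 0.0 when nothing is predicted positive is a convention chosen here.
    return true_pos / predicted_pos if predicted_pos else 0.0


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
# Three positive predictions, two of them correct.
print(precision(y_true, y_pred))
```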

SB

A safety benchmark is a standardized set of tests used to evaluate how safely an AI system behaves.