AI Glossary: AI Evaluation Metrics Terms & Definitions

Absolute Error

AE

Absolute Error is the magnitude of the difference between a predicted value and the actual value, |predicted - actual|. Smaller values indicate a more accurate prediction.
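As a minimal sketch, the definition can be written in a few lines of Python. The function names are illustrative, and the mean over several predictions (often called mean absolute error) is included only to show how per-prediction errors are usually aggregated.

```python
def absolute_error(predicted, actual):
    """Magnitude of the gap between one prediction and the true value."""
    return abs(predicted - actual)


def mean_absolute_error(predictions, actuals):
    """Average the absolute errors over paired predictions and actual values."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [absolute_error(p, a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)
```

For example, predictions of 1.0, 2.0, and 4.0 against actual values of 1.0, 3.0, and 2.0 give absolute errors of 0, 1, and 2, so the mean is 1.0.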

Akaike Information Criterion

AIC

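The Akaike Information Criterion (AIC) estimates the relative quality of statistical models fitted to the same data by trading goodness of fit against model complexity: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values are preferred.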

Asymmetric Loss

Asymmetric loss refers to a loss function that penalizes errors differently based on their type or severity in predictive models.
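One widely used example of an asymmetric loss is the pinball (quantile) loss, sketched below. The function name and the choice of this particular loss are illustrative assumptions; the glossary entry does not single out any one loss function.

```python
def pinball_loss(actual, predicted, quantile):
    """Pinball (quantile) loss for a single prediction.

    With quantile > 0.5, predicting too low costs more than predicting too
    high; with quantile < 0.5 the penalty is reversed.
    """
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    gap = actual - predicted
    if gap >= 0:
        # Under-prediction: weighted by the quantile.
        return quantile * gap
    # Over-prediction: weighted by (1 - quantile).
    return (1.0 - quantile) * (-gap)
```

With quantile set to 0.9, missing an actual value of 10 by predicting 8 costs nine times as much as missing it by predicting 12, even though both predictions are off by 2.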

Average Precision Score

AP Score

Average Precision Score summarizes a classifier's precision-recall curve as a single number: the mean of the precision values at each decision threshold, weighted by the increase in recall from the previous threshold.
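The sketch below computes average precision for a binary task by ranking examples by score and averaging the precision observed at each true positive. It assumes labels are 0 or 1 and that scores contain no ties; the function name is illustrative and this is not the implementation of any particular library.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = positive), assuming no tied scores."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` examples, taken at each positive.
            precision_sum += hits / rank
    return precision_sum / total_positives
```

With labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.7, 0.1], the positives sit at ranks 1 and 3, where precision is 1/1 and 2/3, so the result is their mean, 5/6.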

Baseline Accuracy

Baseline accuracy is the accuracy achieved by a simple reference approach, such as always predicting the most frequent class. A model should exceed it to be considered useful.
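One common way to obtain a baseline is the majority-class rule, sketched below. The choice of this particular baseline and the function name are assumptions made for illustration; other baselines (random guessing, a previous model) are equally valid.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)
```

On a dataset where three of four examples are labeled "ham", this baseline is 0.75, so a classifier scoring 0.76 has learned very little.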

Bayesian Information Criterion

BIC

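The Bayesian Information Criterion (BIC) is a model selection criterion that, like AIC, rewards goodness of fit and penalizes complexity: BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n is the number of observations, and L is the maximized likelihood. Lower values are preferred, and the penalty on extra parameters is heavier than AIC's for all but very small samples.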
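To make the two information criteria concrete, here is a small sketch using their standard formulas, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The function and parameter names are illustrative, and the maximized log-likelihood is assumed to come from whatever fitting procedure was used.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion from a maximized log-likelihood."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion; the penalty grows with ln(sample size)."""
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    return num_params * math.log(num_samples) - 2 * log_likelihood
```

Both values are only meaningful when comparing models fitted to the same data; the model with the lower value is preferred.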

BERTScore

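BERTScore is an evaluation metric for natural language processing that compares a candidate text with a reference by matching their contextual BERT token embeddings using cosine similarity, yielding precision, recall, and F1 values.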

BLEU Score

BLEU

The BLEU (Bilingual Evaluation Understudy) score evaluates machine-generated text, originally machine translation output, by measuring its n-gram overlap with one or more reference texts.

Brier Score

The Brier Score measures the accuracy of probabilistic predictions as the mean squared difference between the predicted probabilities and the actual outcomes (coded as 0 or 1). Lower scores are better, and 0 is a perfect score.
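A minimal sketch of the binary Brier Score follows. It assumes outcomes are coded as 0 or 1 and that each prediction is the probability assigned to outcome 1; the function name is illustrative.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)
```

A forecaster who says 0.9 for an event that happens and 0.2 for one that does not scores (0.01 + 0.04) / 2 = 0.025, while always answering 0.5 scores 0.25.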

CIDEr Score

CIDEr

CIDEr Score is a metric for evaluating image captioning models based on consensus with human-generated captions.

Comparative Evaluation

Comparative Evaluation assesses the performance of AI systems by comparing them against each other using defined metrics.

Confidence Bounds

Confidence bounds are statistical limits that quantify uncertainty in predictions or estimates.

Confidence Score

CS

A Confidence Score quantifies the certainty of an AI model's predictions.

Confusion Matrix Metrics

Confusion Matrix Metrics are performance measures for classification models derived from the counts of true positives, false positives, true negatives, and false negatives, such as accuracy, precision, recall, and F1 score.
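The sketch below shows how these indicators follow from the four confusion matrix counts in a binary task. Function names are illustrative, and returning 0.0 when a denominator is zero is a convention chosen here, not a universal rule.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def summarize(tp, fp, tn, fn):
    """Derive accuracy, precision, recall, and F1 from the four counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Usage: `summarize(*confusion_counts(y_true, y_pred))` returns all four metrics for a pair of label lists.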

Divergence Metric

A divergence metric quantifies the difference between two probability distributions in machine learning.
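A well-known example is the Kullback-Leibler (KL) divergence, sketched below for discrete distributions. Choosing KL as the example is an assumption made for illustration; the function name is hypothetical, and both inputs are assumed to be probability vectors over the same outcomes.

```python
import math


def kl_divergence(p, q):
    """KL divergence D(p || q) in nats for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total
```

KL divergence is zero for identical distributions and is not symmetric: D(p || q) generally differs from D(q || p).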

Earth Mover’s Distance

EMD

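Earth Mover's Distance (EMD) quantifies the difference between two probability distributions over a region as the minimum cost of transforming one into the other, where cost is the amount of probability mass moved multiplied by the distance it is moved.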
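In one dimension the distance has a simple closed form: the sum of absolute differences between the two cumulative distributions. The sketch below assumes two histograms over the same evenly spaced bins with equal total mass; the function name and the `bin_width` parameter are illustrative.

```python
def emd_1d(p, q, bin_width=1.0):
    """Earth Mover's Distance between two 1-D histograms on identical bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    cumulative_gap = 0.0  # mass that still has to be carried past this bin
    distance = 0.0
    for p_i, q_i in zip(p, q):
        cumulative_gap += p_i - q_i
        distance += abs(cumulative_gap) * bin_width
    return distance
```

Moving all mass from the first of three bins to the last costs 2.0 (one unit of mass moved two bins), while shifting a distribution by one bin costs 1.0.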

Epistemic Humility Score

EHS

The Epistemic Humility Score measures an AI's ability to recognize and express uncertainty in its knowledge.

Equal Error Rate

EER

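The Equal Error Rate (EER) is the error rate of a biometric system at the decision threshold where the False Acceptance Rate equals the False Rejection Rate. Lower values indicate a more accurate system.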
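The following sketch approximates the EER from match scores by scanning candidate thresholds and picking the one where the share of impostors accepted (false acceptance) is closest to the share of genuine users rejected (false rejection). It assumes higher scores mean a stronger match and that a user is accepted when the score is at or above the threshold; all names are illustrative.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """False acceptance and false rejection rates at one threshold."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def approximate_eer(genuine_scores, impostor_scores):
    """Return (threshold, eer) where the two error rates are closest."""
    best = None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = far_frr(genuine_scores, impostor_scores, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the midpoint in case the rates do not meet exactly.
            best = (gap, threshold, (far + frr) / 2)
    return best[1], best[2]
```

With finite data the two rates rarely cross exactly, so the result is an approximation; interpolating between thresholds is a common refinement.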

F-Measure

F1

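F-Measure evaluates classification models by combining precision and recall into a single value. Its general form, F-beta, weights recall beta times as much as precision; the common F1 score (beta = 1) is the harmonic mean of the two.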
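As a sketch, the general F-beta formula is (1 + beta^2) * precision * recall / (beta^2 * precision + recall), which reduces to the F1 score when beta is 1. The function name is illustrative, and returning 0.0 when both inputs are zero is a convention chosen here.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 favors recall, beta < 1 favors precision."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator
```

For precision 1.0 and recall 0.5, F1 is 2/3, noticeably below the arithmetic mean of 0.75, because the harmonic mean is pulled toward the weaker of the two values.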

F-Score

F1

F-Score is another name for the F-Measure. Its most common form, the F1 score, is the harmonic mean of precision and recall and is widely used to evaluate binary classification models.

False Acceptance Rate

FAR

The False Acceptance Rate measures the likelihood that a system incorrectly identifies an unauthorized user as authorized.

False Discovery Rate

FDR

The False Discovery Rate (FDR) is the proportion of false positives among all positive results in statistical hypothesis testing.

False Negative

A false negative occurs when a test incorrectly indicates no presence of a condition that is actually present.

False Positive Rate

FPR

The False Positive Rate is the proportion of actual negative cases that a model incorrectly predicts as positive: FP / (FP + TN).
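The False Positive Rate is easily confused with the False Discovery Rate because both have false positives in the numerator. The sketch below contrasts the two denominators; the function names are illustrative, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def false_positive_rate(fp, tn):
    """Share of actual negatives that were predicted positive: FP / (FP + TN)."""
    return fp / (fp + tn) if fp + tn else 0.0


def false_discovery_rate(fp, tp):
    """Share of positive predictions that were wrong: FP / (FP + TP)."""
    return fp / (fp + tp) if fp + tp else 0.0
```

With 10 true positives, 10 false positives, and 90 true negatives, the False Positive Rate is 0.1 while the False Discovery Rate is 0.5: few negatives are misflagged, yet half of all flags are wrong.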

False Rejection Rate

FRR

False Rejection Rate (FRR) measures the percentage of authorized users incorrectly rejected by a system.

Forecasting Error

Forecasting Error refers to the difference between predicted and actual values in predictive models.

Fréchet Inception Distance

FID

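Fréchet Inception Distance (FID) measures the quality of generated images by comparing the distribution of their Inception network features to that of real images. Lower values indicate generated images that are statistically closer to real ones.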

Hamming Loss

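Hamming Loss measures the fraction of labels that are predicted incorrectly in multi-label classification tasks, counted across all samples and all labels. Lower is better, and 0 means every label was correct.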
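A minimal sketch follows, assuming each sample is represented as a list of 0/1 indicators with one entry per possible label. The function name is illustrative.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that differ between truth and prediction."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must contain the same number of samples")
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        for truth, pred in zip(true_row, pred_row):
            total += 1
            if truth != pred:
                wrong += 1
    if total == 0:
        raise ValueError("inputs must not be empty")
    return wrong / total
```

For two samples with three labels each, where one label is wrong in each sample, the loss is 2/6, even though neither sample is predicted entirely correctly.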