Explore 90 AI terms in AI Evaluation Metrics
Absolute Error is the magnitude of the difference between a predicted value and the actual value, indicating how far off a single prediction is.
The Akaike Information Criterion (AIC) scores statistical models by trading off goodness of fit against the number of parameters; lower values indicate a better trade-off.
Asymmetric loss refers to a loss function that penalizes errors differently based on their type or severity in predictive models.
Average Precision Score summarizes a classifier's precision-recall curve as a single number, averaging precision across recall levels.
Baseline accuracy is the accuracy of a simple reference model, such as always predicting the most frequent class, that a trained model must exceed to be considered effective.
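The Bayesian Information Criterion (BIC) is a model selection criterion that balances goodness of fit against model complexity, penalizing additional parameters more heavily as the sample size grows.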
BERTScore is an evaluation metric for natural language processing that uses BERT embeddings to assess text similarity.
The BLEU Score evaluates the quality of machine-generated text by measuring its n-gram overlap with reference texts.
The Brier Score measures the accuracy of probabilistic predictions as the mean squared difference between predicted probabilities and actual outcomes.
CIDEr Score is a metric for evaluating image captioning models based on consensus with human-generated captions.
Comparative Evaluation assesses the performance of AI systems by comparing them against each other using defined metrics.
Confidence bounds are statistical limits that quantify uncertainty in predictions or estimates.
A Confidence Score quantifies the certainty of an AI model's predictions.
Confusion Matrix Metrics evaluate classification model performance using key indicators like accuracy, precision, recall, and F1 score.
A divergence metric quantifies the difference between two probability distributions in machine learning.
Earth Mover's Distance (EMD) quantifies the difference between two probability distributions over a region.
The Epistemic Humility Score measures an AI's ability to recognize and express uncertainty in its knowledge.
The Equal Error Rate (EER) is the operating point of a biometric system at which the False Acceptance Rate equals the False Rejection Rate; lower values indicate better performance.
F-Measure is a metric used to evaluate the performance of classification models, balancing precision and recall.
F-Score is a statistical measure of a binary classifier's performance, computed as a (possibly weighted) harmonic mean of precision and recall.
The False Acceptance Rate measures the likelihood that a system incorrectly identifies an unauthorized user as authorized.
The False Discovery Rate (FDR) is the proportion of false positives among all positive results in statistical hypothesis testing.
A false negative occurs when a test incorrectly indicates no presence of a condition that is actually present.
The False Positive Rate is the proportion of actual negative cases that a model incorrectly predicts as positive.
False Rejection Rate (FRR) measures the percentage of authorized users incorrectly rejected by a system.
Forecasting Error refers to the difference between predicted and actual values in predictive models.
Fréchet Inception Distance (FID) measures the quality of generated images by comparing their distribution to real images.
Hamming Loss measures the fraction of wrong labels in multi-label classification tasks.
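Several of the definitions above reduce to short formulas. The following is a minimal sketch in plain Python of four of them (Absolute Error, Brier Score, False Positive Rate, and Hamming Loss). The function names and the 0/1 label encoding are assumptions made for illustration and do not come from any particular library.

```python
def absolute_error(predicted, actual):
    """Magnitude of the difference between one prediction and its true value."""
    return abs(predicted - actual)


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    squared = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared) / len(squared)


def false_positive_rate(y_true, y_pred):
    """Share of actual negatives (label 0) that were predicted positive (label 1)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn)


def hamming_loss(y_true, y_pred):
    """Fraction of individual labels that are wrong across all multi-label rows."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        1
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
        if t != p
    )
    return wrong / total
```

For example, with two samples of two labels each, `hamming_loss([[1, 0], [0, 1]], [[1, 1], [0, 1]])` counts one wrong label out of four. The sketch assumes non-empty inputs and, for the False Positive Rate, at least one actual negative.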