AI Glossary: What Is an Evaluation Metric (EM)? Definition & Meaning

An evaluation metric is a standard used to assess the performance of an artificial intelligence (AI) model. These metrics provide a quantitative measure that helps researchers and developers determine how well a model performs on its intended task. Different tasks require different metrics, because the criteria for success vary with the application.

Common evaluation metrics include:

Accuracy: The proportion of correct predictions made by the model out of all predictions. This metric is widely used in classification tasks.
Precision: The ratio of true positive predictions to all predicted positives, indicating how many of the instances identified as positive are actually correct.
Recall (sensitivity): The ratio of true positive predictions to all actual positives, highlighting the model’s ability to identify all relevant instances.
F1 score: The harmonic mean of precision and recall, providing a balance between the two metrics. It is especially important in cases of class imbalance.
Mean squared error (MSE): A common metric for regression tasks, measuring the average squared difference between predicted and actual values.
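
The definitions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (confusion_counts, classification_metrics, mean_squared_error) and the sample data are hypothetical, the task is assumed to be binary with label 1 as the positive class, and returning 0.0 when a denominator is zero is one convention among several.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Assumed convention: report 0.0 when the ratio is undefined."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from label lists."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Illustrative data only (not taken from any real model).
labels = [1, 0, 1, 1, 0, 0, 1, 0]
guesses = [1, 0, 0, 1, 0, 1, 0, 0]
print(classification_metrics(labels, guesses))
print(mean_squared_error([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
```

In practice a library such as scikit-learn would normally be used instead. The hand-written version is shown only to make each ratio explicit.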

Choosing the right evaluation metric is crucial, because it shapes both how a model is optimized and how its results are interpreted. For instance, high accuracy can be misleading under class imbalance, where a model can score well simply by always predicting the majority class. Understanding the context and the specific requirements of the task is therefore essential when selecting evaluation metrics.
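
The class-imbalance pitfall can be shown with a small, made-up example. The sketch below assumes a hypothetical dataset of 100 samples in which only 5 are positive, and a trivial "model" that always predicts the majority (negative) class. The helper names are illustrative, and an undefined ratio is reported as 0.0 by assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    actual_pos = sum(1 for t in y_true if t == positive)
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    # Assumed convention: 0.0 when there are no actual positives.
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial baseline that always predicts the majority class.
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("recall:", recall(y_true, y_pred))
```

Under these assumptions the baseline reaches an accuracy of 0.95 while its recall is 0.0: it never identifies a single positive instance. That gap is why precision, recall, or F1 is usually reported alongside accuracy on imbalanced data.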