AI Glossary: What Is Human Evaluation (HE)? Definition & Meaning

Human evaluation is a method for assessing the performance and quality of artificial intelligence (AI) systems, particularly in natural language processing (NLP) and machine learning applications. Unlike automated metrics, which rely on predefined algorithms and statistical measures, human evaluation has real people judge the output of AI systems against defined criteria.

This method is especially important for tasks where subjective interpretation plays a significant role, such as language generation, sentiment analysis, and translation. In these cases, human evaluators can provide insights into aspects like fluency, accuracy, relevance, and overall user satisfaction that automated metrics may not capture.

Human evaluation typically involves several steps:

Evaluator selection: A diverse group of individuals with relevant expertise or experience is chosen to minimize bias.
Evaluation criteria: Clear guidelines and criteria are established to ensure consistency in the evaluation process. Common criteria include coherence, grammatical correctness, and contextual relevance.
Scoring system: Evaluators are often asked to score or rank AI outputs against the established criteria, using qualitative or quantitative scales.
Aggregating results: The scores from multiple evaluators are compiled to provide an overall assessment of the AI system's performance.
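
As a rough illustration of the aggregation step, the sketch below averages scores from several evaluators for each output and criterion. The rating data, the 1-5 scale, and the names `RATINGS` and `aggregate_scores` are assumptions made for this example. A simple mean is only one possible way to compile scores, not a prescribed method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (evaluator, output_id, criterion, score on an assumed 1-5 scale)
RATINGS = [
    ("eval_a", "out_1", "coherence", 4),
    ("eval_b", "out_1", "coherence", 5),
    ("eval_c", "out_1", "coherence", 3),
    ("eval_a", "out_1", "relevance", 2),
    ("eval_b", "out_1", "relevance", 3),
    ("eval_c", "out_1", "relevance", 4),
]


def aggregate_scores(ratings):
    """Return the mean score per (output_id, criterion) across all evaluators."""
    grouped = defaultdict(list)
    for _evaluator, output_id, criterion, score in ratings:
        grouped[(output_id, criterion)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = aggregate_scores(RATINGS)
for (output_id, criterion), avg in sorted(summary.items()):
    print(f"{output_id} {criterion}: {avg:.2f}")
```

Keeping the scores grouped by criterion, rather than collapsing them into one number, preserves the distinction between, say, a coherent but irrelevant output and a relevant but incoherent one.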

Human evaluations can be time-consuming and costly, but they are vital for understanding the real-world effectiveness of AI models. They can also help identify areas for improvement, guiding developers in refining algorithms and enhancing the user experience. As AI technologies continue to evolve, human evaluation remains an essential component of responsible and effective AI development.