AI Glossary: What Is ROUGE Score? Definition & Meaning

ROUGE Score

ROUGE, which stands for Recall-Oriented Understudy for Gisting Evaluation, is a set of metrics used to evaluate the quality of summaries produced by automatic summarization systems. It is particularly popular in natural language processing (NLP) and is often used to assess the performance of models that generate text, such as summarizers, machine translation systems, and other text generation tools.

ROUGE primarily compares the generated summary against one or more reference summaries (often written by humans) to see how closely they match. The main metrics in the ROUGE family are:

ROUGE-N: Measures the overlap of n-grams (contiguous sequences of n items from a given sample of text) between the generated summary and the reference summaries. For instance, ROUGE-1 evaluates single words, whereas ROUGE-2 looks at pairs of consecutive words.
ROUGE-L: Focuses on the longest common subsequence between the generated summary and the reference summaries, taking into account the order of the words.
ROUGE-W: A weighted version of ROUGE-L that accounts for consecutive matches and penalizes gaps.
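The building blocks behind ROUGE-N and ROUGE-L can be illustrated with a small Python sketch. It is not the reference ROUGE implementation: it assumes simple whitespace tokenization and a single reference, and the helper names (`ngrams`, `overlap_count`, `lcs_length`) are made up for this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_count(candidate, reference, n):
    """Count n-grams shared by both texts (each n-gram counted at most
    as often as it appears in the text where it is rarer)."""
    return sum((ngrams(candidate, n) & ngrams(reference, n)).values())


def lcs_length(a, b):
    """Length of the longest common subsequence, the quantity ROUGE-L builds on."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical example sentences, tokenized on whitespace (an assumption).
candidate = "the cat sat on the mat".split()
reference = "the cat lay on the mat".split()

unigram_overlap = overlap_count(candidate, reference, 1)  # used by ROUGE-1
bigram_overlap = overlap_count(candidate, reference, 2)   # used by ROUGE-2
lcs = lcs_length(candidate, reference)                    # used by ROUGE-L
```

For these two sentences the unigram overlap is 5, the bigram overlap is 3, and the longest common subsequence ("the cat on the mat") has length 5. Unlike the bigram count, the subsequence does not require the matching words to be adjacent, only to appear in the same order.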

The scores generated by ROUGE are typically expressed as recall, precision, and F1-score. Recall measures the proportion of n-grams from the reference summaries that are found in the generated summary, while precision measures the proportion of n-grams in the generated summary that also appear in the reference. The F1-score is the harmonic mean of precision and recall, providing a single metric that balances both.
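As a rough illustration of how these three numbers relate, the sketch below computes ROUGE-N precision, recall, and F1 for one candidate and one reference. The function name `rouge_n`, the whitespace tokenization, and the example sentences are assumptions for this example; established packages may differ in tokenization, stemming, and handling of multiple references.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N for one candidate and one reference token list."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))

    # Shared n-grams, each counted at most as often as in the rarer text.
    overlap = sum((cand & ref).values())

    # Precision: share of candidate n-grams that also occur in the reference.
    precision = overlap / max(sum(cand.values()), 1)
    # Recall: share of reference n-grams that the candidate recovers.
    recall = overlap / max(sum(ref.values()), 1)
    # F1: harmonic mean of the two, defined as 0 when both are 0.
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total

    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: 6 candidate words, 7 reference words, 5 shared unigrams.
scores = rouge_n(
    "the cat sat on the mat".split(),
    "the cat lay on the mat today".split(),
    n=1,
)
```

Here precision is 5/6 (about 0.83), recall is 5/7 (about 0.71), and F1 is 10/13 (about 0.77). The shorter candidate scores higher on precision than on recall because it omits a word that the reference contains.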

Overall, the ROUGE score is a valuable tool in the field of NLP, helping researchers and practitioners objectively measure the effectiveness of their text generation systems by providing insight into how well they replicate human writing patterns.