Skip to content

models Rouge Score Evaluator

github-actions[bot] edited this page Nov 14, 2024 · 5 revisions

Rouge-Score-Evaluator

Overview

Score range Float [0-1]: higher means better quality.
What is this metric? ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics used to evaluate automatic summarization and machine translation. It measures the overlap between generated text and reference summaries. ROUGE focuses on recall-oriented measures to assess how well the generated text covers the reference text. The ROUGE score is composed of precision, recall, and F1 score.
How does it work? The ROUGE score evaluates the similarity between the generated text and reference text based on n-gram overlap, including ROUGE-N (unigram, bigram, etc.), and ROUGE-L (longest common subsequence). It calculates precision, recall, and F1 scores to capture how well the generated text matches the reference text. Rouge type options are "rouge1" (Unigram overlap), "rouge2" (Bigram overlap), "rouge3" (Trigram overlap), "rouge4" (4-gram overlap), "rouge5" (5-gram overlap), "rougeL" (L-graph overlap)
When to use it? The recommended scenario is Natural Language Processing (NLP) tasks. Text summarization and document comparison are among the recommended use cases for ROUGE, especially when focusing on recall and the ability to capture relevant information from the reference text.
What does it need as input? Response, Ground Truth

Version: 3

View in Studio: https://ml.azure.com/registries/azureml/models/Rouge-Score-Evaluator/version/3

Properties

is-promptflow: True

is-evaluator: True

show-artifact: True

_default-display-file: ./RougeScoreEvaluator/_rouge.py

Clone this wiki locally