Update UI eval flow and evaluators (#3661)
* Update quality evaluators and UI eval flow
* Update quality evaluators and UI eval flow
* Fix paths

---------

Co-authored-by: Kelly <[email protected]>
1 parent ce6968f, commit 40f2396
Showing 20 changed files with 73 additions and 17 deletions.
2 changes: 1 addition & 1 deletion
assets/promptflow/evaluators/models/coherence-evaluator/model.yaml
4 changes: 4 additions & 0 deletions
assets/promptflow/evaluators/models/content-safety-evaluator/asset.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["prompt flow evaluator"]
7 changes: 7 additions & 0 deletions
assets/promptflow/evaluators/models/content-safety-evaluator/description.md
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Integer [0-7]: where 0 is the least harmful and 7 is the most harmful. A text label is also provided. |
| What is this metric? | Measures comprehensively the severity level of the content harm of a response, covering violence, sexual, self-harm, and hate and unfairness as 4 harmful categories. |
| How does it work? | The Content Safety evaluator leverages AI-assisted evaluators including `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, `HateUnfairnessEvaluator` with a language model as a judge on the response to a user query. See the [definitions and severity scale](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=severity#risk-and-safety-evaluators) for these AI-assisted evaluators. |
| When to use it? | Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications. |
| What does it need as input? | Query, Response |
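The score range row above pairs each integer severity with a text label. A minimal sketch of such a mapping follows. It assumes the four-band scale (Very low 0-1, Low 2-3, Medium 4-5, High 6-7) from the linked severity documentation. The function name `severity_label` and the example scores are hypothetical and are not taken from this commit or from the evaluator's own code.

```python
# Hypothetical helper: map a 0-7 harm severity score to a text label.
# The four bands below are an assumption based on the linked severity scale,
# not code from the ContentSafetyEvaluator itself.
SEVERITY_BANDS = ["Very low", "Low", "Medium", "High"]


def severity_label(score: int) -> str:
    """Return the text label for an integer severity score in [0, 7]."""
    if not isinstance(score, int) or not 0 <= score <= 7:
        raise ValueError(f"severity score must be an integer in [0, 7], got {score!r}")
    # Each band covers two consecutive scores: 0-1, 2-3, 4-5, 6-7.
    return SEVERITY_BANDS[score // 2]


if __name__ == "__main__":
    # Illustrative per-category scores for the four harm categories named above.
    example_scores = {"violence": 0, "sexual": 1, "self_harm": 4, "hate_unfairness": 6}
    for category, score in example_scores.items():
        print(f"{category}: {score} ({severity_label(score)})")
```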
8 changes: 8 additions & 0 deletions
assets/promptflow/evaluators/models/content-safety-evaluator/model.yaml
@@ -0,0 +1,8 @@
path:
  container_name: rai-eval-flows
  container_path: models/evaluators/ContentSafetyEvaluator/v1/ContentSafetyEvaluator
  storage_name: amlraipfmodels
  type: azureblob
publish:
  description: description.md
  type: custom_model
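The `path` block above names a storage account, a container, and a path inside that container. As a rough illustration of how those three fields combine, the sketch below builds the blob URL using the standard public-cloud Azure Blob endpoint format. The helper name `blob_url` is hypothetical, and whether the asset tooling resolves the location this way is an assumption.

```python
# Hypothetical helper: compose a blob URL from the `path` fields in model.yaml.
# Assumes the public Azure cloud endpoint suffix `blob.core.windows.net`.
def blob_url(path: dict) -> str:
    """Build https://<storage>.blob.core.windows.net/<container>/<path>."""
    if path.get("type") != "azureblob":
        raise ValueError(f"unsupported path type: {path.get('type')!r}")
    return (
        f"https://{path['storage_name']}.blob.core.windows.net/"
        f"{path['container_name']}/{path['container_path']}"
    )


if __name__ == "__main__":
    # Values copied from the model.yaml hunk above.
    content_safety_path = {
        "container_name": "rai-eval-flows",
        "container_path": "models/evaluators/ContentSafetyEvaluator/v1/ContentSafetyEvaluator",
        "storage_name": "amlraipfmodels",
        "type": "azureblob",
    }
    print(blob_url(content_safety_path))
```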
9 changes: 9 additions & 0 deletions
assets/promptflow/evaluators/models/content-safety-evaluator/spec.yaml
@@ -0,0 +1,9 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: Content-Safety-Evaluator
path: ./
properties:
  is-promptflow: true
  is-evaluator: true
tags:
  Preview: ""
version: 1
2 changes: 1 addition & 1 deletion
assets/promptflow/evaluators/models/fluency-evaluator/model.yaml
2 changes: 1 addition & 1 deletion
assets/promptflow/evaluators/models/groundedness-evaluator/model.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["prompt flow evaluator"]
7 changes: 7 additions & 0 deletions
assets/promptflow/evaluators/models/qa-evaluator/description.md
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Float [0-1] for the F1 score evaluator: the higher the score, the more similar the response is to the ground truth. Integer [1-5] for AI-assisted quality evaluators for question-answering (QA) scenarios: where 1 is bad and 5 is good. |
| What is this metric? | Measures comprehensively the groundedness, coherence, and fluency of a response in QA scenarios, as well as the textual similarity between the response and its ground truth. |
| How does it work? | The QA evaluator leverages prompt-based AI-assisted evaluators using a language model as a judge on the response to a user query, including `GroundednessEvaluator` (needs input `context`), `RelevanceEvaluator`, `CoherenceEvaluator`, `FluencyEvaluator`, and `SimilarityEvaluator` (needs input `ground_truth`). It also includes a Natural Language Processing (NLP) metric `F1ScoreEvaluator` using F1 score on shared tokens between the response and its ground truth. See the [definitions and scoring rubrics](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#generation-quality-metrics) for these AI-assisted evaluators and F1 score evaluator. |
| When to use it? | Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications. |
| What does it need as input? | Query, Response, Context, Ground Truth |
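The "F1 score on shared tokens" mentioned above can be illustrated with a short sketch. This is not the `F1ScoreEvaluator` implementation. It assumes lowercased whitespace tokenization, which may differ from the evaluator's actual normalization, and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and its ground truth, in [0, 1].

    Assumption: tokens are lowercased whitespace-separated words.
    """
    response_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not response_tokens or not truth_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    shared = sum((Counter(response_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(response_tokens)  # share of response tokens that match
    recall = shared / len(truth_tokens)        # share of ground-truth tokens recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(token_f1("Paris is the capital", "The capital of France is Paris"))
```

A higher value means more of the response's tokens also appear in the ground truth and vice versa, which matches the "Float [0-1]" range in the table.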
@@ -0,0 +1,8 @@
path:
  container_name: rai-eval-flows
  container_path: models/evaluators/QAEvaluator/v1/QAEvaluator
  storage_name: amlraipfmodels
  type: azureblob
publish:
  description: description.md
  type: custom_model
@@ -0,0 +1,9 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: QA-Evaluator
path: ./
properties:
  is-promptflow: true
  is-evaluator: true
tags:
  Preview: ""
version: 1
2 changes: 1 addition & 1 deletion
assets/promptflow/evaluators/models/relevance-evaluator/model.yaml
2 changes: 1 addition & 1 deletion
assets/promptflow/evaluators/models/retrieval-evaluator/model.yaml