Update UI eval flow and evaluators (#3661)
* Update quality evaluators and UI eval flow

* Update quality evaluators and UI eval flow

* Fix paths

---------

Co-authored-by: Kelly <[email protected]>
diondrapeck and lykelly19 authored Dec 12, 2024
1 parent ce6968f commit 40f2396
Showing 20 changed files with 73 additions and 17 deletions.
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/evaluators/CoherenceEvaluator/v4
+ container_path: models/evaluators/CoherenceEvaluator/v5/CoherenceEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
@@ -5,7 +5,7 @@ properties:
is-promptflow: true
is-evaluator: true
show-artifact: true
- _default-display-file: ./coherence.prompty
+ _default-display-file: ./CoherenceEvaluator/coherence.prompty
tags:
hiddenlayerscanned: ""
- version: 4
+ version: 5
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["prompt flow evaluator"]
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Integer [0-7]: where 0 is the least harmful and 7 is the most harmful. A text label is also provided. |
| What is this metric? | Measures comprehensively the severity level of the content harm of a response, covering violence, sexual, self-harm, and hate and unfairness as 4 harmful categories. |
| How does it work? | The Content Safety evaluator leverages AI-assisted evaluators including `ViolenceEvaluator`, `SexualEvaluator`, `SelfHarmEvaluator`, `HateUnfairnessEvaluator` with a language model as a judge on the response to a user query. See the [definitions and severity scale](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=severity#risk-and-safety-evaluators) for these AI-assisted evaluators. |
| When to use it? | Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications. |
| What does it need as input? | Query, Response |
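
The score range described above can be illustrated with a minimal sketch that maps an integer severity score to a text label. The bucket names (`Very low`, `Low`, `Medium`, `High`) and the two-scores-per-bucket split are assumptions taken from the linked severity scale, not something this commit defines, and `severity_label` is a hypothetical helper rather than part of the evaluator.

```python
def severity_label(score: int) -> str:
    """Map an integer severity score in [0, 7] to a coarse text label.

    Assumption: two adjacent scores share one label
    (0-1, 2-3, 4-5, 6-7). The evaluator itself may label differently.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 7:
        raise ValueError("score must be an integer in [0, 7]")
    labels = ("Very low", "Low", "Medium", "High")
    return labels[score // 2]


if __name__ == "__main__":
    for s in range(8):
        print(s, severity_label(s))
```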
@@ -0,0 +1,8 @@
path:
container_name: rai-eval-flows
container_path: models/evaluators/ContentSafetyEvaluator/v1/ContentSafetyEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
description: description.md
type: custom_model
@@ -0,0 +1,9 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: Content-Safety-Evaluator
path: ./
properties:
is-promptflow: true
is-evaluator: true
tags:
Preview: ""
version: 1
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/evaluators/FluencyEvaluator/v4
+ container_path: models/evaluators/FluencyEvaluator/v5/FluencyEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
@@ -5,7 +5,7 @@ properties:
is-promptflow: true
is-evaluator: true
show-artifact: true
- _default-display-file: ./fluency.prompty
+ _default-display-file: ./FluencyEvaluator/fluency.prompty
tags:
hiddenlayerscanned: ""
- version: 4
+ version: 5
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/evaluators/GroundednessEvaluator/v4
+ container_path: models/evaluators/GroundednessEvaluator/v5/GroundednessEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
@@ -5,7 +5,7 @@ properties:
is-promptflow: true
is-evaluator: true
show-artifact: true
- _default-display-file: ./groundedness_without_query.prompty
+ _default-display-file: ./GroundednessEvaluator/groundedness_without_query.prompty
tags:
hiddenlayerscanned: ""
- version: 4
+ version: 5
4 changes: 4 additions & 0 deletions assets/promptflow/evaluators/models/qa-evaluator/asset.yaml
@@ -0,0 +1,4 @@
extra_config: model.yaml
spec: spec.yaml
type: model
categories: ["prompt flow evaluator"]
@@ -0,0 +1,7 @@
| | |
| -- | -- |
| Score range | Float [0-1] for the F1 score evaluator: the higher the score, the more similar the response is to the ground truth. Integer [1-5] for the AI-assisted quality evaluators for question-answering (QA) scenarios, where 1 is bad and 5 is good. |
| What is this metric? | Measures comprehensively the groundedness, coherence, and fluency of a response in QA scenarios, as well as the textual similarity between the response and its ground truth. |
| How does it work? | The QA evaluator leverages prompt-based AI-assisted evaluators using a language model as a judge on the response to a user query, including `GroundednessEvaluator` (needs input `context`), `RelevanceEvaluator`, `CoherenceEvaluator`, `FluencyEvaluator`, and `SimilarityEvaluator` (needs input `ground_truth`). It also includes a Natural Language Processing (NLP) metric `F1ScoreEvaluator` using F1 score on shared tokens between the response and its ground truth. See the [definitions and scoring rubrics](https://learn.microsoft.com/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#generation-quality-metrics) for these AI-assisted evaluators and F1 score evaluator. |
| When to use it? | Use it when assessing the readability and user-friendliness of your model's generated responses in real-world applications. |
| What does it need as input? | Query, Response, Context, Ground Truth |
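
The "F1 score on shared tokens" mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: tokens are produced by lowercasing and splitting on whitespace, and `f1_score` is a hypothetical helper. The actual `F1ScoreEvaluator` may normalize text differently (for example, by stripping punctuation).

```python
from collections import Counter


def f1_score(response: str, ground_truth: str) -> float:
    """Token-overlap F1 between a response and its ground truth.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    response_tokens = response.lower().split()
    truth_tokens = ground_truth.lower().split()
    if not response_tokens or not truth_tokens:
        return 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    shared = sum((Counter(response_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0

    precision = shared / len(response_tokens)  # shared / tokens in response
    recall = shared / len(truth_tokens)        # shared / tokens in ground truth
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f1_score("the cat", "the cat sat down"))
```

A score of 1.0 means the two token multisets are identical, and 0.0 means they share no tokens, which matches the Float [0-1] range in the table.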
8 changes: 8 additions & 0 deletions assets/promptflow/evaluators/models/qa-evaluator/model.yaml
@@ -0,0 +1,8 @@
path:
container_name: rai-eval-flows
container_path: models/evaluators/QAEvaluator/v1/QAEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
description: description.md
type: custom_model
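
For orientation, the `path` fields above can be read as the pieces of an Azure Blob Storage location. The sketch below joins them using the standard public blob endpoint format. Whether the asset publishing pipeline resolves the path this way is an assumption, and `blob_url` is a hypothetical helper, not part of the repository tooling.

```python
def blob_url(storage_name: str, container_name: str, container_path: str) -> str:
    """Build a blob URL from the three `path` fields of a model.yaml.

    Assumption: the default public endpoint
    https://<account>.blob.core.windows.net/<container>/<path> applies.
    """
    return (
        f"https://{storage_name}.blob.core.windows.net/"
        f"{container_name}/{container_path.strip('/')}"
    )


if __name__ == "__main__":
    # Values copied from the qa-evaluator model.yaml in this commit.
    print(
        blob_url(
            storage_name="amlraipfmodels",
            container_name="rai-eval-flows",
            container_path="models/evaluators/QAEvaluator/v1/QAEvaluator",
        )
    )
```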
9 changes: 9 additions & 0 deletions assets/promptflow/evaluators/models/qa-evaluator/spec.yaml
@@ -0,0 +1,9 @@
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: QA-Evaluator
path: ./
properties:
is-promptflow: true
is-evaluator: true
tags:
Preview: ""
version: 1
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/evaluators/RelevanceEvaluator/v4
+ container_path: models/evaluators/RelevanceEvaluator/v5/RelevanceEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
@@ -5,7 +5,7 @@ properties:
is-promptflow: true
is-evaluator: true
show-artifact: true
- _default-display-file: ./relevance.prompty
+ _default-display-file: ./RelevanceEvaluator/relevance.prompty
tags:
hiddenlayerscanned: ""
- version: 4
+ version: 5
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/evaluators/RetrievalEvaluator/v1
+ container_path: models/evaluators/RetrievalEvaluator/v2/RetrievalEvaluator
storage_name: amlraipfmodels
type: azureblob
publish:
@@ -5,7 +5,7 @@ properties:
is-promptflow: true
is-evaluator: true
show-artifact: true
- _default-display-file: ./retrieval.prompty
+ _default-display-file: ./RetrievalEvaluator/retrieval.prompty
tags:
hiddenlayerscanned: ""
- version: 1
+ version: 2
2 changes: 1 addition & 1 deletion assets/promptflow/models/rai-eval-ui-dag-flow/model.yaml
@@ -1,6 +1,6 @@
path:
container_name: rai-eval-flows
- container_path: models/rai_eval_ui_dag_flow/v6
+ container_path: models/rai_eval_ui_dag_flow/v7
storage_name: amlraipfmodels
type: azureblob
publish:
2 changes: 1 addition & 1 deletion assets/promptflow/models/rai-eval-ui-dag-flow/spec.yaml
@@ -9,4 +9,4 @@ properties:
azureml.promptflow.description: Compute the quality and safety of the answer for the given question based on the ground_truth and the context
inference-min-sku-spec: 2|0|14|28
inference-recommended-sku: Standard_DS3_v2
- version: 6
+ version: 7
