
Update the evaluation method to pass the dataset item to the scoring method #698

Merged: 6 commits merged into main on Nov 22, 2024

Conversation

jverre (Collaborator) commented Nov 22, 2024

Details

Update the evaluation method to pass the dataset item to the scoring method

@jverre jverre requested review from a team as code owners November 22, 2024 13:42
alexkuzmik (Collaborator) left a comment

Regarding the tests:

  1. Please update unit.evaluation.test_evaluate.test_evaluate_happyflow so that task outputs don't duplicate the dataset items' content. Also update e2e.test_experiment.py.
  2. Add a new test similar to unit.evaluation.test_evaluate.test_evaluate_happyflow, but one that uses the mapping functionality.

It should be enough.

@@ -21,6 +21,9 @@ def evaluate(
nb_samples: Optional[int] = None,
task_threads: int = 16,
prompt: Optional[Prompt] = None,
scoring_key_mapping: Optional[
Dict[str, Union[str, Callable[[dataset_item.DatasetItem], Any]]]
alexkuzmik (Collaborator) commented

dataset_item.DatasetItem is not part of our public API; users work with dictionaries.
So it's better to allow them to specify callables that take a dataset item dictionary as input.
We can also make the type hints more verbose with aliases:

DatasetItemDict = Dict[str, Any]
...
scoring_key_mapping: Optional[
    Dict[str, Union[str, Callable[[DatasetItemDict], Any]]]
]
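To illustrate the suggested alias: the sketch below shows a `scoring_key_mapping` mixing a plain key rename with a callable over the item dictionary. `DatasetItemDict` follows the alias proposed above; the mapping keys and the sample item are purely illustrative, not part of the Opik API.

```python
from typing import Any, Callable, Dict, Union

DatasetItemDict = Dict[str, Any]
ScoringKeyMapping = Dict[str, Union[str, Callable[[DatasetItemDict], Any]]]

# Illustrative mapping: a string value renames an existing item key,
# a callable computes a value from the whole item dictionary.
mapping: ScoringKeyMapping = {
    "reference": "expected_output",
    "context": lambda item: item["metadata"]["context"],
}

item: DatasetItemDict = {
    "expected_output": "Paris",
    "metadata": {"context": "geography"},
}

# Resolve the mapping against one dataset item dictionary.
resolved = {
    key: value(item) if callable(value) else item[value]
    for key, value in mapping.items()
}
print(resolved)  # {'reference': 'Paris', 'context': 'geography'}
```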

alexkuzmik (Collaborator) commented

Update: I ran the code and it works with dictionaries as it should, so probably just the type hint is wrong.

@@ -56,12 +56,32 @@ def _score_test_case(
return test_result_


def _create_scoring_inputs(
alexkuzmik (Collaborator) commented Nov 22, 2024

Let's move this to metrics.arguments_helpers.create_score_inputs and rename item to dataset_item.
A few unit tests specifically for this function would be nice; it's a very important and sensitive piece of logic.

) -> Dict[str, Any]:
mapped_inputs = {**item, **task_output}

if scoring_key_mapping is not None:
alexkuzmik (Collaborator) commented Nov 22, 2024

Invert the if condition with a "guard clause" and the nesting will be reduced:

  if scoring_key_mapping is None:
      return mapped_inputs

  for k, v in scoring_key_mapping.items():
      if callable(v):
          mapped_inputs[k] = v(item)
      else:
          mapped_inputs[k] = mapped_inputs[v]

  return mapped_inputs
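Wrapped into a self-contained function, the guard-clause version might look like the sketch below, which also doubles as the kind of unit test asked for earlier in the thread. The function name `create_score_inputs` and the string-vs-callable semantics follow the review suggestions; the sample data is illustrative.

```python
from typing import Any, Callable, Dict, Optional, Union

DatasetItemDict = Dict[str, Any]


def create_score_inputs(
    dataset_item: DatasetItemDict,
    task_output: Dict[str, Any],
    scoring_key_mapping: Optional[
        Dict[str, Union[str, Callable[[DatasetItemDict], Any]]]
    ] = None,
) -> Dict[str, Any]:
    # Task outputs take precedence over dataset item fields on key collisions.
    mapped_inputs = {**dataset_item, **task_output}

    # Guard clause: nothing to remap.
    if scoring_key_mapping is None:
        return mapped_inputs

    for k, v in scoring_key_mapping.items():
        if callable(v):
            # Callables receive the raw dataset item dictionary.
            mapped_inputs[k] = v(dataset_item)
        else:
            # Strings alias an already-merged key under a new name.
            mapped_inputs[k] = mapped_inputs[v]

    return mapped_inputs


# No mapping: plain merge, task output wins on collisions.
assert create_score_inputs({"q": "2+2", "output": "5"}, {"output": "4"}) == {
    "q": "2+2",
    "output": "4",
}

# With mapping: rename via a string, compute via a callable.
scored = create_score_inputs(
    {"expected": "4"},
    {"output": "4"},
    {"reference": "expected", "expected_length": lambda item: len(item["expected"])},
)
assert scored == {
    "expected": "4",
    "output": "4",
    "reference": "4",
    "expected_length": 1,
}
```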

(An outdated comment on sdks/python/src/opik/evaluation/utils.py was marked as resolved.)
@jverre jverre requested a review from alexkuzmik November 22, 2024 16:53
@jverre jverre merged commit 399b5cc into main Nov 22, 2024
23 checks passed
@jverre jverre deleted the jacques/merge_dataset_items_with_task_output branch November 22, 2024 17:35
aadereiko pushed a commit that referenced this pull request Nov 25, 2024