You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
The ToolCorrectnessMetric raises a ZeroDivisionError when called with an LLMTestCase for which expected_tools is empty. In my use case, I want to benchmark that the LLM only calls a tool when necessary, i.e. with an empty expected_tools list.
To Reproduce
Run minimal example:
from deepeval.metrics import ToolCorrectnessMetric
from deepeval.test_case import LLMTestCase
metric = ToolCorrectnessMetric()
test_case = LLMTestCase(
input="What is an elephant?",
actual_output="(...)",
tools_called=[],
expected_tools=[]
)
metric.measure(test_case)
print(metric.score)
See error ZeroDivisionError: division by zero raised by metric.measure(test_case). This is caused by the line score = len(used_expected_tools) / len(self.expected_tools) in _calculate_score() of the ToolCorrectnessMetric.
Expected behavior
I expected a perfect score of 1 for the minimal example, which represents a test case where no tool call was executed and no tool call was expected.
Software versions:
OS: macOS 15.2 (24C101)
Python 3.12
deepeval 2.0.5
Additional context
This can be mitigated by using should_exact_match=True, but I did not find this workaround to be documented.
The text was updated successfully, but these errors were encountered:
Describe the bug
The
ToolCorrectnessMetric
raises aZeroDivisionError
when called with anLLMTestCase
for whichexpected_tools
is empty. In my use case, I want to benchmark that the LLM only calls a tool when necessary, i.e. with an emptyexpected_tools
list.To Reproduce
ZeroDivisionError: division by zero
raised bymetric.measure(test_case)
. This is caused by the linescore = len(used_expected_tools) / len(self.expected_tools)
in_calculate_score()
of theToolCorrectnessMetric
.Expected behavior
I expected a perfect score of 1 for the minimal example, which represents a test case where no tool call was executed and no tool call was expected.
Software versions:
Additional context
This can be mitigated by using
should_exact_match=True
, but I did not find this workaround to be documented.The text was updated successfully, but these errors were encountered: