
Code evaluation using bigcode-evaluation-harness framework #1776

Open
mtasic85 opened this issue Oct 5, 2024 · 1 comment
Labels
enhancement (New feature or request)

Comments

Contributor

mtasic85 commented Oct 5, 2024

Code evaluation tasks/benchmarks such as HumanEval and MBPP are missing from lm-evaluation-harness, but are present and maintained in bigcode-evaluation-harness.

https://github.com/bigcode-project/bigcode-evaluation-harness

Since we would need to parse the requested tasks and check whether they live in lm-evaluation-harness or bigcode-evaluation-harness, I propose keeping litgpt evaluate as-is but adding a --framework argument that accepts "lm-evaluation-harness" (the default if not specified) or "bigcode-evaluation-harness", as sketched below.
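
For illustration, a minimal sketch of how such a dispatch could look. This is only a sketch: the run_lm_eval / run_bigcode_eval helpers are placeholders, not litgpt's actual internals, and the argument names beyond --framework are assumptions.

```python
"""Illustrative sketch only; the helpers below stand in for whatever
wrappers litgpt would actually call into for each harness."""
import argparse


def run_lm_eval(checkpoint_dir: str, tasks: str) -> None:
    # Placeholder for dispatching to EleutherAI's lm-evaluation-harness.
    print(f"lm-evaluation-harness: evaluating {checkpoint_dir} on {tasks}")


def run_bigcode_eval(checkpoint_dir: str, tasks: str) -> None:
    # Placeholder for dispatching to bigcode-evaluation-harness.
    print(f"bigcode-evaluation-harness: evaluating {checkpoint_dir} on {tasks}")


def main() -> None:
    parser = argparse.ArgumentParser(prog="litgpt evaluate")
    parser.add_argument("checkpoint_dir")
    parser.add_argument("--tasks", default="hellaswag")
    parser.add_argument(
        "--framework",
        choices=("lm-evaluation-harness", "bigcode-evaluation-harness"),
        default="lm-evaluation-harness",  # default when the flag is omitted
    )
    args = parser.parse_args()

    # Route the evaluation to the selected harness.
    if args.framework == "bigcode-evaluation-harness":
        run_bigcode_eval(args.checkpoint_dir, args.tasks)
    else:
        run_lm_eval(args.checkpoint_dir, args.tasks)


if __name__ == "__main__":
    main()
```

Usage would then look something like: litgpt evaluate checkpoints/my-model --tasks humaneval --framework "bigcode-evaluation-harness", while omitting --framework keeps today's lm-evaluation-harness behavior.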

mtasic85 added the enhancement label Oct 5, 2024
Collaborator

rasbt commented Oct 7, 2024

Thanks for the suggestion; I think that's a good idea. I was just reading through EleutherAI/lm-evaluation-harness#1157, and HumanEval and MBPP might eventually come to lm-evaluation-harness, but it's hard to say when.

So in the meantime, I think it's a good idea to add support as you suggested, with --framework "lm-evaluation-harness" as the default. (Please feel free to open a PR if you are interested and have time.)
