Merge main #12

Merged
merged 55 commits into from
Oct 31, 2024
Changes from 50 commits
Commits
4bef1f3
adds leaderboard tasks
NathanHB Jun 26, 2024
6066de1
Delete lm_eval/tasks/leaderboard/leaderboard_chat_template.yaml
NathanHB Jul 1, 2024
15ddca1
add readme
NathanHB Jul 1, 2024
3def90b
Merge branch 'add_tasks' of github.com:huggingface/lm-evaluation-harn…
NathanHB Jul 1, 2024
bfd9cba
Delete lm_eval/tasks/leaderboard/mmlu_pro/mmlu_pro_chat_template.yaml
NathanHB Jul 1, 2024
93a35f7
modify readme
NathanHB Jul 1, 2024
6ebf9f9
Merge branch 'add_tasks' of github.com:huggingface/lm-evaluation-harn…
NathanHB Jul 1, 2024
3ae8771
fix bbh task
NathanHB Jul 1, 2024
eacf21b
fix bbh salient task
Jul 1, 2024
c93f3c1
modify the readme
Jul 1, 2024
c6cd21d
Delete lm_eval/tasks/leaderboard/ifeval/README.md
NathanHB Jul 1, 2024
47cb77b
Delete lm_eval/tasks/leaderboard/math/README.md
NathanHB Jul 1, 2024
f548afe
add leaderboard to the tasks repertory
NathanHB Jul 2, 2024
f21c7be
add announcement about new leaderboard tasks
NathanHB Jul 2, 2024
cc028ee
linting
NathanHB Jul 2, 2024
aea3b0b
Update README.md
NathanHB Jul 2, 2024
2cd1090
installs ifeval dependency in new_task github workflow
NathanHB Jul 2, 2024
45ce1a3
Merge branch 'add_tasks' of github.com:huggingface/lm-evaluation-harn…
NathanHB Jul 2, 2024
da57db5
Merge remote-tracking branch 'origin/add_tasks' into adding_all_changess
NathanHB Jul 10, 2024
8bf653d
add fixes
NathanHB Jul 17, 2024
4919f6f
fix results repo
NathanHB Jul 18, 2024
b2e818e
Fix for gemma models
clefourrier Jul 30, 2024
6a4eb02
Bug fix: accelerator is None when doing MP without accelerate
clefourrier Aug 9, 2024
04519f2
Fix private repo name in evaluation_tracker.py
Aug 9, 2024
9fc4a23
Add gemmas support and print jinja2 chat_template error
Aug 9, 2024
782aafc
Update .gitignore
Aug 9, 2024
bd8a75b
fix for MP
clefourrier Aug 9, 2024
27ca10b
Merge branch 'adding_all_changess' of github.com:huggingface/lm-evalu…
Aug 12, 2024
d4631ea
Update gitignore and max_length function in huggingface.py
Aug 12, 2024
4470491
Handling multiple chat templates
Oct 21, 2024
61980f8
Update gitignore
Oct 21, 2024
d94f176
Modify gitignore and add recent changes
Oct 24, 2024
3ab17d2
Remove ammlu folder
Oct 24, 2024
65b2f06
Fix .gitignore
Oct 24, 2024
f1804bd
Correct gemma handling [wip]
Oct 24, 2024
f0dee36
Merge remote-tracking branch 'origin/main' into nathan-merge-main
NathanHB Oct 29, 2024
cf63c40
fix batch function
NathanHB Oct 29, 2024
8a6cfb3
fix
NathanHB Oct 29, 2024
df8e2d6
Update lm_eval/models/huggingface.py
NathanHB Oct 29, 2024
0183ac2
Update lm_eval/models/huggingface.py
NathanHB Oct 29, 2024
0363cf9
Delete lm_eval/tasks/piqa_ar/README.md
NathanHB Oct 29, 2024
8a26266
Delete lm_eval/tasks/piqa_ar/piqa_ar.yaml
NathanHB Oct 29, 2024
22c9753
remove src
NathanHB Oct 29, 2024
e92fbbb
Merge branch 'nathan-merge-main' of github.com:huggingface/lm-evaluat…
NathanHB Oct 29, 2024
152541c
change python version of tests
NathanHB Oct 29, 2024
58a4da5
Fixed test case for Hugging Face LM model.
NathanHB Oct 30, 2024
ce06ef1
Update Python version in unit tests workflow.
NathanHB Oct 30, 2024
86781aa
fix tests
NathanHB Oct 30, 2024
2d8cc71
remove unneeded print
NathanHB Oct 30, 2024
fdbdcee
Fix bug to load model to CPU when chosen by user.
NathanHB Oct 31, 2024
a48781c
Update lm_eval/models/huggingface.py
NathanHB Oct 31, 2024
181403a
Update lm_eval/models/huggingface.py
NathanHB Oct 31, 2024
d3882dd
test
NathanHB Oct 31, 2024
a499c98
test
NathanHB Oct 31, 2024
cc532e5
Merge branch 'nathan-merge-main' of github.com:huggingface/lm-evaluat…
NathanHB Oct 31, 2024
6 changes: 3 additions & 3 deletions .github/workflows/unit_tests.yml
@@ -42,7 +42,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ "3.8", "3.9", "3.10", "3.11" ]
+        python-version: [ "3.10", "3.11" ]
     timeout-minutes: 30
     steps:
       - name: Checkout Code
@@ -75,10 +75,10 @@ jobs:
     steps:
       - name: Checkout Code
         uses: actions/checkout@v4
-      - name: Set up Python 3.8
+      - name: Set up Python 3.11
         uses: actions/setup-python@v5
         with:
-          python-version: 3.8
+          python-version: 3.11
           cache: pip
           cache-dependency-path: pyproject.toml
       - name: Install dependencies
96 changes: 88 additions & 8 deletions .gitignore
@@ -1,14 +1,94 @@
env
*.pyc
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Ruff
.ruff_cache/

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Pyenv version management
.python-version

# PEP 582; Dependency management
__pypackages__/

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Output directories
output/

# Data directories
data/
lm_cache
.idea
build
dist
*.egg-info
venv

# IDE configuration files
.idea/
.vscode/

# Temporary files including logs
temp/
logs/
scratch/
cache/
slurm_logs/
eval_results/
votes/
evals/
evaluate_modules/
outputs/
test_logs/


# SH files (feel free to change the condition to less strict)
set_hf_token.sh
temp
__pycache__
.ipynb_checkpoints
7 changes: 0 additions & 7 deletions lm_eval/__main__.py
@@ -292,13 +292,6 @@ def cli_evaluate(args: Union[argparse.Namespace, None] = None) -> None:
             "If fewshot_as_multiturn is set, apply_chat_template must be set to True."
         )
 
-    if (
-        args.num_fewshot is None or args.num_fewshot == 0
-    ) and args.fewshot_as_multiturn:
-        raise ValueError(
-            "If fewshot_as_multiturn is set, num_fewshot must be greater than 0."
-        )
-
     if args.include_path is not None:
         eval_logger.info(f"Including path: {args.include_path}")
         task_manager = TaskManager(args.verbosity, include_path=args.include_path)
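The hunk above drops the guard that rejected `fewshot_as_multiturn` when `num_fewshot` was unset or zero, while the neighbouring chat-template guard stays. A minimal sketch of the resulting validation follows. It assumes a hypothetical helper `check_multiturn_args` (not a function in the harness) and assumes the surviving guard tests `apply_chat_template` for falsiness; only the error message is taken from the diff.

```python
import argparse


def check_multiturn_args(args: argparse.Namespace) -> None:
    """Hypothetical helper mirroring the guard that remains after this change."""
    # Assumed condition: the diff shows only the error message, not the test itself.
    if args.fewshot_as_multiturn and not args.apply_chat_template:
        raise ValueError(
            "If fewshot_as_multiturn is set, apply_chat_template must be set to True."
        )
    # The removed guard would have raised here when num_fewshot was None or 0.


# Zero-shot combined with multiturn formatting is no longer rejected.
zero_shot = argparse.Namespace(
    num_fewshot=0, fewshot_as_multiturn=True, apply_chat_template=True
)
check_multiturn_args(zero_shot)
```

With the old guard in place, the `zero_shot` namespace would presumably have raised a `ValueError`; after the change only the chat-template requirement is enforced.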
4 changes: 2 additions & 2 deletions lm_eval/models/dummy.py
@@ -26,9 +26,9 @@ def loglikelihood(self, requests, disable_tqdm: bool = False):
     def generate_until(self, requests, disable_tqdm: bool = False):
         res = []
 
-        for ctx, _ in tqdm(requests, disable=disable_tqdm):
+        for request in tqdm(requests, disable=disable_tqdm):
             res.append("lol")
-            assert ctx.strip() != ""
+            assert request.arguments[0].strip() != ""
 
         return res
 
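The change above switches the dummy model's loop from tuple unpacking to request objects whose first argument is the context string. A minimal sketch of that access pattern follows. It assumes a hypothetical `FakeRequest` dataclass as a stand-in for the harness's request type; only the `arguments` attribute and the `"lol"` placeholder output are taken from the diff.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the harness's request object."""

    arguments: Tuple[Any, ...]


def generate_until(requests: List[FakeRequest]) -> List[str]:
    res = []
    for request in requests:
        # The context is read from the first positional argument
        # instead of being unpacked from a (ctx, gen_kwargs) tuple.
        context = request.arguments[0]
        assert context.strip() != ""
        res.append("lol")
    return res


requests = [FakeRequest(arguments=("Question: 2+2?", {"until": ["\n"]}))]
outputs = generate_until(requests)
```

Iterating over objects rather than unpacking tuples keeps the loop working when each request is not itself iterable as a two-element pair.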