OpenCompass v0.2.4
The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!
🌟 Highlights
- Enhanced support for multiple datasets including QuALITY, APPS and TACO.
- Introduced multi-model judging for subjective evaluation.
- Bug fixes and improvements in configurations and documentation.
🚀 New Features
🌐 General
- Feature #963 - Add support for the APPS dataset.
- Feature #966 - Add support for the TACO dataset.
- Feature #976 - Add an implementation of the QuALITY dataset.
- Feature #984 - Add support for setting prediction paths.
- Feature #1006 - Support alpacaeval_v2.
- Feature #1016 - Add multi-model judge.
- Feature #1019 - Add ATC Choice Version.
📖 Documentation
- Docs #1015 - General documentation updates and improvements.
🐛 Bug Fixes
- Fix #964 - Fix the config's name of deepseek-coder.
- Fix #890 - Update links and link checkers.
- Fix #977 - Fix a bug in internlm2 series configs.
- Fix #975 - Fix documentation issues.
- Fix #992 - Fix running issues in turbomind_tis.
- Fix #994 - Change status to list in base.py.
- Fix #995 - Quick fix for configs.
- Fix #1020 - Refactor Needlebench configs for CLI testing support.
⚙ Enhancements and Refactors
- Modify requirements/runtime.txt #983 - Relax the numpy requirement from numpy==1.23.4 to numpy>=1.23.4.
- Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
- Simplify needlebench summarizer #1024 - Streamline Needlebench summarizer for better efficiency.
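The numpy change in #983 replaces an exact pin (numpy==1.23.4) with a minimum version (numpy>=1.23.4). The sketch below illustrates the practical difference between the two specifiers. It is a simplified illustration only: the helper names `parse_version` and `satisfies` are hypothetical, it handles plain dotted numeric versions, and it is not how pip resolves requirements (pip follows the full PEP 440 rules).

```python
def parse_version(text):
    """Turn a dotted version such as '1.23.4' into a tuple (1, 23, 4)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, specifier):
    """Check an installed version against a '==X' or '>=X' specifier.

    Hypothetical helper for illustration; only these two operators
    and purely numeric versions are supported.
    """
    for op in ("==", ">="):
        if specifier.startswith(op):
            required = parse_version(specifier[len(op):])
            have = parse_version(installed)
            return have == required if op == "==" else have >= required
    raise ValueError(f"unsupported specifier: {specifier}")


# Compare the old exact pin with the new minimum-version requirement.
for installed in ("1.22.0", "1.23.4", "1.26.0"):
    old_ok = satisfies(installed, "==1.23.4")
    new_ok = satisfies(installed, ">=1.23.4")
    print(f"numpy {installed}: old pin={old_ok}, new requirement={new_ok}")
```

Under the old pin only numpy 1.23.4 is accepted, while the relaxed requirement also accepts later releases such as 1.26.0, which should reduce version conflicts with other packages installed in the same environment.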
🎉 Welcome New Contributors
- @seanzhang-zhichen, @kleinzcy, @ispobock, @Chaseldot, and @Y0oMu made their first contributions. Welcome to the OpenCompass community!
🔗 Full Change Logs
[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in #964
[Fix] Update links and link checkers by @Leymore in #890
[Feat] support apps by @Connor-Shen in #963
fix doc problem by @seanzhang-zhichen in #975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in #977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in #976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in #983
[Feature] add support for set prediction path by @bittersweet1999 in #984
[Feat] Support TACO by @Connor-Shen in #966
[Feature] update apps by @Connor-Shen in #985
[Fix] update apps/taco by @Connor-Shen in #988
[Feature] add one script for subjective by @bittersweet1999 in #993
Fix running issues in turbomind_tis by @ispobock in #992
[Fix] base.py change status into list by @Chaseldot in #994
[Fix] quick fix for configs by @bittersweet1999 in #995
[Feature] update needlebench and configs by @DseidLi in #986
[Feature] support alpacaeval_v2 by @bittersweet1999 in #1006
updates docs by @Y0oMu in #1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in #1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in #1020
[Feature] Add ATC Choice Version by @DseidLi in #1019
[Fix] Simplify needlebench summarizer by @DseidLi in #1024
For a detailed overview of all changes, check out our Full Changelog.