Release OpenCompass v0.2.4 · open-compass/opencompass

The OpenCompass team is thrilled to announce the release of OpenCompass v0.2.4!

🌟 Highlights

Added support for new datasets, including QuALITY, APPS, and TACO.
Introduced multi-model judging for subjective evaluation.
Bug fixes and improvements in configurations and documentation.

🚀 New Features

🌐 General

Feature #963 - Support for APPS dataset.
Feature #966 - Support for TACO dataset.
Feature #976 - Add an implementation of the QuALITY dataset.
Feature #984 - Add support for setting prediction paths.
Feature #1006 - Support alpacaeval_v2.
Feature #1016 - Add multi-model judge.
Feature #1019 - Add ATC Choice Version.

📖 Documentation

Updates docs #1015 - General documentation updates and improvements.

🐛 Bug Fixes

Fix #964 - Fix the config name for deepseek-coder.
Fix #890 - Update links and link checkers.
Fix #977 - Fix a bug in internlm2 series configs.
Fix #975 - Fix documentation issues.
Fix #992 - Fix running issues in turbomind_tis.
Fix #994 - Change status to list in base.py.
Fix #995, Fix #1020 - Quick fixes and refactors for configs.

⚙ Enhancements and Refactors

Modify requirements/runtime.txt #983 - Relax the numpy requirement from numpy==1.23.4 to numpy>=1.23.4.
Update Needlebench and configs #986 - Enhancements in Needlebench configurations.
Simplify needlebench summarizer #1024 - Streamline Needlebench summarizer for better efficiency.
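
For readers wondering what the numpy change in #983 means in practice, here is a minimal sketch comparing the old exact pin (numpy==1.23.4) with the new lower bound (numpy>=1.23.4). The helpers `parse_version` and `satisfies` are hypothetical names written for this illustration; they are not part of OpenCompass or pip, and they only handle plain dotted versions rather than the full version-specifier rules that pip applies.

```python
def parse_version(text):
    """Turn a plain dotted version such as '1.23.4' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, operator, required):
    """Check an installed version against a single '==' or '>=' requirement.

    Simplified on purpose: no pre-releases, wildcards, or other specifiers.
    """
    have, want = parse_version(installed), parse_version(required)
    if operator == "==":
        return have == want
    if operator == ">=":
        return have >= want
    raise ValueError(f"unsupported operator: {operator}")


# Compare the old pin with the relaxed requirement for a few numpy versions.
for candidate in ("1.23.4", "1.26.0", "1.22.0"):
    old_ok = satisfies(candidate, "==", "1.23.4")
    new_ok = satisfies(candidate, ">=", "1.23.4")
    print(f"numpy {candidate}: old pin={old_ok}, new requirement={new_ok}")
```

Under the old pin only 1.23.4 itself is accepted, while the relaxed requirement also accepts later releases and still rejects anything older.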

🎉 Welcome New Contributors

@seanzhang-zhichen, @kleinzcy, @ispobock, @Chaseldot, and @Y0oMu made their first contributions. Welcome to the OpenCompass community!

🔗 Full Change Logs

[Fix] fix the config's name of deepseek-coder by @jingmingzhuo in #964
[Fix] Update links and link checkers by @Leymore in #890
[Feat] support apps by @Connor-Shen in #963
fix doc problem by @seanzhang-zhichen in #975
[Fix] fix a bug in internlm2 series configs by @jingmingzhuo in #977
[Feature] Add the implement of QuALITY datasets by @jingmingzhuo in #976
modify the requirements/runtime.txt: numpy==1.23.4 --> numpy>=1.23.4 by @kleinzcy in #983
[Feature] add support for set prediction path by @bittersweet1999 in #984
[Feat] Support TACO by @Connor-Shen in #966
[Feature] update apps by @Connor-Shen in #985
[Fix] update apps/taco by @Connor-Shen in #988
[Feature] add one script for subjective by @bittersweet1999 in #993
Fix running issues in turbomind_tis by @ispobock in #992
[Fix] base.py change status into list by @Chaseldot in #994
[Fix] quick fix for configs by @bittersweet1999 in #995
[Feature] update needlebench and configs by @DseidLi in #986
[Feature] support alpacaeval_v2 by @bittersweet1999 in #1006
updates docs by @Y0oMu in #1015
[Feature] Add multi-model judge and fix some problems by @bittersweet1999 in #1016
[Fix] Refactor Needlebench Configs for CLI Testing Support by @DseidLi in #1020
[Feature] Add ATC Choice Version by @DseidLi in #1019
[Fix] Simplify needlebench summarizer by @DseidLi in #1024

For a detailed overview of all changes, check out our Full Changelog.
