Support for New SOTA MoE: tencent/Tencent-Hunyuan-Large #111

Open
ThomasBaruzier opened this issue Nov 5, 2024 · 0 comments

Abstract

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and performs comparably to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. We also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications.
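For anyone sizing up what supporting this model would involve, here is a rough back-of-the-envelope sketch based only on the two parameter counts quoted in the abstract. The bit widths (16, 8, 4) are illustrative assumptions, not formats the authors or this project have committed to, and the estimate covers weights only: it ignores the KV cache, activations, and any quantization metadata.

```python
# Parameter counts quoted in the abstract.
TOTAL_PARAMS = 389e9      # all experts plus shared layers
ACTIVATED_PARAMS = 52e9   # parameters used per token


def activated_fraction(total: float, activated: float) -> float:
    """Share of the model's parameters that take part in one forward pass."""
    return activated / total


def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Weights only: KV cache, activations, and quantization overhead
    (scales, zero points) are deliberately left out.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    frac = activated_fraction(TOTAL_PARAMS, ACTIVATED_PARAMS)
    print(f"Activated per token: {frac:.1%} of all parameters")

    # Hypothetical bit widths chosen for illustration only.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Because every expert has to be resident even though only a fraction is activated per token, the memory estimate uses the total count rather than the activated one.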

Code: https://github.com/Tencent/Tencent-Hunyuan-Large

Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

| Model | LLama3.1-405B | LLama3.1-70B | Mixtral-8x22B | DeepSeek-V2 | Hunyuan-Large |
|---|---|---|---|---|---|
| MMLU | 85.2 | 79.3 | 77.8 | 78.5 | 88.4 |
| MMLU-Pro | 61.6 | 53.8 | 49.5 | - | 60.2 |
| BBH | 85.9 | 81.6 | 78.9 | 78.9 | 86.3 |
| HellaSwag | - | - | 88.7 | 87.8 | 86.8 |
| CommonsenseQA | 85.8 | 84.1 | 82.4 | - | 92.9 |
| WinoGrande | 86.7 | 85.3 | 85.0 | 84.9 | 88.7 |
| PIQA | - | - | 83.6 | 83.7 | 88.3 |
| NaturalQuestions | - | - | 39.6 | 38.7 | 52.8 |
| DROP | 84.8 | 79.6 | 80.4 | 80.1 | 88.9 |
| ARC-C | 96.1 | 92.9 | 91.2 | 92.4 | 95.0 |
| TriviaQA | - | - | 82.1 | 79.9 | 89.2 |
| CMMLU | - | - | 60.0 | 84.0 | 90.2 |
| C-Eval | - | - | 59.6 | 81.7 | 91.9 |
| C3 | - | - | 71.4 | 77.4 | 82.3 |
| GSM8K | 89.0 | 83.7 | 83.7 | 79.2 | 92.8 |
| MATH | 53.8 | 41.4 | 42.5 | 43.6 | 69.8 |
| CMATH | - | - | 72.3 | 78.7 | 91.3 |
| HumanEval | 61.0 | 58.5 | 53.1 | 48.8 | 71.4 |
| MBPP | 73.4 | 68.6 | 64.2 | 66.6 | 72.6 |

| Model | LLama3.1 405B Inst. | LLama3.1 70B Inst. | Mixtral 8x22B Inst. | DeepSeekV2.5 Chat | Hunyuan-Large Inst. |
|---|---|---|---|---|---|
| MMLU | 87.3 | 83.6 | 77.8 | 80.4 | 89.9 |
| CMMLU | - | - | 61.0 | - | 90.4 |
| C-Eval | - | - | 60.0 | - | 88.6 |
| BBH | - | - | 78.4 | 84.3 | 89.5 |
| HellaSwag | - | - | 86.0 | 90.3 | 88.5 |
| ARC-C | 96.9 | 94.8 | 90.0 | - | 94.6 |
| GPQA_diamond | 51.1 | 46.7 | - | - | 42.4 |
| MATH | 73.8 | 68.0 | 49.8 | 74.7 | 77.4 |
| HumanEval | 89.0 | 80.5 | 75.0 | 89.0 | 90.0 |
| AlignBench | 6.0 | 5.9 | 6.2 | 8.0 | 8.3 |
| MT-Bench | 9.1 | 8.8 | 8.1 | 9.0 | 9.4 |
| IFEval strict-prompt | 86.0 | 83.6 | 71.2 | - | 85.0 |
| Arena-Hard | 69.3 | 55.7 | - | 76.2 | 81.8 |
| AlpacaEval-2.0 | 39.3 | 34.3 | 30.9 | 50.5 | 51.8 |