Support for New SOTA MoE: tencent/Tencent-Hunyuan-Large #111

ThomasBaruzier · 2024-11-05T14:29:32Z

Abstract

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's performance across various benchmarks including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and performs comparably to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. We also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications.
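
To put the parameter counts from the abstract in perspective, here is a minimal sketch of the arithmetic. It uses only the figures quoted above (389B total, 52B active per token) and the nominal sizes of the two dense Llama baselines. The function and constant names are illustrative and do not come from the Hunyuan-Large code base.

```python
# Figures taken from the abstract, in billions of parameters.
TOTAL_PARAMS_B = 389.0
ACTIVATED_PARAMS_B = 52.0

# Nominal sizes of the dense baselines named in the abstract (assumed
# from the model names, not measured).
DENSE_BASELINES_B = {"LLama3.1-70B": 70.0, "LLama3.1-405B": 405.0}


def activated_fraction(total_b: float, activated_b: float) -> float:
    """Share of the MoE's parameters that are used for each token."""
    return activated_b / total_b


def active_ratio_vs_dense(activated_b: float, dense_b: float) -> float:
    """Per-token active parameters relative to a dense model's size."""
    return activated_b / dense_b


if __name__ == "__main__":
    frac = activated_fraction(TOTAL_PARAMS_B, ACTIVATED_PARAMS_B)
    print(f"Activated fraction per token: {frac:.1%}")
    for name, size_b in DENSE_BASELINES_B.items():
        ratio = active_ratio_vs_dense(ACTIVATED_PARAMS_B, size_b)
        print(f"Active parameters vs {name}: {ratio:.1%}")
```

Roughly 13% of the weights take part in any single token, so per-token compute is closer to that of a dense model in the 50B range. All 389B parameters still have to be stored to run the model.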

Code: https://github.com/Tencent/Tencent-Hunyuan-Large

Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large

Model	LLama3.1-405B	LLama3.1-70B	Mixtral-8x22B	DeepSeek-V2	Hunyuan-Large
MMLU	85.2	79.3	77.8	78.5	88.4
MMLU-Pro	61.6	53.8	49.5	-	60.2
BBH	85.9	81.6	78.9	78.9	86.3
HellaSwag	-	-	88.7	87.8	86.8
CommonsenseQA	85.8	84.1	82.4	-	92.9
WinoGrande	86.7	85.3	85.0	84.9	88.7
PIQA	-	-	83.6	83.7	88.3
NaturalQuestions	-	-	39.6	38.7	52.8
DROP	84.8	79.6	80.4	80.1	88.9
ARC-C	96.1	92.9	91.2	92.4	95.0
TriviaQA	-	-	82.1	79.9	89.2
CMMLU	-	-	60.0	84.0	90.2
C-Eval	-	-	59.6	81.7	91.9
C3	-	-	71.4	77.4	82.3
GSM8K	89.0	83.7	83.7	79.2	92.8
MATH	53.8	41.4	42.5	43.6	69.8
CMATH	-	-	72.3	78.7	91.3
HumanEval	61.0	58.5	53.1	48.8	71.4
MBPP	73.4	68.6	64.2	66.6	72.6

Model	LLama3.1 405B Inst.	LLama3.1 70B Inst.	Mixtral 8x22B Inst.	DeepSeekV2.5 Chat	Hunyuan-Large Inst.
MMLU	87.3	83.6	77.8	80.4	89.9
CMMLU	-	-	61.0	-	90.4
C-Eval	-	-	60.0	-	88.6
BBH	-	-	78.4	84.3	89.5
HellaSwag	-	-	86.0	90.3	88.5
ARC-C	96.9	94.8	90.0	-	94.6
GPQA_diamond	51.1	46.7	-	-	42.4
MATH	73.8	68.0	49.8	74.7	77.4
HumanEval	89.0	80.5	75.0	89.0	90.0
AlignBench	6.0	5.9	6.2	8.0	8.3
MT-Bench	9.1	8.8	8.1	9.0	9.4
IFEval strict-prompt	86.0	83.6	71.2	-	85.0
Arena-Hard	69.3	55.7	-	76.2	81.8
AlpacaEval-2.0	39.3	34.3	30.9	50.5	51.8
