
Marker-Inc-Korea/KoLLM_Eval


KoLLM_Eval🥰

Unified(?) evaluation code for Korean benchmarks, 2024.11.09 version

Run the Logickor, K2-Eval, LM-Harness, and KoMT-Bench evaluations from a single codebase.

🍚Gukbap-Series LLM🍚

Install (required)🤩

First, clone and install ⭐LM-Eval-Harness.

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd ./lm-evaluation-harness
pip install -e .
pip install -e ".[multilingual]"

pip install vllm

Second, clone ⭐KoMT-Bench and run its setup script.

git clone https://github.com/LG-AI-EXAONE/KoMT-Bench
cd ./KoMT-Bench/FastChat
sh setting.sh
cd ../../

Finally, move the following files and folders into the ./lm-evaluation-harness folder.

KoMT-Bench (folder)
lm_eval (folder)
questions.jsonl # logickor
data_k2-eval-generation.csv # k2_eval
MTBench (folder)
├──logickor.py
└──k2_eval.py

korean_eval.sh
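
Before running the evaluation, it can help to confirm that everything in the list above has landed in the right place. The following is a minimal sketch, not part of this repository: `missing_entries` is a hypothetical helper, and it assumes the layout is exactly as listed, with `logickor.py` and `k2_eval.py` inside `MTBench`.

```python
from pathlib import Path

# Entries the README says must sit inside ./lm-evaluation-harness.
# The nesting of the two MTBench scripts is read off the tree listing.
REQUIRED_ENTRIES = [
    "KoMT-Bench",
    "lm_eval",
    "questions.jsonl",              # logickor
    "data_k2-eval-generation.csv",  # k2_eval
    "MTBench/logickor.py",
    "MTBench/k2_eval.py",
    "korean_eval.sh",
]


def missing_entries(harness_dir="./lm-evaluation-harness"):
    """Return the required entries that do not exist under harness_dir."""
    root = Path(harness_dir)
    return [entry for entry in REQUIRED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries()
    if missing:
        print("Missing from ./lm-evaluation-harness:")
        for entry in missing:
            print("  -", entry)
    else:
        print("All required files and folders are in place.")
```

Run it from the directory that contains `lm-evaluation-harness`; any entry it prints still needs to be moved.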

If the pydantic module raises an error, reinstall pydantic; this should resolve the problem.

Implementation🤩

sh korean_eval.sh

You must set an OpenAI API key, because the Logickor, K^2-Eval, and KoMT-Bench evaluations use GPT-4 models as judges.
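
A quick pre-flight check for the key might look like the sketch below. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is what the OpenAI Python client reads by default; the scripts in this repository may expect the key to be set another way (for example inside `korean_eval.sh`), so check there first. `openai_key_is_set` is a hypothetical helper.

```python
import os


def openai_key_is_set(env=None):
    """Return True if OPENAI_API_KEY is present and non-empty in env.

    Assumption: the key is passed via the OPENAI_API_KEY environment
    variable; the repository's scripts may read it differently.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if openai_key_is_set():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is not set; export it before running korean_eval.sh.")
```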

You can run the evaluation on an A100 GPU (for example, on Colab).

Debug lists😎

  1. ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
     Fix: apt-get install libglib2.0-0
  2. RuntimeError: Unable to open file at /home/jovyan/LLM_Eval/lm-evaluation-harness/KoMT-Bench/FastChat/fastchat/llm_judge/data/mt_bench/model_judgment/detector.tflite
     Fix: move the detector.tflite file into fastchat/llm_judge/data/mt_bench/model_judgment (the folder named in the error message).

You can probably find detector.tflite in KoMT-Bench/FastChat.
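
The second fix can be scripted. This is a minimal sketch under stated assumptions: `place_detector` is a hypothetical helper, the target folder is taken from the path in the error message, and the file is copied rather than moved so the original stays where it was.

```python
import shutil
from pathlib import Path


def place_detector(fastchat_dir="KoMT-Bench/FastChat"):
    """Copy detector.tflite into the model_judgment folder from the error.

    Returns the destination path, or None if no detector.tflite was found
    anywhere under fastchat_dir.
    """
    root = Path(fastchat_dir)
    if not root.is_dir():
        return None
    target_dir = root / "fastchat" / "llm_judge" / "data" / "mt_bench" / "model_judgment"
    target = target_dir / "detector.tflite"
    if target.exists():
        return target  # already in place, nothing to do
    # Search the FastChat checkout for the file.
    candidates = sorted(root.rglob("detector.tflite"))
    if not candidates:
        return None
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(candidates[0], target)
    return target


if __name__ == "__main__":
    result = place_detector()
    print("detector.tflite placed at:", result)
```

Run it from inside `lm-evaluation-harness`, or pass the path to your FastChat checkout as `fastchat_dir`.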

Examples🤩

| Model | Logickor (0-shot) | K^2-Eval | Haerae (Acc) | CSAT-QA (Acc) | kmmlu (Acc) | KoMT-Bench |
| --- | --- | --- | --- | --- | --- | --- |
| Human-MarkrAI/Gukbap-Gemma2-9B | 8.77 | 4.50 | 62.60 | 43.85 | 46.46 | 8.71 |
| google/gemma-2-9b-it | 8.32 | 4.38 | 64.34 | 47.06 | 42.51 | 7.92 |
| rtzr/ko-gemma-2-9b-it | 8.67 | 4.40 | 64.07 | 48.13 | 44.75 | 8.32 |
| LGAI/EXAONE-3.0-7.8B-Instruct | 8.64 | 4.43 | 77.09 | 34.76 | 35.23 | 8.92 |
| yanolja/EEVE-Korean-Instruct-10.8B-v1.0 | 6.03 | 3.51 | 70.94 | 38.50 | 41.99 | 7.08 |

Logickor and K^2-Eval evaluator: GPT-4-1106-preview
KoMT-Bench evaluator: gpt-4-0613 (the same setup as LG-AI)
Score ranges: Logickor [0, 10], K^2-Eval [0, 5], KoMT-Bench [0, 10]

References🌠

Logickor
LM-Harness
K2-eval
KoMT-Bench
