This is a self-built repo to reproduce the fine-tuning of several Large Language Models (LLMs) using training frameworks such as DeepSpeed, bitsandbytes, and QLoRA. The project currently serves as a personal trial to test the feasibility of training large models with QLoRA.
- NVIDIA-SMI 510.108.03
- Driver Version: 510.108.03
- CUDA Version: 11.6
- 8 * NVIDIA RTX A6000 (48GB)
Model Name | Parameters | Trainable Parameters | Method | Batch Size (train/eval) | Training Time (relative) | Inference Time (relative) |
---|---|---|---|---|---|---|
T5-3B | 3B | 100% | - | 1/4 | 1 (baseline) | 1 (baseline) |
T5-3B | 3B | 100% | DeepSpeed (ZeRO-2) | 2/4 | 0.8 | 1 |
flan-t5-base | 0.248B | 50% | QLoRA | 36/16 | 0.04 | - |
falcon-7B | 7B | 0.0653% | QLoRA | 4/24 | 4 | 6.1 |
gpt-neox | 20B | 0.0816% | QLoRA | 4/24 | 4 | 6.1 |
llama-65B | 65B | 0.0639% | QLoRA | 4/24 | 4 | 6.1 |
QLoRA relies on 4-bit quantization to reduce the memory cost of LLMs (an extension of the earlier int8 approach). By accepting more information loss (which may degrade performance) and additional compute time, it frees enough memory to fit very large models (20B, 65B, ...) on the available GPUs.
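Concretely, here is a minimal sketch of what 4-bit (NF4) loading plus LoRA adapters looks like with bitsandbytes, transformers, and peft. The model name, LoRA rank, and target modules below are illustrative assumptions, not the exact settings used by the scripts in this repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "tiiuae/falcon-7b"  # illustrative; other causal LMs in the table load similarly

# 4-bit NF4 quantization: weights are stored in 4 bits, compute runs in bf16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",        # spread the quantized weights across the visible GPUs
    trust_remote_code=True,   # needed for falcon on older transformers versions
)

# Freeze the 4-bit base weights and train only small LoRA adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],  # falcon / gpt-neox style; llama uses q_proj/k_proj/v_proj/o_proj
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # roughly matches the small trainable percentages in the table above
```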
pip install bitsandbytes
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/accelerate.git
pip install datasets
pip install deepspeed
deepspeed --include localhost:0,1,2,3,4,5,6,7 finaltest_trainer_eval.py
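The ZeRO-2 run in the table needs a DeepSpeed config. Below is a minimal sketch of how such a config could be passed to the HF Trainer; the values are illustrative assumptions, not the actual config used by finaltest_trainer_eval.py:

```python
from transformers import TrainingArguments

# Assumed ZeRO-2 configuration; "auto" lets the HF Trainer fill in matching values.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,               # shard optimizer states and gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_scatter": True,
    },
}

training_args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=2,   # matches the T5-3B ZeRO-2 row above
    per_device_eval_batch_size=4,
    deepspeed=ds_config,             # a path to a JSON file is also accepted
)
```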
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 LLAMA_65B.py
CUDA_VISIBLE_DEVICES=0 python3 GPT_Neox.py
GPT_Neox can only be run on a single GPU; the reason is unknown.
CUDA_VISIBLE_DEVICES=0,1,2 python3 Flan_T5.py
flan-t5-base can only be run with fewer than 3 GPUs; the reason is unknown.
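If restricting GPU visibility via CUDA_VISIBLE_DEVICES is not enough, one possible workaround (an assumption, not something the repo scripts are confirmed to do) is to pin the whole 4-bit model to a single GPU at load time:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# device_map={"": 0} places every module on GPU 0, mirroring CUDA_VISIBLE_DEVICES=0
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map={"": 0},
)
```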
What is the average enrollment of schools?
db_id:school_player
Table player, columns = [*,Player_ID,Player,Team,Age,Position,School_ID]
Table school, columns = [*,School_ID,School,Location,Enrollment,Founded,Denomination,Boys_or_Girls,Day_or_Boarding,Year_Entered_Competition,School_Colors]
Table school_details, columns = [*,School_ID,Nickname,Colors,League,Class,Division]
Table school_performance, columns = [*,School_Id,School_Year,Class_A,Class_AA]
foreign key:[school_details.School_ID = school.School_ID,school_performance.School_Id = school.School_ID,player.School_ID = school.School_ID]
primary key:[school.School_ID,school_details.School_ID,school_performance.School_Id,player.Player_ID]
Expected Output:
SELECT avg(Enrollment) FROM school
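For reference, here is a sketch of how an example like the one above could be serialized into a single prompt for the model; the exact template used during fine-tuning is an assumption here:

```python
def build_prompt(question: str, db_id: str, schema_lines: list[str]) -> str:
    """Concatenate the question and the serialized schema into one prompt string."""
    schema = "\n".join(schema_lines)
    return f"Question: {question}\ndb_id: {db_id}\n{schema}\nSQL:"

prompt = build_prompt(
    "What is the average enrollment of schools?",
    "school_player",
    [
        "Table player, columns = [*,Player_ID,Player,Team,Age,Position,School_ID]",
        "Table school, columns = [*,School_ID,School,Location,Enrollment,Founded,Denomination,Boys_or_Girls,Day_or_Boarding,Year_Entered_Competition,School_Colors]",
        "foreign key:[school_details.School_ID = school.School_ID,school_performance.School_Id = school.School_ID,player.School_ID = school.School_ID]",
        "primary key:[school.School_ID,school_details.School_ID,school_performance.School_Id,player.Player_ID]",
    ],
)
# The model is then expected to generate: SELECT avg(Enrollment) FROM school
```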
- As the parameter count of each model varies, it is better to use a larger batch size when training smaller models and a smaller batch size when training larger models.
- So far, every model I have tried can be trained smoothly, but the resulting performance is poor; I am not sure whether the problem lies in the training setup or in the QLoRA method itself.