Introduction

This is a self-built repo that reproduces the fine-tuning of several Large Language Models (LLMs) using training frameworks and techniques such as DeepSpeed, bitsandbytes, and QLoRA. The project currently serves as a personal trial to test the feasibility of training large models with QLoRA.

Hardware

- NVIDIA-SMI 510.108.03
- Driver Version: 510.108.03
- CUDA Version: 11.6
- 8 × NVIDIA RTX A6000 (48 GB each)

Model Reproduction

| Model Name | Parameters | Trainable Parameters Percentage | Methods | Batch Size (train/evaluate) | Training Time (relative) | Inference Time (relative) |
| --- | --- | --- | --- | --- | --- | --- |
| T5-3B | 3B | 100% | - | 1/4 | 1 (base) | 1 (base) |
| T5-3B | 3B | 100% | DeepSpeed (ZeRO-2) | 2/4 | 0.8 | 1 |
| flan-t5-base | 0.248B | 50% | QLoRA | 36/16 | 0.04 | - |
| falcon-7B | 7B | 0.0653% | QLoRA | 4/24 | 4 | 6.1 |
| gpt-neox | 20B | 0.0816% | QLoRA | 4/24 | 4 | 6.1 |
| llama-65B | 65B | 0.0639% | QLoRA | 4/24 | 4 | 6.1 |
| - | - | - | - | - | - | - |
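
The "Trainable Parameters Percentage" column is the share of weights that are actually updated (the LoRA adapters); the frozen base weights are excluded. A minimal sketch of how such a percentage can be computed for any PyTorch model (the helper name is illustrative; PEFT models also expose a similar built-in `print_trainable_parameters()` method):

```python
# Sketch: fraction of trainable parameters in a PyTorch model.
def print_trainable_parameters(model):
    trainable, total = 0, 0
    for _, param in model.named_parameters():
        total += param.numel()
        if param.requires_grad:
            trainable += param.numel()
    print(f"trainable params: {trainable} || all params: {total} "
          f"|| trainable%: {100 * trainable / total:.4f}")
```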

QLoRA

QLoRA is a method that uses 4-bit quantized training (an evolution of the int8 approach) to reduce the memory cost of LLMs. By accepting some additional information loss (which may hurt performance) and extra time cost, it frees enough memory to fit larger models (20B, 65B, ...) on the same hardware.
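
The 4-bit setup roughly follows the usual transformers + peft + bitsandbytes recipe. The sketch below is only a minimal illustration under that assumption, not the exact code of the scripts in this repo; the model name and LoRA hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "EleutherAI/gpt-neox-20b"  # placeholder; swap in the model to fine-tune

# Load the base model with 4-bit NF4 quantization (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # shard the quantized weights across the visible GPUs
)

# Attach small trainable LoRA adapters on top of the frozen 4-bit weights.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8,                 # illustrative values, not the ones used in this repo
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```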

Requirements

```
pip install bitsandbytes
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/huggingface/peft.git
pip install git+https://github.com/huggingface/accelerate.git
pip install datasets
pip install deepspeed
```
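
A quick sanity check that the stack above is installed and can see the GPUs:

```python
import torch
import transformers, peft, bitsandbytes  # noqa: F401  (import check only)

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("transformers:", transformers.__version__, "| peft:", peft.__version__)
```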

Launch

T5_3B.py

```
deepspeed --include localhost:0,1,2,3,4,5,6,7 finaltest_trainer_eval.py
```
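
The ZeRO-2 row in the table above relies on a DeepSpeed config being passed to the trainer. A minimal sketch, assuming the Hugging Face `Trainer` is used and with illustrative values rather than the exact settings of `finaltest_trainer_eval.py`:

```python
from transformers import TrainingArguments

# Illustrative ZeRO-2 configuration; tune the offload and precision settings to your hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "fp16": {"enabled": "auto"},
}

training_args = TrainingArguments(
    output_dir="outputs",            # placeholder path
    per_device_train_batch_size=2,
    deepspeed=ds_config,             # Trainer initializes DeepSpeed from this dict
)
```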

LLAMA_65B.py

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python3 LLAMA_65B.py
```

GPT_Neox.py

```
CUDA_VISIBLE_DEVICES=0 python3 GPT_Neox.py
```

GPT_Neox.py can only be run on a single GPU; the reason is unknown.

Flan_T5.py

```
CUDA_VISIBLE_DEVICES=0,1,2 python3 Flan_T5.py
```

Flan_T5.py can only be run with 3 or fewer GPUs; the reason is unknown.

Input Examples

```
What is the average enrollment of schools?
db_id:school_player
Table player, columns = [*,Player_ID,Player,Team,Age,Position,School_ID]
Table school, columns = [*,School_ID,School,Location,Enrollment,Founded,Denomination,Boys_or_Girls,Day_or_Boarding,Year_Entered_Competition,School_Colors]
Table school_details, columns = [*,School_ID,Nickname,Colors,League,Class,Division]
Table school_performance, columns = [*,School_Id,School_Year,Class_A,Class_AA]

foreign key:[school_details.School_ID = school.School_ID,school_performance.School_Id = school.School_ID,player.School_ID = school.School_ID]
primary key:[school.School_ID,school_details.School_ID,school_performance.School_Id,player.Player_ID]
```

Expected Output:

```
SELECT avg(Enrollment) FROM school
```
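
For reference, a minimal sketch of how such an input could be fed to a fine-tuned checkpoint at inference time; the base model name and the adapter path are placeholders, and the prompt is abbreviated:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_name = "huggyllama/llama-65b"        # placeholder base model
adapter_dir = "outputs/llama-65b-qlora"   # placeholder path to the trained LoRA adapters

# Same 4-bit loading as during training, then attach the trained adapters.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(
    base_name, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_dir)

# The prompt mirrors the text-to-SQL input format shown above
# (question, db_id, table/column definitions, foreign keys, primary keys).
prompt = (
    "What is the average enrollment of schools?\n"
    "db_id:school_player\n"
    "Table school, columns = [*,School_ID,School,Location,Enrollment,...]\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```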

Notice:

  1. Since the parameter size of each model varies, it is better to use a larger batch size when training smaller models and a smaller batch size when training larger models.
  2. So far, the models I have tried can be trained smoothly, but their performance is terrible; I do not know whether the problem comes from the training setup or from the QLoRA method itself.