Finetuning with TRL

Structure

slurm_scripts: scripts for setting up your Python virtual environment and launching Slurm jobs.

training: scripts for training and utilities.

configs: YAML files for the Accelerate configurations and the training arguments.

Getting started

Go to slurm_scripts and modify the scripts to match your own paths, then set up the virtual environment:

```bash
sbatch slurm_scripts/setup_venv
```

Dataset

Use your own dataset and convert it to the ChatML conversational format; you can take inspiration from data.

Format

{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

Or use one of the ready-made datasets in /scratch/project_462000558/TurkuNLP_workshop/data.

Training

Go to configs if you want to change the training arguments; they work the same way as Hugging Face TrainingArguments (see the sketch at the end of this section).

Then modify the launch script in slurm_scripts and submit it:

```bash
sbatch slurm_scripts/sft.sh
```

Full-weight training of a 34B model requires a minimum of 2 nodes, and at least 3 nodes are recommended. Also note that as you increase the number of nodes, training becomes more unstable and prone to NCCL crashes/hangs.
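
Under the hood, the training script boils down to a TRL SFTTrainer run, with SFTConfig subclassing the familiar TrainingArguments. A minimal single-GPU sketch, assuming a recent TRL version and placeholder model/data paths (not this repo's exact script):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical data path; ChatML-style "messages" records are
# templated automatically by SFTTrainer.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# SFTConfig accepts the usual TrainingArguments fields, mirroring
# what the YAML files in configs/ control.
config = SFTConfig(
    output_dir="sft-out",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model id
    args=config,
    train_dataset=dataset,
)
trainer.train()
```

Multi-node runs then wrap a script like this with accelerate launch and the Accelerate configurations from configs.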

Useful links

This work was heavily inspired by the Hugging Face alignment-handbook.

My own fork of the alignment-handbook (work in progress) is https://github.com/Vmjkom/alignment-handbook. The Alignment Handbook implements reinforcement-learning techniques in addition to SFT, and it has more sophisticated data handling.

Documentation

TRL: https://huggingface.co/docs/trl