LoRA finetuning tutorial #671

Merged
merged 11 commits into from
Sep 18, 2024
11 changes: 9 additions & 2 deletions .github/workflows/doc-build.yml
@@ -51,7 +51,14 @@ jobs:
- name: Make documentation
shell: bash
run: |
doc-builder build optimum.neuron docs/source/ --repo_name optimum-neuron --build_dir neuron-doc-build/ --version ${{ env.VERSION }} --version_tag_suffix "" --html --clean
doc-builder build optimum.neuron docs/source/ \
--repo_name optimum-neuron \
--build_dir neuron-doc-build/ \
--version ${{ env.VERSION }} \
--version_tag_suffix "" \
--html \
--clean \
--notebook_dir docs/notebooks/
cd neuron-doc-build/
mv optimum.neuron optimum-neuron
doc-builder push optimum-neuron --doc_build_repo_id "hf-doc-build/doc-build" --token "${{ secrets.HF_DOC_BUILD_PUSH }}" --commit_msg "Updated with commit $COMMIT_SHA See: https://github.com/huggingface/optimum-neuron/commit/$COMMIT_SHA" --n_retries 5
doc-builder push optimum-neuron --doc_build_repo_id "hf-doc-build/doc-build" --token "${{ secrets.HF_DOC_BUILD_PUSH }}" --commit_msg "Updated with commit $COMMIT_SHA See: https://github.com/huggingface/optimum-neuron/commit/$COMMIT_SHA" --n_retries 5
9 changes: 8 additions & 1 deletion .github/workflows/doc-pr-build.yml
@@ -36,7 +36,14 @@ jobs:
- name: Make documentation
shell: bash
run: |
doc-builder build optimum.neuron docs/source/ --repo_name optimum-neuron --build_dir neuron-doc-build/ --version pr_${{ env.PR_NUMBER }} --version_tag_suffix "" --html --clean
doc-builder build optimum.neuron docs/source/ \
--repo_name optimum-neuron \
--build_dir neuron-doc-build/ \
--version pr_${{ env.PR_NUMBER }} \
--version_tag_suffix "" \
--html \
--clean \
--notebook_dir docs/notebooks/

- name: Save commit_sha & pr_number
run: |
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -14,6 +14,8 @@
title: Fine-tune BERT for Text Classification on AWS Trainium
- local: training_tutorials/finetune_llm
title: Fine-tune Llama 3 8B on AWS Trainium
- local: training_tutorials/sft_lora_finetune_llm
title: Fine-tune Llama 3 8B on with LoRA and the SFTTrainer
Member

Suggested change
title: Fine-tune Llama 3 8B on with LoRA and the SFTTrainer
title: Fine-tune Llama 3.1 8B with LoRA and the SFTTrainer

Member

Should we do 3.1?

Member Author

I think we could not do for the rope thing? Or for the transformers version.
Let's keep it like that. In any case we will move to 70B asap, and I can try to do 3.1 then.

title: Training Tutorials
- sections:
- local: inference_tutorials/notebooks
33 changes: 13 additions & 20 deletions docs/source/training_tutorials/finetune_llm.mdx
@@ -45,15 +45,15 @@ And many others!

Before starting this tutorial, you will need to set up your environment:

1. Create an AWS Trainium instance. You can follow this [guide](https://huggingface.co/docs/optimum-neuron/guides/setup_aws_instance) to create one.
1. Create an AWS Trainium instance. **You will need a `trn1.32xlarge`, which contains 16 Neuron Devices.** You can follow this [guide](https://huggingface.co/docs/optimum-neuron/guides/setup_aws_instance) to create one.
2. Make sure you are logged in on the Hugging Face Hub:
```bash
huggingface-cli login --token YOUR_TOKEN
```
3. Check that you have access to the model. Some open source models are gated, meaning that users need to apply to the model owner to be able to use the model weights. Here we will be training Llama-3 8B, for which there are two possibilities:
* The official gated repo: [`meta-llama/Meta-Llama-3-8B`](https://huggingface.co/meta-llama/Meta-Llama-3-8B)
* The non-official un-gated repo: [`NousResearch/Meta-Llama-3-8B`](https://huggingface.co/NousResearch/Meta-Llama-3-8B)
4. Clone the Optimum Neuron repository, **which contains the [complete script](https://github.com/huggingface/optimum-neuron/docs/source/training_tutorials/finetune_llm.py) described in this tutorial:**
4. Clone the Optimum Neuron repository, **which contains the [complete script](https://github.com/huggingface/optimum-neuron/blob/main/docs/source/training_tutorials/finetune_llm.py) described in this tutorial:**
```bash
git clone https://github.com/huggingface/optimum-neuron.git
```
@@ -68,7 +68,10 @@ Example:
{
"instruction": "What is world of warcraft",
"context": "",
"response": "World of warcraft is a massive online multi player role playing game. It was released in 2004 by bizarre entertainment"
"response": (
"World of warcraft is a massive online multi player role playing game. "
"It was released in 2004 by blizarre entertainment"
)
}
```

@@ -98,7 +101,7 @@ def format_dolly(sample):
return prompt
```

In addition to formatting our samples, we also want to pack multiple samples to one sequence to have a more efficient training. In other words, we are stacking multiple samples to one sequence and split them with an EOS Token. Packing/stacking samples can be done during training or before. Here, we will do it before training to save time.
In addition to formatting our samples, we also want to pack multiple samples into one sequence for more efficient training. In other words, we stack multiple samples into one sequence and separate them with an EOS token. Packing/stacking samples can be done during training or before.
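
To make the idea of packing concrete, here is a minimal, standalone sketch on plain lists of token ids. It is not the tutorial's `pack_dataset`: the name `pack_token_lists`, the toy token ids, and the choice to drop the incomplete tail are assumptions made only for illustration.

```python
from itertools import chain


def pack_token_lists(samples, eos_token_id, chunk_length):
    """Toy packing: append EOS to each sample, concatenate, split into fixed-size chunks."""
    # Append the EOS id to every sample so sample boundaries survive concatenation.
    stream = list(chain.from_iterable(ids + [eos_token_id] for ids in samples))
    # Drop the tail that does not fill a whole chunk (a simplification of this sketch).
    usable = (len(stream) // chunk_length) * chunk_length
    return [stream[i : i + chunk_length] for i in range(0, usable, chunk_length)]


# Hypothetical token ids; 0 plays the role of the EOS token.
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
packed = pack_token_lists(samples, eos_token_id=0, chunk_length=4)
print(packed)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Every packed sequence has exactly `chunk_length` tokens, so no padding is needed and a sample may span two consecutive chunks.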

The following function `pack_dataset` takes a `dataset` and a `chunk_length` and returns a packed dataset:

@@ -181,16 +184,6 @@ dataset = dataset.map(
lm_dataset = pack_dataset(dataset, chunk_length=2048) # We use 2048 as the maximum length for packing
```

After we processed the datasets we are going save it to disk. You could also save it to S3 or the Hugging Face Hub for later use.

_Note: Packing and preprocessing your dataset can be run outside of the Trainium instance._

```python
# save train_dataset to disk
dataset_path = "tokenized_dolly"
lm_dataset.save_to_disk(dataset_path)
```

## 3. Fine-tune Llama on AWS Trainium using the `NeuronTrainer`

Normally you would use the **[Trainer](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer)** and **[TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments)** classes to fine-tune PyTorch-based transformer models.
@@ -244,16 +237,18 @@ The key points here are:

## 4. Launch Training

We prepared a script called [finetune_llm.py](https://github.com/huggingface/optimum-neuron/docs/source/training_tutorials/finetune_llm.py) summing up everything mentioned in this tutorial.
We prepared a script called [finetune_llm.py](https://github.com/huggingface/optimum-neuron/blob/main/docs/source/training_tutorials/finetune_llm.py) summing up everything mentioned in this tutorial.

<Tip>

This script is a minimalistic version of our official example training script to run causal language modeling fine-tuning, called [run_clm.py](https://github.com/huggingface/optimum-neuron/blob/main/examples/language-modeling/run_clm.py). For the sake of this tutorial, we tried to get rid of anything that is not necessary, but if you want to do more custom things, maybe the solution is already implemented in `run_clm.py`!
This script is a minimalistic version of our official example training script to run causal language modeling fine-tuning, called [run_clm.py](https://github.com/huggingface/optimum-neuron/blob/main/examples/language-modeling/run_clm.py). For the sake of this tutorial, we tried to get rid of anything that is not necessary, and added the formatting step necessary for fine-tuning, but if you want to do more custom things, maybe the solution is already implemented in `run_clm.py`!

Also, these scripts are more designed as templates than final scripts. Feel free to take `finetune_llm.py` or `run_clm.py` and adapt them to your own needs!

</Tip>

PyTorch Neuron uses `torch_xla`, which evaluates operations lazily while the training loop runs: it builds a symbolic graph in the background, and the graph is executed on the hardware only when a tensor is printed, transferred to CPU, or `xm.mark_step()` is called. During execution, multiple graphs can be built depending on control flow, and compiling each graph sequentially can take time. To alleviate that, the Neuron SDK provides `neuron_parallel_compile`, a tool that performs a fast trial run to build all the graphs and compile them in parallel. This step is usually called precompilation.
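
As a rough illustration of lazy evaluation, the toy class below records operations and runs them only when a value is requested. It is pure Python and does not use `torch_xla`; `LazyTensor`, `materialize`, and `is_materialized` are hypothetical names invented for this sketch, and no compilation step is modeled.

```python
class LazyTensor:
    """Toy stand-in for a lazy tensor: records operations instead of running them."""

    def __init__(self, value=None, op=None, inputs=()):
        self._value = value    # concrete value, once known
        self._op = op          # pending operation (a callable), if any
        self._inputs = inputs  # LazyTensor operands of the pending operation

    def __add__(self, other):
        return LazyTensor(op=lambda a, b: a + b, inputs=(self, other))

    def __mul__(self, other):
        return LazyTensor(op=lambda a, b: a * b, inputs=(self, other))

    @property
    def is_materialized(self):
        return self._value is not None

    def materialize(self):
        """Execute the recorded graph, loosely analogous to printing a tensor."""
        if self._value is None:
            args = [t.materialize() for t in self._inputs]
            self._value = self._op(*args)
        return self._value


x = LazyTensor(2.0)
y = LazyTensor(3.0)
z = x * y + x               # only a graph is recorded here, nothing is computed
print(z.is_materialized)    # False
print(z.materialize())      # 8.0
```

In the real stack, each distinct recorded graph has to be compiled before it can run, which is why building and compiling the graphs ahead of time saves time during training.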

### Precompilation

When training models on AWS Trainium we first need to compile our model with our training arguments.
@@ -266,8 +261,7 @@ The compilation command simply consists in calling your script as an input to th

```bash
MALLOC_ARENA_MAX=64 XLA_USE_BF16=1 neuron_parallel_compile torchrun --nproc_per_node=32 finetune_llm.py \
--model_id {model_id} \
--dataset_path {dataset_path} \
--model_id meta-llama/Meta-Llama-3-8B \
--bf16 True \
--learning_rate 5e-5 \
--output_dir dolly_llama \
@@ -305,8 +299,7 @@ Launch the training, with the following command.

```bash
MALLOC_ARENA_MAX=64 XLA_USE_BF16=1 torchrun --nproc_per_node=32 finetune_llm.py \
--model_id {model_id} \
--dataset_path {dataset_path} \
--model_id meta-llama/Meta-Llama-3-8B \
--bf16 True \
--learning_rate 5e-5 \
--output_dir dolly_llama \
30 changes: 8 additions & 22 deletions docs/source/training_tutorials/finetune_llm.py
@@ -1,9 +1,8 @@
from dataclasses import dataclass, field
from functools import partial
from itertools import chain
from typing import Optional

from datasets import load_dataset, load_from_disk
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
@@ -17,10 +16,6 @@
from optimum.neuron.distributed import lazy_load_for_parallelism


# Load dataset from the hub
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")


def format_dolly(sample):
instruction = f"### Instruction\n{sample['instruction']}"
context = f"### Context\n{sample['context']}" if len(sample["context"]) > 0 else None
@@ -70,9 +65,7 @@ def chunk(sample, chunk_length=chunk_length):
return lm_dataset


def create_and_save_dataset(model_id: str, dataset_path: str):
tokenizer = AutoTokenizer.from_pretrained(model_id)

def prepare_dataset(tokenizer, dataset):
# template dataset to add prompt to each sample
def template_dataset(sample):
sample["text"] = f"{format_dolly(sample)}{tokenizer.eos_token}"
@@ -89,15 +82,16 @@ def template_dataset(sample):
# chunk dataset
lm_dataset = pack_dataset(dataset, chunk_length=2048) # We use 2048 as the maximum length for packing

# save train_dataset to disk
lm_dataset.save_to_disk(dataset_path)
return lm_dataset


def training_function(script_args, training_args):
# load dataset
dataset = load_from_disk(script_args.dataset_path)

tokenizer = AutoTokenizer.from_pretrained(script_args.model_id)

# Load dataset from the hub and prepare it for training.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
dataset = prepare_dataset(tokenizer, dataset)

with lazy_load_for_parallelism(tensor_parallel_size=training_args.tensor_parallel_size):
model = AutoModelForCausalLM.from_pretrained(script_args.model_id)

@@ -122,20 +116,12 @@ class ScriptArguments:
default="meta-llama/Meta-Llama-3-8B",
metadata={"help": "The model that you want to train from the Hugging Face hub."},
)
dataset_path: Optional[str] = field(
metadata={"help": "Path to the preprocessed and tokenized dataset."},
default=None,
)


def main():
parser = HfArgumentParser([ScriptArguments, TrainingArguments])
script_args, training_args = parser.parse_args_into_dataclasses()

if script_args.dataset_path is None:
create_and_save_dataset(script_args.model_id, "tokenized_dolly")
script_args.dataset_path = "tokenized_dolly"

# set seed
set_seed(training_args.seed)
