Added other dependencies and clarification about HF models #11

Open · wants to merge 13 commits into base: main
18 changes: 13 additions & 5 deletions README.md
@@ -21,7 +21,7 @@ In addition, we release the Guanaco model family for base LLaMA model sizes of 7
## Demo
Guanaco is a system purely intended for research purposes and could produce problematic outputs.

1. Access the [live demo here](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi).
1. Access the [live demo here](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi). Note that this is the 33B model; the 65B model demo will come later.

2. Or host your own Guanaco gradio demo directly in Colab with [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing). Works with free GPUs for 7B and 13B models.

@@ -41,7 +41,7 @@ pip install -q -U git+https://github.com/huggingface/accelerate.git

## Getting Started
The `qlora.py` code is a starting point for finetuning and inference on various datasets.
Basic command for finetuning a baseline model on the Alpaca dataset:
Basic command for finetuning a baseline (Hugging Face-formatted) LLaMA model on the Alpaca dataset:
```bash
python qlora.py --model_name_or_path <path_or_name>
```
@@ -78,10 +78,14 @@ Quantization parameters are controlled from the `BitsandbytesConfig` ([see HF do
You can access the paged optimizer with the argument `--optim paged_adamw_32bit`
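As a minimal sketch (not taken from this repository's code, and assuming recent `transformers`/`bitsandbytes` releases): the 4-bit settings are expressed through `BitsAndBytesConfig`, and `--optim paged_adamw_32bit` corresponds to the `optim` field of `TrainingArguments`.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization with double quantization and bf16 compute,
# roughly matching the QLoRA defaults described in the paper.
# This config would be passed to AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The paged optimizer is an optimizer choice on the Trainer side;
# this is what the `--optim paged_adamw_32bit` flag maps to.
training_args = TrainingArguments(
    output_dir="./output",  # hypothetical output directory
    optim="paged_adamw_32bit",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
)
```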

## Tutorials and Demonstrations
Examples are found under the `examples/` folder.
Here is [a blog](https://huggingface.co/blog/4bit-transformers-bitsandbytes) discussing 4-bit quantization, QLoRA, and how they are integrated in transformers.

### Colab Gradio Demo
You can host your own gradio Guanaco demo directly in Colab following [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing).
In addition, here are Colab notebooks with examples for inference and finetuning using QLoRA:
- [Inference notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing)
- [Finetuning notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing)

Other examples are found under the `examples/` folder.
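For readers who want a starting point outside Colab, here is a hedged sketch of inference with a 4-bit base model plus a Guanaco LoRA adapter; the model and adapter ids below are illustrative assumptions, not pinned by this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "huggyllama/llama-7b"        # assumed HF-formatted base LLaMA checkpoint
adapter_id = "timdettmers/guanaco-7b"  # assumed Guanaco LoRA adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

prompt = "### Human: Explain QLoRA in one paragraph.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```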

## Sample Outputs
We provide generations for the models described in the paper for both OA and Vicuna queries in the `eval/generations` folder. These are intended to foster further research on model evaluation and analysis.
@@ -96,6 +100,9 @@ To facilitate the replication of our evaluation and future work in this area, we

More details can be found at `eval/EVAL_README.md`.

## Dataset for Guanaco
You can find the dataset used to train Guanaco models on HF at [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco).
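As a small, hedged example of loading it with the `datasets` library (the dataset id comes from the link above; the `text` field name is an assumption):

```python
from datasets import load_dataset

# Pull the OpenAssistant-derived Guanaco training data from the Hugging Face Hub.
dataset = load_dataset("timdettmers/openassistant-guanaco")

print(dataset)                      # available splits and their sizes
print(dataset["train"][0]["text"])  # one raw multi-turn conversation (field name assumed)
```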

## Known Issues and Limitations
Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

@@ -118,7 +125,8 @@ Here a list of known issues and bugs. If your issue is not reported here, please
}
```

## Acknoledgements
## Acknowledgements
We thank the Huggingface team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.
We also thank Meta for releasing the LLaMA models, without which this work would not have been possible.

This repo builds on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [LMSYS FastChat](https://github.com/lm-sys/FastChat) repos.
28 changes: 15 additions & 13 deletions qlora.py
@@ -23,11 +23,12 @@
AutoModelForCausalLM,
set_seed,
Seq2SeqTrainer,
BitsAndBytesConfig
BitsAndBytesConfig,
LlamaTokenizerFast

)
from datasets import load_dataset
import evaluate
import nltk

from peft import (
prepare_model_for_int8_training,
@@ -602,24 +603,25 @@ def train():
padding_side="right",
use_fast=True,
)
if tokenizer.pad_token is None:
if tokenizer._pad_token is None:
smart_tokenizer_and_embedding_resize(
special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
tokenizer=tokenizer,
model=model,
)
if any(key in args.model_name_or_path for key in ['llama', '7B', '13B', '30B', '65B']):
# LLaMA tokenizer does not have special tokens set.
# Add them to prevent them from being parsed into different tokens.
if isinstance(tokenizer, LlamaTokenizerFast):
# LLaMA tokenizer may not have correct special tokens set.
# Check and add them if missing to prevent them from being parsed into different tokens.
# Note that these are present in the vocabulary.
# Note also that `model.config.pad_token_id` is 0 which corresponds to `<unk>` token.
tokenizer.add_special_tokens(
{
"eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
"bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
"unk_token": tokenizer.convert_ids_to_tokens(model.config.pad_token_id),
}
)
if tokenizer.eos_token_id != model.config.eos_token_id or tokenizer.pad_token_id != model.config.pad_token_id or tokenizer.unk_token_id != model.config.unk_token_id:
tokenizer.add_special_tokens(
{
"eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
"bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
"unk_token": tokenizer.convert_ids_to_tokens(model.config.pad_token_id),
}
)

data_module = make_data_module(tokenizer=tokenizer, args=args)
trainer = Seq2SeqTrainer(
3 changes: 3 additions & 0 deletions requirements.txt
@@ -4,3 +4,6 @@ rouge-score==0.1.2
scikit-learn==1.2.2
sentencepiece==0.1.99
wandb==0.15.2
datasets
evaluate
scipy
2 changes: 1 addition & 1 deletion scripts/finetune.sh
@@ -8,7 +8,7 @@ python qlora.py \
--source_max_len 384 \
--target_max_len 128 \
--per_device_train_batch_size 4 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--logging_steps 10 \
--max_steps 10000 \