Added other dependencies and clarification about HF models #11

Open · wants to merge 13 commits into base: main
18 changes: 13 additions & 5 deletions README.md
@@ -21,7 +21,7 @@ In addition, we release the Guanaco model family for base LLaMA model sizes of 7
## Demo
Guanaco is a system purely intended for research purposes and could produce problematic outputs.

1. Access the [live demo here](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi).
1. Access the [live demo here](https://huggingface.co/spaces/uwnlp/guanaco-playground-tgi). Note that this is the 33B model; the 65B model demo will come later.

2. Or host your own Guanaco gradio demo directly in Colab with [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing). Works with free GPUs for 7B and 13B models.

@@ -41,7 +41,7 @@ pip install -q -U git+https://github.com/huggingface/accelerate.git

## Getting Started
The `qlora.py` code is a starting point for finetuning and inference on various datasets.
Basic command for finetuning a baseline model on the Alpaca dataset:
Basic command for finetuning a baseline (Hugging Face-formatted) LLaMA model on the Alpaca dataset:
```bash
python qlora.py --model_name_or_path <path_or_name>
```
@@ -78,10 +78,14 @@ Quantization parameters are controlled from the `BitsandbytesConfig` ([see HF do
You can access the paged optimizer with the argument `--optim paged_adamw_32bit`
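As a minimal sketch (not taken from this repository's code, and assuming recent `transformers`/`bitsandbytes` releases): the 4-bit settings are expressed through `BitsAndBytesConfig`, and `--optim paged_adamw_32bit` corresponds to the `optim` field of `TrainingArguments`.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# 4-bit NF4 quantization with double quantization and bf16 compute,
# roughly matching the QLoRA defaults described in the paper.
# This config would be passed to AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# The paged optimizer is an optimizer choice on the Trainer side;
# this is what the `--optim paged_adamw_32bit` flag maps to.
training_args = TrainingArguments(
    output_dir="./output",  # hypothetical output directory
    optim="paged_adamw_32bit",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
)
```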

## Tutorials and Demonstrations
Examples are found under the `examples/` folder.
Here is [a blog](https://huggingface.co/blog/4bit-transformers-bitsandbytes) discussing 4-bit quantization, QLoRA, and how they are integrated in transformers.

### Colab Gradio Demo
You can host your own gradio Guanaco demo directly in Colab following [this notebook](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing).
In addition, here are Colab notebooks with examples for inference and finetuning using QLoRA:
- [Inference notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing)
- [Finetuning notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing)

Other examples are found under the `examples/` folder.
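For readers who want a starting point outside Colab, here is a hedged sketch of inference with a 4-bit base model plus a Guanaco LoRA adapter; the model and adapter ids below are illustrative assumptions, not pinned by this repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "huggyllama/llama-7b"        # assumed HF-formatted base LLaMA checkpoint
adapter_id = "timdettmers/guanaco-7b"  # assumed Guanaco LoRA adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the LoRA adapter

prompt = "### Human: Explain QLoRA in one paragraph.### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```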

## Sample Outputs
We provide generations for the models described in the paper for both OA and Vicuna queries in the `eval/generations` folder. These are intended to foster further research on model evaluation and analysis.
@@ -96,6 +100,9 @@ To facilitate the replication of our evaluation and future work in this area, we

More details can be found at `eval/EVAL_README.md`.

## Dataset for Guanaco
You can find the dataset used to train Guanaco models on HF at [timdettmers/openassistant-guanaco](https://huggingface.co/datasets/timdettmers/openassistant-guanaco).
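As a small, hedged example of loading it with the `datasets` library (the dataset id comes from the link above; the `text` field name is an assumption):

```python
from datasets import load_dataset

# Pull the OpenAssistant-derived Guanaco training data from the Hugging Face Hub.
dataset = load_dataset("timdettmers/openassistant-guanaco")

print(dataset)                      # available splits and their sizes
print(dataset["train"][0]["text"])  # one raw multi-turn conversation (field name assumed)
```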

## Known Issues and Limitations
Here is a list of known issues and bugs. If your issue is not reported here, please open a new issue and describe the problem.

@@ -118,7 +125,8 @@ Here a list of known issues and bugs. If your issue is not reported here, please
}
```

## Acknoledgements
## Acknowledgements
We thank the Huggingface team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.
We also thank Meta for releasing the LLaMA models, without which this work would not have been possible.

This repo builds on the [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [LMSYS FastChat](https://github.com/lm-sys/FastChat) repos.
28 changes: 15 additions & 13 deletions qlora.py
@@ -23,11 +23,12 @@
AutoModelForCausalLM,
set_seed,
Seq2SeqTrainer,
BitsAndBytesConfig
BitsAndBytesConfig,
LlamaTokenizerFast

)
from datasets import load_dataset
import evaluate
import nltk

from peft import (
prepare_model_for_int8_training,
@@ -602,24 +603,25 @@ def train():
padding_side="right",
use_fast=True,
)
if tokenizer.pad_token is None:
if tokenizer._pad_token is None:
smart_tokenizer_and_embedding_resize(
special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
tokenizer=tokenizer,
model=model,
)
if any(key in args.model_name_or_path for key in ['llama', '7B', '13B', '30B', '65B']):
# LLaMA tokenizer does not have special tokens set.
# Add them to prevent them from being parsed into different tokens.
if isinstance(tokenizer, LlamaTokenizerFast):
# LLaMA tokenizer may not have correct special tokens set.
# Check and add them if missing to prevent them from being parsed into different tokens.
# Note that these are present in the vocabulary.
# Note also that `model.config.pad_token_id` is 0 which corresponds to `<unk>` token.
tokenizer.add_special_tokens(
{
"eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
"bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
"unk_token": tokenizer.convert_ids_to_tokens(model.config.pad_token_id),
}
)
if tokenizer.eos_token_id != model.config.eos_token_id or tokenizer.pad_token_id != model.config.pad_token_id or tokenizer.unk_token_id != model.config.unk_token_id:
tokenizer.add_special_tokens(
{
"eos_token": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),
"bos_token": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),
"unk_token": tokenizer.convert_ids_to_tokens(model.config.pad_token_id),
}
)

data_module = make_data_module(tokenizer=tokenizer, args=args)
trainer = Seq2SeqTrainer(
3 changes: 3 additions & 0 deletions requirements.txt
@@ -4,3 +4,6 @@ rouge-score==0.1.2
scikit-learn==1.2.2
sentencepiece==0.1.99
wandb==0.15.2
datasets
evaluate
scipy
2 changes: 1 addition & 1 deletion scripts/finetune.sh
@@ -8,7 +8,7 @@ python qlora.py \
--source_max_len 384 \
--target_max_len 128 \
--per_device_train_batch_size 4 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 4 \
--logging_steps 10 \
--max_steps 10000 \