QLoRA Inference #1020

Open
jeff52415 opened this issue May 25, 2024 · 1 comment
@jeff52415
Can I load QLoRA fine-tuning weights into a Hugging Face model as shown below?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization with double quantization, matching the QLoRA setup
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4'
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=model_id,
    quantization_config=quantization_config,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto'
)

# Attach the QLoRA adapter weights saved during fine-tuning
model = PeftModel.from_pretrained(model, "qlora_finetune_folder/")

I have changed the checkpointer to FullModelHFCheckpointer.
The resulting checkpoint is loadable and runnable (see the generation sketch below), but I am curious whether it reflects the same structure as qlora_llama3_8b. Thanks.
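For reference, a minimal generation sketch for the setup above (illustrative only; it skips the Llama 3 chat template and reuses the model and tokenizer from the snippet):

import torch

inputs = tokenizer("What is QLoRA?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))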

@ebsmothers
Contributor

Hi @jeff52415, thanks for opening this issue; this is a really good question. One possible source of discrepancy is the different implementations of NF4 quantization used by torchtune and Hugging Face: torchtune's QLoRA relies on the NF4Tensor class from torchao, whereas Hugging Face uses the bitsandbytes implementation. I need to verify that quantizing a torchtune checkpoint with bitsandbytes yields the same result as quantizing with torchao. Let me look into it and get back to you. Also cc @rohan-varma, who may have some insights here.
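
For anyone who wants to probe this locally, here is a rough, unverified sketch of the comparison: round-trip the same weight through torchao's NF4Tensor and through bitsandbytes, then diff the results. It assumes to_nf4 / get_original_weight in torchao and quantize_nf4 / dequantize_nf4 in bitsandbytes behave as described, that CUDA is available (bitsandbytes' 4-bit kernels run on GPU), and that the block sizes shown are comparable; treat it as a starting point, not a definitive check.

import torch
import bitsandbytes.functional as bnbf
from torchao.dtypes.nf4tensor import to_nf4

# A single weight tensor standing in for one linear layer of the model
weight = torch.randn(4096, 4096, dtype=torch.bfloat16, device="cuda")

# torchao / torchtune path: quantize to NF4, then dequantize back
ao_nf4 = to_nf4(weight, block_size=64, scaler_block_size=256)
ao_dequant = ao_nf4.get_original_weight()

# bitsandbytes / Hugging Face path: quantize to NF4, then dequantize back
bnb_quant, bnb_state = bnbf.quantize_nf4(weight, blocksize=64)
bnb_dequant = bnbf.dequantize_nf4(bnb_quant, quant_state=bnb_state)

# If the two implementations agree, the round-tripped weights should be (near) identical
print((ao_dequant.float() - bnb_dequant.float()).abs().max())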

@joecummings added the triage review label (This issue should be discussed in weekly review), then removed the triage review and high-priority labels on Dec 13, 2024.