
Does this fine-tuning code work on a single A6000 GPU for LLaMA-2-7B with LoRA? #359

Open
01choco opened this issue Nov 4, 2024 · 0 comments

01choco commented Nov 4, 2024

Hi, I am trying to use your RLHF code for fine-tuning and reinforcement learning with LLaMA, but I keep getting a CUDA out-of-memory error while fine-tuning the LLaMA-2-7B model on a single A6000 GPU, even though I use the PEFT LoRA method.

I applied these changes to get rid of the CUDA OOM, but the error still occurs:

  1. a smaller batch size (1)
  2. a smaller max token and sequence length
  3. PEFT (LoRA)

Can I run LLaMA-2-7B fine-tuning on a single A6000 GPU? Has anyone succeeded in fine-tuning LLaMA on a single GPU? I just want to know whether I'm doing something wrong or whether it's fundamentally impossible to fine-tune this model on a single A6000 GPU.
And does anyone know how to get rid of the CUDA error in this situation?
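
For context: in fp16 the 7B base weights alone occupy roughly 14 GB, and LoRA keeps them frozen, so most of the remaining pressure on the A6000's 48 GB comes from activations at max_sequence_length 1024 plus any extra model copies the RLHF pipeline keeps around. For reference, a minimal 4-bit (QLoRA-style) sketch with Hugging Face transformers, peft, and bitsandbytes (not this repo's loaders; the checkpoint path and target_modules are assumptions on my part) would look like this:

```python
# Minimal QLoRA-style sketch; assumes transformers, peft, and bitsandbytes
# are installed. This is NOT this repo's training loop.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization keeps the frozen 7B base weights at roughly 4 GB
# instead of ~14 GB in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumed HF checkpoint path
    quantization_config=bnb_config,
    device_map="auto",
)

# Re-enables input grads and casts norm layers for stable k-bit training.
model = prepare_model_for_kbit_training(model)
model.gradient_checkpointing_enable()  # trade compute for activation memory

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],  # assumption: typical LLaMA targets
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report well under 1% trainable
```
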
Here is my config.yaml file!

  model: "llama-7B"
  model_folder: "./llama/llama-2-7b"
  tokenizer_path: "./llama/tokenizer.model"
  train_dataset_path: "./datasets/actor_training_data.json"
  validation_dataset_path: null
  # freeze model embeddings during training
  froze_embeddings: True
  # use fairscale layers to build the model instead of vanilla pytorch
  # only for llama
  use_fairscale: True
  # max sequence length for the actor (i.e. prompt + completion);
  # it depends on the model used.
  max_sequence_length: 1024
  # max tokens generated by the actor (completion only)
  max_tokens: 1024
  # minimum number of tokens generated by the actor
  min_tokens: 100
  # additional prompt tokens to be used for the template or as a safety margin
  additonal_prompt_tokens: 20
  # temperature for the actor
  temperature: 0.1
  batch_size: 2
  # number of iterations between prints
  iteration_per_print: 1
  lr: 0.000009
  epochs: 1
  # number of backpropagation steps between checkpoint saves
  checkpoint_steps: 5000
  # number of checkpoints to keep while removing older ones
  # (keeps memory consumption of checkpoints reasonable)
  n_checkpoints_to_keep: 5
  # here specify the name of the actor checkpoint from which to resume
  # actor training; if null, load the last one.
  checkpoint_name: null
  # deepspeed settings
  deepspeed_enable: True
  deepspeed_config_path: "./artifacts/config/ds_config.json"
  # accelerate settings
  accelerate_enable: False
  # use_peft - the parameters of PEFT can be modified in the peft_config.yaml
  peft_enable: True
  peft_config_path: "./artifacts/config/peft_config.yaml"
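
Since deepspeed_enable is True, the contents of ./artifacts/config/ds_config.json also matter a lot for memory. For illustration, a ZeRO stage-2 config with CPU optimizer offload (all values below are assumptions, not necessarily the repo's shipped config) looks like this:

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "contiguous_gradients": true,
    "overlap_comm": true
  }
}
```
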

And here is my peft_config.yaml file:

  inference_mode: False
  r: 8
  lora_alpha: 32
  lora_dropout: 0.1
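
These LoRA settings are tiny on their own. A back-of-the-envelope count (assuming adapters on the attention q/v projections only, hidden size 4096, 32 layers) suggests the adapters themselves are not what exhausts memory:

```python
# Rough LoRA parameter count for LLaMA-2-7B with r=8.
# Assumptions: adapters on q_proj and v_proj only, hidden size 4096, 32 layers.
hidden, layers, r, modules = 4096, 32, 8, 2
lora_params = layers * modules * (hidden * r + r * hidden)  # A and B matrices
print(f"{lora_params / 1e6:.1f}M trainable params")  # ~4.2M, <0.1% of 7B
```
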

Thank you for reading!
