
Qwen2.5 #1863

Merged
merged 25 commits on Oct 31, 2024

Commits (25)
5b2acbb
additional special tokens
calvinpelletier Oct 15, 2024
5c18e35
chat template
calvinpelletier Oct 15, 2024
6c7c0a9
.
calvinpelletier Oct 15, 2024
faa9249
tool response
calvinpelletier Oct 16, 2024
0af29a1
qwen2.5 model builders
calvinpelletier Oct 17, 2024
28cef3a
qwen2.5 lora builders
calvinpelletier Oct 17, 2024
d4a937c
configs
calvinpelletier Oct 18, 2024
d5ba267
docstrings
calvinpelletier Oct 18, 2024
0c9eff7
fix
calvinpelletier Oct 18, 2024
ab8d452
various
calvinpelletier Oct 18, 2024
590a89c
lint
calvinpelletier Oct 18, 2024
9595a46
separating qwen2 and qwen2.5
calvinpelletier Oct 20, 2024
9522452
unit test for qwen2.5 tokenizer
calvinpelletier Oct 21, 2024
d938e3b
Merge remote-tracking branch 'origin/main' into qwen2.5
calvinpelletier Oct 23, 2024
70bbe96
separate model builders for base and instruct models
calvinpelletier Oct 24, 2024
a110bbb
moving chat template logic into tokenizer
calvinpelletier Oct 25, 2024
ac221f3
tool call special tokens
calvinpelletier Oct 25, 2024
a17ba7a
separate qwen2/2.5 tokenizers
calvinpelletier Oct 25, 2024
50232be
configs
calvinpelletier Oct 27, 2024
b13de5a
Merge remote-tracking branch 'origin/main' into qwen2.5
calvinpelletier Oct 27, 2024
ec905e7
various
calvinpelletier Oct 30, 2024
c064d1d
Merge remote-tracking branch 'origin/main' into qwen2.5
calvinpelletier Oct 30, 2024
e813076
addressing comments
calvinpelletier Oct 31, 2024
c35ec53
adding base/instruct explanations in docstrings
calvinpelletier Oct 31, 2024
9271c5c
registry fix
calvinpelletier Oct 31, 2024
75 changes: 75 additions & 0 deletions recipes/configs/qwen2_5/0_5B_full.yaml
@@ -0,0 +1,75 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using a Qwen2.5 0.5B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download Qwen/Qwen2.5-0.5B-Instruct --output-dir /tmp/Qwen2_5-0_5B-Instruct --ignore-patterns None
#
# To launch on 4 devices, run the following command from root:
# tune run --nnodes 1 --nproc_per_node 4 full_finetune_distributed --config qwen2_5/0_5B_full
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nnodes 1 --nproc_per_node 4 full_finetune_distributed --config qwen2_5/0_5B_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# Single device full finetuning requires more memory optimizations. It's
# best to use 0_5B_full_single_device.yaml for those cases.

# Tokenizer
tokenizer:
_component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
path: /tmp/Qwen2_5-0_5B-Instruct/vocab.json
merges_file: /tmp/Qwen2_5-0_5B-Instruct/merges.txt
max_seq_len: null

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.qwen2_5.qwen2_5_0_5b

checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Qwen2_5-0_5B-Instruct
checkpoint_files: [
model.safetensors
]
recipe_checkpoint: null
output_dir: /tmp/Qwen2_5-0_5B-Instruct-finetune
model_type: QWEN2
resume_from_checkpoint: False

# Fine-tuning arguments
batch_size: 2
epochs: 1
optimizer:
_component_: torch.optim.AdamW
fused: True
lr: 2e-5
loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 16

# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/Qwen2_5-0_5B-Instruct-finetune
log_every_n_steps: 1
log_peak_memory_stats: False
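
For readers unfamiliar with how these recipe configs are consumed, here is a minimal sketch (not part of this diff) of how the `_component_` entries above resolve to the new Qwen2.5 builders via `torchtune.config.instantiate`; the field names and paths are copied from the YAML, everything else is illustrative.

```python
# Minimal sketch (assumption, not part of this diff) of how a recipe resolves the
# _component_ entries from 0_5B_full.yaml into Python objects.
from omegaconf import OmegaConf
from torchtune import config

cfg = OmegaConf.create(
    """
    tokenizer:
      _component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
      path: /tmp/Qwen2_5-0_5B-Instruct/vocab.json
      merges_file: /tmp/Qwen2_5-0_5B-Instruct/merges.txt
      max_seq_len: null
    model:
      _component_: torchtune.models.qwen2_5.qwen2_5_0_5b
    """
)

# config.instantiate imports the dotted path in _component_ and calls it with the
# remaining keys as keyword arguments.
tokenizer = config.instantiate(cfg.tokenizer)
model = config.instantiate(cfg.model)
print(type(tokenizer).__name__, type(model).__name__)
```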
81 changes: 81 additions & 0 deletions recipes/configs/qwen2_5/0_5B_full_single_device.yaml
@@ -0,0 +1,81 @@
# Config for single device full finetuning in full_finetune_single_device.py
# using a Qwen2.5 0.5B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download Qwen/Qwen2.5-0.5B-Instruct --output-dir /tmp/Qwen2_5-0_5B-Instruct --ignore-patterns None
#
# The default config uses an optimizer from bitsandbytes. If you do not have it installed,
# you can install it with
# pip install bitsandbytes
#
# To launch on a single device, run the following command from root:
# tune run full_finetune_single_device --config qwen2_5/0_5B_full_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run full_finetune_single_device --config qwen2_5/0_5B_full_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.

# Tokenizer
tokenizer:
_component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
path: /tmp/Qwen2_5-0_5B-Instruct/vocab.json
merges_file: /tmp/Qwen2_5-0_5B-Instruct/merges.txt
max_seq_len: null

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.qwen2_5.qwen2_5_0_5b

checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Qwen2_5-0_5B-Instruct
checkpoint_files: [
model.safetensors
]
recipe_checkpoint: null
output_dir: /tmp/Qwen2_5-0_5B-Instruct-finetune
model_type: QWEN2
resume_from_checkpoint: False

# Fine-tuning arguments
batch_size: 2
epochs: 1
optimizer:
_component_: torch.optim.AdamW
fused: True
lr: 2e-5

loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss
optimizer_in_bwd: False

max_steps_per_epoch: null
gradient_accumulation_steps: 8
compile: False

# Training environment
device: cuda

# Memory management
enable_activation_checkpointing: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/Qwen2_5-0_5B-Instruct-finetune
log_every_n_steps: 1
log_peak_memory_stats: False
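
The header comment above mentions a bitsandbytes optimizer, while the config itself defaults to `torch.optim.AdamW`. For a memory-constrained single device, a hedged sketch of the swap is below; `PagedAdamW8bit` is bitsandbytes' paged 8-bit AdamW class, but whether this recipe intends it as its default is not settled by this diff.

```python
# Hypothetical swap (assumption): replace torch.optim.AdamW with bitsandbytes'
# paged 8-bit AdamW to reduce optimizer-state memory on a single GPU.
import bitsandbytes as bnb
from torchtune.models.qwen2_5 import qwen2_5_0_5b

model = qwen2_5_0_5b().cuda()  # paged/8-bit optimizers expect CUDA parameters
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=2e-5)
```

The equivalent YAML override would simply point `optimizer._component_` at the same class.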
112 changes: 112 additions & 0 deletions recipes/configs/qwen2_5/0_5B_lora.yaml
@@ -0,0 +1,112 @@
# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a Qwen2.5 0.5B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download Qwen/Qwen2.5-0.5B-Instruct --output-dir /tmp/Qwen2_5-0_5B-Instruct --ignore-patterns None
#
# To launch on 2 devices, run the following command from root:
# tune run --nnodes 1 --nproc_per_node 2 lora_finetune_distributed --config qwen2_5/0_5B_lora
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nnodes 1 --nproc_per_node 2 lora_finetune_distributed --config qwen2_5/0_5B_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# For single device LoRA finetuning please use 0_5B_lora_single_device.yaml


# Model Arguments
model:
_component_: torchtune.models.qwen2_5.lora_qwen2_5_0_5b
lora_attn_modules: ['q_proj', 'v_proj']
apply_lora_to_mlp: False
apply_lora_to_output: False
lora_rank: 32
lora_alpha: 64
lora_dropout: 0.0

tokenizer:
_component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
path: /tmp/Qwen2_5-0_5B-Instruct/vocab.json
merges_file: /tmp/Qwen2_5-0_5B-Instruct/merges.txt
max_seq_len: null

checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Qwen2_5-0_5B-Instruct
checkpoint_files: [
model.safetensors
]
recipe_checkpoint: null
output_dir: /tmp/Qwen2_5-0_5B-Instruct-lora-finetune
model_type: QWEN2
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset

seed: null
shuffle: True
batch_size: 4

# Optimizer and Scheduler
optimizer:
_component_: torch.optim.AdamW
fused: True
weight_decay: 0.01
lr: 2e-3

lr_scheduler:
_component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
num_warmup_steps: 100

loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 4

# Logging
output_dir: /tmp/Qwen2_5-0_5B-Instruct-lora-finetune
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: True

# Showcase the usage of the PyTorch profiler
# Set enabled to False as it's only needed for debugging training
profiler:
_component_: torchtune.training.setup_torch_profiler

enabled: False

# Output directory of trace artifacts
output_dir: ${output_dir}/profiling_outputs

# `torch.profiler.ProfilerActivity` types to trace
cpu: True
cuda: True

# trace options passed to `torch.profiler.profile`
profile_memory: False
with_stack: False
record_shapes: True
with_flops: False

# `torch.profiler.schedule` options:
# wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
wait_steps: 5
warmup_steps: 5
active_steps: 2
num_cycles: 1
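
As a sketch of what the `model:` section above builds: the argument names below are copied from the YAML fields, and calling the builder directly like this is only illustrative (the recipe normally goes through `config.instantiate`).

```python
# Sketch (assumption) of the call that the model section of 0_5B_lora.yaml maps to.
from torchtune.models.qwen2_5 import lora_qwen2_5_0_5b

model = lora_qwen2_5_0_5b(
    lora_attn_modules=["q_proj", "v_proj"],  # LoRA adapters only on the q/v attention projections
    apply_lora_to_mlp=False,
    apply_lora_to_output=False,
    lora_rank=32,
    lora_alpha=64,
    lora_dropout=0.0,
)
# The builder returns the 0.5B decoder with LoRA adapters attached to the listed projections.
```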
113 changes: 113 additions & 0 deletions recipes/configs/qwen2_5/0_5B_lora_single_device.yaml
@@ -0,0 +1,113 @@
# Config for single device LoRA finetuning in lora_finetune_single_device.py
# using a Qwen2.5 0.5B model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download Qwen/Qwen2.5-0.5B-Instruct --output-dir /tmp/Qwen2_5-0_5B-Instruct --ignore-patterns None
#
# To launch on a single device, run the following command from root:
# tune run lora_finetune_single_device --config qwen2_5/0_5B_lora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run lora_finetune_single_device --config qwen2_5/0_5B_lora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on a single device.


# Model Arguments
model:
_component_: torchtune.models.qwen2_5.lora_qwen2_5_0_5b
lora_attn_modules: ['q_proj', 'v_proj']
apply_lora_to_mlp: False
apply_lora_to_output: False
lora_rank: 32
lora_alpha: 64
lora_dropout: 0.0

tokenizer:
_component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
path: /tmp/Qwen2_5-0_5B-Instruct/vocab.json
merges_file: /tmp/Qwen2_5-0_5B-Instruct/merges.txt
max_seq_len: null

checkpointer:
_component_: torchtune.training.FullModelHFCheckpointer
checkpoint_dir: /tmp/Qwen2_5-0_5B-Instruct
checkpoint_files: [
model.safetensors
]
recipe_checkpoint: null
output_dir: /tmp/Qwen2_5-0_5B-Instruct-lora-finetune
model_type: QWEN2
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
_component_: torchtune.datasets.alpaca_cleaned_dataset
seed: null
shuffle: True
batch_size: 4

# Optimizer and Scheduler
optimizer:
_component_: torch.optim.AdamW
fused: True
weight_decay: 0.01
lr: 2e-3

lr_scheduler:
_component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
num_warmup_steps: 100

loss:
_component_: torchtune.modules.loss.CEWithChunkedOutputLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 4
compile: False

# Logging
output_dir: /tmp/Qwen2_5-0_5B-Instruct-lora-finetune
metric_logger:
_component_: torchtune.training.metric_logging.DiskLogger
log_dir: ${output_dir}
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16

# Activations Offloading
enable_activation_checkpointing: True
enable_activation_offloading: False

# Showcase the usage of the PyTorch profiler
# Set enabled to False as it's only needed for debugging training
profiler:
_component_: torchtune.training.setup_torch_profiler
enabled: False

# Output directory of trace artifacts
output_dir: ${output_dir}/profiling_outputs

# `torch.profiler.ProfilerActivity` types to trace
cpu: True
cuda: True

# trace options passed to `torch.profiler.profile`
profile_memory: False
with_stack: False
record_shapes: True
with_flops: False

# `torch.profiler.schedule` options:
# wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
wait_steps: 5
warmup_steps: 5
active_steps: 2
num_cycles: 1
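
Finally, a sketch of how the optimizer and `lr_scheduler` entries above fit together. Only `num_warmup_steps: 100` comes from the config; the full scheduler signature is assumed to follow the usual warmup-then-cosine convention, and `num_training_steps` is an illustrative value that the recipe would compute at runtime.

```python
# Sketch (assumption) wiring the optimizer and cosine schedule from the YAML above;
# num_training_steps is a made-up illustrative value, not something this config pins down.
import torch
from torchtune.models.qwen2_5 import lora_qwen2_5_0_5b
from torchtune.training.lr_schedulers import get_cosine_schedule_with_warmup

model = lora_qwen2_5_0_5b(lora_attn_modules=["q_proj", "v_proj"], lora_rank=32, lora_alpha=64)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=0.01)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,      # matches the config
    num_training_steps=1000,   # illustrative; derived from dataset length in the recipe
)
```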