
TRL SFTTrainer Examples #2211

Merged: 24 commits from sa/sft_trainer_mixin into main on Apr 24, 2024

Conversation

@Satrat Satrat commented Apr 2, 2024

  • Simplified the SparseML Trainer to a barebones class definition; everything is handled by SessionManagerMixIn. Removed a bunch of old code for loading recipes, as this is now handled by SparseAutoModelForCausalLM
  • Added a new SFTTrainer class which adds our mix-in to trl's SFTTrainer. The only added code is support for passing a pre-tokenized dataset to SFTTrainer (see the sketch after this list)
  • Added examples of using SFTTrainer for sparse finetuning, both with our dataset preprocessing and with TRL's dataset preprocessing
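A minimal sketch of the mix-in composition described above; the mix-in's import path and the tokenized-dataset check in `_prepare_dataset` are assumptions for illustration, not the PR's exact code:

```python
# Rough sketch only: layers SparseML's SessionManagerMixIn onto trl's SFTTrainer.
from trl import SFTTrainer as TRLSFTTrainer

from sparseml.transformers.finetune.session_mixin import SessionManagerMixIn  # assumed path


class SFTTrainer(SessionManagerMixIn, TRLSFTTrainer):
    """trl's SFTTrainer with SparseML's session mix-in layered on top."""

    def _prepare_dataset(self, dataset, *args, **kwargs):
        # Assumed extra: if the dataset is already tokenized, skip TRL's
        # preprocessing and pass it through unchanged; otherwise defer to trl.
        if "input_ids" in dataset.column_names:
            return dataset
        return super()._prepare_dataset(dataset, *args, **kwargs)
```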

Asana ticket: https://app.asana.com/0/1201735099598270/1206486351032763/f

Testing

See examples in integrations/huggingface-transformers/tutorials/text-generation/trl_mixin
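A hedged end-to-end sketch of what those examples roughly look like; the model stub, dataset name, recipe file, `recipe` keyword, and import locations below are placeholders or assumptions rather than the tutorial's actual contents:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

from sparseml.transformers import SparseAutoModelForCausalLM
from sparseml.transformers.finetune import TrainingArguments  # assumed import location
from sparseml.transformers.finetune.sft_trainer import SFTTrainer  # module added in this PR

model_id = "neuralmagic/some-sparse-llama"  # placeholder model stub
model = SparseAutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("some/finetuning-dataset", split="train")  # placeholder dataset

training_args = TrainingArguments(
    output_dir="./sft_output",
    num_train_epochs=0.6,
    logging_steps=50,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=training_args,
    recipe="recipe.yaml",        # assumed: sparsification recipe consumed by the mix-in
    dataset_text_field="text",   # placeholder column name for TRL's preprocessing
    max_seq_length=512,
    packing=True,
)
trainer.train()
trainer.save_model()
```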

@robertgshaw2-neuralmagic (Contributor) commented:

Thanks Sara - this looks really nice

Are there any other features we should flex? I am thinking we might want to look at:

  • FSDP
  • Distillation

@Satrat (Author) commented Apr 3, 2024:

> Thanks Sara - this looks really nice
>
> Are there any other features we should flex? I am thinking we might want to look at:
>
>   • FSDP
>   • Distillation

Sure, I'll test both of these scenarios, but if it ends up being more than a bit of tweaking to get FSDP working I'm going to leave that for another ticket :)

Edit: both worked with some minor tweaks!

@bfineran (Contributor) left a comment:

Looks great overall - should probably move the examples out of src and add a brief readme to go along with them. Debatable whether or not we'd want SFTTrainer out of src as well.

src/sparseml/transformers/finetune/sft_trainer.py (review thread, resolved)
Comment on lines 60 to 65
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=0.6,
    logging_steps=50,
    gradient_checkpointing=True,
)
Member:

is it important at all that the TrainingArguments comes from SparseML?

@Satrat (Author) replied:

A few things won't work if the native transformers TrainingArguments is used: no support for recipe overrides, no compressed save, no multi-stage training runs. The mix-in uses these params, so if we wanted to support the non-sparseml TrainingArguments we would have to check each time we reference them. I don't think it's worth the extra lines personally.
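As an illustration of the point, a sketch of the kind of fields the SparseML TrainingArguments is assumed to add on top of the transformers one; the field names here are made up for illustration, not the actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments as HFTrainingArguments


@dataclass
class TrainingArguments(HFTrainingArguments):
    # Illustrative extras the mix-in would read: recipe overrides,
    # compressed saving, and multi-stage training runs.
    recipe_args: Optional[str] = field(
        default=None, metadata={"help": "overrides applied to the recipe"}
    )
    save_compressed: bool = field(
        default=False, metadata={"help": "save checkpoints in a compressed format"}
    )
    run_stages: bool = field(
        default=False, metadata={"help": "run multi-stage training recipes"}
    )
```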

src/sparseml/transformers/finetune/sft_trainer.py (review thread, resolved)
@Satrat Satrat requested review from mgoin and bfineran April 9, 2024 19:18
@bfineran bfineran merged commit 3ddc2d4 into main Apr 24, 2024
16 of 17 checks passed
@bfineran bfineran deleted the sa/sft_trainer_mixin branch April 24, 2024 16:10