
TRL SFTTrainer Examples #2211

Merged: 24 commits from sa/sft_trainer_mixin into main on Apr 24, 2024

Conversation

@Satrat Satrat commented Apr 2, 2024

  • Simplified the SparseML Trainer to a barebones class definition; everything is handled by SessionManagerMixIn. Removed a bunch of old code for loading recipes, as this is now handled by SparseAutoModelForCausalLM
  • Added a new SFTTrainer class which adds our mix-in to trl's SFTTrainer. The only added code is support for passing a pre-tokenized dataset to SFTTrainer (see the sketch after this list)
  • Added examples of using SFTTrainer for sparse finetuning, both with our dataset preprocessing and with TRL's dataset preprocessing
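A minimal sketch of the mix-in composition described above; the mix-in's import path and the tokenized-dataset check in `_prepare_dataset` are assumptions for illustration, not the PR's exact code:

```python
# Rough sketch only: layers SparseML's SessionManagerMixIn onto trl's SFTTrainer.
from trl import SFTTrainer as TRLSFTTrainer

from sparseml.transformers.finetune.session_mixin import SessionManagerMixIn  # assumed path


class SFTTrainer(SessionManagerMixIn, TRLSFTTrainer):
    """trl's SFTTrainer with SparseML's session mix-in layered on top."""

    def _prepare_dataset(self, dataset, *args, **kwargs):
        # Assumed extra: if the dataset is already tokenized, skip TRL's
        # preprocessing and pass it through unchanged; otherwise defer to trl.
        if "input_ids" in dataset.column_names:
            return dataset
        return super()._prepare_dataset(dataset, *args, **kwargs)
```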

Asana ticket: https://app.asana.com/0/1201735099598270/1206486351032763/f

Testing

See examples in integrations/huggingface-transformers/tutorials/text-generation/trl_mixin
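A hedged end-to-end sketch of what those examples roughly look like; the model stub, dataset name, recipe file, `recipe` keyword, and import locations below are placeholders or assumptions rather than the tutorial's actual contents:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

from sparseml.transformers import SparseAutoModelForCausalLM
from sparseml.transformers.finetune import TrainingArguments  # assumed import location
from sparseml.transformers.finetune.sft_trainer import SFTTrainer  # module added in this PR

model_id = "neuralmagic/some-sparse-llama"  # placeholder model stub
model = SparseAutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("some/finetuning-dataset", split="train")  # placeholder dataset

training_args = TrainingArguments(
    output_dir="./sft_output",
    num_train_epochs=0.6,
    logging_steps=50,
    gradient_checkpointing=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    args=training_args,
    recipe="recipe.yaml",        # assumed: sparsification recipe consumed by the mix-in
    dataset_text_field="text",   # placeholder column name for TRL's preprocessing
    max_seq_length=512,
    packing=True,
)
trainer.train()
trainer.save_model()
```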

@robertgshaw2-neuralmagic (Contributor) commented:

Thanks Sara - this looks really nice

Are there any other features we should flex? I am thinking we might want to look at:

  • FSDP
  • Distillation

@Satrat (Author) commented Apr 3, 2024:

> Thanks Sara - this looks really nice
>
> Are there any other features we should flex? I am thinking we might want to look at:
>
>   • FSDP
>   • Distillation

Sure, I'll test both of these scenarios, but if it ends up being more than a bit of tweaking to get FSDP working I'm going to leave that for another ticket :)

Edit: both worked with some minor tweaks!

@bfineran (Contributor) left a comment:

Looks great overall - should probably move the examples out of src and add a brief readme to go along with them. Debatable whether or not we'd want SFTTrainer out of src as well.

src/sparseml/transformers/finetune/sft_trainer.py (review thread, resolved)
Comment on lines 60 to 65
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=0.6,
    logging_steps=50,
    gradient_checkpointing=True,
)
Member:

is it important at all that the TrainingArguments comes from SparseML?

@Satrat (Author) replied:

A few things won't work if the native transformers TrainingArguments is used: no support for recipe overrides, no compressed save, no multi-stage training runs. The mix-in uses these params, so if we wanted to support the non-sparseml TrainingArguments we would have to check each time we reference them. I don't think it's worth the extra lines personally.
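As an illustration of the point, a sketch of the kind of fields the SparseML TrainingArguments is assumed to add on top of the transformers one; the field names here are made up for illustration, not the actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments as HFTrainingArguments


@dataclass
class TrainingArguments(HFTrainingArguments):
    # Illustrative extras the mix-in would read: recipe overrides,
    # compressed saving, and multi-stage training runs.
    recipe_args: Optional[str] = field(
        default=None, metadata={"help": "overrides applied to the recipe"}
    )
    save_compressed: bool = field(
        default=False, metadata={"help": "save checkpoints in a compressed format"}
    )
    run_stages: bool = field(
        default=False, metadata={"help": "run multi-stage training recipes"}
    )
```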

src/sparseml/transformers/finetune/sft_trainer.py (review thread, resolved)
@Satrat Satrat requested review from mgoin and bfineran April 9, 2024 19:18
@bfineran bfineran merged commit 3ddc2d4 into main Apr 24, 2024
16 of 17 checks passed
@bfineran bfineran deleted the sa/sft_trainer_mixin branch April 24, 2024 16:10