Optimize DPO recipe - precomputing reference model log probabilities #25

yash12khandelwal · 2024-12-30T10:37:42Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?
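This PR adds support for precomputing the reference model's log probabilities when using DPO. The reference model is frozen, so its log probabilities for a given example are the same in every epoch. Computing them once during data setup speeds up training by removing that redundant computation across epochs.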

CustomPreferenceDataset - A modification of PreferenceDataset that stores the reference log probabilities alongside the data. Each get-item call returns a dictionary containing input_ids, labels, and the reference model's log probabilities for the chosen and rejected responses.
padded_collate_dpo - Now returns the precomputed reference log probabilities along with the inputs and labels.
lora_dpo_distributed - Added support for precomputing the reference log probabilities during data setup. When computing the loss, the recipe reads the precomputed values from the batch instead of running the reference model again, which saves compute. The implementation is inspired by the DPO implementation in the Hugging Face trl repository.
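
For reviewers, the idea behind these three changes can be sketched in a few lines. This is an illustration only, not the code in this PR: `PrecomputedRefDataset`, `ref_logps_fn`, `dpo_loss`, and the dictionary keys are hypothetical names, and plain Python floats and lists stand in for tensors so the sketch runs without PyTorch. It assumes the standard sigmoid DPO loss.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types and names; not torchtune's API or this PR's code.
Sample = Dict[str, List[int]]
RefFn = Callable[[Sample], Tuple[float, float]]


class PrecomputedRefDataset:
    """Wraps a preference dataset and caches reference log probs per sample."""

    def __init__(self, base: Sequence[Sample], ref_logps_fn: RefFn) -> None:
        self._base = base
        # Single reference-model pass over the data, done once at setup time.
        self._ref_logps = [ref_logps_fn(sample) for sample in base]

    def __len__(self) -> int:
        return len(self._base)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        ref_chosen, ref_rejected = self._ref_logps[idx]
        # Return the original fields plus the cached reference values.
        return {
            **self._base[idx],
            "ref_chosen_logps": ref_chosen,
            "ref_rejected_logps": ref_rejected,
        }


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """Sigmoid DPO loss for one preference pair, given sequence log probs."""
    logits = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * logits))


# Usage: a fake reference function that counts how often it is called.
calls = 0


def fake_ref_logps(sample: Sample) -> Tuple[float, float]:
    global calls
    calls += 1
    return (-1.0, -2.0)  # made-up values for illustration


pairs = [{"chosen_input_ids": [1, 2], "rejected_input_ids": [1, 3]} for _ in range(3)]
dataset = PrecomputedRefDataset(pairs, fake_ref_logps)

for epoch in range(2):
    for i in range(len(dataset)):
        item = dataset[i]
        # Policy log probs are placeholders; a real recipe computes them per step.
        loss = dpo_loss(-0.5, -2.5, item["ref_chosen_logps"], item["ref_rejected_logps"])
```

After two epochs over three samples, `fake_ref_logps` has run three times rather than six, which is where the saving comes from. The cost is one extra pass over the dataset at setup and memory for two floats per sample.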

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.

I did not change any public API
I have added an example to docs or docstrings

aashay-sarvam · 2025-01-02T08:19:53Z

recipes/configs/llama3_1/8B_lora_dpo_precompute.yaml

+
+# Dataset and Sampler
+dataset:
+  _component_: torchtune.datasets.stack_exchange_paired_dataset


Component needs change

yash12khandelwal added 3 commits December 30, 2024 15:15

Collate precomputed reference log probs

c5ddf16

Add wrapper class for PreferenceDataset to support saving precomputed ref probs

1238fe4

Modified dpo recipe to support precomputing ref probs

47884b4

aashay-sarvam reviewed Jan 2, 2025
