Generalizing reward models #1160
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1160
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 12488b7 with merge base 7eb89e2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Great PR! Thanks for writing it.
I have some questions; a few of them are simple, but others may be more for the core maintainers. My main points are:
- What should be named "classifier" and what should be named "reward"? IMO, "reward" should be broad, allowing multiple scalars, for example.
- When constructing these builders, should we recreate the whole model, or leverage the base model and add the reward head?
- Are we comfortable with having 9 builders per model size? Maybe the answer is yes, and that's ok.
torchtune/models/llama2/__init__.py (Outdated)
)
from ._model_utils import scale_hidden_dim_for_mlp
from ._tokenizer import Llama2Tokenizer

__all__ = [
    "Llama2Tokenizer",
    "llama2",
    "llama2_classifier_7b",
Do you mind double-checking this list? For example, "llama2_classifier_7b" should be "llama2_classifier", I believe.
# ------------------ Llama2 Classifier ------------------


def llama2_classifier(
I don't want to overcomplicate things, but is "classifier" the right word?
My knowledge of RL is limited, but can the output of the LLM just be a scalar, e.g. how "polite", "trustworthy", "helpful", or "friendly" the model is?
In that case, would "classifier" still be the right name? I have seen it in other places too, so I guess this may be the standard.
    qlora_llama2_13b,
    qlora_llama2_70b,
    qlora_llama2_7b,
    qlora_llama2_reward_7b,
)
from ._model_utils import scale_hidden_dim_for_mlp
from ._tokenizer import Llama2Tokenizer

__all__ = [
Seeing this huge list scares me a little bit, because every model size would have 6 variants: normal, classifier, lora, qlora, classifier_lora, classifier_qlora. If we add DoRA, this would add 3 more builders. I think you followed the pattern correctly, so maybe this is a question for @kartikayk and @ebsmothers. I guess we don't like hooks in torchtune, but something like replace_with_reward_head(model=llama3) seems convenient (see the sketch below), instead of rebuilding llama3 completely.
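To make that suggestion concrete, here is a minimal sketch of what such a hook could look like. replace_with_reward_head does not exist in torchtune; the function name and the assumption that the decoder exposes its output projection as an nn.Linear at model.output are illustrative only.

```python
import torch.nn as nn

from torchtune.models.llama2 import llama2_7b
from torchtune.modules import TransformerDecoder


def replace_with_reward_head(
    model: TransformerDecoder, num_classes: int = 1
) -> TransformerDecoder:
    """Hypothetical helper: swap the LM output projection for a reward/classification head."""
    # Assumes the decoder's final projection is an nn.Linear stored at `model.output`.
    embed_dim = model.output.in_features
    model.output = nn.Linear(embed_dim, num_classes, bias=False)
    return model


# Reuse the existing base builder instead of adding a separate reward builder.
reward_model = replace_with_reward_head(llama2_7b(), num_classes=1)
```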
""" | ||
Builder for creating a Llama2 model initialized w/ the default 7B parameter values | ||
from https://arxiv.org/abs/2307.09288, where the output layer is a classification layer | ||
projecting to a single class for reward modelling. |
Following up on my previous question about the naming "classifier": here we use "reward" to denote this specific model instantiation. I think the two could be inverted. The llama2_classifier_7b is a generic builder that can be used for any type of reward modeling with num_classes = n (num_classes could also be renamed to something more general denoting the output size). However, llama2_reward_7b is indeed a classifier, hardcoded with num_classes=1.
What do you think?
There's actually a typo here, sorry: llama2_classifier_7b shouldn't exist. I think we're on the same page. The general pattern here is: _classifier exists in _component_builders as a general interface to a classifier model. The only use for these models currently in the codebase is reward modelling for PPO, where reward models output a scalar (I do agree this constraint doesn't apply to all reward models).
For simplicity, I could drop the _reward in the model builders and expose a num_classes argument in the model builder, which defaults to 1. So you'd have:
model_family/model_builders.py

... import model_family_classifier

def model_family_7b_classifier(num_classes: int = 1):
    return model_family_classifier(num_classes=num_classes, ...)
I think this would clean up the API and offer flexibility to users who wish to define recipes for classification models, for which there is some interest from the community, e.g. #1124.
How does this sound?
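For illustration, a llama2-flavoured sketch of the proposal above might look like the following. The exact signature of the llama2_classifier component builder and the 7B hyperparameter values are assumptions here and should be checked against torchtune/models/llama2/_component_builders.py.

```python
from torchtune.models.llama2._component_builders import llama2_classifier
from torchtune.modules import TransformerDecoder


def llama2_7b_classifier(num_classes: int = 1) -> TransformerDecoder:
    # Standard Llama2-7B settings (assumed values); only num_classes is exposed to callers.
    return llama2_classifier(
        num_classes=num_classes,
        vocab_size=32_000,
        num_layers=32,
        num_heads=32,
        num_kv_heads=32,
        embed_dim=4096,
        max_seq_len=4096,
        attn_dropout=0.0,
        norm_eps=1e-5,
    )


# Reward modelling for PPO is then just the default num_classes=1 case:
reward_model = llama2_7b_classifier()
```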
I think the way you have it here, where classifier is a generic component builder and reward is just num_classes=1 in the model builders, is great. We also don't want to add too many builder layers that put distance between what the user calls and the actual model architecture definition.
Thanks @RdoubleA. Let's roll with things as-is for now.
)


qlora_llama2_reward_7b = partial(lora_llama2_7b, quantize_base=True)
I wonder if, for simplification, we should do something like this for the reward/classification versions and have a flag, e.g. use_reward_head=True, reward_head_size=n (see the sketch below).
It would add some if/else to the builder, so I am not sure if others would approve of it.
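A rough sketch of that flag-based alternative, purely for illustration: the use_reward_head and reward_head_size arguments are hypothetical, the 7B hyperparameter values are assumed, and the assumption that the decoder's output projection lives at model.output may not match torchtune's internals exactly.

```python
import torch.nn as nn

from torchtune.models.llama2 import llama2


def llama2_7b_with_optional_head(use_reward_head: bool = False, reward_head_size: int = 1):
    # Build the base 7B model with the existing component builder (values assumed).
    model = llama2(
        vocab_size=32_000,
        num_layers=32,
        num_heads=32,
        num_kv_heads=32,
        embed_dim=4096,
        max_seq_len=4096,
    )
    if use_reward_head:
        # The if/else in question: swap the LM head for a small reward/classification head.
        model.output = nn.Linear(4096, reward_head_size, bias=False)
    return model
```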
@@ -35,7 +35,7 @@ def mistral_reward_hf_to_tune(
 ) -> Dict[str, torch.Tensor]:
     """
     Convert a state dict from HF's format to torchtune's format, which contains the weights
-    of a Mistral reward model.
+    of a reward model (i.e. a classifier with a single class).
I guess this confirms that classifier is a case of a reward, and maybe we should swap the current naming?
Thanks so much for the review @felipemello1. Apologies for not providing a bit more context about the PR to help make things clearer. I discussed this briefly with @pbontrager on Discord in the "Discussing PPO" thread (which has been my de facto channel for pinging devs with contributing questions - maybe I should move to #dev..); it'd be great to have your input on some of the points we raised. As you've correctly gathered, the main use for these classifier models is to support my work on RLHF, allowing the use of pre-trained reward models during PPO. My primary aim for this PR was to reduce the review overhead on #1005, so I'd like my changes to be minimal and non-invasive (in that they can easily be refactored), and to rely on future refactors (such as #1017 (comment)) to address some of the potential long-term effects on the codebase.
There have been a few suggestions around this (see some comments in the original PR #837). Generally, there's some agreement that we can probably define these classifiers in a more minimal way like you suggested. However, we're touching on support for general classifiers vs. support for classifiers as they're needed now in the codebase. For now, I'd argue that we should revisit this further down the line when the need for generalisation is apparent and/or in a
""" | ||
Builder for creating a Llama2 model initialized w/ the default 7B parameter values | ||
from https://arxiv.org/abs/2307.09288, where the output layer is a classification layer | ||
projecting to a single class for reward modelling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the way you have it here where classifier is a generic component builder and reward is just num_classes = 1 in the model builders is great. We also don't want to add too many builder layers that add distance between what the user calls and the actual model architecture definition
@@ -76,17 +81,17 @@ def _permute(t, n_heads):
     return converted_state_dict


-def mistral_reward_tune_to_hf(
+def reward_tune_to_hf(
I wonder if there's a better location for these - our other converters are in models/convert_weights. PEFT adapter conversions were added there. I think it makes sense to also throw these in there, or do you think that will make the file bloated?
Yeah, I was thinking along similar lines. I think it does make the file a bit bloated, and keeping these separate would reflect the sentiment from my comment above; we could revisit this and put weight conversion somewhere more general when we need it.
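For context, the core of these converters is a small key-remapping between HF and torchtune state-dict naming. The sketch below is illustrative only: the function name is hypothetical, and the exact key names (e.g. HF sequence-classification models storing the head as "score.weight" vs torchtune's "output.weight") are assumptions rather than the PR's actual implementation.

```python
from typing import Dict

import torch

# Assumed head-only mapping; the real converters also remap attention/MLP weights per layer.
_REWARD_HEAD_MAP = {"score.weight": "output.weight"}


def reward_head_hf_to_tune(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    # Rename any keys present in the mapping; pass all other keys through unchanged.
    converted = {}
    for key, value in state_dict.items():
        converted[_REWARD_HEAD_MAP.get(key, key)] = value
    return converted
```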
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #1160      +/-   ##
==========================================
- Coverage   67.81%   67.67%   -0.14%
==========================================
  Files         219      220       +1
  Lines        9908     9941      +33
==========================================
+ Hits         6719     6728       +9
- Misses       3189     3213      +24

☔ View full report in Codecov by Sentry.
Sounds good to me! Especially since you are already aligned with Philip and Rafi. Thanks for your reply!
Overall looks good to me. Do we have tests for the reward convert weights utilities?
There's a test here
Context
What is the purpose of this PR? Is it to
#1005
A hopefully quick add, in parallel to my PPO PR, to generalize reward models to also support e.g. Llama2 reward models.
Model builders for the previously so-called classifier models are now correctly named as reward models, since they only classify a single label. I've also moved checkpoint conversion logic to modules/rlhf. This checkpointing logic can generally be useful for any classifier model, but I've tried to keep things simple and contained here without leaking the scope of this functionality to the rest of the API.
Previously, if someone wanted to use classifiers for their own recipes, they'd use the underlying component builder and set the model type to MISTRAL_REWARD. This hasn't really changed - they just now use REWARD, and the model builders are a bit clearer to indicate that their default use is for reward modelling.
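As a rough illustration of how a user would select the new model type, here is a minimal sketch of instantiating the HF checkpointer with model_type="REWARD". The paths and filenames are placeholders, and the import path and constructor signature may differ across torchtune versions, so treat this as an assumption rather than a reference usage.

```python
from torchtune.utils import FullModelHFCheckpointer

# Placeholder directories/filenames; model_type="REWARD" replaces the old "MISTRAL_REWARD".
checkpointer = FullModelHFCheckpointer(
    checkpoint_dir="/tmp/llama2_reward_model",
    checkpoint_files=["pytorch_model.bin"],
    model_type="REWARD",
    output_dir="/tmp/llama2_reward_model_finetuned",
    resume_from_checkpoint=False,
)
state_dict = checkpointer.load_checkpoint()
```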
Test plan
- pre-commit install
- pytest tests
- pytest tests -m integration_test