
Enable export for CLIPImageTransform #1242

Closed
lucylq wants to merge 1 commit into main from lfq.export-preprocess

Conversation

lucylq
Contributor

@lucylq lucylq commented Jul 29, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.

Changelog

What are the changes made in this PR?

Refactor CLIPImageTransform so that it can be exported and shared with other repos (e.g. ExecuTorch). This includes:

torchtune/models/clip/_transforms.py

  • Create CLIPImageTransformCore, an nn.Module containing the resize, pad, and reshape logic. ExecuTorch will export this nn.Module.
  • Replace F.pad with Pad, which is an nn.Module. This allows ExecuTorch to swap it for a custom op.
  • Move the logic from resize_with_pad.py to the top level of CLIPImageTransform and pass that information into the nn.Module.
  • Add an antialias arg to make it configurable on the ExecuTorch side (a simplified sketch follows this list).
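For reference, a simplified sketch of the split described above (not the exact torchtune code; names, signatures, and defaults here are illustrative):

```python
import torch
from torchvision.transforms import v2
from torchvision.transforms.v2 import functional as F


class CLIPImageTransformCore(torch.nn.Module):
    """Resize + pad core. All sizing decisions are made by the eager
    CLIPImageTransform and passed in, so this module stays free of
    data-dependent control flow and can be exported."""

    def __init__(self, antialias: bool = False):
        super().__init__()
        # Configurable so the ExecuTorch side can disable antialiasing.
        self.antialias = antialias

    def forward(self, image, target_size, canvas_size):
        target_h, target_w = target_size
        canvas_h, canvas_w = canvas_size

        # Resize to the precomputed target size.
        image = F.resize(image, size=[target_h, target_w], antialias=self.antialias)

        # Pad on the right/bottom up to the canvas size, using the v2.Pad module
        # (rather than F.pad) so ExecuTorch can swap it for a custom op.
        return v2.Pad(padding=[0, 0, canvas_w - target_w, canvas_h - target_h])(image)


core = CLIPImageTransformCore()
out = core(torch.randn(3, 224, 224), target_size=(224, 336), canvas_size=(224, 448))
print(out.shape)  # torch.Size([3, 224, 448])
```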

torchtune/modules/transforms/vision_utils/tile_crop.py

  • Move this logic into an nn.Module so that ExecuTorch can swap it for a custom op (see the sketch below).
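A rough sketch of what such a module-level tile crop could look like (an approximation, not the exact torchtune implementation):

```python
import torch


class TileCrop(torch.nn.Module):
    """Crop a padded image into non-overlapping tiles. Wrapping the logic in an
    nn.Module (rather than a bare function) lets ExecuTorch swap the whole
    module for a custom op at export time."""

    def forward(self, image: torch.Tensor, tile_size: int) -> torch.Tensor:
        c = image.shape[0]  # assumes H and W are already padded to multiples of tile_size
        tiles = image.unfold(1, tile_size, tile_size).unfold(2, tile_size, tile_size)
        # (c, h/ts, w/ts, ts, ts) -> (n_tiles, c, ts, ts)
        return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, tile_size, tile_size)


# A 3x448x448 canvas with tile_size=224 yields 4 tiles of shape 3x224x224.
tiles = TileCrop()(torch.randn(3, 448, 448), tile_size=224)
print(tiles.shape)  # torch.Size([4, 3, 224, 224])
```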

Update tests

  • tests/torchtune/models/clip/test_clip_image_transform.py
  • tests/torchtune/modules/transforms/test_tile_crop.py

Update code references

  • torchtune/modules/transforms/init.py

  • torchtune/modules/vision_transformer.py

  • docs/source/api_ref_modules.rst

  • I think we can remove resize_with_pad.py and its associated tests after this change.

Test plan

Run pytest in directories:

  • torchtune/tests/torchtune/models/clip/
  • torchtune/tests/torchtune/modules/transforms/

Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help.)

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality (used existing tests)
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • Error: seems unrelated to this PR (also see this error on main, without my changes)
    • E AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.
    • Full error log: P1503938971
  • [na] manually run any new or modified recipes with sufficient proof of correctness


pytorch-bot bot commented Jul 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1242

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit bf1d8e7 with merge base 78055fa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 29, 2024
@lucylq lucylq force-pushed the lfq.export-preprocess branch 2 times, most recently from 9ae9c34 to ba0b608 Compare July 29, 2024 22:47
@felipemello1 felipemello1 self-requested a review July 29, 2024 23:05
@lucylq lucylq force-pushed the lfq.export-preprocess branch 5 times, most recently from 7a42b55 to c4055f5 Compare July 29, 2024 23:19
@codecov-commenter

codecov-commenter commented Jul 29, 2024

Codecov Report

Attention: Patch coverage is 98.36066% with 1 line in your changes missing coverage. Please review.

Project coverage is 72.00%. Comparing base (78055fa) to head (bf1d8e7).

Files                                    Patch %    Lines
torchtune/models/clip/_transforms.py     97.67%     1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1242       +/-   ##
===========================================
+ Coverage   27.41%   72.00%   +44.58%     
===========================================
  Files         233      231        -2     
  Lines       10603    10570       -33     
===========================================
+ Hits         2907     7611     +4704     
+ Misses       7696     2959     -4737     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

image,
size=[target_h, target_w],
interpolation=self.config.resample,
antialias=True,
Contributor Author
@lucylq lucylq Jul 29, 2024

Note: to lower to ExecuTorch we have to set this to False. We currently don't have a kernel for upsample when antialias=True.

Contributor

I sent a message to someone in GenAI to see what they think

@lucylq lucylq force-pushed the lfq.export-preprocess branch 7 times, most recently from 4540ecb to c175809 Compare July 31, 2024 00:08
image,
size=[target_h, target_w],
interpolation=self.resample,
antialias=self.antialias,
Contributor Author

Added as an argument so we can configure it on the ExecuTorch side as well.

@lucylq lucylq force-pushed the lfq.export-preprocess branch from c175809 to 4da32e2 Compare July 31, 2024 00:13
Contributor
@ebsmothers ebsmothers left a comment

Mostly small comments; I'll leave it to @felipemello1 to confirm the details of the actual transforms.


The check mitigates data dependent errors that may occur during torch.export. It installs
a deferred runtime assert, instead of a compile-time guard. For more context:
https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit
Contributor

Is this doc publicly available? If not, I'd just leave it out or give a tl;dr.

Contributor Author

Yes, the doc is publicly available, though we can also give a tl;dr if that's preferred? Something like:

Data-dependent errors usually occur in models with data-dependent control flow, e.g. via .item(), tolist(), or nonzero().
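For illustration, a minimal sketch (hypothetical module and values, not torchtune code) of the kind of data-dependent flow that torch._check helps with during torch.export:

```python
import torch
from torch.export import export


class KeepFirstN(torch.nn.Module):
    """Data-dependent control flow: the slice length comes from a tensor value."""

    def forward(self, x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
        k = n.item()  # becomes an unbacked symbolic int under torch.export
        # Without these checks, export may raise a data-dependent guard error because it
        # cannot prove the slice bounds at trace time; torch._check installs deferred
        # runtime asserts instead of compile-time guards.
        torch._check(k >= 0)
        torch._check(k <= x.size(0))
        return x[:k]


ep = export(KeepFirstN(), (torch.randn(8, 16), torch.tensor(3)))
print(ep)
```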

torchtune/models/clip/_transforms.py (outdated review thread, resolved)
torchtune/modules/transforms/vision_utils/tile_crop.py (outdated review thread, resolved)
torchtune/models/clip/_transforms.py (outdated review thread, resolved)
torchtune/models/clip/_transforms.py (outdated review thread, resolved)

from torchvision.transforms.v2 import functional as F

logger = logging.getLogger(__name__)


class CLIPImageTransformCore(torch.nn.Module):
Contributor

Just to confirm after our discussion today: the idea is to make CLIPImageTransform fully exportable, but we are not there yet, right? If so, do we intend to land this as an intermediate version first? The reason I ask is that with this PR some of our utilities like resize_with_pad will no longer have any callsites in the repo, and I want to know whether this is expected to be the final state or just a temporary one.

Contributor Author
@lucylq lucylq Jul 31, 2024

Yeah, ideally we could fully export CLIPImageTransform, but right now that would require more support on the export side.
With this refactor, I think we can remove resize_with_pad and its tests.

torchtune/models/clip/_transforms.py (outdated review thread, resolved)
@lucylq lucylq force-pushed the lfq.export-preprocess branch from 4da32e2 to 24511d0 Compare July 31, 2024 16:54
Contributor
@felipemello1 felipemello1 left a comment

Do you mind updating the unit tests (tile_crop, if there are any) and deleting the unused code (e.g. resize_with_pad)?

You would also need to update the function listed here: https://github.com/pytorch/torchtune/blob/main/docs/source/api_ref_modules.rst#vision-transforms (this is probably why the docs are failing to build)

and the __init__.py that imports resize_with_pad.

torchtune/models/clip/_transforms.py (outdated review thread, resolved)
@@ -69,6 +158,7 @@ class CLIPImageTransform:
resolution from possible_resolutions.
If False, it will pick the resolution that minimizes downscaling, including no downscaling at all.
In this case, the image will only be upscaled if it's size < tile_size. Default False.
max_image_size (int): Used to upper-bound the input image size for export.
Contributor

For a regular user, I am not sure they will understand what "for export" means. Will users change this? If not, should it be a global constant in the script? Adding a bit more context would probably solve it.

Contributor Author

Actually, the height/width of the preprocessed image is constrained by max_num_tiles * tile_size, so I can use that instead of an arbitrary value. Thanks for pointing this out!
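For reference, a rough sketch (illustrative values, assuming torch.export's Dim API; not the actual torchtune export code) of expressing that bound when exporting with dynamic image dimensions:

```python
import torch
from torch.export import Dim, export

tile_size = 224
max_num_tiles = 4
# Upper bound on the input image derived from the transform config,
# instead of a separate, arbitrary max_image_size argument.
max_image_size = max_num_tiles * tile_size  # 896


class ResizeToTile(torch.nn.Module):
    """Stand-in for the exportable transform core."""

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.interpolate(
            image.unsqueeze(0), size=(tile_size, tile_size), mode="bilinear"
        ).squeeze(0)


h = Dim("h", min=2, max=max_image_size)
w = Dim("w", min=2, max=max_image_size)
ep = export(ResizeToTile(), (torch.randn(3, 300, 500),), dynamic_shapes={"image": {1: h, 2: w}})
```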

torchtune/models/clip/_transforms.py (outdated review thread, resolved)
torchtune/models/clip/_transforms.py (outdated review thread, resolved)
torchtune/modules/transforms/vision_utils/tile_crop.py (outdated review thread, resolved)
torchtune/models/clip/_transforms.py (outdated review thread, resolved)
torchtune/models/clip/_transforms.py (outdated review thread, resolved)
@lucylq lucylq force-pushed the lfq.export-preprocess branch 2 times, most recently from 2924f6d to 923b923 Compare July 31, 2024 19:02
# Pad, such that the image is on the top-left and padded on the right-bottom.
sizes = [3, canvas_h, canvas_w]
padding = [0, canvas_h - target_h, 0, canvas_w - target_w]
output = v2.Pad(padding=padding)(image)
Contributor

Should we use a functional here, or add Pad to __init__, so it follows the same pattern as the other transforms?

Contributor Author

Hmm, I'm not sure we can place it in __init__, because we initialize the nn.Module with potentially different padding for each call.

The reason we're using an nn.Module instead of the functional version is so we can do a module swap with our custom op (we have some issues exporting pad).

@lucylq lucylq force-pushed the lfq.export-preprocess branch from 923b923 to af04f05 Compare July 31, 2024 19:08
@lucylq lucylq force-pushed the lfq.export-preprocess branch from af04f05 to a4d93d8 Compare July 31, 2024 20:07
@felipemello1
Contributor

Though if antialias=True then we may need resize anyway.

I asked a researcher, but no answers yet. @pbontrager thinks it probably is. To unblock testing, maybe it's OK to go like this for now and come back to antialias=True when we have an answer? I just don't know if this is productive.

@felipemello1
Contributor

btw, we would have to delete this unit test too: https://github.com/pytorch/torchtune/blob/main/tests/torchtune/modules/transforms/test_resize_with_pad.py

@lucylq
Contributor Author

lucylq commented Jul 31, 2024

btw, we would have to delete this unit test too: https://github.com/pytorch/torchtune/blob/main/tests/torchtune/modules/transforms/test_resize_with_pad.py

Oh, I think I removed it, though the docs build is still failing. It might be due to TileCrop? Repro'd the error locally, though I still can't pinpoint where it's coming from.

@felipemello1
Contributor

still can't pinpoint where the error is coming from.

Sphinx error messages are the worst! You could try this to debug:

  1. Erase anything related to vision from the .rst and confirm that this is the problem.
  2. Re-add the entries one by one to see which one is problematic.

To build the docs, you can follow the steps here: https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md#building-docs

Thank you for doing all this!

@felipemello1
Contributor

FYI: we realized that the unit tests are missing regression tests. I will add them tomorrow so we are 100% sure about parity.

@lucylq lucylq force-pushed the lfq.export-preprocess branch from a4d93d8 to b144194 Compare July 31, 2024 22:37
@lucylq
Contributor Author

lucylq commented Jul 31, 2024

  1. Erase anything related to vision from the .rst and confirm that this is the problem.
  2. Re-add the entries one by one to see which one is problematic.

Thanks, that helped. TileCrop was missing a docstring.

def __init__(self):
super().__init__()

def forward(self, image: torch.Tensor, tile_size: int) -> torch.Tensor:
Contributor

If this is a module now instead of a function, I think we can follow the torchvision-style Module transforms and move tile_size into __init__.
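For comparison, a quick sketch (hypothetical, not merged code) of the torchvision-style variant suggested here, with tile_size fixed at construction time:

```python
import torch


class TileCrop(torch.nn.Module):
    """torchvision-style variant: tile_size is a constructor argument, so
    forward() takes only the image."""

    def __init__(self, tile_size: int):
        super().__init__()
        self.tile_size = tile_size

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        c, ts = image.shape[0], self.tile_size
        tiles = image.unfold(1, ts, ts).unfold(2, ts, ts)
        return tiles.permute(1, 2, 0, 3, 4).reshape(-1, c, ts, ts)


print(TileCrop(tile_size=224)(torch.randn(3, 448, 672)).shape)  # torch.Size([6, 3, 224, 224])
```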

@lucylq lucylq force-pushed the lfq.export-preprocess branch 3 times, most recently from 70db4f1 to 5e2a3e3 Compare August 3, 2024 00:51
@lucylq lucylq force-pushed the lfq.export-preprocess branch from 5e2a3e3 to bf1d8e7 Compare August 3, 2024 00:51
@lucylq
Contributor Author

lucylq commented Aug 3, 2024

Rebased and ran tests from #1253

lucylq added a commit to lucylq/pytorch that referenced this pull request Aug 5, 2024
Summary:
Pull Request resolved: pytorch#132679

Remove the early exit for padding when padding = [0, 0, 0, 0].

This prevents export from specializing when all padding=0, allowing export when all padding >= 0. Specialization will still happen for negative padding.

This change will be used to export image preprocess for multimodal models, where images of dynamic shape are padded. As images are of dynamic shape, we can't be sure if padding will be required or not. Padding is guaranteed to be non-negative.

Preprocess code: pytorch/torchtune#1242

Note: the alternative is to wrap padding in a custom op, which isn't ideal given the custom op will contain the same impl as constant_pad_nd.

Test Plan: ci

Differential Revision: D60687727
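As an illustration of the use case described in this commit message, a sketch with made-up shapes and bounds (exact export behavior depends on the PyTorch version in use):

```python
import torch
import torch.nn.functional as F
from torch.export import Dim, export


class PadToMultiple(torch.nn.Module):
    """Pad H and W up to the next multiple of `block`; the padding may be zero."""

    def __init__(self, block: int = 224):
        super().__init__()
        self.block = block

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        h, w = image.shape[-2:]
        pad_h = (-h) % self.block
        pad_w = (-w) % self.block
        # With dynamic H/W we cannot know ahead of time whether the padding is all zeros,
        # so an early exit on padding == [0, 0, 0, 0] inside pad would force export to
        # specialize; removing it lets any non-negative padding trace through.
        return F.pad(image, [0, pad_w, 0, pad_h])


h, w = Dim("h", min=2, max=896), Dim("w", min=2, max=896)
ep = export(PadToMultiple(), (torch.randn(3, 300, 500),), dynamic_shapes={"image": {1: h, 2: w}})
```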
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Aug 5, 2024
Summary:
For preprocess, we have two custom ops, pad and reshape.
These are used to hide the implementation and any data dependent issues from export, allowing us to export the model.

The model definition is in D60051533, though we're trying to upstream it to torchtune: pytorch/torchtune#1242

This diff adds:
- Python impl + tests
- C++ function signatures + buck setup

Differential Revision: D60491675
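A rough sketch of this custom-op pattern (assuming the torch.library.custom_op API available in newer PyTorch releases; the op name and signature here are illustrative, not the actual ExecuTorch ops):

```python
import torch
import torch.nn.functional as F


@torch.library.custom_op("preprocess::pad", mutates_args=())
def pad_op(image: torch.Tensor, pad_right: int, pad_bottom: int) -> torch.Tensor:
    # Eager implementation; export only records the opaque custom op, not this body.
    return F.pad(image, [0, pad_right, 0, pad_bottom])


@pad_op.register_fake
def _(image: torch.Tensor, pad_right: int, pad_bottom: int) -> torch.Tensor:
    # Shape/dtype propagation so the op can be traced with fake tensors.
    c, h, w = image.shape
    return image.new_empty((c, h + pad_bottom, w + pad_right))


out = pad_op(torch.randn(3, 300, 400), pad_right=48, pad_bottom=148)
print(out.shape)  # torch.Size([3, 448, 448])
```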
@lucylq
Contributor Author

lucylq commented Aug 6, 2024

Working on #1269, which refactors the exported version into a separate file, after discussion with the torchtune team.

@lucylq lucylq closed this Aug 6, 2024
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Aug 6, 2024
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Aug 6, 2024
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Aug 7, 2024
lucylq added a commit to lucylq/executorch-1 that referenced this pull request Aug 7, 2024
lucylq added a commit to lucylq/pytorch that referenced this pull request Aug 14, 2024
lucylq added a commit to lucylq/pytorch that referenced this pull request Aug 14, 2024
lucylq added a commit to lucylq/pytorch that referenced this pull request Aug 14, 2024
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Aug 20, 2024
Pull Request resolved: #132679
Approved by: https://github.com/ezyang
pytorchmergebot pushed a commit to mori360/pytorch that referenced this pull request Aug 20, 2024
pytorch-bot bot pushed a commit to pytorch/pytorch that referenced this pull request Sep 13, 2024