Enable export for CLIPImageTransform #1242
Conversation
CI: no failures as of commit bf1d8e7 with merge base 78055fa (artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1242).
Codecov report (coverage diff vs. main):

             main     #1242    +/-
  Coverage   27.41%   72.00%   +44.58%
  Files      233      231      -2
  Lines      10603    10570    -33
  Hits       2907     7611     +4704
  Misses     7696     2959     -4737
torchtune/models/clip/_transforms.py
Outdated
image,
size=[target_h, target_w],
interpolation=self.config.resample,
antialias=True,
Note, to lower to ExecuTorch we have to set this to False. We currently don't have a kernel for upsample when antialias=True.
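For context, the behavior in question can be sketched with the underlying torch op (a hedged sketch, not the torchtune code; `torch.nn.functional.interpolate` is the low-level resize this flag feeds into):

```python
import torch
import torch.nn.functional as F

image = torch.rand(1, 3, 224, 224)  # NCHW

# antialias=True has no ExecuTorch upsample kernel (per the thread),
# so export paths would use antialias=False instead.
out = F.interpolate(image, size=(448, 448), mode="bilinear", antialias=False)
print(out.shape)  # torch.Size([1, 3, 448, 448])
```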
I sent a message to someone in GenAI to see what they think
image,
size=[target_h, target_w],
interpolation=self.resample,
antialias=self.antialias,
Add it as an argument, so we can configure it on the ExecuTorch side as well.
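The pattern the comment asks for might look like this (hypothetical sketch with illustrative names, not the actual torchtune class): expose antialias as a constructor argument so the ExecuTorch side can flip it off.

```python
import torch


class ResizeCore(torch.nn.Module):
    """Illustrative sketch: antialias as a constructor argument."""

    def __init__(self, antialias: bool = True):
        super().__init__()
        self.antialias = antialias

    def forward(self, image: torch.Tensor, target_h: int, target_w: int) -> torch.Tensor:
        # image: (C, H, W); interpolate expects a batch dimension
        return torch.nn.functional.interpolate(
            image.unsqueeze(0),
            size=(target_h, target_w),
            mode="bilinear",
            antialias=self.antialias,
        ).squeeze(0)


server_side = ResizeCore(antialias=True)       # eager / server-side quality
executorch_side = ResizeCore(antialias=False)  # ExecuTorch-lowerable
out = executorch_side(torch.rand(3, 100, 100), 224, 224)
assert out.shape == (3, 224, 224)
```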
Mostly small comments, will leave it for @felipemello1 to confirm details of the actual transforms
torchtune/models/clip/_transforms.py
Outdated
The check mitigates data-dependent errors that may occur during torch.export. It installs
a deferred runtime assert instead of a compile-time guard. For more context:
https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit
Is this doc publicly available? If not I'd just leave it out or give a tldr
Yes, the doc is publicly available, though we can also give a tl;dr if that's preferred. Something like:
Data-dependent errors usually occur in models with data-dependent control flow, e.g. via .item(), tolist(), or nonzero().
torchtune/models/clip/_transforms.py
Outdated
from torchvision.transforms.v2 import functional as F

logger = logging.getLogger(__name__)


class CLIPImageTransformCore(torch.nn.Module):
Just to confirm after our discussion today: the idea is to fully make CLIPImageTransform exportable, but we are not there yet, right? If so, do we intend to land this as an intermediate version first? I ask because with this PR some of our utilities like resize_with_pad will no longer have any call sites in the repo, and I want to know whether this is expected to be the final state or just a temporary one.
Yeah - ideally we can fully export CLIPImageTransform, but right now that would require more support on the export side. With the refactor, I think we can remove resize_with_pad and its tests.
Do you mind updating the unit tests (tile_crop, if there are any) and deleting the unused code (e.g. resize_with_pad)?
You would also need to update the function here: https://github.com/pytorch/torchtune/blob/main/docs/source/api_ref_modules.rst#vision-transforms (this is probably why the doc is failing to build),
and the __init__.py that imports resize_with_pad.
torchtune/models/clip/_transforms.py
Outdated
@@ -69,6 +158,7 @@ class CLIPImageTransform:
resolution from possible_resolutions.
If False, it will pick the resolution that minimizes downscaling, including no downscaling at all.
In this case, the image will only be upscaled if its size < tile_size. Default False.
max_image_size (int): Used to upper-bound the input image size for export.
For a regular user, I am not sure they will understand what "for export" means. Will users change this? If not, should it be a global constant in the script? Maybe adding a bit more context would solve it.
Actually, the height/width of the preprocessed image is constrained by max_num_tiles * tile_size, so I can use that instead of an arbitrary value. Thanks for pointing this out!
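In other words (a hedged sketch with assumed example values), the export-time upper bound falls out of existing parameters rather than being an arbitrary constant:

```python
# assumed example values; the real config may differ
max_num_tiles = 4
tile_size = 224

# The preprocessed image's height/width can never exceed the tiled canvas,
# so this product is a natural upper bound for export's dynamic shapes.
max_image_size = max_num_tiles * tile_size
print(max_image_size)  # 896
```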
# Pad, such that the image is on the top-left and padded on the right-bottom.
sizes = [3, canvas_h, canvas_w]
padding = [0, canvas_h - target_h, 0, canvas_w - target_w]
output = v2.Pad(padding=padding)(image)
Should we use a functional here, or add pad to __init__, so it follows the same pattern as the other transforms?
Oh hmm, I'm not sure we can place it into __init__, because we initialize the nn.Module with potentially different padding for each call.
The reason we're using an nn.Module instead of the functional version is so we can do a module swap with our custom op (we have some issues exporting pad).
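The per-call padding the reply describes looks roughly like this (a sketch using the functional pad for illustration; the PR itself wraps `v2.Pad` in an `nn.Module` so it can later be swapped for a custom op):

```python
import torch
import torch.nn.functional as F


def pad_bottom_right(image: torch.Tensor, canvas_h: int, canvas_w: int) -> torch.Tensor:
    _, h, w = image.shape
    # F.pad order for the last two dims: (left, right, top, bottom);
    # the image stays top-left, padding goes to the right and bottom.
    return F.pad(image, (0, canvas_w - w, 0, canvas_h - h))


padded = pad_bottom_right(torch.rand(3, 200, 300), canvas_h=224, canvas_w=336)
assert padded.shape == (3, 224, 336)
```

Because the padding amounts depend on each incoming image's size, they cannot be frozen into the module's constructor.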
I asked a researcher, but no answers yet. @pbontrager thinks it probably is. To unblock testing, maybe it's ok to go like this for now, and come back to "antialias=True" when we have an answer? I just don't know if blocking on this is productive.
By the way, we would have to delete this unit test too: https://github.com/pytorch/torchtune/blob/main/tests/torchtune/modules/transforms/test_resize_with_pad.py
Oh - I think I removed it, though the docs build is still failing. It might be due to TileCrop? I repro'd the error locally, though I still can't pinpoint where it's coming from.
Sphinx error messages are the worst! You could try to do this to debug:
To build the docs, you can follow the steps here: https://github.com/pytorch/torchtune/blob/main/CONTRIBUTING.md#building-docs. Thank you for doing all this!
FYI: we realized that the unit tests are missing regression tests. I will add them tomorrow so we are 100% sure about parity.
Thanks, that helped - TileCrop was missing a docstring.
def __init__(self):
    super().__init__()

def forward(self, image: torch.Tensor, tile_size: int) -> torch.Tensor:
If this is a module now instead of a function, I think we can follow torchvision-style Module transforms and move tile_size into __init__.
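A torchvision-style version with tile_size in the constructor might look like this (hedged sketch; the actual torchtune signature and tiling order may differ):

```python
import torch


class TileCrop(torch.nn.Module):
    def __init__(self, tile_size: int):
        super().__init__()
        self.tile_size = tile_size

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        c, h, w = image.shape
        t = self.tile_size
        # split (c, h, w) into a row-major grid of t x t tiles:
        # (c, h//t, t, w//t, t) -> (h//t, w//t, c, t, t) -> (n_tiles, c, t, t)
        tiles = image.reshape(c, h // t, t, w // t, t)
        return tiles.permute(1, 3, 0, 2, 4).reshape(-1, c, t, t)


crop = TileCrop(tile_size=224)
out = crop(torch.rand(3, 448, 448))
assert out.shape == (4, 3, 224, 224)
```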
Rebased and ran tests from #1253.
Summary: Pull Request resolved: pytorch#132679 Remove the early exit for padding when padding = [0, 0, 0, 0]. This prevents export from specializing when all padding=0, allowing export when all padding >= 0. Specialization will still happen for negative padding. This change will be used to export image preprocess for multimodal models, where images of dynamic shape are padded. As images are of dynamic shape, we can't be sure if padding will be required or not. Padding is guaranteed to be non-negative. Preprocess code: pytorch/torchtune#1242 Note: the alternative is to wrap padding in a custom op, which isn't ideal given the custom op will contain the same impl as constant_pad_nd. Test Plan: ci Differential Revision: D60687727
Summary: For preprocess, we have two custom ops, pad and reshape. These are used to hide the implementation and any data dependent issues from export, allowing us to export the model. The model definition is in D60051533, though we're trying to upstream it to torchtune: pytorch/torchtune#1242 This diff adds: - Python impl + tests - C++ function signatures + buck setup Differential Revision: D60491675
Working on #1269, which refactors the exported version into a separate file, after discussion with the torchtune team.
Summary: Remove the early exit for padding when padding = [0, 0, 0, 0]. This prevents export from specializing when all padding=0, allowing export when all padding >= 0. Specialization will still happen for negative padding. This change will be used to export image preprocess for multimodal models, where images of dynamic shape are padded. As images are of dynamic shape, we can't be sure if padding will be required or not. Padding is guaranteed to be non-negative. Preprocess code: pytorch/torchtune#1242 Note: the alternative is to wrap padding in a custom op, which isn't ideal given the custom op will contain the same impl as constant_pad_nd. Test Plan: ci Differential Revision: D60687727 Pull Request resolved: #132679 Approved by: https://github.com/ezyang
Context
What is the purpose of this PR? Is it to
Please link to any issues this PR addresses.
Changelog
What are the changes made in this PR?
Refactor CLIPImageTransform so that it can be exported, and shared with other repos (eg. ExecuTorch). This includes:
torchtune/models/clip/_transforms.py
torchtune/modules/transforms/vision_utils/tile_crop.py
Update tests
Update code references
torchtune/modules/transforms/__init__.py
torchtune/modules/vision_transformer.py
docs/source/api_ref_modules.rst
I think we can remove resize_with_pad.py with associated tests after this change.
Test plan
Run pytest in directories:
Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help.)
pre-commit install
pytest tests
pytest tests -m integration_test
E AttributeError: `np.string_` was removed in the NumPy 2.0 release. Use `np.bytes_` instead.