Lookaside for torch.ops.higher_order.autograd_function_apply
#1256
base: main
Conversation
thunder/torch/__init__.py (Outdated)
Can we keep this code? I think it does a good job at separation of concerns. The job of "jit_ext" is to make things Thunder friendly.
Then it feels like we should have a better utility for making a callable friendly to Thunder before doing what this PR is trying to do.
You have already done the complicated part of making callables friendly to Thunder. What I mean is that it's now possible to remove a bit of the complexity of registering a grad rule by reusing the functions that were removed:
diff --git a/thunder/core/jit_ext.py b/thunder/core/jit_ext.py
index d98836a2..9d390e7d 100644
--- a/thunder/core/jit_ext.py
+++ b/thunder/core/jit_ext.py
@@ -864,9 +864,18 @@ def _general_jit_torch_ops_higher_order_autograd_function_apply(fwd, bwd, *fwd_a
bwd_trace._siginfo = SigInfo.from_name_and_args(f"bwd_{sym_id}", saved_values + grads)
@wraps(bwd_trace.python_callable())
- def bwd_impl_callable(*args, **kwargs):
+ def bwd_impl_callable(ctx, *args, **kwargs):
return thunder.core.trace_interpreter.interpret_trace(bwd_trace, *args, **kwargs)
+ @wraps(augmented_fwd_trace.python_callable())
+ def fwd_impl_callable(ctx, *args, **kwargs):
+ return thunder.core.trace_interpreter.interpret_trace(augmented_fwd_trace, *args, **kwargs)
+
+ from thunder.torch import autograd_function_apply
+ wrapped_fwd = wrap_const(fwd_impl_callable)
+ wrapped_bwd = wrap_const(bwd_impl_callable)
+ return interpreter_needs_wrap(autograd_function_apply)(wrapped_fwd, wrapped_bwd, *fwd_args, **fwd_kwargs)
+
@wraps(core_of_fwd)
def grad_transform(*args, **kwargs):
from thunder.core.transforms import get_grad, put_grads
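For readers outside this thread, here is a minimal standalone sketch of the fwd/bwd calling convention the diff above relies on. The function names are made up for illustration, and the argument order of bwd (saved values first, then output grads) is an assumption that mirrors the `saved_values + grads` siginfo built in the diff; it is not code from this PR.

```python
# Hypothetical illustration only: a fwd/bwd pair shaped like the callables the
# diff above hands to autograd_function_apply. The ctx argument is a placeholder
# kept solely to match the expected signature.
import torch

def fwd_impl(ctx, x):
    # Forward: compute the output and collect what bwd will need.
    out = torch.sin(x)
    saved_values = (x,)
    return out, saved_values

def bwd_impl(ctx, x, grad_out):
    # Backward: recompute cos(x) from the saved input and chain the incoming grad.
    return grad_out * torch.cos(x)

# Quick numeric check of the pair (no tracing involved):
x = torch.randn(4)
out, saved = fwd_impl(None, x)
grad_x = bwd_impl(None, *saved, torch.ones_like(out))
```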
I want the lookaside's scope to be limited to the preprocessing of PyTorch code. If the removed code is reused in the updated lookaside, we'll achieve that.
thunder/core/jit_ext.py (Outdated)
from thunder.core import utils
from thunder.core.baseutils import sequencify
from thunder.core.pytree import tree_flatten, tree_map
from thunder.core.transforms import VJPDual, augmented_forward_impls, backward_impls
These imports are unused.
thunder/core/jit_ext.py (Outdated)
if p in producer_map:
    prod_bsym = producer_map[p]
    tensor_to_prod_bsym[variableify(p)] = prod_bsym
prod_bsym_to_tensor = {v: unvariableify(k) for k, v in tensor_to_prod_bsym.items()}
This variable is unused.
thunder/core/jit_ext.py (Outdated)
return aug_fwd_trace
aug_fwd_result = aug_fwd_trace.output
output, saved_values = unwrap(aug_fwd_result)
wrapped_output = wrap(output, provenance=aug_fwd_provenance)
This variable is unused.
thunder/core/jit_ext.py (Outdated)
@wraps(core_of_fwd)
def grad_transform(*args, **kwargs):
This part can be removed if we reuse the existing functions and use interpreter_needs_wrap(autograd_function_apply) here.
thunder/torch/__init__.py (Outdated)
Let's not remove this code; instead, let's use it inside the lookaside.
What does this PR do?
As per #1248, the support of torch.ops.higher_order.autograd_function_apply would be a bit more flexible by tracing into both fwd and bwd.
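For context on where this higher-order op comes from: when a user-defined torch.autograd.Function is traced under torch.compile, Dynamo typically represents the .apply(...) call as torch.ops.higher_order.autograd_function_apply with separate fwd and bwd callables, which is what the lookaside here targets. Below is a minimal, hypothetical example of such a Function; the class and function names are made up and are not taken from this PR or its tests.

```python
# A small custom autograd.Function whose forward/backward pair is the kind of
# fwd/bwd that autograd_function_apply receives when the function is traced.
import torch

class Sin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sin(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * torch.cos(x)

def f(x):
    return Sin.apply(x)

x = torch.randn(4, requires_grad=True)
f(x).sum().backward()  # eager check that the fwd/bwd pair is consistent
```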