
[Flow] Add TensorReshapeOp canonicalization #18729

Closed

Conversation

@IanWood1 (Contributor) commented Oct 9, 2024

Canonicalize flow.tensor.reshape -> tensor.expand_shape/tensor.collapse_shape -> flow.tensor.reshape chains where the first flow.tensor.reshape only converts static dims to dynamic ones and the second converts them back.

For example:

%0 = flow.tensor.reshape %arg0 : tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
%expanded_0 = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32] : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
%1 = flow.tensor.reshape %expanded_0 : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>

// is transformed to...

%expanded_0 = tensor.expand_shape %arg0 [[0], [1], [2, 3]] output_shape [4, 1, 256, 32] : tensor<4x1x8192xf16> into tensor<4x1x256x32xf16>

// Similarly, for the collapse case:

%0 = flow.tensor.reshape %arg0 : tensor<4x1x256x32xf16> -> tensor<?x?x256x32xf16>{%c4, %c1}
%collapsed = tensor.collapse_shape %0 [[0], [1], [2, 3]] : tensor<?x?x256x32xf16> into tensor<?x?x8192xf16>
%1 = flow.tensor.reshape %collapsed : tensor<?x?x8192xf16>{%c4, %c1} -> tensor<4x1x8192xf16>

// is transformed to...

%collapsed = tensor.collapse_shape %arg0 [[0], [1], [2, 3]] : tensor<4x1x256x32xf16> into tensor<4x1x8192xf16>

@IanWood1 force-pushed the canonicalize_castlike_reshapes branch from e3a633f to cc455d5 on October 9, 2024 at 21:28
@hanhanW (Contributor) commented Oct 10, 2024

I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.

E.g.,

  %0 = flow.tensor.reshape %arg0: tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
  %expanded_0 = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32] : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
  %1 = flow.tensor.reshape %expanded_0 : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>

For the tensor.expand_shape op, we can infer the output shape because %c4 and %c1 are constants. So it becomes:

%cast_0 = tensor.cast %0 : tensor<?x?x8192xf16> to tensor<4x1x8192xf16>
%expanded_0 = tensor.expand_shape %cast_0 [[0], [1], [2, 3]] output_shape [4, 1, 256, 32] : tensor<4x1x8192xf16> into tensor<4x1x256x32xf16>
%cast_1 = tensor.cast %expanded_0 : tensor<4x1x256x32xf16> to tensor<?x?x256x32xf16>

For the flow.tensor.reshape, it can be folded to a tensor.cast because the dynamic sizes are just constants:

%cast_2 = tensor.cast %cast_1 : tensor<?x?x256x32xf16> to tensor<4x1x256x32xf16>

Then it becomes a nop?

Your patch could be valuable for real dynamic cases though. I'm just throwing out the question because the shapes in the test are all static (or rather, statically known).

@hanhanW (Contributor) left a comment:

Adding some IR examples before and after the folder to the PR description would help.

@IanWood1 (Contributor, Author) commented

> I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.

Do you know where this is being done for linalg ops?

@hanhanW (Contributor) commented Oct 10, 2024

> > I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.
>
> Do you know where this is being done for linalg ops?

Here is the implementation for LinalgOps: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp#L2589-L2595

Here is the implementation for tensor.pack ops: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L4324-L4355
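
For reference, such a pattern would be roughly the following shape. This is a minimal sketch in the spirit of the Linalg static-shape inference linked above, not the upstream code; the accessor names (e.g. getMixedOutputShape()) and the OpFoldResult-based expand_shape builder are assumptions:

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/StaticValueUtils.h"
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Sketch: if every dynamic extent in the expand_shape's output_shape is a
// constant, rewrite to a fully static expand_shape and cast on both sides so
// the surrounding types are unchanged. The inserted casts can then fold with
// neighboring casts (e.g. the ones produced from flow.tensor.reshape above).
struct InferExpandShapeStaticShapes
    : public OpRewritePattern<tensor::ExpandShapeOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(tensor::ExpandShapeOp op,
                                PatternRewriter &rewriter) const override {
    RankedTensorType resultTy = op.getResultType();
    if (resultTy.hasStaticShape())
      return failure();

    // Resolve every dynamic result dimension to a constant, or bail.
    SmallVector<int64_t> resultShape(resultTy.getShape());
    SmallVector<OpFoldResult> outputShape = op.getMixedOutputShape();
    for (auto [idx, extent] : llvm::enumerate(outputShape)) {
      if (!ShapedType::isDynamic(resultShape[idx]))
        continue;
      std::optional<int64_t> cst = getConstantIntValue(extent);
      if (!cst)
        return failure();
      resultShape[idx] = *cst;
      outputShape[idx] = rewriter.getIndexAttr(*cst);
    }

    // The (static) source shape is the per-group product of the result shape.
    SmallVector<int64_t> srcShape;
    for (const auto &group : op.getReassociationIndices()) {
      int64_t extent = 1;
      for (int64_t dim : group)
        extent *= resultShape[dim];
      srcShape.push_back(extent);
    }

    Location loc = op.getLoc();
    Value staticSrc = rewriter.create<tensor::CastOp>(
        loc, op.getSrcType().clone(srcShape), op.getSrc());
    Value staticExpand = rewriter.create<tensor::ExpandShapeOp>(
        loc, resultTy.clone(resultShape), staticSrc,
        op.getReassociationIndices(), outputShape);
    // Cast back so users keep seeing the original (dynamic) type.
    rewriter.replaceOpWithNewOp<tensor::CastOp>(op, resultTy, staticExpand);
    return success();
  }
};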

@IanWood1 (Contributor, Author) commented Oct 10, 2024

> > > I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.
> >
> > Do you know where this is being done for linalg ops?
>
> Here is the implementation for LinalgOps: llvm/llvm-project@99c8557/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp#L2589-L2595
>
> Here is the implementation for tensor.pack ops: llvm/llvm-project@99c8557/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L4324-L4355

Given that tensor.cast ops get converted to flow.tensor.reshape ops and this is done for linalg.generic ops as well as tensor.pack ops, should this be an upstream canonicalization? Or is there a reason this isn't being done?

Also, to address the question: I think what you were saying makes sense and seems like a better solution. I was only trying to target cases where the tensor ops should be fully static.

@hanhanW (Contributor) commented Oct 10, 2024

> Given that tensor.cast ops get converted to flow.tensor.reshape ops and this is done for linalg.generic ops as well as tensor.pack ops, should this be an upstream canonicalization? Or is there a reason this isn't being done?

Do you mean we should add a shape inference pattern to tensor.expand_shape op's canonicalization patterns? Yes, I think so.

If I read it correctly, it was tensor.cast -> expand_shape -> tensor.cast and it becomes flow.tensor.reshape -> expand_shape -> flow.tensor.reshape, right? Then if we have the shape inference pattern for tensor.expand_shape, the tensor.cast chain will be folded away, and it becomes a single expand_shape op. If this is the case, then yes.


For the other example, I don't have enough context. We could infer some dimensions for the collapse_shape op for sure, but the generated IR is not what you have in the PR: the second dimension can't be inferred without implementing shape inference for the flow.tensor.reshape op.

util.func public @canonicalizeReshapeCollapse(%arg0: tensor<4x1x256x32xf16>) -> tensor<4x1x8192xf16> {
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %0 = flow.tensor.reshape %arg0: tensor<4x1x256x32xf16> -> tensor<?x?x256x32xf16>{%c4, %c1}
  %expanded_0 = tensor.collapse_shape %0 [[0], [1], [2, 3]] : tensor<?x?x256x32xf16> into tensor<?x?x8192xf16>
  %1 = flow.tensor.reshape %expanded_0 : tensor<?x?x8192xf16>{%c4, %c1} -> tensor<4x1x8192xf16>
  util.return %1 : tensor<4x1x8192xf16>
}

@IanWood1 (Contributor, Author) commented

> If I read it correctly, it was tensor.cast -> expand_shape -> tensor.cast and it becomes flow.tensor.reshape -> expand_shape -> flow.tensor.reshape, right? Then if we have the shape inference pattern for tensor.expand_shape, the tensor.cast chain will be folded away, and it becomes a single expand_shape op. If this is the case, then yes.

Yes

> For the other example, I don't have enough context. We could infer some dimensions for the collapse_shape op for sure, but the generated IR is not what you have in the PR: the second dimension can't be inferred without implementing shape inference for the flow.tensor.reshape op.

I see your point. Is there a reason why flow.tensor.reshape doesn't implement something like ReifyRankedShapedTypeInterface? Then the collapse_shape could query the interface for constant values to reify.

@hanhanW (Contributor) commented Oct 10, 2024

> I see your point. Is there a reason why flow.tensor.reshape doesn't implement something like ReifyRankedShapedTypeInterface? Then the collapse_shape could query the interface for constant values to reify.

I don't know. I think it's just that we never had the need, or the interface did not exist when we added the ops. The shape inference for pack ops was implemented a few months ago; we implemented it because we had the need. It is still not mature.

I think we do miss some interface implementations for flow ops. E.g., I implemented an interface method for flow.dispatch.tensor.load/store half a year ago.

https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/ExternalInterfaces/FlowExternalModels.cpp
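
I don't know whether we want it, but a hypothetical external model for flow.tensor.reshape could look something like the sketch below. This is only an illustration; the Flow accessor names (e.g. getResultDims()) are assumptions:

#include "iree/compiler/Dialect/Flow/IR/FlowOps.h"
#include "mlir/Interfaces/InferTypeOpInterface.h"

using namespace mlir;

// Hypothetical external model attaching ReifyRankedShapedTypeOpInterface to
// flow.tensor.reshape: static result dims come from the result type, dynamic
// ones from the op's result_dims operands.
struct TensorReshapeReifyModel
    : public ReifyRankedShapedTypeOpInterface::ExternalModel<
          TensorReshapeReifyModel, IREE::Flow::TensorReshapeOp> {
  LogicalResult
  reifyResultShapes(Operation *op, OpBuilder &b,
                    ReifiedRankedShapedTypeDims &reifiedShapes) const {
    auto reshape = cast<IREE::Flow::TensorReshapeOp>(op);
    auto resultTy = cast<RankedTensorType>(reshape.getResult().getType());
    SmallVector<OpFoldResult> shape;
    unsigned dynIdx = 0;
    for (int64_t dim : resultTy.getShape()) {
      if (ShapedType::isDynamic(dim))
        shape.push_back(reshape.getResultDims()[dynIdx++]);
      else
        shape.push_back(b.getIndexAttr(dim));
    }
    reifiedShapes.push_back(std::move(shape));
    return success();
  }
};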

@benvanik (Collaborator) commented

We generally should not be mixing flow ops with non-flow ops, and flow doesn't need ReifyRankedShapedTypeInterface (and shouldn't - that interface is really garbage heavy). I'm still skeptical of any code that's using both tensor dialect and flow tensor ops as that indicates a bug in the layering: we accept tensor ops and produce flow ops, and shouldn't be trying to cross the streams.

@IanWood1 (Contributor, Author) commented

> I'm still skeptical of any code that's using both tensor dialect and flow tensor ops as that indicates a bug in the layering: we accept tensor ops and produce flow ops, and shouldn't be trying to cross the streams.

Fair point. The problem is that #18351 causes all tensor.cast ops to be converted to flow.tensor.reshape ops as early as global optimization, and these cast-like flow ops get in the way of DispatchCreation's reshape propagation. Would the solution be to treat these ops as opaque during DispatchCreation (which I think was the goal of the linked PR, since it was preventing tensor.cast ops from being cloned into dispatches), and to try to address the source of the tensor.cast ops earlier?

@benvanik (Collaborator) commented

I think what's happened over time in dispatch creation with partial conversion in various steps is not great (I understand and can sympathize with how it got there, but it's a bad place to be for these kinds of reasons :). The flow ops are only intended for the host, and by having them prior to dispatch region formation (which is creating code for the device) we will run into situations like this frequently.

We never want to be in the situation where we are reconverting (tensor->flow->tensor) for dispatch region formation, so the only path that makes sense to me is making dispatch region formation only convert to flow after it does its formation (as was originally the case). If there are ops needed prior to that which behave more like tensor than flow, we should create a tensor_ext dialect and put them there.

I think flow.tensor.reshape was chosen because it does something different than the upstream ops and not because it was the correct op to use. If the issue (which I very much suspect) was that upstream is a pain and changing things is impossible, then making our own tensor_ext.cast that does what we want is the path forward. Then we can implement the ReifyShapeMumble stuff, fold aggressively knowing that we're only dealing with tensor/tensor_ext/linalg ops, and treat all flow ops as blocking for dispatch region formation.

@hanhanW (Contributor) commented Oct 10, 2024

Does adding shape inference to expand/collapse_shape ops' canonicalization patterns fix your issue?

@IanWood1 (Contributor, Author) commented

@benvanik and @hanhanW, I just talked with Mahesh. Adding a canonicalizer to expand/collapse_shape makes the most sense, as @hanhanW originally suggested; then there would be no reason to deal with both Flow and Tensor ops. Additionally, I'll look at rolling back #18351 (issue: #18229) to stop the early conversion of tensor.cast ops and check what happens to the original issue.

@IanWood1 closed this Oct 11, 2024