
[Flow] Add TensorReshapeOp canonicalization #18729

Closed

Conversation

@IanWood1 (Contributor) commented Oct 9, 2024

Canonicalize flow.tensor.reshape -> tensor.expand_shape/tensor.collapse_shape -> flow.tensor.reshape chains where the first flow.tensor.reshape only converts static dims to dynamic ones and the second converts them back.

For example:

%0 = flow.tensor.reshape %arg0 : tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
%expanded_0 = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32] : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
%1 = flow.tensor.reshape %expanded_0 : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>

// is transformed to...

%expanded_0 = tensor.expand_shape %arg0 [[0], [1], [2, 3]] output_shape [4, 1, 256, 32] : tensor<4x1x8192xf16> into tensor<4x1x256x32xf16>

// Similarly, for the collapse case:

%0 = flow.tensor.reshape %arg0 : tensor<4x1x256x32xf16> -> tensor<?x?x256x32xf16>{%c4, %c1}
%collapsed = tensor.collapse_shape %0 [[0], [1], [2, 3]] : tensor<?x?x256x32xf16> into tensor<?x?x8192xf16>
%1 = flow.tensor.reshape %collapsed : tensor<?x?x8192xf16>{%c4, %c1} -> tensor<4x1x8192xf16>

// is transformed to...

%collapsed = tensor.collapse_shape %arg0 [[0], [1], [2, 3]] : tensor<4x1x256x32xf16> into tensor<4x1x8192xf16>

@IanWood1 force-pushed the canonicalize_castlike_reshapes branch from e3a633f to cc455d5 on October 9, 2024 at 21:28
@hanhanW (Contributor) commented Oct 10, 2024

I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.

E.g.,

  %0 = flow.tensor.reshape %arg0: tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
  %expanded_0 = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32] : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
  %1 = flow.tensor.reshape %expanded_0 : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>

For the tensor.expand_shape op, we can infer the output shape because %c4 and %c1 are constants. So it becomes:

%cast_0 = tensor.cast %0 : tensor<?x?x8192xf16> to tensor<4x1x8192xf16>
%expanded_0 = tensor.expand_shape %cast_0 [[0], [1], [2, 3]] output_shape [4, 1, 256, 32] : tensor<4x1x8192xf16> into tensor<4x1x256x32xf16>
%cast_1 = tensor.cast %expanded_0 : tensor<4x1x256x32xf16> to tensor<?x?x256x32xf16>

For the flow.tensor.reshape, it can be folded to a tensor.cast because the dynamic sizes are just constants:

%cast_2 = tensor.cast %cast_1 : tensor<?x?x256x32xf16> to tensor<4x1x256x32xf16>

Then it becomes a nop?

Your patch could be valuable for real dynamic cases though. I'm just throwing out the question because the shapes in the test are all static (or rather, statically known).

@hanhanW (Contributor) left a comment:

Adding some IR examples before and after the folder to the PR description would help.

@IanWood1 (Contributor, Author) commented

> I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.

Do you know where this is being done for linalg ops?

@hanhanW (Contributor) commented Oct 10, 2024

> > I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.
>
> Do you know where this is being done for linalg ops?

Here is the implementation for LinalgOps: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp#L2589-L2595

Here is the implementation for tensor.pack ops: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L4324-L4355
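
For reference, such a pattern would be roughly the following shape. This is a minimal sketch in the spirit of the Linalg static-shape inference linked above, not the upstream code; the accessor names (e.g. getMixedOutputShape()) and the OpFoldResult-based expand_shape builder are assumptions:

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/StaticValueUtils.h"
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// Sketch: if every dynamic extent in the expand_shape's output_shape is a
// constant, rewrite to a fully static expand_shape and cast on both sides so
// the surrounding types are unchanged. The inserted casts can then fold with
// neighboring casts (e.g. the ones produced from flow.tensor.reshape above).
struct InferExpandShapeStaticShapes
    : public OpRewritePattern<tensor::ExpandShapeOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(tensor::ExpandShapeOp op,
                                PatternRewriter &rewriter) const override {
    RankedTensorType resultTy = op.getResultType();
    if (resultTy.hasStaticShape())
      return failure();

    // Resolve every dynamic result dimension to a constant, or bail.
    SmallVector<int64_t> resultShape(resultTy.getShape());
    SmallVector<OpFoldResult> outputShape = op.getMixedOutputShape();
    for (auto [idx, extent] : llvm::enumerate(outputShape)) {
      if (!ShapedType::isDynamic(resultShape[idx]))
        continue;
      std::optional<int64_t> cst = getConstantIntValue(extent);
      if (!cst)
        return failure();
      resultShape[idx] = *cst;
      outputShape[idx] = rewriter.getIndexAttr(*cst);
    }

    // The (static) source shape is the per-group product of the result shape.
    SmallVector<int64_t> srcShape;
    for (const auto &group : op.getReassociationIndices()) {
      int64_t extent = 1;
      for (int64_t dim : group)
        extent *= resultShape[dim];
      srcShape.push_back(extent);
    }

    Location loc = op.getLoc();
    Value staticSrc = rewriter.create<tensor::CastOp>(
        loc, op.getSrcType().clone(srcShape), op.getSrc());
    Value staticExpand = rewriter.create<tensor::ExpandShapeOp>(
        loc, resultTy.clone(resultShape), staticSrc,
        op.getReassociationIndices(), outputShape);
    // Cast back so users keep seeing the original (dynamic) type.
    rewriter.replaceOpWithNewOp<tensor::CastOp>(op, resultTy, staticExpand);
    return success();
  }
};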

@IanWood1 (Contributor, Author) commented Oct 10, 2024

> > > I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops.
> >
> > Do you know where this is being done for linalg ops?
>
> Here is the implementation for LinalgOps: llvm/llvm-project@99c8557/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp#L2589-L2595
>
> Here is the implementation for tensor.pack ops: llvm/llvm-project@99c8557/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L4324-L4355

Given that tensor.cast ops get converted to flow.tensor.reshape ops and this is done for linalg.generic ops as well as tensor.pack ops, should this be an upstream canonicalization? Or is there a reason this isn't being done?

Also, to address the question: I think what you were saying makes sense and seems like a better solution. I was only trying to target cases where the tensor ops should be fully static.

@hanhanW (Contributor) commented Oct 10, 2024

> Given that tensor.cast ops get converted to flow.tensor.reshape ops and this is done for linalg.generic ops as well as tensor.pack ops, should this be an upstream canonicalization? Or is there a reason this isn't being done?

Do you mean we should add a shape inference pattern to tensor.expand_shape op's canonicalization patterns? Yes, I think so.

If I read it correctly, it was tensor.cast -> expand_shape -> tensor.cast and it becomes flow.tensor.reshape -> expand_shape -> flow.tensor.reshape, right? Then if we have the shape inference pattern for tensor.expand_shape, the tensor.cast chain will be folded away, and it becomes a single expand_shape op. If this is the case, then yes.


For the other example, I don't have enough context. We could infer some dimensions for the collapse_shape op for sure, but the generated IR is not what you have in the PR: the second dimension can't be inferred without implementing shape inference for the flow.tensor.reshape op.

util.func public @canonicalizeReshapeCollapse(%arg0: tensor<4x1x256x32xf16>) -> tensor<4x1x8192xf16> {
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %0 = flow.tensor.reshape %arg0: tensor<4x1x256x32xf16> -> tensor<?x?x256x32xf16>{%c4, %c1}
  %expanded_0 = tensor.collapse_shape %0 [[0], [1], [2, 3]] : tensor<?x?x256x32xf16> into tensor<?x?x8192xf16>
  %1 = flow.tensor.reshape %expanded_0 : tensor<?x?x8192xf16>{%c4, %c1} -> tensor<4x1x8192xf16>
  util.return %1 : tensor<4x1x8192xf16>
}

@IanWood1 (Contributor, Author) commented

> If I read it correctly, it was tensor.cast -> expand_shape -> tensor.cast and it becomes flow.tensor.reshape -> expand_shape -> flow.tensor.reshape, right? Then if we have the shape inference pattern for tensor.expand_shape, the tensor.cast chain will be folded away, and it becomes a single expand_shape op. If this is the case, then yes.

Yes

> For the other example, I don't have enough context. We could infer some dimensions for the collapse_shape op for sure, but the generated IR is not what you have in the PR: the second dimension can't be inferred without implementing shape inference for the flow.tensor.reshape op.

I see your point. Is there a reason why flow.tensor.reshape doesn't implement something like ReifyRankedShapedTypeInterface? Then the collapse_shape could query the interface for constant values to reify.

@hanhanW (Contributor) commented Oct 10, 2024

> I see your point. Is there a reason why flow.tensor.reshape doesn't implement something like ReifyRankedShapedTypeInterface? Then the collapse_shape could query the interface for constant values to reify.

I don't know. I think it's just that we never had the need, or the interface did not exist when we added the ops. The shape inference for pack ops was implemented a few months ago; we implemented it because we had the need. It is still not mature.

I think we do miss some interface implementations for flow ops. E.g., I implemented an interface method for flow.dispatch.tensor.load/store half a year ago.

https://github.com/iree-org/iree/blob/main/compiler/src/iree/compiler/ExternalInterfaces/FlowExternalModels.cpp
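
I don't know whether we want it, but a hypothetical external model for flow.tensor.reshape could look something like the sketch below. This is only an illustration; the Flow accessor names (e.g. getResultDims()) are assumptions:

#include "iree/compiler/Dialect/Flow/IR/FlowOps.h"
#include "mlir/Interfaces/InferTypeOpInterface.h"

using namespace mlir;

// Hypothetical external model attaching ReifyRankedShapedTypeOpInterface to
// flow.tensor.reshape: static result dims come from the result type, dynamic
// ones from the op's result_dims operands.
struct TensorReshapeReifyModel
    : public ReifyRankedShapedTypeOpInterface::ExternalModel<
          TensorReshapeReifyModel, IREE::Flow::TensorReshapeOp> {
  LogicalResult
  reifyResultShapes(Operation *op, OpBuilder &b,
                    ReifiedRankedShapedTypeDims &reifiedShapes) const {
    auto reshape = cast<IREE::Flow::TensorReshapeOp>(op);
    auto resultTy = cast<RankedTensorType>(reshape.getResult().getType());
    SmallVector<OpFoldResult> shape;
    unsigned dynIdx = 0;
    for (int64_t dim : resultTy.getShape()) {
      if (ShapedType::isDynamic(dim))
        shape.push_back(reshape.getResultDims()[dynIdx++]);
      else
        shape.push_back(b.getIndexAttr(dim));
    }
    reifiedShapes.push_back(std::move(shape));
    return success();
  }
};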

@benvanik (Collaborator) commented

We generally should not be mixing flow ops with non-flow ops, and flow doesn't need ReifyRankedShapedTypeInterface (and shouldn't - that interface is really garbage heavy). I'm still skeptical of any code that's using both tensor dialect and flow tensor ops as that indicates a bug in the layering: we accept tensor ops and produce flow ops, and shouldn't be trying to cross the streams.

@IanWood1 (Contributor, Author) commented

> I'm still skeptical of any code that's using both tensor dialect and flow tensor ops as that indicates a bug in the layering: we accept tensor ops and produce flow ops, and shouldn't be trying to cross the streams.

Fair point. The problem is that #18351 causes all tensor.cast ops to be converted to flow.tensor.reshape ops as early as global optimization, and these cast-like flow ops get in the way of DispatchCreation's reshape propagation. Would the solution be to treat these ops as opaque during DispatchCreation (which I think was the goal of the linked PR, since it was preventing tensor.cast ops from being cloned into dispatches), and to try to address the source of the tensor.cast ops earlier?

@benvanik (Collaborator) commented

I think what's happened over time in dispatch creation with partial conversion in various steps is not great (I understand and can sympathize with how it got there, but it's a bad place to be for these kinds of reasons :). The flow ops are only intended for the host, and by having them prior to dispatch region formation (which is creating code for the device) we will run into situations like this frequently.

We never want to be in the situation where we are reconverting (tensor->flow->tensor) for dispatch region formation, so the only path that makes sense to me is making dispatch region formation only convert to flow after it does its formation (as was originally the case). If there are ops needed prior to that which behave more like tensor than flow, we should create a tensor_ext dialect and put them there.

I think flow.tensor.reshape was chosen because it does something different than the upstream ops and not because it was the correct op to use. If the issue (which I very much suspect) was that upstream is a pain and changing things is impossible, then making our own tensor_ext.cast that does what we want is the path forward. Then we can implement the ReifyShapeMumble stuff, fold aggressively knowing that we're only dealing with tensor/tensor_ext/linalg ops, and treat all flow ops as blocking for dispatch region formation.

@hanhanW (Contributor) commented Oct 10, 2024

Does adding shape inference to expand/collapse_shape ops' canonicalization patterns fix your issue?

@IanWood1 (Contributor, Author) commented

@benvanik and @hanhanW, I just talked with Mahesh. Adding a canonicalizer to expand/collapse_shape makes the most sense, as @hanhanW originally suggested; then there would be no reason to deal with both Flow and Tensor ops. Additionally, I'll look at rolling back #18351 (issue: #18229) to stop the early conversion of tensor.cast ops and check what happens to the original issue.

@IanWood1 closed this Oct 11, 2024