[Flow] Add TensorReshapeOp canonicalization #18729
Conversation
Force-pushed e3a633f to cc455d5
Signed-off-by: Ian Wood <[email protected]>
Force-pushed cc455d5 to 9ad1c08
I wonder if we can do shape inference for all these ops. After we fold the tensor.cast chain away, they become static shapes. This is what we've done for linalg ops. E.g., for

```mlir
%0 = flow.tensor.reshape %arg0 : tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
%expanded_0 = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32] : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
%1 = flow.tensor.reshape %expanded_0 : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>
```

For the tensor.expand_shape op, shape inference would produce

```mlir
%cast_0 = tensor.cast %0 : tensor<?x?x8192xf16> to tensor<4x1x8192xf16>
%expanded_0 = tensor.expand_shape %cast_0 [[0], [1], [2, 3]] output_shape [4, 1, 256, 32] : tensor<4x1x8192xf16> into tensor<4x1x256x32xf16>
%cast_1 = tensor.cast %expanded_0 : tensor<4x1x256x32xf16> to tensor<?x?x256x32xf16>
```

For the second flow.tensor.reshape, it is effectively

```mlir
%cast_2 = tensor.cast %cast_1 : tensor<?x?x256x32xf16> to tensor<4x1x256x32xf16>
```

Then it becomes a nop? Your patch could be valuable for real dynamic cases though. I'm just throwing out the question because the shapes in the test are all static (or, say, statically known).
Adding some IR examples before and after the folder to the PR description would help.
Do you know where this is being done for linalg ops?
Here is the implementation for LinalgOps: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp#L2589-L2595
Here is the implementation for tensor.pack ops: https://github.com/llvm/llvm-project/blob/99c8557c175e88ff1c338c4c29e3a4d63c5a46cb/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp#L4324-L4355
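For reference, a rough, hypothetical illustration (not taken from those files) of what this kind of shape inference looks like at the IR level, using a simple elementwise linalg.generic; the function name, shapes, and values are made up:

```mlir
#map = affine_map<(d0, d1) -> (d0, d1)>
func.func @infer_static_example(%arg0: tensor<4x8xf32>) -> tensor<?x?xf32> {
  // The static shape is hidden behind a tensor.cast, so the linalg op is
  // built with dynamic types.
  %c4 = arith.constant 4 : index
  %c8 = arith.constant 8 : index
  %cast = tensor.cast %arg0 : tensor<4x8xf32> to tensor<?x?xf32>
  %empty = tensor.empty(%c4, %c8) : tensor<?x?xf32>
  %res = linalg.generic {indexing_maps = [#map, #map],
                         iterator_types = ["parallel", "parallel"]}
      ins(%cast : tensor<?x?xf32>) outs(%empty : tensor<?x?xf32>) {
  ^bb0(%in: f32, %out: f32):
    linalg.yield %in : f32
  } -> tensor<?x?xf32>
  return %res : tensor<?x?xf32>
}
```

The canonicalization folds the producing tensor.cast into the op so it runs on tensor<4x8xf32>, and inserts a tensor.cast back to tensor<?x?xf32> for any users that still expect the dynamic type.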
Also, to address the question: I think what you were saying makes sense and seems like a better solution. I was only trying to target cases where the tensor ops should be fully static.
> Do you mean that we should add a shape inference pattern to tensor.expand_shape op's canonicalization pattern?

Yes, I think so. For the other example, I don't have enough context. We could infer some dimensions for the collapse_shape op for sure, but the generated IR is not what you have in the PR. The second dimension can't be inferred without implementing shape inference for the flow.tensor.reshape op:

```mlir
util.func public @canonicalizeReshapeCollapse(%arg0: tensor<4x1x256x32xf16>) -> tensor<4x1x8192xf16> {
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %0 = flow.tensor.reshape %arg0 : tensor<4x1x256x32xf16> -> tensor<?x?x256x32xf16>{%c4, %c1}
  %expanded_0 = tensor.collapse_shape %0 [[0], [1], [2, 3]] : tensor<?x?x256x32xf16> into tensor<?x?x8192xf16>
  %1 = flow.tensor.reshape %expanded_0 : tensor<?x?x8192xf16>{%c4, %c1} -> tensor<4x1x8192xf16>
  util.return %1 : tensor<4x1x8192xf16>
}
```
Signed-off-by: Ian Wood <[email protected]>
Yes
I see your point. Is there a reason why the flow ops don't implement the shape inference interface?
I don't know. I think it's just because we never had the need, or the interface did not exist when the ops were added. The shape inference for pack ops was implemented a few months ago; we implemented it because we had the need. It is not even mature enough now. I think we do miss some interface implementations for flow ops. E.g., I implemented an interface method for flow.dispatch.tensor.load/store half a year ago.
We generally should not be mixing flow ops with non-flow ops, and flow doesn't need ReifyRankedShapedTypeInterface (and shouldn't - that interface is really garbage heavy). I'm still skeptical of any code that's using both tensor dialect and flow tensor ops as that indicates a bug in the layering: we accept tensor ops and produce flow ops, and shouldn't be trying to cross the streams.
Fair point. The problem is that #18351 causes all of these reshapes to be converted to flow.tensor.reshape early, before dispatch region formation.
I think what's happened over time in dispatch creation, with partial conversion in various steps, is not great (I understand and can sympathize with how it got there, but it's a bad place to be for these kinds of reasons :). The flow ops are only intended for the host, and by having them prior to dispatch region formation (which is creating code for the device) we will run into situations like this frequently.

We never want to be in the situation where we are reconverting (tensor->flow->tensor) for dispatch region formation, so the only path that makes sense to me is making dispatch region formation only convert to flow after it does its formation (as was originally the case). If there are ops needed prior to that which behave more like tensor than flow, we should create a tensor_ext and put them there.

I think flow.tensor.reshape was chosen because it does something different than the upstream ops and not because it was the correct op to use. If the issue (which I very much suspect) was that upstream is a pain and changing things is impossible, then making our own tensor_ext.cast that does what we want is the path forward. Then we can implement the ReifyShapeMumble stuff, fold aggressively knowing that we're only dealing with tensor/tensor_ext/linalg ops, and treat all flow ops as blocking for dispatch region formation.
Does adding shape inference to expand/collapse_shape ops' canonicalization patterns fix your issue?
@benvanik and @hanhanW, I just talked with Mahesh. Adding a canonicalizer to expand/collapse shape makes the most sense, as hanhan originally suggested. Then there would be no reason to have to deal with both Flow and Tensor. Additionally, I'll look at rolling back #18351 (issue: #18229) to stop the early conversion of these reshapes to flow.tensor.reshape.
Canonicalize `flow.tensor.reshape` -> tensor collapse/expand -> `flow.tensor.reshape`, where the first `flow.tensor.reshape` only converts from static to dynamic dims and the second converts back. For example:
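A sketch of the kind of IR this targets, reconstructed from the expand_shape test case discussed earlier in the thread (the function name is illustrative):

```mlir
util.func public @example(%arg0: tensor<4x1x8192xf16>) -> tensor<4x1x256x32xf16> {
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  // The first flow.tensor.reshape only erases the static dims...
  %0 = flow.tensor.reshape %arg0 : tensor<4x1x8192xf16> -> tensor<?x?x8192xf16>{%c4, %c1}
  // ...a tensor reshape runs on the dynamic type...
  %expanded = tensor.expand_shape %0 [[0], [1], [2, 3]] output_shape [%c4, %c1, 256, 32]
      : tensor<?x?x8192xf16> into tensor<?x?x256x32xf16>
  // ...and the second flow.tensor.reshape restores the static dims.
  %1 = flow.tensor.reshape %expanded : tensor<?x?x256x32xf16>{%c4, %c1} -> tensor<4x1x256x32xf16>
  util.return %1 : tensor<4x1x256x32xf16>
}
```

Based on the description, the canonicalization should fold this static/dynamic round-trip away, leaving a static tensor.expand_shape on %arg0 directly.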