Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flang][OpenMP] Add support for multi-range do concurrent loops #89

Merged
merged 20 commits into from
Jun 17, 2024

Conversation

ergawy
Copy link

@ergawy ergawy commented May 17, 2024

Extends do concurrent to OpenMP mapping by adding support for multi-range loops. The current implementation only works for perfectly nested loops. So taking this input:

do concurrent(i=1:n, j=1:m)
  a(i,j,k) = i * j
end do

will behave in exactly the same way as this input:

do concurrent(i=1:n)
  do concurrent(j=1:m)
    a(i,j,k) = i * j
  end do
end do

@Meinersbur
Copy link

First thoughts without looking at the code:

  1. OpenMP does allow more than one DO CONCURRENT per construct (https://github.com/OpenMP/spec/issues/3425). If this is only for -fdo-concurrent-parallel then what the OpenMP spec says doesn't matter I guess.

  2. do concurrent(i=1:n, j=1:m) allows more executions than do concurrent(i=1:n) do concurrent(j=1:m). In the former $(1,2) (2,1) (1,1) (2,2)$ is a valid sequential execution order but not in the second. In partially_nested.f90 two DO CONCURRENT are merged into a single omp.loop_nest. I think omp.loop_nest alone has lexicographic order semantics, but might it be a problem when executed in parallel? My initial thought is that it is not as long as we have -fdo-concurrent-parallel which does allow the above order.

@ergawy
Copy link
Author

ergawy commented May 22, 2024

Thanks @Meinersbur for taking a look.

  1. .... If this is only for -fdo-concurrent-parallel then what the OpenMP spec says doesn't matter I guess.

Indeed that's only for the do concurrent mapping since I am targeting to detect loop nests. So having more one do concurrent on the same level invalidates the mapping.

  1. do concurrent(i=1:n, j=1:m) allows more executions than do concurrent(i=1:n) do concurrent(j=1:m). ....

Very ingishtful comment for me, thanks! However, the thing is that on the fir IR level, all 3 forms in the test you referred to have exactly the same IR. So I am not sure whether we should tighten the fir representation to model multi-range vs. nested loops (something I raised on Teams but seems like we do not want to change how fir works here for now).

@mjklemm
Copy link

mjklemm commented May 22, 2024

Very ingishtful comment for me, thanks! However, the thing is that on the fir IR level, all 3 forms in the test you referred to have exactly the same IR.

How does the FIR look like for these cases? Is the do concurrent (i=1:n, j=1:m) split up in two nested loops in FIR? I would consider this to be a potential bug. Even though it's correct from a language perspective, I would consider it a potential performance bug. Or, did I miss understand that comment?

@ergawy
Copy link
Author

ergawy commented May 22, 2024

Is the do concurrent (i=1:n, j=1:m) split up in two nested loops in FIR?

@mjklemm Yes, exactly. Here is a digest with an example to show how this looks like.

@mjklemm
Copy link

mjklemm commented May 23, 2024

Ok, then please record an issue as a potential performance bug. I'd argue that the nesting should be reflected in FIR to make sure that the compiler see the merged loop vs. a nested loop.

Copy link

@Meinersbur Meinersbur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some changes that do belong to roc-trunk-dev which GitHub does not allow me to comment on.

createMapInfoOp and calculateTripCount seem to have been copied from FortranLower's Utils.cpp. It fails with a linker error. Please don't copy&paste from somewhere else but find a location where it can be used from both.

Why is there a need for isOpUltimatelyConstant? Isn't there already an MLIR utility that determines this, or a normalization pass that normalizes constant expressions to ConstantOp?

DoConcurrentConversionPassOptions::mapTo and DoConcurrentConversionPassBase::mapTo should be enums.

flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
flang/lib/Optimizer/Transforms/DoConcurrentConversion.cpp Outdated Show resolved Hide resolved
Copy link
Author

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for detailed review Michael. Handled some of you comments and looking into the rest ... 👀

createMapInfoOp and calculateTripCount seem to have been copied from FortranLower's Utils.cpp. It fails with a linker error. Please don't copy&pase from somewhere else but find a location where it can be used from both.

I understand this is less than ideal. We agreed to do that for now and I will split these utils upstream (see #88 and the relevant Teams discussion).

Why is there a need for isOpUltimatelyConstant?

This is sort of guardrail that will be lifted quite soon. I added that check because I tested the code on very simple loop for a start. But as you can see in this PR, the restrictions are being lifted one by one. So this check should be just temporary.

@ergawy
Copy link
Author

ergawy commented May 27, 2024

DoConcurrentConversionPassOptions::mapTo and DoConcurrentConversionPassBase::mapTo should be enums.

I did that in a separate PR: #96 since this was already in-place and not directly related to the scope of this PR.

@ergawy
Copy link
Author

ergawy commented May 27, 2024

@Meinersbur I think all of your comments are now handled. Please have another look whenever you have a chance.

@ergawy
Copy link
Author

ergawy commented May 30, 2024

Ping 🔔! Please have a look when you have time!

@Meinersbur
Copy link

Meinersbur commented Jun 3, 2024

Thanks for addressing my review. I am still uncomfortable with the implicit privatization heuristics. Maybe there is already a precedent that I don't know of?

I understand this is less than ideal. We agreed to do that for now and I will split these utils upstream (see #88 and the relevant Teams discussion).

Since it doesn't even build for me, I suggest to give the copy a different name or namespace (and TODO comment), so at least they don't clash.

@ergawy
Copy link
Author

ergawy commented Jun 7, 2024

I think all comments are handled and all the issues we discussed are captured in b447a85. Let me know if you suggest any other changes in this PR.

@ergawy
Copy link
Author

ergawy commented Jun 14, 2024

Ping Ping 🔔! Anything else needs to be done here? Please take a look 🙏!

Copy link

@mjklemm mjklemm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can go in. LGTM.

@ergawy ergawy merged commit 49a0052 into ROCm:amd-trunk-dev Jun 17, 2024
2 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants