[Feature Request]: Support for Treating Device Functions As hsa_executable_symbol_t's #203

Open
matinraayai opened this issue Apr 27, 2024 · 6 comments

@matinraayai

matinraayai commented Apr 27, 2024

As of right now, the HSA standard only supports identifying the following symbol types:

  • Kernels
  • Variables

The standard has listed indirect functions as a symbol type, but the AMD ROCr runtime does not implement it.

Device functions, on the other hand, are absent from this list, even though they are emitted by the LLVM AMDGPU compiler and can be identified by inspecting the Loaded Code Object's storage ELF directly.
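
To illustrate the ELF inspection mentioned above, here is a minimal sketch in C. It assumes the symbol table and string table of the storage ELF have already been located, that the code runs on Linux with glibc's `<elf.h>`, and that kernels can be told apart from device functions by the presence of a matching `<name>.kd` kernel descriptor symbol. That last heuristic is my assumption about what the LLVM AMDGPU backend emits, not something the HSA standard guarantees. The name `find_device_functions` is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the symbol table holds a symbol named "<func_name>.kd".
   Assumption: the LLVM AMDGPU backend names kernel descriptors this way. */
static int has_kernel_descriptor(const Elf64_Sym *syms, size_t count,
                                 const char *strtab, const char *func_name) {
    size_t flen = strlen(func_name);
    for (size_t i = 0; i < count; ++i) {
        const char *name = strtab + syms[i].st_name;
        if (strncmp(name, func_name, flen) == 0 &&
            strcmp(name + flen, ".kd") == 0)
            return 1;
    }
    return 0;
}

/* Scans `count` ELF64 symbols and stores the indices of candidate device
   functions in `out` (at most `max_out` entries). A candidate is a defined
   STT_FUNC symbol with no matching kernel descriptor.
   Returns the total number of candidates found, which may exceed `max_out`. */
size_t find_device_functions(const Elf64_Sym *syms, size_t count,
                             const char *strtab,
                             size_t *out, size_t max_out) {
    assert(syms != NULL && strtab != NULL && (out != NULL || max_out == 0));
    size_t found = 0;
    for (size_t i = 0; i < count; ++i) {
        if (ELF64_ST_TYPE(syms[i].st_info) != STT_FUNC)
            continue; /* not code */
        if (syms[i].st_shndx == SHN_UNDEF)
            continue; /* declared here but defined in another code object */
        const char *name = strtab + syms[i].st_name;
        if (has_kernel_descriptor(syms, count, strtab, name))
            continue; /* kernel entry point, not a device function */
        if (found < max_out)
            out[found] = i;
        ++found;
    }
    return found;
}
```

A caller would pass the contents of `.symtab` and `.strtab` from the code object; the returned indices point back into the same symbol table, so `st_value` and `st_size` can be read from there.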

Supporting device functions as hsa_executable_symbol_t's can have the following benefits:

  1. The CUDA runtime treats device functions as symbols. Adding support in ROCr means HIP can also behave in the same manner as CUDA.
  2. Supporting device functions as hsa_executable_symbol_ts means the loader can resolve their relocations. A user can have a library of device functions in a separate code object and another one that uses said library. Instead of having to link both code objects together before loading, the user can simply add both code objects into a single executable before freezing the executable.
  3. In dynamic instrumentation, device functions are treated as symbols:
    a. A tool writer inserts callbacks in the kernel to device functions they have written in the tool. The instrumentation runtime should be able to identify where each device function is loaded, so that it can insert the requested callback into the target application.
    b. When analyzing the target kernel, a list of possible device functions called from it needs to be identified and returned to the tool writer, in case they want to instrument them as well. Exposing these as hsa_executable_symbol_t seems like the logical option.
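
As a rough sketch of the "identify where the device function is loaded" step in point 3, the loaded address can be derived from the function's ELF `st_value` plus the load delta of its Loaded Code Object. The struct and function below are hypothetical. I am assuming the three fields would be filled from the ROCr loader extension's loaded code object queries (load base, load size, load delta), and that the delta is the signed difference between the loaded address and the address recorded in the code object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of what the loader reports for one Loaded Code Object.
   Assumption: these values come from the loader extension's info queries. */
typedef struct {
    uint64_t load_base;  /* device address where the code object was loaded */
    uint64_t load_size;  /* size in bytes of the loaded image */
    int64_t  load_delta; /* loaded address minus ELF virtual address */
} loaded_code_object_info;

/* Translates the ELF st_value of a device function into its loaded address.
   Returns 0 if the result falls outside the loaded image. */
uint64_t device_function_address(const loaded_code_object_info *lco,
                                 uint64_t st_value) {
    assert(lco != NULL);
    /* Unsigned wraparound makes the addition correct for negative deltas. */
    uint64_t addr = st_value + (uint64_t)lco->load_delta;
    if (addr < lco->load_base || addr - lco->load_base >= lco->load_size)
        return 0;
    return addr;
}
```

With first-class symbol support, the same address could presumably be queried from the `hsa_executable_symbol_t` directly instead of being recomputed by every tool.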

cc @kzhuravl

@t-tye
Contributor

t-tye commented Apr 29, 2024

What calling convention do you expect these functions to conform to?

@matinraayai
Author

matinraayai commented Apr 29, 2024

@t-tye For now, the default emitted by LLVM, which is the C calling convention. Support for other calling conventions, and for querying them, can be considered in the long term, but I don't think it is required for now.

@t-tye
Contributor

t-tye commented Apr 29, 2024

I do not think AMD GPU defines a fixed C calling convention (see AMDGPUUsage). One complexity is that register allocation is done dynamically at kernel launch, so when a function is called, its convention depends on the register budget allocated.

@matinraayai
Author

@t-tye I understand the register constraint concerns. For now, adding support for just recognizing and locating them should be enough for my use case. Do you think that's feasible?

@t-tye
Contributor

t-tye commented Apr 30, 2024

There are a lot of issues in a calling convention, and to me the AMD GPU calling convention is not likely at a level where you can achieve the full generality that you describe. That does not mean some lesser thing could not be achieved. We are wrestling with a lot of this in our work on the debugger and compiler, but my sense is that that work is not fully settled yet.

To give a more precise answer we probably need to have a meeting to discuss further.

@matinraayai
Author

@t-tye let's meet to discuss this further.


3 participants