Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Comgr] Support compressed device binaries #65

Closed
wants to merge 1 commit into from

Conversation

GZGavinZhao
Copy link

@GZGavinZhao GZGavinZhao commented Apr 23, 2024

This adds support for compressed device binaries implemented in llvm@7e28234.

Previously this has not been possible on the implementation level because to decompress the device binaries, we need to know the exact size of the binary, but __hipRegisterFatBinary only gives the starting address of the binary. Thanks to llvm#88827 that was merged last week, the size info of the device binary is now included in the compressed device binary header, so this is possible now.

This allows significant space saving in GPU binaries that have grown so large that they've exceed linker limits, e.g. ROCm/composable_kernel#789 and ROCm/composable_kernel#1044. In composable_kernel's case, with this patch the total size of static libraries library shrink from 3.69 GiB to 1.3GB, and this is only with the default compression settings (i.e. without tuning compression levels to optimize further).

I have yet to notice any significant runtime penalty. Even if there are, it would only be a one-time penalty at executable startup when __hipRegisterFatBinary is called to register the device binaries, so overall I believe we should be fine. In addition, --offload-compress would probably not be used anywhere except in special cases like in composable_kernel where binary sizes are big enough to cause linking issues.

This change should be NFC for binaries that are not compiled with --offload-compress. With --offload-compress, I've verified that all tests in composable_kernel are passed with gfx1030 and gfx900.

This adds suppor for compressed device binaries implemented in
llvm@7e28234.

This allows significant space saving in GPU binaries that have grown so
large that they've exceed linker limits, e.g.
ROCm/composable_kernel#789

Signed-off-by: Gavin Zhao <[email protected]>
@lamb-j
Copy link
Collaborator

lamb-j commented Apr 30, 2024

@GZGavinZhao thanks for putting this together.

Because we're using the OfflaodBundling API in Comgr to unbundle files before linking, we should pick up the decompression changes automatically without any Comgr changes.

What contexts would you expect a user to use the new "AMD_COMGR_DATA_KIND_COMPRESSED_FATBIN" data type? We did just add a new data type "AMD_COMGR_DATA_KIND_OBJ_BUNDLE" which may cover the same use case. See these two changes, which adds more support for unbundling:

[Comgr] Add OffloadBundler APIs
[Comgr] Split get_bundle_entry_ids into two API calls

Do you think this PR is still needed on top of those changes?

@GZGavinZhao
Copy link
Author

@lamb-j Thanks for your response! The purpose of this PR was originally to enable loading compressed device binaries in clr through ROCm/clr#75, but it seems like ROCm/clr@e53df57 just implemented that feature, so I think this PR is not needed anymore. Thanks again for your time looking at this!

@GZGavinZhao GZGavinZhao closed this May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants