[Comgr] Support compressed device binaries #65
Closed
This adds support for compressed device binaries, implemented in llvm@7e28234.

Previously this was not possible at the implementation level: to decompress a device binary we need to know its exact size, but `__hipRegisterFatBinary` only gives the starting address of the binary. Thanks to llvm#88827, which was merged last week, the size of the device binary is now included in the compressed device binary header, so this is now possible.

This allows significant space savings for GPU binaries that have grown so large that they have exceeded linker limits, e.g. ROCm/composable_kernel#789 and ROCm/composable_kernel#1044. In `composable_kernel`'s case, with this patch the total size of the static libraries shrinks from 3.69 GiB to 1.3 GB, and this is only with the default compression settings (i.e. without tuning compression levels to optimize further).

I have not yet noticed any significant runtime penalty. Even if there is one, it would only be a one-time cost at executable startup, when `__hipRegisterFatBinary` is called to register the device binaries, so overall I believe we should be fine. In addition, `--offload-compress` would probably not be used anywhere except in special cases like `composable_kernel`, where binary sizes are big enough to cause linking issues.

This change should be NFC for binaries that are not compiled with `--offload-compress`. With `--offload-compress`, I've verified that all tests in `composable_kernel` pass on `gfx1030` and `gfx900`.
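The size-recovery step described above (only a start address is available, so the size must come from the header) can be sketched roughly as follows. This is a minimal, hypothetical C++ sketch, not the code in this patch or in LLVM: the header layout (4-byte magic `CCOB`, 16-bit version and compression method, 32-bit total and uncompressed sizes, 64-bit hash, little-endian) is an assumption modeled on the clang-offload-bundler compressed format, and `CompressedBundleHeader`, `parseHeader`, `readLE`, and `writeLE` are illustrative names. It also assumes the first 24 bytes at the start address are readable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// ASSUMPTION: layout modeled on a version-2 compressed offload bundle header.
// Field order and widths are illustrative, not copied from the LLVM sources.
struct CompressedBundleHeader {
  uint16_t Version;
  uint16_t Method;           // compression method identifier
  uint32_t TotalFileSize;    // size of the whole compressed blob, header included
  uint32_t UncompressedSize; // size after decompression
  uint64_t Hash;
};

constexpr unsigned char kMagic[4] = {'C', 'C', 'O', 'B'};
constexpr std::size_t kHeaderSize = 4 + 2 + 2 + 4 + 4 + 8; // 24 bytes

// Read a little-endian integer from a possibly unaligned byte pointer.
template <typename T> T readLE(const unsigned char *P) {
  T V = 0;
  for (std::size_t I = 0; I < sizeof(T); ++I)
    V = static_cast<T>(V | (static_cast<T>(P[I]) << (8 * I)));
  return V;
}

// Write a little-endian integer (used here only to build example data).
template <typename T> void writeLE(unsigned char *P, T V) {
  for (std::size_t I = 0; I < sizeof(T); ++I)
    P[I] = static_cast<unsigned char>((V >> (8 * I)) & 0xFF);
}

// Given only the start address (all that __hipRegisterFatBinary provides),
// read the fixed-size header to learn how many bytes the compressed blob spans.
std::optional<CompressedBundleHeader> parseHeader(const void *Start) {
  assert(Start != nullptr);
  const auto *P = static_cast<const unsigned char *>(Start);
  if (std::memcmp(P, kMagic, sizeof(kMagic)) != 0)
    return std::nullopt; // not compressed: keep the existing (NFC) path
  CompressedBundleHeader H;
  H.Version = readLE<uint16_t>(P + 4);
  if (H.Version < 2)
    return std::nullopt; // ASSUMPTION: older headers carry no total size
  H.Method = readLE<uint16_t>(P + 6);
  H.TotalFileSize = readLE<uint32_t>(P + 8);
  H.UncompressedSize = readLE<uint32_t>(P + 12);
  H.Hash = readLE<uint64_t>(P + 16);
  return H;
}
```

With `TotalFileSize` known, a caller could hand exactly that many bytes to the decompressor instead of having to guess where the blob ends, which is the gap the header change closes.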