Enable Debug build for native code in spark-rapids-jni #1573

Closed

NVnavkumar
Collaborator

This adds the ability to build the native code with debug symbols that can be used in gdb and cuda-gdb. We separate out the build types for libcudf and libsparkrapidsjni because building libcudf with debug symbols is incredibly slow and should only be used by an advanced developer.

In order to build the debug version of libsparkrapidsjni, you can use:

./build/build-in-docker install -DGPU_ARCHS='<compute_archs>' -DCPP_PARALLEL_LEVEL=64 -DBUILD_TESTS=ON -DBUILD_TYPE=Debug -DCUDA_FLAGS_DEBUG='-G -g'

Note the CUDA_FLAGS_DEBUG setting here. By default this value is only -g, which enables debug symbols in host code only. You need to add -G to enable debug symbols for device code. These flags take effect only when BUILD_TYPE is set to Debug (the default is Release).

Debug builds can be quite slow so this should only be used if you really need to debug native code on device.
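
As a rough sketch of how the flags above fit together, the following shell helper assembles the same properties and can print them instead of starting the build. The function name `debug_build_cmd` and the `DRY_RUN` variable are hypothetical and not part of this PR; only the `-D` properties and the `./build/build-in-docker install` invocation come from the command above. `-DCPP_PARALLEL_LEVEL` is left out here because its value depends on the machine.

```shell
# Sketch only: debug_build_cmd and DRY_RUN are hypothetical, not part of spark-rapids-jni.
debug_build_cmd() {
  gpu_archs="$1"                  # value for GPU_ARCHS, chosen by the caller
  cuda_debug_flags="${2:--G -g}"  # -g: host symbols; -G: device symbols
  # Rebuild the positional parameters so the space inside "-G -g"
  # stays within a single argument.
  set -- install \
    "-DGPU_ARCHS=${gpu_archs}" \
    "-DBUILD_TESTS=ON" \
    "-DBUILD_TYPE=Debug" \
    "-DCUDA_FLAGS_DEBUG=${cuda_debug_flags}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print one argument per line instead of starting the build.
    printf '%s\n' ./build/build-in-docker "$@"
  else
    ./build/build-in-docker "$@"
  fi
}
```

With DRY_RUN set to 1, calling the function prints the argument list, which makes it easy to confirm that -G -g reaches the build as one argument rather than two.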

…o enable building native code with debug symbols

Signed-off-by: Navin Kumar <[email protected]>
@NVnavkumar NVnavkumar self-assigned this Nov 18, 2023
Member

@jlowe jlowe left a comment

Changes look OK but it would be nice to update CONTRIBUTING.md.

@@ -81,6 +81,9 @@
<RMM_LOGGING_LEVEL>OFF</RMM_LOGGING_LEVEL>
<SPARK_RAPIDS_JNI_CXX_FLAGS/>
<USE_GDS>OFF</USE_GDS>
<LIBCUDF_BUILD_TYPE>Release</LIBCUDF_BUILD_TYPE>
<BUILD_TYPE>${LIBCUDF_BUILD_TYPE}</BUILD_TYPE>
<CUDA_FLAGS_DEBUG>-g</CUDA_FLAGS_DEBUG>
Member

Is this intended to be overridden by the user for some use cases? I assume this is only used for Debug builds, and I'm a bit surprised this needs to be specified at all.

Collaborator Author

Yeah, it's a little weird, but the default flags only add -g (which enables debugging in host code); you need -G to enable debugging in device code as well.

@@ -81,6 +81,9 @@
<RMM_LOGGING_LEVEL>OFF</RMM_LOGGING_LEVEL>
<SPARK_RAPIDS_JNI_CXX_FLAGS/>
<USE_GDS>OFF</USE_GDS>
<LIBCUDF_BUILD_TYPE>Release</LIBCUDF_BUILD_TYPE>
Collaborator

In my previous experience, libcudf as a whole fails to build with BUILD_TYPE=Debug. I have been relying on selective Debug compiling (https://github.com/rapidsai/cudf/blob/branch-23.12/CONTRIBUTING.md#device-debug-symbols). Past a certain threshold in the number of files compiled with debugging symbols, the plugin is slow to start or hangs for me.

Collaborator Author

I'm wondering now if I should try the CUDF approach here with the spark-rapids-jni code, and then maybe file another PR with a similar section in the spark-rapids-jni CONTRIBUTING.md if that works with the way we build.

Might make a bit more sense then.

Collaborator Author

Closing this for now.
