-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for clang as host && device compiler #1933
support for clang as host && device compiler #1933
Conversation
- use C++11 `static_assert` if clang is the device compiler - `PMaccConfig.cmake` - add compiler independent options: CUDA_FAST_MATH, CUDA_FTZ, CUDA_SHOW_REGISTER and CUDA_KEEP_FILES - add clang compiler flag preperation section - add option to select the device (CUDA) compiler - PIConGPU `CmakeList.txt` - add clang compiler section to create the device code - add option `CUDAMEMTEST_ENABLE` - set `CUDAMEMTEST_ENABLE=OFF` if clang is the device compiler
Compile time on hypnos4 KHI:
|
interesting observation. It's not much, much more but could be indeed a hint. can you try an other run with nvcc & clang with a different seed? Can you pls add the clang & nvcc version (I edited your compile time post already - just fill in the version via edit for future reference). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Just a few comments & questions :)
src/libPMacc/PMaccConfig.cmake
Outdated
if("${PMACC_CUDA_COMPILER}" STREQUAL "clang") | ||
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1) | ||
#set(LIBS ${LIBS} cudart_static) | ||
set(CLANG_BUILD_FLAGS "-O3 -x cuda --cuda-path=${CUDA_TOOLKIT_ROOT_DIR}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CUDA_TOOLKIT_ROOT_DIR
get set by FindCUDA.cmake
well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is provided by FindCUDA
endforeach() | ||
|
||
elseif("${PMACC_CUDA_COMPILER}" STREQUAL "nvcc") | ||
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes else I need to put a fallback within the code to the file which contains the main()
function.
This could removed in the future if we use the CMake functionality CONFIGURE_FILE(...) to create a config file for the project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends if configure-file is better or not...
Anyway, my quesition was: you depend on 0 and 1 in-code and not #if(PMACC_CUDA_COMPILER_CLANG == 1)
and #else
syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the check is #if(PMACC_CUDA_COMPILER_CLANG == 1) .. #else ... #endif
.
Because we always enable flags with the value 1
it it not clean(good practice) to have it undefined if nvcc
is enabled.
In that case the clean version would be #ifdef PMACC_CUDA_COMPILER_CLANG ... #else .. #endif
src/libPMacc/PMaccConfig.cmake
Outdated
endif(CUDA_KEEP_FILES) | ||
|
||
if("${PMACC_CUDA_COMPILER}" STREQUAL "clang") | ||
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1) | ||
#set(LIBS ${LIBS} cudart_static) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this dead code or a problem that needs to be addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no some old dead code from my first clang testing steps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, just remove then
src/libPMacc/PMaccConfig.cmake
Outdated
|
||
else() | ||
|
||
message(FATAL_ERROR "selected CUDA compiler '${PMACC_CUDA_COMPILER}' is not supported") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to add empty lines around this
src/libPMacc/PMaccConfig.cmake
Outdated
@@ -207,6 +279,8 @@ if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel") | |||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_VARIADIC_TEMPLATES") | |||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_CXX11_VARIADIC_TEMPLATES") | |||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_FENV_H") | |||
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang") | |||
add_definitions(-DBOOST_NO_CXX11_SMART_PTR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, weird. can you post some details?
also, do you want to keep the same style as above? (Although we should probably use add_definitions
in all cases...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error without the suppression is:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/main.cu:33:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/simulation_defines.hpp:
53:
In file included from /bigdata/hplsim/scratch/widera/testPic/khi/include/simulation_defines/unitless/
starter.unitless:25:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/PluginControlle
r.hpp:58:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/output/images/P
ngCreator.hpp:33:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread.hpp:23:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:21:
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:9
9:29: error: no member named 'impl' in 'boost::detail::thread_move_t<boost
::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?
impl(boost::move(other.impl))
^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:1
18:33: error: no member named 'impl' in 'boost::detail::thread_move_t<boos
t::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?
impl = boost::move(other.impl);
^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:1
98:30: error: no member named 'impl' in 'boost::detail::thread_move_t<boos
t::detail::nullary_function<unsigned long ()> >'
impl(boost::move(other.impl))
~~~~~ ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:203:7:
note: in instantiation of member function 'boost::detail::nullary_function<unsigned long
()>::nullary_function' requested here
fct_(funct
^
3 errors generated.
BOOST_MPL_ASSERT_MSG(pmacc_cond,PMACC_JOIN(pmacc_msg,PMACC_JOIN(_________,pmacc_unique_id)),(pmacc_typeInfo)) | ||
#if ( PMACC_CUDA_COMPILER_CLANG == 1 ) | ||
/* device compile with clang: boost static assert can not be used | ||
* error is: calling a `__host__` function from `__device__` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please report that issue on https://bugs.llvm.org/ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a Clang bug. Boost is calling a host function if BOOST_MPL_ASSERT_MSG
is used on the device.
This is a well defined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, mismatched that. then we might want to contribute to boos::mpl
to give it a __host__ __device__
at some point.
src/picongpu/CMakeLists.txt
Outdated
option(CUDAMEMTEST_ENABLE "Build cuda_memtest and the helper mpiInfo \ | ||
(allow GPU health test before running PIConGPU)" ON) | ||
|
||
if("${PMACC_CUDA_COMPILER}" STREQUAL "clang") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please additionally describe the problems and what needs to be done via a new issue on https://github.com/ComputationalRadiationPhysics/cuda_memtest/issues ? Then link this issue please, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/picongpu/CMakeLists.txt
Outdated
PATHS "${CMAKE_CURRENT_SOURCE_DIR}/../mpiInfo" | ||
DOC "path to mpiInfo" | ||
) | ||
############################################################################ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make the headline less verbose in that case and edit the cuda_memtest heading:
################################################################################
# load cuda_memtest and mpiInfo projects
################################################################################
...
# mpiInfo utility
...
Cool, we could try if squashing bugs via Clang Static Analyzer is now possible :D |
Just because I am curious, is the clang RT overhead (measured without I/O) still only ~8%? |
- remove dead code - change usage of `add_definitions()` to old `set(CMAKE_CXX_FLAGS ... -DDEFINE)` - add comments
I did run no new speed measurements but I can see in the runs with IO that clang are still slower than nvcc binaries. |
Roughly 8% is perfect, icc vs. gcc is and was way worse in the past for scientific codes. |
embedded ptx code within the binary was accidentally removed with ComputationalRadiationPhysics#1933
static_assert
if clang is the device compilerPMaccConfig.cmake
CmakeList.txt
CUDAMEMTEST_ENABLE
CUDAMEMTEST_ENABLE=OFF
if clang is the device compilerTests