support for clang as host && device compiler #1933

Merged

psychocoderHPC
Member

@psychocoderHPC psychocoderHPC commented Mar 15, 2017

  • use C++11 static_assert if clang is the device compiler
  • PMaccConfig.cmake
    • add compiler independent options: CUDA_FAST_MATH, CUDA_FTZ, CUDA_SHOW_REGISTER and CUDA_KEEP_FILES
    • add clang compiler flag preparation section
    • add option to select the device (CUDA) compiler
  • PIConGPU CMakeLists.txt
    • add clang compiler section to create the device code
    • add option CUDAMEMTEST_ENABLE
    • set CUDAMEMTEST_ENABLE=OFF if clang is the device compiler

Tests

  • compile tests clang
  • compile tests nvcc/gcc
  • KHI runtime test clang (positrons/electrons)

@psychocoderHPC psychocoderHPC added this to the Next Stable: 0.3.0 milestone Mar 15, 2017
@psychocoderHPC psychocoderHPC changed the title support for clang as host and device compiler support for clang as host && device compiler Mar 15, 2017
@psychocoderHPC
Member Author

psychocoderHPC commented Mar 15, 2017

Compile time on hypnos4 KHI:

  • nvcc (release 8.0, V8.0.44) 389 sec
  • clang (clang version 4.0.0 (trunk 290769)) 313 sec
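
For reference, the relative difference between the two timings above can be worked out as follows. This is only a sketch of the arithmetic; the constant and function names are made up for illustration, and a single measurement per compiler says nothing about run-to-run variance.

```cpp
// Compile times reported above for the KHI example on hypnos4 (seconds).
constexpr double nvccSeconds = 389.0;
constexpr double clangSeconds = 313.0;

// Fraction of the nvcc compile time that is saved when compiling with clang.
constexpr double relativeSaving()
{
    return (nvccSeconds - clangSeconds) / nvccSeconds;
}

// 76 / 389 lies between 19% and 20%.
static_assert(relativeSaving() > 0.19 && relativeSaving() < 0.20,
              "clang saves a bit under one fifth of the nvcc compile time");
```

With these numbers the clang build takes a little under 20% less wall time than the nvcc build.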

@psychocoderHPC
Member Author

psychocoderHPC commented Mar 15, 2017

KelvinHelmholtz example, positrons/electrons, 3000 steps, 16 K20 GPUs

heating test

[image: heat_clang_vs_nvcc]

charge conservation test

[image: charge_clang_vs_nvcc]

I am not sure why the nvcc build produces much more heating than the clang build.

  • it could be triggered by small differences during the initialization
  • it could also be a bug

@ax3l
Member

ax3l commented Mar 15, 2017

Interesting observation. The difference is not huge, but it could indeed be a hint. Can you try another run with nvcc and clang using a different seed?

Can you please add the clang and nvcc versions? (I already edited your compile-time post; just fill in the versions via edit for future reference.)

Member

@ax3l ax3l left a comment

Looks great!
Just a few comments & questions :)

if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1)
#set(LIBS ${LIBS} cudart_static)
set(CLANG_BUILD_FLAGS "-O3 -x cuda --cuda-path=${CUDA_TOOLKIT_ROOT_DIR}")
Member

Is CUDA_TOOLKIT_ROOT_DIR set reliably by FindCUDA.cmake?

Member Author

Yes, it is provided by FindCUDA.

endforeach()

elseif("${PMACC_CUDA_COMPILER}" STREQUAL "nvcc")
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=0)
Member

Is this necessary?

Member Author

Yes, otherwise I would need to put a fallback definition into the file that contains the main() function.
This could be removed in the future if we use the CMake functionality CONFIGURE_FILE(...) to create a config file for the project.

Member

@ax3l ax3l Mar 16, 2017

That depends on whether configure-file is the better option or not...

Anyway, my question was: do you depend on the values 0 and 1 in the code, rather than on the #if(PMACC_CUDA_COMPILER_CLANG == 1) and #else syntax?

Member Author

@psychocoderHPC psychocoderHPC Mar 16, 2017

Yes, the check is #if(PMACC_CUDA_COMPILER_CLANG == 1) ... #else ... #endif.
Because we always enable flags with the value 1, it is not clean (not good practice) to leave the macro undefined if nvcc is enabled.
In that case the clean version would be #ifdef PMACC_CUDA_COMPILER_CLANG ... #else ... #endif
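
The difference between the two preprocessor styles discussed in this thread can be sketched as follows. The macro name PMACC_CUDA_COMPILER_CLANG is taken from the PR; the two constants are hypothetical names used only for this illustration, and the #ifndef fallback is an assumption that stands in for the -D flag normally passed by the build system.

```cpp
// Sketch only: in the PR the value comes from the build system
// (-DPMACC_CUDA_COMPILER_CLANG=0 or =1). The fallback below exists solely
// so that this snippet compiles on its own.
#ifndef PMACC_CUDA_COMPILER_CLANG
#   define PMACC_CUDA_COMPILER_CLANG 0
#endif

// Value-based check: relies on the macro being defined as 0 or 1.
#if (PMACC_CUDA_COMPILER_CLANG == 1)
constexpr bool deviceCompilerIsClang = true;
#else
constexpr bool deviceCompilerIsClang = false;
#endif

// Existence-based check: true as soon as the macro is defined at all,
// even when its value is 0.
#ifdef PMACC_CUDA_COMPILER_CLANG
constexpr bool macroIsDefined = true;
#else
constexpr bool macroIsDefined = false;
#endif

static_assert(macroIsDefined, "the macro is always defined in this sketch");
```

When the macro is always defined as 0 or 1, only the value-based check distinguishes the two compilers; an #ifdef check would select the clang branch in both cases.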

endif(CUDA_KEEP_FILES)

if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")
add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1)
#set(LIBS ${LIBS} cudart_static)
Member

is this dead code or a problem that needs to be addressed?

Member Author

No, it is just some old dead code from my first clang testing steps.

Member

@ax3l ax3l Mar 16, 2017

OK, then just remove it.


else()

message(FATAL_ERROR "selected CUDA compiler '${PMACC_CUDA_COMPILER}' is not supported")
Member

no need to add empty lines around this

@@ -207,6 +279,8 @@ if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_VARIADIC_TEMPLATES")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_CXX11_VARIADIC_TEMPLATES")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_FENV_H")
elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
add_definitions(-DBOOST_NO_CXX11_SMART_PTR)
Member

huh, weird. can you post some details?

also, do you want to keep the same style as above? (Although we should probably use add_definitions in all cases...)

Member Author

The error without the suppression is:

In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/main.cu:33:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/simulation_defines.hpp:53:
In file included from /bigdata/hplsim/scratch/widera/testPic/khi/include/simulation_defines/unitless/starter.unitless:25:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/PluginController.hpp:58:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/output/images/PngCreator.hpp:33:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread.hpp:23:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:21:
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:99:29: error: no member named 'impl' in 'boost::detail::thread_move_t<boost::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?
      impl(boost::move(other.impl))
                            ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:118:33: error: no member named 'impl' in 'boost::detail::thread_move_t<boost::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?
        impl = boost::move(other.impl);
                                ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:198:30: error: no member named 'impl' in 'boost::detail::thread_move_t<boost::detail::nullary_function<unsigned long ()> >'
      impl(boost::move(other.impl))
                       ~~~~~ ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:203:7: note: in instantiation of member function 'boost::detail::nullary_function<unsigned long ()>::nullary_function' requested here
      fct_(funct
      ^
3 errors generated.

BOOST_MPL_ASSERT_MSG(pmacc_cond,PMACC_JOIN(pmacc_msg,PMACC_JOIN(_________,pmacc_unique_id)),(pmacc_typeInfo))
#if ( PMACC_CUDA_COMPILER_CLANG == 1 )
/* device compile with clang: boost static assert can not be used
* error is: calling a `__host__` function from `__device__`
Member

Can you please report that issue on https://bugs.llvm.org/ ?

Member Author

This is not a Clang bug. Boost calls a host function when BOOST_MPL_ASSERT_MSG is used in device code.
This is well-defined behavior.

Member

Ah yes, I mixed that up. Then we might want to contribute to boost::mpl at some point to give it a __host__ __device__ annotation.
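
A minimal sketch of the compiler switch described in this thread, written in plain C++11 without CUDA qualifiers. DEMO_STATIC_ASSERT_MSG and bytesOf are hypothetical names invented for this example and are not the macro used in PMacc; the fallback definition of PMACC_CUDA_COMPILER_CLANG is an assumption so that the snippet builds stand-alone and without Boost.

```cpp
#include <cstddef>

// Assumption for this sketch: pretend clang is the device compiler unless
// the build system says otherwise.
#ifndef PMACC_CUDA_COMPILER_CLANG
#   define PMACC_CUDA_COMPILER_CLANG 1
#endif

#if (PMACC_CUDA_COMPILER_CLANG == 1)
// C++11 static_assert is a pure compile-time construct, so no __host__
// function is called from device code. The type info argument is unused.
#   define DEMO_STATIC_ASSERT_MSG(cond, msg, typeInfo) static_assert(cond, #msg)
#else
// nvcc path: keep the Boost.MPL assertion, which can also print type info.
#   include <boost/mpl/assert.hpp>
#   define DEMO_STATIC_ASSERT_MSG(cond, msg, typeInfo) \
        BOOST_MPL_ASSERT_MSG(cond, msg, (typeInfo))
#endif

// Hypothetical helper that only accepts types of at most eight bytes.
template<typename T>
constexpr std::size_t bytesOf()
{
    DEMO_STATIC_ASSERT_MSG(sizeof(T) <= 8, TYPE_IS_TOO_LARGE, T);
    return sizeof(T);
}
```

Instantiating bytesOf with a type larger than eight bytes should then fail at compile time with the message TYPE_IS_TOO_LARGE on the clang path.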

option(CUDAMEMTEST_ENABLE "Build cuda_memtest and the helper mpiInfo \
(allow GPU health test before running PIConGPU)" ON)

if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")
Member

Can you please additionally describe the problems and what needs to be done via a new issue on https://github.com/ComputationalRadiationPhysics/cuda_memtest/issues ? Then link this issue please, too.

Member Author

PATHS "${CMAKE_CURRENT_SOURCE_DIR}/../mpiInfo"
DOC "path to mpiInfo"
)
############################################################################
Member

Make the headline less verbose in that case and edit the cuda_memtest heading:

################################################################################
# load cuda_memtest and mpiInfo projects
################################################################################

...

    # mpiInfo utility
    ...

@ax3l
Member

ax3l commented Mar 16, 2017

Cool, we could now check whether squashing bugs via the Clang Static Analyzer is possible :D

@ax3l
Member

ax3l commented Mar 16, 2017

Just because I am curious: is the clang runtime overhead (measured without I/O) still only ~8%?

- remove dead code
- change usage of `add_definitions()` to old `set(CMAKE_CXX_FLAGS ... -DDEFINE)`
- add comments
@psychocoderHPC
Member Author

psychocoderHPC commented Mar 16, 2017

I did not run any new speed measurements, but I can see in the runs with I/O that the clang binaries are still slower than the nvcc binaries.

@ax3l
Member

ax3l commented Mar 16, 2017

Roughly 8% is perfectly fine; icc vs. gcc is, and was in the past, far worse for scientific codes.

@ax3l ax3l merged commit bb2de6f into ComputationalRadiationPhysics:dev Mar 17, 2017
psychocoderHPC added a commit to psychocoderHPC/picongpu that referenced this pull request Apr 12, 2017
embedded ptx code within the binary was accidentally removed with ComputationalRadiationPhysics#1933
@psychocoderHPC psychocoderHPC deleted the topic-clangFullCompile4 branch September 21, 2017 13:48