New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

support for clang as host && device compiler #1933

Merged

ax3l merged 2 commits into ComputationalRadiationPhysics:dev from psychocoderHPC:topic-clangFullCompile4

Mar 17, 2017

Member

psychocoderHPC commented Mar 15, 2017 •

edited

Loading

use C++11 static_assert if clang is the device compiler
PMaccConfig.cmake
- add compiler independent options: CUDA_FAST_MATH, CUDA_FTZ, CUDA_SHOW_REGISTER and CUDA_KEEP_FILES
- add clang compiler flag preperation section
- add option to select the device (CUDA) compiler
PIConGPU CmakeList.txt
- add clang compiler section to create the device code
- add option CUDAMEMTEST_ENABLE
- set CUDAMEMTEST_ENABLE=OFF if clang is the device compiler

Tests

compile tests clang
compile tests nvcc/gcc
KHI runtime test clang (positrons/electrons)


          support for clang as host and device compiler

427e765

- use C++11 `static_assert` if clang is the device compiler
- `PMaccConfig.cmake`
  - add compiler independent options: CUDA_FAST_MATH, CUDA_FTZ, CUDA_SHOW_REGISTER and CUDA_KEEP_FILES
  - add clang compiler flag preperation section
  - add option to select the device (CUDA) compiler
- PIConGPU `CmakeList.txt`
  - add clang compiler section to create the device code
  - add option `CUDAMEMTEST_ENABLE`
  - set `CUDAMEMTEST_ENABLE=OFF` if clang is the device compiler

psychocoderHPC added the feature label

psychocoderHPC added this to the Next Stable: 0.3.0 milestone

psychocoderHPC assigned ax3l

psychocoderHPC requested a review from ax3l

March 15, 2017 14:03

psychocoderHPC mentioned this pull request

[WIP] allow full compile with clang #1731

Closed

1 task

psychocoderHPC changed the title ~~support for clang as host and device compiler~~ support for clang as host && device compiler

Member Author

psychocoderHPC commented Mar 15, 2017 •

edited

Loading

Compile time on hypnos4 KHI:

nvcc (release 8.0, V8.0.44) 389 sec
clang (clang version 4.0.0 (trunk 290769)) 313 sec

Member Author

psychocoderHPC commented Mar 15, 2017 •

edited

Loading

KelvinHelmholz example positrons/electrons 3000 steps 16GPUs k20

heating test

charge conservation test

I am not sure why the nvcc creates much more heating than clang.

it could be triggered be small differences during the initialization
it can also be a bug

Member

ax3l commented Mar 15, 2017 •

edited

Loading

interesting observation. It's not much, much more but could be indeed a hint. can you try an other run with nvcc & clang with a different seed?

Can you pls add the clang & nvcc version (I edited your compile time post already - just fill in the version via edit for future reference).

ax3l requested changes

View reviewed changes

Member

ax3l left a comment

Looks great!
Just a few comments & questions :)

src/libPMacc/PMaccConfig.cmake Outdated

+              if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")
+                  add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1)
+                  #set(LIBS ${LIBS} cudart_static)
+                  set(CLANG_BUILD_FLAGS "-O3 -x cuda --cuda-path=${CUDA_TOOLKIT_ROOT_DIR}")

Member

ax3l Mar 16, 2017

Does CUDA_TOOLKIT_ROOT_DIR get set by FindCUDA.cmake well?

Member Author

psychocoderHPC Mar 16, 2017

Yes it is provided by FindCUDA

src/libPMacc/PMaccConfig.cmake

+                  endforeach()
+              elseif("${PMACC_CUDA_COMPILER}" STREQUAL "nvcc")
+                  add_definitions(-DPMACC_CUDA_COMPILER_CLANG=0)

Member

ax3l Mar 16, 2017

necessary?

Member Author

psychocoderHPC Mar 16, 2017

Yes else I need to put a fallback within the code to the file which contains the main() function.
This could removed in the future if we use the CMake functionality CONFIGURE_FILE(...) to create a config file for the project.

Member

ax3l Mar 16, 2017 •

edited

Loading

Depends if configure-file is better or not...

Anyway, my quesition was: you depend on 0 and 1 in-code and not #if(PMACC_CUDA_COMPILER_CLANG == 1) and #else syntax?

Member Author

psychocoderHPC Mar 16, 2017 •

edited

Loading

Yes the check is #if(PMACC_CUDA_COMPILER_CLANG == 1) .. #else ... #endif.
Because we always enable flags with the value 1 it it not clean(good practice) to have it undefined if nvcc is enabled.
In that case the clean version would be #ifdef PMACC_CUDA_COMPILER_CLANG ... #else .. #endif

src/libPMacc/PMaccConfig.cmake Outdated

               endif(CUDA_KEEP_FILES)
+              if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")
+                  add_definitions(-DPMACC_CUDA_COMPILER_CLANG=1)
+                  #set(LIBS ${LIBS} cudart_static)

Member

ax3l Mar 16, 2017

is this dead code or a problem that needs to be addressed?

Member Author

psychocoderHPC Mar 16, 2017

no some old dead code from my first clang testing steps

Member

ax3l Mar 16, 2017 •

edited

Loading

ok, just remove then

src/libPMacc/PMaccConfig.cmake Outdated


		else()

		message(FATAL_ERROR "selected CUDA compiler '${PMACC_CUDA_COMPILER}' is not supported")

Member

ax3l Mar 16, 2017

no need to add empty lines around this

src/libPMacc/PMaccConfig.cmake Outdated

@@ @@ -207,6 +279,8 @@ if("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel") @@
                   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_VARIADIC_TEMPLATES")
                   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_CXX11_VARIADIC_TEMPLATES")
                   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DBOOST_NO_FENV_H")
+              elseif("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
+                  add_definitions(-DBOOST_NO_CXX11_SMART_PTR)

Member

ax3l Mar 16, 2017

huh, weird. can you post some details?

also, do you want to keep the same style as above? (Although we should probably use add_definitions in all cases...)

Member Author

psychocoderHPC Mar 16, 2017

The error without the suppression is:

In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/main.cu:33:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/simulation_defines.hpp:
53:
In file included from /bigdata/hplsim/scratch/widera/testPic/khi/include/simulation_defines/unitless/
starter.unitless:25:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/PluginControlle
r.hpp:58:
In file included from /bigdata/hplsim/scratch/widera/dev/src/picongpu/include/plugins/output/images/P
ngCreator.hpp:33:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread.hpp:23:
In file included from /home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:21:
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:9
9:29: error: no member named 'impl' in 'boost::detail::thread_move_t<boost
::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?

      impl(boost::move(other.impl))
                            ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:1
18:33: error: no member named 'impl' in 'boost::detail::thread_move_t<boos
t::detail::nullary_function<void ()> >'; did you mean to use '->' instead of '.'?

        impl = boost::move(other.impl);
                                ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/detail/nullary_function.hpp:1
98:30: error: no member named 'impl' in 'boost::detail::thread_move_t<boos
t::detail::nullary_function<unsigned long ()> >'
      impl(boost::move(other.impl))
                       ~~~~~ ^
/home/widera/lib/boost_1_62_0_clang4/include/boost/thread/barrier.hpp:203:7: 
note: in instantiation of member function 'boost::detail::nullary_function<unsigned long
 ()>::nullary_function' requested here
      fct_(funct
      ^
3 errors generated.

src/libPMacc/include/static_assert.hpp

-                  BOOST_MPL_ASSERT_MSG(pmacc_cond,PMACC_JOIN(pmacc_msg,PMACC_JOIN(_________,pmacc_unique_id)),(pmacc_typeInfo))
+              #if ( PMACC_CUDA_COMPILER_CLANG == 1 )
+              /* device compile with clang: boost static assert can not be used
+               * error is: calling a `__host__` function from `__device__`

Member

ax3l Mar 16, 2017

Can you please report that issue on https://bugs.llvm.org/ ?

Member Author

psychocoderHPC Mar 16, 2017

This is not a Clang bug. Boost is calling a host function if BOOST_MPL_ASSERT_MSG is used on the device.
This is a well defined behavior.

Member

ax3l Mar 16, 2017

ah yes, mismatched that. then we might want to contribute to boos::mpl to give it a __host__ __device__ at some point.

src/picongpu/CMakeLists.txt Outdated

+              option(CUDAMEMTEST_ENABLE "Build cuda_memtest and the helper mpiInfo \
+                                        (allow GPU health test before running PIConGPU)" ON)
+              if("${PMACC_CUDA_COMPILER}" STREQUAL "clang")

Member

ax3l Mar 16, 2017

Can you please additionally describe the problems and what needs to be done via a new issue on https://github.com/ComputationalRadiationPhysics/cuda_memtest/issues ? Then link this issue please, too.

Member Author

psychocoderHPC Mar 16, 2017

ComputationalRadiationPhysics/cuda_memtest#10

src/picongpu/CMakeLists.txt Outdated

-                      PATHS "${CMAKE_CURRENT_SOURCE_DIR}/../mpiInfo"
-                      DOC "path to mpiInfo"
-                      )
+                  ############################################################################

Member

ax3l Mar 16, 2017

Make the headline less verbose in that case and edit the cuda_memtest heading:

################################################################################
# load cuda_memtest and mpiInfo projects
################################################################################

...

    # mpiInfo utility
    ...

Member

ax3l commented Mar 16, 2017

Cool, we could try if squashing bugs via Clang Static Analyzer is now possible :D

Member

ax3l commented Mar 16, 2017 •

edited

Loading

Just because I am curious, is the clang RT overhead (measured without I/O) still only ~8%?


          add @ax3l's annotations

f7dc89d

- remove dead code
- change usage of `add_definitions()` to old `set(CMAKE_CXX_FLAGS ... -DDEFINE)`
- add comments

Member Author

psychocoderHPC commented Mar 16, 2017 •

edited by ax3l

Loading

I did run no new speed measurements but I can see in the runs with IO that clang are still slower than nvcc binaries.

Member

ax3l commented Mar 16, 2017

Roughly 8% is perfect, icc vs. gcc is and was way worse in the past for scientific codes.

ax3l approved these changes

View reviewed changes

ax3l merged commit bb2de6f into ComputationalRadiationPhysics:dev

psychocoderHPC added a commit to psychocoderHPC/picongpu that referenced this pull request


          fix missing ptx code wthin the binary

ba33025

embedded ptx code within the binary was accidentally removed with ComputationalRadiationPhysics#1933

psychocoderHPC mentioned this pull request

fix missing ptx code within the binary #1960

Merged

2 tasks

psychocoderHPC deleted the topic-clangFullCompile4 branch

September 21, 2017 13:48

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet