Add support for CUDA 10.2 #886

BenjaminW3 · 2019-11-23T12:38:05Z

Implements #885

Enables -Werror=all-warnings so that warnings from the cuda compiler frontend also make the build fail.

BenjaminW3 · 2019-11-23T12:39:47Z

The nvcc 10.2 + clang 8.0 build currently fails with:

Nov 23 11:58:56 /usr/local/cuda-10.2/bin/nvcc /home/travis/build/BenjaminW3/alpaka/example/bufferCopy/src//bufferCopy.cpp -x=cu -c -o /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o -ccbin /home/travis/llvm/bin/clang++ -m64 -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -DALPAKA_CI -Xcompiler ,\"-stdlib=libc++\",\"-fopenmp=libomp\",\"-O3\",\"-DNDEBUG\" --expt-extended-lambda --expt-relaxed-constexpr --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_70,code=compute_70 -std=c++11 --use_fast_math --ftz=false -Xcudafe --display_error_number -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=2976 -Werror=all-warnings -DNVCC -I/usr/local/cuda-10.2/include -I/usr/include -I/home/travis/build/BenjaminW3/alpaka/include -I/home/travis/boost
/usr/local/cuda-10.2/include/crt/math_functions.h(8983): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(8996): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(9004): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(9018): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(375): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(381): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(383): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(389): error #311: cannot overload functions distinguished by return type alone
8 errors detected in the compilation of "/tmp/tmpxft_00000136_00000000-6_bufferCopy.cpp1.ii".
Nov 23 11:58:59 -- Removing /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
Nov 23 11:58:59 /home/travis/CMake/bin/cmake -E remove /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
CMake Error at bufferCopy_generated_bufferCopy.cpp.o.Release.cmake:279 (message):
  Error generating file
  /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
Nov 23 11:58:59 example/bufferCopy/CMakeFiles/bufferCopy.dir/build.make:63: recipe for target 'example/bufferCopy/CMakeFiles/bufferCopy.dir/src/bufferCopy_generated_bufferCopy.cpp.o' failed
Nov 23 11:58:59 make[2]: Leaving directory '/home/travis/build/BenjaminW3/alpaka/build'
make[2]: *** [example/bufferCopy/CMakeFiles/bufferCopy.dir/src/bufferCopy_generated_bufferCopy.cpp.o] Error 1
make[1]: *** [example/bufferCopy/CMakeFiles/bufferCopy.dir/all] Error 2
Nov 23 11:58:59 CMakeFiles/Makefile2:1017: recipe for target 'example/bufferCopy/CMakeFiles/bufferCopy.dir/all' failed
Nov 23 11:58:59 make[1]: Leaving directory '/home/travis/build/BenjaminW3/alpaka/build'
make: *** [all] Error 2
Nov 23 11:58:59 Makefile:94: recipe for target 'all' failed

Any ideas what we can do about this?

sbastrakov · 2019-11-25T08:43:31Z

@BenjaminW3 I don't. It might be an incompatibility between these compiler and CUDA versions; it would not be the first time that happens. But I don't know for sure, it might still be something on our side.

alpakaConfig.cmake

BenjaminW3 · 2019-11-27T16:24:08Z

The problems of nvcc 10.2 with clang 8.0 are caused by some declarations of isnan and isinf in math_functions.h and math_functions.hpp. I do not see anything we could do about this, so I will disable this combination for now.

ax3l · 2019-11-27T17:52:51Z

Interesting, Clang 8.0 host-compiler support is not even new but part of 10.1.168+ already... Does compiling in Release mode help?

BenjaminW3 · 2019-11-27T17:55:52Z

I tried both, release and debug. Does not help. With nvcc 10.1 it works in CI. They probably broke something.

ax3l · 2019-11-27T19:33:35Z

Urgh, thanks for testing!

ax3l · 2019-11-27T21:21:32Z

Feedback from my Nvidia contact: Clang 8.0 works, but we should not pass -Xcompiler "-stdlib=libc++" which breaks compatibility in the build matrix. Can you re-enable accordingly, please?

BenjaminW3 · 2019-11-28T20:04:26Z

You are right, with libstdc++ it seems to work.
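A minimal sketch of how the build scripts could flag the problematic combination reported above (CUDA backend with clang as host compiler and `-stdlib=libc++`). The option name `ALPAKA_ACC_GPU_CUDA_ENABLE` and the placement of the check are assumptions, not taken from this PR; `CMAKE_CXX_COMPILER_ID`, `CMAKE_CXX_FLAGS` and `STRING(FIND ...)` are standard CMake.

```cmake
# Assumption: this runs after the host compiler flags have been assembled.
# ALPAKA_ACC_GPU_CUDA_ENABLE is assumed to be the switch for the CUDA backend.
IF(ALPAKA_ACC_GPU_CUDA_ENABLE AND CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    # STRING(FIND) stores -1 in the output variable if the substring is absent.
    STRING(FIND "${CMAKE_CXX_FLAGS}" "-stdlib=libc++" _LIBCXX_POS)
    IF(NOT _LIBCXX_POS EQUAL -1)
        MESSAGE(WARNING
            "nvcc with clang as host compiler was reported to fail with "
            "-stdlib=libc++ in this thread; consider using libstdc++ instead.")
    ENDIF()
ENDIF()
```

A warning rather than a fatal error keeps the check advisory, since the failure was only observed for nvcc 10.2 with clang 8.0 in CI.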

BenjaminW3 · 2019-12-01T07:21:39Z

Ready for approval/merging!

test/unit/queue/src/CollectiveQueue.cpp

alpakaConfig.cmake

Change requested due to not properly understanding.

ax3l · 2019-12-04T17:22:13Z

test/CMakeLists.txt

    GET_TARGET_PROPERTY(_COMMON_COMPILE_OPTIONS common COMPILE_OPTIONS)
    # If the property does not exist, the variable is set to NOTFOUND.
    IF(_COMMON_COMPILE_OPTIONS)
        STRING(REPLACE ";" " " _COMMON_COMPILE_OPTIONS_STRING "${_COMMON_COMPILE_OPTIONS}")
        SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${_COMMON_COMPILE_OPTIONS_STRING}")
    ENDIF()
+    # nvcc supports werror starting with 10.2
+    IF(CUDA_VERSION GREATER_EQUAL 10.2)


Adding this here is not ideal either, since for example package managers in unanticipated environments might run your tests and a simple warning will mark a package as broken.

Please just enable -Werror=all-warnings in CI, e.g. via a travis environment variable or additional CMake option of Alpaka (that is by default not on). Shipping auto-enabled -Werror flags of any kind in CMake scripts is a bug :)

BenjaminW3 · 2019-12-04

We have done this for years with the regular compilers' Werror flag, as most other open source projects do. So this is at least not a regression.
Furthermore, the exact goal of setting the Werror flag is to let the tests fail if they do not compile cleanly. Why should we treat warnings differently from compile errors? Having no warnings is our definition of our tests passing. So if a package manager for any reason (I do not know why it should do this at all) runs the tests and they produce a warning on this platform, then our tests fail and this is exactly what we want. That's the whole goal behind Werror.

ax3l · 2019-12-04

I am just telling you from a package management point of view, where software is used with newer environments than what was available and testable at release, that this is not an ideal practice.

Package managers such as spack, homebrew, et al. run tests to see if a package is working in a larger environment (integration tests). One usually just runs what comes with the package, to avoid duplication and maintenance overhead.

As developers, we can recommend best practices but it is not our place to anticipate new additions to -Wall, -Wextra or whatever a compiler decides to warn on. There are more compilers and workflows out there than what we do daily, e.g. people that build their own compiler frontends, optimizer passes, tooling, etc. on projects such as ours. Maybe a new NVCC just warns on something totally minor. Let's not patronize them. Warnings are called warnings for a reason.

BenjaminW3 · 2019-12-04

I see your point. However, a newer, self-built or unknown compiler is in no way different from newer CUDA versions, newer or self-built CMake versions, newer OSs or any other dependency. The compiler is also only a dependency of alpaka. All of those dependencies can make the build or the tests fail at compile time or runtime.
I do not see a reason why we should have lower requirements on the compilation than is possible while we expect everything to generate, run and calculate perfectly.
If we lower the requirements on one dependency, we should also reduce the requirements on numerical stability, quality of random numbers and other things that may behave unexpectedly on untested platforms.
Finding issues on untested platforms is the goal of tests, and for alpaka a clean compilation is what we defined as a requirement for a successful test execution.

ax3l · 2019-12-04

> If we lower the requirements on one dependency, we should also reduce the requirements on numerical stability, quality of random numbers and other things that may behave unexpectedly on untested platforms.

I think it's not that black and white. A compiler warning in a newer nvcc might just be fine, no reason to break the tests outside of CI for this. A test that passes and throws a warning is still a test that passes its primary coded purpose. A CMake flag that also enables -Werror intentionally can then also be used on top of this.

This is just compartmentalization of goals into smaller scopes of interest. Our goal in mainline is to have the code base error and warning free. That's why we require -Werror for all code to come in. A user's primary downstream goal is to have the tests technically compute the right results as a baseline for whatever they want to do. They can decide if they want to treat warnings as errors as well. I tell you from experience that many use cases do not want this to be the case, because warnings are not errors, some warnings are false positives in new environments, etc. Let's just optionalize this and enable this for us.
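The opt-in approach suggested above could look roughly like the following sketch. The option name `ALPAKA_TEST_WERROR` is hypothetical and not part of this PR; `CUDA_VERSION` and `CUDA_NVCC_FLAGS` are the variables provided by CMake's FindCUDA module, and `VERSION_GREATER_EQUAL` requires CMake 3.7 or newer.

```cmake
# Hypothetical option name; OFF by default so that downstream builds
# (e.g. package managers running the tests) never get -Werror implicitly.
OPTION(ALPAKA_TEST_WERROR "Treat compiler warnings as errors when building the tests" OFF)

IF(ALPAKA_TEST_WERROR)
    # CUDA_VERSION is only defined after FIND_PACKAGE(CUDA) has run.
    # According to this PR, nvcc supports -Werror=all-warnings starting with 10.2.
    IF(DEFINED CUDA_VERSION AND CUDA_VERSION VERSION_GREATER_EQUAL 10.2)
        LIST(APPEND CUDA_NVCC_FLAGS "-Werror=all-warnings")
    ENDIF()
ENDIF()
```

CI would then configure with `-DALPAKA_TEST_WERROR=ON`, while a plain `cmake` invocation keeps warnings as warnings. The version comparison uses `VERSION_GREATER_EQUAL` because it compares version components rather than plain numbers.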

BenjaminW3 added the Backend:CUDA label Nov 23, 2019

BenjaminW3 requested a review from a team November 23, 2019 12:38

BenjaminW3 added the State:Work In Progress label Nov 23, 2019

ax3l reviewed Nov 26, 2019

BenjaminW3 force-pushed the topic-cuda-10_2 branch from dfb21ff to 6c8b617 Compare November 26, 2019 17:31

ax3l reviewed Nov 26, 2019

BenjaminW3 force-pushed the topic-cuda-10_2 branch from 6c8b617 to ce1dc4e Compare November 27, 2019 16:35

BenjaminW3 force-pushed the topic-cuda-10_2 branch from 1341bee to f9c0b21 Compare November 28, 2019 17:21

BenjaminW3 force-pushed the topic-cuda-10_2 branch 3 times, most recently from a9374fa to 30b5083 Compare November 29, 2019 18:40

Add support for CUDA 10.2

3f5acf3

BenjaminW3 force-pushed the topic-cuda-10_2 branch from 30b5083 to 3f5acf3 Compare November 30, 2019 06:56

BenjaminW3 removed the State:Work In Progress label Nov 30, 2019

BenjaminW3 changed the title ~~WIP: Add support for CUDA 10.2~~ Add support for CUDA 10.2 Dec 1, 2019

sbastrakov reviewed Dec 2, 2019

sbastrakov previously requested changes Dec 2, 2019

sbastrakov approved these changes Dec 2, 2019

BenjaminW3 merged commit 45f738b into alpaka-group:develop Dec 2, 2019

BenjaminW3 deleted the topic-cuda-10_2 branch December 2, 2019 11:24

ax3l reviewed Dec 4, 2019
