Add support for CUDA 10.2 #886

Merged: 1 commit, Dec 2, 2019


@BenjaminW3 (Member) commented Nov 23, 2019

Implements #885

Enables -Werror=all-warnings so that warnings from the CUDA compiler frontend also make the build fail.

@BenjaminW3 (Member, Author)

The nvcc 10.2 + clang 8.0 build currently fails with:

Nov 23 11:58:56 /usr/local/cuda-10.2/bin/nvcc /home/travis/build/BenjaminW3/alpaka/example/bufferCopy/src//bufferCopy.cpp -x=cu -c -o /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o -ccbin /home/travis/llvm/bin/clang++ -m64 -DALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_THREADS_ENABLED -DALPAKA_ACC_CPU_B_TBB_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_OMP2_T_SEQ_ENABLED -DALPAKA_ACC_CPU_B_SEQ_T_OMP2_ENABLED -DALPAKA_ACC_GPU_CUDA_ENABLED -DALPAKA_DEBUG=0 -DALPAKA_CI -Xcompiler ,\"-stdlib=libc++\",\"-fopenmp=libomp\",\"-O3\",\"-DNDEBUG\" --expt-extended-lambda --expt-relaxed-constexpr --generate-code arch=compute_70,code=sm_70 --generate-code arch=compute_70,code=compute_70 -std=c++11 --use_fast_math --ftz=false -Xcudafe --display_error_number -Xcudafe --diag_suppress=esa_on_defaulted_function_ignored -Xcudafe --diag_suppress=2976 -Werror=all-warnings -DNVCC -I/usr/local/cuda-10.2/include -I/usr/include -I/home/travis/build/BenjaminW3/alpaka/include -I/home/travis/boost
/usr/local/cuda-10.2/include/crt/math_functions.h(8983): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(8996): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(9004): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.h(9018): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(375): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(381): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(383): error #311: cannot overload functions distinguished by return type alone
/usr/local/cuda-10.2/include/crt/math_functions.hpp(389): error #311: cannot overload functions distinguished by return type alone
8 errors detected in the compilation of "/tmp/tmpxft_00000136_00000000-6_bufferCopy.cpp1.ii".
Nov 23 11:58:59 -- Removing /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
Nov 23 11:58:59 /home/travis/CMake/bin/cmake -E remove /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
CMake Error at bufferCopy_generated_bufferCopy.cpp.o.Release.cmake:279 (message):
  Error generating file
  /home/travis/build/BenjaminW3/alpaka/build/example/bufferCopy/CMakeFiles/bufferCopy.dir/src/./bufferCopy_generated_bufferCopy.cpp.o
Nov 23 11:58:59 example/bufferCopy/CMakeFiles/bufferCopy.dir/build.make:63: recipe for target 'example/bufferCopy/CMakeFiles/bufferCopy.dir/src/bufferCopy_generated_bufferCopy.cpp.o' failed
Nov 23 11:58:59 make[2]: Leaving directory '/home/travis/build/BenjaminW3/alpaka/build'
make[2]: *** [example/bufferCopy/CMakeFiles/bufferCopy.dir/src/bufferCopy_generated_bufferCopy.cpp.o] Error 1
make[1]: *** [example/bufferCopy/CMakeFiles/bufferCopy.dir/all] Error 2
Nov 23 11:58:59 CMakeFiles/Makefile2:1017: recipe for target 'example/bufferCopy/CMakeFiles/bufferCopy.dir/all' failed
Nov 23 11:58:59 make[1]: Leaving directory '/home/travis/build/BenjaminW3/alpaka/build'
make: *** [all] Error 2
Nov 23 11:58:59 Makefile:94: recipe for target 'all' failed

Any ideas what we can do about this?

@sbastrakov (Member) commented Nov 25, 2019

@BenjaminW3 I don't. It might be an incompatibility between these compiler and CUDA versions; it would not be the first time this happens. But I don't know for sure, it might still be something on our side.

Two review threads on alpakaConfig.cmake (outdated, resolved)
@BenjaminW3 (Member, Author)

The problems of nvcc 10.2 with clang 8.0 in math_functions.h and math_functions.hpp are caused by some declarations of isnan and isinf. I do not see anything we could do about this, so I will disable this combination for now.

@ax3l (Member) commented Nov 27, 2019

Interesting, Clang 8.0 host-compiler support is not even new; it has been part of CUDA 10.1.168+ already... Does compiling in Release mode help?

@BenjaminW3 (Member, Author)

I tried both Release and Debug; neither helps. With nvcc 10.1 it works in CI, so they probably broke something.

@ax3l (Member) commented Nov 27, 2019

Urgh, thanks for testing!

@ax3l (Member) commented Nov 27, 2019

Feedback from my Nvidia contact: Clang 8.0 works, but we should not pass -Xcompiler "-stdlib=libc++", which breaks compatibility in the build matrix. Can you re-enable the combination accordingly, please?

@BenjaminW3 (Member, Author) commented Nov 28, 2019

You are right: with libstdc++ it seems to work.
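A minimal sketch of how a build matrix could keep libc++ for CPU-only jobs while never handing it to nvcc. It assumes the legacy FindCUDA module, which forwards CMAKE_CXX_FLAGS (plus the per-configuration flags) to the host compiler through -Xcompiler while CUDA_PROPAGATE_HOST_FLAGS is ON, its default; that is presumably how -stdlib=libc++ ended up in the nvcc command line of the log above. `CI_USE_LIBCXX` is a hypothetical CI switch, and `ALPAKA_ACC_GPU_CUDA_ENABLE` is assumed to be the option that enables the CUDA back-end.

```cmake
cmake_minimum_required(VERSION 3.11)
project(libcxx_sketch LANGUAGES CXX)

# Hypothetical CI switch, not an actual alpaka option.
option(CI_USE_LIBCXX "Build host code against libc++" OFF)
# Assumed name of the option that enables the CUDA back-end.
option(ALPAKA_ACC_GPU_CUDA_ENABLE "Enable the CUDA accelerator" OFF)

if(CI_USE_LIBCXX)
  if(ALPAKA_ACC_GPU_CUDA_ENABLE)
    # FindCUDA would forward CMAKE_CXX_FLAGS to clang via -Xcompiler, and
    # nvcc 10.2 + clang 8.0 + libc++ fails inside crt/math_functions.h(pp).
    message(WARNING "CI_USE_LIBCXX is ignored for CUDA builds; using libstdc++.")
  else()
    # CPU-only builds can still exercise libc++.
    string(APPEND CMAKE_CXX_FLAGS " -stdlib=libc++")
  endif()
endif()
```

Keeping the decision in one place means a CUDA job in the matrix silently falls back to libstdc++ instead of failing in the CUDA math headers.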

@BenjaminW3 force-pushed the topic-cuda-10_2 branch 3 times, most recently from a9374fa to 30b5083, on November 29, 2019 18:40
@BenjaminW3 (Member, Author)

Ready for approval/merging!

@BenjaminW3 changed the title from "WIP: Add support for CUDA 10.2" to "Add support for CUDA 10.2" on Dec 1, 2019
Review thread on alpakaConfig.cmake (resolved)
@sbastrakov dismissed their stale review on December 2, 2019 10:31

Changes were requested due to a misunderstanding.

@BenjaminW3 merged commit 45f738b into alpaka-group:develop on Dec 2, 2019
@BenjaminW3 deleted the topic-cuda-10_2 branch on December 2, 2019 11:24
GET_TARGET_PROPERTY(_COMMON_COMPILE_OPTIONS common COMPILE_OPTIONS)
# If the property does not exist, the variable is set to _COMMON_COMPILE_OPTIONS-NOTFOUND, which IF() treats as false.
IF(_COMMON_COMPILE_OPTIONS)
    STRING(REPLACE ";" " " _COMMON_COMPILE_OPTIONS_STRING "${_COMMON_COMPILE_OPTIONS}")
    SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${_COMMON_COMPILE_OPTIONS_STRING}")
ENDIF()
# nvcc supports -Werror starting with CUDA 10.2
IF(CUDA_VERSION GREATER_EQUAL 10.2)

@ax3l (Member) commented Dec 4, 2019

Adding this here is not ideal either: package managers, for example, might run your tests in unanticipated environments, and a simple warning would then mark the package as broken.

Please just enable -Werror=all-warnings in CI, e.g. via a Travis environment variable or an additional alpaka CMake option that is off by default. Shipping auto-enabled -Werror flags of any kind in CMake scripts is a bug :)
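A minimal sketch of the opt-in variant suggested here, assuming the legacy FindCUDA module (which provides `CUDA_VERSION` and honours `CUDA_NVCC_FLAGS`). The option name `ALPAKA_CI_WERROR` is hypothetical, and the 10.2 guard simply follows the PR's note that nvcc supports -Werror starting with CUDA 10.2.

```cmake
cmake_minimum_required(VERSION 3.11)
project(werror_option_sketch LANGUAGES CXX)

# Hypothetical opt-in switch. It is OFF by default, so package managers and
# users building in environments nobody tested only see warnings.
option(ALPAKA_CI_WERROR "Treat nvcc frontend warnings as errors (intended for CI)" OFF)

# Legacy FindCUDA module: provides CUDA_VERSION and honours CUDA_NVCC_FLAGS.
find_package(CUDA REQUIRED)

if(ALPAKA_CI_WERROR)
  if(CUDA_VERSION VERSION_GREATER_EQUAL 10.2)
    # Turn every warning of the CUDA compiler frontend into an error.
    list(APPEND CUDA_NVCC_FLAGS -Werror=all-warnings)
  else()
    # Per the PR, older nvcc versions do not support this flag.
    message(STATUS "nvcc ${CUDA_VERSION}: -Werror=all-warnings not used")
  endif()
endif()
```

A CI job would configure with `-DALPAKA_CI_WERROR=ON`, while a default configure leaves warnings as warnings.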

@BenjaminW3 (Member, Author)

We have already done this for years with the regular compilers' -Werror flag, as most other open-source projects do. So this is at least not a regression.
Furthermore, the exact goal of setting the -Werror flag is to make the tests fail if they do not compile cleanly. Why should we treat warnings differently from compile errors? Having no warnings is our definition of our tests passing. So if a package manager for any reason (I do not know why it would do this at all) runs the tests and they produce a warning on that platform, then our tests fail, and this is exactly what we want. That is the whole point of -Werror.

@ax3l (Member) commented Dec 4, 2019

I am just telling you, from the point of view of package management (which uses software in newer environments than were available and testable at release time), that this is not an ideal practice.

Package managers such as spack, homebrew, et al. run tests to see if a package is working in a larger environment (integration tests). One usually just runs what comes with the package, to avoid duplication and maintenance overhead.

As developers, we can recommend best practices, but it is not our place to anticipate new additions to -Wall, -Wextra or whatever else a compiler decides to warn on. There are more compilers and workflows out there than the ones we use daily, e.g. people who build their own compiler frontends, optimizer passes, tooling, etc. on top of projects such as ours. Maybe a new NVCC just warns about something totally minor. Let's not patronize them. Warnings are called warnings for a reason.

@BenjaminW3 (Member, Author)

I see your point. However, a newer, self-built or unknown compiler is no different from newer CUDA versions, newer or self-built CMake versions, newer OSes or any other dependency. The compiler is just another dependency of alpaka. Any of those dependencies can make the build or the tests fail, at compile time or at runtime.
I do not see why we should set lower requirements on compilation than possible while we expect everything to generate, run and calculate perfectly.
If we lower the requirements on one dependency, we should also reduce the requirements on numerical stability, quality of random numbers and other things that may behave unexpectedly on untested platforms.
Finding issues on untested platforms is the goal of tests, and for alpaka a clean compilation is what we defined as a requirement for a successful test execution.

@ax3l (Member) commented Dec 4, 2019

> If we lower the requirements on one dependency, we should also reduce the requirements on numerical stability, quality of random numbers and other things that may behave unexpectedly on untested platforms.

I think it is not that black and white. A compiler warning in a newer nvcc might be perfectly fine; there is no reason to break the tests outside of CI for it. A test that passes and emits a warning still fulfills its primary coded purpose. A CMake flag that intentionally enables -Werror can then be used on top of this.

This is just compartmentalization of goals into smaller scopes of interest. Our goal in mainline is to keep the code base free of errors and warnings. That's why we require -Werror for all code that comes in. A user's primary downstream goal is for the tests to technically compute the right results, as a baseline for whatever they want to do. They can decide whether they want to treat warnings as errors as well. I can tell you from experience that many use cases do not want this, because warnings are not errors, some warnings are false positives in new environments, etc. Let's just make this optional and enable it for ourselves.
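As a sketch of the CI side, assuming the project declares an opt-in switch such as a hypothetical `option(ALPAKA_CI_WERROR ... OFF)`: CI jobs can switch it on with an initial-cache script passed to `cmake -C`, so the project default stays off for everyone else. The file name `ci-cache.cmake` is illustrative.

```cmake
# ci-cache.cmake: initial-cache script, loaded only by CI jobs via
#   cmake -C ci-cache.cmake <path-to-source>
# ALPAKA_CI_WERROR is a hypothetical option name; it must match the
# option() declared in the project's CMakeLists.txt.
set(ALPAKA_CI_WERROR ON CACHE BOOL "Treat warnings as errors (CI only)")
```

Because option() does not overwrite a cache entry that already exists, the preloaded ON value takes effect in CI, while a plain configure keeps the OFF default.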

3 participants