cuda-memcheck: initcheck and racecheck failed #2

annymao opened this issue Feb 15, 2022 · 0 comments

Hi,
I ran cuda-memcheck on this project, and it reported initcheck errors while compressing data and racecheck warnings while decompressing data. I was running on a Tesla V100-SXM2 (32 GB) and an A100-SXM4 (40 GB) with CUDA 11.2. I'm wondering whether these are false alarms or not. Below are examples of the errors and warnings, along with how I removed them.

The initcheck errors can be removed by adding CHECKED_CUDA_CALL(cudaMemset, _memory, 0, size * sizeof(T)); in cuda_bits.cuh (lines 196 and 216).
Here is an example of the initcheck error:

========= Uninitialized __global__ memory read of size 4
=========     at 0x000000e0 in void ndzip::detail::gpu_cuda::hierarchical_inclusive_scan_reduce<unsigned int, ndzip::detail::gpu_cuda::plus<unsigned int>>(unsigned int*, unsigned int, unsigned int)
=========     by thread (64,0,0) in block (0,0,0)
=========     Address 0x7fffcfe20d00
=========     Device Frame:void ndzip::detail::gpu_cuda::hierarchical_inclusive_scan_reduce<unsigned int, ndzip::detail::gpu_cuda::plus<unsigned int>>(unsigned int*, unsigned int, unsigned int) (void ndzip::detail::gpu_cuda::hierarchical_inclusive_scan_reduce<unsigned int, ndzip::detail::gpu_cuda::plus<unsigned int>>(unsigned int*, unsigned int, unsigned int) : 0xe0)
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/lib64/libcuda.so.1 (cuLaunchKernel + 0x2b8) [0x222dc8]
=========     Host Frame:${prefix_path}/ndzip/build/libndzip-cuda.so [0x277bb]
=========     Host Frame:${prefix_path}/ndzip/build/libndzip-cuda.so [0x74270]
=========     Host Frame:${prefix_path}/ndzip/build/libndzip-cuda.so (_ZN5ndzip6detail8gpu_cuda27hierarchical_inclusive_scanIjNS1_4plusIjEEEEvPT_RSt6vectorINS1_11cuda_bufferIS5_EESaIS9_EEjT0_P11CUstream_st + 0x1ed) [0xc91d]
=========     Host Frame:${prefix_path}/ndzip/build/libndzip-cuda.so (_ZN5ndzip4cuda14compress_asyncIfLj1EEEvNS_5sliceIKT_XT0_EEEPvPmRNS0_25compressor_scratch_memoryIS3_XT0_EEEP11CUstream_st + 0x14b) [0xcc6b]
=========     Host Frame:${prefix_path}/ndzip/build/libndzip-cuda.so (_ZNK5ndzip12cuda_encoderIfLj1EE8compressERKNS_5sliceIKfLj1EEEPvPNSt6chrono8durationImSt5ratioILl1ELl1000000000EEEE + 0x3ee) [0xefae]
=========     Host Frame:./build/compress [0x11741]
=========     Host Frame:./build/compress [0x12451]
=========     Host Frame:./build/compress [0xb1b6]
=========     Host Frame:./build/compress [0xab27]
=========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22555]
=========     Host Frame:./build/compress [0xaec3]
=========

Here is the code change:

--- ndzip/src/ndzip/cuda_bits.cuh	
+++ ndzip/src/ndzip/cuda_bits.cuh	
@@ -193,6 +193,7 @@
 
     explicit cuda_buffer(index_type size) : _size(size) {
         CHECKED_CUDA_CALL(cudaMalloc, &_memory, size * sizeof(T));
+        CHECKED_CUDA_CALL(cudaMemset, _memory, 0, size * sizeof(T));  // <--- ADD
     }
 
     cuda_buffer(cuda_buffer &&other) noexcept {
@@ -212,6 +213,7 @@
     void allocate(index_type size) {
         reset();
         CHECKED_CUDA_CALL(cudaMalloc, &_memory, size * sizeof(T));
+        CHECKED_CUDA_CALL(cudaMemset, _memory, 0, size * sizeof(T));  // <--- ADD
         _size = size;
     }
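
For context, here is a minimal standalone sketch of the pattern initcheck complains about and how the cudaMemset addresses it. This is not ndzip code; the file name, kernel, and sizes are made up for illustration only.

// initcheck_demo.cu -- standalone illustration, not ndzip code
// build: nvcc -o initcheck_demo initcheck_demo.cu
// run:   cuda-memcheck --tool initcheck ./initcheck_demo

#include <cstdio>
#include <cuda_runtime.h>

__global__ void sum_all(const unsigned int *in, unsigned int *out, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);  // reads every element of `in`
}

int main() {
    const unsigned int n = 1024;
    unsigned int *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(unsigned int));   // contents are indeterminate after cudaMalloc
    cudaMalloc(&out, sizeof(unsigned int));
    cudaMemset(out, 0, sizeof(unsigned int));

    // If `in` is only partially written (or not written at all) before a kernel reads it,
    // initcheck reports "Uninitialized __global__ memory read" for the untouched words.
    // Zero-filling right after allocation, as in the cuda_bits.cuh change above, silences it:
    cudaMemset(in, 0, n * sizeof(unsigned int));

    sum_all<<<(n + 255) / 256, 256>>>(in, out, n);

    unsigned int result = 0;
    cudaMemcpy(&result, out, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    printf("sum = %u\n", result);

    cudaFree(in);
    cudaFree(out);
    return 0;
}

Whether the zeroing is actually required in ndzip, or the untouched words are padding that never influences the compressed output, is the part I'm unsure about.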
 

For racecheck, the warnings can be removed by adding __syncwarp() in cuda_encoder.inl after line 355. I think the warnings arise because some threads can already reach line 323 in the second iteration while other threads are still at line 345 in the first iteration, having passed the __syncwarp() at line 326. Is that the case?

Here is an example of the racecheck warning:

========= WARN: Race reported between Read access at 0x00000f20 in void ndzip::detail::gpu_cuda::decompress_block<ndzip::detail::profile<float, unsigned int=1>>(floatbits_type const *, ndzip::slice<ndzip::detail::gpu_cuda::decompress_block<ndzip::detail::profile<float, unsigned int=1>::data_type>, __scope__(dimensions)>)
=========     and Write access at 0x00000b30 in void ndzip::detail::gpu_cuda::decompress_block<ndzip::detail::profile<float, unsigned int=1>>(floatbits_type const *, ndzip::slice<ndzip::detail::gpu_cuda::decompress_block<ndzip::detail::profile<float, unsigned int=1>::data_type>, __scope__(dimensions)>) [21864 hazards]
=========

Here is the code change:

--- ndzip/src/ndzip/cuda_encoder.inl
+++ ndzip/src/ndzip/cuda_encoder.inl
@@ -353,6 +353,7 @@
                 __builtin_memcpy(&row_bits, row, sizeof row_bits);
                 hc.store(item, row_bits);
             }
+            __syncwarp();  // <--- ADD
         } else {
             // TODO duplication of the `item` calculation above. The term can be simplified!
             for (index_type w = 0; w < warps_per_col_chunk; ++w) {
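
To make the suspected ordering issue concrete, here is a minimal standalone sketch of the same kind of hazard. Again, this is not ndzip code; the kernel name and sizes are made up. Each warp writes shared memory, syncs, reads a neighbour's slot, and then needs a second __syncwarp() before the next iteration's write, analogous to the line added above.

// racecheck_demo.cu -- standalone illustration, not ndzip code
// build: nvcc -o racecheck_demo racecheck_demo.cu
// run:   cuda-memcheck --tool racecheck ./racecheck_demo

#include <cstdio>
#include <cuda_runtime.h>

__global__ void warp_rotate(unsigned int *data, int iters) {
    __shared__ unsigned int buf[32];
    const unsigned int lane = threadIdx.x;     // launched with exactly one warp
    unsigned int v = data[lane];
    for (int it = 0; it < iters; ++it) {
        buf[lane] = v;                         // write phase of iteration `it`
        __syncwarp();                          // make the writes visible before reading
        v = buf[(lane + 1) % 32];              // read a neighbour's slot
        __syncwarp();                          // <--- analogous to the added __syncwarp():
                                               // without it, a fast thread can begin the next
                                               // iteration's write to buf[lane] while a slower
                                               // thread is still reading that slot
    }
    data[lane] = v;
}

int main() {
    unsigned int h[32];
    for (int i = 0; i < 32; ++i) h[i] = i;
    unsigned int *d = nullptr;
    cudaMalloc(&d, sizeof h);
    cudaMemcpy(d, h, sizeof h, cudaMemcpyHostToDevice);
    warp_rotate<<<1, 32>>>(d, 4);
    cudaMemcpy(h, d, sizeof h, cudaMemcpyDeviceToHost);
    printf("h[0] = %u\n", h[0]);   // values rotated by 4 lanes
    cudaFree(d);
    return 0;
}

On Volta and later, threads within a warp can make independent forward progress, so without that second __syncwarp() one thread may start the next iteration's write while another is still reading, which seems to match the read/write hazard in the report above.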

Thanks!
