Reorder kernels in CUDA backend fail with SZ=64 #51
Labels: bug, cuda, performance, priority low
This happens because we exceed the maximum thread count per block when we set SZ=64. Some of the reorder kernels launch a 2D thread block with SZ*SZ threads, which is 64*64 = 4096 threads at SZ=64, well above the CUDA limit of 1024 threads per block. We need to fix these kernels so that when SZ is 64 or above they work on tiles of 32 by 32.
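A rough sketch of the 32 by 32 tiling idea, written in CUDA C++ purely for illustration. It assumes a reorder is a transpose of independent SZ x SZ slabs and that SZ is a multiple of 32. The names `reorder_tiled`, `check_reorder`, and `TILE` are hypothetical and are not taken from the actual kernels.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 32;  // 32 * 32 = 1024 threads, the per-block limit

// Hypothetical reorder: transposes each SZ x SZ slab of `in` into `out`.
// One block handles one slab and walks it in TILE x TILE sub-tiles, so the
// block size stays at 1024 threads no matter how large SZ is.
template <int SZ>
__global__ void reorder_tiled(const float* in, float* out) {
    static_assert(SZ % TILE == 0, "sketch assumes SZ is a multiple of TILE");
    __shared__ float tile[TILE][TILE + 1];  // +1 column avoids bank conflicts

    const float* src = in + (size_t)blockIdx.x * SZ * SZ;
    float* dst = out + (size_t)blockIdx.x * SZ * SZ;

    for (int by = 0; by < SZ; by += TILE) {
        for (int bx = 0; bx < SZ; bx += TILE) {
            // Load one sub-tile; consecutive threadIdx.x read consecutive addresses.
            tile[threadIdx.y][threadIdx.x] =
                src[(by + threadIdx.y) * SZ + (bx + threadIdx.x)];
            __syncthreads();
            // Store the sub-tile transposed, again with contiguous writes.
            dst[(bx + threadIdx.y) * SZ + (by + threadIdx.x)] =
                tile[threadIdx.x][threadIdx.y];
            __syncthreads();  // tile is reused by the next sub-tile
        }
    }
}

// Host-side check: runs the kernel and compares against a CPU transpose.
template <int SZ>
bool check_reorder(int n_slabs) {
    const size_t n = (size_t)n_slabs * SZ * SZ;
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (size_t i = 0; i < n; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Always TILE x TILE threads per block, independent of SZ.
    reorder_tiled<SZ><<<n_slabs, dim3(TILE, TILE)>>>(d_in, d_out);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    for (int s = 0; s < n_slabs && ok; ++s)
        for (int r = 0; r < SZ && ok; ++r)
            for (int c = 0; c < SZ; ++c)
                if (h_out[(size_t)s * SZ * SZ + c * SZ + r] !=
                    h_in[(size_t)s * SZ * SZ + r * SZ + c]) { ok = false; break; }
    return ok;
}

int main() {
    std::printf("SZ=64:  %s\n", check_reorder<64>(4) ? "ok" : "FAILED");
    std::printf("SZ=128: %s\n", check_reorder<128>(2) ? "ok" : "FAILED");
    return 0;
}
```

With this layout the launch configuration no longer depends on SZ, so the same kernel would cover SZ=64 and larger values. For SZ below 32 the existing SZ*SZ launch already fits within the limit and could be kept as it is.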