CUDA: fix shared memory access condition for mmv #10740

JohannesGaessler · 2024-12-09T18:31:25Z

The shared memory access condition in the mmv kernel is wrong. Only WARP_SIZE floats are supposed to be used, so all threads with an index >= WARP_SIZE need to return.
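For readers unfamiliar with the pattern, here is a minimal sketch of a block-level sum that stages per-warp partial sums in a shared buffer of WARP_SIZE floats. It is not the diff from this PR: the names `block_sum`, `block_sum_host`, and `buf_iw` are illustrative assumptions, and the sketch assumes a block size that is a multiple of WARP_SIZE and at most 1024. The point it illustrates is the guard before the final read: every thread with an index >= WARP_SIZE must return, because a looser comparison such as `>` would let thread WARP_SIZE read one element past the end of the buffer.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define WARP_SIZE 32

// Sum a value across the 32 lanes of a warp with butterfly shuffles.
static __device__ float warp_reduce_sum(float x) {
    for (int offset = WARP_SIZE/2; offset > 0; offset >>= 1) {
        x += __shfl_xor_sync(0xffffffff, x, offset, WARP_SIZE);
    }
    return x;
}

// Hypothetical kernel: one block sums n floats from x into *dst.
static __global__ void block_sum(const float * x, float * dst, const int n) {
    const int tid = threadIdx.x;
    __shared__ float buf_iw[WARP_SIZE]; // one slot per warp, WARP_SIZE floats in total

    // Each thread accumulates a strided slice of the input.
    float sumf = 0.0f;
    for (int i = tid; i < n; i += blockDim.x) {
        sumf += x[i];
    }
    sumf = warp_reduce_sum(sumf);

    if (blockDim.x > WARP_SIZE) {
        // Zero all slots so that slots of warps that do not exist contribute nothing.
        if (tid < WARP_SIZE) {
            buf_iw[tid] = 0.0f;
        }
        __syncthreads();
        // Lane 0 of each warp stores that warp's partial sum.
        if (tid % WARP_SIZE == 0) {
            buf_iw[tid/WARP_SIZE] = sumf;
        }
        __syncthreads();
        // Only indices 0..WARP_SIZE-1 are valid in buf_iw.
        // With '>' instead of '>=', thread WARP_SIZE would read buf_iw[WARP_SIZE].
        if (tid >= WARP_SIZE) {
            return;
        }
        sumf = buf_iw[tid];
        sumf = warp_reduce_sum(sumf);
    }

    if (tid == 0) {
        *dst = sumf;
    }
}

// Host helper (illustrative): sums n ones with a single block of block_size threads.
float block_sum_host(const int n, const int block_size) {
    assert(block_size % WARP_SIZE == 0 && block_size <= 1024);
    std::vector<float> h(n, 1.0f);
    float * d_x   = nullptr;
    float * d_dst = nullptr;
    cudaMalloc(&d_x,   n*sizeof(float));
    cudaMalloc(&d_dst, sizeof(float));
    cudaMemcpy(d_x, h.data(), n*sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<1, block_size>>>(d_x, d_dst, n);
    float result = 0.0f;
    cudaMemcpy(&result, d_dst, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_dst);
    return result;
}
```

Calling `block_sum_host(1000, 256)` from a `main` function exercises the multi-warp path. Running the binary under `compute-sanitizer` is one way to confirm that the guarded version reports no out-of-bounds shared memory reads.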

CUDA: fix shared memory access condition for mmv

6768787

github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Dec 9, 2024

JohannesGaessler mentioned this pull request Dec 9, 2024

Eval bug: llama-server: illegal memory access was encountered #10739

Closed

ggerganov approved these changes Dec 9, 2024

JohannesGaessler merged commit 26a8406 into ggerganov:master Dec 9, 2024
47 checks passed

rick-github mentioned this pull request Dec 15, 2024

current device: 0, in function ggml_backend_cuda_synchronize at llama/ggml-cuda/ggml-cuda.cu:2317 ollama/ollama#8098

Closed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

CUDA: fix shared memory access condition for mmv (ggerganov#10740)

44aee1c
