
vulkan: build fixes for 32b #10927

Merged · 2 commits into ggerganov:master on Dec 22, 2024

Conversation

jeffbolznv
Collaborator

Should fix #10923

@jeffbolznv requested a review from 0cc4m on Dec 21, 2024 at 05:21
@github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Dec 21, 2024
@ggerganov
Owner

The ggml-ci has been reporting "maybe uninitialized" warnings for a while:

https://github.com/ggml-org/ci/blob/31168d7a582ded11a0dec489a62fb8bef74349a8/llama.cpp/a9/1a41364b25705dbb81ae996bc35c3440c63b35/ggml-6-x86-vulkan-t4/stdall#L538

Might want to fix these too.
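For context, GCC's -Wmaybe-uninitialized fires when some control-flow path can reach a read of a variable before any assignment. A minimal hypothetical illustration in C++ (not the actual ggml-vulkan code, just the shape of the warning and the usual fix):

// Illustrative only -- not the ggml source.
// gcc -O2 -Wmaybe-uninitialized warns on scale_for():
int scale_for(int type) {
    int scale;              // warning: 'scale' may be used uninitialized
    if (type == 0) {
        scale = 1;
    } else if (type == 1) {
        scale = 2;
    }                       // no else branch, so 'scale' may be unset here
    return scale;
}

int scale_for_fixed(int type) {
    int scale = 0;          // fix: give the variable a defined initial value
    if (type == 0) {
        scale = 1;
    } else if (type == 1) {
        scale = 2;
    }
    return scale;
}

Initializing at the declaration is the usual fix because it silences the warning on every path without changing behavior on the paths that do assign.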

@jeffbolznv
Collaborator Author

Second commit ought to fix the uninitialized variables, though I couldn't reproduce the warnings/errors locally.

@0cc4m merged commit ebdee94 into ggerganov:master on Dec 22, 2024
48 checks passed
@ggerganov
Owner

ggerganov commented Dec 22, 2024

Thanks. Also, I remember you recently discussed the segfault upon program exit, but I'm not sure which discussion it was. Do you have any ideas on how this could be resolved? It's preventing the ggml-ci from running beyond the first test. (cc @netrunnereve)

@netrunnereve
Collaborator

For what it's worth, here's the thread discussing the segfault: #10528. It seems to be intermittent, so if you restart the CI you might be able to avoid it for now.

@jeffbolznv
Collaborator Author

Right, the discussion is in #10528. I don't currently have a Linux system to repro it, but if @0cc4m isn't able to work on it soon, I might be able to set something up.

@ggerganov
Owner

It's no longer segfaulting after the restart. However, the CI appears to have revealed an issue when computing embeddings:

https://github.com/ggml-org/ci/tree/results/llama.cpp/eb/dee9478ca7ba65497b9b96f7457698c6ee5115/ggml-6-x86-vulkan-t4

@ggerganov
Owner

This command randomly segfaults upon exit:

./bin/llama-embedding --model ../models-mnt/rerank-tiny/ggml-model-f16.gguf -p "what is panda?</s></s>hi\nwhat is panda?</s></s>it's a bear\nwhat is panda?</s></s>The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China." -ngl 99 -c 0 --pooling rank --embd-normalize -1 --verbose-prompt

I tried to get a stack trace, but it's optimized out even in a Debug build for some reason:

batch_decode: n_tokens = 62, n_seq = 3

rerank score 0:    0.023
rerank score 1:    0.024
rerank score 2:    0.199

llama_perf_context_print:        load time =    1679.94 ms
llama_perf_context_print: prompt eval time =       7.25 ms /    62 tokens (    0.12 ms per token,  8556.44 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =      11.52 ms /    63 tokens
[Thread 0x7fffe2a006c0 (LWP 78286) exited]
[Thread 0x7fffdea006c0 (LWP 78291) exited]
[Thread 0x7fffe20006c0 (LWP 78287) exited]

Thread 6 "[vkps] Update" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdf4006c0 (LWP 78290)]
0x00007fffe4801960 in ?? ()
(gdb) bt
#0  0x00007fffe4801960 in ?? ()
#1  0x0000000067685496 in ?? ()
#2  0x0000000036c9f0a0 in ?? ()
#3  0x0000000067685496 in ?? ()
#4  0x00000000000deb4c in ?? ()
#5  0x0000000000000007 in ?? ()
#6  0x00005555574cc838 in ?? ()
#7  0x181391d9e7cb40e8 in ?? ()
#8  0x00007fffe5c84320 in ?? ()
#9  0x0000555555a737d0 in ?? ()
#10 0x00007fffe4b392b4 in ?? ()
#11 0x00005555574cc958 in ?? ()
#12 0x181391d9f98f5968 in ?? ()
#13 0x181391d9e7af0a40 in ?? ()
#14 0x00005555574be3e0 in ?? ()
#15 0x203a6362696c6720 in ?? ()
#16 0x000055555768a3f0 in ?? ()
#17 0x00007fffe4803e20 in ?? ()
#18 0x00007fffdf4006c0 in ?? ()
#19 0xffffffffffffff60 in ?? ()
#20 0x0000000000000002 in ?? ()
#21 0x00007fffffffa3e0 in ?? ()
#22 0x00007fffe4804dfa in ?? ()
#23 0x00007fffdf4006c0 in ?? ()
#24 0x00007fffdf400cdc in ?? ()
#25 0x00007fffdf3ffef0 in ?? ()
#26 0x00007ffff669ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) q

@jeffbolznv
Collaborator Author

Based on the thread name it's the same as #10528. The stack is entirely in a driver thread, so I wouldn't expect to be able to get a useful stack trace.
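As an aside, exit-time crashes in library-owned threads like this typically come down to a teardown race: a worker thread the library spawned is still running while the process destroys state it depends on, so every faulting frame sits in the library rather than the application. A minimal hypothetical C++ sketch of that general failure mode (not the actual driver or llama.cpp code):

#include <chrono>
#include <cstddef>
#include <string>
#include <thread>

// a global destroyed during static destruction at process exit
static std::string g_state = "alive";

int main() {
    // detached worker keeps touching g_state after main() returns,
    // racing the runtime's destruction of globals
    std::thread([] {
        for (;;) {
            volatile std::size_t n = g_state.size();
            (void)n;
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }).detach();
    return 0;   // teardown begins while the worker may still be running
}

In a setup like this, the backtrace of the crashing thread shows only library and runtime frames, which matches the gdb output above.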

Labels: Vulkan (Issues specific to the Vulkan backend), ggml (changes relating to the ggml tensor library for machine learning)
Linked issue: Compile bug: macOS Vulkan build fails (#10923)