
Misc. bug: Large performance regression since version b4365 #10977

Open
GlasslessPizza opened this issue Dec 25, 2024 · 2 comments

Name and Version

b4365 onward

Operating systems

Windows

Which llama.cpp modules do you know to be affected?

llama-server

Problem description & steps to reproduce

I'm observing a slowdown between b4363 and b4365 that persists to this day. I tried two models:

https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/blob/main/gemma-2-27b-it-Q5_K_L.gguf
https://huggingface.co/tensorblock/Qwen2.5-32B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-32B-Instruct-abliterated-Q5_K_M.gguf

Results:

      |   qwen   |  gemma
-----------------------------
b4363 | 31.7 t/s | 36.1 t/s
b4365 | 24.5 t/s | 22.7 t/s
-----------------------------
      |   -23%   |   -37%

Command used:

.\llama-server.exe --model <model> --ctx-size 8192 --threads 10 --no-mmap --mlock --n-gpu-layers 999 --log-disable --flash-attn --cache-type-k q8_0 --cache-type-v q8_0

Windows 10

First Bad Commit

between b4363 and b4365

Relevant log output

No response

slaren (Collaborator) commented Dec 25, 2024

How are you measuring the performance? What queries are you performing? The only relevant commit that I see in that range is #10783; if you are requesting token probabilities, the change in performance may be expected.

GlasslessPizza (Author) replied:

> How are you measuring the performance? What queries are you performing? The only relevant commit that I see in that range is #10783; if you are requesting token probabilities, the change in performance may be expected.

The query is a basic Q&A task in mikupad. I'm using its token-speed counter to measure. Now that you mention it, I know that mikupad does request token probabilities internally, since some of its features, like "show on hover", use them, but I personally keep that set to "hide" as I don't need them.
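To take mikupad out of the loop, one way to check whether the probability path from #10783 is the culprit is to time generation against llama-server's HTTP API directly, leaving `n_probs` at 0 so no per-token probabilities are computed. A minimal sketch, assuming llama-server is listening on `127.0.0.1:8080` and that the response's `timings.predicted_n` field is present (both the port and the exact field name are assumptions, not taken from this issue):

```python
"""Rough throughput check against llama-server, bypassing mikupad."""
import json
import time
import urllib.request


def make_payload(prompt: str, n_predict: int = 256) -> dict:
    # n_probs: 0 asks the server not to return token probabilities,
    # isolating raw decode speed from the suspected #10783 overhead.
    return {
        "prompt": prompt,
        "n_predict": n_predict,
        "n_probs": 0,
    }


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    # Plain generation throughput: tokens emitted per wall-clock second.
    return n_tokens / elapsed_s


def run_once(url: str = "http://127.0.0.1:8080/completion") -> float:
    payload = json.dumps(make_payload("Explain KV-cache quantization.")).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    elapsed = time.perf_counter() - start
    # Fall back to the requested n_predict if the timings field is absent.
    n_tokens = body.get("timings", {}).get("predicted_n", 256)
    return tokens_per_second(n_tokens, elapsed)


if __name__ == "__main__":
    print(f"{run_once():.1f} t/s")
```

Running this once per build (b4363 vs. b4365) with an identical prompt would show whether the regression survives with probabilities disabled; repeating with `"n_probs": 5` would show whether requesting them reproduces the slowdown.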
