How are you measuring the performance? What queries are you performing? The only relevant commit I see in that range is #10783; if you are requesting token probabilities, the change in performance may be expected.
The query is a basic Q&A task in Mikupad, and I'm using its token-speed counter to measure. Now that you mention it, I know that Mikupad does request token probabilities internally, since some features such as "show on hover" use them, but I personally keep that setting on "hide" because I don't need them.
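For reference, here is a minimal sketch of how one could compare generation speed with and without token probabilities requested, using the server's own reported timings instead of a client-side counter. This is not what Mikupad actually sends; the endpoint and payload fields (`/completion`, `n_predict`, `n_probs`) are assumptions based on llama-server's HTTP API, and the exact response fields may differ between builds.

```python
import json
import urllib.request

# Assumed local llama-server address; adjust to your setup.
URL = "http://127.0.0.1:8080/completion"

def complete(prompt: str, n_probs: int) -> dict:
    """POST a completion request; n_probs > 0 asks for top-N token probabilities."""
    payload = {
        "prompt": prompt,
        "n_predict": 128,
        "n_probs": n_probs,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Same prompt, with and without probabilities, to see whether n_probs alone
# accounts for the slowdown. The "timings" / "predicted_per_second" fields are
# taken from the server's JSON response and may vary between builds.
for n in (0, 10):
    timings = complete("Explain the KV cache in one sentence.", n).get("timings", {})
    print(f"n_probs={n}: {timings.get('predicted_per_second', 'n/a')} t/s")
```

If the two builds match with `n_probs=0` but diverge with `n_probs>0`, that would point at #10783; if they differ even with `n_probs=0`, something else in the range is responsible.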
Name and Version
b4365 onward
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Problem description & steps to reproduce
I'm observing a slowdown between b4363 and b4365 that persists to this day. I tried two models:
https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/blob/main/gemma-2-27b-it-Q5_K_L.gguf
https://huggingface.co/tensorblock/Qwen2.5-32B-Instruct-abliterated-GGUF/blob/main/Qwen2.5-32B-Instruct-abliterated-Q5_K_M.gguf
Results:
Command used:
Windows 10
First Bad Commit
between b4363 and b4365
Relevant log output
No response