Using the latest version of mikupad, the show-on-hover top-probabilities feature seems broken: nothing is shown. I can reproduce it with llama.cpp backend builds from b4365 onward; it works fine up to b4363. In addition, since that build, inference is roughly 20% slower (the exact slowdown varies with the model). The commit mentioned above may be the culprit and may now require special handling on the frontend side.
Thank you for the bug report! The token probabilities issue should be fixed as of commit c3daede.
For the performance regression, Mikupad only interacts with the llama.cpp server through its API, so I don't think there's anything we can do on our side. That said, the point raised by @slaren makes sense, although as far as I understand it only applies if you're using the OpenAI Compatible API option in Mikupad, since token probabilities were already being requested when using the llama.cpp API.
You will see a performance hit as long as n_probs is set in the request and is greater than zero. This is because the probabilities returned now are pre-sampling, which requires a fairly expensive softmax over the full vocabulary. Alternatively, you can obtain post-sampling probabilities (the previous behavior) by setting the post_sampling_probs option in the request, as in the sketch below.
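For reference, a minimal sketch of what such a request might look like against the llama.cpp server's /completion endpoint. This is not Mikupad's actual code; the server URL, prompt, and field values are placeholders, and the shape of the response field is my assumption based on the server docs:

```js
// Hypothetical example: request top-token probabilities with the
// pre-b4365 (post-sampling) behavior restored via post_sampling_probs.
const response = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Once upon a time",     // placeholder prompt
    n_predict: 16,
    n_probs: 10,                    // top-10 probabilities per token (enables the softmax cost)
    post_sampling_probs: true,      // opt back into post-sampling probabilities
  }),
});
const data = await response.json();
// With n_probs > 0 the server includes per-token probability data,
// e.g. in data.completion_probabilities (assumed field name).
console.log(data.completion_probabilities);
```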