Top probabilities broken since llama.cpp >= b4365 #104

Open
GlasslessPizza opened this issue Dec 26, 2024 · 2 comments

Comments

@GlasslessPizza

Using the latest version of mikupad, the show-on-hover top probabilities feature seems broken: nothing is shown. I can reproduce this with llama.cpp backend versions b4365 and later; it works fine up to b4363. In addition, since that version, inference is roughly 20% slower (varies with model).
The referenced commit may be the culprit, and it may now require special handling on the frontend side.

@lmg-anon
Owner

lmg-anon commented Dec 26, 2024

Thank you for the bug report! The token probabilities issue should be fixed as of commit c3daede.
As for the performance regression, Mikupad only interacts with the llama.cpp server through the API, so I don't think there's anything we can do on our side. The point raised by @slaren makes sense, but as far as I understand it only applies if you are using the OpenAI Compatible API option in Mikupad, since the token probabilities were already being sent before if you were using the llama.cpp API.

@slaren

slaren commented Dec 26, 2024

You will see a performance hit whenever n_probs is set in the request and is greater than zero. This is because the probabilities now returned are pre-sampling, which requires a fairly expensive softmax. Alternatively, you can obtain post-sampling probabilities (the previous behavior) by setting the post_sampling_probs option in the request.
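
To make the two options above concrete, here is a minimal sketch of how a client might build the request body. The field names `n_probs` and `post_sampling_probs` come from the comment above; the helper names `build_completion_payload` and `send_completion`, the `/completion` path, and the `localhost:8080` address are assumptions for illustration, not something confirmed in this thread.

```python
import json
import urllib.request


def build_completion_payload(prompt, n_probs=0, post_sampling=False, n_predict=16):
    """Build a request body for a llama.cpp-style completion endpoint.

    Hypothetical helper: leaving n_probs at 0 omits it entirely, which
    avoids the extra probability computation described above.
    """
    payload = {"prompt": prompt, "n_predict": n_predict}
    if n_probs > 0:
        # Asking for top probabilities is what triggers the extra cost.
        payload["n_probs"] = n_probs
        if post_sampling:
            # Request the previous (post-sampling) behavior instead.
            payload["post_sampling_probs"] = True
    return payload


def send_completion(payload, url="http://localhost:8080/completion"):
    """POST the payload as JSON. The URL is an assumed local server address."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `build_completion_payload("Hello", n_probs=10, post_sampling=True)` yields a body that asks for the top 10 probabilities with the older post-sampling semantics, while calling it without `n_probs` leaves both fields out of the request.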
