Using the latest version of mikupad, the show-on-hover top-probabilities feature seems broken: nothing is shown. I can reproduce it with llama.cpp backend builds from b4365 onward; it works fine up to b4363. In addition, since that build, inference is roughly 20% slower (the exact slowdown varies with the model). The commit mentioned above may be the culprit and may now require special handling on the frontend side.
Thank you for the bug report! The token probabilities issue should be fixed as of commit c3daede.
For the performance regression, Mikupad only interacts with the llama.cpp server through its API, so I don't think there's anything we can do on our side. That said, the point raised by @slaren makes sense, although as far as I understand it only applies if you're using the OpenAI Compatible API option in Mikupad, since token probabilities were already being requested when using the llama.cpp API.
You will see a performance hit as long as n_probs is set in the request and is greater than zero. This is because the probabilities returned now are pre-sampling, which requires a fairly expensive softmax over the full vocabulary. Alternatively, you can obtain post-sampling probabilities (the previous behavior) by setting the post_sampling_probs option in the request, as in the sketch below.
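For reference, a minimal sketch of what such a request might look like against the llama.cpp server's /completion endpoint. This is not Mikupad's actual code; the server URL, prompt, and field values are placeholders, and the shape of the response field is my assumption based on the server docs:

```js
// Hypothetical example: request top-token probabilities with the
// pre-b4365 (post-sampling) behavior restored via post_sampling_probs.
const response = await fetch("http://localhost:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt: "Once upon a time",     // placeholder prompt
    n_predict: 16,
    n_probs: 10,                    // top-10 probabilities per token (enables the softmax cost)
    post_sampling_probs: true,      // opt back into post-sampling probabilities
  }),
});
const data = await response.json();
// With n_probs > 0 the server includes per-token probability data,
// e.g. in data.completion_probabilities (assumed field name).
console.log(data.completion_probabilities);
```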