Hi,
Is there a way to change the frequency_penalty or logit_bias when sending a completion request?
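For example, something like this with the llmengine Python client (the frequency_penalty and logit_bias arguments below are hypothetical, that's exactly what I'm asking about, so they are shown commented out):

```python
from llmengine import Completion

response = Completion.create(
    model="llama-2-7b",
    prompt="Why is the sky blue?",
    max_new_tokens=64,
    temperature=0.7,
    # frequency_penalty=0.5,       # hypothetical: not accepted today
    # logit_bias={50256: -100.0},  # hypothetical: not accepted today
)
print(response.output.text)
```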
Hi @Stealthwriter, thanks for reaching out. Yes, you can route most API changes through to the underlying inference framework(s) we use, assuming they support the fields you need. For instance, we currently support https://github.com/scaleapi/open-tgi (forked from text-generation-inference v0.9.4) and vLLM. Would you like to try making the change yourself?
You can think of LLM Engine as adding 1) a set of higher-level abstractions (e.g., APIs are expressed in terms of Completions and Fine-tunes) and 2) autoscaling via k8s. TGI and vLLM are great, but you have to bring your own scaling.
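If you do want to try it, the change is mostly plumbing: add the field to the completion request schema, then forward it to the backend's sampling parameters. A rough sketch, assuming a pydantic-style request model (CompletionRequest below is illustrative, not our actual schema); note that vLLM's SamplingParams already accepts frequency_penalty:

```python
from pydantic import BaseModel
from vllm import SamplingParams  # vLLM's sampling configuration

# Illustrative request schema -- the real llm-engine schema and field
# names may differ. The idea: expose the field on the API model, then
# pass it through to the inference framework if it supports it.
class CompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64
    temperature: float = 1.0
    frequency_penalty: float = 0.0  # new field to route through

def to_vllm_params(req: CompletionRequest) -> SamplingParams:
    # vLLM already understands frequency_penalty, so forwarding the
    # value is the bulk of the change on the vLLM path.
    return SamplingParams(
        max_tokens=req.max_new_tokens,
        temperature=req.temperature,
        frequency_penalty=req.frequency_penalty,
    )
```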