I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
The server now supports hot-swapping LoRA adapters via the /lora-adapters endpoint, which changes the global adapter config.
With this design, the only "safe" moment to apply LoRA changes is when all slots are idle.
However, this is not practical when the server is handling a high number of requests (ref: #10374). With continuous batching, all slots are rarely idle at the same time.
Motivation
Possible Implementation
We can group only requests that use the same LoRA config into the same batch
Call common_lora_adapters_apply before processing the batch (remember to clear the KV cache if needed)
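The grouping step above could look roughly like the sketch below. It is only an illustration under assumptions: `lora_entry`, `lora_config`, `slot`, `pick_batch` and `needs_apply` are hypothetical names, not existing server types or functions, and the real slot structure holds far more state. The first slot with pending work decides the LoRA config for the batch, and slots with a different config are deferred to a later batch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not the server's real types.
struct lora_entry {
    int   id;    // adapter index
    float scale; // 0.0f would mean "disabled"
    bool operator==(const lora_entry & o) const { return id == o.id && scale == o.scale; }
};

using lora_config = std::vector<lora_entry>;

struct slot {
    int         id;
    bool        has_pending_work;
    lora_config lora; // per-request LoRA config (assumed to be stored on the slot)
};

// Select the slots for the next batch. Scanning starts at `start` so the caller
// can rotate the starting slot and avoid starving deferred slots.
// On return, `batch_lora` holds the config shared by every selected slot.
std::vector<int> pick_batch(const std::vector<slot> & slots, std::size_t start, lora_config & batch_lora) {
    std::vector<int> picked;
    for (std::size_t i = 0; i < slots.size(); ++i) {
        const slot & s = slots[(start + i) % slots.size()];
        if (!s.has_pending_work) {
            continue;
        }
        if (picked.empty()) {
            batch_lora = s.lora; // first pending slot decides the batch config
        } else if (!(s.lora == batch_lora)) {
            continue;            // different config: defer to a later batch
        }
        picked.push_back(s.id);
    }
    return picked;
}

// True when the adapters must be re-applied before decoding this batch.
// In the server, this is where common_lora_adapters_apply would be called
// (its exact signature is not shown in this issue), clearing KV if needed.
bool needs_apply(const lora_config & current, const lora_config & batch_lora) {
    return !(current == batch_lora);
}
```

Rotating `start` between batches is one simple way to keep a busy slot from monopolizing the batch config; a real scheduler might instead prefer the config shared by the most pending slots to reduce the number of adapter switches.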
I think there needs to be another way.
It is odd to apply a LoRA swap only when the server is idle. The swap is only meaningful when actual users request it, e.g. "summarize this for me", "calculate this for me", etc. The need to swap adapters arises at the moment of the request. It is not possible to predict when users will need a swap, so the better approach is to have the swap happen when they need it.
This functionality is critical, especially for small models that have to fit multiple use cases.