Hi, I was wondering if it would be possible to have the following feature that could be called from your API:
Simply put, I'd like the ability to make a copy of the KV cache and keep it in RAM, coupled with the ability to move that copy back into the cache. And of course, have an API call for each action.
I'm working on a program using KoboldCpp as a back-end. One of the features uses intermediate question/response pairs between the moment the user posts a message and the moment the bot responds. The ability to save and restore the cache whenever I need to would save a whole lot of prompt processing time.
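To make the request more concrete, here is a rough sketch of the behaviour I have in mind. It is a toy model only: `MockBackend`, `save_cache` and `restore_cache` are hypothetical names I made up for illustration, not existing KoboldCpp calls, and the "cache" is just a list of token ids with a simplified prefix-reuse rule.

```python
import copy


class MockBackend:
    """Toy stand-in for an inference back-end (hypothetical, not KoboldCpp's API)."""

    def __init__(self):
        self.kv_cache = []         # token ids whose KV entries are currently cached
        self.saved = None          # copy of the cache kept in RAM
        self.tokens_processed = 0  # total prompt-processing work done so far

    def process(self, tokens):
        # Assumption: only the longest common prefix with the cache is reused.
        common = 0
        for cached, new in zip(self.kv_cache, tokens):
            if cached != new:
                break
            common += 1
        self.tokens_processed += len(tokens) - common
        self.kv_cache = list(tokens)

    def save_cache(self):
        # Hypothetical "save" call: copy the current cache into RAM.
        self.saved = copy.deepcopy(self.kv_cache)

    def restore_cache(self):
        # Hypothetical "restore" call: put the saved copy back into the cache.
        if self.saved is None:
            raise RuntimeError("no saved cache to restore")
        self.kv_cache = copy.deepcopy(self.saved)


def run(with_restore):
    backend = MockBackend()
    chat = list(range(1000))            # main conversation, 1000 tokens
    side = list(range(5000, 5200))      # intermediate question, 200 tokens

    backend.process(chat)               # full prompt processing once
    backend.save_cache()
    backend.process(side)               # intermediate pair overwrites the cache
    if with_restore:
        backend.restore_cache()
    backend.process(chat + [1000, 1001])  # user's next turn adds 2 tokens
    return backend.tokens_processed


if __name__ == "__main__":
    print("with restore:   ", run(True))
    print("without restore:", run(False))
```

In this toy model the restored run only has to process the two new tokens of the main conversation, while the run without restore has to go through the whole conversation again after the intermediate question.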
I understand it's quite a specific request. But hey, I might as well ask :)
Thanks for all the work you've done, btw.
I'm not sure this is that specific. Actually, I'm quite baffled that such functionality isn't standard in local back-ends. I can see that it might be problematic in a cloud setting to let hundreds of users save multi-gigabyte KV states, but in a local setting it seems like an obvious choice once you start experimenting with things like in-context learning, or just want to save your current conversation for later without having to re-evaluate all of it.