Feature Request: KV Cache save/load + API #1172

SerialKicked · 2024-10-17T05:13:22Z

Describe the Issue

Hi, I was wondering if it would be possible to have the following feature that could be called from your API:

Simply put, I'd like the ability to copy the KV cache into RAM and to restore that copy back into the cache later, with an API call for each action.

I'm working on a program using KoboldCpp as a back-end. One of its features runs intermediate question/response pairs between the moment the user posts a message and the moment the bot responds. Being able to save and restore the cache on demand would save a lot of prompt processing time.
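To make the expected saving concrete, here is a toy sketch of the workflow. Nothing in it is KoboldCpp's actual API: `ToyBackend`, `save_cache`, and `load_cache` are hypothetical names, the "cache" is just a list of token IDs, and it assumes the intermediate prompt shares no prefix with the conversation, so it evicts the cached history.

```python
from copy import deepcopy


class ToyBackend:
    """Stand-in for an inference backend. NOT KoboldCpp's real API."""

    def __init__(self):
        self.cache = []            # tokens whose keys/values are already computed
        self.tokens_processed = 0  # running count of prompt-processing work

    def process(self, prompt):
        # Reuse the longest prefix shared with the cache, recompute the rest.
        shared = 0
        for cached, new in zip(self.cache, prompt):
            if cached != new:
                break
            shared += 1
        self.tokens_processed += len(prompt) - shared
        self.cache = list(prompt)

    def save_cache(self):
        # Hypothetical "copy the KV cache into RAM" call.
        return deepcopy(self.cache)

    def load_cache(self, snapshot):
        # Hypothetical "move the copy back into the cache" call.
        self.cache = deepcopy(snapshot)


def run(use_snapshot):
    backend = ToyBackend()
    history = list(range(1000))    # a 1000-token conversation
    side_question = [-1] * 50      # intermediate prompt, unrelated prefix

    backend.process(history)
    snapshot = backend.save_cache()
    backend.process(side_question)            # overwrites the cached history
    if use_snapshot:
        backend.load_cache(snapshot)
    backend.process(history + [1000, 1001])   # the user's actual message
    return backend.tokens_processed


print("without snapshot:", run(False))
print("with snapshot:   ", run(True))
```

In this toy model, the final prompt only costs the two new tokens when the snapshot is restored, instead of reprocessing the whole history.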

I understand it's quite a specific request. But hey, I might as well ask :)

Thanks for all the work you've done, btw.

gsenkowski · 2024-10-21T20:07:33Z

I'm not sure this is that specific. Actually, I'm quite baffled that such functionality isn't standard in local backends. I can see that it might be problematic in a cloud setting to let hundreds of users save KV states of multiple GBs each, but in a local setting this seems like an obvious choice once you start experimenting with things like in-context learning, or just want to save your current conversation for later without having to re-evaluate all of it.
