Feature Request: KV Cache save/load + API #1172

Open
SerialKicked opened this issue Oct 17, 2024 · 1 comment

Comments

@SerialKicked

Describe the Issue

Hi, I was wondering if it would be possible to have the following feature that could be called from your API:

Simply put, I'd like the ability to copy the KV cache into RAM and later move that copy back into the cache, with an API call for each action.

I'm working on a program that uses KoboldCpp as a back-end. One of its features runs intermediate question/response pairs between the moment the user posts a message and the moment the bot responds. Being able to save and restore the cache on demand would save a lot of prompt processing time.
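To make the expected saving concrete, here is a rough toy model of the workflow, not KoboldCpp code. `ToyBackend`, `save_state`, `load_state` and `run` are hypothetical names made up for this sketch, and the token counts are arbitrary. It assumes the backend only reuses the cached prefix that matches the new prompt, and that the intermediate question uses a prompt that does not share the chat's prefix.

```python
from copy import deepcopy


class ToyBackend:
    """Toy stand-in for an inference backend. Hypothetical, not a real API."""

    def __init__(self):
        self.cache = []            # tokens whose KV entries are currently cached
        self.tokens_processed = 0  # total prompt-processing work done so far
        self.saved = None          # copy of the cache kept in RAM

    def process(self, prompt_tokens):
        # Only tokens after the longest prefix shared with the cache are processed.
        common = 0
        for cached, new in zip(self.cache, prompt_tokens):
            if cached != new:
                break
            common += 1
        self.tokens_processed += len(prompt_tokens) - common
        self.cache = list(prompt_tokens)

    def save_state(self):
        # Hypothetical "copy KV cache to RAM" call.
        self.saved = deepcopy(self.cache)

    def load_state(self):
        # Hypothetical "move the RAM copy back into the cache" call.
        if self.saved is None:
            raise RuntimeError("no saved state to restore")
        self.cache = deepcopy(self.saved)


def run(use_snapshot):
    """Return the number of prompt tokens processed over one chat turn."""
    backend = ToyBackend()
    chat = list(range(1000))          # existing conversation (1000 tokens)
    side_prompt = [-1] * 50           # intermediate question with an unrelated prompt
    new_message = list(range(1000, 1010))  # 10 new tokens from the user

    backend.process(chat)
    if use_snapshot:
        backend.save_state()
    backend.process(side_prompt)      # overwrites the chat's cache
    if use_snapshot:
        backend.load_state()
    backend.process(chat + new_message)
    return backend.tokens_processed


if __name__ == "__main__":
    print("without snapshot:", run(False))  # 2060
    print("with snapshot:   ", run(True))   # 1060
```

In this toy setup the whole conversation has to be reprocessed after the intermediate question unless the cache is restored first, which is the cost the requested API calls would avoid.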

I understand it's quite a specific request. But hey, I might as well ask :)

Thanks for all the work you've done, btw.

@gsenkowski

I'm not sure this is that specific. Actually, I'm quite baffled that such functionality isn't standard in local backends. I can see that it might be problematic in a cloud setting to let hundreds of users save KV states of multiple GBs each, but in a local setting this seems like an obvious choice once you start experimenting with things like in-context learning, or just want to save your current conversation for later without having to re-evaluate all of it.
