[Feature Request] Add option to cache model compilation for `modular/max-openai-api` #271

remorses · 2024-12-19T11:32:11Z

What is your request?

I tried deploying `modular/max-openai-api` to fly.io, but the first compilation of the model takes a long time. Is it possible to cache the compiled model on disk?

What is your motivation for this change?

Add a `--model-compile-cache=/.root/model` parameter.

Any other details?

Fly.io is a serverless GPU deployment platform where machines are stopped and started often. Right now, model compilation is too slow for deployment on this kind of infrastructure.

ehsanmok · 2024-12-20T22:51:33Z

Thanks! We do have serialization support, but it isn't well documented. We'll fix that and publicize it once more testing is done.

remorses added the enhancement New feature or request label Dec 19, 2024

linear bot added the pipelines-common label Dec 21, 2024
