Add ColQwen2 example #897
Conversation
🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-7a1075b.modal.run
Shorter than 1 min would be nice. Where are we spending time here? Is it loading the models, indexing the PDFs, or something else?
I would suggest not optimizing inference time unless it's a durable improvement. I suspect vllm will resolve this issue soon, so I'd skip working on it for now.
This might be worth looking into. Getting onto A100-40s (and later L40Ses) would be really nice.
Cold start time is coming from loading the models.
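For context on the model-loading point: a common way to keep expensive loads out of the request path is to construct the model once per container process and reuse the cached object on every subsequent request. A minimal sketch of the pattern in plain Python (the loader and model names below are illustrative stand-ins, not the actual code in this PR):

```python
import functools


@functools.lru_cache(maxsize=None)
def get_model(name: str) -> dict:
    """Load the model once per process; later calls reuse the cached object.

    Stands in for an expensive from_pretrained()-style load. In a serverless
    container this cost is paid once at startup (or on the first request)
    rather than on every request.
    """
    return {"name": name, "weights": object()}  # placeholder for real weights


def answer(prompt: str) -> str:
    model = get_model("ColQwen2")  # cheap after the first call
    return f"{model['name']} reply to: {prompt}"
```

The same idea is what container-lifecycle hooks (loading in a startup method rather than inside the request handler) buy you on a serverless platform.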
Have reduced the cold start time.
What's the status?
@erik-dunteman to pick this one up
This one should be ready to go, pending the following:
Changes since I took the PR over:
Nice work! Will review quickly tomorrow.
@charlesfrye I'd like to disable the `keep_warm` on this, cool if I make that one-line change? (edit: below commit does this. Ask forgiveness, not permission)
Adds a chat-with-RAG example using the following things:

Some things I would do if I wanted people to actually use this in prod. Curious which of these people think are worth doing:

- Optimize the cold start (takes ~2 mins now on average).
- Optimize the inference time for the chat (currently ~10s). Would try vLLM, but there's an issue with vLLM and the `transformers` version that `ColQwen2` needs, so I'd have to build vLLM from source, which would increase build time.
- Try to make the app use less memory. I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, and so I end up having two model objects.
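On the memory point above: when the retrieval and generation wrappers share the same base checkpoint, one way to avoid holding two copies of the weights is to load the shared backbone once and hand the same object to both wrappers. A hedged sketch with stand-in classes (the real ColQwen2 and chat-model classes may not compose this cleanly, which is presumably the difficulty described above):

```python
class Backbone:
    """Stand-in for the shared underlying model weights."""

    def __init__(self) -> None:
        self.weights = object()  # imagine tens of GB of parameters here


class Retriever:
    """Embedding/retrieval head that borrows the backbone."""

    def __init__(self, backbone: Backbone) -> None:
        self.backbone = backbone  # borrow the object, don't copy the weights


class Generator:
    """Chat/generation head that borrows the same backbone."""

    def __init__(self, backbone: Backbone) -> None:
        self.backbone = backbone


shared = Backbone()            # weights loaded exactly once
retriever = Retriever(shared)  # both heads reference the same object,
generator = Generator(shared)  # so memory scales with one copy of the weights
```

Whether this works in practice depends on the actual model classes exposing a way to construct the head around an already-loaded backbone.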
Type of Change
Checklist
- `lambda-test: false` is added to example frontmatter (`---`)
- `modal run` or an alternative `cmd` is provided in the example frontmatter (e.g. `cmd: ["modal", "deploy"]`)
- `args` are provided in the example frontmatter (e.g. `args: ["--prompt", "Formula for room temperature superconductor:"]`)
- latest `python_version` for the base image, if it is used
- versions are pinned with `~=x.y.z` or `==x.y`; versions < 1 are pinned to patch version, `==0.y.z`
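For reference, the frontmatter the checklist refers to lives in a comment block at the top of an example's Python file. A sketch of what a deploy-only example's header might look like, using the same illustrative values as the checklist (the exact keys an example needs depend on how it runs):

```python
# ---
# lambda-test: false
# cmd: ["modal", "deploy"]
# args: ["--prompt", "Formula for room temperature superconductor:"]
# ---
```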
Outside contributors
You're great! Thanks for your contribution.