
Add ColQwen2 example #897

Merged: 19 commits from advay/colpali into main on Oct 29, 2024

Conversation

advay-modal
Contributor

@advay-modal commented on Sep 30, 2024

[Screenshot 2024-09-30 at 11:03 AM]

Adds a RAG chat example using the following:

  • ColQwen2 to index the docs
  • Qwen2-VL as a VLM
  • Gradio chat UI and PDF upload UI

Some things I would do if I wanted people to actually use this in prod. Curious which of these people think are worth doing:

  • Optimize the cold start (takes 2 mins now on average)

  • Optimize the inference time for the chat (currently ~10s). I would try vLLM, but there's an issue with vLLM and the transformers version that ColQwen2 needs, so I'd have to build vLLM from source, which would increase build time.

  • Try to make the app use less memory (I currently need an 80 GB A100, largely because, though the underlying model is the same, I couldn't find a clean way to share a single model object, so I end up with two model objects)

Type of Change

  • New example
  • Example updates (Bug fixes, new features, etc.)
  • Other (changes to the codebase, but not to examples)

Checklist

  • Example is testable in synthetic monitoring system, or lambda-test: false is added to example frontmatter (---)
    • Example is tested by executing with modal run or an alternative cmd is provided in the example frontmatter (e.g. cmd: ["modal", "deploy"])
    • Example is tested by running with no arguments or the args are provided in the example frontmatter (e.g. args: ["--prompt", "Formula for room temperature superconductor:"])
  • Example is documented with comments throughout, in a Literate Programming style.
  • Example does not require third-party dependencies to be installed locally
  • Example pins its dependencies
    • Example pins container images to a stable tag, not a dynamic tag like latest
    • Example specifies a python_version for the base image, if it is used
    • Example pins all dependencies to at least minor version, ~=x.y.z or ==x.y
    • Example dependencies with version < 1 are pinned to patch version, ==0.y.z

Outside contributors

You're great! Thanks for your contribution.

@advay-modal changed the title from "Add colpali example" to "Add ColQwen2 example" on Sep 30, 2024
@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-7a1075b.modal.run

@charlesfrye
Collaborator

> Optimize the cold start (takes 2 mins now on average)

Shorter than 1 min would be nice. Where are we spending time here? Is it loading the models, indexing the PDFs, or something else?

> Optimize the inference time for the chat (currently ~10s). Would try vllm, but there's an issue with VLLM and the transformers version that ColQwen2 needs, so I'd have to build vllm from source, which would increase build time?

I would suggest not optimizing inference time unless it's a durable improvement. I suspect vllm will resolve this issue soon, so I'd skip working on it for now.

> Try to make the app use less memory (I currently need an 80gb A100, largely because though the underlying model is the same, I couldn't find a clean way to use the same underlying object for the model, and so I end up having 2 model objects)

This might be worth looking into. Getting onto A100-40s (and later L40Ses) would be really nice.

@advay-modal
Contributor Author

The cold start time is coming from loading the models.

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-c97a29f.modal.run

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-e52659d.modal.run

@advay-modal
Contributor Author

I've reduced the cold start time.

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-f06a9a5.modal.run

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-1aa3914.modal.run

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-c2767f3.modal.run

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-64c2fd0.modal.run

@erikbern
Contributor

What's the status?

@charlesfrye
Collaborator

> What's the status?

@erik-dunteman to pick this one up

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-72f967a.modal.run

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-5e0cb53.modal.run

@erik-dunteman
Collaborator

This one should be ready to go, pending the following:

  • CI passes
  • deciding whether we want automated testing (in the context of Gradio, we would need to call the Gradio API directly from a local entrypoint)

Changes since I took the PR over:

  • State is no longer stored on the class's `self`; it now lives in a Modal Dict, which allows concurrent users and horizontal scaling.
    • Added a way to identify users, using Gradio session state.
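
The per-session keying described above can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: a plain dict plays the role of the shared `modal.Dict`, and the helper names (`new_session_id`, `save_index`, `load_index`) are illustrative.

```python
import uuid

# Stand-in for shared state; in the real app this would be a
# modal.Dict.from_name(...), so state is shared across container replicas.
session_store = {}


def new_session_id() -> str:
    # Gradio can hold this ID in per-browser session state (gr.State),
    # so each user gets their own namespace in the store.
    return uuid.uuid4().hex


def save_index(session_id: str, embeddings: list) -> None:
    # Key everything by session ID instead of storing it on `self`,
    # so any container replica can serve any user.
    session_store[f"{session_id}:index"] = embeddings


def load_index(session_id: str) -> list:
    return session_store.get(f"{session_id}:index", [])
```

The key scheme here is one namespace per session ID; the real app may partition state differently.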

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-b80fa95.modal.run

@charlesfrye
Collaborator

Nice work! Will review quickly tomorrow.

@charlesfrye
Collaborator

lookin good!

[Screenshot 2024-10-28 at 2:21 PM]

@erik-dunteman
Collaborator

erik-dunteman commented Oct 28, 2024

@charlesfrye I'd like to disable the keep_warm on this, cool if I make that one-line change?

(edit: below commit does this. Ask forgiveness, not permission)

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-1b7a289.modal.run

@charlesfrye
Collaborator

charlesfrye commented Oct 29, 2024

Noticed that we OOMed at ~5 pages of PDF, so I added batching with a batch size of 4. It's somewhat surprising to me that the memory allocation is so high, but it's expected for this inference code.
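
For reference, the batching change can be sketched roughly like this; `embed_pages` is a hypothetical stand-in for the real per-batch ColQwen2 embedding call, and the function names are illustrative:

```python
def batched(items, batch_size=4):
    """Yield fixed-size chunks so only `batch_size` page images
    need to be resident on the GPU at once."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]


def embed_all_pages(pages, embed_pages, batch_size=4):
    # embed_pages(batch) is the (hypothetical) per-batch embedding call;
    # accumulating its results on the host keeps peak GPU memory bounded
    # by the batch size rather than the page count.
    embeddings = []
    for batch in batched(pages, batch_size):
        embeddings.extend(embed_pages(batch))
    return embeddings
```
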

Going past 5 pages also revealed that storing images in a Dict falls apart at a few tens of pages, so I moved the image storage onto a Modal Volume.
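
A rough sketch of a file-per-page layout on a mounted Volume (the mount point, directory scheme, and helper names are assumptions, not the PR's actual code):

```python
from pathlib import Path


def save_page_images(root: str, session_id: str, images: list) -> list:
    # With a modal.Volume mounted at `root` (e.g. volumes={"/images": vol}),
    # each page becomes its own file instead of one large Dict value.
    # In the real app, vol.commit() would persist the writes.
    out_dir = Path(root) / session_id
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, png_bytes in enumerate(images):
        path = out_dir / f"page_{i:04d}.png"
        path.write_bytes(png_bytes)
        paths.append(path)
    return paths


def load_page_image(root: str, session_id: str, page: int) -> bytes:
    # Pages are loaded lazily, one file at a time, so memory use
    # no longer grows with the total page count.
    return (Path(root) / session_id / f"page_{page:04d}.png").read_bytes()
```
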

With those enhancements, the model can now answer questions about my dissertation:

[Screenshot 2024-10-28 at 10:10 PM]

Also made some text edits and added a local_entrypoint for interfacing via the command line, as pictured above.

@charlesfrye
Collaborator

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-064ef7d.modal.run

@charlesfrye merged commit 8cb7059 into main on Oct 29, 2024
7 checks passed
@charlesfrye deleted the advay/colpali branch on October 29, 2024 at 05:34