Add new colpali example #908

Merged
merged 4 commits into from
Sep 7, 2024

@jobergum jobergum commented Sep 6, 2024

I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.

@jobergum jobergum requested a review from thomasht86 September 6, 2024 12:54
@jobergum
Author

jobergum commented Sep 6, 2024

Input welcome on this, both language and code, also would be nice to see if we can get this into CI/CD

@thomasht86
Collaborator

Very cool! 🤩
Was able to run it on Colab ✅

Some comments:

  • Making it run on CI
    As a start, we could add
!sudo apt-get install poppler-utils -y

We should also move the note about the poppler dependency there, and maybe link to other install options.
But it will most likely take prohibitively long to run this on CPU. Would it be possible to scale the number of docs/images down to a minimum when running on CPU?

  • The cell below
    [image: screenshot of the notebook cell]

Takes 1.5*15=22.5 mins to run in Colab with A100 GPU. 🐌
(Didn't bother to wait with T4.) Perhaps add a note about that. ☕

  • Could be nice to add Open In Colab to the top (I did this for the other cloud nbs in the big batch update)

  • The cell below

[image: screenshot of the notebook cell, 2024-09-06 17:24:57]

Causes a linting error in Colab (and in older versions of e.g. Pylance). Consider using https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.application.Vespa.feed_async_iterable for feeding at least.

  • tenant_name = "samples" would need to change to vespa-team to run in CI.
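
A minimal sketch of how the suggestions in this list might be wired into the notebook. All helper names, the `VESPA_TENANT_NAME` environment variable, the `pdf_page` schema name, and the CPU page limit of 2 are hypothetical and not taken from the PR. The only outside pieces are `shutil.which` from the standard library, the `pdftoppm` binary that ships with poppler-utils, and the `feed_async_iterable` method linked above, which is assumed here to accept `iter`, `schema`, and `callback` keyword arguments and documents shaped as `{"id": ..., "fields": ...}`.

```python
import os
import shutil
from typing import Iterable, Iterator, Optional


def poppler_available() -> bool:
    """Return True if poppler's `pdftoppm` binary is on PATH."""
    return shutil.which("pdftoppm") is not None


def max_pages_for(device: str, cpu_limit: int = 2) -> Optional[int]:
    """Cap the number of pages on CPU; None means no cap.

    The limit of 2 is an arbitrary placeholder, not a measured value.
    """
    return cpu_limit if device == "cpu" else None


def tenant_name(default: str = "samples") -> str:
    """Read the tenant from the environment so CI can override it.

    VESPA_TENANT_NAME is a hypothetical variable name.
    """
    return os.environ.get("VESPA_TENANT_NAME", default)


def to_feed_docs(pages: Iterable[dict]) -> Iterator[dict]:
    """Lazily wrap page dicts in the id/fields shape used for feeding."""
    for i, page in enumerate(pages):
        yield {"id": str(i), "fields": page}


def feed_pages(app, pages: Iterable[dict], schema: str = "pdf_page") -> None:
    """Feed pages through `app.feed_async_iterable`.

    `app` is expected to be a connected pyvespa `Vespa` instance; the
    schema name is an assumption.
    """

    def callback(response, doc_id: str) -> None:
        # Report only failures to keep the notebook output short.
        if not response.is_successful():
            print(f"Failed to feed {doc_id}: {response.get_json()}")

    app.feed_async_iterable(
        iter=to_feed_docs(pages), schema=schema, callback=callback
    )
```

In CI, the workflow could set the hypothetical `VESPA_TENANT_NAME` to `vespa-team`, while a local run would fall back to `samples`. The page cap would be applied before embedding, for example by slicing the page list with `pages[:max_pages_for(device)]`.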

@jobergum
Author

jobergum commented Sep 6, 2024

Great, thanks for this! Will iterate

Takes 1.5*15=22.5 mins to run in Colab with A100 GPU.

This is super weird as it takes just 12-15 seconds per batch on my M1 with MPS

@thomasht86
Collaborator

Looks like the latest run encountered this hf issue

@thomasht86
Collaborator

Great. Ran successfully in 25 mins - we can add it to CI. 👍
(Wonder what my issue on Colab was though.)

Consider printing less of the b64 strings and embeddings (they force the reader to scroll a lot).
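
One possible way to shorten such printouts, shown as a sketch only: the helper names `preview` and `preview_vector` are hypothetical and not part of the notebook or of pyvespa.

```python
from typing import Sequence


def preview(text: str, max_chars: int = 40) -> str:
    """Return `text` unchanged if short, else a prefix plus its total length."""
    if len(text) <= max_chars:
        return text
    return f"{text[:max_chars]}... ({len(text)} chars total)"


def preview_vector(values: Sequence[float], head: int = 3) -> str:
    """Show the first `head` values of an embedding and its dimension."""
    shown = ", ".join(f"{v:.3f}" for v in values[:head])
    suffix = ", ..." if len(values) > head else ""
    return f"[{shown}{suffix}] (dim={len(values)})"
```

Printing `preview(b64_string)` instead of the raw string keeps a cell's output to one line while still showing that the value is populated.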

@thomasht86
Collaborator

thomasht86 commented Sep 7, 2024

Made this for my own understanding.
May be useful for others. 😄
Feel free to add (or dm if you want to modify).
[diagram: Phased ranking ColPali]

@jobergum
Author

jobergum commented Sep 7, 2024

It's a good overview diagram! Removed the larger output, and it also looks like CD passed.

Collaborator

@thomasht86 thomasht86 left a comment

🙌

@jobergum jobergum merged commit a921e97 into master Sep 7, 2024
43 checks passed
@jobergum jobergum deleted the jobergum/more-colpali branch September 7, 2024 07:24