[FR] Support of more HuggingFace embedders for multimodality #28090
Comments
Makes sense. CLIP has two parts, image encoding and text encoding, which are handled by two different neural networks. We could fit the text transformer model into the existing embed framework, as already done in multiple Vespa sample applications, but image encoding would not fit into the existing embed functionality, which takes a string or an array of strings as input.
So if you are fine with just having the text-to-image space model in Vespa, we can create that type of example using HF-embedder functionality.
With the same process?
To handle image data, we would have to create a new type of embedder functionality.
Exactly! It will also prepare Vespa for further types: audio, video ...
I was a bit ahead of time apparently. 7-modality is here.
ImageBind is interesting, but I do recommend looking at the licensing :)
Indeed, non commercial license.
Does Vespa support multimodality currently?
Hey @AriMKatz, we currently do not provide any multimodal embedders; the provided embedder models are text-only. This doesn't mean that you cannot use multimodal representations with Vespa. For a recent example that uses a multimodal model, see PDF Retrieval with Vision Language Models (ColPali).
Hey @jobergum,
See my comment above and #32389: the native built-in embedders are currently text-only.
My goal is to build a unique multimodal WooCommerce search experience with Vespa multivectors and a hybrid ranking over text BM25, text vectors, and image vectors.
For instance, E-commerce can use:
Of course, sounds and videos are also a possibility.
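To make the hybrid ranking idea concrete, here is a minimal sketch in plain Python of how the three signals could be blended. It is not a Vespa rank profile: the weights, the max-normalization of BM25, and the field names (`bm25`, `text_vec`, `image_vec`) are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(bm25_norm, text_sim, image_sim, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three signals. The weights are hypothetical."""
    w_bm25, w_text, w_image = weights
    return w_bm25 * bm25_norm + w_text * text_sim + w_image * image_sim


def rank(products, query_text_vec, query_clip_vec):
    """Rank products by the blended score, best first.

    query_text_vec is compared with each product's text vector;
    query_clip_vec (the query embedded in the shared text-to-image space)
    is compared with each product's image vector.
    """
    # Assumption: BM25 is scaled by the best BM25 score in the candidate set.
    max_bm25 = max((p["bm25"] for p in products), default=0.0) or 1.0
    scored = [
        (
            p["id"],
            hybrid_score(
                p["bm25"] / max_bm25,
                cosine(query_text_vec, p["text_vec"]),
                cosine(query_clip_vec, p["image_vec"]),
            ),
        )
        for p in products
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


products = [
    {"id": "a", "bm25": 2.0, "text_vec": [1.0, 0.0], "image_vec": [1.0, 0.0]},
    {"id": "b", "bm25": 1.0, "text_vec": [0.0, 1.0], "image_vec": [0.0, 1.0]},
]
ranked = rank(products, [1.0, 0.0], [1.0, 0.0])
```

In Vespa itself this blending would presumably live in a rank profile; the sketch only shows the arithmetic I have in mind.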
So far, I have implemented a text-to-text demo: https://demo-woocommerce-cloudways-2k-vespa-transformers.wpsolr.com/shop/
But as far as I can tell from the documentation and blog, HF image embedders are not available yet.
The blog examples require external Python code to produce the image vectors.
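For context, the external step looks roughly like the sketch below: the image is encoded outside Vespa and the resulting vector is placed in a tensor field of the feed document. The encoder here is a stand-in, not a real model; in practice it would be a CLIP-style image encoder. The namespace `shop`, the document type `product`, and the field names `title` and `image_embedding` are hypothetical, and the schema would need a tensor field of matching dimension.

```python
import json
from typing import Callable, Sequence


def build_feed_operation(
    doc_id: str,
    title: str,
    image_bytes: bytes,
    encode_image: Callable[[bytes], Sequence[float]],
    namespace: str = "shop",    # hypothetical namespace
    doctype: str = "product",   # hypothetical document type
) -> dict:
    """Build a Vespa JSON put operation carrying an externally computed image vector."""
    vector = [float(v) for v in encode_image(image_bytes)]
    return {
        "put": f"id:{namespace}:{doctype}::{doc_id}",
        "fields": {
            "title": title,
            # Indexed tensor written in the {"values": [...]} form.
            "image_embedding": {"values": vector},
        },
    }


def fake_encoder(image_bytes: bytes) -> Sequence[float]:
    """Placeholder for a real image encoder; returns a fixed-size dummy vector."""
    return [len(image_bytes) % 7, 0.5, 1.0]


op = build_feed_operation("42", "Red shoe", b"\x89PNG", fake_encoder)
print(json.dumps(op, indent=2))
```

A built-in image embedder would remove the need for this separate encoding step on the client side.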