
[FR] Support of more HuggingFace embedders for multimodality #28090

Open
eostis opened this issue Aug 20, 2023 · 12 comments
eostis commented Aug 20, 2023

My goal is to build a unique multimodal WooCommerce search experience, with Vespa multivectors and a hybrid ranking on text BM25, text vectors, and image vectors.

For instance, e-commerce can use:

  • text-to-image (CLIP): search images
  • text-to-text (sentence transformers): search texts
  • image-to-image (ResNet): find similar images.

Of course, sounds and videos are also a possibility.
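
A rough sketch of the kind of hybrid query this implies, using pyvespa. The field names (`text_embedding`, `image_embedding`), the `hybrid` rank profile, and the 512-dimensional placeholder vectors are all assumptions, not an existing API:

```python
# Sketch of a hybrid multimodal query with pyvespa. Field names, the
# "hybrid" rank profile, and the 512-dim placeholder vectors are assumptions;
# the query vectors would be computed client-side (e.g. with CLIP).
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)
response = app.query(body={
    "yql": (
        "select * from sources * where userQuery() "
        "or ({targetHits:100}nearestNeighbor(text_embedding, q_text)) "
        "or ({targetHits:100}nearestNeighbor(image_embedding, q_image))"
    ),
    "query": "red leather handbag",      # drives BM25 via userQuery()
    "ranking": "hybrid",                 # combines BM25 + closeness scores
    "input.query(q_text)": [0.1] * 512,  # placeholder text vector
    "input.query(q_image)": [0.2] * 512, # placeholder image vector
})
print(response.hits[0])
```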

So far, I have implemented a text-to-text demo: https://demo-woocommerce-cloudways-2k-vespa-transformers.wpsolr.com/shop/

But HF image embedders are not available yet, as far as I can tell from the documentation and blog.

The blog examples require external Python code to produce the image vectors.
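
For example, a minimal sketch of that external step; the `product` schema, the `image_embedding` tensor field, and the endpoint are hypothetical names for illustration:

```python
# Sketch: produce a CLIP image vector outside Vespa and feed it in.
# The "product" schema, "image_embedding" field, and endpoint are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from vespa.application import Vespa

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("product.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    vector = model.get_image_features(**inputs)[0].tolist()  # 512 floats

app = Vespa(url="http://localhost", port=8080)
app.feed_data_point(
    schema="product",
    data_id="sku-123",
    fields={"image_embedding": {"values": vector}},
)
```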

@jobergum

Makes sense. CLIP has two parts, image encoding and text encoding, which are handled by two different neural networks.

We could fit the text transformer model into the existing embed framework, as is already done in multiple Vespa sample applications, but image encoding would not fit the existing embed functionality, which takes a string or an array of strings as input.
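
A small sketch of that split, using the Hugging Face transformers CLIP classes (model name illustrative): the text tower consumes strings, which a string-based embed framework could pass through, while the image tower consumes pixel tensors:

```python
# CLIP is two networks: a text encoder fed strings and an image encoder
# fed pixel tensors; only the former matches a string-based embed API.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

text_inputs = processor(text=["red leather handbag"],
                        return_tensors="pt", padding=True)
text_vec = model.get_text_features(**text_inputs)     # input: strings

image_inputs = processor(images=Image.open("bag.jpg"), return_tensors="pt")
image_vec = model.get_image_features(**image_inputs)  # input: pixels
```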

@jobergum

So if you are fine with just having the text side of the text-to-image model in Vespa, we can create that type of example using the HF-embedder functionality.


eostis commented Aug 21, 2023

With the same process?

  • Export the HF CLIP .onnx (see the sketch after this list)
  • Set up the container's HF component in services.xml
  • Define the embedding field in the .sd schema, with the participating input text fields and input images
  • Add a closeness rank profile
  • Define the YQL query with nearestNeighbor() and ranking
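
For the export step, a minimal sketch that exports only CLIP's text tower (the model name, dummy input, and output path are assumptions; the Hugging Face Optimum exporter is an alternative route):

```python
# Sketch: export only CLIP's text tower to ONNX for use in a Vespa
# embedder component. Model name and output path are illustrative.
import torch
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

model_id = "openai/clip-vit-base-patch32"
model = CLIPTextModelWithProjection.from_pretrained(model_id)
model.config.return_dict = False  # plain tuple outputs simplify ONNX export
model.eval()
tokenizer = CLIPTokenizer.from_pretrained(model_id)

dummy = tokenizer(["a photo of a handbag"], return_tensors="pt")
torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "clip_text_encoder.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["text_embeds"],  # the projected text embedding
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
)
```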

@jobergum

To handle image data, we would have to create a new type of embedder functionality.


eostis commented Aug 21, 2023

Exactly! It would also prepare Vespa for further modalities: audio, video, ...


eostis commented Aug 22, 2023

Apparently I was a bit ahead of my time: 7-modality (ImageBind) is here.

@frodelu added this to the later milestone Aug 23, 2023
@jobergum

ImageBind is interesting, but I do recommend looking at the licensing :)


eostis commented Aug 23, 2023

@AriMKatz

Does Vespa support multimodality currently?

@jobergum

Hey @AriMKatz,

We currently do not expose any provided embedders that are multimodal; the provided embedder models are text-only.

This doesn't mean that you cannot use multimodal representations with Vespa; for example, see this recent example of a multimodal model: PDF Retrieval with Vision Language Models (ColPali).

@alpha-javed

Hey @jobergum, does Vespa support multimodality currently?

@jobergum

See my comment above and #32389; the native built-in embedders are currently text-only.
