Initial text and document embedders implementation #1

gadmarkovits · 2024-11-10T16:05:29Z

Description

Implemented a text embedder and a document embedder to integrate OPEA with Haystack.
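
For readers unfamiliar with the split between the two components, here is a minimal sketch of the usual contract: a text embedder turns one string into one vector, while a document embedder attaches a vector to each document in a list. It is not the code in this PR. `Doc`, `SketchTextEmbedder`, `SketchDocumentEmbedder`, and `fake_embed` are hypothetical names, the backend is a local stub rather than an OPEA service, and the output keys (`embedding`, `documents`) follow the convention Haystack embedders generally use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Vector = List[float]


def fake_embed(texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in for a remote embedding service (no network)."""
    return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in texts]


@dataclass
class Doc:
    """Hypothetical minimal document: text plus an optional embedding."""
    content: str
    embedding: Optional[Vector] = None


class SketchTextEmbedder:
    """Embeds a single query string."""

    def __init__(self, backend: Callable[[List[str]], List[Vector]] = fake_embed):
        self.backend = backend

    def run(self, text: str) -> Dict[str, Vector]:
        if not isinstance(text, str):
            raise TypeError("SketchTextEmbedder expects a single string")
        # One input string yields one vector under the "embedding" key.
        return {"embedding": self.backend([text])[0]}


class SketchDocumentEmbedder:
    """Embeds a list of documents in batches."""

    def __init__(
        self,
        backend: Callable[[List[str]], List[Vector]] = fake_embed,
        batch_size: int = 2,
    ):
        self.backend = backend
        self.batch_size = batch_size

    def run(self, documents: List[Doc]) -> Dict[str, List[Doc]]:
        # Send the backend at most batch_size texts per call.
        for start in range(0, len(documents), self.batch_size):
            batch = documents[start:start + self.batch_size]
            vectors = self.backend([d.content for d in batch])
            for doc, vector in zip(batch, vectors):
                doc.embedding = vector
        return {"documents": documents}
```

Injecting the backend as a callable keeps the sketch runnable offline; a real integration would replace `fake_embed` with a call to the embedding service.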

Issues

RFC

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)
Breaking change (fix or feature that would break existing design and interface)
Others (enhancement, documentation, validation, etc.)

Dependencies

Haystack

Tests

Ran e2e test on a local machine against a Xeon instance running the ChatQnA example.

Signed-off-by: Gad Markovits <[email protected]>

* Adds an endpoint for image ingestion
  Signed-off-by: Melanie Buehler <[email protected]>
* Combined image and video endpoint
  Signed-off-by: Melanie Buehler <[email protected]>
* Add test and update README
  Signed-off-by: Melanie Buehler <[email protected]>
* fixed variable name for embedding model (#1)
  Signed-off-by: okhleif-IL <[email protected]>
* Fixed test script
  Signed-off-by: Melanie Buehler <[email protected]>
* Remove redundant function
  Signed-off-by: Melanie Buehler <[email protected]>
* get_videos, delete_videos --> get_files, delete_files (opea-project#3)
  Signed-off-by: okhleif-IL <[email protected]>
* Updates test per review feedback
  Signed-off-by: Melanie Buehler <[email protected]>
* Fixed test
  Signed-off-by: Melanie Buehler <[email protected]>
* Add support for audio files multimodal data ingestion (opea-project#4)
  * Add support for audio files multimodal data ingestion
    Signed-off-by: dmsuehir <[email protected]>
  * Update function name
    Signed-off-by: dmsuehir <[email protected]>
  ---------
  Signed-off-by: dmsuehir <[email protected]>
* Change videos_with_transcripts to ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* Add image support to video ingestion with transcript functionality
  Signed-off-by: Melanie Buehler <[email protected]>
* Update test and README
  Signed-off-by: Melanie Buehler <[email protected]>
* Updated for review suggestions
  Signed-off-by: Melanie Buehler <[email protected]>
* Add two tests for ingest_with_text
  Signed-off-by: Melanie Buehler <[email protected]>
* LVM TGI Gaudi update for prompts without images (opea-project#7)
  * LVM Gaudi TGI update for prompts without images
    Signed-off-by: dmsuehir <[email protected]>
  * Wording
    Signed-off-by: dmsuehir <[email protected]>
  * Add a test
    Signed-off-by: dmsuehir <[email protected]>
  ---------
  Signed-off-by: dmsuehir <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Change dummy image to be b64 encoded instead of the url (opea-project#9)
  Signed-off-by: dmsuehir <[email protected]>
* Updates based on review feedback (opea-project#10)
  Signed-off-by: dmsuehir <[email protected]>
* Test fix (opea-project#11)
  Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>
Co-authored-by: dmsuehir <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>

julian-risch

Looks quite good to me already! 👍 Just left some minor comments about to_dict and from_dict.
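
As a rough illustration of the to_dict / from_dict round trip the review refers to, the sketch below serializes a component's constructor arguments and rebuilds an equivalent instance from them. It is an assumption-laden stand-in, not the code under review: `SketchEmbedder`, its parameters, and the default URL are hypothetical, and the `type` / `init_parameters` layout mirrors the shape Haystack's serialization helpers conventionally produce rather than calling them.

```python
from typing import Any, Dict


class SketchEmbedder:
    """Hypothetical component; parameter names are for illustration only."""

    def __init__(
        self,
        api_url: str = "http://localhost:6000",  # assumed placeholder URL
        prefix: str = "",
        batch_size: int = 32,
    ):
        self.api_url = api_url
        self.prefix = prefix
        self.batch_size = batch_size

    def to_dict(self) -> Dict[str, Any]:
        # Store only what __init__ needs, so from_dict can rebuild the object.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {
                "api_url": self.api_url,
                "prefix": self.prefix,
                "batch_size": self.batch_size,
            },
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchEmbedder":
        # Missing parameters fall back to the constructor defaults.
        return cls(**data.get("init_parameters", {}))


original = SketchEmbedder(prefix="query: ", batch_size=8)
restored = SketchEmbedder.from_dict(original.to_dict())
```

The property worth testing is that `restored.to_dict()` equals `original.to_dict()`: every constructor argument must appear in `init_parameters`, or it is silently reset to its default when a saved pipeline is reloaded.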

comps/integrations/haystack/src/opea_haystack/embedders/tei/document_embedder.py

comps/integrations/haystack/src/opea_haystack/embedders/tei/text_embedder.py

comps/integrations/haystack/src/opea_haystack/generators/generator.py

comps/integrations/haystack/pyproject.toml

comps/integrations/haystack/src/opea_haystack/__about__.py

…anges Signed-off-by: Gad Markovits <[email protected]>

julian-risch

Found one last typo in the output of OPEATextEmbedder. Everything else looks good to me. Happy to approve right after the typo is fixed!

comps/integrations/haystack/src/opea_haystack/embedders/tei/text_embedder.py

Signed-off-by: Gad Markovits <[email protected]>

julian-risch

Looks very good to me! 👍

gadmarkovits added 4 commits November 10, 2024 17:54

Initial text and document embedders implementation

fc26c73

Signed-off-by: Gad Markovits <[email protected]>

Fixed implementation to match OPEA updates

bd3b787

Signed-off-by: Gad Markovits <[email protected]>

Implemented integration for a tgi based opea generator

144e23c

Signed-off-by: Gad Markovits <[email protected]>

Updated opea api url

1820ab1

Signed-off-by: Gad Markovits <[email protected]>

julian-risch suggested changes Jan 15, 2025

julian-risch reviewed Jan 15, 2025

comps/integrations/haystack/src/opea_haystack/generators/generator.py (outdated)