MultimodalQnA image query, pdf, and dynamic ports #1134

Open: wants to merge 17 commits into base: main


@mhbuehler (Contributor) commented Jan 10, 2025

Description

As planned in Phase 2 of the RFC, this PR adds image query support, PDF ingestion support, and dynamic ports to the microservices used by MultimodalQnA. It is paired with a companion PR in GenAIExamples.

Issues

RFC

Type of change


  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

pymupdf is a new dependency of the dataprep microservice, but it is already used elsewhere in GenAIComps.

Tests

Tests were added to the following scripts:

  • tests/dataprep/test_dataprep_multimodal_redis_langchain.sh
  • tests/embeddings/test_embeddings_multimodal.sh
  • tests/lvms/test_lvms_llava.sh
  • tests/lvms/test_lvms_tgi-llava_on_intel_hpu.sh
  • tests/retrievers/test_retrievers_multimodal_redis_langchain.sh
  • tests/retrievers/test_retrievers_redis.sh

dmsuehir and others added 6 commits December 16, 2024 10:02
* Backend enhancements for image query capabilities for MultimodalQnA

* Fix model name var

Signed-off-by: dmsuehir <[email protected]>

* Remove space at end of prompt

Signed-off-by: dmsuehir <[email protected]>

* Add env var for the max number of images sent to the LVM

Signed-off-by: dmsuehir <[email protected]>

* README update for the MAX_IMAGES env var

Signed-off-by: dmsuehir <[email protected]>

* Remove prints

Signed-off-by: dmsuehir <[email protected]>

* Audio query functionality to multimodal backend (#8)

Signed-off-by: okhleif-IL <[email protected]>

* added in audio dict creation

Signed-off-by: okhleif-IL <[email protected]>

* separated audio from prompt

Signed-off-by: okhleif-IL <[email protected]>

* added ASR endpoint

Signed-off-by: okhleif-IL <[email protected]>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <[email protected]>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <[email protected]>

* added megaservice to elif

Signed-off-by: okhleif-IL <[email protected]>

* reworked helper func

Signed-off-by: okhleif-IL <[email protected]>

* Append audio to prompt

Signed-off-by: okhleif-IL <[email protected]>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <[email protected]>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <[email protected]>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <[email protected]>

* list --> empty str

Signed-off-by: okhleif-IL <[email protected]>
---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed role bug where i never was > 0

Signed-off-by: okhleif-IL <[email protected]>

* Fix after merge

Signed-off-by: dmsuehir <[email protected]>

* removed whitespace

Signed-off-by: okhleif-IL <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix call to get role labels

Signed-off-by: dmsuehir <[email protected]>

* Gateway test updates images within the conversation

Signed-off-by: dmsuehir <[email protected]>

* Adds unit test coverage for audio query

Signed-off-by: Melanie Buehler <[email protected]>

* Update test to check the returned b64 types

Signed-off-by: dmsuehir <[email protected]>

* Update test since we don't expect images from the assistant

Signed-off-by: dmsuehir <[email protected]>

* Port number fix

Signed-off-by: Melanie Buehler <[email protected]>

* Formatting

Signed-off-by: Melanie Buehler <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed place where port number is set

Signed-off-by: Melanie Buehler <[email protected]>

* Remove old comment and added more accurate description

Signed-off-by: dmsuehir <[email protected]>

* add comment in code about MAX_IMAGES

Signed-off-by: dmsuehir <[email protected]>

* Add Gaudi support for image query

Signed-off-by: dmsuehir <[email protected]>

* Fix to pass the retrieved image last

Signed-off-by: dmsuehir <[email protected]>

* Revert out gateway and gateway test code, due to its move to GenAIExamples

Signed-off-by: dmsuehir <[email protected]>

* Fix retriever test for checking for b64_img_str in the result

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: Melanie Hart Buehler <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
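The commits above add a MAX_IMAGES env var that caps how many images are sent to the LVM, and make sure the retrieved image is passed last. A minimal sketch of how such a cap could work, assuming images are base64 strings kept in conversation order; the function name select_images_for_lvm and the fallback value of 1 are hypothetical and not taken from the PR.

```python
import os
from typing import List, Optional


def select_images_for_lvm(
    conversation_images: List[str],
    retrieved_image: Optional[str],
    max_images: Optional[int] = None,
) -> List[str]:
    """Return at most max_images images, with the retrieved image (if any) last.

    Hypothetical helper for illustration; the PR's actual logic may differ.
    """
    if max_images is None:
        # MAX_IMAGES is the env var named in the commits; the fallback of 1 is an assumption.
        max_images = int(os.getenv("MAX_IMAGES", "1"))
    if max_images <= 0:
        return []
    images = list(conversation_images)
    if retrieved_image is not None:
        # Append so the retrieved image is always the final entry.
        images.append(retrieved_image)
    # Keep only the most recent images; slicing preserves order,
    # so the retrieved image stays last after truncation.
    return images[-max_images:]
```

Dropping the oldest images first keeps the retrieved context, which the model sees last, inside the cap.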
…nv file (#17)

* changed all hardcoded ports to getenv with defaults instead

Signed-off-by: okhleif-IL <[email protected]>
---------

Signed-off-by: okhleif-IL <[email protected]>
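Because os.getenv returns a string whenever the variable is set, a port read this way has to be converted with int() before use, as the ASR diff later in this thread does with int(os.getenv("ASR_PORT", 9099)). A small sketch of that pattern with basic validation; port_from_env is a hypothetical helper name, not part of GenAIComps.

```python
import os


def port_from_env(var_name: str, default: int) -> int:
    """Read a TCP port from an environment variable, falling back to a default.

    Hypothetical helper for illustration only.
    """
    raw = os.getenv(var_name)
    if raw is None or raw.strip() == "":
        # Variable unset or empty: use the hard-coded default.
        return default
    port = int(raw)  # raises ValueError for non-numeric values
    if not 1 <= port <= 65535:
        raise ValueError(f"{var_name}={raw!r} is not a valid port")
    return port
```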
* Initial implementation of PDF ingestion

Signed-off-by: Melanie Buehler <[email protected]>

* PDF ingestion fixes

Signed-off-by: Melanie Buehler <[email protected]>

* Adds a test for dataprep microservice

Signed-off-by: Melanie Buehler <[email protected]>

* Improved comments, variable name, and a docstring

Signed-off-by: Melanie Buehler <[email protected]>

* Updated for review feedback

Signed-off-by: Melanie Buehler <[email protected]>

---------

Signed-off-by: Melanie Buehler <[email protected]>

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

File | Coverage Δ
comps/cores/proto/docarray.py | 99.44% <100.00%> (ø)

for frame in annotation:
    page_index = frame["frame_no"]
    image_index = frame["sub_video_id"]
    path_to_frame = os.path.join(path_to_frames, f"page{page_index}_image{image_index}.png")
Contributor

@mhbuehler Since this is a new function anyway, why keep local variable names that refer to frames/videos (e.g., path_to_frame, video_id)? They might cause confusion.

Contributor Author

We don't; I'll fix this. Thanks!
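A sketch of what the rename might look like, assuming the loop is pulled into a standalone helper (pdf_image_paths and path_to_images are hypothetical names). The annotation keys frame_no and sub_video_id are left unchanged, since the shared annotation format may depend on them; only the local names now refer to pages and images.

```python
import os
from typing import Dict, List


def pdf_image_paths(annotation: List[Dict[str, int]], path_to_images: str) -> List[str]:
    """Build the file path of each extracted PDF image.

    Hypothetical helper: the keys "frame_no" and "sub_video_id" are kept as in
    the diff; only the local variable names are changed.
    """
    paths = []
    for entry in annotation:
        page_index = entry["frame_no"]
        image_index = entry["sub_video_id"]
        # Same naming scheme as the diff: page<N>_image<M>.png
        paths.append(os.path.join(path_to_images, f"page{page_index}_image{image_index}.png"))
    return paths
```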

@@ -106,7 +106,7 @@ async def audio_transcriptions(
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument("--host", type=str, default="0.0.0.0")
-    parser.add_argument("--port", type=int, default=7066)
+    parser.add_argument("--port", type=int, default=os.getenv("WHISPER_PORT", 7066))
Member

The internal container server port can be forwarded to any arbitrary host port. Why do we need it to be configurable as well?

Contributor Author

@mhbuehler Jan 13, 2025

I see your point. Our intention with WHISPER_PORT and ASR_PORT was to give users full control over ports, but as you pointed out, hard-coding the internal server ports doesn't really matter because they can be forwarded to different external ports. The corresponding PR in GenAIExamples may provide more context.

The ASR_PORT change is now irrelevant because ASR_PORT has been removed from GenAIExamples, so we will revert it. Let me know if you would prefer that we also revert the WHISPER_PORT configurability.
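For reference, the WHISPER_PORT diff works as written even though os.getenv returns a string when the variable is set: argparse converts string defaults with the argument's type, so args.port is an int in both cases. A minimal sketch reproducing only the argument parsing (build_parser is a hypothetical wrapper). As the reviewer notes, the same effect is usually achieved without code changes by mapping ports at the container level, e.g. docker run -p 8000:7066.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Recreate the argument parsing from the WHISPER_PORT diff (hypothetical wrapper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="0.0.0.0")
    # os.getenv returns a str when WHISPER_PORT is set, or the int 7066 otherwise.
    # argparse runs string defaults through type=int, so args.port is an int either way.
    parser.add_argument("--port", type=int, default=os.getenv("WHISPER_PORT", 7066))
    return parser
```

An explicit --port on the command line still takes precedence over both the env var and the default.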

@@ -34,7 +34,7 @@
     service_type=ServiceType.ASR,
     endpoint="/v1/audio/transcriptions",
     host="0.0.0.0",
-    port=9099,
+    port=int(os.getenv("ASR_PORT", 9099)),
     input_datatype=Base64ByteStrDoc,
Member

Same question as above.

Contributor Author

We will revert this.

dmsuehir and others added 5 commits January 13, 2025 11:20
* Fixing Multimodal Retriever Redis tests

Signed-off-by: dmsuehir <[email protected]>

* Code cleanup

Signed-off-by: dmsuehir <[email protected]>

* Remove debug changes

Signed-off-by: dmsuehir <[email protected]>

* Formatting

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>

7 participants