Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add vLLM support to DocSum Helm chart #649

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

eero-t
Copy link
Contributor

@eero-t eero-t commented Dec 18, 2024

Description

This was split from Helm vLLM support added in #610. It adds vLLM support to DocSum Helm chart.

(Similarly to how it's already done for ChatQnA app + Agent component, there are tgi.enabled & vllm.enabled flags for selecting which LLM will be used.)

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

opea/llm-docsum-vllm:latest image is currently missing from CI & DockerHub registries:
opea-project/GenAIComps#961

(Although corresponding opea/llm-docsum-tgi:latest image for TGI, and opea/llm-vllm:latest vLLM text-generation images already exist.)

Tests

Manual testing with opea/llm-docsum-vllm:latest image built locally.

@eero-t eero-t marked this pull request as draft December 18, 2024 18:58
@eero-t
Copy link
Contributor Author

eero-t commented Dec 18, 2024

Setting as draft because the required image is still missing from DockerHub, and this needs retesting after currently pending DocSum changes for Comps & Examples repos have completed.

@eero-t
Copy link
Contributor Author

eero-t commented Dec 20, 2024

While CI "docsum, gaudi, ci-gaudi-vllm-values" test fails as expected, due to OPEA missing llm-docsum-vllm image...

There seems to be a bug in component unrelated to this PR, as also run "llm-uservice, xeon, ci-faqgen-values, common" CI test fails to a package missing from image:

[pod/llm-uservice20241218190439-5b9b7b79fd-r65l9/llm-uservice20241218190439]
...
   File "/home/user/comps/llms/faq-generation/tgi/langchain/llm.py", line 77, in stream_generator
     from langserve.serialization import WellKnownLCSerializer
   File "/home/user/.local/lib/python3.11/site-packages/langserve/__init__.py", line 8, in <module>
     from langserve.client import RemoteRunnable
   File "/home/user/.local/lib/python3.11/site-packages/langserve/client.py", line 24, in <module>
     from httpx._types import AuthTypes, CertTypes, CookieTypes, HeaderTypes, VerifyTypes
 ImportError: cannot import name 'VerifyTypes' from 'httpx._types' (/home/user/.local/lib/python3.11/site-packages/httpx/_types.py)

=> requirements.txt for llm-faqgen-tgi:latest image generation is not up to date in Comps repo?

@lianhao?

@eero-t
Copy link
Contributor Author

eero-t commented Dec 30, 2024

Rebased to main + dropped "draft" status, as the required OPEA image is now available in DockerHub!

@eero-t
Copy link
Contributor Author

eero-t commented Dec 30, 2024

CI still fails.

PR #659 fixes DocSum issues with updates in other repos, includes the same model ID workaround as this one, and passed CI => better to merge that first & rebase this?


"docsum, gaudi, ci-gaudi-vllm-values" fails because CI registry is out of date. Although required image has been at DockerHub for 4 days [1], fetching it still fails:
Normal BackOff 4m47s (x18 over 9m42s) kubelet Back-off pulling image "100.83.111.229:5000/opea/llm-docsum-vllm:latest"
[1] https://hub.docker.com/r/opea/llm-docsum-vllm/tags

"docsum, xeon, ci-values" fails to connection failure:

[pod/docsum20241230125518-5599c984c6-dt5fc/docsum20241230125518] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

"docsum, gaudi, ci-gaudi-tgi-values" fails to similar CI issue:

[pod/docsum20241230130614-d598c6674-qqhjf/docsum20241230130614] aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host 0.0.0.0:7066 ssl:default [Connect call failed ('0.0.0.0', 7066)]
...
testpod: Response check failed, please check the logs in artifacts!

@eero-t
Copy link
Contributor Author

eero-t commented Jan 2, 2025

Rebased to main to get CI tests passing, and dropped already merged fix. However, CI is still broken.

@daisy-ycguo CI "docsum, gaudi, ci-gaudi-vllm-values" still fails to CI registry not being up to date with DockerHub: https://hub.docker.com/r/opea/llm-docsum-vllm/tags
Failed to pull image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": rpc error: code = NotFound desc = failed to pull and unpack image "100.83.111.229:5000/opea/llm-docsum-vllm:latest": failed to resolve reference "100.83.111.229:5000/opea/llm-docsum-vllm:latest": 100.83.111.229:5000/opea/llm-docsum-vllm:latest: not found

@lianhao, CI "docsum, gaudi, ci-gaudi-tgi-values" test fails now to test bug?
[pod/docsum20250102125737-llm-uservice-7d6f8d968f-v9b4w/docsum20250102125737] | huggingface_hub.errors.ValidationError: Input validation error: 'inputs' tokens + 'max_new_tokens' must be <= 4096. Given: 4095 'inputs' tokens and 17 'max_new_tokens'

Copy link
Collaborator

@lianhao lianhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eero-t I found pending PR opea-project/GenAIComps#1101 will make a big change, there will be no more llm-docsum-llm image any more, a single llm-docsum image will be able to talk to both tgi and vllm, so maybe we should wait until that PR to be merged first.

helm-charts/docsum/values.yaml Show resolved Hide resolved
@eero-t
Copy link
Contributor Author

eero-t commented Jan 7, 2025

@eero-t I found pending PR opea-project/GenAIComps#1101 will make a big change, there will be no more llm-docsum-llm image any more, a single llm-docsum image will be able to talk to both tgi and vllm, so maybe we should wait until that PR to be merged first.

Good to know, if CI registry continues missing the DockerHub image, that indeed should finally fix it.

Signed-off-by: Eero Tamminen <[email protected]>
Adapt to DocSum refactor:
opea-project/GenAIComps#1101

Signed-off-by: Eero Tamminen <[email protected]>
@eero-t
Copy link
Contributor Author

eero-t commented Jan 14, 2025

Rebased to main, squashed "rebase fix" commit, and added commit updating llm-docsum-* image names according to: opea-project/GenAIComps#1101

@eero-t
Copy link
Contributor Author

eero-t commented Jan 14, 2025

Lot of CI failures, so of them due to incomplete update to DocSum refactor, some due to issues outside of this PR...

Doc building CI test fails to bugs in doc scripts:

/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:47: SyntaxWarning: invalid escape sequence '\s'
  b"(?P<comment>(^\s*#.*\n)+)" \
/home/runner/work/GenAIInfra/GenAIInfra/docs/scripts/filter-known-issues.py:91: SyntaxWarning: invalid escape sequence '\.'
  file_regex = re.compile(".*\.conf$")
WARNING: The config value `myst_enable_extensions' has type `list'; expected `Any'.
WARNING: The config value `myst_fence_as_directive' has type `list'; expected `Any'.
make: *** [Makefile:65: html] Error 2

"docsum, gaudi, ci-gaudi-tgi-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"Xeon / go-e2e" CI test fails to image pull failure:

   Normal   BackOff    3m42s (x7 over 8m55s)  kubelet            Back-off pulling image "100.80.243.74:5000/opea/llm-tgi:latest"
  Warning  Failed     3m42s (x7 over 8m55s)  kubelet            Error: ImagePullBackOff
Pod llm-svc-deployment-677bf47dfb-qtt22 logs:
Error from server (BadRequest): container "llm-uservice" in pod "llm-svc-deployment-677bf47dfb-qtt22" is waiting to start: trying and failing to pull image

"docsum, xeon, ci-values":

Response check failed, please check the logs in artifacts!
Error: Process completed with exit code 1.

"docsum, gaudi, ci-gaudi-vllm-values" probably fails due to missing DocSum_COMPONENT_NAME env var.

"llm-uservice, xeon, ci-docsum-values, common":

   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 302, in _request_wrapper
     hf_raise_for_status(response)
   File "/home/user/.local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 454, in hf_raise_for_status
     raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6786554f-67e5da2940e1acc410ccee05;0abe054b-411e-4108-bb1c-f1d3dba03521)
 
 Repository Not Found for url: https://huggingface.co/None/resolve/main/tokenizer_config.json.
...
Traceback (most recent call last):
   File "/home/user/comps/llms/src/doc-summarization/opea_docsum_microservice.py", line 26, in <module>
     loader = OpeaComponentLoader(llm_component_name, description=f"OPEA LLM DocSum Component: {llm_component_name}")
...
   File "/home/user/.local/lib/python3.11/site-packages/transformers/utils/hub.py", line 426, in cached_file
     raise EnvironmentError(
 OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'

etc.

@eero-t
Copy link
Contributor Author

eero-t commented Jan 14, 2025

Looking at the related refactor tickets:

Potentially needed:

export LLM_ENDPOINT_PORT=8008
export DOCSUM_PORT=9000
export LLM_ENDPOINT="http://${host_ip}:${LLM_ENDPOINT_PORT}"
export DocSum_COMPONENT_NAME="OPEADocSum_TGI"  # or _vLLM

And possibly also increase max tokens from 1024/2048 to 2048/4096.

@lianhao
Copy link
Collaborator

lianhao commented Jan 14, 2025

@eero-t you may need to wait until PR #696 get landin first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants