
[Bug] vLLM-service runtime issue for ChatQnA #1119

Closed

louie-tsai opened this issue Nov 12, 2024 · 3 comments
@louie-tsai
Collaborator

Priority

P3-Medium

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

  • Pull docker images from hub.docker.com
  • Build docker images from source

Deploy method

  • Docker compose
  • Docker
  • Kubernetes
  • Helm

Running nodes

Single Node

What's the version?

the latest version

Description

Testing the vllm-gaudi-service fails with the error below:
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 394, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 141, in from_engine_args
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/engine/multiprocessing/engine.py", line 78, in __init__
    self.engine = LLMEngine(*args,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/engine/llm_engine.py", line 351, in __init__
    self._initialize_kv_caches()
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/engine/llm_engine.py", line 486, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/executor/hpu_executor.py", line 84, in determine_num_available_blocks
    return self.driver_worker.determine_num_available_blocks()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/worker/hpu_worker.py", line 180, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1451, in profile_run
    self.warmup_scenario(max_batch_size, max_seq_len, True, kv_caches,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 1523, in warmup_scenario
    self.execute_model(inputs, kv_caches, warmup_mode=True)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 2134, in execute_model
    hidden_states = self.model.forward(
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 716, in forward
    return wrapped_hpugraph_forward(
  File "/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/graphs.py", line 570, in wrapped_hpugraph_forward
    return orig_fwd(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/worker/hpu_model_runner.py", line 387, in forward
    hidden_states = self.model(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1523, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 566, in forward
    model_output = self.model(input_ids, positions, kv_caches,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 352, in forward
    hidden_states, residual = layer(positions, hidden_states,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 261, in forward
    hidden_states = self.self_attn(positions=positions,
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/model_executor/models/llama.py", line 191, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1514, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1564, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/attention/layer.py", line 100, in forward
    return self.impl.forward(query,
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/attention/backends/hpu_attn.py", line 208, in forward
    out = ops.prompt_attention(
  File "/usr/local/lib/python3.10/dist-packages/vllm_hpu_extension/ops.py", line 226, in prompt_attention
    attn_weights = FusedSDPA.apply(query, key, value, None, 0.0, True,
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
TypeError: FusedSDPA.forward() takes from 4 to 9 positional arguments but 12 were given

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/entrypoints/openai/api_server.py", line 585, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 82, in run
    return loop.run_until_complete(wrapper())
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.10/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/entrypoints/openai/api_server.py", line 552, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/entrypoints/openai/api_server.py", line 107, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/vllm-0.6.3.dev588+g1033c3eb.gaudi000-py3.10.egg/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
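
The TypeError at the bottom of the first traceback (FusedSDPA.apply is passed 12 positional arguments while the installed FusedSDPA.forward() accepts 4 to 9) suggests a mismatch between the vllm-fork / vllm_hpu_extension code and the habana_frameworks stack baked into the image. As a rough first check, the relevant package versions inside the running container can be listed like this (the container name vllm-gaudi-server is an assumption based on the ChatQnA compose file; adjust to your setup):

# Hypothetical diagnostic: list the vLLM / Habana-related packages actually
# installed in the container (container name is an assumption).
docker exec vllm-gaudi-server pip list 2>/dev/null | grep -iE 'vllm|habana'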

Reproduce steps

Follow the README to validate the vllm-service:
https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker_compose/intel/hpu/gaudi/README.md#validate-microservices-and-megaservice
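
For reference, the validation in that README section boils down to a single OpenAI-compatible request against the vLLM endpoint. A minimal sketch, assuming the service is exposed on port 8007 and that host_ip and LLM_MODEL_ID are set as in the compose environment (all three values may differ in your deployment):

# Minimal validation request; the port, host_ip and LLM_MODEL_ID below are
# assumptions taken from the compose environment and may differ in your setup.
curl http://${host_ip}:8007/v1/chat/completions \
  -X POST \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"${LLM_MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"What is Deep Learning?\"}], \"max_tokens\": 32}"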

Raw log

No response

@devpramod self-assigned this Nov 12, 2024
@wangkl2
Collaborator

wangkl2 commented Nov 26, 2024

@louie-tsai This is a known issue, tracked in HabanaAI/vllm-fork#462. The OPEA dev team pinned a specific vllm-fork commit for the docker image build in #1142. I've verified that the vLLM service now works with the latest opea/vllm-gaudi:latest docker image. Please check and see if we can close this issue.
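
For anyone rebuilding the image locally rather than pulling opea/vllm-gaudi:latest, the fix amounts to checking out the pinned vllm-fork commit before building. A rough sketch, with the commit hash left as a placeholder (see #1142 for the actual pin; the Dockerfile name may also differ):

# Hypothetical rebuild flow; <pinned-commit> is a placeholder, see PR #1142
# for the commit the OPEA build actually pins.
git clone https://github.com/HabanaAI/vllm-fork.git
cd vllm-fork
git checkout <pinned-commit>
docker build -f Dockerfile.hpu -t opea/vllm-gaudi:latest .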

@devpramod
Collaborator

@louie-tsai @wangkl2 Verified that the vllm service works with the latest image
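
For others hitting the same error, pulling the refreshed image and recreating the service should be enough; roughly (the service and container names below are assumptions based on the ChatQnA compose file):

# Pull the rebuilt image and restart the vLLM service; service/container
# names are assumptions, check your compose file.
docker pull opea/vllm-gaudi:latest
docker compose up -d vllm-service
docker logs -f vllm-gaudi-server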

@louie-tsai
Collaborator Author

thanks
