[Bug] how to combine with ray.data #1987
Can you share a self-contained minimal reproducible script?
ray start --num-gpus=$GPUS_PER_NODE --head

import argparse
import ray.data
from transformers import AutoTokenizer

BASE_PROMPT = '''#Question
#Response'''

if __name__ == '__main__':
Describe the bug
ray.data.Dataset.map_batches(**args) works fine with vllm.LLM, but when I replace vllm.LLM with sglang.Engine it fails. How can I make this work? ray.data makes it easy to manage data. Looking forward to your help.
Running 0: 0 bundle [00:00, ? bundle/s]
2024-11-10 23:01:22,492 ERROR streaming_executor_state.py:456 -- An exception was raised from a task of operator "MapBatches(LLM)". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
⚠️ Dataset execution failed: 0 bundle [00:00, ? bundle/s]
2024-11-10 23:01:22,504 WARNING actor_pool_map_operator.py:265 -- To ensure full parallelization across an actor pool of size 1, the Dataset should consist of at least 1 distinct blocks. Consider increasing the parallelism when creating the Dataset.
2024-11-10 23:01:22,515 ERROR exceptions.py:63 -- Exception occurred in user code, with the abbreviated stack trace below. By default, the Ray Data internal stack trace is omitted from stdout, and only written to the Ray Data log files at /tmp/ray/session_2024-11-10_23-00-50_100958_208716/logs/ray-data. To output the full stack trace to stdout, set DataContext.log_internal_stack_trace_to_stdout to True.
Traceback (most recent call last):
File "/data/repo/batch_infer/sglang_infer.py", line 129, in
new_ds.write_json(args.output_path)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/dataset.py", line 2888, in write_json
self.write_datasink(
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/dataset.py", line 3610, in write_datasink
self._write_ds = Dataset(plan, logical_plan).materialize()
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/dataset.py", line 4598, in materialize
copy._plan.execute()
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/exceptions.py", line 87, in handle_trace
raise e.with_traceback(None)
ray.exceptions.RayTaskError(UserCodeException): ray::MapBatches(LLM)() (pid=210888, ip=172.22.197.6, actor_id=e58321fa3f0e04abad4aba2801000000, repr=MapWorker(MapBatches(LLM)))
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/util.py", line 78, in __call__
return future.result()
File "/root/anaconda3/envs/torch/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/root/anaconda3/envs/torch/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/root/anaconda3/envs/torch/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/data/repo/batch_infer/sglang_infer.py", line 89, in __call__
responses = self.llm.generate(input_texts, self.sampling_params)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/sglang/srt/server.py", line 760, in generate
loop = asyncio.get_event_loop()
File "/root/anaconda3/envs/torch/lib/python3.10/asyncio/events.py", line 656, in get_event_loop
raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'ThreadPoolExecutor-0_0'.
The above exception was the direct cause of the following exception:
ray::MapBatches(LLM)() (pid=210888, ip=172.22.197.6, actor_id=e58321fa3f0e04abad4aba2801000000, repr=MapWorker(MapBatches(LLM)))
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/operators/actor_pool_map_operator.py", line 364, in submit
yield from _map_task(
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_operator.py", line 451, in _map_task
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 392, in __call__
for data in iter:
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 134, in _udf_timed_iter
output = next(input)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 236, in __call__
yield from self._batch_fn(input, ctx)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 282, in transform_fn
res = fn(batch)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 186, in fn
_handle_debugger_exception(e)
File "/root/anaconda3/envs/torch/lib/python3.10/site-packages/ray/data/_internal/planner/plan_udf_map_op.py", line 210, in _handle_debugger_exception
raise UserCodeException() from e
ray.exceptions.UserCodeException
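The root cause visible in the traceback: sglang's generate() calls asyncio.get_event_loop(), which on Python 3.10 raises RuntimeError in any thread other than the main thread unless a loop has been set, and Ray Data runs the UDF inside a ThreadPoolExecutor worker thread ('ThreadPoolExecutor-0_0'). One possible workaround, sketched below, is to create and register an event loop in the worker thread before the first generate() call. The helper name ensure_event_loop is hypothetical; the sketch demonstrates the mechanism with a plain ThreadPoolExecutor rather than with Ray or sglang:

```python
import asyncio
import concurrent.futures

def ensure_event_loop():
    # asyncio.get_event_loop() raises RuntimeError in a thread that has no
    # loop registered (here: Ray Data's 'ThreadPoolExecutor-0_0' worker
    # thread). Creating and registering a loop once per thread avoids that.
    try:
        asyncio.get_event_loop()
    except RuntimeError:
        asyncio.set_event_loop(asyncio.new_event_loop())

def worker():
    ensure_event_loop()
    # From here on, code that internally calls asyncio.get_event_loop() --
    # as sglang's generate() does in the traceback -- can run in this thread.
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.sleep(0, result="ok"))

with concurrent.futures.ThreadPoolExecutor() as pool:
    result = pool.submit(worker).result()
```

Applied to the failing script, this would mean calling ensure_event_loop() at the top of the UDF's __call__, in the same thread that later invokes self.llm.generate(...).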
Reproduction
Environment
torch 2.4, CUDA 12.4