Skip empty steps in multi-step scheduling (#526)
This change skips empty steps in the multistep scenario. We are
currently wasting host time on launching n-2 empty steps, and this PR
removes that overhead. The gain will only become visible after device
time optimizations, as we are currently limited by HPU calculations
inside multistep.
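
The counter arithmetic behind the skip can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in `hpu_model_runner.py`: `ToyState`, `skip_empty_steps` and `steps_left` are hypothetical names, it assumes n in the description corresponds to `num_steps` in the diff, and it assumes a step counter that starts at 0 and a run that ends once the counter reaches `num_steps`.

```python
from dataclasses import dataclass


@dataclass
class ToyState:
    """Hypothetical stand-in for a sequence group's multistep state."""
    num_steps: int
    current_step: int = 0


def skip_empty_steps(state: ToyState) -> int:
    """Advance the counter by num_steps - 2, as the diff does.

    Returns the number of skipped steps. The max(..., 0) guard for
    num_steps < 2 is an addition of this sketch, not part of the commit.
    """
    skipped = max(state.num_steps - 2, 0)
    state.current_step += skipped
    return skipped


def steps_left(state: ToyState) -> int:
    """Steps still to be launched under the toy model's assumptions."""
    return state.num_steps - state.current_step


state = ToyState(num_steps=8)
skipped = skip_empty_steps(state)
print(skipped, state.current_step, steps_left(state))
```

With `num_steps=8` the toy counter jumps by 6, so only two steps remain to be launched instead of eight.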
jkaniecki authored Nov 20, 2024
1 parent 6338608 commit efe0268
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion vllm/worker/hpu_model_runner.py
@@ -2241,9 +2241,12 @@ def try_revert_dummy_output_tokens():
 else:
     raise RuntimeError(
         "seq_group_metadata_list is uninitialized")
-# Cache the original output token ids
 for i, seq_group_metadata in enumerate(
         seq_group_metadata_list):
+    # Skip empty steps
+    seq_group_metadata.state.current_step += (
+        num_steps - 2)
+    # Cache the original output token ids
     cache_orig_output_tokens_len.append({})
     for j, data in seq_group_metadata.seq_data.items():
         cache_orig_output_tokens_len[i][j] = \