Limit decode block size #532

Merged
15 commits merged on Nov 25, 2024

@mfylcek commented Nov 21, 2024

Limit decode bucket size to num_hpu_blocks
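The description states only the intent. The apparent motivation is that a decode batch can never reference more KV-cache blocks than num_hpu_blocks, so buckets above that value would presumably be warmed up but never used. The sketch below illustrates the idea under stated assumptions: decode buckets form a linear ladder of block counts, and the helper names and the bmin/step/bmax parameters are hypothetical. They are not the hpu_model_runner or vllm-hpu-extension API.

```python
import bisect
from typing import List


def generate_decode_block_buckets(bmin: int, step: int, bmax: int,
                                  num_hpu_blocks: int) -> List[int]:
    """Hypothetical helper: linear ladder of decode block-count buckets,
    capped so that no bucket exceeds num_hpu_blocks."""
    if bmin <= 0 or step <= 0:
        raise ValueError("bmin and step must be positive")
    # Cap the ladder at the number of KV-cache blocks actually allocated.
    upper = min(bmax, num_hpu_blocks)
    buckets = list(range(bmin, upper + 1, step))
    # Make sure the cap itself is a bucket so every valid size is covered.
    if not buckets or buckets[-1] != upper:
        buckets.append(upper)
    return buckets


def find_bucket(buckets: List[int], num_blocks: int) -> int:
    """Return the smallest bucket that can hold num_blocks blocks."""
    idx = bisect.bisect_left(buckets, num_blocks)
    if idx == len(buckets):
        raise ValueError("num_blocks exceeds the largest bucket")
    return buckets[idx]


buckets = generate_decode_block_buckets(128, 128, 2048, num_hpu_blocks=1000)
print(buckets)                    # [128, 256, 384, 512, 640, 768, 896, 1000]
print(find_bucket(buckets, 900))  # 1000
```

Without the cap, the same parameters would produce 16 buckets up to 2048, and more than half of them could never be filled when only 1000 blocks exist.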

@mfylcek marked this pull request as draft November 21, 2024 09:27
Review comment on vllm/worker/hpu_model_runner.py (outdated, resolved)
@mfylcek marked this pull request as ready for review November 22, 2024 10:09
@mfylcek marked this pull request as draft November 22, 2024 11:10
@mfylcek marked this pull request as ready for review November 22, 2024 13:21
Review comment on vllm/worker/hpu_model_runner.py (outdated, resolved)
@kdamaszk commented:

@mfylcek PR #534 is already merged, and the entire bucketing logic has moved to vllm-hpu-extension. I'm afraid your changes in hpu_model_runner will have to be moved to that repo.

@mfylcek (Author) commented Nov 22, 2024

PR in vllm-hpu-extension: HabanaAI/vllm-hpu-extension#41

@mfylcek changed the title from "Limit contiguous PA bucket size" to "Limit decode block size" Nov 25, 2024
@michalkuligowski merged commit 39c6b6c into habana_main Nov 25, 2024
12 checks passed
@michalkuligowski deleted the dev/mfylcek/limit_cpa_bucket_size branch November 25, 2024 08:36

5 participants