
Support throughput benchmarking for mllama with vision input #629

Closed

Conversation

@yisonzhu commented Dec 13, 2024

Previously, benchmark_throughput.py did not support multi-modal data for most VLMs such as mllama. This PR adds that support so that we can run throughput tests for mllama on HPU.
It is intended for internal testing only and is not aimed at upstream.
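For context, the mechanism relied on here is vLLM's multi-modal prompt input: each benchmark request carries both a text prompt and an image. The snippet below is a minimal sketch of that idea, not the exact code in this PR; the dummy gray image, the word-repetition used to pad the prompt, and the mllama `<|image|>` prompt prefix are assumptions for illustration only.

```python
# Sketch: building image+text requests the way a --mm-data benchmark path might,
# then feeding them to vLLM. Assumes vLLM's dict-style prompt input with
# "multi_modal_data"; the prompt template and dummy image are illustrative.
from PIL import Image
from vllm import LLM, SamplingParams


def build_mm_requests(num_prompts: int, input_len: int):
    # One synthetic image reused for every request; a real benchmark could
    # load sample images from disk instead.
    image = Image.new("RGB", (560, 560), color=(128, 128, 128))
    # Rough proxy for the requested input length (words, not exact tokens).
    prompt = "<|image|><|begin_of_text|>" + "hi " * input_len
    return [
        {"prompt": prompt, "multi_modal_data": {"image": image}}
        for _ in range(num_prompts)
    ]


llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
)
outputs = llm.generate(
    build_mm_requests(num_prompts=32, input_len=13),
    SamplingParams(max_tokens=40, ignore_eos=True),
)
```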

Usage

We can now run the benchmark with commands such as:

```bash
export VLLM_DECODE_BLOCK_BUCKET_MAX=384
export VLLM_PROMPT_SEQ_BUCKET_MAX=128
python benchmark_throughput.py \
    --model=meta-llama/Llama-3.2-11B-Vision-Instruct \
    --max-model-len=4096 \
    --input-len=13 \
    --output-len=40 \
    --num-prompts=32 \
    --max-num-seqs=4 \
    --mm-data
```
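Here `--mm-data` enables the multi-modal path added by this change, attaching image data to every benchmark prompt instead of sending text only. The two `VLLM_*_BUCKET_MAX` variables cap the HPU prompt sequence-length and decode block bucket ranges on the Gaudi backend; the values shown are assumed to suit this particular input/output length configuration and may need adjusting for other workloads.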
