
Support throughput benchmarking for mllama with vision input #629

Closed

Conversation

@yisonzhu commented Dec 13, 2024

Previously, benchmark_throughput.py did not support multi-modal data for most VLMs such as mllama. This PR adds that support so that we can run throughput tests for mllama on HPU.
It is intended for internal testing only and is not aimed at upstream.
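For context, the mechanism relied on here is vLLM's multi-modal prompt input: each benchmark request carries both a text prompt and an image. The snippet below is a minimal sketch of that idea, not the exact code in this PR; the dummy gray image, the word-repetition used to pad the prompt, and the mllama `<|image|>` prompt prefix are assumptions for illustration only.

```python
# Sketch: building image+text requests the way a --mm-data benchmark path might,
# then feeding them to vLLM. Assumes vLLM's dict-style prompt input with
# "multi_modal_data"; the prompt template and dummy image are illustrative.
from PIL import Image
from vllm import LLM, SamplingParams


def build_mm_requests(num_prompts: int, input_len: int):
    # One synthetic image reused for every request; a real benchmark could
    # load sample images from disk instead.
    image = Image.new("RGB", (560, 560), color=(128, 128, 128))
    # Rough proxy for the requested input length (words, not exact tokens).
    prompt = "<|image|><|begin_of_text|>" + "hi " * input_len
    return [
        {"prompt": prompt, "multi_modal_data": {"image": image}}
        for _ in range(num_prompts)
    ]


llm = LLM(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    max_model_len=4096,
    max_num_seqs=4,
)
outputs = llm.generate(
    build_mm_requests(num_prompts=32, input_len=13),
    SamplingParams(max_tokens=40, ignore_eos=True),
)
```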

Usage

We can now run the benchmark with commands such as:

```bash
export VLLM_DECODE_BLOCK_BUCKET_MAX=384
export VLLM_PROMPT_SEQ_BUCKET_MAX=128
python benchmark_throughput.py \
    --model=meta-llama/Llama-3.2-11B-Vision-Instruct \
    --max-model-len=4096 \
    --input-len=13 \
    --output-len=40 \
    --num-prompts=32 \
    --max-num-seqs=4 \
    --mm-data
```
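Here `--mm-data` enables the multi-modal path added by this change, attaching image data to every benchmark prompt instead of sending text only. The two `VLLM_*_BUCKET_MAX` variables cap the HPU prompt sequence-length and decode block bucket ranges on the Gaudi backend; the values shown are assumed to suit this particular input/output length configuration and may need adjusting for other workloads.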
