Oct 28 rebase (#439)
Signed-off-by: Max de Bayser <[email protected]>
Signed-off-by: Joe Runde <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Varad Ahirwadkar <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Rafael Vasquez <[email protected]>
Signed-off-by: Yuan Zhou <[email protected]>
Signed-off-by: luka <[email protected]>
Signed-off-by: Alex-Brooks <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: mgoin <[email protected]>
Signed-off-by: Vinay Damodaran <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: charlifu <[email protected]>
Signed-off-by: Sam Stoelinga <[email protected]>
Signed-off-by: Vasily Alexeev <[email protected]>
Signed-off-by: Kevin-Yang <[email protected]>
Signed-off-by: Abatom <[email protected]>
Signed-off-by: Bill Nell <[email protected]>
Signed-off-by: wangshuai09 <[email protected]>
Signed-off-by: Qishuai [email protected]
Signed-off-by: yuze.zyz <[email protected]>
Signed-off-by: Yannick Schnider <[email protected]>
Signed-off-by: Kunjan Patel <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Signed-off-by: kevin <[email protected]>
Signed-off-by: YiSheng5 <[email protected]>
Signed-off-by: yan ma <[email protected]>
Signed-off-by: Went-Liang <[email protected]>
Signed-off-by: Roger Wang <[email protected]>
Signed-off-by: sasha0552 <[email protected]>
Signed-off-by: mzusman <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
Signed-off-by: André Jonasson <[email protected]>
Signed-off-by: Gene Su <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Peter Salas <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Michael Green <[email protected]>
Signed-off-by: Shanshan Wang <[email protected]>
Signed-off-by: Gregory Shtrasberg <[email protected]>
Signed-off-by: daitran2k1 <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: Hissu Hyvarinen <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Linkun Chen <[email protected]>
Signed-off-by: Tomer Asida <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Co-authored-by: sasha0552 <[email protected]>
Co-authored-by: Woosuk Kwon <[email protected]>
Co-authored-by: Li, Jiang <[email protected]>
Co-authored-by: Kuntai Du <[email protected]>
Co-authored-by: Daniele <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Co-authored-by: bnellnm <[email protected]>
Co-authored-by: Kai Wu <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Shashwat Srijan <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: Andrew Feldman <[email protected]>
Co-authored-by: afeldman-nm <[email protected]>
Co-authored-by: laishzh <[email protected]>
Co-authored-by: Max de Bayser <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>
Co-authored-by: Joe Runde <[email protected]>
Co-authored-by: Haoyu Wang <[email protected]>
Co-authored-by: Russell Bryant <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: tomeras91 <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Kunjan <[email protected]>
Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Thomas Parnell <[email protected]>
Co-authored-by: Chih-Chieh Yang <[email protected]>
Co-authored-by: Yue Zhang <[email protected]>
Co-authored-by: Chen Zhang <[email protected]>
Co-authored-by: Andy Dai <[email protected]>
Co-authored-by: Dhia Eddine Rhaiem <[email protected]>
Co-authored-by: yudian0504 <[email protected]>
Co-authored-by: Varad Ahirwadkar <[email protected]>
Co-authored-by: youkaichao <[email protected]>
Co-authored-by: Baoyuan Qi <[email protected]>
Co-authored-by: Wallas Henrique <[email protected]>
Co-authored-by: Travis Johnson <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: ngrozae <[email protected]>
Co-authored-by: Falko1 <[email protected]>
Co-authored-by: Rafael Vasquez <[email protected]>
Co-authored-by: chenqianfzh <[email protected]>
Co-authored-by: wangshuai09 <[email protected]>
Co-authored-by: Jee Jee Li <[email protected]>
Co-authored-by: xendo <[email protected]>
Co-authored-by: Jerzy Zagorski <[email protected]>
Co-authored-by: gopalsarda <[email protected]>
Co-authored-by: Yuan <[email protected]>
Co-authored-by: Gubrud, Aaron D <[email protected]>
Co-authored-by: adgubrud <[email protected]>
Co-authored-by: Yuhong Guo <[email protected]>
Co-authored-by: Ronen Schaffer <[email protected]>
Co-authored-by: Aurick Qiao <[email protected]>
Co-authored-by: Jeremy Arnold <[email protected]>
Co-authored-by: Lucas Wilkinson <[email protected]>
Co-authored-by: yulei <[email protected]>
Co-authored-by: Seth Kimmel <[email protected]>
Co-authored-by: Kaunil Dhruv <[email protected]>
Co-authored-by: Flex Wang <[email protected]>
Co-authored-by: Mengqing Cao <[email protected]>
Co-authored-by: Alex Brooks <[email protected]>
Co-authored-by: Yongzao <[email protected]>
Co-authored-by: Yunfei Chu <[email protected]>
Co-authored-by: Vinay R Damodaran <[email protected]>
Co-authored-by: Yan Ma <[email protected]>
Co-authored-by: Zhuohan Li <[email protected]>
Co-authored-by: litianjian <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Charlie Fu <[email protected]>
Co-authored-by: Kevin H. Luu <[email protected]>
Co-authored-by: Will Johnson <[email protected]>
Co-authored-by: pavlo-ruban <[email protected]>
Co-authored-by: Sam Stoelinga <[email protected]>
Co-authored-by: ErkinSagiroglu <[email protected]>
Co-authored-by: Vasiliy Alekseev <[email protected]>
Co-authored-by: kakao-kevin-us <[email protected]>
Co-authored-by: Kevin-Yang <[email protected]>
Co-authored-by: 科英 <[email protected]>
Co-authored-by: madt2709 <[email protected]>
Co-authored-by: litianjian <[email protected]>
Co-authored-by: Zhong Qishuai <[email protected]>
Co-authored-by: tastelikefeet <[email protected]>
Co-authored-by: Sven Seeberg <[email protected]>
Co-authored-by: yannicks1 <[email protected]>
Co-authored-by: Junichi Sato <[email protected]>
Co-authored-by: Kunjan <[email protected]>
Co-authored-by: Will Eaton <[email protected]>
Co-authored-by: Simon Mo <[email protected]>
Co-authored-by: Lily Liu <[email protected]>
Co-authored-by: YiSheng5 <[email protected]>
Co-authored-by: Went-Liang <[email protected]>
Co-authored-by: Elfie Guo <[email protected]>
Co-authored-by: Harsha vardhan manoj Bikki <[email protected]>
Co-authored-by: Guillaume Calmettes <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
Co-authored-by: Mor Zusman <[email protected]>
Co-authored-by: Prashant Gupta <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: André Jonasson <[email protected]>
Co-authored-by: Pavani Majety <[email protected]>
Co-authored-by: Gene Der Su <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Peter Salas <[email protected]>
Co-authored-by: sroy745 <[email protected]>
Co-authored-by: Michael Green <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Co-authored-by: Nikita Furin <[email protected]>
Co-authored-by: shanshan wang <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Yang Zheng <[email protected]>
Co-authored-by: Yang Zheng(SW)(Alex) <[email protected]>
Co-authored-by: Tran Quang Dai <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Co-authored-by: hissu-hyvarinen <[email protected]>
Co-authored-by: lkchen <[email protected]>
Co-authored-by: Linkun Chen <[email protected]>
Co-authored-by: Gene Der Su <[email protected]>
Showing 590 changed files with 30,447 additions and 15,168 deletions.
@@ -0,0 +1,11 @@
# bash .buildkite/lm-eval-harness/run-lm-eval-gsm-vllm-baseline.sh -m neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8 -b "auto" -l 1000 -f 5 -t 1
model_name: "neuralmagic/Llama-3.2-1B-Instruct-quantized.w8a8"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.356
  - name: "exact_match,flexible-extract"
    value: 0.358
limit: 1000
num_fewshot: 5
2 changes: 1 addition & 1 deletion .buildkite/lm-eval-harness/configs/models-small.txt
@@ -1,6 +1,6 @@
Meta-Llama-3-8B-Instruct.yaml
Meta-Llama-3-8B-Instruct-FP8-compressed-tensors.yaml
Meta-Llama-3-8B-Instruct-INT8-compressed-tensors.yaml
Meta-Llama-3.2-1B-Instruct-INT8-compressed-tensors.yaml
Meta-Llama-3-8B-Instruct-INT8-compressed-tensors-asym.yaml
Meta-Llama-3-8B-Instruct-nonuniform-compressed-tensors.yaml
Meta-Llama-3-8B-Instruct-Channelwise-compressed-tensors.yaml
4 changes: 2 additions & 2 deletions .buildkite/release-pipeline.yaml
@@ -3,7 +3,7 @@ steps:
agents:
queue: cpu_queue
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg buildkite_commit=$BUILDKITE_COMMIT --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=12.1.0 --tag vllm-ci:build-image --target build --progress plain ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
# rename the files to change linux -> manylinux1
@@ -22,7 +22,7 @@ steps:
agents:
queue: cpu_queue
commands:
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg buildkite_commit=$BUILDKITE_COMMIT --build-arg USE_SCCACHE=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg USE_SCCACHE=1 --build-arg GIT_REPO_CHECK=1 --build-arg CUDA_VERSION=11.8.0 --tag vllm-ci:build-image --target build --progress plain ."
- "mkdir artifacts"
- "docker run --rm -v $(pwd)/artifacts:/artifacts_host vllm-ci:build-image bash -c 'cp -r dist /artifacts_host && chmod -R a+rw /artifacts_host'"
# rename the files to change linux -> manylinux1
15 changes: 8 additions & 7 deletions .buildkite/run-amd-test.sh
@@ -31,8 +31,8 @@ cleanup_docker() {
echo "Disk usage is above $threshold%. Cleaning up Docker images and volumes..."
# Remove dangling images (those that are not tagged and not used by any container)
docker image prune -f
# Remove unused volumes
docker volume prune -f
# Remove unused volumes / force the system prune for old images as well.
docker volume prune -f && docker system prune --force --filter "until=72h" --all
echo "Docker images and volumes cleanup completed."
else
echo "Disk usage is below $threshold%. No cleanup needed."
@@ -107,11 +107,12 @@ fi
PARALLEL_JOB_COUNT=8
# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
if [[ $commands == *"--shard-id="* ]]; then
# assign job count as the number of shards used
commands=${commands//"--num-shards= "/"--num-shards=${PARALLEL_JOB_COUNT} "}
for GPU in $(seq 0 $(($PARALLEL_JOB_COUNT-1))); do
#replace shard arguments
commands=${commands//"--shard-id= "/"--shard-id=${GPU} "}
commands=${commands//"--num-shards= "/"--num-shards=${PARALLEL_JOB_COUNT} "}
echo "Shard ${GPU} commands:$commands"
# assign shard-id for each shard
commands_gpu=${commands//"--shard-id= "/"--shard-id=${GPU} "}
echo "Shard ${GPU} commands:$commands_gpu"
docker run \
--device /dev/kfd --device /dev/dri \
--network host \
@@ -123,7 +124,7 @@ if [[ $commands == *"--shard-id="* ]]; then
-e HF_HOME=${HF_MOUNT} \
--name ${container_name}_${GPU} \
${image_name} \
/bin/bash -c "${commands}" \
/bin/bash -c "${commands_gpu}" \
|& while read -r line; do echo ">>Shard $GPU: $line"; done &
PIDS+=($!)
done
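
Note on the shard hunk above: the change stops overwriting `commands` inside the loop. Once the first iteration has replaced the `--shard-id= ` placeholder, later iterations find nothing left to substitute, so every container would be launched with the first shard's ID. A minimal sketch of the difference, assuming a made-up command template and only 3 shards (neither value is taken from the script):

```shell
#!/usr/bin/env bash
# Sketch only: the template string and shard count are illustrative assumptions.
template='pytest -v -s kernels --shard-id= --num-shards= '
num_shards=3

# Old behaviour: the template itself is overwritten on the first pass,
# so the placeholder is gone for every later shard.
commands=$template
old=()
for ((gpu = 0; gpu < num_shards; gpu++)); do
  commands=${commands//"--shard-id= "/"--shard-id=${gpu} "}
  commands=${commands//"--num-shards= "/"--num-shards=${num_shards} "}
  old+=("$commands")
done

# New behaviour: fill in the shard count once, then substitute the shard ID
# into a per-shard copy so the placeholder survives for the next iteration.
commands=${template//"--num-shards= "/"--num-shards=${num_shards} "}
new=()
for ((gpu = 0; gpu < num_shards; gpu++)); do
  commands_gpu=${commands//"--shard-id= "/"--shard-id=${gpu} "}
  new+=("$commands_gpu")
done

printf 'old: %s\n' "${old[@]}"
printf 'new: %s\n' "${new[@]}"
```

In the `old` list every entry carries the same shard ID, while each entry in `new` carries its own.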
8 changes: 4 additions & 4 deletions .buildkite/run-cpu-test.sh
@@ -32,10 +32,10 @@ docker exec cpu-test bash -c "
--ignore=tests/models/decoder_only/language/test_danube3_4b.py" # Mamba and Danube3-4B on CPU is not supported

# Run compressed-tensor test
# docker exec cpu-test bash -c "
# pytest -s -v \
# tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup \
# tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynanmic_per_token"
docker exec cpu-test bash -c "
pytest -s -v \
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_static_setup \
tests/quantization/test_compressed_tensors.py::test_compressed_tensors_w8a8_dynamic_per_token"

# Run AWQ test
docker exec cpu-test bash -c "
2 changes: 1 addition & 1 deletion .buildkite/run-tpu-test.sh
@@ -12,4 +12,4 @@ remove_docker_container
# For HF_TOKEN.
source /etc/environment
# Run a simple end-to-end example.
docker run --privileged --net host --shm-size=16G -it -e HF_TOKEN=$HF_TOKEN --name tpu-test vllm-tpu /bin/bash -c "python3 -m pip install git+https://github.com/thuml/depyf.git && python3 -m pip install pytest && pytest -v -s /workspace/vllm/tests/tpu/test_custom_dispatcher.py && python3 /workspace/vllm/tests/tpu/test_compilation.py && python3 /workspace/vllm/examples/offline_inference_tpu.py"
docker run --privileged --net host --shm-size=16G -it -e HF_TOKEN=$HF_TOKEN --name tpu-test vllm-tpu /bin/bash -c "python3 -m pip install git+https://github.com/thuml/depyf.git && python3 -m pip install pytest && python3 -m pip install lm_eval[api]==0.4.4 && pytest -v -s /workspace/vllm/tests/entrypoints/openai/test_accuracy.py && pytest -v -s /workspace/vllm/tests/tpu/test_custom_dispatcher.py && python3 /workspace/vllm/tests/tpu/test_compilation.py && python3 /workspace/vllm/examples/offline_inference_tpu.py"
73 changes: 47 additions & 26 deletions .buildkite/test-pipeline.yaml
@@ -9,6 +9,7 @@
# label(str): the name of the test. emoji allowed.
# fast_check(bool): whether to run this on each commit on fastcheck pipeline.
# fast_check_only(bool): run this test on fastcheck pipeline only
# nightly(bool): run this test in nightly pipeline only
# optional(bool): never run this test by default (i.e. need to unblock manually)
# command(str): the single command to run for tests. incompatible with commands.
# commands(list): the list of commands to run for test. incompatbile with command.
@@ -77,8 +78,8 @@ steps:
- vllm/
- tests/basic_correctness/test_chunked_prefill
commands:
- VLLM_ATTENTION_BACKEND=XFORMERS VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s basic_correctness/test_chunked_prefill.py
- VLLM_ATTENTION_BACKEND=FLASH_ATTN VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s basic_correctness/test_chunked_prefill.py
- VLLM_ATTENTION_BACKEND=XFORMERS pytest -v -s basic_correctness/test_chunked_prefill.py
- VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py

- label: Core Test # 10min
mirror_hardwares: [amd]
@@ -88,11 +89,7 @@ steps:
- vllm/distributed
- tests/core
commands:
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core/test_scheduler.py
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/test_chunked_prefill_scheduler.py
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/block/e2e/test_correctness.py
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s core core/block/e2e/test_correctness_sliding_window.py
- pytest -v -s core --ignore=core/block/e2e/test_correctness.py --ignore=core/test_scheduler.py --ignore=core/test_chunked_prefill_scheduler.py --ignore=core/block/e2e/test_correctness.py --ignore=core/block/e2e/test_correctness_sliding_window.py
- pytest -v -s core

- label: Entrypoints Test # 40min
working_dir: "/vllm-workspace/tests"
@@ -184,15 +181,15 @@ steps:
- python3 offline_inference_vision_language_multi_image.py
- python3 tensorize_vllm_model.py --model facebook/opt-125m serialize --serialized-directory /tmp/ --suffix v1 && python3 tensorize_vllm_model.py --model facebook/opt-125m deserialize --path-to-tensors /tmp/vllm/facebook/opt-125m/v1/model.tensors
- python3 offline_inference_encoder_decoder.py
- python3 offline_profile.py --model facebook/opt-125m

- label: Prefix Caching Test # 9min
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
- tests/prefix_caching
commands:
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s prefix_caching/test_prefix_caching.py
- pytest -v -s prefix_caching --ignore=prefix_caching/test_prefix_caching.py
- pytest -v -s prefix_caching

- label: Samplers Test # 36min
source_file_dependencies:
@@ -216,8 +213,7 @@ steps:
- tests/spec_decode
commands:
- pytest -v -s spec_decode/e2e/test_multistep_correctness.py
- VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest -v -s spec_decode/e2e/test_compatibility.py
- VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s spec_decode --ignore=spec_decode/e2e/test_multistep_correctness.py --ignore=spec_decode/e2e/test_compatibility.py
- VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s spec_decode --ignore=spec_decode/e2e/test_multistep_correctness.py

- label: LoRA Test %N # 15min each
mirror_hardwares: [amd]
@@ -234,15 +230,16 @@ steps:
- tests/compile
commands:
- pytest -v -s compile/test_basic_correctness.py
# these tests need to be separated, cannot combine
- pytest -v -s compile/piecewise/test_simple.py
- pytest -v -s compile/piecewise/test_toy_llama.py

# TODO: re-write in comparison tests, and fix symbolic shape
# for quantization ops.
# - label: "PyTorch Fullgraph Test" # 18min
# source_file_dependencies:
# - vllm/
# - tests/compile
# commands:
# - pytest -v -s compile/test_full_graph.py
- label: "PyTorch Fullgraph Test" # 18min
source_file_dependencies:
- vllm/
- tests/compile
commands:
- pytest -v -s compile/test_full_graph.py

- label: Kernels Test %N # 1h each
mirror_hardwares: [amd]
@@ -317,33 +314,57 @@ steps:
- pytest -v -s models/test_oot_registration.py # it needs a clean process
- pytest -v -s models/*.py --ignore=models/test_oot_registration.py

- label: Decoder-only Language Models Test # 1h36min
- label: Decoder-only Language Models Test (Standard) # 35min
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
- tests/models/decoder_only/language
commands:
- pytest -v -s models/decoder_only/language
- pytest -v -s models/decoder_only/language/test_models.py
- pytest -v -s models/decoder_only/language/test_big_models.py

- label: Decoder-only Multi-Modal Models Test # 1h31min
- label: Decoder-only Language Models Test (Extended) # 1h20min
nightly: true
source_file_dependencies:
- vllm/
- tests/models/decoder_only/language
commands:
- pytest -v -s models/decoder_only/language --ignore=models/decoder_only/language/test_models.py --ignore=models/decoder_only/language/test_big_models.py

- label: Decoder-only Multi-Modal Models Test (Standard)
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
- tests/models/decoder_only/audio_language
- tests/models/decoder_only/vision_language
commands:
- pytest -v -s models/decoder_only/audio_language
- pytest -v -s models/decoder_only/vision_language
- pytest -v -s models/decoder_only/audio_language -m core_model
- pytest -v -s --ignore models/decoder_only/vision_language/test_phi3v.py models/decoder_only/vision_language -m core_model

- label: Decoder-only Multi-Modal Models Test (Extended)
nightly: true
source_file_dependencies:
- vllm/
- tests/models/decoder_only/audio_language
- tests/models/decoder_only/vision_language
commands:
- pytest -v -s models/decoder_only/audio_language -m 'not core_model'
# HACK - run phi3v tests separately to sidestep this transformers bug
# https://github.com/huggingface/transformers/issues/34307
- pytest -v -s models/decoder_only/vision_language/test_phi3v.py
- pytest -v -s --ignore models/decoder_only/vision_language/test_phi3v.py models/decoder_only/vision_language -m 'not core_model'

- label: Other Models Test # 6min
#mirror_hardwares: [amd]
source_file_dependencies:
- vllm/
- tests/models/embedding/language
- tests/models/embedding/vision_language
- tests/models/encoder_decoder/language
- tests/models/encoder_decoder/vision_language
commands:
- pytest -v -s models/embedding/language
- pytest -v -s models/embedding/vision_language
- pytest -v -s models/encoder_decoder/language
- pytest -v -s models/encoder_decoder/vision_language

@@ -402,11 +423,11 @@ steps:
- pytest -v -s ./compile/test_basic_correctness.py
- pytest -v -s ./compile/test_wrapper.py
- VLLM_TEST_SAME_HOST=1 torchrun --nproc-per-node=4 distributed/test_same_node.py | grep -q 'Same node test passed'
- TARGET_TEST_SUITE=L4 VLLM_ALLOW_DEPRECATED_BLOCK_MANAGER_V1=1 pytest basic_correctness/ -v -s -m distributed_2_gpus
- TARGET_TEST_SUITE=L4 pytest basic_correctness/ -v -s -m distributed_2_gpus
# Avoid importing model tests that cause CUDA reinitialization error
- pytest models/encoder_decoder/language/test_bart.py -v -s -m distributed_2_gpus
- pytest models/encoder_decoder/vision_language/test_broadcast.py -v -s -m distributed_2_gpus
- pytest models/decoder_only/vision_language/test_broadcast.py -v -s -m distributed_2_gpus
- pytest models/decoder_only/vision_language/test_models.py -v -s -m distributed_2_gpus
- pytest -v -s spec_decode/e2e/test_integration_dist_tp2.py
- pip install -e ./plugins/vllm_add_dummy_model
- pytest -v -s distributed/test_distributed_oot.py
31 changes: 29 additions & 2 deletions .dockerignore
@@ -1,6 +1,33 @@
/.github/
/.venv
/build
dist
Dockerfile*
vllm/*.so

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

.mypy_cache

# Distribution / packaging
.Python
/build/
cmake-build-*/
CMakeUserPresets.json
develop-eggs/
/dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
25 changes: 25 additions & 0 deletions .github/dependabot.yml
@@ -5,3 +5,28 @@ updates:
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
    labels: ["dependencies"]
    open-pull-requests-limit: 5
    reviewers: ["khluu", "simon-mo"]
    allow:
      - dependency-type: "all"
    ignore:
      - dependency-name: "torch"
      - dependency-name: "torchvision"
      - dependency-name: "xformers"
      - dependency-name: "lm-format-enforcer"
      - dependency-name: "gguf"
      - dependency-name: "compressed-tensors"
      - dependency-name: "ray[adag]"
      - dependency-name: "lm-eval"
    groups:
      patch-update:
        applies-to: version-updates
        update-types: ["patch"]
      minor-update:
        applies-to: version-updates
        update-types: ["minor"]
58 changes: 58 additions & 0 deletions .github/mergify.yml
@@ -0,0 +1,58 @@
pull_request_rules:
- name: label-documentation
  description: Automatically apply documentation label
  conditions:
  - or:
    - files~=^[^/]+\.md$
    - files~=^docs/
  actions:
    label:
      add:
        - documentation

- name: label-ci-build
  description: Automatically apply ci/build label
  conditions:
  - or:
    - files~=^\.github/
    - files~=\.buildkite/
    - files~=^cmake/
    - files=CMakeLists.txt
    - files~=^Dockerfile
    - files~=^requirements.*\.txt
    - files=setup.py
  actions:
    label:
      add:
        - ci/build

- name: label-frontend
  description: Automatically apply frontend label
  conditions:
  - files~=^vllm/entrypoints/
  actions:
    label:
      add:
        - frontend

- name: ping author on conflicts and add 'needs-rebase' label
  conditions:
  - conflict
  - -closed
  actions:
    label:
      add:
        - needs-rebase
    comment:
      message: |
        This pull request has merge conflicts that must be resolved before it can be
        merged. @{{author}} please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
- name: remove 'needs-rebase' label when conflict is resolved
  conditions:
  - -conflict
  - -closed
  actions:
    label:
      remove:
        - needs-rebase
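
Note on the labelling rules above: `files~=` conditions are regular-expression matches against the paths a pull request touches, while `files=` is an exact match. A rough sketch of which labels a single path would pick up, using bash's `[[ =~ ]]` as a stand-in for Mergify's matcher. This is an assumption for illustration: Mergify evaluates conditions with its own engine, and `labels_for` is a hypothetical helper, not part of Mergify or this commit.

```shell
#!/usr/bin/env bash
# Sketch only: bash regexes standing in for the `files~=` conditions.
# `labels_for` is a hypothetical helper that prints the labels for one path.
labels_for() {
  local path=$1
  local labels=()
  local docs_re='^[^/]+\.md$|^docs/'
  local ci_re='^\.github/|\.buildkite/|^cmake/|^Dockerfile|^requirements.*\.txt'
  local frontend_re='^vllm/entrypoints/'

  # label-documentation: top-level Markdown files or anything under docs/
  if [[ $path =~ $docs_re ]]; then
    labels+=(documentation)
  fi
  # label-ci-build: the regex rules plus the two exact `files=` matches
  if [[ $path =~ $ci_re || $path == CMakeLists.txt || $path == setup.py ]]; then
    labels+=(ci/build)
  fi
  # label-frontend: anything under vllm/entrypoints/
  if [[ $path =~ $frontend_re ]]; then
    labels+=(frontend)
  fi

  echo "${labels[*]}"
}

for p in README.md vllm/README.md .buildkite/test-pipeline.yaml vllm/entrypoints/llm.py; do
  printf '%s -> %s\n' "$p" "$(labels_for "$p")"
done
```

Note that the `\.buildkite/` pattern carries no `^` anchor, so unlike its neighbours it matches that directory name anywhere in a path.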
1 change: 1 addition & 0 deletions .github/workflows/actionlint.yml
@@ -34,4 +34,5 @@ jobs:

- name: "Run actionlint"
run: |
echo "::add-matcher::.github/workflows/matchers/actionlint.json"
tools/actionlint.sh -color
2 changes: 1 addition & 1 deletion .github/workflows/add_label_automerge.yml
@@ -8,7 +8,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Add label
uses: actions/github-script@v7
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7.0.1
with:
script: |
github.rest.issues.addLabels({

0 comments on commit 7578f3b
