vllm-project / vllm Public

Pull requests: vllm-project/vllm

459 Open 5,258 Closed

Pull requests list

[Misc] Pass attention to impl backend

#12218 opened Jan 20, 2025 by wangxiyuan

[V1] Remove _get_cache_block_size

Labels: ready

#12214 opened Jan 20, 2025 by heheda12345

[VLM] Merged multi-modal processor for Pixtral

#12211 opened Jan 20, 2025 by Flechman • Draft

[bugfix] catch xgrammar unsupported array constraints

#12210 opened Jan 20, 2025 by Jason-CKY

[core][bugfix] configure env var during import vllm

Labels: ready

#12209 opened Jan 20, 2025 by youkaichao

[Model]: get aria to work with the lastest transfomers impl

Labels: needs-rebase

#12207 opened Jan 20, 2025 by xffxff

[Model] Introduce CUDA Graph support for DeepSeek v3

#12204 opened Jan 20, 2025 by houseroad

[V1][Spec Decode] Ngram Spec Decode

#12193 opened Jan 19, 2025 by LiuXiaoxuanPKU • Draft

6 tasks

[Bugfix] fix race condition that leads to wrong order of token returned

#12192 opened Jan 19, 2025 by joennlae

[misc] add cuda runtime version to usage data

#12190 opened Jan 19, 2025 by youkaichao

[Misc] Add Gemma2 GGUF support

#12186 opened Jan 18, 2025 by Isotr0py • Draft

[Kernel] add triton fused moe kernel for gptq/awq

#12185 opened Jan 18, 2025 by jinzhen-lin

[Hardware][Gaudi][Bugfix] Fix HPU tensor parallelism, enable multiprocessing executor

#12167 opened Jan 17, 2025 by kzawora-intel

[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution

#12158 opened Jan 17, 2025 by cennn

[Core] Optimize topp/topk calculation in sampler

#12156 opened Jan 17, 2025 by afierka-intel • Draft

[WIP][Hardware][CPU] testing branch for mlperf

Labels: ci/build, documentation, needs-rebase

#12141 opened Jan 17, 2025 by bigPYJ1151 • Draft

[Hardware][Gaudi][Feature] Support Contiguous PA

#12139 opened Jan 17, 2025 by zhouyu5 • Draft

[WIP] Multimodal model support for V1 TPU

#12133 opened Jan 16, 2025 by mgoin • Draft

[Misc] Update to Transformers 4.48

Labels: ci/build, ready

#12120 opened Jan 16, 2025 by tlrmchlsmth

[Feature] Support VPTQ quantization

Labels: ci/build

#12117 opened Jan 16, 2025 by wejoncy • Draft

[BUILD] Add VLLM_BUILD_EXT to control custom op build

Labels: ci/build

#12116 opened Jan 16, 2025 by MengqingCao

[Misc]add modules_to_not_convert attribute to gptq series

#12103 opened Jan 16, 2025 by 1096125073

Use CUDA 12.4 as default for release and nightly wheels

Labels: ci/build, documentation

#12098 opened Jan 15, 2025 by mgoin

Add: Support for Sparse24Bitmask Compressed Models

#12097 opened Jan 15, 2025 by rahul-tuli • Draft

1 task

[V1][Perf] Reduce scheduling overhead in model runner after cuda sync

#12094 opened Jan 15, 2025 by youngkent
