Nov 18 rebase #485
Merged
kzawora-intel merged 187 commits into habana_main from private/kzawora/nov_12_rebase on Nov 18, 2024
+20,768 -7,375
Commits
Commits on Nov 6, 2024
Commits on Nov 7, 2024
[Misc] Add Gamma-Distribution Request Generation Support for Serving Benchmark. (vllm-project#10105)
Commits on Nov 8, 2024
Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 (vllm-project#10136)
Commits on Nov 9, 2024
[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (vllm-project#9857)
Commits on Nov 10, 2024
Commits on Nov 11, 2024
Commits on Nov 12, 2024
[BugFix] Do not raise a `ValueError` when `tool_choice` is set to the supported `none` option and `tools` are not defined. (vllm-project#10000)
[V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (vllm-project#10245)
Commits on Nov 13, 2024
[Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (vllm-project#10221)
[Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (vllm-project#9944)
Commits on Nov 14, 2024
[BugFix]: properly deserialize `tool_calls` iterator before processing by mistral-common when MistralTokenizer is used (vllm-project#9951)
Commits on Nov 15, 2024
[Bugfix] Ensure special tokens are properly filtered out for guided structured output with MistralTokenizer (vllm-project#10363)
Commits on Nov 16, 2024
Commits on Nov 17, 2024
Commits on Nov 18, 2024