[DO NOT MERGE] Upstream test PR #322

Closed

431 commits
b4f6a29
Remove mark step from static MoE loop (#231)
jkaniecki Sep 3, 2024
733524a
Add newline at EOF
xuechendi Sep 3, 2024
fb98cad
Remove requires_grad=False
xuechendi Sep 3, 2024
49ffde6
Change mask to lora_mask
hlahkar Sep 4, 2024
538c8f1
Move compute_logits to Mask Based Implementation
hlahkar Sep 4, 2024
691255b
Enable llama-405b - w/a for memory allocation error (#184)
afierka-intel Sep 4, 2024
a4e1d52
[bugfix] handle large bucket minimums correctly (#235)
kzawora-intel Sep 4, 2024
8046d81
fix guided_decode HPU failing issue
xuechendi Sep 4, 2024
7cd226c
Remove token budget from decode buckets (#241)
kzawora-intel Sep 5, 2024
d0eb7d7
[habana_main bugfix] Fix min bucket boundary calculation (#239)
kzawora-intel Sep 5, 2024
05acb89
Mask based BGMV implementation (#223)
vivekgoe Sep 5, 2024
d2e2854
fix rotary embedding
jikunshang Sep 6, 2024
97bd0fd
Avoiding torch.index_select for embedding LoRA-B
SanjuCSudhakaran Sep 3, 2024
ededdaf
Remove special handling of no-LoRA case
SanjuCSudhakaran Sep 4, 2024
b507cc4
Update test
SanjuCSudhakaran Sep 4, 2024
016f343
Fix formatting
SanjuCSudhakaran Sep 6, 2024
d9fa7cf
Dispersed dummy slots (#243)
madamczykhabana Sep 6, 2024
7488c58
Use PT_COMPILE_ONLY_MODE during warmup (#227)
mfylcek Sep 6, 2024
17447ed
Do not pass warmup_mode to execute_model_kwargs (#229)
kzawora-intel Sep 6, 2024
b50aa14
Add error handling for PT_COMPILE_ONLY_MODE (#251)
kzawora-intel Sep 6, 2024
00f1333
Hardcode fastapi version due to pydantic error (#255)
hlahkar Sep 9, 2024
b764610
Mask based BGMV implementation for LoRA Embedding (#247)
vivekgoe Sep 9, 2024
73af823
Eliminate graph breaks for torch.compile mode (#202)
yuwenzho Sep 9, 2024
5cf8441
Port flat PA from habana_next to habana_main (#169)
dolszewska Sep 10, 2024
2fed15b
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Sep 10, 2024
f74fe23
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Sep 10, 2024
e2c8b5a
format.sh
kzawora-intel Sep 10, 2024
4194195
i did not drink my afternoon coffee and made an oopsie
kzawora-intel Sep 10, 2024
4052bdb
Add disable_tensor_cache=True to HPUGraph capture (#252)
kzawora-intel Sep 10, 2024
c9bf908
do not build core ext on hpu
kzawora-intel Sep 10, 2024
69df1e7
Fix dispersed slots (#261)
madamczykhabana Sep 10, 2024
53f96b7
Skip compilation warnings during warmup phase (#262)
jkaniecki Sep 10, 2024
d436d38
fix tensor parallelism
kzawora-intel Sep 10, 2024
61b6fbb
add missing functions
kzawora-intel Sep 10, 2024
2091161
Port PT Profiler to habana_main (#256)
adobrzyniewicz-habana Sep 11, 2024
c9bdcbe
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Sep 11, 2024
8e41fb5
Merge remote-tracking branch 'upstream/main' into private/kzawora/vll…
kzawora-intel Sep 11, 2024
68e0f57
Reduce frequency of garbage collector
kwisniewski98 Sep 6, 2024
b776d5e
Fix LoRA test by handling mask creation inside the test
SanjuCSudhakaran Sep 11, 2024
c0ff22f
Fix LoRA test by handling mask creation inside the test (#270)
vivekgoe Sep 11, 2024
f858d43
Attn MetaData dtype should be same as model dtype (#271)
hlahkar Sep 12, 2024
acf7d54
Support Mixtral quantization using INC (#267)
dudilester Sep 12, 2024
6a734f4
Fixed ALiBi (#254)
itaraban Sep 12, 2024
543bb6d
Update gaudi-installation.rst (#279)
dolszewska Sep 12, 2024
c2c1e0f
Move setting gc threshold to separate function
kwisniewski98 Sep 12, 2024
6b3503c
Fix mypy issues
kwisniewski98 Sep 12, 2024
8535d53
Fix line too long
kwisniewski98 Sep 12, 2024
27b618a
Format files
kwisniewski98 Sep 12, 2024
35a4a98
Remove hardcoded value from softmax in flat_pa (#280)
madamczykhabana Sep 12, 2024
046cb25
Fix yapf detected format issue
xuechendi Sep 12, 2024
aa4c59c
some update to vision model
xuechendi Sep 12, 2024
181babf
resolve conflicts
xuechendi Sep 12, 2024
88b06c2
Increase garbage collector's threshold (#281)
kwisniewski98 Sep 13, 2024
54c1688
[Bugfix][Habana_main] fix guided_decode HPU failing issue (#236)
michalkuligowski Sep 13, 2024
8a92591
fix rotary embedding `rotary_dim` not equal `head_size` case (#245)
michalkuligowski Sep 13, 2024
ffa7174
[Bugfix][Habana_main] - dbrx model and arctic model codes fix to remo…
michalkuligowski Sep 13, 2024
f4ac1f9
Add Dockerfile.hpu (#200)
michalkuligowski Sep 13, 2024
1a35da2
fix ruff detected format error
xuechendi Sep 13, 2024
3b710a6
fix mypy format error
xuechendi Sep 13, 2024
5abe4d7
Move ALiBi to supported features in README_GAUDI.md
kwisniewski98 Sep 16, 2024
4c1ca3a
optimized topp/topk calculation (#195)
michalkuligowski Sep 17, 2024
1a712d5
Move ALiBi to supported features in gaudi-installation.rst
kwisniewski98 Sep 17, 2024
44c4f93
[Bugfix][Habana_main] fix multi-modal model inference - tested with l…
michalkuligowski Sep 17, 2024
a9de5ba
Add fake HPU mode to Habana components with dummy habana_frameworks m…
jmaksymczuk Sep 17, 2024
d39298c
Update documentation on support of fp8 (#288)
michalkuligowski Sep 17, 2024
ed19acd
Reduce default value of VLLM_GRAPH_RESERVED_MEM to 0.1
kzawora-intel Sep 17, 2024
6a96d9b
Removed vllm.hpu directory and changed relevant imports (#291)
tzielinski-habana Sep 17, 2024
47a89be
Reduce default value of VLLM_GRAPH_RESERVED_MEM to 0.1 (#292)
michalkuligowski Sep 17, 2024
18d6339
fix minor logging issue
schoi-habana Sep 17, 2024
83b54e9
Fix minor logging issue in habana_model_runner.py (#294)
michalkuligowski Sep 18, 2024
b62fba8
Fix blocks number calculation for Flat PA (#269)
iboiko-habana Sep 18, 2024
347f9c7
Merge branch 'habana_main' into private/kwisniewski/alibi_readme_update
kwisniewski98 Sep 19, 2024
cd7b1c1
Remove dummy seq group data creation from loop (#301)
iboiko-habana Sep 20, 2024
12d7033
optimize qwen2 model on Gaudi (#233)
czhu15 Sep 20, 2024
bc39baa
fix bug: device_str in initialize_ray_cluster requires uppercase stri…
hlin99 Sep 20, 2024
b2653ab
Fix Lora Rebase (#290)
hlahkar Sep 20, 2024
82960d8
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Sep 20, 2024
f4d2097
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Sep 20, 2024
9f8b8e7
add missing files
kzawora-intel Sep 20, 2024
346139d
format.sh
kzawora-intel Sep 20, 2024
6d45443
more format.sh
kzawora-intel Sep 20, 2024
3a0ff3b
gha update
kzawora-intel Sep 20, 2024
6502b91
Separate LoRA algorithms
kzawora-intel Sep 20, 2024
7057da5
yapf is being a headache
kzawora-intel Sep 20, 2024
43df762
oh come on now
kzawora-intel Sep 20, 2024
3134b8a
fix fakehpu mode
kzawora-intel Sep 20, 2024
f92ffc1
Fix calculating slots for warmup (#310)
madamczykhabana Sep 23, 2024
63fae51
Removed padding block from a list of available blocks in allocators (…
tzielinski-habana Sep 23, 2024
aa507d4
Fix seq_len for padding sequences (#318)
madamczykhabana Sep 23, 2024
b70a8c2
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Sep 23, 2024
a844837
Fix lora specific conditions in profile-run
SanjuCSudhakaran Sep 23, 2024
084db0f
Fix lora specific conditions in profile-run (#317)
vivekgoe Sep 23, 2024
a9f94be
TP fixes
kzawora-intel Sep 23, 2024
9bb65b7
Run with HPU graphs even when warmup was skipped (#320)
madamczykhabana Sep 23, 2024
2a499c7
mixtral api fixes
kzawora-intel Sep 23, 2024
9372734
revert debug prints
kzawora-intel Sep 23, 2024
c15ddd2
format.sh
kzawora-intel Sep 23, 2024
f5d254d
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Sep 23, 2024
e00ab5a
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Sep 23, 2024
3bb593a
use ray for hpu distributed inference
kzawora-intel Sep 23, 2024
f9b222e
vLLM 0.6.1 rebase (#311)
kzawora-intel Sep 23, 2024
2f23cb7
prune the easy parts
kzawora-intel Sep 23, 2024
28df6fd
prune more easy parts
kzawora-intel Sep 23, 2024
c6d2d5a
prune lora files
kzawora-intel Sep 23, 2024
97c398e
prune unnecessary docs
kzawora-intel Sep 23, 2024
6a913b3
revert requirements-build.txt changes
kzawora-intel Sep 23, 2024
c64dc83
Move profilers to vllm-hpu-extension (#323)
kzawora-intel Sep 23, 2024
f56953f
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Sep 23, 2024
c562b02
Revert "Add fake HPU mode to Habana components with dummy habana_fram…
kzawora-intel Sep 23, 2024
cf3bbd2
fix revert
kzawora-intel Sep 23, 2024
09357b4
Revert "Initial commit"
kzawora-intel Sep 23, 2024
3713da8
cleanup
kzawora-intel Sep 23, 2024
bb6564a
remove redundant import
kzawora-intel Sep 23, 2024
c968320
Restore upstream requirements-build.txt (#324)
kzawora-intel Sep 24, 2024
58d5cde
Remove reminder_comment.yml workflow (#325)
kzawora-intel Sep 24, 2024
cf4c3e5
Don't throw "Failed to import from vllm._C" warning on HPU (#326)
kzawora-intel Sep 24, 2024
aa5edcc
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Sep 24, 2024
f6ff4a7
restore reminder_comment.yml
kzawora-intel Sep 24, 2024
a000e62
Revert "[Doc][BugFix] Update setup instructions and reference links (…
kzawora-intel Sep 24, 2024
41217cf
Fix doc build warnings (#330)
kzawora-intel Sep 24, 2024
4eb9809
fix qwen2 model issue (#329)
jikunshang Sep 24, 2024
c1232e9
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Sep 24, 2024
20c87dd
update docs
kzawora-intel Sep 24, 2024
9be37a3
Remove vllm.utils.is_hpu() (#331)
kzawora-intel Sep 24, 2024
c90e153
Merge remote-tracking branch 'origin/habana_main' into private/kzawora…
kzawora-intel Sep 24, 2024
874f3d8
remove get_device
kzawora-intel Sep 24, 2024
e16918d
Remove logger from layernorm (#332)
kzawora-intel Sep 24, 2024
18b0e98
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Sep 24, 2024
347380f
Fix INC FP8 inference after rebase
kzawora-intel Sep 24, 2024
73f4b48
Fix INC FP8 inference after rebase (#333)
kzawora-intel Sep 24, 2024
fc1cf5e
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Sep 24, 2024
e2f72e3
Merge remote-tracking branch 'upstream/main' into private/kzawora/pru…
kzawora-intel Sep 24, 2024
b582d77
Make weights_load_device not change EngineArgs.create_load_config()
kzawora-intel Sep 24, 2024
b90adac
More robust load device autodetection
kzawora-intel Sep 24, 2024
d853eeb
WA for none load device
kzawora-intel Sep 24, 2024
9111a80
Make weights_load_device not change EngineArgs.create_load_config() (…
kzawora-intel Sep 24, 2024
db8dbce
device type
kzawora-intel Sep 24, 2024
c337e93
Revert "fix guided_decode HPU failing issue"
kzawora-intel Sep 24, 2024
e8e369f
load device fix
kzawora-intel Sep 24, 2024
8c6dcae
Refine INC shutdown code (#335)
kzawora-intel Sep 25, 2024
cef2f54
Setting enough cache_size_limit for torch.compile warmup (#238)
zehao-intel Sep 25, 2024
45ee586
Change default values for decode bucket flags (#316)
iboiko-habana Sep 25, 2024
29fb5ed
Support loading checkpoints quantized using Autofp8 (#286)
Yantom1 Sep 25, 2024
4c8a6c6
Fix torch.compile issue of dispatch key set mismatch (#299)
yuwenzho Sep 26, 2024
1c6bada
Chunk prefill cache writes, remove div_i32 from insert_or_update_cach…
kzawora-intel Sep 26, 2024
fccaca0
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Sep 26, 2024
5ffcfa3
Update cpu-test.yml
kzawora-intel Sep 26, 2024
c3577af
Fix runtime errors reported when using long input sequence lengths wi…
vivekgoe Sep 27, 2024
f347a84
vLLM 0.6.2 rebase (#340)
kzawora-intel Sep 27, 2024
ed85058
Enable Async output process for HPU (#342)
zhouyu5 Sep 27, 2024
b611e20
Port last_bucket change from v1.18.0 (#347)
iboiko-habana Sep 30, 2024
3010f8c
Add setuptools_scm to requirements-hpu.txt (#349)
kzawora-intel Sep 30, 2024
44d8173
test_lora_manager fix
rsshaik1 Sep 19, 2024
188bd3a
Added both hpu and gpu specific changes confest
rsshaik1 Sep 23, 2024
f59495a
Added the changes to conftest to fix test_lora_manager
rsshaik1 Sep 30, 2024
b0a9d02
Applied the format changes in conftest
rsshaik1 Sep 30, 2024
70f544c
Resolved format issues in conftest
rsshaik1 Oct 1, 2024
ec34f88
Added changes of HPU flags
rsshaik1 Oct 1, 2024
c7b1509
Fixed lora manager tests (#315)
vivekgoe Oct 1, 2024
cafff17
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 1, 2024
25f4ed9
Oct 01 rebase (#353)
kzawora-intel Oct 2, 2024
da03d8b
Lora Mask based on lora index (#348)
hlahkar Oct 3, 2024
f848d27
Add rope_scaling support for LLama3.1 (#356)
kdamaszk Oct 3, 2024
d8ba780
[Core] Support Torch profiler in Habana Worker (#357)
mswiniarsk Oct 4, 2024
250487b
[Refactor] Rename components *Habana* -> *HPU*
kzawora-intel Oct 4, 2024
eb095b3
oopsie
kzawora-intel Oct 4, 2024
65fa6f6
format.sh
kzawora-intel Oct 4, 2024
0576360
make yapf happy
kzawora-intel Oct 4, 2024
7f73cc9
Merge remote-tracking branch 'upstream/main' into private/kzawora/hab…
kzawora-intel Oct 4, 2024
b4e26d3
fix sampler metadata generation
kzawora-intel Oct 4, 2024
cfe231d
[Refactor] Rename components *Habana* -> *HPU* (#359)
kzawora-intel Oct 4, 2024
38e60f4
Oct 04 rebase (#360)
kzawora-intel Oct 4, 2024
76cbbb5
Use BF16 on HPU by default
kzawora-intel Oct 4, 2024
95a7ece
Merge remote-tracking branch 'origin/habana_main' into private/kzawor…
kzawora-intel Oct 4, 2024
d7d609f
Revert "Support loading checkpoints quantized using Autofp8 (#286)"
kzawora-intel Oct 4, 2024
c07cbc6
remove lora test
kzawora-intel Oct 4, 2024
d90bbce
revert FP8 changes
kzawora-intel Oct 4, 2024
84dc6c5
remove leftover fp8 code
kzawora-intel Oct 4, 2024
f7288de
remove weights_load_device stuff
kzawora-intel Oct 4, 2024
6899c3f
remove weights_load_device
kzawora-intel Oct 4, 2024
e5d640e
fp8 leftovers
kzawora-intel Oct 4, 2024
25388e2
Update vllm/model_executor/layers/logits_processor.py
kzawora-intel Oct 4, 2024
b4f7ffa
Rename HabanaAttention -> HPUAttention
kzawora-intel Oct 4, 2024
43959db
oopsie
kzawora-intel Oct 4, 2024
b8404ad
format.sh
kzawora-intel Oct 4, 2024
d38564f
fix comment length
kzawora-intel Oct 4, 2024
eed1b05
Merge remote-tracking branch 'origin/private/kzawora/hpu_attn' into p…
kzawora-intel Oct 4, 2024
5c3e29c
Merge remote-tracking branch 'origin/private/kzawora/hpu_bf16_default…
kzawora-intel Oct 4, 2024
33c1db0
fix comment
kzawora-intel Oct 4, 2024
05777e0
Lazily import HPU-dependent components
kzawora-intel Oct 4, 2024
1f6de5d
Lazily import HPU-dependent components (#363)
kzawora-intel Oct 7, 2024
ad08dd4
[Refactor] Rename HabanaAttention -> HPUAttention (#362)
kzawora-intel Oct 7, 2024
e00750e
Use BF16 on HPU by default (#361)
kzawora-intel Oct 7, 2024
db5aed6
Set vllm-hpu-extension to 36c7f9c (#365)
madamczykhabana Oct 7, 2024
902f575
Add AliBi to supported features in README_GAUDI.md (#287)
kzawora-intel Oct 7, 2024
27c05e1
Merge remote-tracking branch 'upstream/main' into habana_main
kzawora-intel Oct 7, 2024
bb4c23e
format.sh
kzawora-intel Oct 7, 2024
563184a
Fix hpu_set_env call in load_model in vllm (#364)
Yantom1 Oct 7, 2024
0e46492
Update offline_inference_fakehpu.py
michalkuligowski Oct 8, 2024
6028354
Timeout adjusted in MLLMEngine (#368)
jczaja Oct 8, 2024
64369fd
Add Jenkins test definitions (#369)
kzawora-intel Oct 8, 2024
69fb91c
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 8, 2024
1ee20c5
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 8, 2024
388e500
Make workaround for SW-204785 broader (#374)
kzawora-intel Oct 8, 2024
8f79b6e
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 8, 2024
ca98dae
Fix LoRA tests by handling broken imports
SanjuCSudhakaran Oct 9, 2024
4030216
Fix LoRA tests by handling broken import (#376)
vivekgoe Oct 10, 2024
b70c1a5
[CI] Report test name, add properties to JUnitXML (#377)
kzawora-intel Oct 10, 2024
49444bc
Disable performance counters if profiler is not enabled (#383)
kdamaszk Oct 11, 2024
d6bd375
Remove constraints for bucket creation during warmup in LoRA
SanjuCSudhakaran Oct 11, 2024
4f1787b
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 11, 2024
6cd4694
Remove constraints for bucket creation during warmup in LoRA (#382)
vivekgoe Oct 12, 2024
d8f2aa7
seed_everything function doesn't handle HPU (#384)
SanjuCSudhakaran Oct 14, 2024
03b407b
Fixed lora_manager tests with hpu_model_runner (#386)
rsshaik1 Oct 14, 2024
ebd42c4
Reformat README_GAUDI.md (#389)
kzawora-intel Oct 14, 2024
2d2bf7a
[CI] Prepare separate Jenkins tests for torch compile mode (#388)
anko-intel Oct 14, 2024
9df1d4a
Remove workaround added to resolve multi-card stall issue (#387)
SanjuCSudhakaran Oct 14, 2024
9777c9f
Update SynapseAI version in README & Dockerfile (#390)
kzawora-intel Oct 14, 2024
5ceda69
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 14, 2024
3e6a2d4
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 14, 2024
9ac52ab
fix attention backend selector:
kzawora-intel Oct 14, 2024
57bc31d
Oct 7 rebase (#367)
kzawora-intel Oct 14, 2024
55dd07e
enable mixtral quantization using INC (#372)
dudilester Oct 15, 2024
401f5ae
[CI] Temporarily increase test tolerances (#392)
kzawora-intel Oct 15, 2024
e598f3f
Add quickstart section to READMEs (#391)
kzawora-intel Oct 15, 2024
f77435d
Softmax: add weighted-sum normalization (#378)
madamczykhabana Oct 16, 2024
0783d18
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 16, 2024
2fa46cd
remove jenkins files
kzawora-intel Oct 16, 2024
3683db6
restore README.md
kzawora-intel Oct 16, 2024
91af5da
remove fakehpu
kzawora-intel Oct 16, 2024
d2ce468
use sentinel in model runner base WA
kzawora-intel Oct 16, 2024
b6428cd
remove leftovers from habana_main
kzawora-intel Oct 16, 2024
5149278
remove leftovers from habana_main
kzawora-intel Oct 16, 2024
f4b356f
remove HPUExecutorAsync import
kzawora-intel Oct 16, 2024
3eee00d
remove hpu fused_moe
kzawora-intel Oct 16, 2024
a59fc7b
Remove HPU changes from cache_engine.py (#400)
kzawora-intel Oct 16, 2024
c07951b
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 16, 2024
398c5c3
Merge remote-tracking branch 'origin' into HEAD
kzawora-intel Oct 16, 2024
f79d454
Merge remote-tracking branch 'origin/habana_main' into HEAD
kzawora-intel Oct 16, 2024
8b6e30d
remove hpuexecutor import
kzawora-intel Oct 16, 2024
05bcdf5
[bucketing overhaul 1/n] Add padding-aware scheduling and option to l…
kzawora-intel Oct 17, 2024
c11f23a
Add forward_hpu to RotaryEmbedding, remove custom module
kzawora-intel Oct 17, 2024
78a816c
add missing mark step in test
kzawora-intel Oct 17, 2024
640f0be
Merge branch 'private/kzawora/rope_rework' into HEAD
kzawora-intel Oct 17, 2024
e894746
Merge branch 'private/kzawora/oct_16_rebase' into HEAD
kzawora-intel Oct 17, 2024
5bc3985
cleanup
kzawora-intel Oct 17, 2024
14f8af4
padding-aware scheduler cleanup
kzawora-intel Oct 17, 2024
65e34f6
fix sentinel usage in model runner base
kzawora-intel Oct 17, 2024
4757350
doc fixes
kzawora-intel Oct 17, 2024
4c306cf
Merge remote-tracking branch 'upstream/main' into HEAD
kzawora-intel Oct 17, 2024
Changes from all commits
16 changes: 16 additions & 0 deletions Dockerfile.hpu
@@ -0,0 +1,16 @@
FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

COPY ./ /workspace/vllm

WORKDIR /workspace/vllm

RUN pip install -v -r requirements-hpu.txt

ENV no_proxy=localhost,127.0.0.1
ENV PT_HPU_ENABLE_LAZY_COLLECTIVES=true

RUN VLLM_TARGET_DEVICE=hpu python3 setup.py install

WORKDIR /workspace/

ENTRYPOINT ["python3", "-m", "vllm.entrypoints.openai.api_server"]
402 changes: 402 additions & 0 deletions docs/source/getting_started/gaudi-installation.rst

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -43,7 +43,7 @@ vLLM is flexible and easy to use with:
* Tensor parallelism and pipeline parallelism support for distributed inference
* Streaming outputs
* OpenAI-compatible API server
* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, PowerPC CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
* Support NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs, GPUs, and Gaudi® accelerators, PowerPC CPUs, TPU, and AWS Trainium and Inferentia Accelerators.
* Prefix caching support
* Multi-lora support

@@ -66,6 +66,7 @@ Documentation
   getting_started/amd-installation
   getting_started/openvino-installation
   getting_started/cpu-installation
   getting_started/gaudi-installation
   getting_started/neuron-installation
   getting_started/tpu-installation
   getting_started/xpu-installation
11 changes: 11 additions & 0 deletions requirements-hpu.txt
@@ -0,0 +1,11 @@
# Common dependencies
-r requirements-common.txt

# Dependencies for HPU code
ray == 2.32.0
triton
pandas
tabulate
setuptools>=61
setuptools-scm>=8
vllm-hpu-extension @ git+https://github.com/HabanaAI/vllm-hpu-extension.git@fd7f2e6
50 changes: 47 additions & 3 deletions setup.py
@@ -246,14 +246,32 @@ def run(self):
            self.copy_file(file, dst_file)


def _is_hpu() -> bool:
    is_hpu_available = True
    try:
        subprocess.run(["hl-smi"], capture_output=True, check=True)
    except (FileNotFoundError, PermissionError, subprocess.CalledProcessError):
        if not os.path.exists('/dev/accel/accel0') and not os.path.exists(
                '/dev/accel/accel_controlD0'):
            # last resort...
            try:
                output = subprocess.check_output(
                    'lsmod | grep habanalabs | wc -l', shell=True)
                is_hpu_available = int(output) > 0
            except (ValueError, FileNotFoundError, PermissionError,
                    subprocess.CalledProcessError):
                is_hpu_available = False
    return is_hpu_available or VLLM_TARGET_DEVICE == "hpu"


def _no_device() -> bool:
    return VLLM_TARGET_DEVICE == "empty"


def _is_cuda() -> bool:
    has_cuda = torch.version.cuda is not None
    return (VLLM_TARGET_DEVICE == "cuda" and has_cuda
            and not (_is_neuron() or _is_tpu()))
            and not (_is_neuron() or _is_tpu() or _is_hpu()))


def _is_hip() -> bool:
@@ -291,7 +309,8 @@ def _build_custom_ops() -> bool:


def _build_core_ext() -> bool:
    return not (_is_neuron() or _is_tpu() or _is_openvino() or _is_xpu())
    return not (_is_neuron() or _is_tpu() or _is_openvino() or _is_xpu()
                or _is_hpu())


def get_hipcc_rocm_version():
@@ -353,6 +372,23 @@ def get_path(*filepath) -> str:
    return os.path.join(ROOT_DIR, *filepath)


def get_gaudi_sw_version():
    """
    Returns the driver version.
    """
    # Enable console printing for `hl-smi` check
    output = subprocess.run("hl-smi",
                            shell=True,
                            text=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            env={"ENABLE_CONSOLE": "true"})
    if output.returncode == 0 and output.stdout:
        return output.stdout.split("\n")[2].replace(
            " ", "").split(":")[1][:-1].split("-")[0]
    return "0.0.0"  # when hl-smi is not available


def get_vllm_version() -> str:
    version = get_version(
        write_to="vllm/_version.py",  # TODO: move this to pyproject.toml
@@ -382,6 +418,12 @@ def get_vllm_version() -> str:
        if neuron_version != MAIN_CUDA_VERSION:
            neuron_version_str = neuron_version.replace(".", "")[:3]
            version += f"{sep}neuron{neuron_version_str}"
    elif _is_hpu():
        # Get the Intel Gaudi Software Suite version
        gaudi_sw_version = str(get_gaudi_sw_version())
        if gaudi_sw_version != MAIN_CUDA_VERSION:
            gaudi_sw_version = gaudi_sw_version.replace(".", "")[:3]
            version += f"{sep}gaudi{gaudi_sw_version}"
    elif _is_openvino():
        version += f"{sep}openvino"
    elif _is_tpu():
@@ -439,6 +481,8 @@ def _read_requirements(filename: str) -> List[str]:
requirements = _read_requirements("requirements-rocm.txt")
elif _is_neuron():
requirements = _read_requirements("requirements-neuron.txt")
elif _is_hpu():
requirements = _read_requirements("requirements-hpu.txt")
elif _is_openvino():
requirements = _read_requirements("requirements-openvino.txt")
elif _is_tpu():
@@ -449,7 +493,7 @@ def _read_requirements(filename: str) -> List[str]:
requirements = _read_requirements("requirements-xpu.txt")
else:
raise ValueError(
"Unsupported platform, please use CUDA, ROCm, Neuron, "
"Unsupported platform, please use CUDA, ROCm, Neuron, HPU, "
"OpenVINO, or CPU.")
return requirements

2 changes: 1 addition & 1 deletion vllm/_custom_ops.py
@@ -12,7 +12,7 @@

logger = init_logger(__name__)

if not current_platform.is_tpu():
if not current_platform.is_tpu() and not current_platform.is_hpu():
    try:
        import vllm._C
    except ImportError as e: