Update documentation #555
base: v1.19.0
@@ -11,7 +11,7 @@ Please follow the instructions provided in the [Gaudi Installation Guide](https:
 - OS: Ubuntu 22.04 LTS
 - Python: 3.10
 - Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
+- Intel Gaudi software version 1.19.0

 ## Quick start using Dockerfile
 ```
@@ -44,8 +44,8 @@ It is highly recommended to use the latest Docker image from Intel Gaudi vault.
 Use the following commands to run a Docker image:

 ```{.console}
-$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```

Review comment: PT 2.5.1 — i.e., the 1.19.0 image should reference the pytorch-installer-2.5.1 tag rather than 2.4.0.
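Once the container is up, a quick device-visibility check can catch runtime misconfiguration early. A minimal sketch (not part of this diff), assuming the `hl-smi` utility bundled with the Gaudi software stack inside the image:

```{.console}
$ # List Gaudi devices from inside the container; confirms --runtime=habana works
$ docker run --rm --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
    vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest hl-smi
```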
### Build and Install vLLM-fork

@@ -55,7 +55,7 @@ Currently, the latest features and performance optimizations are developed in Ga
 ```{.console}
 $ git clone https://github.com/HabanaAI/vllm-fork.git
 $ cd vllm-fork
-$ git checkout habana_main
+$ git checkout v1.19.0
 $ pip install -r requirements-hpu.txt
 $ python setup.py develop
 ```
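If the editable install succeeded, the package should import cleanly. A minimal smoke test (a sketch, not part of this diff):

```{.console}
$ # Prints the installed vLLM version if the develop install is on the path
$ python -c "import vllm; print(vllm.__version__)"
```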
@@ -71,11 +71,11 @@ $ python setup.py develop
 - Inference with [HPU Graphs](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html) for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)
 - INC quantization
+- LoRA adapters

 # Unsupported Features

 - Beam search
-- LoRA adapters
 - AWQ quantization
 - Prefill chunking (mixed-batch inferencing)
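Since this change moves LoRA adapters from Unsupported to Supported, a hedged serving sketch may help: the model name and adapter path below are placeholders, while `--enable-lora` and `--lora-modules` are standard vLLM server flags.

```{.console}
$ # Serve a base model with one named LoRA adapter attached
$ python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules my-adapter=/path/to/lora
```

Requests can then select the adapter by passing `"model": "my-adapter"` in the completion payload.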
@@ -112,7 +112,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
 | 1 | 1 | PyTorch lazy mode |

 > [!WARNING]
-> In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
+> In 1.19.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.19.0, please use HPU Graphs, or PyTorch lazy mode.

 ## Bucketing mechanism
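To illustrate the warning above: assuming the table's two selectors are the `PT_HPU_LAZY_MODE` environment variable and vLLM's `--enforce-eager` flag (only the lazy-mode row is visible in this hunk), the two recommended modes would be launched roughly as follows, with the model name a placeholder:

```{.console}
$ # HPU Graphs: lazy backend with graph capture (enforce_eager off)
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model <model>
$ # PyTorch lazy mode: lazy backend without HPU Graphs
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model <model> --enforce-eager
```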
The same updates, applied to the reStructuredText version of the guide:
@@ -18,7 +18,7 @@ Requirements
 - OS: Ubuntu 22.04 LTS
 - Python: 3.10
 - Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
+- Intel Gaudi software version 1.19.0


 Quick start using Dockerfile
@@ -63,8 +63,8 @@ Use the following commands to run a Docker image:

 .. code:: console

-   $ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-   $ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+   $ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+   $ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

Review comment: PT 2.5.1 — same note as above; the image tag should reference PyTorch 2.5.1.
 Build and Install vLLM
 ~~~~~~~~~~~~~~~~~~~~~~
@@ -85,7 +85,7 @@ Currently, the latest features and performance optimizations are developed in Ga

    $ git clone https://github.com/HabanaAI/vllm-fork.git
    $ cd vllm-fork
-   $ git checkout habana_main
+   $ git checkout v1.19.0
    $ pip install -r requirements-hpu.txt
    $ python setup.py develop

Review comment: This should be changed to a target TAG.
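After the install, a short offline-inference run exercises the HPU build end to end. A minimal sketch using vLLM's public ``LLM`` API (the model name is a placeholder, not taken from this diff):

.. code:: python

   from vllm import LLM, SamplingParams

   # Placeholder model; any small HF model works for a smoke test.
   llm = LLM(model="facebook/opt-125m")
   params = SamplingParams(temperature=0.8, max_tokens=32)
   outputs = llm.generate(["Hello, my name is"], params)
   print(outputs[0].outputs[0].text)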
@@ -107,12 +107,12 @@ Supported Features
   for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)
 - INC quantization
+- LoRA adapters

 Unsupported Features
 ====================

 - Beam search
-- LoRA adapters
 - AWQ quantization
 - Prefill chunking (mixed-batch inferencing)
@@ -186,7 +186,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
 - PyTorch lazy mode

 .. warning::
-   In 1.18.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
+   In 1.19.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.19.0, please use HPU Graphs, or PyTorch lazy mode.


 Bucketing mechanism