Update documentation #555 (Draft)

Wants to merge 12 commits into base: v1.19.0
Dockerfile.hpu (2 changes: 1 addition & 1 deletion)

@@ -1,4 +1,4 @@
-FROM vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+FROM vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

 COPY ./ /workspace/vllm

Review comment: PT 2.5.1. The 1.19.0 Gaudi release ships PyTorch 2.5.1, so the pytorch-installer-2.4.0 tag likely needs a matching bump.
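For reviewers trying the updated image locally, a minimal build-and-run sketch (the `vllm-hpu-env` tag is illustrative, not part of this PR):

```console
$ # Build the HPU image from the updated Dockerfile
$ docker build -f Dockerfile.hpu -t vllm-hpu-env .
$ # Run it with the Habana runtime; --rm discards the container on exit
$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all --rm vllm-hpu-env
```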
README_GAUDI.md (12 changes: 6 additions & 6 deletions)

@@ -11,7 +11,7 @@ Please follow the instructions provided in the [Gaudi Installation Guide](https:
 - OS: Ubuntu 22.04 LTS
 - Python: 3.10
 - Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
+- Intel Gaudi software version 1.19.0

 ## Quick start using Dockerfile
 ```
@@ -44,8 +44,8 @@ It is highly recommended to use the latest Docker image from Intel Gaudi vault.
 Use the following commands to run a Docker image:

 ```{.console}
-$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 ```

Review comment: PT 2.5.1. Same note as on Dockerfile.hpu: the installer tag likely needs the matching PyTorch 2.5.1 bump.

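Once inside the container, a quick sanity check that the Gaudi devices are visible (hl-smi ships with the Habana driver stack; this step is not part of the diff):

```console
$ # List Gaudi devices, driver version, and utilization
$ hl-smi
```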
### Build and Install vLLM-fork

@@ -55,7 +55,7 @@ Currently, the latest features and performance optimizations are developed in Ga
 ```{.console}
 $ git clone https://github.com/HabanaAI/vllm-fork.git
 $ cd vllm-fork
-$ git checkout habana_main
+$ git checkout v1.19.0
 $ pip install -r requirements-hpu.txt
 $ python setup.py develop
 ```
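A lightweight post-install check, assuming the editable install succeeded (not prescribed by this PR):

```console
$ # The reported version should correspond to the v1.19.0 tag checked out above
$ python -c "import vllm; print(vllm.__version__)"
```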
@@ -71,11 +71,11 @@ $ python setup.py develop
 - Inference with [HPU Graphs](https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Inference_Using_HPU_Graphs.html) for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)
 - INC quantization
+- LoRA adapters

 # Unsupported Features

 - Beam search
-- LoRA adapters
 - AWQ quantization
 - Prefill chunking (mixed-batch inferencing)
@@ -112,7 +112,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
 | 1 | 1 | PyTorch lazy mode |

 > [!WARNING]
-> In 1.18.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
+> In 1.19.0, all modes utilizing `PT_HPU_LAZY_MODE=0` are highly experimental and should only be used for validating functional correctness. Their performance will be improved in future releases. For the best performance in 1.19.0, use HPU Graphs or PyTorch lazy mode.

 ## Bucketing mechanism
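To make the mode selection concrete, a launch sketch, assuming (per the full table in this README) that the two numeric columns are `PT_HPU_LAZY_MODE` and `enforce_eager`; the model name is a placeholder:

```console
$ # HPU Graphs (recommended): PT_HPU_LAZY_MODE=1, enforce_eager off
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model <model>
$ # PyTorch lazy mode: PT_HPU_LAZY_MODE=1 plus --enforce-eager
$ PT_HPU_LAZY_MODE=1 python -m vllm.entrypoints.openai.api_server --model <model> --enforce-eager
```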
docs/source/getting_started/gaudi-installation.rst (12 changes: 6 additions & 6 deletions)

@@ -18,7 +18,7 @@ Requirements
 - OS: Ubuntu 22.04 LTS
 - Python: 3.10
 - Intel Gaudi accelerator
-- Intel Gaudi software version 1.18.0
+- Intel Gaudi software version 1.19.0


 Quick start using Dockerfile
@@ -63,8 +63,8 @@ Use the following commands to run a Docker image:

 .. code:: console

-$ docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
-$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker pull vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
+$ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/1.19.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest

Review comment: PT 2.5.1. Same note as on Dockerfile.hpu.
Build and Install vLLM
~~~~~~~~~~~~~~~~~~~~~~

@@ -85,7 +85,7 @@ Currently, the latest features and performance optimizations are developed in Ga

 $ git clone https://github.com/HabanaAI/vllm-fork.git
 $ cd vllm-fork
-$ git checkout habana_main
+$ git checkout v1.19.0
 $ pip install -r requirements-hpu.txt
 $ python setup.py develop

Author comment: This should be changed to a target TAG.
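As a smoke test after `python setup.py develop`, one possible one-liner; the model name is a placeholder and the exact `generate` call signature may vary across vLLM versions:

```console
$ # Run a single prompt through the engine to confirm the HPU build works end to end
$ python -c "from vllm import LLM; print(LLM(model='<model>').generate('Hello, Gaudi!')[0].outputs[0].text)"
```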
@@ -107,12 +107,12 @@ Supported Features
   for accelerating low-batch latency and throughput
 - Attention with Linear Biases (ALiBi)
 - INC quantization
+- LoRA adapters

 Unsupported Features
 ====================

 - Beam search
-- LoRA adapters
 - AWQ quantization
 - Prefill chunking (mixed-batch inferencing)
@@ -186,7 +186,7 @@ Currently in vLLM for HPU we support four execution modes, depending on selected
 - PyTorch lazy mode

 .. warning::
-   In 1.18.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should be only used for validating functional correctness. Their performance will be improved in the next releases. For obtaining the best performance in 1.18.0, please use HPU Graphs, or PyTorch lazy mode.
+   In 1.19.0, all modes utilizing ``PT_HPU_LAZY_MODE=0`` are highly experimental and should only be used for validating functional correctness. Their performance will be improved in future releases. For the best performance in 1.19.0, use HPU Graphs or PyTorch lazy mode.


 Bucketing mechanism
docs/source/serving/compatibility_matrix.rst (16 changes: 16 additions & 0 deletions)

@@ -305,6 +305,7 @@ Feature x Hardware
 - Hopper
 - CPU
 - AMD
+- Gaudi
 * - :ref:`CP <chunked-prefill>`
 - `✗ <https://github.com/vllm-project/vllm/issues/2729>`__
 - ✅
@@ -313,6 +314,7 @@ Feature x Hardware
 - ✅
 - ✗
 - ✅
+- ✗
 * - :ref:`APC <apc>`
 - `✗ <https://github.com/vllm-project/vllm/issues/3687>`__
 - ✅
@@ -321,6 +323,7 @@ Feature x Hardware
 - ✅
 - ✗
 - ✅
+- ✅
 * - :ref:`LoRA <lora>`
 - ✅
 - ✅
@@ -329,6 +332,7 @@ Feature x Hardware
 - ✅
 - `✗ <https://github.com/vllm-project/vllm/pull/4830>`__
 - ✅
+- ✅
 * - :abbr:`prmpt adptr (Prompt Adapter)`
 - ✅
 - ✅
@@ -337,6 +341,7 @@ Feature x Hardware
 - ✅
 - `✗ <https://github.com/vllm-project/vllm/issues/8475>`__
 - ✅
+- ✗
 * - :ref:`SD <spec_decode>`
 - ✅
 - ✅
@@ -345,6 +350,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✗
 * - CUDA graph
 - ✅
 - ✅
@@ -353,6 +359,7 @@ Feature x Hardware
 - ✅
 - ✗
 - ✅
+- ✗
 * - :abbr:`enc-dec (Encoder-Decoder Models)`
 - ✅
 - ✅
@@ -361,6 +368,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✗
+- ✅
 * - :abbr:`logP (Logprobs)`
 - ✅
 - ✅
@@ -369,6 +377,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✅
 * - :abbr:`prmpt logP (Prompt Logprobs)`
 - ✅
 - ✅
@@ -377,6 +386,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✅
 * - :abbr:`async output (Async Output Processing)`
 - ✅
 - ✅
@@ -385,6 +395,7 @@ Feature x Hardware
 - ✅
 - ✗
 - ✗
+- ✅
 * - multi-step
 - ✅
 - ✅
@@ -393,6 +404,7 @@ Feature x Hardware
 - ✅
 - `✗ <https://github.com/vllm-project/vllm/issues/8477>`__
 - ✅
+- ✅
 * - :abbr:`MM (Multimodal)`
 - ✅
 - ✅
@@ -401,6 +413,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✅
 * - best-of
 - ✅
 - ✅
@@ -409,6 +422,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✅
 * - beam-search
 - ✅
 - ✅
@@ -417,6 +431,7 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✗
 * - :abbr:`guided dec (Guided Decoding)`
 - ✅
 - ✅
@@ -425,3 +440,4 @@ Feature x Hardware
 - ✅
 - ✅
 - ✅
+- ✅
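Since the new Gaudi column marks APC as supported, a usage sketch for reviewers (`--enable-prefix-caching` is a standard vLLM flag; the model name is a placeholder, and its behavior on Gaudi is inferred from the matrix, not stated elsewhere in this PR):

```console
$ # Enable automatic prefix caching (APC), which the matrix now marks ✅ on Gaudi
$ python -m vllm.entrypoints.openai.api_server --model <model> --enable-prefix-caching
```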