Commit 2c0c3cb: fix cr comments

nirda7 committed Jan 9, 2025
1 parent f4d3c92 commit 2c0c3cb
Showing 3 changed files with 3 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -121,6 +121,7 @@ Documentation
    quantization/auto_awq
    quantization/bnb
    quantization/gguf
+   quantization/inc
    quantization/int8
    quantization/fp8
    quantization/fp8_e5m2_kvcache
2 changes: 1 addition & 1 deletion docs/source/quantization/inc.rst
@@ -4,7 +4,7 @@ FP8 INC
 =======
 
 vLLM supports FP8 (8-bit floating point) weight and activation quantization using Intel® Neural Compressor (INC) on Intel® Gaudi® 2 and Intel® Gaudi® 3 AI accelerators.
-Currently, quantization is supported only for Llama models.
+Currently, quantization is validated only for Llama models.
 
 Intel Gaudi supports quantization of various modules and functions, including, but not limited to ``Linear``, ``KVCache``, ``Matmul`` and ``Softmax``. For more information, please refer to:
 `Supported Modules\\Supported Functions\\Custom Patched Modules <https://docs.habana.ai/en/latest/PyTorch/Inference_on_PyTorch/Quantization/Inference_Using_FP8.html#supported-modules>`_.
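
For context, the following is a minimal usage sketch, not part of this commit, of how the documented "inc" scheme is typically selected through vLLM's LLM entry point on Gaudi; the model name and the fp8_inc KV-cache dtype are illustrative assumptions rather than values taken from this diff.

from vllm import LLM, SamplingParams

# Assumed example: load a Llama checkpoint with INC-based FP8 quantization.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model choice
    quantization="inc",        # matches the "inc" scheme documented above
    kv_cache_dtype="fp8_inc",  # assumption: FP8 KV cache handled by INC on Gaudi
)

outputs = llm.generate(["Hello from Gaudi"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)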
1 change: 1 addition & 0 deletions vllm/platforms/hpu.py
@@ -13,6 +13,7 @@ class HpuPlatform(Platform):
     device_name: str = "hpu"
     device_type: str = "hpu"
     dispatch_key: str = "HPU"
+    supported_quantization: list[str] = ["inc"]
 
     @classmethod
     def get_default_attn_backend(cls, selected_backend: _Backend) -> _Backend:
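
As context for the new supported_quantization attribute, here is a hedged sketch of the kind of check such a per-platform list enables; the function name and error message are assumptions for illustration, not the actual vLLM implementation.

from typing import Optional

def verify_quantization(supported: list[str], method: Optional[str]) -> None:
    # Reject quantization schemes the current platform does not advertise.
    if method is not None and method not in supported:
        raise ValueError(
            f"{method!r} quantization is not supported on this platform; "
            f"supported schemes: {supported}")

verify_quantization(["inc"], "inc")    # passes: HPU now advertises "inc"
# verify_quantization(["inc"], "awq")  # would raise ValueError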
