Commit 0d689f1

update docs for 1.13.120+xpu release (#2587)

jingxu10 authored Apr 29, 2023
1 parent c2a3701 commit 0d689f1

Showing 16 changed files with 283 additions and 174 deletions.
12 changes: 4 additions & 8 deletions README.md
@@ -15,11 +15,8 @@ The extension can be loaded as a Python module for Python programs or linked as

You can use either of the following 2 commands to install Intel® Extension for PyTorch\* CPU version.

-```python
+```bash
python -m pip install intel_extension_for_pytorch
```

-```python
-python -m pip install intel_extension_for_pytorch -f https://developer.intel.com/ipex-whl-stable-cpu
-```

@@ -33,9 +30,8 @@ Compilation instruction of the latest CPU code base `master` branch can be found

You can install Intel® Extension for PyTorch\* for GPU via the command below.

-```python
-python -m pip install torch==1.13.0a0 -f https://developer.intel.com/ipex-whl-stable-xpu
-python -m pip install intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+```bash
+python -m pip install torch==1.13.0a0+git6c9b55e intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
```

**Note:** The patched PyTorch 1.13.0a0 is required to work with Intel® Extension for PyTorch\* on Intel® graphics cards for now.
@@ -88,7 +84,7 @@ with torch.no_grad():

## License

-_Apache License_, Version _2.0_. As found in [LICENSE](https://github.com/intel/intel-extension-for-pytorch/blob/master/LICENSE.txt) file.
+_Apache License_, Version _2.0_, as found in the [LICENSE](https://github.com/intel/intel-extension-for-pytorch/blob/master/LICENSE) file.

## Security

5 changes: 3 additions & 2 deletions docs/tutorials/blogs_publications.md
@@ -1,8 +1,9 @@
Blogs & Publications
====================

-* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
-* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
+* [Intel® Deep Learning Boost (Intel® DL Boost) - Improve Inference Performance of Hugging Face BERT Base Model in Google Cloud Platform (GCP) Technology Guide, Apr 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-intel-dl-boost-improve-inference-performance-of-hugging-face-bert-base-model-in-google-cloud-platform-gcp-technology-guide)
+* [Get Started with Intel® Extension for PyTorch\* on GPU | Intel Software, Mar 2023](https://www.youtube.com/watch?v=Id-rE2Q7xZ0&t=1s)
+* [Accelerate PyTorch\* INT8 Inference with New “X86” Quantization Backend on X86 CPUs, Mar 2023](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-pytorch-int8-inf-with-new-x86-backend.html)
* [Accelerating PyTorch Transformers with Intel Sapphire Rapids, Part 1, Jan 2023](https://huggingface.co/blog/intel-sapphire-rapids)
* [Intel® Deep Learning Boost - Improve Inference Performance of BERT Base Model from Hugging Face for Network Security Technology Guide, Jan 2023](https://networkbuilders.intel.com/solutionslibrary/intel-deep-learning-boost-improve-inference-performance-of-bert-base-model-from-hugging-face-for-network-security-technology-guide)
* [Scaling inference on CPUs with TorchServe, PyTorch Conference, Dec 2022](https://www.youtube.com/watch?v=066_Jd6cwZg)
2 changes: 1 addition & 1 deletion docs/tutorials/examples.md
@@ -166,4 +166,4 @@ Intel® Extension for PyTorch\* provides its C++ dynamic library to allow users

## Model Zoo

-Use cases that had already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/v2.9.0). A bunch of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/v2.9.0#use-cases). Models verified on Intel dGPUs are marked in `Model Documentation` Column. You can get performance benefits out-of-box by simply running scipts in the Model Zoo.
+Use cases that have already been optimized by Intel engineers are available at [Model Zoo for Intel® Architecture](https://github.com/IntelAI/models/tree/v2.11.0). A number of PyTorch use cases for benchmarking are also available on the [GitHub page](https://github.com/IntelAI/models/tree/v2.11.0#use-cases). Models verified on Intel dGPUs are marked in the `Model Documentation` column. You can get performance benefits out of the box by simply running scripts in the Model Zoo.
2 changes: 1 addition & 1 deletion docs/tutorials/features.rst
@@ -163,7 +163,7 @@ Intel® Extension for PyTorch* also optimizes operators and implements several c
.. currentmodule:: intel_extension_for_pytorch.nn.functional
.. autofunction:: interaction

-**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is provided as parameter –auto_kernel_selection, with boolean value, of the ipex.optimize() function. By default, the GEMM kernel is computed with oneMKL primitives. However, under certain circumstances oneDNN primitives run faster. Users are able to set –auto_kernel_selection to True to run GEMM kernels with oneDNN primitives.” -> "We aims to provide good default performance by leveraging the best of math libraries and enabled weights_prepack, and it has been verified with broad set of models. If you would like to try other alternatives, you can use auto_kernel_selection toggle in ipex.optimize to switch, and you can diesable weights_preack in ipex.optimize if you are concerning the memory footprint more than performance gain. However in majority cases, keeping default is what we recommend.
+**Auto kernel selection** is a feature that enables users to tune for better performance with GEMM operations. It is provided as the boolean parameter ``auto_kernel_selection`` of the ``ipex.optimize()`` function. By default, GEMM kernels are computed with oneMKL primitives; under certain circumstances, however, oneDNN primitives run faster, and users can set ``auto_kernel_selection`` to ``True`` to run GEMM kernels with oneDNN primitives. We aim to provide good default performance by leveraging the best of the math libraries and enabling ``weights_prepack``, and this has been verified with a broad set of models. If you would like to try other alternatives, use the ``auto_kernel_selection`` toggle in ``ipex.optimize()`` to switch, and disable ``weights_prepack`` in ``ipex.optimize()`` if you are more concerned about memory footprint than performance gain. In the majority of cases, however, we recommend keeping the defaults.
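
As an illustration, a minimal sketch of the toggle described above (the model is a stand-in; every other ``ipex.optimize()`` argument keeps its default):

```python
import torch
import intel_extension_for_pytorch as ipex

# A small stand-in model, used only for illustration.
model = torch.nn.Linear(1024, 1024).eval()

# Opt in to oneDNN primitives for GEMM kernels instead of the oneMKL default.
model = ipex.optimize(model, dtype=torch.float32, auto_kernel_selection=True)

with torch.no_grad():
    y = model(torch.randn(64, 1024))
```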


Runtime Extension
20 changes: 18 additions & 2 deletions docs/tutorials/features/DDP.md
@@ -32,12 +32,28 @@ python setup.py install

Installation for GPU:

- Clone the `oneccl_bindings_for_pytorch`

```bash
-git clone https://github.com/intel/torch-ccl.git -b v1.13.100+gpu
+git clone https://github.com/intel/torch-ccl.git -b v1.13.200+gpu
cd torch-ccl
git submodule sync
git submodule update --init --recursive
-BUILD_NO_ONECCL_PACKAGE=ON COMPUTE_BACKEND=dpcpp python setup.py install
```

- Install `oneccl_bindings_for_pytorch`

Option 1: build with oneCCL from third party (recommended)

```bash
COMPUTE_BACKEND=dpcpp python setup.py install
```

Option 2: build without oneCCL and use oneCCL in the system

```bash
export INTELONEAPIROOT=${HOME}/intel/oneapi
USE_SYSTEM_ONECCL=ON COMPUTE_BACKEND=dpcpp python setup.py install
```
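
Once `oneccl_bindings_for_pytorch` is installed through either option, a minimal sketch of bringing up the `ccl` backend (the address, port, and `PMI_*` fallbacks are illustrative assumptions, not part of the install guide):

```python
import os
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  registers the "ccl" backend

# Illustrative single-node defaults; real launchers (e.g., mpirun) set these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
rank = int(os.environ.get("PMI_RANK", "0"))
world_size = int(os.environ.get("PMI_SIZE", "1"))

dist.init_process_group("ccl", rank=rank, world_size=world_size)
print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")
```

A real GPU run would additionally import `intel_extension_for_pytorch`, move the model and data to the `xpu` device, and wrap the model in `torch.nn.parallel.DistributedDataParallel`.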

#### Install from prebuilt wheel:
2 changes: 1 addition & 1 deletion docs/tutorials/features/graph_capture.md
@@ -3,7 +3,7 @@ Graph Capture (Experimental)

### Feature Description

-This feature automatically applies a combination of TorchScript trace technique and TorchDynamo to try to generate a graph model, for providing a good user experience while keep execution fast. Specifically, the process tries to generate a graph with TorchScript trace functionality first. In case of generation failure or incorrect results detected, it changes to TorchDynamo with TorchScript backend. Failure of the graph generation with TorchDynamo triggers a warning message. Meanwhile the generated graph model falls back to the original one. I.e. the inference workload runs in eager mode. Users can take advantage of this feature through a new knob `--graph_mode` of the `ipex.optimize()` function to automatically run into graph mode.
+This feature automatically applies a combination of the TorchScript trace technique and TorchDynamo to try to generate a graph model, providing a good user experience while keeping execution fast. Specifically, the process first tries to generate a graph with the TorchScript trace functionality. If generation fails or incorrect results are detected, it switches to TorchDynamo with the TorchScript backend. If graph generation fails with TorchDynamo as well, a warning message is triggered and the model falls back to the original one, i.e., the inference workload runs in eager mode. Users can take advantage of this feature through the new `graph_mode` knob of the `ipex.optimize()` function to automatically run in graph mode.

### Usage Example
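
A minimal sketch of the knob described above (the ResNet-50 model and input shape are illustrative, not part of the feature itself):

```python
import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

model = models.resnet50(weights=None).eval()

# graph_mode=True asks ipex.optimize() to attempt automatic graph capture.
model = ipex.optimize(model, dtype=torch.float32, graph_mode=True)

with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))
```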

2 changes: 1 addition & 1 deletion docs/tutorials/features/int8_overview.md
@@ -109,7 +109,7 @@ Note: For weight observer, it only supports dtype **torch.qint8**, and the qsche
**Suggestion**:

1. For the weight observer, setting **qscheme** to **torch.per_channel_symmetric** can give better accuracy.
-2. If your CPU device doesn't support VNNI, seeting the observer's **reduce_range** to **True** can get a better accuracy, such as skylake.
+2. If your CPU device doesn't support VNNI (e.g., Skylake), setting the observer's **reduce_range** to **True** can give better accuracy.
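
Put together, a qconfig following both suggestions might look like the sketch below, built from stock `torch.ao.quantization` observers; how it is wired into the prepare step depends on your workflow and IPEX version:

```python
import torch
from torch.ao.quantization import MinMaxObserver, PerChannelMinMaxObserver, QConfig

qconfig = QConfig(
    # reduce_range=True helps accuracy on CPUs without VNNI (e.g., Skylake).
    activation=MinMaxObserver.with_args(dtype=torch.quint8,
                                        qscheme=torch.per_tensor_affine,
                                        reduce_range=True),
    # Weights: torch.qint8 with per-channel symmetric quantization.
    weight=PerChannelMinMaxObserver.with_args(dtype=torch.qint8,
                                              qscheme=torch.per_channel_symmetric),
)
```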

### Prepare Model

2 changes: 1 addition & 1 deletion docs/tutorials/features/nhwc.md
@@ -162,7 +162,7 @@ The general guideline has been listed under reference [Writing-memory-format-awa

### b. Register oneDNN Kernel on Channels Last

-Registering a oneDNN kernel under Channels Last memory format on CPU is no different from [cuDNN](https://github.com/pytorch/pytorch/pull/23861): Only very few upper level changes are needed, such as accommodate 'contiguous()' to 'contiguous(suggested_memory_format)'. The automatic reorder of oneDNN weight shall been hidden in ideep.
+Registering a oneDNN kernel under the Channels Last memory format on CPU is no different from [cuDNN](https://github.com/pytorch/pytorch/pull/23861): only a few upper-level changes are needed, such as accommodating 'contiguous()' to 'contiguous(suggested_memory_format)'. The automatic reordering of oneDNN weights is hidden in ideep.
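
From the Python side, the effect users see is the familiar Channels Last handling; a minimal sketch:

```python
import torch

x = torch.randn(1, 3, 224, 224)
# Convert to Channels Last (NHWC): sizes stay NCHW, strides change.
x = x.contiguous(memory_format=torch.channels_last)

conv = torch.nn.Conv2d(3, 64, kernel_size=3).to(memory_format=torch.channels_last)
y = conv(x)  # oneDNN can pick an NHWC kernel without extra reorders

print(y.is_contiguous(memory_format=torch.channels_last))  # True
```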

## oneDNN NHWC APIs

2 changes: 1 addition & 1 deletion docs/tutorials/getting_started.md
@@ -5,7 +5,7 @@
Prebuilt wheel files are released for multiple Python versions. You can install them simply with the following pip command.

```bash
-python -m pip install torch==1.13.1+xpu torchvision==0.14.1+xpu intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==1.13.0a0+git6c9b55e torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
```

You can run a simple sanity test to confirm that the correct version is installed and that the software stack can detect the hardware on your system.
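
A minimal sketch of such a sanity test (the expected version strings below follow from the install command above; treat them as expectations, not guarantees):

```python
import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)  # expected: 1.13.0a0+git6c9b55e
print(ipex.__version__)   # expected: 1.13.120+xpu

# List the XPU devices the software stack can see.
for i in range(torch.xpu.device_count()):
    print(f"[{i}]: {torch.xpu.get_device_properties(i)}")
```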
45 changes: 27 additions & 18 deletions docs/tutorials/installation.md
@@ -16,13 +16,14 @@ Verified Hardware Platforms:

|Hardware|OS|Driver|
|-|-|-|
-|Intel® Data Center GPU Flex Series|Ubuntu 22.04 (Validated), Red Hat 8.6|[Stable 540](https://dgpu-docs.intel.com/releases/stable_540_20221205.html)|
-|Intel® Data Center GPU Max Series|Red Hat 8.6, Sles 15sp3/sp4 (Validated)|[Stable 540](https://dgpu-docs.intel.com/releases/stable_540_20221205.html)|
-|Intel® Arc™ A-Series Graphics|Ubuntu 22.04|[Stable 540](https://dgpu-docs.intel.com/releases/stable_540_20221205.html)|
-|Intel® Arc™ A-Series Graphics|Windows 11 or Windows 10 21H2 (via WSL2)|[for Windows 11 or Windows 10 21H2](https://www.intel.com/content/www/us/en/download/726609/intel-arc-graphics-windows-dch-driver.html)|
-|CPU (3<sup>rd</sup> and 4<sup>th</sup> Gen of Intel® Xeon® Scalable Processors)|Linux\* distributions with glibc>=2.17. Validated on Ubuntu 18.04.|N/A|
+|Intel® Data Center GPU Flex Series|Ubuntu 22.04 (Validated), Red Hat 8.6|[Stable 602](https://dgpu-docs.intel.com/releases/stable_602_20230323.html)|
+|Intel® Data Center GPU Max Series|Ubuntu 22.04, Red Hat 8.6, Sles 15sp3/sp4 (Validated)|[Stable 602](https://dgpu-docs.intel.com/releases/stable_602_20230323.html)|
+|Intel® Arc™ A-Series Graphics|Ubuntu 22.04|[Stable 602](https://dgpu-docs.intel.com/releases/stable_602_20230323.html)|
+|Intel® Arc™ A-Series Graphics|Windows 11 or Windows 10 21H2 (via WSL2)|[for Windows 11 or Windows 10 21H2](https://www.intel.com/content/www/us/en/download/726609/intel-arc-iris-xe-graphics-whql-windows.html)|
+|CPU (3<sup>rd</sup> and 4<sup>th</sup> Gen of Intel® Xeon® Scalable Processors)|Linux\* distributions with glibc>=2.17. Validated on RHEL 8.|N/A|

-- Intel® oneAPI Base Toolkit 2023.0
+- Intel® oneAPI Base Toolkit 2023.1
- [DPC++ Compiler hotfix](https://registrationcenter-download.intel.com/akdlm/IRC_NAS/89283df8-c667-47b0-b7e1-c4573e37bd3e/2023.1-linux-hotfix.zip)
- Python 3.7-3.10
- Verified with GNU GCC 11

@@ -32,26 +33,35 @@ Verified Hardware Platforms:

|OS|Instructions for installing Intel GPU Driver|
|-|-|
-|Linux\*|Refer to the [Installation Guides](https://dgpu-docs.intel.com/installation-guides/index.html) for the latest driver installation for individual Linux\* distributions. When installing the verified [Stable 540](https://dgpu-docs.intel.com/releases/stable_540_20221205.html) driver, use a specific version for component package names, such as `sudo apt-get install intel-opencl-icd=22.43.24595.35`|
-|Windows 11 or Windows 10 21H2 (via WSL2)|Please download drivers for Intel® Arc™ A-Series [for Windows 11 or Windows 10 21H2](https://www.intel.com/content/www/us/en/download/726609/intel-arc-graphics-windows-dch-driver.html). Please note that you would have to follow the rest of the steps in WSL2, but the drivers should be installed on Windows|
+|Linux\*|Refer to the [Installation Guides](https://dgpu-docs.intel.com/installation-guides/index.html) for driver installation on individual Linux\* distributions. When installing the verified driver mentioned in the table above, use the specific version of each component package mentioned in the installation guide page, such as `sudo apt-get install intel-opencl-icd=<version>`|
+|Windows 11 or Windows 10 21H2 (via WSL2)|Please download drivers for Intel® Arc™ A-Series from the web page mentioned in the table above. Note that you have to follow the rest of the steps in WSL2, but the drivers should be installed on Windows. Besides that, please follow Steps 4 & 5 of the [Installation Guides](https://dgpu-docs.intel.com/installation-guides/ubuntu/ubuntu-jammy-arc.html#step-4-install-run-time-packages) on WSL2 Ubuntu 22.04.|

### Install oneAPI Base Toolkit

-Please refer to [Install oneAPI Base Toolkit Packages](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html#base-kit).
+Please refer to [Install oneAPI Base Toolkit Packages](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).

The following components of Intel® oneAPI Base Toolkit need to be installed:
- Intel® oneAPI DPC++ Compiler (`DPCPPROOT` as its installation path)
- Intel® oneAPI Math Kernel Library (oneMKL) (`MKLROOT` as its installation path)

The default installation location *{ONEAPI_ROOT}* is `/opt/intel/oneapi` for the root account and `${HOME}/intel/oneapi` for other accounts. Generally, `DPCPPROOT` is `{ONEAPI_ROOT}/compiler/latest` and `MKLROOT` is `{ONEAPI_ROOT}/mkl/latest`.

-**_NOTE:_** You need to activate oneAPI environment when using Intel® Extension for PyTorch\* on Intel GPU.
+A DPC++ compiler patch is required with oneAPI Base Toolkit 2023.1.0. Use the command below to download the patch package.

```bash
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/89283df8-c667-47b0-b7e1-c4573e37bd3e/2023.1-linux-hotfix.zip
```

You can either follow the instructions in the `README.txt` of the patch package or use the commands below to install the patch.

```bash
unzip 2023.1-linux-hotfix.zip
cd 2023.1-linux-hotfix
source {ONEAPI_ROOT}/setvars.sh
bash installpatch.sh
```

-**_NOTE:_** You need to activate ONLY DPC++ compiler and oneMKL environment when compiling Intel® Extension for PyTorch\* from source on Intel GPU.
+**_NOTE:_** If you are not working in the environment where the patch was installed, you need to activate ONLY the DPC++ compiler and oneMKL environments whenever you **_compile_** or **_use_** Intel® Extension for PyTorch\* on Intel GPUs.

```bash
source {DPCPPROOT}/env/vars.sh
@@ -64,7 +74,7 @@ Intel® Extension for PyTorch\* has to work with a corresponding version of PyTo

|PyTorch Version|Extension Version|
|--|--|
-|[v1.13.\*](https://github.com/pytorch/pytorch/tree/v1.13.0) (patches needed)|[v1.13.\*](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.10+xpu)|
+|[v1.13.\*](https://github.com/pytorch/pytorch/tree/v1.13.1) (patches needed)|[v1.13.\*](https://github.com/intel/intel-extension-for-pytorch/tree/v1.13.120+xpu)|
|[v1.10.\*](https://github.com/pytorch/pytorch/tree/v1.10.0) (patches needed)|[v1.10.\*](https://github.com/intel/intel-extension-for-pytorch/tree/v1.10.200+gpu)|

## Install via wheel files
@@ -73,6 +83,7 @@ Prebuilt wheel files availability matrix for Python versions:

| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 |
| :--: | :--: | :--: | :--: | :--: | :--: |
| 1.13.120+xpu | | ✔️ | ✔️ | ✔️ | ✔️ |
| 1.13.10+xpu | | ✔️ | ✔️ | ✔️ | ✔️ |
| 1.10.200+gpu | ✔️ | ✔️ | ✔️ | ✔️ | |

@@ -82,16 +93,14 @@ Prebuilt wheel files for generic Python\* and Intel® Distribution for Python\*

```bash
# General Python*
-python -m pip install torch==1.13.0a0 torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu
+python -m pip install torch==1.13.0a0+git6c9b55e torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu

# Intel® Distribution for Python*
-python -m pip install torch==1.13.0a0 torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.10+xpu -f https://developer.intel.com/ipex-whl-stable-xpu-idp
+python -m pip install torch==1.13.0a0+git6c9b55e torchvision==0.14.1a0 intel_extension_for_pytorch==1.13.120+xpu -f https://developer.intel.com/ipex-whl-stable-xpu-idp
```

**Note:** Wheel files for Intel® Distribution for Python\* only support Python 3.9. The support starts from 1.13.10+xpu.

-**Note:** Please install Numpy 1.22.3 under Intel® Distribution for Python\*.

**Note:** Installation of TorchVision is optional.

**Note:** You may need to have gomp package in your system (`apt install libgomp1` or `yum/dnf install libgomp`).
@@ -111,7 +120,7 @@ Please refer to [AOT documentation](./AOT.md) for how to configure `USE_AOT_DEVL
To ensure a smooth compilation of the bundle (PyTorch\*, torchvision, torchaudio, and Intel® Extension for PyTorch\*), a script is provided in the GitHub repo. If you would like to compile the binaries from source, it is highly recommended to utilize this script.

```bash
-$ wget https://github.com/intel/intel-extension-for-pytorch/blob/xpu-master/scripts/compile_bundle.sh
+$ wget https://raw.githubusercontent.com/intel/intel-extension-for-pytorch/v1.13.120+xpu/scripts/compile_bundle.sh
$ bash compile_bundle.sh <DPCPPROOT> <MKLROOT> [AOT]
DPCPPROOT and MKLROOT are mandatory and should be absolute or relative paths to the root directories of the DPC++ compiler and oneMKL, respectively.
AOT is optional and should be the text string for the environment variable USE_AOT_DEVLIST.