From a28fe4f6cf7c5abc5d914a32583d6d58dc33a6f8 Mon Sep 17 00:00:00 2001
From: ZhaoqiongZ <106125927+ZhaoqiongZ@users.noreply.github.com>
Date: Tue, 10 Sep 2024 19:21:35 +0800
Subject: [PATCH] Update release note and known issues (#4749)

* remove basekit activate & add OCL_ICD_VENDORS setting
* add python execution example
* update CCL_ROOT setting with multi-gpu usage
* update compile bundle bat branch to release tag
* update example in getting started
* correct version of intel-level-zero-gpu
* remove deprecated api
* set version of accelerate for finetune
* fix model name
* Update deepspeed in requirements.txt
* update release note and known issue
* update ipex.llm.optimize
* update IPEX_LOG

---------

Co-authored-by: Jing Xu
Co-authored-by: jundu <52649791+1pikachu@users.noreply.github.com>
Co-authored-by: Ye Ting
Co-authored-by: zhuyuhua-v
---
 README.md                                     |  2 +-
 csrc/gpu/utils/LogImpl.cpp                    |  5 +-
 docs/tutorials/api_doc.rst                    | 72 ++++-------------
 docs/tutorials/features.rst                   | 22 ++----
 docs/tutorials/features/ipex_log.md           |  7 +-
 docs/tutorials/features/simple_trace.md       |  4 +-
 docs/tutorials/getting_started.md             | 13 ++--
 docs/tutorials/known_issues.md                | 50 +++++++-----
 docs/tutorials/llm.rst                        |  5 +-
 .../llm/int4_weight_only_quantization.md      | 23 +++---
 .../llm/llm_optimize_transformers.md          |  7 +-
 docs/tutorials/releases.md                    | 77 +++++++++++++++++++
 docs/tutorials/technical_details.rst          |  2 +-
 examples/gpu/llm/README.md                    |  2 +
 examples/gpu/llm/fine-tuning/Llama2/README.md |  2 +-
 examples/gpu/llm/fine-tuning/Phi3/README.md   |  2 +-
 examples/gpu/llm/fine-tuning/README.md        |  4 +-
 examples/gpu/llm/fine-tuning/requirements.txt |  1 +
 examples/gpu/llm/inference/README.md          |  2 +-
 examples/gpu/llm/inference/requirements.txt   |  1 +
 examples/gpu/llm/requirements.txt             |  1 -
 scripts/compile_bundle.bat                    |  2 +-
 .../basekit_driver_install_helper.sh          |  4 +-
 23 files changed, 175 insertions(+), 135 deletions(-)

diff --git a/README.md b/README.md
index 7f3e6b4c6..2e5e4b166 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ The extension can be loaded as a Python module for Python programs or linked as
 
 In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [LLM optimizations CPU](./examples/cpu/inference/python/llm) and [LLM optimizations GPU](./examples/gpu/llm) for details.
 
-### Optimized Model List
+### Validated Model List
 
 #### LLM Inference
 
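The "Validated Model List" renamed above covers models exercised through `ipex.llm.optimize`, the API this patch documents in the `api_doc.rst` hunk below. A minimal sketch of that inference flow, assuming an XPU build of Intel® Extension for PyTorch\* and Hugging Face `transformers`; the model id and generation settings are illustrative assumptions, not part of the patch:

```python
# Sketch: LLM inference via ipex.llm.optimize (assumed setup, not from this patch).
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model = model.eval().to("xpu")

# Apply the LLM-specific optimizations referenced by this patch.
model = ipex.llm.optimize(model, dtype=torch.float16, device="xpu")

prompt = tokenizer("What is Intel Extension for PyTorch?", return_tensors="pt").to("xpu")
with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
    output = model.generate(**prompt, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The autocast line mirrors the BF16/FP16-on-GPU pattern shown in the `getting_started.md` hunk later in this patch.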
diff --git a/csrc/gpu/utils/LogImpl.cpp b/csrc/gpu/utils/LogImpl.cpp
index a7a9abed1..ebc330f79 100644
--- a/csrc/gpu/utils/LogImpl.cpp
+++ b/csrc/gpu/utils/LogImpl.cpp
@@ -15,7 +15,7 @@ spdlog::level::level_enum get_log_level_from_int(int level) {
     return spdlog::level::critical;
   } else {
     throw std::runtime_error(
-        "USING error log level for IPEX_LOGGING, log level should be -1 to 5, but met " +
+        "USING error log level for IPEX_LOG, log level should be -1 to 5, but met " +
         std::string{level});
   }
 }
@@ -231,7 +231,7 @@ void EventLogger::print_result(int log_level) {
   if (this->message_queue.size() >= 2) {
     auto next_time = this->message_queue.front().timestamp;
     auto next_step = this->message_queue.front().step_id;
-    // inside IPEX_LOGGING we are using nanoseconds, 1ns = 0.001us, cast to us
+    // inside IPEX_LOG we are using nanoseconds, 1ns = 0.001us, cast to us
     // here
     auto time_step = static_cast<float>((next_time - this_time) / 1000);
     log_result_with_args(
@@ -283,3 +283,4 @@ void BasicLogger::update_logger() {
   logger->set_pattern("[%c %z] [%l] [thread %t] %v");
   spdlog::set_default_logger(logger);
 }
+
diff --git a/docs/tutorials/api_doc.rst b/docs/tutorials/api_doc.rst
index d755dcd96..3d35dd00e 100644
--- a/docs/tutorials/api_doc.rst
+++ b/docs/tutorials/api_doc.rst
@@ -6,66 +6,12 @@ General
 .. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: optimize
-.. autofunction:: optimize_transformers
+.. currentmodule:: intel_extension_for_pytorch.llm
+.. autofunction:: optimize
+.. currentmodule:: intel_extension_for_pytorch
 .. autofunction:: get_fp32_math_mode
 .. autofunction:: set_fp32_math_mode
-
-Miscellaneous
-=============
-
-.. currentmodule:: intel_extension_for_pytorch.xpu
-.. StreamContext
-.. can_device_access_peer
-.. current_blas_handle
-.. autofunction:: current_device
-.. autofunction:: current_stream
-.. default_stream
-.. autoclass:: device
-.. autofunction:: device_count
-.. autoclass:: device_of
-.. autofunction:: get_device_name
-.. autofunction:: get_device_properties
-.. get_gencode_flags
-.. get_sync_debug_mode
-.. autofunction:: init
-.. ipc_collect
-.. autofunction:: is_available
-.. autofunction:: is_initialized
-.. memory_usage
-.. autofunction:: set_device
-.. set_stream
-.. autofunction:: stream
-.. autofunction:: synchronize
-
-.. currentmodule:: intel_extension_for_pytorch.xpu.fp8.fp8
-.. autofunction:: fp8_autocast
-
-
-Random Number Generator
-=======================
-
-.. currentmodule:: intel_extension_for_pytorch.xpu
-.. autofunction:: get_rng_state
-.. autofunction:: get_rng_state_all
-.. autofunction:: set_rng_state
-.. autofunction:: set_rng_state_all
-.. autofunction:: manual_seed
-.. autofunction:: manual_seed_all
-.. autofunction:: seed
-.. autofunction:: seed_all
-.. autofunction:: initial_seed
-
-Streams and events
-==================
-
-.. currentmodule:: intel_extension_for_pytorch.xpu
-.. autoclass:: Stream
-    :members:
-.. ExternalStream
-.. autoclass:: Event
-    :members:
-
 Memory management
 =================
@@ -92,9 +38,17 @@ Memory management
 .. autofunction:: memory_stats_as_nested_dict
 .. autofunction:: reset_accumulated_memory_stats
 
+
+Quantization
+============
+
+.. currentmodule:: intel_extension_for_pytorch.quantization.fp8
+.. autofunction:: fp8_autocast
+
+
 C++ API
 =======
 
-.. doxygenenum:: xpu::FP32_MATH_MODE
+.. doxygenenum:: torch_ipex::xpu::FP32_MATH_MODE
 
-.. doxygenfunction:: xpu::set_fp32_math_mode
+.. doxygenfunction:: torch_ipex::xpu::set_fp32_math_mode
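The `api_doc.rst` hunk above keeps `get_fp32_math_mode`/`set_fp32_math_mode` in the Python API and moves the C++ entries to the `torch_ipex::xpu` namespace. A minimal sketch of the Python side, assuming an XPU device and the `FP32MathMode` enum values (`FP32`, `TF32`, `BF32`) from the public IPEX documentation:

```python
# Sketch: controlling implicit reduced-precision math for FP32 ops on XPU.
import torch
import intel_extension_for_pytorch as ipex

# Allow TF32 acceleration for FP32 GEMMs.
ipex.set_fp32_math_mode(mode=ipex.FP32MathMode.TF32, device="xpu")
print(ipex.get_fp32_math_mode(device="xpu"))  # expected: FP32MathMode.TF32

a = torch.randn(512, 512, device="xpu")
b = torch.randn(512, 512, device="xpu")
c = a @ b  # runs under the math mode set above

# Restore strict FP32 behavior.
ipex.set_fp32_math_mode(mode=ipex.FP32MathMode.FP32, device="xpu")
```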
diff --git a/docs/tutorials/features.rst b/docs/tutorials/features.rst
index b4687da34..597629675 100644
--- a/docs/tutorials/features.rst
+++ b/docs/tutorials/features.rst
@@ -137,19 +137,6 @@ For more detailed information, check `torch.compile for GPU <features/torch_compile_gpu.md>`_.
 
-
-.. toctree::
-   :hidden:
-   :maxdepth: 1
-
-   features/simple_trace
-
 Kineto Supported Profiler Tool (Prototype)
 ------------------------------------------
@@ -178,13 +165,13 @@ For more detailed information, check `Compute Engine <features/compute_engine.md>`_.
 
-`IPEX_LOGGING` (Prototype)
---------------------------
+`IPEX_LOG` (Prototype)
+----------------------
 
-For more detailed information, check `IPEX_LOGGING <features/ipex_log.md>`_.
+For more detailed information, check `IPEX_LOG <features/ipex_log.md>`_.
 
 .. toctree::
    :hidden:
    :maxdepth: 1
 
    features/ipex_log
+
diff --git a/docs/tutorials/features/ipex_log.md b/docs/tutorials/features/ipex_log.md
index b872e4e46..1e468eb35 100644
--- a/docs/tutorials/features/ipex_log.md
+++ b/docs/tutorials/features/ipex_log.md
@@ -1,11 +1,11 @@
-`IPEX_LOGGING` (Prototype)
+`IPEX_LOG` (Prototype)
 ==========================
 
 ## Introduction
 
-`IPEX_LOGGING` provides the capability to log verbose information from Intel® Extension for PyTorch\*. Please use `IPEX_LOGGING` to get the log information or trace the execution from Intel® Extension for PyTorch\*. Please continue using PyTorch\* macros such as `TORCH_CHECK`, `TORCH_ERROR`, etc. to get the log information from PyTorch\*.
+`IPEX_LOG` provides the capability to log verbose information from Intel® Extension for PyTorch\*. Use `IPEX_LOG` to get log information from, or trace the execution of, Intel® Extension for PyTorch\*; continue using PyTorch\* macros such as `TORCH_CHECK` and `TORCH_ERROR` to get log information from PyTorch\* itself.
 
-## `IPEX_LOGGING` Definition
+## `IPEX_LOG` Definition
 
 ### Log Level
 The supported log levels are defined as follows; the default log level is `DISABLED`:
@@ -81,3 +81,4 @@ Use `torch.xpu.set_log_level(0)` to get logs to replace the previous usage in `I
 ## Replace `IPEX_VERBOSE`
 
 Use `torch.xpu.set_log_level(1)` to get logs to replace the previous usage in `IPEX_VERBOSE`.
+
diff --git a/docs/tutorials/features/simple_trace.md b/docs/tutorials/features/simple_trace.md
index aa73f89dc..92671e646 100644
--- a/docs/tutorials/features/simple_trace.md
+++ b/docs/tutorials/features/simple_trace.md
@@ -1,5 +1,5 @@
-Simple Trace Tool (Prototype)
-=============================
+Simple Trace Tool (Deprecated)
+==============================
 
 ## Introduction
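The `ipex_log.md` hunk above replaces the previous verbosity controls with `torch.xpu.set_log_level`. A minimal sketch of that usage, assuming an XPU device; the workload is illustrative, and the `-1` "disable" value is an assumption inferred from the `-1 to 5` range accepted in the `LogImpl.cpp` hunk:

```python
# Sketch: enabling IPEX_LOG output at runtime (assumed workload).
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

torch.xpu.set_log_level(1)  # per the hunk above, replaces IPEX_VERBOSE=1

x = torch.randn(1024, 1024, device="xpu")
y = x @ x  # operations executed here are traced by IPEX_LOG
torch.xpu.synchronize()

torch.xpu.set_log_level(-1)  # assumption: -1 maps to DISABLED, the documented default
```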
diff --git a/docs/tutorials/getting_started.md b/docs/tutorials/getting_started.md
index d1895c615..d96d2fe7c 100644
--- a/docs/tutorials/getting_started.md
+++ b/docs/tutorials/getting_started.md
@@ -32,9 +32,9 @@ model = ipex.optimize(model, dtype=dtype)
 ########## FP32 ############
 with torch.no_grad():
 ####### BF16 on CPU ########
-with torch.no_grad(), with torch.cpu.amp.autocast():
+with torch.no_grad(), torch.cpu.amp.autocast():
 ##### BF16/FP16 on GPU #####
-with torch.no_grad(), with torch.xpu.amp.autocast(enabled=True, dtype=dtype, cache_enabled=False):
+with torch.no_grad(), torch.xpu.amp.autocast(enabled=True, dtype=dtype, cache_enabled=False):
 ############################
 ###### Torchscript #######
 model = torch.jit.trace(model, data)
@@ -49,13 +49,14 @@ More examples, including training and usage of low precision data types are avai
 
 ## Execution
 
-Execution requires an active Intel® oneAPI environment. Suppose you have the Intel® oneAPI Base Toolkit installed in `/opt/intel/oneapi` directory, activating the environment is as simple as sourcing its environment activation bash scripts.
-
 There are some environment variables in runtime that can be used to configure executions on GPU. Please check [Advanced Configuration](./features/advanced_configuration.html#runtime-configuration) for more detailed information.
 
+Set `OCL_ICD_VENDORS` to the default ICD directory `/etc/OpenCL/vendors`.
+Set `CCL_ROOT` if you are running on multiple GPUs.
+
 ```bash
-source /opt/intel/oneapi/compiler/latest/env/vars.sh
-source /opt/intel/oneapi/mkl/latest/env/vars.sh
+export OCL_ICD_VENDORS=/etc/OpenCL/vendors
+export CCL_ROOT=${CONDA_PREFIX}
 python