
[PoC] Use SYCL runtime wheel instead of PTDB and g++ or clang instead of dpc++ to compile the kernel launcher #1717

Closed
vlad-penkin opened this issue Jul 29, 2024 · 15 comments · Fixed by #1857

Comments

@vlad-penkin (Contributor) commented Jul 29, 2024

Expected results:

  • Working prototype in the feature branch
  • Additional requirements if any for the SYCL runtime wheel
  • Detailed requirements for the CI integration
@vlad-penkin vlad-penkin added this to the 7. CI milestone Jul 29, 2024
@vlad-penkin vlad-penkin changed the title from "[PoC] Use SYCL runtime wheel instead of PTDB" to "[PoC] Use SYCL runtime wheel instead of PTDB and g++ or clang instead of dpc++" Aug 4, 2024
@vlad-penkin vlad-penkin changed the title from "[PoC] Use SYCL runtime wheel instead of PTDB and g++ or clang instead of dpc++" to "[PoC] Use SYCL runtime wheel instead of PTDB and g++ or clang instead of dpc++ to compile the kernel launcher" Aug 4, 2024
@ZzEeKkAa (Contributor) commented Aug 5, 2024

In my experience, https://pypi.org/project/intel-sycl-rt/ is the same package as the conda one (https://anaconda.org/conda-forge/intel-sycl-rt), except that we need to set some environment variables so the libraries can be found inside the Python environment. In fact, I was able to create a PoC Dockerfile with support for https://github.com/IntelPython/dpctl and https://github.com/IntelPython/numba-dpex.
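As a sketch of what that environment adjustment looks like, assuming the variables used in the setup steps later in this thread (LIBRARY_PATH, LD_LIBRARY_PATH, CPATH) are the relevant ones:

```python
import os

def sycl_rt_env(venv: str) -> dict:
    """Environment additions so libraries and headers from a
    venv-installed intel-sycl-rt wheel can be found.

    The variable names follow the setup commands later in this thread;
    treat them as an assumption, not documented wheel behavior."""
    lib = os.path.join(venv, "lib")
    return {
        "LIBRARY_PATH": lib,                             # link-time library lookup
        "LD_LIBRARY_PATH": lib,                          # run-time library lookup
        "CPATH": os.path.join(venv, "include", "sycl"),  # SYCL headers
    }

print(sycl_rt_env("/home/user/.venv"))
```

In practice these values would be appended to any existing variable contents (as the later export commands in this thread do), rather than replacing them.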

@ZzEeKkAa (Contributor) commented Aug 5, 2024

The only remaining thing is that we need to ask the release team to publish the 2024.1.4 release of this package (the same one used in PTDB, I guess).

@ZzEeKkAa (Contributor) commented Aug 5, 2024

There are multiple runtime (rt) packages for different purposes, listed here: https://github.com/conda-forge/intel-compiler-repack-feedstock/blob/main/recipe/meta.yaml, and there should be a corresponding package on PyPI for each. We just need to ask the release team to keep PyPI in sync.
cc: @xaleryb

@ZzEeKkAa (Contributor) commented Aug 6, 2024

So, I was able to create a Triton runtime environment without PTDB or the oneAPI toolkit:

Summary

I used PTDB 0.5.2.18 to build upstream PyTorch (main branch) and Intel's Triton (llvm-target branch) with some patches. I also repacked intel-sycl-rt (2024.1.2) with the SYCL headers from PTDB 0.5.2.18 (compiler 2024.1.3).

Environment setup

python3.9 -m venv ./.venv
source ./.venv/bin/activate
pip install --upgrade pip
pip install ./intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl ./torch-2.5.0a0+git7f58740-cp39-cp39-linux_x86_64.whl ./triton-3.0.0-cp39-cp39-linux_x86_64.whl
pip install dpcpp_cpp_rt==2024.1.2 numpy matplotlib pandas
# Fix up the hard-coded conda placeholder path in the OpenCL ICD files
sed -i "s/\/opt\/anaconda1anaconda2anaconda3/$(echo ${VIRTUAL_ENV} | sed 's/\//\\\//g')/g" $VIRTUAL_ENV/etc/OpenCL/vendors/*
rm -rf ~/.triton
export LIBRARY_PATH=$LIBRARY_PATH:$VIRTUAL_ENV/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$VIRTUAL_ENV/lib
export CPATH=$CPATH:$VIRTUAL_ENV/include/sycl
mkdir -p $VIRTUAL_ENV/lib/python3.9/site-packages/intel_extension_for_pytorch

Now you can run the tutorial (clang++ is used by default):

wget https://raw.githubusercontent.com/intel/intel-xpu-backend-for-triton/llvm-target/python/tutorials/01-vector-add.py
python ./01-vector-add.py

Or with g++:

CXX=g++ python ./01-vector-add.py

Building wheels

Pytorch

Version: upstream main branch
Patches:

Build options: static mkl linking

Triton

Version: Intel's upstream Triton (llvm-target branch)
Patches:

         if icpx is not None:
             cc_cmd += ["-fsycl"]
+        else:
+            cc_cmd += ["--std=gnu++17"]
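The patch above adjusts the launcher's compile command depending on which compiler was found: dpc++/icpx gets -fsycl, while a plain g++ or clang++ only needs the C++17 standard flag. A minimal sketch of that selection logic (the function name and the exact command shape are illustrative, not Triton's actual API):

```python
def build_cc_cmd(cc: str, src: str, out: str, icpx_found: bool) -> list:
    """Hypothetical sketch of the kernel-launcher compile command after
    the patch above. Only the flag selection mirrors the diff; the rest
    of the command line is an assumption for illustration."""
    cc_cmd = [cc, src, "-fPIC", "-shared", "-o", out]
    if icpx_found:
        cc_cmd += ["-fsycl"]          # dpc++/icpx: enable SYCL compilation
    else:
        cc_cmd += ["--std=gnu++17"]   # g++/clang++: just need C++17
    return cc_cmd

print(build_cc_cmd("g++", "launcher.cpp", "launcher.so", icpx_found=False))
```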

Patched SYCL runtime

Version: 2024.1.2
SYCL headers version: 2024.1.3 (PTDB release)

wget https://files.pythonhosted.org/packages/cc/1e/d74e608f0c040e4f72dbfcd3b183f39570f054d08de39cc431f153220d90/intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
wheel unpack intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
mkdir -p ./intel_sycl_rt-2024.1.2/intel_sycl_rt-2024.1.2.data/data/include
cp -r /opt/intel/oneapi/compiler/2024.1/include/sycl ./intel_sycl_rt-2024.*/intel_sycl_rt-2024.*.data/data/include/
wheel pack intel_sycl_rt-2024.1.2 --build headers_patch
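To confirm the repack actually placed the headers into the wheel (wheels are plain zip archives), a small check like this can be used; the path prefix below is the one created by the commands above:

```python
import zipfile

def wheel_contains(wheel_path: str, prefix: str) -> bool:
    """Return True if any entry in the wheel (a zip archive)
    starts with the given path prefix."""
    with zipfile.ZipFile(wheel_path) as wf:
        return any(name.startswith(prefix) for name in wf.namelist())

# Example (filenames from the commands above):
# wheel_contains(
#     "intel_sycl_rt-2024.1.2-headers_patch-py2.py3-none-manylinux1_x86_64.whl",
#     "intel_sycl_rt-2024.1.2.data/data/include/sycl/")
```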

@ZzEeKkAa (Contributor) commented Aug 12, 2024

UPD:

with #1857 you only need LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$VIRTUAL_ENV/lib VERBOSE=1 python ./01-vector-add.py to run Triton. LIBRARY_PATH and CPATH are set directly in the PR. However, all the other preparation steps are still needed. LD_LIBRARY_PATH is needed for PyTorch to find the SYCL installation.

@vlad-penkin vlad-penkin linked a pull request Aug 12, 2024 that will close this issue
@leshikus (Contributor) commented Aug 15, 2024

I wonder where ./torch-2.5.0a0+git7f58740-cp39-cp39-linux_x86_64.whl ./triton-3.0.0-cp39-cp39-linux_x86_64.whl come from?

@ZzEeKkAa (Contributor) commented:

> I wonder where ./torch-2.5.0a0+git7f58740-cp39-cp39-linux_x86_64.whl ./triton-3.0.0-cp39-cp39-linux_x86_64.whl come from?

You need to build them yourself. I guess Intel Triton's nightly builds will work too.

@leshikus (Contributor) commented:

I see that test-triton.sh already works with venv. I wonder if you plan to integrate your scenario into the standard build script.

@leshikus (Contributor) commented:

pip tells me:

ERROR: intel_sycl_rt-2024.1.2-headers_patch-py2.py3-none-manylinux1_x86_64.whl is not a valid wheel filename.

How did you overcome this?

@leshikus (Contributor) commented Aug 22, 2024

I've just copied the new file over the original one. I wonder if it is possible to keep a name that indicates the headers are inside. In that case one needs pip's --force-reinstall option, otherwise the package will be skipped if the original was already installed.
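For reference, the "not a valid wheel filename" error above is most likely because PEP 427 requires the optional build tag in a wheel filename to start with a digit, so "headers_patch" is rejected while a tag like "1patch" would be accepted. A small check illustrating the rule:

```python
def is_valid_build_tag(tag: str) -> bool:
    # PEP 427: the optional build tag in a wheel filename
    # must begin with a digit.
    return bool(tag) and tag[0].isdigit()

print(is_valid_build_tag("headers_patch"))  # False: pip rejects this filename
print(is_valid_build_tag("1patch"))         # True
```

So an alternative to renaming the wheel back to the original filename would be repacking with a digit-leading build tag.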

@leshikus (Contributor) commented Aug 22, 2024

I have most tests passing locally with this approach (though I've modified it a bit). The next step is to make it work in CI.

#!/bin/sh

set -euvx

name=${1:-triton}

rm -rf intel-xpu-backend-for-triton/
git clone https://github.com/intel/intel-xpu-backend-for-triton -b lesh/remove-flag

set +uvx
. ~/.conda/etc/profile.d/conda.sh
#conda deactivate
conda env remove -n $name
conda env remove -n dpcpp
set -uvx
set -e

conda create -y -n $name python=3.9.*
conda env update -n $name -f intel-xpu-backend-for-triton/scripts/triton.yml
#conda env update -n $name -f intel-xpu-backend-for-triton/scripts/basekit.yml

python -m venv ./.venv
. ./.venv/bin/activate
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-}:$VIRTUAL_ENV/lib
export CPATH=${CPATH:-}:$VIRTUAL_ENV/include:$VIRTUAL_ENV/include/sycl

rm -rf intel_sycl_rt-2024.1.2*
wget https://files.pythonhosted.org/packages/cc/1e/d74e608f0c040e4f72dbfcd3b183f39570f054d08de39cc431f153220d90/intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
wheel unpack intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
mkdir -p ./intel_sycl_rt-2024.1.2/intel_sycl_rt-2024.1.2.data/data/include
cp -r /opt/intel/oneapi/compiler/2024.1/include/sycl ./intel_sycl_rt-2024.*/intel_sycl_rt-2024.*.data/data/include/
wheel pack intel_sycl_rt-2024.1.2 --build headers_patch

mv intel_sycl_rt-2024.1.2-headers_patch-py2.py3-none-manylinux1_x86_64.whl intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
pip install --force-reinstall ./intel_sycl_rt-2024.1.2-py2.py3-none-manylinux1_x86_64.whl
pip install dpcpp_cpp_rt==2024.1.2 numpy matplotlib pandas

find /opt/intel/oneapi/mkl/2025.0/lib/ \( -name '*.so' -or -name '*.so.*' \) -exec cp -n {} $HOME/.conda/envs/$name/lib \;
find /opt/intel/oneapi/compiler/2024.1/lib/ \( -name '*.so' -or -name '*.so.*' \) -exec cp -n {} $HOME/.conda/envs/$name/lib \;


export LD_LIBRARY_PATH=$HOME/.conda/envs/$name/lib:${LD_LIBRARY_PATH:-}
ln -snf /usr/include/level_zero $HOME/.conda/envs/$name/bin/../x86_64-conda-linux-gnu/sysroot/usr/include/level_zero
find /usr -name libze_\* -exec ln -sf {} $HOME/.conda/envs/$name/lib/ \;

cd intel-xpu-backend-for-triton/
conda run --no-capture-output -n $name scripts/compile-triton.sh --triton 2>&1 | tee ../$name.log

conda run --no-capture-output -n $name bash -v -x scripts/test-triton.sh 2>&1 | tee -a ../$name.log

@ZzEeKkAa (Contributor) commented:

@leshikus thank you for confirming. As far as I can see, it is pretty much the same, with these differences:

  • conda with Python installed is used instead of a virtual environment
  • multiple conda packages were used instead of system-wide packages
  • MKL libraries were added to the environment instead of statically linking them at PyTorch build time

@leshikus (Contributor) commented Aug 24, 2024

yes, there are differences; both conda and venv are used; I'm testing the PR right now: #2000

  1. conda can be removed;
  2. I have no instruction how to compile mkl statically, thus I used the simple variant;
  3. another difference is that I need more compiler libraries;
  4. it is still a much smaller dependency set than the original basekit; thank you for your effort;
  5. more tests pass here than in the original conda-basekit workflow.

@leshikus (Contributor) commented Aug 24, 2024

@vlad-penkin what do you think about our strategic direction: should it be conda, venv, both, or neither?

@vlad-penkin vlad-penkin assigned vlad-penkin and anmyachev and unassigned ZzEeKkAa Aug 26, 2024
@anmyachev (Contributor) commented Aug 26, 2024

> I have no instruction how to compile mkl statically, thus I used the simple variant;

@leshikus these are probably just pip packages, according to the pytorch build script:

pip install mkl-static mkl-include

It might also be necessary to use: export USE_STATIC_MKL=1

anmyachev added a commit that referenced this issue Aug 27, 2024
Add support to https://pypi.org/project/intel-sycl-rt/ wheel package
that is described #1717

---------

Signed-off-by: Anatoly Myachev <[email protected]>
Co-authored-by: Anatoly Myachev <[email protected]>
@vlad-penkin vlad-penkin reopened this Sep 9, 2024
ZzEeKkAa added a commit to ZzEeKkAa/intel-xpu-backend-for-triton that referenced this issue Oct 29, 2024
Add support to https://pypi.org/project/intel-sycl-rt/ wheel package
that is described intel#1717

---------

Signed-off-by: Anatoly Myachev <[email protected]>
Co-authored-by: Anatoly Myachev <[email protected]>