-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SYCL: Fixes for building SYCL backend for AMD GPUs #10851
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -75,7 +75,7 @@ else() | |
message(ERROR "Can't enable SYCL hip backend, GGML_SYCL_DEVICE_ARCH has not been set.") | ||
endif() | ||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsycl-targets=amdgcn-amd-amdhsa") | ||
target_link_libraries(ggml-sycl PRIVATE sycl pthread m dl onemkl) | ||
target_link_libraries(ggml-sycl PRIVATE sycl pthread m dl onemath_blas_rocblas) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this needed exactly?
If there are any issues it may be easier to stick to a slightly older version of oneMKL Interface for now: uxlfoundation/oneMath@c00154c There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When the onemkl is there it throws an error despite the AMD compiled version of oneMKL being in the LD_LIBRARY_PATH. I can't explain why it worked necessarily, DLL-hell is not my strong suit, I can only say that after building the Codeplay onAPI linked in the SYCL.md doc, no matter what I did cmake would fail complaining about onemkl until I tracked down the linking here and modified it. Have you gotten this to work successfully without this? I'm using the 2025.0.0 / rocm-6.1.0-linux version from the codeplay site. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I see, thanks. We can either try to understand this if you paste the error log or we can try to reproduce the issue on our side once we have time. Most of us are on holiday so this will take some time. We have not tested on AMD devices recently. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense, I have a bunch of year-end stuff as well, we can revisit in the new year or when #10584 makes some progress, not much point building successfully if it doesn't run anyway. 😅 Happy holidays! |
||
endif() | ||
|
||
if (GGML_SYCL_DEVICE_ARCH) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding
CMAKE_SHARED_LINKER_FLAGS
should not be needed, I'm worried this could hide a deeper issue. The CMake targets should add all the flags needed. Can you describe what was the error without this change and how do you set up your environment?Also note that all the
mkl
flags are referring to the Intel oneMKL product which runs only on Intel devices. This is different from oneMath (previously known as oneMKL Interface) which can dispatch to other SYCL devices.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another thing where maybe it shouldn't be needed, and maybe hints at some bigger problem, but until I started stuffing in the additional linker flags, it would fail. While maybe not optimal, these were close to the minimum number of changes required for me to successfully build the SYCL backend for my RDNA3 / gfx1100 GPU, and for me, following the instructions/using the code as it is in HEAD 100% failed for me.
This is obviously a very niche codepath (and like I mentioned, even after building reveals that the code itself has problems running), so I if someone else has built this recently and it works, feel free to close out the PR, but I went through the instructions very carefully and I'd expect others to see the same errors I did.
(Personally, I was just curious to see what the inference speed would be after doing some recent
llama-bench
testing and being surprised that tg128 perf on the hipified ROCm backend was actually slower than Vulkan backend in recent builds and was curious to see how the SYCL backend would compare.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a side note, how does it compare? I'm happy to hear Vulkan tg performance is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, sadly after compiling SYCL successfully, I end up getting errors with
gemm_batch
when trying to benchmark: #10850But, here's the results for Vulkan vs ROCm backends. While Vulkan predictably loses pp512, it manages to beat ROCm on tg128. The W7900 has 864.0 GB/s of theoretical MBW, so the Vulkan inferences at 49.0% of that while the the ROCm backend is at 40.8%. (Just as a point of comparison, my 3090 with the CUDA backend gets 69.1% of MBW for tg128)