
[SYCL] Re-enabled mul_mat_batched_sycl #8095

Merged: 1 commit merged into master on Jun 25, 2024

Conversation

airMeng (Collaborator) commented on Jun 24, 2024:

Cherry-picked from #8057; thanks @OuadiElfarouki and @joeatodd.
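
For readers skimming the change, here is a minimal sketch of the kind of dispatch this re-enables, assuming it routes contiguous batched matmuls through a single batched GEMM instead of a per-matrix loop. All names, types, and the dispatch condition below are illustrative assumptions, not the actual diff; the real code path this PR turns back on is mul_mat_batched_sycl in the SYCL backend.

// Hypothetical sketch only -- not the actual llama.cpp code.
#include <cstdio>

struct Tensor { int ne2; int ne3; bool contiguous; };  // batch dims + layout

static void mul_mat_batched(const Tensor & a, const Tensor &) {
    // One strided batched GEMM covering all ne2*ne3 matrices at once.
    std::printf("batched GEMM over %d matrices\n", a.ne2 * a.ne3);
}

static void mul_mat_loop(const Tensor & a, const Tensor &) {
    // Fallback: launch a separate GEMM per matrix in the batch.
    for (int i = 0; i < a.ne2 * a.ne3; ++i) std::puts("single GEMM");
}

static void mul_mat_dispatch(const Tensor & a, const Tensor & b) {
    // Re-enabled path: prefer the batched kernel when layouts allow it.
    if (a.contiguous && b.contiguous) mul_mat_batched(a, b);
    else                              mul_mat_loop(a, b);
}

int main() {
    Tensor a{32, 1, true}, b{32, 1, true};
    mul_mat_dispatch(a, b);  // takes the batched path
}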

Before

hengyume@mlp-618:~/llama.cpp/master$ ./bin/llama-bench -m ~/llama-2-7b.Q4_0.gguf -ngl 99 -sm none -mg 0
...
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|               Intel Arc A770M Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.28202|
...
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         pp512 |     11.10 ± 3.48 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         tg128 |     26.38 ± 0.15 |

build: d62e4aaa (3215)

After

hengyume@mlp-618:~/llama.cpp/build$ ./bin/llama-bench -m ~/llama-2-7b.Q4_0.gguf -ngl 99 -sm none -mg 0
...
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|               Intel Arc A770M Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.28202|
...
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         pp512 |  212.07 ± 335.47 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         tg128 |     28.74 ± 0.13 |

build: a8a9f6c1 (3216)
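
(For reading the tables above: pp512 is prompt-processing throughput for a 512-token prompt and tg128 is text-generation throughput for 128 tokens, both in tokens/s. The batched path mainly helps prompt processing, roughly 11 → 212 t/s here, albeit with high run-to-run variance, versus a smaller generation gain of 26.4 → 28.7 t/s.)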

airMeng requested a review from OuadiElfarouki on June 24, 2024 at 13:01.

OuadiElfarouki (Contributor) commented:

Sounds good! We wanted to first confirm the performance improvement on master from this patch alone (since sycl-main includes other improvements), but it should be fine at this level!

The github-actions bot added the SYCL label (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Jun 24, 2024.
airMeng force-pushed the sycl-mul-mat-batched branch from a8a9f6c to de26543 on June 25, 2024 at 00:09.
airMeng merged commit 083bacc into master on Jun 25, 2024 (64 checks passed).
airMeng deleted the sycl-mul-mat-batched branch on June 25, 2024 at 02:19.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Jun 30, 2024.
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request on Jul 1, 2024.