
[SYCL] Re-enabled mul_mat_batched_sycl #8095

Merged: 1 commit merged into master on Jun 25, 2024

Conversation

airMeng (Collaborator) commented on Jun 24, 2024:

Cherry-picked from #8057; thanks @OuadiElfarouki and @joeatodd.
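
For readers skimming the change, here is a minimal sketch of the kind of dispatch this re-enables, assuming it routes contiguous batched matmuls through a single batched GEMM instead of a per-matrix loop. All names, types, and the dispatch condition below are illustrative assumptions, not the actual diff; the real code path this PR turns back on is mul_mat_batched_sycl in the SYCL backend.

// Hypothetical sketch only -- not the actual llama.cpp code.
#include <cstdio>

struct Tensor { int ne2; int ne3; bool contiguous; };  // batch dims + layout

static void mul_mat_batched(const Tensor & a, const Tensor &) {
    // One strided batched GEMM covering all ne2*ne3 matrices at once.
    std::printf("batched GEMM over %d matrices\n", a.ne2 * a.ne3);
}

static void mul_mat_loop(const Tensor & a, const Tensor &) {
    // Fallback: launch a separate GEMM per matrix in the batch.
    for (int i = 0; i < a.ne2 * a.ne3; ++i) std::puts("single GEMM");
}

static void mul_mat_dispatch(const Tensor & a, const Tensor & b) {
    // Re-enabled path: prefer the batched kernel when layouts allow it.
    if (a.contiguous && b.contiguous) mul_mat_batched(a, b);
    else                              mul_mat_loop(a, b);
}

int main() {
    Tensor a{32, 1, true}, b{32, 1, true};
    mul_mat_dispatch(a, b);  // takes the batched path
}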

Before

hengyume@mlp-618:~/llama.cpp/master$ ./bin/llama-bench -m ~/llama-2-7b.Q4_0.gguf -ngl 99 -sm none -mg 0
...
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|               Intel Arc A770M Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.28202|
...
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         pp512 |     11.10 ± 3.48 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         tg128 |     26.38 ± 0.15 |

build: d62e4aaa (3215)

After

hengyume@mlp-618:~/llama.cpp/build$ ./bin/llama-bench -m ~/llama-2-7b.Q4_0.gguf -ngl 99 -sm none -mg 0
...
|  |                   |                                       |       |Max    |        |Max  |Global |                     |
|  |                   |                                       |       |compute|Max work|sub  |mem    |                     |
|ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driver version|
|--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|---------------------|
| 0| [level_zero:gpu:0]|               Intel Arc A770M Graphics|    1.3|    512|    1024|   32| 16225M|            1.3.28202|
...
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         pp512 |  212.07 ± 335.47 |
| llama 7B Q4_0                  |   3.56 GiB |     6.74 B | SYCL       |  99 |  none |         tg128 |     28.74 ± 0.13 |

build: a8a9f6c1 (3216)
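
(For reading the tables above: pp512 is prompt-processing throughput for a 512-token prompt and tg128 is text-generation throughput for 128 tokens, both in tokens/s. The batched path mainly helps prompt processing, roughly 11 → 212 t/s here, albeit with high run-to-run variance, versus a smaller generation gain of 26.4 → 28.7 t/s.)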

airMeng requested a review from OuadiElfarouki on June 24, 2024 at 13:01.

OuadiElfarouki (Contributor) commented:

Sounds good! We wanted to first confirm the performance improvement on master from this patch alone (since sycl-main includes other improvements), but it should be fine at this level!

The github-actions bot added the SYCL label (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Jun 24, 2024.
airMeng force-pushed the sycl-mul-mat-batched branch from a8a9f6c to de26543 on June 25, 2024 at 00:09.
airMeng merged commit 083bacc into master on Jun 25, 2024 (64 checks passed).
airMeng deleted the sycl-mul-mat-batched branch on June 25, 2024 at 02:19.
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Jun 30, 2024.
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request on Jul 1, 2024.