forked from ggerganov/llama.cpp
vulkan: Optimize contiguous copies (ggerganov#10254)
* tests: Fix memory bandwidth calculation for perf tests

  Add a flops calculation for flash attention.

  Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

  Add a variant of the copy shader for when the tensors are contiguous. Avoid the complex addressing calculations, and do four elements per invocation to hide some other overhead.

  Apply similar changes to the scale shader, since scale is always contiguous.

  Add a "progress bar" for shader compiles.
1 parent 54ef9cf, commit 80dd7ff
Showing 13 changed files with 144 additions and 27 deletions.
@@ -0,0 +1,42 @@
#version 450

#include "types.comp"
#include "generic_unary_head.comp"

#extension GL_EXT_control_flow_attributes : require

const uint num_threads = 128;

layout(local_size_x = num_threads, local_size_y = 1, local_size_z = 1) in;

void main() {
    uint idx = get_idx();

    // num_threads * num_iter must equal 512, to match the wg_denoms and get_idx calculation
    const uint num_iter = 4;

    // fast path for when all four iterations are in-bounds
    if (idx + (num_iter-1)*num_threads < p.ne) {
        [[unroll]] for (uint i = 0; i < num_iter; ++i) {
#ifndef OPTIMIZATION_ERROR_WORKAROUND
            data_d[p.d_offset + idx] = D_TYPE(data_a[idx]);
#else
            data_d[p.d_offset + idx] = data_a[idx];
#endif
            idx += num_threads;
        }
    } else {
        [[unroll]] for (uint i = 0; i < num_iter; ++i) {
            if (idx >= p.ne) {
                continue;
            }

#ifndef OPTIMIZATION_ERROR_WORKAROUND
            data_d[p.d_offset + idx] = D_TYPE(data_a[idx]);
#else
            data_d[p.d_offset + idx] = data_a[idx];
#endif
            idx += num_threads;
        }
    }
}
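To make the indexing in the shader above easier to check, here is a rough CPU emulation in C++. It is only a sketch under stated assumptions: `get_idx()` lives in `generic_unary_head.comp` and is not shown in this hunk, so the emulation assumes each workgroup owns 512 consecutive elements and thread `t` starts at `group * 512 + t`. The names `copy_invocation`, `copy_contig_sim`, and the `k...` constants are hypothetical, and the `D_TYPE` conversion and `p.d_offset` are left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical constants mirroring the shader: 128 threads, 4 elements each.
constexpr uint32_t kNumThreads = 128;
constexpr uint32_t kNumIter    = 4;
constexpr uint32_t kPerGroup   = kNumThreads * kNumIter; // 512 elements per workgroup

// Emulates one shader invocation. `idx` is the thread's first element
// (ASSUMPTION: group * 512 + local thread id; the real get_idx() is not shown).
void copy_invocation(const std::vector<float>& src, std::vector<float>& dst,
                     std::vector<uint32_t>& writes, uint32_t idx) {
    const uint32_t ne = static_cast<uint32_t>(src.size());
    if (idx + (kNumIter - 1) * kNumThreads < ne) {
        // Fast path: the last strided element is in bounds, so all four are.
        for (uint32_t i = 0; i < kNumIter; ++i) {
            dst[idx] = src[idx];
            ++writes[idx];
            idx += kNumThreads;
        }
    } else {
        // Tail path: bounds-check every element. idx only grows, so once it
        // is out of bounds the remaining iterations are skipped as well.
        for (uint32_t i = 0; i < kNumIter; ++i) {
            if (idx >= ne) {
                continue;
            }
            dst[idx] = src[idx];
            ++writes[idx];
            idx += kNumThreads;
        }
    }
}

// Emulates the whole dispatch and returns how often each element was written.
std::vector<uint32_t> copy_contig_sim(const std::vector<float>& src,
                                      std::vector<float>& dst) {
    assert(dst.size() == src.size());
    const uint32_t ne = static_cast<uint32_t>(src.size());
    std::vector<uint32_t> writes(ne, 0);
    const uint32_t groups = (ne + kPerGroup - 1) / kPerGroup; // round up
    for (uint32_t g = 0; g < groups; ++g) {
        for (uint32_t t = 0; t < kNumThreads; ++t) {
            copy_invocation(src, dst, writes, g * kPerGroup + t);
        }
    }
    return writes;
}
```

Calling `copy_contig_sim` with a length that is not a multiple of 512 exercises both paths: full workgroups take the fast path, and only threads in the last, partial workgroup fall back to the per-element check. Consecutive threads touch consecutive elements in each iteration, which is the access pattern the 128-element stride is meant to preserve.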