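We have achieved good performance (relative to the XeTLA library) for a GEMM kernel (see http://benchmarks.glados.intel.com/d/1pXX4hUSz/microbenchmarks?orgId=1). Now it is time to focus on improving the performance of several variants of the GEMM workload:
- GEMM with a preOp (e.g. exp) applied to one input of `tt.dot` (https://github.com/intel/intel-xpu-backend-for-triton/blob/main/benchmarks/triton_kernels_benchmark/gemm_preop_exp_benchmark.py)
- GEMM with a postOp (GELU) applied to the `tt.dot` output (https://github.com/intel/intel-xpu-backend-for-triton/blob/main/benchmarks/triton_kernels_benchmark/gemm_postop_gelu_benchmark.py)
- GEMM with a matrix add (postOp) applied to the `tt.dot` output (https://github.com/intel/intel-xpu-backend-for-triton/blob/main/benchmarks/triton_kernels_benchmark/gemm_postop_addmatrix_benchmark.py)

Work Items
- preOp on a `tt.dot` operand: #2346
- `AllocateSharedMemoryPass` has possibility to allocate SLM size greater than device max share memory #1716
- `gemm_postop_addmatrix_benchmark.py` with #2378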
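For reference, the three GEMM variants this issue targets (preOp exp, postOp GELU, postOp matrix add) can be described with a small plain-Python model. This is only a sketch of the math, not the Triton kernels from the benchmarks: the function names are hypothetical, exp is assumed to be applied to the first operand, and GELU is assumed to use the tanh approximation, which may differ from what the benchmark does.

```python
import math


def matmul(a, b):
    """Reference matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gemm_preop_exp(a, b):
    """preOp variant: exp is applied elementwise to one input before the dot.

    Assumption: the preOp is applied to the first operand (a).
    """
    a_exp = [[math.exp(x) for x in row] for row in a]
    return matmul(a_exp, b)


def gelu(x):
    """tanh approximation of GELU (assumed; the benchmark may use another form)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def gemm_postop_gelu(a, b):
    """postOp variant: GELU is applied elementwise to the dot output."""
    return [[gelu(x) for x in row] for row in matmul(a, b)]


def gemm_postop_addmatrix(a, b, d):
    """postOp variant: a matrix d of the output shape is added to the dot output."""
    c = matmul(a, b)
    return [[x + y for x, y in zip(row_c, row_d)] for row_c, row_d in zip(c, d)]
```

In each case the extra elementwise work sits directly before or after the dot, which is why these variants are tracked next to the plain GEMM kernel.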
For GEMM + preOp (e.g. exp) applied to one input of `tt.dot` (https://github.com/intel/intel-xpu-backend-for-triton/blob/main/benchmarks/triton_kernels_benchmark/gemm_preop_exp_benchmark.py), PR #2346 improves performance by 16%.
For GEMM + matrix add (postOp), PR #2400 improves performance from ~66 TFLOPS to ~215 TFLOPS for an 8Kx8Kx8K shape (other shapes also improve).
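To put the TFLOPS figures in terms of kernel time, here is a rough back-of-the-envelope sketch. It assumes that "8K" means 8192, that throughput is computed with the usual 2*M*N*K FLOP count for a GEMM, and that the extra M*N additions of the postOp are ignored. The helper names are hypothetical.

```python
def gemm_flops(m, n, k):
    """Conventional FLOP count for a GEMM: one multiply and one add per MAC."""
    return 2 * m * n * k


def time_ms(flops, tflops):
    """Kernel time in milliseconds implied by a throughput given in TFLOPS."""
    return flops / (tflops * 1e12) * 1e3


# Assumption: "8K" in the comment above means 8192.
M = N = K = 8192
flops = gemm_flops(M, N, K)

before_ms = time_ms(flops, 66.0)   # approximate throughput before PR #2400
after_ms = time_ms(flops, 215.0)   # approximate throughput after PR #2400
speedup = 215.0 / 66.0

print(f"before: {before_ms:.2f} ms, after: {after_ms:.2f} ms, speedup: {speedup:.2f}x")
```

Under these assumptions the reported numbers correspond to a speedup of a bit more than 3x for this shape.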
This is done.
etiotto