Improve out-of-box performance for GEMM kernel variants #2379

etiotto · 2024-09-27T18:05:41Z

We have achieved good performance (relative to the XeTLA library) for a GEMM kernel (see http://benchmarks.glados.intel.com/d/1pXX4hUSz/microbenchmarks?orgId=1). Now it is time to focus on improving the performance of several variants of the GEMM workload:

Work Items

etiotto · 2024-09-27T18:09:08Z

etiotto · 2024-10-02T17:01:42Z

For GEMM + matrix add (postOp), PR #2400 improves performance from ~66 TFLOPS to ~215 TFLOPS for an 8Kx8Kx8K shape (other shapes also improve).
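To put those numbers in context, here is a rough back-of-the-envelope sketch of what they imply for kernel runtime. It assumes "8K" means 8192, counts each multiply-add as 2 FLOPs, and adds one FLOP per output element for the matrix add; the helper names are illustrative only and are not taken from the benchmark code.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B, counting each multiply-add as 2 operations."""
    return 2 * m * n * k


def runtime_ms(flops: int, rate_tflops: float) -> float:
    """Runtime in milliseconds implied by a sustained rate in TFLOPS."""
    return flops / (rate_tflops * 1e12) * 1e3


def tflops(flops: int, seconds: float) -> float:
    """Achieved TFLOPS for a given FLOP count and elapsed time."""
    return flops / seconds / 1e12


# Assumption: "8K" in the comment above means 8192.
M = N = K = 8192

# GEMM plus one addition per output element for the matrix-add post-op.
total_flops = gemm_flops(M, N, K) + M * N

before_ms = runtime_ms(total_flops, 66.0)   # rate quoted before PR #2400
after_ms = runtime_ms(total_flops, 215.0)   # rate quoted after PR #2400
speedup = 215.0 / 66.0

print(f"before: {before_ms:.2f} ms, after: {after_ms:.2f} ms, speedup: {speedup:.2f}x")
```

The post-op adds only M*N operations on top of 2*M*N*K, so under these assumptions it barely changes the FLOP count.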

etiotto · 2024-10-15T13:59:54Z

This is done.

etiotto added the umbrella label Sep 27, 2024

vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 30, 2024

vlad-penkin added the performance label Sep 30, 2024

vlad-penkin assigned etiotto Sep 30, 2024

vlad-penkin added the codegen: gemm and enhancement labels Sep 30, 2024

etiotto closed this as completed Oct 2, 2024

etiotto reopened this Oct 2, 2024

etiotto closed this as completed Oct 15, 2024