
Performance problem of gemm_a16w8 #12

Open
xiaonans opened this issue Jul 31, 2024 · 0 comments

xiaonans commented Jul 31, 2024

I tested the performance of the gemm_a16w8 kernel on an AMD MI200 and found that, when M is large, it is slower than both PyTorch (rocBLAS) and Triton's GEMM example (https://github.com/xiaonans/triton-gemm-benchmark/blob/main/03-matrix-multiplication.py).
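To compare kernels across different values of M on a common scale, measured times are usually converted to effective TFLOP/s using the standard 2·M·N·K operation count for a GEMM. A minimal sketch follows; the function name is illustrative, and the count deliberately ignores any extra dequantization work the a16w8 kernel may do, so all three implementations are scored against the same nominal FLOPs.

```python
def gemm_tflops(m: int, n: int, k: int, time_ms: float) -> float:
    """Effective throughput of an (m x k) @ (k x n) GEMM in TFLOP/s.

    Each of the m * n outputs needs k multiplies and k adds, i.e. 2 * m * n * k
    floating-point operations in total.
    """
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    flops = 2.0 * m * n * k
    seconds = time_ms * 1e-3
    return flops / seconds / 1e12
```

For example, a 4096 x 4096 x 4096 GEMM that finishes in 1.0 ms corresponds to about 137.4 TFLOP/s.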

I attached my performance testing results below:
[Image: performance testing results]

For my tests, I added some code so that autotuning runs the first time and the best config is saved; later benchmark runs then reuse the saved best_config. My changes are in main...xiaonans:FLASHNN:main. I ran the test with python tests/quant_gemm/test_gemm_weight_only.py.
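The actual changes are only in the linked comparison and are not reproduced here. The following is a minimal, hypothetical sketch of the general pattern described (tune once, save the best config, reuse it on later runs), written in plain Python so it runs without a GPU. The names `load_or_tune` and `time_call`, the JSON cache file, the shape key, and the config fields such as `BLOCK_M` are illustrative assumptions, not FLASHNN's API.

```python
import json
import os
import time
from typing import Callable, Dict, List

# Hypothetical config shape: tile sizes such as {"BLOCK_M": 128, "BLOCK_N": 64}.
Config = Dict[str, int]


def time_call(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock time of fn() in milliseconds.

    For asynchronous GPU launches, fn must block until the kernel finishes
    (for example by calling torch.cuda.synchronize()); otherwise only the
    launch overhead is measured.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) * 1e3 / iters


def load_or_tune(
    cache_path: str,
    key: str,
    configs: List[Config],
    make_run: Callable[[Config], Callable[[], object]],
    bench: Callable[[Callable[[], object]], float] = time_call,
) -> Config:
    """Return the cached best config for `key`, autotuning only on a cache miss."""
    cache: Dict[str, Config] = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key in cache:
        return cache[key]  # Later runs: reuse the saved config, no tuning.

    # First run for this shape: benchmark every candidate and keep the fastest.
    best = min(configs, key=lambda cfg: bench(make_run(cfg)))
    cache[key] = best
    with open(cache_path, "w") as f:
        json.dump(cache, f, indent=2)
    return best
```

A key such as f"{M}x{N}x{K}" keeps a separate best config per problem shape, which matters here because the best tiling for small M is often different from the best tiling for large M. For real GPU timing, a synchronizing timer such as Triton's `triton.testing.do_bench` could likely be passed as `bench` instead of `time_call`.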

Are these performance results expected, or is there something I missed?
