I tested the performance of the gemm_a16w8 kernel on an AMD MI200 and found that, when M is large, it is slower than both PyTorch (rocBLAS) and Triton's GEMM example (https://github.com/xiaonans/triton-gemm-benchmark/blob/main/03-matrix-multiplication.py).
I attached my performance testing results below:
For these measurements, I added some code so that autotuning runs once on the first invocation and the benchmark then uses the saved best_config. The changes I made are main...xiaonans:FLASHNN:main. I run the test with:
python tests/quant_gemm/test_gemm_weight_only.py
Are these results expected, or is there something I missed?
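For readers unfamiliar with the workflow, here is a minimal sketch of the "autotune once, then benchmark with the saved config" idea. It is not the code from the linked branch: `load_or_tune`, the JSON cache file, the `run_kernel` callback, and the config fields are all hypothetical names chosen for illustration.

```python
import json
import os
import time


def load_or_tune(cache_path, candidates, run_kernel):
    """Return the best config, tuning only when no cached result exists.

    Hypothetical helper: `run_kernel(config)` stands in for launching
    the kernel once with the given config.
    """
    # Reuse the saved result so later benchmark runs skip tuning.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)

    best_config, best_time = None, float("inf")
    for config in candidates:
        run_kernel(config)  # warm-up run, excluded from timing
        start = time.perf_counter()
        run_kernel(config)
        # For a real GPU kernel, synchronize the device here before
        # reading the clock, since kernel launches are asynchronous.
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_config, best_time = config, elapsed

    # Persist the winner for subsequent runs.
    with open(cache_path, "w") as f:
        json.dump(best_config, f)
    return best_config
```

One thing to check with this kind of setup is that the cached config is keyed by problem shape: a config tuned for a small M may be a poor fit when M is large.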