
Assertion error on gemm_splitk_benchmark.py #2377

Open
etiotto opened this issue Sep 27, 2024 · 3 comments · May be fixed by #2717
Labels: bug (Something isn't working), tests: ut

etiotto (Contributor) commented on Sep 27, 2024

USE_IPEX=0 python gemm_splitk_benchmark.py

/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
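(Aside: the two UserWarning lines above come from computing torch.std over too few timing samples for Bessel's correction. A minimal sketch of the behaviour and a possible workaround, not taken from the benchmark code:)

```python
import torch

times = torch.tensor([0.123])  # a single timing sample (hypothetical)

# With the default correction=1 (Bessel's correction) and only one sample,
# the degrees of freedom are <= 0: torch warns and returns NaN.
print(torch.std(times))

# The population standard deviation avoids the warning (on older PyTorch,
# unbiased=False is the equivalent spelling); collecting more reps would, too.
print(torch.std(times, correction=0))
```

The run then fails in the correctness check: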
```
Traceback (most recent call last):
  File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 172, in <module>
    benchmark.run(show_plots=False, print_data=True)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 373, in run
    result_dfs.append(self._run(bench, save_path, show_plots, print_data, **kwargs))
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 307, in _run
    ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args, **kwrags)
  File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 159, in benchmark
    benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=1e-4, rtol=rtol, err_msg='triton to torch')
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 190, in assert_close
    np.testing.assert_allclose(x, y, atol=atol, rtol=rtol, equal_nan=True)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.0001

Mismatched elements: 10485760 / 16777216 (62.5%)
Max absolute difference: 14077.262
Max relative difference: 14.093543
 x: array([[14117.621  , 14472.084  , 14199.322  , ..., 14278.562  ,
        14391.052  , 14581.361  ],
       [14417.741  , 14243.687  , 13900.123  , ..., 14021.29   ,...
 y: array([[ 992., 1020., 1004., ..., 1008., 1016., 1016.],
       [ 984., 1000.,  980., ...,  988.,  996., 1004.],
       [ 992., 1016.,  984., ..., 1008., 1016., 1012.],...
```
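One way to narrow down which side of the failing assert_close is off (the Triton split-K output or the torch XPU matmul used as the reference) would be to compare each of them against a float32 CPU reference. The sketch below is hypothetical: the shape, dtypes, and the torch_fn path are assumptions read off the trace above, not the benchmark's actual code, and an XPU device is assumed to be available.

```python
import numpy as np
import torch

M = N = K = 4096  # assumed: the 4Kx4Kx4K case discussed in the comments below

a = torch.randn((M, K), dtype=torch.float16)
b = torch.randn((K, N), dtype=torch.float16)

ref = (a.float() @ b.float()).numpy()                              # float32 CPU reference
xpu = torch.matmul(a.to('xpu'), b.to('xpu')).cpu().float().numpy()

# Same tolerances as the failing check in benchmark_testing.assert_close.
np.testing.assert_allclose(xpu, ref, atol=1e-4, rtol=1e-2, equal_nan=True)

# Repeating the comparison with the Triton split-K output would show which
# of the two tensors carries the ~14x larger values reported above.
```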
etiotto (Contributor, Author) commented on Oct 1, 2024

Took #2378 from @LiyangLingIntel because it is related to #2374 (which I currently own). Giving @LiyangLingIntel this one, as it is related to the streamk implementation he worked on.

etiotto removed their assignment on Oct 2, 2024
etiotto (Contributor, Author) commented on Oct 2, 2024

I believe this fails only for the 4Kx4Kx4K shape, so I am reducing the priority and deferring it to make room for other, more important work items.

LiyangLingIntel (Contributor) commented

> I believe this fails only for the 4Kx4Kx4K shape, so I am reducing the priority and deferring it to make room for other, more important work items.

In my local test it works with `USE_IPEX=1 python gemm_splitk_benchmark.py`. We can investigate further the difference between the IPEX and upstream PyTorch GEMM implementations for XPU.
I agree we can reduce the priority of this issue and come back to it once the more important work items are done.
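A hypothetical way to do that comparison: save the torch XPU matmul result under each stack and diff the tensors offline. The shape and dtype below mirror the failing case; the file name is arbitrary, and an XPU device is assumed.

```python
import torch

torch.manual_seed(0)  # identical inputs across both runs
a = torch.randn((4096, 4096), dtype=torch.float16)  # generated on CPU for reproducibility
b = torch.randn((4096, 4096), dtype=torch.float16)

out = torch.matmul(a.to('xpu'), b.to('xpu')).cpu()
# Rename per build, e.g. gemm_xpu_ipex.pt (USE_IPEX=1) vs gemm_xpu_upstream.pt (USE_IPEX=0).
torch.save(out, 'gemm_xpu_out.pt')
```

Loading the two files and checking torch.allclose / the maximum absolute difference would then show whether the torch GEMM itself diverges between IPEX and upstream.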
