
Assertion error on gemm_splitk_benchmark.py #2377

Open
etiotto opened this issue Sep 27, 2024 · 3 comments · May be fixed by #2717
Labels: bug (Something isn't working), tests: ut

etiotto (Contributor) commented on Sep 27, 2024

USE_IPEX=0 python gemm_splitk_benchmark.py

/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
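(Aside: the two UserWarning lines above come from computing torch.std over too few timing samples for Bessel's correction. A minimal sketch of the behaviour and a possible workaround, not taken from the benchmark code:)

```python
import torch

times = torch.tensor([0.123])  # a single timing sample (hypothetical)

# With the default correction=1 (Bessel's correction) and only one sample,
# the degrees of freedom are <= 0: torch warns and returns NaN.
print(torch.std(times))

# The population standard deviation avoids the warning (on older PyTorch,
# unbiased=False is the equivalent spelling); collecting more reps would, too.
print(torch.std(times, correction=0))
```

The run then fails in the correctness check: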
```
Traceback (most recent call last):
  File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 172, in <module>
    benchmark.run(show_plots=False, print_data=True)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 373, in run
    result_dfs.append(self._run(bench, save_path, show_plots, print_data, **kwargs))
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 307, in _run
    ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args, **kwrags)
  File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 159, in benchmark
    benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=1e-4, rtol=rtol, err_msg='triton to torch')
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 190, in assert_close
    np.testing.assert_allclose(x, y, atol=atol, rtol=rtol, equal_nan=True)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=0.01, atol=0.0001

Mismatched elements: 10485760 / 16777216 (62.5%)
Max absolute difference: 14077.262
Max relative difference: 14.093543
 x: array([[14117.621  , 14472.084  , 14199.322  , ..., 14278.562  ,
        14391.052  , 14581.361  ],
       [14417.741  , 14243.687  , 13900.123  , ..., 14021.29   ,...
 y: array([[ 992., 1020., 1004., ..., 1008., 1016., 1016.],
       [ 984., 1000.,  980., ...,  988.,  996., 1004.],
       [ 992., 1016.,  984., ..., 1008., 1016., 1012.],...
```
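One way to narrow down which side of the failing assert_close is off (the Triton split-K output or the torch XPU matmul used as the reference) would be to compare each of them against a float32 CPU reference. The sketch below is hypothetical: the shape, dtypes, and the torch_fn path are assumptions read off the trace above, not the benchmark's actual code, and an XPU device is assumed to be available.

```python
import numpy as np
import torch

M = N = K = 4096  # assumed: the 4Kx4Kx4K case discussed in the comments below

a = torch.randn((M, K), dtype=torch.float16)
b = torch.randn((K, N), dtype=torch.float16)

ref = (a.float() @ b.float()).numpy()                              # float32 CPU reference
xpu = torch.matmul(a.to('xpu'), b.to('xpu')).cpu().float().numpy()

# Same tolerances as the failing check in benchmark_testing.assert_close.
np.testing.assert_allclose(xpu, ref, atol=1e-4, rtol=1e-2, equal_nan=True)

# Repeating the comparison with the Triton split-K output would show which
# of the two tensors carries the ~14x larger values reported above.
```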
etiotto (Contributor, Author) commented on Oct 1, 2024

Took #2378 from @LiyangLingIntel because it is related to #2374 (which I currently own). Giving @LiyangLingIntel this one, as it is related to the streamk implementation he worked on.

etiotto removed their assignment on Oct 2, 2024
etiotto (Contributor, Author) commented on Oct 2, 2024

I believe this fails only for the 4Kx4Kx4K shape, so I am reducing the priority and deferring it to make room for other, more important work items.

LiyangLingIntel (Contributor) commented

> I believe this fails only for the 4Kx4Kx4K shape, so I am reducing the priority and deferring it to make room for other, more important work items.

In my local test it works with `USE_IPEX=1 python gemm_splitk_benchmark.py`. We can investigate further the difference between the IPEX and upstream PyTorch GEMM implementations for XPU.
I agree we can reduce the priority of this issue and come back to it once the more important work items are done.
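A hypothetical way to do that comparison: save the torch XPU matmul result under each stack and diff the tensors offline. The shape and dtype below mirror the failing case; the file name is arbitrary, and an XPU device is assumed.

```python
import torch

torch.manual_seed(0)  # identical inputs across both runs
a = torch.randn((4096, 4096), dtype=torch.float16)  # generated on CPU for reproducibility
b = torch.randn((4096, 4096), dtype=torch.float16)

out = torch.matmul(a.to('xpu'), b.to('xpu')).cpu()
# Rename per build, e.g. gemm_xpu_ipex.pt (USE_IPEX=1) vs gemm_xpu_upstream.pt (USE_IPEX=0).
torch.save(out, 'gemm_xpu_out.pt')
```

Loading the two files and checking torch.allclose / the maximum absolute difference would then show whether the torch GEMM itself diverges between IPEX and upstream.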
