Problem Description

The `estimate_matmul` functionality in Triton relies rather heavily on the underlying stats of the GPU. On CUDA platforms, this functionality is realised by calling `nvidia-smi` and then parsing the results. I see that this code is still present in this fork of Triton:

triton/python/triton/testing.py, line 12 in 35edd6a
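
For context, the helper referenced above is essentially a thin wrapper around `nvidia-smi`'s CSV query interface. The sketch below reflects the upstream helper in `python/triton/testing.py`; exact details may differ in this fork's copy:

```python
# Roughly how Triton's CUDA path gathers GPU stats: shell out to
# nvidia-smi and parse its CSV output.
import subprocess

def nvsmi(attrs):
    # e.g. attrs = ["clocks.current.sm", "clocks.current.memory"]
    cmd = [
        "nvidia-smi", "-i", "0",
        "--query-gpu=" + ",".join(attrs),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.check_output(cmd)
    return [int(x) for x in out.decode().split(",")]
```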
Would it be possible to get support added for `rocm-smi` here instead? This would make autotuning Triton kernels for GEMM etc. much easier.
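
A `rocm-smi`-backed equivalent could plausibly be a very small addition. The sketch below is illustrative only: it assumes `rocm-smi`'s `-d`, `--showclocks`, and `--json` flags, and the JSON layout (per-card keys like `card0`) may vary across ROCm versions:

```python
# Hypothetical rocm-smi analogue of the nvsmi() helper above.
import json
import subprocess

def rocmsmi_clocks(device=0):
    cmd = ["rocm-smi", "-d", str(device), "--showclocks", "--json"]
    out = subprocess.check_output(cmd)
    data = json.loads(out)
    # rocm-smi keys its JSON output per card, e.g. "card0"; clock values
    # come back as strings such as "(1700Mhz)", so a parsing step
    # analogous to the CSV handling in nvsmi() would still be needed.
    return data.get(f"card{device}", {})
```

Dispatching between a helper like this and the existing CUDA one based on the active backend would presumably be all `testing.py` needs.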
Operating System
CPU
GPU
AMD Instinct MI300X
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
@joerowell We can add it later after we merge this fork with upstream.

For GEMM tuning, we have a dedicated script. You can refer to this README for more info, and let me know if you have further questions.