Commit

Consolidate last 2 tabs
peterjunpark committed Jun 6, 2024
1 parent 0d9cef4 commit 915f342
Showing 1 changed file with 54 additions and 59 deletions.
113 changes: 54 additions & 59 deletions docs/tutorial/saxpy.rst
@@ -54,7 +54,7 @@ speaking, you can compute this using a single ``for`` loop over three arrays.
z[i] = a * x[i] + y[i];

In linear algebra libraries, such as BLAS (Basic Linear Algebra Subprograms), this
operation is defined as AXPY "A times X Plus Y". The term SAXPY refers to the
operation is defined as AXPY "A times X Plus Y". The term SAXPY refers to the
single-precision version of this operation
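
As a minimal host-side sketch of the loop described above (not the tutorial's GPU kernel), SAXPY can be written in plain C++ as follows. The function name ``saxpy`` and the choice to return a new vector ``z`` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-precision AXPY on the host: returns z with z[i] = a * x[i] + y[i].
// Hypothetical helper for illustration; the tutorial's sample runs this on a GPU.
std::vector<float> saxpy(float a, const std::vector<float>& x, const std::vector<float>& y)
{
    // Both input arrays must have the same length.
    assert(x.size() == y.size());

    std::vector<float> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        z[i] = a * x[i] + y[i];
    return z;
}
```

Calling ``saxpy(2.0f, {1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})`` yields the three values 12, 24 and 36.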

The "S" comes from
@@ -483,7 +483,7 @@ find out what device binary flavors are embedded into the executable?
that a compute capability 5.2 ISA got embedded into the executable, so devices
which sport compute capability 5.2 or newer will be able to run this code.

.. tab-item:: Windows & AMD
.. tab-item:: Windows and AMD
:sync: windows-amd

The HIP SDK for Windows doesn't yet sport the ``roc-*`` set of utilities to work
@@ -630,6 +630,21 @@ format our available devices use.
Name: gfx906
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Now that you know which graphics IPs our devices use, recompile your program with
the appropriate parameters.
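
The value passed to ``--offload-arch`` matches the part of the reported ISA name that follows the ``--`` separator (``gfx906:sramecc+:xnack-`` in the listing above). A small sketch of that extraction is shown here. The helper name ``offload_arch_from_isa_name`` is hypothetical, and the sketch assumes the ISA name has the ``amdgcn-amd-amdhsa--<target>`` form seen in the listing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the text after the first "--" in an ISA name,
// for example "gfx906:sramecc+:xnack-" from
// "amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-".
// If no "--" separator is present, the input is returned unchanged.
std::string offload_arch_from_isa_name(const std::string& isa_name)
{
    const std::string separator = "--";
    const std::size_t pos = isa_name.find(separator);
    if (pos == std::string::npos)
        return isa_name;
    return isa_name.substr(pos + separator.size());
}
```

A short name such as ``gfx906`` contains no separator and passes through unchanged.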

.. code-block:: bash

   amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --offload-arch=gfx906:sramecc+:xnack-

Now the sample will run.

.. code-block::

   ./saxpy
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

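
The printed values can be cross-checked with a host-only reference. The sketch below is not the sample's code: it assumes ``a = 2``, ``x = 1, 2, 3, ...`` and ``y`` filled with ones, which is one set of inputs consistent with the output shown, and the function name ``host_reference`` is hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host reference for y[i] = a * x[i] + y[i].
// Assumed inputs (not confirmed by the text): a = 2, x = 1, 2, 3, ..., y = all ones.
std::vector<float> host_reference(std::size_t n)
{
    const float a = 2.0f;
    std::vector<float> x(n);
    std::vector<float> y(n, 1.0f);
    std::iota(x.begin(), x.end(), 1.0f); // x = 1, 2, ..., n

    // Update y in place, matching the formula printed by the sample.
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    return y;
}
```

Under these assumptions, the first ten elements of ``host_reference(1000000)`` are 3, 5, 7, 9, 11, 13, 15, 17, 19 and 21.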
.. tab-item:: Linux and NVIDIA
:sync: linux-nvidia

@@ -662,6 +677,26 @@ format our available devices use.
executable but is used by ``nvcc`` to determine what devices are in the
system at hand.

Now that you know which graphics IPs our devices use, recompile your program with
the appropriate parameters.

.. code-block:: bash

   nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu -arch=sm_70,sm_86

.. note::

   If you want to portably target the development machine which is compiling, you
   may specify ``-arch=native`` instead.

Now the sample will run.

.. code-block::

   ./saxpy
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

.. tab-item:: Windows and AMD
:sync: windows-amd

@@ -676,6 +711,21 @@ format our available devices use.
gcnArchName: gfx1032
gcnArchName: gfx1035
Now that you know which graphics IPs our devices use, recompile your program with
the appropriate parameters.

.. code-block:: powershell

   clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2 --offload-arch=gfx1032 --offload-arch=gfx1035

Now the sample will run.

.. code-block::

   .\saxpy.exe
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

.. tab-item:: Windows and NVIDIA
:sync: windows-nvidia

@@ -709,63 +759,8 @@ format our available devices use.
facing executable but is used by ``nvcc`` to determine what devices are in the
system at hand.

Now that you know which graphics IPs our devices use, recompile your program with
the appropriate parameters.

.. tab-set::

.. tab-item:: Linux and AMD
:sync: linux-amd

.. code-block:: bash

   amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 --offload-arch=gfx906:sramecc+:xnack-

Now the sample will run.

.. code-block::

   ./saxpy
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

.. tab-item:: Linux and NVIDIA
:sync: linux-nvidia

.. code-block:: bash

   nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu -arch=sm_70,sm_86

.. note::

   If you want to portably target the development machine which is compiling, you
   may specify ``-arch=native`` instead.

Now the sample will run.

.. code-block::

   ./saxpy
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

.. tab-item:: Windows and AMD
:sync: windows-amd

.. code-block:: powershell

   clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2 --offload-arch=gfx1032 --offload-arch=gfx1035

Now the sample will run.

.. code-block::

   .\saxpy.exe
   Calculating y[i] = a * x[i] + y[i] over 1000000 elements.
   First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ]

.. tab-item:: Windows and NVIDIA
:sync: windows-nvidia
Now that you know which graphics IPs our devices use, recompile your program with
the appropriate parameters.

.. code-block:: powershell