Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README #1

Merged
merged 2 commits into from
Sep 27, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
156 changes: 76 additions & 80 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,30 +1,26 @@
# Omnitrace: Application Profiling, Tracing, and Analysis
# ROCm Systems Profiler: Application Profiling, Tracing, and Analysis

[![Ubuntu 20.04 with GCC, ROCm, and MPI](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml)
[![Ubuntu 22.04 (GCC, Python, ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml)
[![OpenSUSE 15.x with GCC](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml)
[![RedHat Linux (GCC, Python, ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml)
[![Installer Packaging (CPack)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml)
[![Documentation](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml)
[![Ubuntu 20.04 with GCC, ROCm, and MPI](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-focal.yml)
[![Ubuntu 22.04 (GCC, Python, ROCm)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-jammy.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-jammy.yml)
[![OpenSUSE 15.x with GCC](https://github.com/ROCm/rocprofiler-systems/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/opensuse.yml)
[![RedHat Linux (GCC, Python, ROCm)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/redhat.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/redhat.yml)
[![Installer Packaging (CPack)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/cpack.yml)
[![Documentation](https://github.com/ROCm/rocprofiler-systems/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/docs.yml)

> [!NOTE]
> Perfetto validation is done with trace_processor v46.0, as there is a known issue with v47.0.
If you are experiencing problems viewing your trace in the latest version of [Perfetto](http://ui.perfetto.dev), then try using [Perfetto UI v46.0](https://ui.perfetto.dev/v46.0-35b3d9845/#!/).

## Overview

AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems.
If you are familiar with [rocprof](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/how-to/using-rocprof.html) and/or [uProf](https://developer.amd.com/amd-uprof/),
you will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities.

Omnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU.
ROCm Systems Profiler (rocprof-sys), formerly Omnitrace, is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU.
It is capable of gathering the performance information of functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks.
Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics.
In addition to runtimes, omnitrace supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics
ROCm Systems Profiler supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics.
In addition to runtimes, ROCm Systems Profiler supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics
such as the memory usage, page-faults, and context-switches, and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.

> [!NOTE]
> Full documentation is available at [Omnitrace documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html) in an organized, easy-to-read, searchable format.
> Full documentation is available at [ROCm Systems Profiler documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html) in an organized, easy-to-read, searchable format.
The documentation source files reside in the [`/docs`](/docs) folder of this repository. For information on contributing to the documentation, see
[Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html)

Expand Down Expand Up @@ -95,50 +91,50 @@ The documentation source files reside in the [`/docs`](/docs) folder of this rep

### Installation

- Visit [Releases](https://github.com/ROCm/omnitrace/releases) page
- Visit [Releases](https://github.com/ROCm/rocprofiler-systems/releases) page
- Select appropriate installer (recommendation: `.sh` scripts do not require super-user priviledges unlike the DEB/RPM installers)
- If targeting a ROCm application, find the installer script with the matching ROCm version
- If you are unsure about your Linux distro, check `/etc/os-release` or use the `omnitrace-install.py` script
- If you are unsure about your Linux distro, check `/etc/os-release` or use the `rocprof-sys-install.py` script

If the above recommendation is not desired, download the `omnitrace-install.py` and specify `--prefix <install-directory>` when
If the above recommendation is not desired, download the `rocprof-sys-install.py` and specify `--prefix <install-directory>` when
executing it. This script will attempt to auto-detect a compatible OS distribution and version.
If ROCm support is desired, specify `--rocm X.Y` where `X` is the ROCm major version and `Y`
is the ROCm minor version, e.g. `--rocm 5.4`.

```console
wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py
python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4
wget https://github.com/ROCm/rocprofiler-systems/releases/latest/download/rocprof-sys-install.py
python3 ./rocprof-sys-install.py --prefix /opt/rocprof-sys/rocm-5.4 --rocm 5.4
```

See the [Omnitrace installation guide](https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html) for detailed information.
See the [ROCm Systems Profiler installation guide](https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html) for detailed information.

### Setup

> NOTE: Replace `/opt/omnitrace` below with installation prefix as necessary.
> NOTE: Replace `/opt/rocprof-sys` below with installation prefix as necessary.

- Option 1: Source `setup-env.sh` script

```bash
source /opt/omnitrace/share/omnitrace/setup-env.sh
source /opt/rocprof-sys/share/rocprof-sys/setup-env.sh
```

- Option 2: Load modulefile

```bash
module use /opt/omnitrace/share/modulefiles
module load omnitrace
module use /opt/rocprof-sys/share/modulefiles
module load rocprof-sys
```

- Option 3: Manual

```bash
export PATH=/opt/omnitrace/bin:${PATH}
export LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH}
export PATH=/opt/rocprof-sys/bin:${PATH}
export LD_LIBRARY_PATH=/opt/rocprof-sys/lib:${LD_LIBRARY_PATH}
```

### Omnitrace Settings
### ROCm Systems Profiler Settings

Generate an omnitrace configuration file using `omnitrace-avail -G omnitrace.cfg`. Optionally, use `omnitrace-avail -G omnitrace.cfg --all` for
Generate a rocprof-sys configuration file using `rocprof-sys-avail -G rocprof-sys.cfg`. Optionally, use `rocprof-sys-avail -G rocprof-sys.cfg --all` for
a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable
[perfetto](https://perfetto.dev/), [timemory](https://github.com/NERSC/timemory), sampling, and process-level sampling by default
and tweak some sampling default values:
Expand All @@ -155,47 +151,47 @@ OMNITRACE_SAMPLING_CPUS = all
OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES
```

Once the configuration file is adjusted to your preferences, either export the path to this file via `OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg`
or place this file in `${HOME}/.omnitrace.cfg` to ensure these values are always read as the default. If you wish to change any of these settings,
Once the configuration file is adjusted to your preferences, either export the path to this file via `OMNITRACE_CONFIG_FILE=/path/to/rocprof-sys.cfg`
or place this file in `${HOME}/.rocprof-sys.cfg` to ensure these values are always read as the default. If you wish to change any of these settings,
you can override them via environment variables or by specifying an alternative `OMNITRACE_CONFIG_FILE`.

### Call-Stack Sampling

The `omnitrace-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation.
Use a double-hypen (`--`) to separate the command-line arguments for `omnitrace-sample` from the target application and it's arguments.
The `rocprof-sys-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation.
Use a double-hypen (`--`) to separate the command-line arguments for `rocprof-sys-sample` from the target application and it's arguments.

```shell
omnitrace-sample --help
omnitrace-sample <omnitrace-options> -- <exe> <exe-options>
omnitrace-sample -f 1000 -- ls -la
rocprof-sys-sample --help
rocprof-sys-sample <rocprof-sys-options> -- <exe> <exe-options>
rocprof-sys-sample -f 1000 -- ls -la
```

### Binary Instrumentation

The `omnitrace` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside
The `rocprof-sys-instrument` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside
the execution an instrumented binary, to help "fill in the gaps" between the instrumentation via setting the `OMNITRACE_USE_SAMPLING`
configuration variable to `ON`.
Similar to `omnitrace-sample`, use a double-hypen (`--`) to separate the command-line arguments for `omnitrace` from the target application and it's arguments.
Similar to `rocprof-sys-sample`, use a double-hypen (`--`) to separate the command-line arguments for `rocprof-sys-instrument` from the target application and it's arguments.

```shell
omnitrace-instrument --help
omnitrace-instrument <omnitrace-options> -- <exe-or-library> <exe-options>
rocprof-sys-instrument --help
rocprof-sys-instrument <rocprof-sys-options> -- <exe-or-library> <exe-options>
```

#### Binary Rewrite

Rewrite the text section of an executable or library with instrumentation:

```shell
omnitrace-instrument -o app.inst -- /path/to/app
rocprof-sys-instrument -o app.inst -- /path/to/app
```

In binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries.
Example of rewriting the functions starting with `"hip"` with instrumentation in the amdhip64 library:

```shell
mkdir -p ./lib
omnitrace-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
rocprof-sys-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}
```

Expand All @@ -206,33 +202,33 @@ Once you have rewritten your executable and/or libraries with instrumentation, y
or exectuable which loads the instrumented libraries normally, e.g.:

```shell
omnitrace-run -- ./app.inst
rocprof-sys-run -- ./app.inst
```

If you want to re-define certain settings to new default in a binary rewrite, use the `--env` option. This `omnitrace` option
If you want to re-define certain settings to new default in a binary rewrite, use the `--env` option. This `rocprof-sys` option
will set the environment variable to the given value but will not override it. E.g. the default value of `OMNITRACE_PERFETTO_BUFFER_SIZE_KB`
is 1024000 KB (1 GiB):

```shell
# buffer size defaults to 1024000
omnitrace-instrument -o app.inst -- /path/to/app
omnitrace-run -- ./app.inst
rocprof-sys-instrument -o app.inst -- /path/to/app
rocprof-sys-run -- ./app.inst
```

Passing `--env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000` will change the default value in `app.inst` to 5120000 KiB (5 GiB):

```shell
# defaults to 5 GiB buffer size
omnitrace-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app
omnitrace-run -- ./app.inst
rocprof-sys-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app
rocprof-sys-run -- ./app.inst
```

```shell
# override default 5 GiB buffer size to 200 MB via command-line
omnitrace-run --trace-buffer-size=200000 -- ./app.inst
rocprof-sys-run --trace-buffer-size=200000 -- ./app.inst
# override default 5 GiB buffer size to 200 MB via environment
export OMNITRACE_PERFETTO_BUFFER_SIZE_KB=200000
omnitrace-run -- ./app.inst
rocprof-sys-run -- ./app.inst
```

#### Runtime Instrumentation
Expand All @@ -242,35 +238,35 @@ linked libraries. Thus, it may be useful to exclude those libraries via the `-ME
or exclude specific functions with the `-E` regex option.

```shell
omnitrace-instrument -- /path/to/app
omnitrace-instrument -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app
omnitrace-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
rocprof-sys-instrument -- /path/to/app
rocprof-sys-instrument -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app
rocprof-sys-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
```

### Python Profiling and Tracing

Use the `omnitrace-python` script to profile/trace Python interpreter function calls.
Use a double-hypen (`--`) to separate the command-line arguments for `omnitrace-python` from the target script and it's arguments.
Use the `rocprof-sys-python` script to profile/trace Python interpreter function calls.
Use a double-hypen (`--`) to separate the command-line arguments for `rocprof-sys-python` from the target script and it's arguments.

```shell
omnitrace-python --help
omnitrace-python <omnitrace-options> -- <python-script> <script-args>
omnitrace-python -- ./script.py
rocprof-sys-python --help
rocprof-sys-python <rocprof-sys-options> -- <python-script> <script-args>
rocprof-sys-python -- ./script.py
```

Please note, the first argument after the double-hyphen *must be a Python script*, e.g. `omnitrace-python -- ./script.py`.
Please note, the first argument after the double-hyphen *must be a Python script*, e.g. `rocprof-sys-python -- ./script.py`.

If you need to specify a specific python interpreter version, use `omnitrace-python-X.Y` where `X.Y` is the Python
If you need to specify a specific python interpreter version, use `rocprof-sys-python-X.Y` where `X.Y` is the Python
major and minor version:

```shell
omnitrace-python-3.8 -- ./script.py
rpcprof-sys-python-3.8 -- ./script.py
```

If you need to specify the full path to a Python interpreter, set the `PYTHON_EXECUTABLE` environment variable:

```shell
PYTHON_EXECUTABLE=/opt/conda/bin/python omnitrace-python -- ./script.py
PYTHON_EXECUTABLE=/opt/conda/bin/python rocprof-sys-python -- ./script.py
```

If you want to restrict the data collection to specific function(s) and its callees, pass the `-b` / `--builtin` option after decorating the
Expand All @@ -297,51 +293,51 @@ for `foo` via the direct call within `spam`. There will be no entries for `bar`

- Visit [ui.perfetto.dev](https://ui.perfetto.dev) in the web-browser
- Select "Open trace file" from panel on the left
- Locate the omnitrace perfetto output (extension: `.proto`)
- Locate the rocprof-sys perfetto output (extension: `.proto`)

![omnitrace-perfetto](docs/data/omnitrace-perfetto.png)
![rocprof-sys-perfetto](docs/data/omnitrace-perfetto.png)

![omnitrace-rocm](docs/data/omnitrace-rocm.png)
![rocprof-sys-rocm](docs/data/omnitrace-rocm.png)

![omnitrace-rocm-flow](docs/data/omnitrace-rocm-flow.png)
![rocprof-sys-rocm-flow](docs/data/omnitrace-rocm-flow.png)

![omnitrace-user-api](docs/data/omnitrace-user-api.png)
![rocprof-sys-user-api](docs/data/omnitrace-user-api.png)

## Using Perfetto tracing with System Backend

Perfetto tracing with the system backend supports multiple processes writing to the same
output file. Thus, it is a useful technique if Omnitrace is built with partial MPI support
output file. Thus, it is a useful technique if rocprof-sys is built with partial MPI support
because all the perfetto output will be coalesced into a single file. The
installation docs for perfetto can be found [here](https://perfetto.dev/docs/contributing/build-instructions).
If you are building omnitrace from source, you can configure CMake with `OMNITRACE_INSTALL_PERFETTO_TOOLS=ON`
If you are building rocprof-sys from source, you can configure CMake with `OMNITRACE_INSTALL_PERFETTO_TOOLS=ON`
and the `perfetto` and `traced` applications will be installed as part of the build process. However,
it should be noted that to prevent this option from accidentally overwriting an existing perfetto install,
all the perfetto executables installed by omnitrace are prefixed with `omnitrace-perfetto-`, except for the `perfetto`
executable, which is just renamed `omnitrace-perfetto`.
all the perfetto executables installed by omnitrace are prefixed with `rocprof-sys-perfetto-`, except for the `perfetto`
executable, which is just renamed `rocprof-sys-perfetto`.

Enable `traced` and `perfetto` in the background:

```shell
pkill traced
traced --background
perfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background
perfetto --out ./rocprof-sys-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background
```

> ***NOTE: if the perfetto tools were installed by omnitrace, replace `traced` with `omnitrace-perfetto-traced` and***
> ***`perfetto` with `omnitrace-perfetto`.***
> ***NOTE: if the perfetto tools were installed by rocprof-sys, replace `traced` with `rocprof-sys-perfetto-traced` and***
> ***`perfetto` with `rocprof-sys-perfetto`.***

Configure omnitrace to use the perfetto system backend via the `--perfetto-backend` option of `omnitrace-run`:
Configure rocprof-sys to use the perfetto system backend via the `--perfetto-backend` option of `rocprof-sys-run`:

```shell
# enable sampling on the uninstrumented binary
omnitrace-run --sample --trace --perfetto-backend=system -- ./myapp
rocprof-sys-run --sample --trace --perfetto-backend=system -- ./myapp
# trace the instrument the binary
omnitrace-instrument -o ./myapp.inst -- ./myapp
omnitrace-run --trace --perfetto-backend=system -- ./myapp.inst
rocprof-sys-instrument -o ./myapp.inst -- ./myapp
rocprof-sys-run --trace --perfetto-backend=system -- ./myapp.inst
```

or via the `--env` option of `omnitrace-instrument` + runtime instrumentation:
or via the `--env` option of `rocprof-sys-instrument` + runtime instrumentation:

```shell
omnitrace-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp
rocprof-sys-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp
```