diff --git a/README.md b/README.md index 9abe9c8c..34c643a3 100755 --- a/README.md +++ b/README.md @@ -1,11 +1,11 @@ -# Omnitrace: Application Profiling, Tracing, and Analysis +# ROCm Systems Profiler: Application Profiling, Tracing, and Analysis -[![Ubuntu 20.04 with GCC, ROCm, and MPI](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-focal.yml) -[![Ubuntu 22.04 (GCC, Python, ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/ubuntu-jammy.yml) -[![OpenSUSE 15.x with GCC](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/opensuse.yml) -[![RedHat Linux (GCC, Python, ROCm)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/redhat.yml) -[![Installer Packaging (CPack)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/cpack.yml) -[![Documentation](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/omnitrace/actions/workflows/docs.yml) +[![Ubuntu 20.04 with GCC, ROCm, and MPI](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-focal.yml) +[![Ubuntu 22.04 (GCC, Python, ROCm)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-jammy.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/ubuntu-jammy.yml) +[![OpenSUSE 15.x with GCC](https://github.com/ROCm/rocprofiler-systems/actions/workflows/opensuse.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/opensuse.yml) +[![RedHat Linux (GCC, Python, 
ROCm)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/redhat.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/redhat.yml) +[![Installer Packaging (CPack)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/cpack.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/cpack.yml) +[![Documentation](https://github.com/ROCm/rocprofiler-systems/actions/workflows/docs.yml/badge.svg)](https://github.com/ROCm/rocprofiler-systems/actions/workflows/docs.yml) > [!NOTE] > Perfetto validation is done with trace_processor v46.0, as there is a known issue with v47.0. @@ -13,18 +13,14 @@ If you are experiencing problems viewing your trace in the latest version of [Pe ## Overview -AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems. -If you are familiar with [rocprof](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/how-to/using-rocprof.html) and/or [uProf](https://developer.amd.com/amd-uprof/), -you will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities. - -Omnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU. +ROCm Systems Profiler (rocprof-sys), formerly Omnitrace, is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU. It is capable of gathering the performance information of functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks. -Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics. 
-In addition to runtimes, omnitrace supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics +ROCm Systems Profiler supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics. +In addition to runtimes, ROCm Systems Profiler supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics such as the memory usage, page-faults, and context-switches, and thread-level metrics such as memory usage, CPU time, and numerous hardware counters. > [!NOTE] -> Full documentation is available at [Omnitrace documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html) in an organized, easy-to-read, searchable format. +> Full documentation is available at [ROCm Systems Profiler documentation](https://rocm.docs.amd.com/projects/omnitrace/en/latest/index.html) in an organized, easy-to-read, searchable format. The documentation source files reside in the [`/docs`](/docs) folder of this repository. 
For information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html) @@ -95,50 +91,50 @@ The documentation source files reside in the [`/docs`](/docs) folder of this rep ### Installation -- Visit [Releases](https://github.com/ROCm/omnitrace/releases) page +- Visit [Releases](https://github.com/ROCm/rocprofiler-systems/releases) page - Select appropriate installer (recommendation: `.sh` scripts do not require super-user priviledges unlike the DEB/RPM installers) - If targeting a ROCm application, find the installer script with the matching ROCm version - - If you are unsure about your Linux distro, check `/etc/os-release` or use the `omnitrace-install.py` script + - If you are unsure about your Linux distro, check `/etc/os-release` or use the `rocprof-sys-install.py` script -If the above recommendation is not desired, download the `omnitrace-install.py` and specify `--prefix ` when +If the above recommendation is not desired, download the `rocprof-sys-install.py` and specify `--prefix ` when executing it. This script will attempt to auto-detect a compatible OS distribution and version. If ROCm support is desired, specify `--rocm X.Y` where `X` is the ROCm major version and `Y` is the ROCm minor version, e.g. `--rocm 5.4`. ```console -wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py -python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4 +wget https://github.com/ROCm/rocprofiler-systems/releases/latest/download/rocprof-sys-install.py +python3 ./rocprof-sys-install.py --prefix /opt/rocprof-sys/rocm-5.4 --rocm 5.4 ``` -See the [Omnitrace installation guide](https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html) for detailed information. +See the [ROCm Systems Profiler installation guide](https://rocm.docs.amd.com/projects/omnitrace/en/latest/install/install.html) for detailed information. 
### Setup -> NOTE: Replace `/opt/omnitrace` below with installation prefix as necessary. +> NOTE: Replace `/opt/rocprof-sys` below with installation prefix as necessary. - Option 1: Source `setup-env.sh` script ```bash -source /opt/omnitrace/share/omnitrace/setup-env.sh +source /opt/rocprof-sys/share/rocprof-sys/setup-env.sh ``` - Option 2: Load modulefile ```bash -module use /opt/omnitrace/share/modulefiles -module load omnitrace +module use /opt/rocprof-sys/share/modulefiles +module load rocprof-sys ``` - Option 3: Manual ```bash -export PATH=/opt/omnitrace/bin:${PATH} -export LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH} +export PATH=/opt/rocprof-sys/bin:${PATH} +export LD_LIBRARY_PATH=/opt/rocprof-sys/lib:${LD_LIBRARY_PATH} ``` -### Omnitrace Settings +### ROCm Systems Profiler Settings -Generate an omnitrace configuration file using `omnitrace-avail -G omnitrace.cfg`. Optionally, use `omnitrace-avail -G omnitrace.cfg --all` for +Generate a rocprof-sys configuration file using `rocprof-sys-avail -G rocprof-sys.cfg`. Optionally, use `rocprof-sys-avail -G rocprof-sys.cfg --all` for a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable [perfetto](https://perfetto.dev/), [timemory](https://github.com/NERSC/timemory), sampling, and process-level sampling by default and tweak some sampling default values: @@ -155,31 +151,31 @@ OMNITRACE_SAMPLING_CPUS = all OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES ``` -Once the configuration file is adjusted to your preferences, either export the path to this file via `OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg` -or place this file in `${HOME}/.omnitrace.cfg` to ensure these values are always read as the default. 
If you wish to change any of these settings, +Once the configuration file is adjusted to your preferences, either export the path to this file via `OMNITRACE_CONFIG_FILE=/path/to/rocprof-sys.cfg` +or place this file in `${HOME}/.rocprof-sys.cfg` to ensure these values are always read as the default. If you wish to change any of these settings, you can override them via environment variables or by specifying an alternative `OMNITRACE_CONFIG_FILE`. ### Call-Stack Sampling -The `omnitrace-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation. -Use a double-hypen (`--`) to separate the command-line arguments for `omnitrace-sample` from the target application and it's arguments. +The `rocprof-sys-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation. +Use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-sample` from the target application and its arguments. ```shell -omnitrace-sample --help -omnitrace-sample -- -omnitrace-sample -f 1000 -- ls -la +rocprof-sys-sample --help +rocprof-sys-sample -- +rocprof-sys-sample -f 1000 -- ls -la ``` ### Binary Instrumentation -The `omnitrace` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside +The `rocprof-sys-instrument` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside the execution an instrumented binary, to help "fill in the gaps" between the instrumentation via setting the `OMNITRACE_USE_SAMPLING` configuration variable to `ON`. -Similar to `omnitrace-sample`, use a double-hypen (`--`) to separate the command-line arguments for `omnitrace` from the target application and it's arguments. +Similar to `rocprof-sys-sample`, use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-instrument` from the target application and its arguments. 
```shell -omnitrace-instrument --help -omnitrace-instrument -- +rocprof-sys-instrument --help +rocprof-sys-instrument -- ``` #### Binary Rewrite @@ -187,7 +183,7 @@ omnitrace-instrument -- Rewrite the text section of an executable or library with instrumentation: ```shell -omnitrace-instrument -o app.inst -- /path/to/app +rocprof-sys-instrument -o app.inst -- /path/to/app ``` In binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries. @@ -195,7 +191,7 @@ Example of rewriting the functions starting with `"hip"` with instrumentation in ```shell mkdir -p ./lib -omnitrace-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4 +rocprof-sys-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4 export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH} ``` @@ -206,33 +202,33 @@ Once you have rewritten your executable and/or libraries with instrumentation, y or exectuable which loads the instrumented libraries normally, e.g.: ```shell -omnitrace-run -- ./app.inst +rocprof-sys-run -- ./app.inst ``` -If you want to re-define certain settings to new default in a binary rewrite, use the `--env` option. This `omnitrace` option +If you want to re-define certain settings to a new default in a binary rewrite, use the `--env` option. This `rocprof-sys-instrument` option will set the environment variable to the given value but will not override it. E.g. 
the default value of `OMNITRACE_PERFETTO_BUFFER_SIZE_KB` is 1024000 KB (1 GiB): ```shell # buffer size defaults to 1024000 -omnitrace-instrument -o app.inst -- /path/to/app -omnitrace-run -- ./app.inst +rocprof-sys-instrument -o app.inst -- /path/to/app +rocprof-sys-run -- ./app.inst ``` Passing `--env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000` will change the default value in `app.inst` to 5120000 KiB (5 GiB): ```shell # defaults to 5 GiB buffer size -omnitrace-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app -omnitrace-run -- ./app.inst +rocprof-sys-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app +rocprof-sys-run -- ./app.inst ``` ```shell # override default 5 GiB buffer size to 200 MB via command-line -omnitrace-run --trace-buffer-size=200000 -- ./app.inst +rocprof-sys-run --trace-buffer-size=200000 -- ./app.inst # override default 5 GiB buffer size to 200 MB via environment export OMNITRACE_PERFETTO_BUFFER_SIZE_KB=200000 -omnitrace-run -- ./app.inst +rocprof-sys-run -- ./app.inst ``` #### Runtime Instrumentation @@ -242,35 +238,35 @@ linked libraries. Thus, it may be useful to exclude those libraries via the `-ME or exclude specific functions with the `-E` regex option. ```shell -omnitrace-instrument -- /path/to/app -omnitrace-instrument -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app -omnitrace-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app +rocprof-sys-instrument -- /path/to/app +rocprof-sys-instrument -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app +rocprof-sys-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app ``` ### Python Profiling and Tracing -Use the `omnitrace-python` script to profile/trace Python interpreter function calls. -Use a double-hypen (`--`) to separate the command-line arguments for `omnitrace-python` from the target script and it's arguments. 
+Use the `rocprof-sys-python` script to profile/trace Python interpreter function calls. +Use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-python` from the target script and its arguments. ```shell -omnitrace-python --help -omnitrace-python -- -omnitrace-python -- ./script.py +rocprof-sys-python --help +rocprof-sys-python -- +rocprof-sys-python -- ./script.py ``` -Please note, the first argument after the double-hyphen *must be a Python script*, e.g. `omnitrace-python -- ./script.py`. +Please note, the first argument after the double-hyphen *must be a Python script*, e.g. `rocprof-sys-python -- ./script.py`. -If you need to specify a specific python interpreter version, use `omnitrace-python-X.Y` where `X.Y` is the Python +If you need to specify a specific python interpreter version, use `rocprof-sys-python-X.Y` where `X.Y` is the Python major and minor version: ```shell -omnitrace-python-3.8 -- ./script.py +rocprof-sys-python-3.8 -- ./script.py ``` If you need to specify the full path to a Python interpreter, set the `PYTHON_EXECUTABLE` environment variable: ```shell -PYTHON_EXECUTABLE=/opt/conda/bin/python omnitrace-python -- ./script.py +PYTHON_EXECUTABLE=/opt/conda/bin/python rocprof-sys-python -- ./script.py ``` If you want to restrict the data collection to specific function(s) and its callees, pass the `-b` / `--builtin` option after decorating the @@ -297,51 +293,51 @@ for `foo` via the direct call within `spam`. 
There will be no entries for `bar` - Visit [ui.perfetto.dev](https://ui.perfetto.dev) in the web-browser - Select "Open trace file" from panel on the left -- Locate the omnitrace perfetto output (extension: `.proto`) +- Locate the rocprof-sys perfetto output (extension: `.proto`) -![omnitrace-perfetto](docs/data/omnitrace-perfetto.png) +![rocprof-sys-perfetto](docs/data/omnitrace-perfetto.png) -![omnitrace-rocm](docs/data/omnitrace-rocm.png) +![rocprof-sys-rocm](docs/data/omnitrace-rocm.png) -![omnitrace-rocm-flow](docs/data/omnitrace-rocm-flow.png) +![rocprof-sys-rocm-flow](docs/data/omnitrace-rocm-flow.png) -![omnitrace-user-api](docs/data/omnitrace-user-api.png) +![rocprof-sys-user-api](docs/data/omnitrace-user-api.png) ## Using Perfetto tracing with System Backend Perfetto tracing with the system backend supports multiple processes writing to the same -output file. Thus, it is a useful technique if Omnitrace is built with partial MPI support +output file. Thus, it is a useful technique if rocprof-sys is built with partial MPI support because all the perfetto output will be coalesced into a single file. The installation docs for perfetto can be found [here](https://perfetto.dev/docs/contributing/build-instructions). -If you are building omnitrace from source, you can configure CMake with `OMNITRACE_INSTALL_PERFETTO_TOOLS=ON` +If you are building rocprof-sys from source, you can configure CMake with `OMNITRACE_INSTALL_PERFETTO_TOOLS=ON` and the `perfetto` and `traced` applications will be installed as part of the build process. However, it should be noted that to prevent this option from accidentally overwriting an existing perfetto install, -all the perfetto executables installed by omnitrace are prefixed with `omnitrace-perfetto-`, except for the `perfetto` -executable, which is just renamed `omnitrace-perfetto`. 
+all the perfetto executables installed by rocprof-sys are prefixed with `rocprof-sys-perfetto-`, except for the `perfetto` +executable, which is just renamed `rocprof-sys-perfetto`. Enable `traced` and `perfetto` in the background: ```shell pkill traced traced --background -perfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background +perfetto --out ./rocprof-sys-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background ``` -> ***NOTE: if the perfetto tools were installed by omnitrace, replace `traced` with `omnitrace-perfetto-traced` and*** -> ***`perfetto` with `omnitrace-perfetto`.*** +> ***NOTE: if the perfetto tools were installed by rocprof-sys, replace `traced` with `rocprof-sys-perfetto-traced` and*** +> ***`perfetto` with `rocprof-sys-perfetto`.*** -Configure omnitrace to use the perfetto system backend via the `--perfetto-backend` option of `omnitrace-run`: +Configure rocprof-sys to use the perfetto system backend via the `--perfetto-backend` option of `rocprof-sys-run`: ```shell # enable sampling on the uninstrumented binary -omnitrace-run --sample --trace --perfetto-backend=system -- ./myapp +rocprof-sys-run --sample --trace --perfetto-backend=system -- ./myapp # trace the instrument the binary -omnitrace-instrument -o ./myapp.inst -- ./myapp -omnitrace-run --trace --perfetto-backend=system -- ./myapp.inst +rocprof-sys-instrument -o ./myapp.inst -- ./myapp +rocprof-sys-run --trace --perfetto-backend=system -- ./myapp.inst ``` -or via the `--env` option of `omnitrace-instrument` + runtime instrumentation: +or via the `--env` option of `rocprof-sys-instrument` + runtime instrumentation: ```shell -omnitrace-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp +rocprof-sys-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp ```