From 687d624a6810569465c59417277473fdfb856b93 Mon Sep 17 00:00:00 2001
From: Stanley Tsang
Date: Wed, 31 May 2023 15:59:20 -0600
Subject: [PATCH] Documentation cherry-pick for 5.6 (#436)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Updated docs to use the ROCm standard (#418)

* Convert Doxygen doc to Sphinx doc

breathe needs to be patched in order to correctly look up the
functions. The doc can still be generated but with generated errors
within the doc.

* Update README.md for Sphinx doc

* Update README.md for Sphinx doc

* Add svg images to regenerate png

* Add comments to distinguish images generated from SVG

* Order and organize doc

* Remove the mention to CUB

* Correct typo, improve style

* Add examples to iterators

* Explicit dependencies versions for building doc

* Correct typos

Signed-off-by: v01dxyz

* Remove SVG image directory

* Re-enable preprocessing for doxygen

`__host__` and `__device__` function attributes are shown in the docs.

* Re-enable Doxygen to expand all the macros instead of only the
predefined ones.
* Add cuda / hip attributes to `cpp_id_attributes` in sphinx. This fixes
it erroring out on these attributes.
* Add `rocprim::` prefix to all functions/classes name references in the
rst docs. They were not working previously because the namespace macros
were not expanded by doxygen.

* Fix documentation of template overloaded methods with doxygen groups

Use doxygen groups as a workaround for breathe failing to find methods
that have complex template names, or both templated and non templated
overloads.

non-resolved problem: The `rocprim::` prefix is not appended to the
object names, contrary to the ones generated by the other directives.

* Simplify macros for Doxygen

* keep only one pre-defined variable to indicate Doxygen preprocessor
* isolate macro definitions for Doxygen (only one ifndef instead of one
per problematic macro)

* Organize conf.py

Gather the html options.

* Correct typo

* Clean Doxyfile

Remove commented declarations

* Render function parameters as a HTML list

Ameliorate how the function signatures are rendered by using HTML list
instead of one big single line signature.

It is a hack since it mixes span elements with div elements side by side
which is not good HTML. Furthermore, this translator would possibly need
maintenance if Sphinx introduces breaking changes to the base
Translator.

* Simplify custom CSS

* Remove documentation images

Those two images were not particularly relevant and well polished. They
were a logo and a draft schema to visualise the different GPU scopes.

* Revert deletion of DOXYGEN_SHOULD_SKIP_THIS

* Populate doxygen groups for intrinsics

* Correct typo

Delete duplicate title for a section that was displaced to another file

* Rename ops directories

* Rename filename/title for summary of the ops

* Fixed rebase in config.hpp

* Updated docs to the ROCm standard

* Updated style and copyright date

* Removed docs from top-level gitignore

* Removed redundant requirements.txt

* Updated CHANGELOG.md

* Updated gitignore

* Updated Changelog.md

* Rebase fix

* Updated docs/.gitignore

* Added Acknowledgements

---------

Signed-off-by: v01dxyz
Co-authored-by: v01dxyz
Co-authored-by: Gergely Meszaros

* Add dependabot config and pin rocm-docs-core (#420)

* Update dependabot config (#421)

Co-authored-by: samjwu

* Bump rocm-docs-core from 0.2.0 to 0.7.1 in /docs/.sphinx (#425)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.2.0 to 0.7.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/commits/v0.7.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update version string (#426)

* Fix typo

* Update documentation requirements (#429)

Co-authored-by: samjwu

* Bump rocm-docs-core from 0.11.0 to 0.13.1 in /docs/.sphinx (#435)

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.11.0 to 0.13.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.11.0...v0.13.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump requests from 2.28.2 to 2.31.0 in /docs/.sphinx (#430)

Bumps [requests](https://github.com/psf/requests) from 2.28.2 to 2.31.0.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.28.2...v2.31.0)

---
updated-dependencies:
- dependency-name: requests
  dependency-type: indirect
...

Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

---------

Signed-off-by: v01dxyz
Signed-off-by: dependabot[bot]
Co-authored-by: Lőrinc Serfőző
Co-authored-by: v01dxyz
Co-authored-by: Gergely Meszaros
Co-authored-by: Sam Wu
Co-authored-by: samjwu
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu
---
 .github/dependabot.yml | 12 + .gitignore | 6 - .gitlab-ci.yml | 19 +- .readthedocs.yaml | 14 ++ CHANGELOG.md | 1 + CMakeLists.txt | 2 +- README.md | 17 +- custom.properties | 2 +- docs/{ => .doxygen}/Doxyfile | 10 +- docs/.doxygen/blockmodule.dox | 21 ++ docs/{ => .doxygen}/devicemodule.dox | 0 docs/{ => .doxygen}/glossary.dox | 0 docs/.doxygen/intrinsicsmodule.dox | 19 ++ docs/{ => .doxygen}/iteratormodule.dox | 0 docs/{ => .doxygen}/mainpage.dox | 0 docs/{ => .doxygen}/primitivesmodule.dox | 0 docs/{ => .doxygen}/utilsmodule.dox | 0 docs/{ => .doxygen}/warpmodule.dox | 0 docs/.gitignore | 9 + docs/.sphinx/_toc.yml.in | 14 ++ docs/.sphinx/requirements.in | 1 + docs/.sphinx/requirements.txt | 145 ++++++++++++ docs/Makefile | 23 -- docs/block_ops/data_mov_funcs.rst | 62 ++++++ docs/block_ops/index.rst | 6 + .../ops_classes/adjacent_difference.rst | 5 + docs/block_ops/ops_classes/discontinuity.rst | 5 + docs/block_ops/ops_classes/exchange.rst | 5 + docs/block_ops/ops_classes/histogram.rst | 12 + docs/block_ops/ops_classes/index.rst | 18 ++ docs/block_ops/ops_classes/load.rst | 13 ++ docs/block_ops/ops_classes/reduce.rst | 13 ++ docs/block_ops/ops_classes/scan.rst | 13 ++ docs/block_ops/ops_classes/shuffle.rst | 5 + docs/block_ops/ops_classes/sort.rst | 17 ++ docs/block_ops/ops_classes/store.rst | 13 ++ docs/blockmodule.dox | 11 - docs/conf.py | 207 +----------------- docs/device_ops/adjacent_difference.rst | 28 +++ docs/device_ops/binary_search.rst | 4 + docs/device_ops/config.rst | 29 +++ docs/device_ops/histogram.rst | 31 +++ 
docs/device_ops/index.rst | 20 ++ docs/device_ops/merge.rst | 13 ++ docs/device_ops/partition.rst | 12 + docs/device_ops/reduce.rst | 32 +++ docs/device_ops/run_length_encoding.rst | 17 ++ docs/device_ops/scan.rst | 51 +++++ docs/device_ops/select.rst | 14 ++ docs/device_ops/sort.rst | 69 ++++++ docs/device_ops/transform.rst | 13 ++ docs/device_ops/unique.rst | 13 ++ docs/glossary.rst | 34 +++ docs/index.rst | 20 +- docs/intrinsics.rst | 47 ++++ docs/intrinsicsmodule.dox | 10 - docs/intro.rst | 30 +++ docs/iterators.rst | 85 +++++++ docs/ops_summary.rst | 47 ++++ docs/requirements.txt | 3 - docs/run_doc.sh | 11 - docs/thread_ops.rst | 21 ++ docs/warp_ops/exchange.rst | 5 + docs/warp_ops/index.rst | 12 + docs/warp_ops/load.rst | 13 ++ docs/warp_ops/reduce.rst | 5 + docs/warp_ops/scan.rst | 5 + docs/warp_ops/shuffle.rst | 6 + docs/warp_ops/sort.rst | 5 + docs/warp_ops/store.rst | 13 ++ .../include/rocprim/block/block_histogram.hpp | 4 +- rocprim/include/rocprim/block/block_load.hpp | 12 +- .../include/rocprim/block/block_load_func.hpp | 5 +- rocprim/include/rocprim/block/block_store.hpp | 12 +- .../rocprim/block/block_store_func.hpp | 4 +- rocprim/include/rocprim/config.hpp | 23 +- rocprim/include/rocprim/intrinsics/thread.hpp | 6 +- 77 files changed, 1178 insertions(+), 326 deletions(-) create mode 100644 .github/dependabot.yml create mode 100644 .readthedocs.yaml rename docs/{ => .doxygen}/Doxyfile (99%) create mode 100644 docs/.doxygen/blockmodule.dox rename docs/{ => .doxygen}/devicemodule.dox (100%) rename docs/{ => .doxygen}/glossary.dox (100%) create mode 100644 docs/.doxygen/intrinsicsmodule.dox rename docs/{ => .doxygen}/iteratormodule.dox (100%) rename docs/{ => .doxygen}/mainpage.dox (100%) rename docs/{ => .doxygen}/primitivesmodule.dox (100%) rename docs/{ => .doxygen}/utilsmodule.dox (100%) rename docs/{ => .doxygen}/warpmodule.dox (100%) create mode 100644 docs/.gitignore create mode 100644 docs/.sphinx/_toc.yml.in create mode 100644 
docs/.sphinx/requirements.in create mode 100644 docs/.sphinx/requirements.txt delete mode 100644 docs/Makefile create mode 100644 docs/block_ops/data_mov_funcs.rst create mode 100644 docs/block_ops/index.rst create mode 100644 docs/block_ops/ops_classes/adjacent_difference.rst create mode 100644 docs/block_ops/ops_classes/discontinuity.rst create mode 100644 docs/block_ops/ops_classes/exchange.rst create mode 100644 docs/block_ops/ops_classes/histogram.rst create mode 100644 docs/block_ops/ops_classes/index.rst create mode 100644 docs/block_ops/ops_classes/load.rst create mode 100644 docs/block_ops/ops_classes/reduce.rst create mode 100644 docs/block_ops/ops_classes/scan.rst create mode 100644 docs/block_ops/ops_classes/shuffle.rst create mode 100644 docs/block_ops/ops_classes/sort.rst create mode 100644 docs/block_ops/ops_classes/store.rst delete mode 100644 docs/blockmodule.dox create mode 100644 docs/device_ops/adjacent_difference.rst create mode 100644 docs/device_ops/binary_search.rst create mode 100644 docs/device_ops/config.rst create mode 100644 docs/device_ops/histogram.rst create mode 100644 docs/device_ops/index.rst create mode 100644 docs/device_ops/merge.rst create mode 100644 docs/device_ops/partition.rst create mode 100644 docs/device_ops/reduce.rst create mode 100644 docs/device_ops/run_length_encoding.rst create mode 100644 docs/device_ops/scan.rst create mode 100644 docs/device_ops/select.rst create mode 100644 docs/device_ops/sort.rst create mode 100644 docs/device_ops/transform.rst create mode 100644 docs/device_ops/unique.rst create mode 100644 docs/glossary.rst create mode 100644 docs/intrinsics.rst delete mode 100644 docs/intrinsicsmodule.dox create mode 100644 docs/intro.rst create mode 100644 docs/iterators.rst create mode 100644 docs/ops_summary.rst delete mode 100644 docs/requirements.txt delete mode 100755 docs/run_doc.sh create mode 100644 docs/thread_ops.rst create mode 100644 docs/warp_ops/exchange.rst create mode 100644 
docs/warp_ops/index.rst create mode 100644 docs/warp_ops/load.rst create mode 100644 docs/warp_ops/reduce.rst create mode 100644 docs/warp_ops/scan.rst create mode 100644 docs/warp_ops/shuffle.rst create mode 100644 docs/warp_ops/sort.rst create mode 100644 docs/warp_ops/store.rst diff --git a/.github/dependabot.yml b/.github/dependabot.yml new file mode 100644 index 000000000..9cdf2d670 --- /dev/null +++ b/.github/dependabot.yml @@ -0,0 +1,12 @@ +# To get started with Dependabot version updates, you'll need to specify which +# package ecosystems to update and where the package manifests are located. +# Please see the documentation for all configuration options: +# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates + +version: 2 +updates: + - package-ecosystem: "pip" # See documentation for possible values + directory: "/docs/.sphinx" # Location of package manifests + open-pull-requests-limit: 10 + schedule: + interval: "daily" diff --git a/.gitignore b/.gitignore index c67d9cb61..31bb2fe95 100644 --- a/.gitignore +++ b/.gitignore @@ -1,12 +1,6 @@ ### Build dirs ### build/ -### Docs dirs ### -doc/html/ -doc/xml/ -doc/latex/ -doc/*.tag - # Created by https://www.gitignore.io/api/c++,cmake ### C++ ### diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml index 87a9c19b8..b9e48d038 100644 --- a/.gitlab-ci.yml +++ b/.gitlab-ci.yml @@ -26,6 +26,7 @@ include: file: - /defaults.yaml - /deps-cmake.yaml + - /deps-docs.yaml - /deps-rocm.yaml - /deps-vcpkg.yaml - /gpus-rocm.yaml @@ -352,21 +353,9 @@ test:deb: test:docs: stage: test - extends: .rules:test - image: sphinxdoc/sphinx-latexpdf:latest - needs: [] - before_script: - - apt-get update -qq - - apt-get install -y -qq doxygen - - pip3 install --user -r $CI_PROJECT_DIR/docs/requirements.txt - script: - - cd docs - - make - artifacts: - paths: - - $CI_PROJECT_DIR/docs/_build - - $CI_PROJECT_DIR/docs/docBin - expire_in: 1 week + extends: + - .rules:test + - .build:docs 
.benchmark-base: stage: benchmark diff --git a/.readthedocs.yaml b/.readthedocs.yaml new file mode 100644 index 000000000..43a0890c9 --- /dev/null +++ b/.readthedocs.yaml @@ -0,0 +1,14 @@ +# Read the Docs configuration file +# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details + +version: 2 + +sphinx: + configuration: docs/conf.py + +formats: [htmlzip] + +python: + version: "3.8" + install: + - requirements: docs/.sphinx/requirements.txt diff --git a/CHANGELOG.md b/CHANGELOG.md index feb417b7e..8e3410832 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ Full documentation for rocPRIM is available at [https://rocprim.readthedocs.io/e ### Changed - Improved the performance of `block_radix_sort` and `device_radix_sort`. - Improved the performance of `device_merge_sort`. +- Updated `docs` directory structure to match the standard of [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core). Contributed by: [v01dXYZ](https://github.com/v01dXYZ). ### Known Issues - Disabled GPU error messages relating to incorrect warp operation usage with Navi GPUs on Windows, due to GPU printf performance issues on Windows. - When `ROCPRIM_DISABLE_LOOKBACK_SCAN` is set, `device_scan` fails for input sizes bigger than `scan_config::size_limit`, which defaults to `std::numeric_limits::max()`. diff --git a/CMakeLists.txt b/CMakeLists.txt index 17b1a71f1..bc0116168 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -110,7 +110,7 @@ if(USE_HIP_CPU) endif() # Setup VERSION -set(VERSION_STRING "2.10.9") +set(VERSION_STRING "2.13.0") rocm_setup_version(VERSION ${VERSION_STRING}) # Print configuration summary diff --git a/README.md b/README.md index 44b767b3e..41754684f 100644 --- a/README.md +++ b/README.md @@ -237,16 +237,21 @@ should be optimized for gfx803 GCN version, or to `900` for gfx900. ## Documentation The latest rocPRIM documentation and API description can be found [here](https://rocprim.readthedocs.io/en/latest/). 
-It can also be build using the following commands +It can also be built using the following commands: ```shell -# go to rocPRIM doc directory -cd rocPRIM; cd doc +# Go to rocPRIM docs directory +cd rocPRIM; cd docs -# run doxygen -doxygen Doxyfile +# Install Python dependencies +python3 -m pip install -r .sphinx/requirements.txt -# open html/index.html +# Build the documentation +python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html + +# For e.g. serve the HTML docs locally +cd _build/html +python3 -m http.server ``` ## hipCUB diff --git a/custom.properties b/custom.properties index 7e60d0e9d..3dfc8202d 100644 --- a/custom.properties +++ b/custom.properties @@ -1,3 +1,3 @@ booktitle=rocPRIM API Guide -spreadsheet.xml=doc/classification-map.xml +spreadsheet.xml=docs/classification-map.xml document.locale=enus \ No newline at end of file diff --git a/docs/Doxyfile b/docs/.doxygen/Doxyfile similarity index 99% rename from docs/Doxyfile rename to docs/.doxygen/Doxyfile index f66c1dfd0..ed773609c 100644 --- a/docs/Doxyfile +++ b/docs/.doxygen/Doxyfile @@ -58,7 +58,7 @@ PROJECT_LOGO = # entered, it will be relative to the location where doxygen was started. If # left blank the current directory will be used. -OUTPUT_DIRECTORY = +OUTPUT_DIRECTORY = docBin # If the CREATE_SUBDIRS tag is set to YES then doxygen will create 4096 sub- # directories (in 2 levels) under the output directory of each output format and @@ -781,7 +781,7 @@ INPUT = mainpage.dox \ iteratormodule.dox \ intrinsicsmodule.dox \ glossary.dox \ - ../rocprim/include/rocprim + ../../rocprim/include/rocprim # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding. 
Doxygen uses @@ -837,7 +837,7 @@ EXCLUDE_SYMLINKS = NO # Note that the wildcards are matched against the file with absolute path, so to # exclude all test directories for example use the pattern */test/* -EXCLUDE_PATTERNS = */detail/* +EXCLUDE_PATTERNS = # The EXCLUDE_SYMBOLS tag can be used to specify one or more symbol names # (namespaces, classes, functions, etc.) that should be excluded from the @@ -2007,7 +2007,7 @@ ENABLE_PREPROCESSING = YES # The default value is: NO. # This tag requires that the tag ENABLE_PREPROCESSING is set to YES. -MACRO_EXPANSION = NO +MACRO_EXPANSION = YES # If the EXPAND_ONLY_PREDEF and MACRO_EXPANSION tags are both set to YES then # the macro expansion is limited to the macros specified with the PREDEFINED and @@ -2047,7 +2047,7 @@ INCLUDE_FILE_PATTERNS = # recursively expanded use the := operator instead of the = operator. # This tag requires that the tag ENABLE_PREPROCESSING is set to YES. -PREDEFINED = "DOXYGEN_SHOULD_SKIP_THIS=1" +PREDEFINED = "DOXYGEN_SHOULD_SKIP_THIS=1" "DOXYGEN_DOCUMENTATION_BUILD" # If the MACRO_EXPANSION and EXPAND_ONLY_PREDEF tags are set to YES then this # tag can be used to specify a list of macro names that should be expanded. 
The diff --git a/docs/.doxygen/blockmodule.dox b/docs/.doxygen/blockmodule.dox new file mode 100644 index 000000000..59156ed17 --- /dev/null +++ b/docs/.doxygen/blockmodule.dox @@ -0,0 +1,21 @@ +/** +@brief rocPRIM Block-wide parallel primitives +@author +@file +*/ + +/** + * \defgroup blockmodule Block-wide + * \ingroup primitivesmodule + * + */ + +/** + * \defgroup blockmodule_warp_load_functions + * \ingroup blockmodule + */ + +/** + * \defgroup blockmodule_warp_store_functions + * \ingroup blockmodule + */ \ No newline at end of file diff --git a/docs/devicemodule.dox b/docs/.doxygen/devicemodule.dox similarity index 100% rename from docs/devicemodule.dox rename to docs/.doxygen/devicemodule.dox diff --git a/docs/glossary.dox b/docs/.doxygen/glossary.dox similarity index 100% rename from docs/glossary.dox rename to docs/.doxygen/glossary.dox diff --git a/docs/.doxygen/intrinsicsmodule.dox b/docs/.doxygen/intrinsicsmodule.dox new file mode 100644 index 000000000..0b2851210 --- /dev/null +++ b/docs/.doxygen/intrinsicsmodule.dox @@ -0,0 +1,19 @@ +/** +@brief rocPRIM intrinsic functions. 
+@author +@file +*/ + +/** + * \defgroup intrinsicsmodule Intrinsic functions + */ + +/** + * \defgroup intrinsicsmodule_flat_id + * \ingroup intrinsicsmodule + */ + +/** + * \defgroup intrinsicsmodule_warp_id + * \ingroup intrinsicsmodule + */ diff --git a/docs/iteratormodule.dox b/docs/.doxygen/iteratormodule.dox similarity index 100% rename from docs/iteratormodule.dox rename to docs/.doxygen/iteratormodule.dox diff --git a/docs/mainpage.dox b/docs/.doxygen/mainpage.dox similarity index 100% rename from docs/mainpage.dox rename to docs/.doxygen/mainpage.dox diff --git a/docs/primitivesmodule.dox b/docs/.doxygen/primitivesmodule.dox similarity index 100% rename from docs/primitivesmodule.dox rename to docs/.doxygen/primitivesmodule.dox diff --git a/docs/utilsmodule.dox b/docs/.doxygen/utilsmodule.dox similarity index 100% rename from docs/utilsmodule.dox rename to docs/.doxygen/utilsmodule.dox diff --git a/docs/warpmodule.dox b/docs/.doxygen/warpmodule.dox similarity index 100% rename from docs/warpmodule.dox rename to docs/.doxygen/warpmodule.dox diff --git a/docs/.gitignore b/docs/.gitignore new file mode 100644 index 000000000..ff4315605 --- /dev/null +++ b/docs/.gitignore @@ -0,0 +1,9 @@ +/_build/ +/_doxygen/ +/_images/ +/_static/ +/_templates/ +/.doxygen/docBin +/.doxygen/rocPRIM.tag +/.sphinx/_toc.yml +/api diff --git a/docs/.sphinx/_toc.yml.in b/docs/.sphinx/_toc.yml.in new file mode 100644 index 000000000..8b76e5609 --- /dev/null +++ b/docs/.sphinx/_toc.yml.in @@ -0,0 +1,14 @@ +# Anywhere {branch} is used, the branch name will be substituted. +# These comments will also be removed. 
+root: index +subtrees: + - entries: + - file: intro + - file: ops_summary + - file: device_ops/index + - file: block_ops/index + - file: warp_ops/index + - file: thread_ops + - file: iterators + - file: intrinsics + - file: glossary diff --git a/docs/.sphinx/requirements.in b/docs/.sphinx/requirements.in new file mode 100644 index 000000000..d06afe41a --- /dev/null +++ b/docs/.sphinx/requirements.in @@ -0,0 +1 @@ +rocm-docs-core==0.13.1 diff --git a/docs/.sphinx/requirements.txt b/docs/.sphinx/requirements.txt new file mode 100644 index 000000000..152e8a2e3 --- /dev/null +++ b/docs/.sphinx/requirements.txt @@ -0,0 +1,145 @@ +# +# This file is autogenerated by pip-compile with Python 3.8 +# by the following command: +# +# pip-compile requirements.in +# +accessible-pygments==0.0.3 + # via pydata-sphinx-theme +alabaster==0.7.13 + # via sphinx +babel==2.12.1 + # via + # pydata-sphinx-theme + # sphinx +beautifulsoup4==4.11.2 + # via pydata-sphinx-theme +breathe==4.34.0 + # via rocm-docs-core +certifi==2022.12.7 + # via requests +cffi==1.15.1 + # via + # cryptography + # pynacl +charset-normalizer==3.1.0 + # via requests +click==8.1.3 + # via sphinx-external-toc +cryptography==40.0.2 + # via pyjwt +deprecated==1.2.13 + # via pygithub +docutils==0.19 + # via + # breathe + # myst-parser + # pydata-sphinx-theme + # sphinx +fastjsonschema==2.16.3 + # via rocm-docs-core +gitdb==4.0.10 + # via gitpython +gitpython==3.1.31 + # via rocm-docs-core +idna==3.4 + # via requests +imagesize==1.4.1 + # via sphinx +jinja2==3.1.2 + # via + # myst-parser + # sphinx +linkify-it-py==1.0.3 + # via myst-parser +markdown-it-py==2.2.0 + # via + # mdit-py-plugins + # myst-parser +markupsafe==2.1.2 + # via jinja2 +mdit-py-plugins==0.3.5 + # via myst-parser +mdurl==0.1.2 + # via markdown-it-py +myst-parser[linkify]==1.0.0 + # via rocm-docs-core +packaging==23.0 + # via + # pydata-sphinx-theme + # sphinx +pycparser==2.21 + # via cffi +pydata-sphinx-theme==0.13.3 + # via + # rocm-docs-core + # 
sphinx-book-theme +pygithub==1.58.1 + # via rocm-docs-core +pygments==2.14.0 + # via + # accessible-pygments + # pydata-sphinx-theme + # sphinx +pyjwt[crypto]==2.6.0 + # via pygithub +pynacl==1.5.0 + # via pygithub +pyyaml==6.0 + # via + # myst-parser + # rocm-docs-core + # sphinx-external-toc +requests==2.31.0 + # via + # pygithub + # sphinx +rocm-docs-core==0.13.1 + # via -r requirements.in +smmap==5.0.0 + # via gitdb +snowballstemmer==2.2.0 + # via sphinx +soupsieve==2.4 + # via beautifulsoup4 +sphinx==5.3.0 + # via + # breathe + # myst-parser + # pydata-sphinx-theme + # rocm-docs-core + # sphinx-book-theme + # sphinx-copybutton + # sphinx-design + # sphinx-external-toc + # sphinx-notfound-page +sphinx-book-theme==1.0.1 + # via rocm-docs-core +sphinx-copybutton==0.5.1 + # via rocm-docs-core +sphinx-design==0.4.1 + # via rocm-docs-core +sphinx-external-toc==0.3.1 + # via rocm-docs-core +sphinx-notfound-page==0.8.3 + # via rocm-docs-core +sphinxcontrib-applehelp==1.0.4 + # via sphinx +sphinxcontrib-devhelp==1.0.2 + # via sphinx +sphinxcontrib-htmlhelp==2.0.1 + # via sphinx +sphinxcontrib-jsmath==1.0.1 + # via sphinx +sphinxcontrib-qthelp==1.0.3 + # via sphinx +sphinxcontrib-serializinghtml==1.1.5 + # via sphinx +typing-extensions==4.5.0 + # via pydata-sphinx-theme +uc-micro-py==1.0.1 + # via linkify-it-py +urllib3==1.26.15 + # via requests +wrapt==1.15.0 + # via deprecated diff --git a/docs/Makefile b/docs/Makefile deleted file mode 100644 index 774ad26a4..000000000 --- a/docs/Makefile +++ /dev/null @@ -1,23 +0,0 @@ -# Minimal makefile for Sphinx documentation -# - -# You can set these variables from the command line. -SPHINXOPTS = -SPHINXBUILD = sphinx-build -SOURCEDIR = . -BUILDDIR = _build - -# Put it first so that "make" without argument is like "make help". 
-help: - @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) - -.PHONY: help Makefile clean - -# Catch-all target: route all unknown targets to Sphinx using the new -# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). -%: Makefile - @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) - -clean: - rm -rf docBin/ api/ - @$(SPHINXBUILD) -M clean "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/docs/block_ops/data_mov_funcs.rst b/docs/block_ops/data_mov_funcs.rst new file mode 100644 index 000000000..a80bd0242 --- /dev/null +++ b/docs/block_ops/data_mov_funcs.rst @@ -0,0 +1,62 @@ +Data movement functions +----------------------- + +Direct Blocked +~~~~~~~~~~~~~~ + +Load +.... + +.. doxygenfunction:: rocprim::block_load_direct_blocked(unsigned int flat_id, InputIterator block_input, T (&items)[ItemsPerThread]) +.. doxygenfunction:: rocprim::block_load_direct_blocked(unsigned int flat_id, InputIterator block_input, T (&items)[ItemsPerThread], unsigned int valid) +.. doxygenfunction:: rocprim::block_load_direct_blocked (unsigned int flat_id, InputIterator block_input, T(&items)[ItemsPerThread], unsigned int valid, Default out_of_bounds) + +Store +..... + +.. doxygenfunction:: rocprim::block_store_direct_blocked (unsigned int flat_id, OutputIterator block_output, T(&items)[ItemsPerThread]) +.. doxygenfunction:: rocprim::block_store_direct_blocked (unsigned int flat_id, OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid) + +Direct Blocked Vectorized +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Load +.... + +.. doxygenfunction:: rocprim::block_load_direct_blocked_vectorized (unsigned int flat_id, T *block_input, U(&items)[ItemsPerThread]) + +Store +..... + +.. doxygenfunction:: rocprim::block_store_direct_blocked_vectorized (unsigned int flat_id, T *block_output, U(&items)[ItemsPerThread]) + +Direct Striped +~~~~~~~~~~~~~~ + +Load +.... + +.. 
doxygenfunction:: rocprim::block_load_direct_striped (unsigned int flat_id, InputIterator block_input, T(&items)[ItemsPerThread]) +.. doxygenfunction:: rocprim::block_load_direct_striped (unsigned int flat_id, InputIterator block_input, T(&items)[ItemsPerThread], unsigned int valid) +.. doxygenfunction:: rocprim::block_load_direct_striped (unsigned int flat_id, InputIterator block_input, T(&items)[ItemsPerThread], unsigned int valid, Default out_of_bounds) + +Store +..... + +.. doxygenfunction:: rocprim::block_store_direct_striped (unsigned int flat_id, OutputIterator block_output, T(&items)[ItemsPerThread]) +.. doxygenfunction:: rocprim::block_store_direct_striped (unsigned int flat_id, OutputIterator block_output, T(&items)[ItemsPerThread], unsigned int valid) + +Direct Warp Striped +~~~~~~~~~~~~~~~~~~~ + +Load +.... + +.. doxygengroup:: blockmodule_warp_load_functions + :content-only: + +Store +..... + +.. doxygengroup:: blockmodule_warp_store_functions + :content-only: diff --git a/docs/block_ops/index.rst b/docs/block_ops/index.rst new file mode 100644 index 000000000..f10888cac --- /dev/null +++ b/docs/block_ops/index.rst @@ -0,0 +1,6 @@ +Block-Wide Operations +===================== + +.. toctree:: + ops_classes/index + data_mov_funcs diff --git a/docs/block_ops/ops_classes/adjacent_difference.rst b/docs/block_ops/ops_classes/adjacent_difference.rst new file mode 100644 index 000000000..c16609e8b --- /dev/null +++ b/docs/block_ops/ops_classes/adjacent_difference.rst @@ -0,0 +1,5 @@ +Adjacent difference +~~~~~~~~~~~~~~~~~~~ + +.. doxygenclass:: rocprim::block_adjacent_difference + :members: diff --git a/docs/block_ops/ops_classes/discontinuity.rst b/docs/block_ops/ops_classes/discontinuity.rst new file mode 100644 index 000000000..f5f444636 --- /dev/null +++ b/docs/block_ops/ops_classes/discontinuity.rst @@ -0,0 +1,5 @@ +Discontinuity +~~~~~~~~~~~~~ + +.. 
doxygenclass:: rocprim::block_discontinuity + :members: diff --git a/docs/block_ops/ops_classes/exchange.rst b/docs/block_ops/ops_classes/exchange.rst new file mode 100644 index 000000000..cbbb4e673 --- /dev/null +++ b/docs/block_ops/ops_classes/exchange.rst @@ -0,0 +1,5 @@ +Exchange +~~~~~~~~ + +.. doxygenclass:: rocprim::block_exchange + :members: diff --git a/docs/block_ops/ops_classes/histogram.rst b/docs/block_ops/ops_classes/histogram.rst new file mode 100644 index 000000000..68bf970d6 --- /dev/null +++ b/docs/block_ops/ops_classes/histogram.rst @@ -0,0 +1,12 @@ +Histogram +~~~~~~~~~ +Class +..... + +.. doxygenclass:: rocprim::block_histogram + :members: + +Algorithms +.......... + +.. doxygenenum:: rocprim::block_histogram_algorithm diff --git a/docs/block_ops/ops_classes/index.rst b/docs/block_ops/ops_classes/index.rst new file mode 100644 index 000000000..cf651d630 --- /dev/null +++ b/docs/block_ops/ops_classes/index.rst @@ -0,0 +1,18 @@ +Operation classes +----------------- + +.. toctree:: + load + store + + adjacent_difference + discontinuity + + scan + reduce + + shuffle + exchange + sort + + histogram diff --git a/docs/block_ops/ops_classes/load.rst b/docs/block_ops/ops_classes/load.rst new file mode 100644 index 000000000..1b4de9269 --- /dev/null +++ b/docs/block_ops/ops_classes/load.rst @@ -0,0 +1,13 @@ +Load +~~~~ + +Class +..... + +.. doxygenclass:: rocprim::block_load + :members: + +Algorithms +.......... + +.. doxygenenum:: rocprim::block_load_method diff --git a/docs/block_ops/ops_classes/reduce.rst b/docs/block_ops/ops_classes/reduce.rst new file mode 100644 index 000000000..f2e5a8119 --- /dev/null +++ b/docs/block_ops/ops_classes/reduce.rst @@ -0,0 +1,13 @@ +Reduce +~~~~~~ + +Class +..... + +.. doxygenclass:: rocprim::block_reduce + :members: + +Algorithms +.......... + +.. 
doxygenenum:: rocprim::block_reduce_algorithm diff --git a/docs/block_ops/ops_classes/scan.rst b/docs/block_ops/ops_classes/scan.rst new file mode 100644 index 000000000..1b66f45a6 --- /dev/null +++ b/docs/block_ops/ops_classes/scan.rst @@ -0,0 +1,13 @@ +Scan +~~~~ + +Class +..... + +.. doxygenclass:: rocprim::block_scan + :members: + +Algorithms +.......... + +.. doxygenenum:: rocprim::block_scan_algorithm diff --git a/docs/block_ops/ops_classes/shuffle.rst b/docs/block_ops/ops_classes/shuffle.rst new file mode 100644 index 000000000..9335eaf95 --- /dev/null +++ b/docs/block_ops/ops_classes/shuffle.rst @@ -0,0 +1,5 @@ +Shuffle +~~~~~~~ + +.. doxygenclass:: rocprim::block_shuffle + :members: diff --git a/docs/block_ops/ops_classes/sort.rst b/docs/block_ops/ops_classes/sort.rst new file mode 100644 index 000000000..f5cce2e14 --- /dev/null +++ b/docs/block_ops/ops_classes/sort.rst @@ -0,0 +1,17 @@ +Sort +~~~~ + +generic +....... + + +.. doxygenclass:: rocprim::block_sort + :members: + +.. doxygenenum:: rocprim::block_sort_algorithm + +radix sort +.......... + +.. doxygenclass:: rocprim::block_radix_sort + :members: diff --git a/docs/block_ops/ops_classes/store.rst b/docs/block_ops/ops_classes/store.rst new file mode 100644 index 000000000..41eaf7bb2 --- /dev/null +++ b/docs/block_ops/ops_classes/store.rst @@ -0,0 +1,13 @@ +Store +~~~~~ + +Class +..... + +.. doxygenclass:: rocprim::block_store + :members: + +Algorithms +.......... + +.. 
doxygenenum:: rocprim::block_store_method diff --git a/docs/blockmodule.dox b/docs/blockmodule.dox deleted file mode 100644 index 40cd16ab3..000000000 --- a/docs/blockmodule.dox +++ /dev/null @@ -1,11 +0,0 @@ -/** -@brief rocPRIM Block-wide parallel primitives -@author -@file -*/ - -/** - * \defgroup blockmodule Block-wide - * \ingroup primitivesmodule - * - */ \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index 898d65233..63415f374 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -1,204 +1,17 @@ -# -*- coding: utf-8 -*- +# Configuration file for the Sphinx documentation builder. # -# rocPRIM documentation build configuration file, created by -# sphinx-quickstart on Mon Jan 8 16:34:42 2018. -# -# This file is execfile()d with the current directory set to its -# containing dir. -# -# Note that not all possible configuration values are present in this -# autogenerated file. -# -# All configuration values have a default; values that are commented out -# serve to show the default. - -# If extensions (or modules to document with autodoc) are in another directory, -# add these directories to sys.path here. If the directory is relative to the -# documentation root, use os.path.abspath to make it absolute, like shown here. -# -# import os -# import sys -# sys.path.insert(0, os.path.abspath('.')) - -import os -import sys -import subprocess - -read_the_docs_build = os.environ.get('READTHEDOCS', None) == 'True' - -# -- General configuration ------------------------------------------------ - -# If your documentation needs a minimal Sphinx version, state it here. -# -# needs_sphinx = '1.0' - -# Add any Sphinx extension module names here, as strings. They can be -# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom -# ones. -extensions = ['sphinx.ext.mathjax', 'breathe', 'exhale'] -breathe_projects = { "rocPRIM": "./xml" } -breathe_default_project = "rocPRIM" - -# Add any paths that contain templates here, relative to this directory. 
-templates_path = ['_templates'] - -# The suffix(es) of source filenames. -# You can specify multiple suffix as a list of string: -# -# source_suffix = ['.rst', '.md'] -source_suffix = '.rst' - -# The master toctree document. -master_doc = 'index' - -# General information about the project. -project = u'rocPRIM' -copyright = u'Copyright 2017-2022 Advanced Micro Devices' -author = u'Advanced Micro Devices' - -# The version info for the project you're documenting, acts as replacement for -# |version| and |release|, also used in various other places throughout the -# built documents. -# -# The short X.Y version. -version = u'2.13.0' -# The full version, including alpha/beta/rc tags. -release = u'2.13.0' - -# The language for content autogenerated by Sphinx. Refer to documentation -# for a list of supported languages. -# -# This is also used if you do content translation via gettext catalogs. -# Usually you set "language" from the command line for these cases. -language = None - -# List of patterns, relative to source directory, that match files and -# directories to ignore when looking for source files. -# This patterns also effect to html_static_path and html_extra_path -exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] - -# The name of the Pygments (syntax highlighting) style to use. -pygments_style = 'sphinx' - -# If true, `todo` and `todoList` produce output, else they produce nothing. -todo_include_todos = False - - -# -- Options for HTML output ---------------------------------------------- +# This file only contains a selection of the most common options. For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html -# The theme to use for HTML and HTML Help pages. See the documentation for -# a list of builtin themes. 
-# -# html_theme = 'alabaster' - -if read_the_docs_build: - html_theme = 'default' -else: - import sphinx_rtd_theme - html_theme = "sphinx_rtd_theme" - html_theme_path = [sphinx_rtd_theme.get_html_theme_path()] - -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. -# -# html_theme_options = {} - -# Add any paths that contain custom static files (such as style sheets) here, -# relative to this directory. They are copied after the builtin static files, -# so a file named "default.css" will overwrite the builtin "default.css". -# html_static_path = ['_static'] - -# Custom sidebar templates, must be a dictionary that maps document names -# to template names. -# -# This is required for the alabaster theme -# refs: http://alabaster.readthedocs.io/en/latest/installation.html#sidebars -# html_sidebars = { -# '**': [ -# 'relations.html', # needs 'show_related': True theme option to display -# 'searchbox.html', -# ] -# } - - -# -- Options for HTMLHelp output ------------------------------------------ - -# Output file base name for HTML help builder. -htmlhelp_basename = 'rocPRIMdoc' - - -# -- Options for LaTeX output --------------------------------------------- - -latex_elements = { - # The paper size ('letterpaper' or 'a4paper'). - # - # 'papersize': 'letterpaper', +from rocm_docs import ROCmDocs - # The font size ('10pt', '11pt' or '12pt'). - # - # 'pointsize': '10pt', +docs_core = ROCmDocs("rocPRIM Documentation") +docs_core.run_doxygen() +docs_core.setup() - # Additional stuff for the LaTeX preamble. - # - # 'preamble': '', +for sphinx_var in ROCmDocs.SPHINX_VARS: + globals()[sphinx_var] = getattr(docs_core, sphinx_var) - # Latex figure (float) alignment - # - # 'figure_align': 'htbp', -} - -# Grouping the document tree into LaTeX files. List of tuples -# (source start file, target name, title, -# author, documentclass [howto, manual, or own class]). 
-latex_documents = [ - (master_doc, 'rocPRIM.tex', u'rocPRIM Documentation', - u'Advanced Micro Devices', 'manual'), -] - - -# -- Options for manual page output --------------------------------------- - -# One entry per manual page. List of tuples -# (source start file, name, description, authors, manual section). -man_pages = [ - (master_doc, 'rocPRIM', u'rocPRIM Documentation', - [author], 1) -] - - -# -- Options for Texinfo output ------------------------------------------- - -# Grouping the document tree into Texinfo files. List of tuples -# (source start file, target name, title, author, -# dir menu entry, description, category) -texinfo_documents = [ - (master_doc, 'rocPRIM', u'rocPRIM Documentation', - author, 'rocPRIM', 'Reusable software components for rocm developers.', - 'Miscellaneous'), -] - - - -# Setup the exhale extension -exhale_args = { - # These arguments are required - "containmentFolder": "./api", - "rootFileName": "library_root.rst", - "rootFileTitle": "Library API", - "doxygenStripFromPath": "..", - # Suggested optional arguments - "createTreeView": True, - # TIP: if using the sphinx-bootstrap-theme, you need - # "treeViewIsBootstrap": True, - "exhaleExecutesDoxygen": True, - "exhaleUseDoxyfile": True -} - -# Tell sphinx what the primary language being documented is. -primary_domain = 'cpp' cpp_id_attributes = ["__global__", "__device__", "__host__", "__forceinline__", "static"] cpp_paren_attributes = ["__declspec"] - -# Tell sphinx what the pygments highlight language should be. -highlight_language = 'cpp' diff --git a/docs/device_ops/adjacent_difference.rst b/docs/device_ops/adjacent_difference.rst new file mode 100644 index 000000000..90213cbee --- /dev/null +++ b/docs/device_ops/adjacent_difference.rst @@ -0,0 +1,28 @@ +Adjacent difference +------------------- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygenstruct:: rocprim::adjacent_difference_config + +left +~~~~ + +.. 
doxygenfunction:: rocprim::adjacent_difference(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op=BinaryFunction {}, const hipStream_t stream=0, const bool debug_synchronous=false) + +left, inplace +~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::adjacent_difference_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op=BinaryFunction {}, const hipStream_t stream=0, const bool debug_synchronous=false) + +right +~~~~~ + +.. doxygenfunction:: rocprim::adjacent_difference_right(void *const temporary_storage, std::size_t &storage_size, const InputIt input, const OutputIt output, const std::size_t size, const BinaryFunction op=BinaryFunction {}, const hipStream_t stream=0, const bool debug_synchronous=false) + +right, inplace +~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::adjacent_difference_right_inplace(void *const temporary_storage, std::size_t &storage_size, const InputIt values, const std::size_t size, const BinaryFunction op=BinaryFunction {}, const hipStream_t stream=0, const bool debug_synchronous=false) + diff --git a/docs/device_ops/binary_search.rst b/docs/device_ops/binary_search.rst new file mode 100644 index 000000000..8e33ba3b0 --- /dev/null +++ b/docs/device_ops/binary_search.rst @@ -0,0 +1,4 @@ +Binary Search +------------- + +.. 
doxygenfunction:: rocprim::binary_search(void *temporary_storage, size_t &storage_size, HaystackIterator haystack, NeedlesIterator needles, OutputIterator output, size_t haystack_size, size_t needles_size, CompareFunction compare_op=CompareFunction(), hipStream_t stream=0, bool debug_synchronous=false) diff --git a/docs/device_ops/config.rst b/docs/device_ops/config.rst new file mode 100644 index 000000000..ef567f178 --- /dev/null +++ b/docs/device_ops/config.rst @@ -0,0 +1,29 @@ +Configuring the Kernels +======================= + +A kernel config selects the grid/block dimensions, as well as +how the data is fetched and stored (the algorithms used for +``load``/``store``) for the operations using them (such as ``select``). + +.. doxygenstruct:: rocprim::kernel_config + +Setting the configuration is important to better tune the kernel to a given GPU model. +``rocPRIM`` uses a placeholder type to let the macros select the default configuration for +the GPU model. + +.. doxygenstruct:: rocprim::default_config + +.. warning:: + + To provide information about the GPU you're targeting, you have to + set ``ROCPRIM_TARGET_ARCH``. + + If the target is not supported by ``rocPRIM``, the templates will + use the configuration for the model ``900``. + + If ``ROCPRIM_TARGET_ARCH`` is not defined, it defaults to ``0``, + which is not supported by ``rocPRIM`` and thus the configurations + will be for the model ``900``. + + + diff --git a/docs/device_ops/histogram.rst b/docs/device_ops/histogram.rst new file mode 100644 index 000000000..10d03d217 --- /dev/null +++ b/docs/device_ops/histogram.rst @@ -0,0 +1,31 @@ +Histogram +--------- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygenstruct:: rocprim::histogram_config + +histogram_even +~~~~~~~~~~~~~~ + +.. 
doxygenfunction:: rocprim::histogram_even(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int size, Counter *histogram, unsigned int levels, Level lower_level, Level upper_level, hipStream_t stream=0, bool debug_synchronous=false) +.. doxygenfunction:: rocprim::histogram_even(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int columns, unsigned int rows, size_t row_stride_bytes, Counter *histogram, unsigned int levels, Level lower_level, Level upper_level, hipStream_t stream=0, bool debug_synchronous=false) + +multi_histogram_even +~~~~~~~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::multi_histogram_even(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int size, Counter *histogram[ActiveChannels], unsigned int levels[ActiveChannels], Level lower_level[ActiveChannels], Level upper_level[ActiveChannels], hipStream_t stream=0, bool debug_synchronous=false) +.. doxygenfunction:: rocprim::multi_histogram_even(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int columns, unsigned int rows, size_t row_stride_bytes, Counter *histogram[ActiveChannels], unsigned int levels[ActiveChannels], Level lower_level[ActiveChannels], Level upper_level[ActiveChannels], hipStream_t stream=0, bool debug_synchronous=false) + +histogram_range +~~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::histogram_range(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int size, Counter *histogram, unsigned int levels, Level *level_values, hipStream_t stream=0, bool debug_synchronous=false) +.. doxygenfunction:: rocprim::histogram_range(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int columns, unsigned int rows, size_t row_stride_bytes, Counter *histogram, unsigned int levels, Level *level_values, hipStream_t stream=0, bool debug_synchronous=false) + +multi_histogram_range +~~~~~~~~~~~~~~~~~~~~~ + +.. 
doxygenfunction:: rocprim::multi_histogram_range(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int size, Counter *histogram[ActiveChannels], unsigned int levels[ActiveChannels], Level *level_values[ActiveChannels], hipStream_t stream=0, bool debug_synchronous=false) +.. doxygenfunction:: rocprim::multi_histogram_range(void *temporary_storage, size_t &storage_size, SampleIterator samples, unsigned int columns, unsigned int rows, size_t row_stride_bytes, Counter *histogram[ActiveChannels], unsigned int levels[ActiveChannels], Level *level_values[ActiveChannels], hipStream_t stream=0, bool debug_synchronous=false) diff --git a/docs/device_ops/index.rst b/docs/device_ops/index.rst new file mode 100644 index 000000000..85ea3563e --- /dev/null +++ b/docs/device_ops/index.rst @@ -0,0 +1,20 @@ +Device-Wide Operations +====================== + +.. toctree:: + :maxdepth: 6 + + config + + transform + unique + sort + merge + partition + run_length_encoding + scan + select + reduce + adjacent_difference + binary_search + histogram diff --git a/docs/device_ops/merge.rst b/docs/device_ops/merge.rst new file mode 100644 index 000000000..34642e4a8 --- /dev/null +++ b/docs/device_ops/merge.rst @@ -0,0 +1,13 @@ +Merge +----- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygentypedef:: rocprim::merge_config + +merge +~~~~~ + +.. doxygenfunction:: rocprim::merge (void *temporary_storage, size_t &storage_size, InputIterator1 input1, InputIterator2 input2, OutputIterator output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) +.. 
doxygenfunction:: rocprim::merge (void *temporary_storage, size_t &storage_size, KeysInputIterator1 keys_input1, KeysInputIterator2 keys_input2, KeysOutputIterator keys_output, ValuesInputIterator1 values_input1, ValuesInputIterator2 values_input2, ValuesOutputIterator values_output, const size_t input1_size, const size_t input2_size, BinaryFunction compare_function=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) diff --git a/docs/device_ops/partition.rst b/docs/device_ops/partition.rst new file mode 100644 index 000000000..78a234bc2 --- /dev/null +++ b/docs/device_ops/partition.rst @@ -0,0 +1,12 @@ +Partition +--------- + +partition +~~~~~~~~~ + +.. doxygenfunction:: rocprim::partition(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream=0, const bool debug_synchronous=false) + +partition_three_way +~~~~~~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::partition_three_way(void *temporary_storage, size_t &storage_size, InputIterator input, FirstOutputIterator output_first_part, SecondOutputIterator output_second_part, UnselectedOutputIterator output_unselected, SelectedCountOutputIterator selected_count_output, const size_t size, FirstUnaryPredicate select_first_part_op, SecondUnaryPredicate select_second_part_op, const hipStream_t stream = 0, const bool debug_synchronous = false) diff --git a/docs/device_ops/reduce.rst b/docs/device_ops/reduce.rst new file mode 100644 index 000000000..d819dcdcb --- /dev/null +++ b/docs/device_ops/reduce.rst @@ -0,0 +1,32 @@ +Reduce +------ + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +reduce +...... + +.. doxygenstruct:: rocprim::reduce_config + +reduce_by_key +............. + +.. doxygenstruct:: rocprim::reduce_by_key_config + +reduce +~~~~~~ + +.. 
doxygenfunction:: rocprim::reduce(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const InitValueType initial_value, const size_t size, BinaryFunction reduce_op=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) + +.. doxygenfunction:: rocprim::reduce(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const size_t size, BinaryFunction reduce_op=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) + +segmented_reduce +~~~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::segmented_reduce(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, BinaryFunction reduce_op=BinaryFunction(), InitValueType initial_value=InitValueType(), hipStream_t stream=0, bool debug_synchronous=false) + +reduce_by_key +~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::reduce_by_key(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, ValuesInputIterator values_input, unsigned int size, UniqueOutputIterator unique_output, AggregatesOutputIterator aggregates_output, UniqueCountOutputIterator unique_count_output, BinaryFunction reduce_op=BinaryFunction(), KeyCompareFunction key_compare_op=KeyCompareFunction(), hipStream_t stream=0, bool debug_synchronous=false) diff --git a/docs/device_ops/run_length_encoding.rst b/docs/device_ops/run_length_encoding.rst new file mode 100644 index 000000000..9fa0a0d6c --- /dev/null +++ b/docs/device_ops/run_length_encoding.rst @@ -0,0 +1,17 @@ +Run Length Encode +----------------- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygenstruct:: rocprim::run_length_encode_config + +run_length_encode +~~~~~~~~~~~~~~~~~ + +.. 
doxygenfunction:: rocprim::run_length_encode(void *temporary_storage, size_t &storage_size, InputIterator input, unsigned int size, UniqueOutputIterator unique_output, CountsOutputIterator counts_output, RunsCountOutputIterator runs_count_output, hipStream_t stream=0, bool debug_synchronous=false) + +run_length_encode_non_trivial_runs +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygenfunction:: rocprim::run_length_encode_non_trivial_runs(void *temporary_storage, size_t &storage_size, InputIterator input, unsigned int size, OffsetsOutputIterator offsets_output, CountsOutputIterator counts_output, RunsCountOutputIterator runs_count_output, hipStream_t stream=0, bool debug_synchronous=false) diff --git a/docs/device_ops/scan.rst b/docs/device_ops/scan.rst new file mode 100644 index 000000000..948dfdc67 --- /dev/null +++ b/docs/device_ops/scan.rst @@ -0,0 +1,51 @@ +Scan +---- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +scan +.... + +.. doxygenstruct:: rocprim::scan_config + +scan_by_key +........... + +.. doxygenstruct:: rocprim::scan_by_key_config + +scan +~~~~ + +inclusive +......... + +.. doxygenfunction:: rocprim::inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const size_t size, BinaryFunction scan_op=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) + +exclusive +......... + +.. doxygenfunction:: rocprim::exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, const InitValueType initial_value, const size_t size, BinaryFunction scan_op=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) + +segmented, inclusive +.................... + +.. 
doxygenfunction:: rocprim::segmented_inclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, BinaryFunction scan_op=BinaryFunction(), hipStream_t stream=0, bool debug_synchronous=false) + +segmented, exclusive +.................... + +.. doxygenfunction:: rocprim::segmented_exclusive_scan(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, const InitValueType initial_value, BinaryFunction scan_op=BinaryFunction(), hipStream_t stream=0, bool debug_synchronous=false) + +scan_by_key +~~~~~~~~~~~ + +inclusive +......... + +.. doxygenfunction:: rocprim::inclusive_scan_by_key(void *const temporary_storage, size_t &storage_size, const KeysInputIterator keys_input, const ValuesInputIterator values_input, const ValuesOutputIterator values_output, const size_t size, const BinaryFunction scan_op=BinaryFunction(), const KeyCompareFunction key_compare_op=KeyCompareFunction(), const hipStream_t stream=0, const bool debug_synchronous=false) + +exclusive +......... + +.. doxygenfunction:: rocprim::exclusive_scan_by_key(void *const temporary_storage, size_t &storage_size, const KeysInputIterator keys_input, const ValuesInputIterator values_input, const ValuesOutputIterator values_output, const InitialValueType initial_value, const size_t size, const BinaryFunction scan_op=BinaryFunction(), const KeyCompareFunction key_compare_op=KeyCompareFunction(), const hipStream_t stream=0, const bool debug_synchronous=false) diff --git a/docs/device_ops/select.rst b/docs/device_ops/select.rst new file mode 100644 index 000000000..c5ecd58d2 --- /dev/null +++ b/docs/device_ops/select.rst @@ -0,0 +1,14 @@ +Select +------ + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygenstruct:: rocprim::select_config + +select +~~~~~~ + +.. 
doxygenfunction:: rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, FlagIterator flags, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, const hipStream_t stream=0, const bool debug_synchronous=false) +.. doxygenfunction:: rocprim::select(void *temporary_storage, size_t &storage_size, InputIterator input, OutputIterator output, SelectedCountOutputIterator selected_count_output, const size_t size, UnaryPredicate predicate, const hipStream_t stream=0, const bool debug_synchronous=false) + diff --git a/docs/device_ops/sort.rst b/docs/device_ops/sort.rst new file mode 100644 index 000000000..dcaf2778d --- /dev/null +++ b/docs/device_ops/sort.rst @@ -0,0 +1,69 @@ +Sort +---- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +merge_sort +.......... + +.. doxygentypedef:: rocprim::merge_sort_config + +radix_sort +.......... + +.. doxygenstruct:: rocprim::radix_sort_config + +merge_sort +~~~~~~~~~~ + +.. doxygenfunction:: rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, const size_t size, BinaryFunction compare_function=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) +.. doxygenfunction:: rocprim::merge_sort(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, const size_t size, BinaryFunction compare_function=BinaryFunction(), const hipStream_t stream=0, bool debug_synchronous=false) + + +radix_sort_keys +~~~~~~~~~~~~~~~ + +ascending +......... + +.. doxygenfunction:: rocprim::radix_sort_keys(void *temporary_storage, size_t &storage_size, double_buffer< Key > &keys, Size size, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +descending +.......... + +.. 
doxygenfunction:: rocprim::radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, double_buffer< Key > &keys, Size size, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +segmented, ascending +.................... + +.. doxygenfunction:: rocprim::segmented_radix_sort_keys(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +segmented, descending +..................... + +.. doxygenfunction:: rocprim::segmented_radix_sort_keys_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +radix_sort_pairs +~~~~~~~~~~~~~~~~ + +ascending +......... + +.. doxygenfunction:: rocprim::radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +descending +.......... + +.. doxygenfunction:: rocprim::radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, Size size, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +segmented, ascending +.................... + +.. 
doxygenfunction:: rocprim::segmented_radix_sort_pairs(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + +segmented, descending +..................... + +.. doxygenfunction:: rocprim::segmented_radix_sort_pairs_desc(void *temporary_storage, size_t &storage_size, KeysInputIterator keys_input, KeysOutputIterator keys_output, ValuesInputIterator values_input, ValuesOutputIterator values_output, unsigned int size, unsigned int segments, OffsetIterator begin_offsets, OffsetIterator end_offsets, unsigned int begin_bit=0, unsigned int end_bit=8 *sizeof(Key), hipStream_t stream=0, bool debug_synchronous=false) + diff --git a/docs/device_ops/transform.rst b/docs/device_ops/transform.rst new file mode 100644 index 000000000..a401140bf --- /dev/null +++ b/docs/device_ops/transform.rst @@ -0,0 +1,13 @@ +Transform +--------- + +Configuring the kernel +~~~~~~~~~~~~~~~~~~~~~~ + +.. doxygentypedef:: rocprim::transform_config + +transform +~~~~~~~~~ + +.. doxygenfunction:: rocprim::transform(InputIterator, OutputIterator, const size_t, UnaryFunction, const hipStream_t stream, bool) +.. doxygenfunction:: rocprim::transform(InputIterator1, InputIterator2, OutputIterator, const size_t, BinaryFunction, const hipStream_t, bool) diff --git a/docs/device_ops/unique.rst b/docs/device_ops/unique.rst new file mode 100644 index 000000000..b4d686dc0 --- /dev/null +++ b/docs/device_ops/unique.rst @@ -0,0 +1,13 @@ +Unique +------ + +unique +~~~~~~ + +.. doxygenfunction:: rocprim::unique(void *, size_t &, InputIterator, OutputIterator, UniqueCountOutputIterator, const size_t, EqualityOp, const hipStream_t, const bool) + +unique_by_key +~~~~~~~~~~~~~ + +.. 
doxygenfunction:: rocprim::unique_by_key(void *, size_t &, const KeyIterator, const ValueIterator, const OutputKeyIterator, const OutputValueIterator, const UniqueCountOutputIterator, const size_t, const EqualityOp, const hipStream_t, const bool) + diff --git a/docs/glossary.rst b/docs/glossary.rst new file mode 100644 index 000000000..6b99f830d --- /dev/null +++ b/docs/glossary.rst @@ -0,0 +1,34 @@ +Glossary +======== + +This glossary helps users understand the basic concepts and terminology used in the rocPRIM library. + +Terminologies + +.. glossary:: + Warp + Refers to a group of threads that execute in SIMT (Single Instruction, Multiple Thread) fashion. Also known as wavefronts on AMD GPUs. + + Hardware Warp Size + Refers to the number of threads in a warp defined by the hardware. On Nvidia GPUs, the warp size is 32 while on AMD GPUs, the warp size is 64. + + Logical Warp Size + Refers to the number of threads in a warp defined by the user, which can be equal to or less than the hardware warp size. + + Lane ID + Refers to the thread identifier within the warp. A logical lane ID refers to the thread identifier in a "logical + warp", which can be smaller than a hardware warp (and can be defined as ``lane_id() % WarpSize``). + + Warp ID + Refers to the identifier of the hardware/logical warp in a block. Warp ID is guaranteed to be unique among warps. + + Block + Refers to a group of threads that are executed on the same compute unit (streaming multiprocessor). These threads can + be indexed using 1 Dimension {X}, 2 Dimensions {X, Y} or 3 Dimensions {X, Y, Z}. A block consists of multiple warps. + + Tile + Refers to a block, but in the C++AMP/HCC nomenclature. + + Flat ID + Refers to a flattened identifier of a block (tile) or a thread identifier. Flat ID is a 1D value created from a 2D or 3D + identifier. Example: the flat ID of thread (X, Y) in a 2D thread block of 128x4 (XxY) is Y * 128 + X.
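The Flat ID and Lane ID entries of the glossary above can be illustrated with a small host-side sketch. This is plain C++ rather than rocPRIM code: the helper names `flat_thread_id_2d` and `logical_lane_id` are hypothetical and exist only for this illustration, and the block width and warp sizes are passed in as ordinary arguments instead of being read from the hardware.

```cpp
#include <cstddef>

// Hypothetical helper (not a rocPRIM function): flattens the 2D thread
// index (x, y) of a block that is `block_size_x` threads wide, following
// the glossary formula Y * block_size_x + X.
constexpr std::size_t flat_thread_id_2d(std::size_t x,
                                        std::size_t y,
                                        std::size_t block_size_x)
{
    return y * block_size_x + x;
}

// Hypothetical helper (not a rocPRIM function): index of a lane inside a
// logical warp of `logical_warp_size` threads, following the glossary
// definition lane_id() % WarpSize.
constexpr unsigned int logical_lane_id(unsigned int lane_id,
                                       unsigned int logical_warp_size)
{
    return lane_id % logical_warp_size;
}

// Compile-time check of the glossary example for a 128x4 block.
static_assert(flat_thread_id_2d(5, 2, 128) == 2 * 128 + 5,
              "flat ID must equal Y * 128 + X");
```

With a 128x4 block, thread (5, 2) gets the flat ID 261, and hardware lane 37 maps to logical lane 5 when the logical warp size is 16.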
diff --git a/docs/index.rst b/docs/index.rst index b6684edb8..694229ca8 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,16 +1,12 @@ -.. rocPRIMs documentation master file ++++++++++++++++++++++++ + rocPRIM Documentation ++++++++++++++++++++++++ -Welcome to rocPRIM's documentation! -================================== +``rocPRIM`` is a header-only library providing HIP parallel primitives to ease the maintainability of performant yet portable GPU-accelerated code on the AMD ROCm platform. -.. toctree:: - :maxdepth: 3 - :caption: Contents: +Acknowledgements +================ - api/library_root - -Indices and tables -================== +The following contributors helped to make this documentation better: -* :ref:`genindex` -* :ref:`search` +* `v01dXYZ `_ proposed a new structure for the documentation. diff --git a/docs/intrinsics.rst b/docs/intrinsics.rst new file mode 100644 index 000000000..dc5ae191c --- /dev/null +++ b/docs/intrinsics.rst @@ -0,0 +1,47 @@ +Intrinsics +========== + +Bitwise +------- + +.. doxygenfunction:: rocprim::get_bit(int x, int i) +.. doxygenfunction:: rocprim::bit_count(unsigned int x) +.. doxygenfunction:: rocprim::bit_count(unsigned long long x) + +Warp size +--------- + +.. doxygenfunction:: rocprim::warp_size() +.. doxygenfunction:: rocprim::host_warp_size() +.. doxygenfunction:: rocprim::device_warp_size() + +Lane and Warp ID +---------------- + +.. doxygengroup:: intrinsicsmodule_warp_id + :content-only: + +Flat ID +------- + +.. doxygengroup:: intrinsicsmodule_flat_id + :content-only: + +Flat Size +--------- + +.. doxygenfunction:: rocprim::flat_block_size() +.. doxygenfunction:: rocprim::flat_tile_size() + +Synchronization +--------------- + +.. doxygenfunction:: rocprim::syncthreads() +.. doxygenfunction:: rocprim::wave_barrier() + +Active threads +-------------- + + +.. doxygenfunction:: rocprim::ballot (int predicate) +.. 
doxygenfunction:: rocprim::masked_bit_count (lane_mask_type x, unsigned int add=0) diff --git a/docs/intrinsicsmodule.dox b/docs/intrinsicsmodule.dox deleted file mode 100644 index d50386a01..000000000 --- a/docs/intrinsicsmodule.dox +++ /dev/null @@ -1,10 +0,0 @@ -/** -@brief rocPRIM intrinsic functions. -@author -@file -*/ - -/** - * \defgroup intrinsicsmodule Intrinsic functions - * - */ \ No newline at end of file diff --git a/docs/intro.rst b/docs/intro.rst new file mode 100644 index 000000000..4a3b8ffa6 --- /dev/null +++ b/docs/intro.rst @@ -0,0 +1,30 @@ +Introduction +============ + +Operations and Sequences +------------------------ + +A ``rocPRIM`` operation is a computation over a sequence of objects that returns one value (e.g. ``reduce``), another sequence (e.g. ``sort``) or multiple sequences (e.g. ``partition``). The elements of the sequence can be of any type or class, although template specialization allows ``rocPRIM`` to optimize the computations over the usual numerical datatypes. Operations handle sequences by expecting ``iterators`` as input and mutable ones as output. + +A high-level view of the available operations is given in :doc:`/ops_summary`. As you can see, these are generic operations that are difficult to avoid on a day-to-day basis. + +Scope +----- + +An important property of a ``rocPRIM`` operation is its scope, which defines the level of the computing model at which the processing takes place, that is, which parts of the GPU cooperate to compute the result. +The scope directly influences how the data is subdivided into chunks that are eventually processed by the compute units (CUs) or VALUs. + +* *Device/Grid*: the operation and data are split and dispatched to all the CUs. +* :term:`Block`: the operation takes place within a single block, on a single CU. +* :term:`Warp`: as above, but with a warp and a VALU. +* *Thread*: the operation takes place sequentially in a single thread.
We also call those thread-wide operations *Utilities* since they correspond closely to the utility functions we use on a CPU. + +The scope has an impact on how the operation is initiated: + +* *Device/Grid* it is a kernel, thus it is dispatched with its own grid/block dimensions. +* *Block/Warp/Thread* it is a function call, and inherits the dimensions of the current kernel. + +This point dictates how synchronization should be done to wait for completion: + +* *Device/Grid* Synchronization is done via wait lists and queue barriers (``stream``). +* *Block/Warp/Thread* it is in the same control flow as the caller threads. Synchronization is done via memory barriers. diff --git a/docs/iterators.rst b/docs/iterators.rst new file mode 100644 index 000000000..fe6784b24 --- /dev/null +++ b/docs/iterators.rst @@ -0,0 +1,85 @@ +Iterators +========= + +Constant +-------- + +.. doxygenclass:: rocprim::constant_iterator + :members: + + +.. note:: + + For example, ``constant_iterator(20)`` generates the infinite sequence:: + + 20 + 20 + 20 + ... + +Counting +-------- + +.. doxygenclass:: rocprim::counting_iterator + :members: + +.. note:: + For example, ``counting_iterator(20)`` generates the infinite sequence:: + + 20 + 21 + 22 + 23 + ... + +Transform +--------- + +.. doxygenclass:: rocprim::transform_iterator + :members: + +.. note:: + + ``transform_iterator(sequence, transform)`` generates the sequence:: + + transform(sequence[0]) + transform(sequence[1]) + ... + +Pairing Values with Indices +--------------------------- + +.. doxygenclass:: rocprim::arg_index_iterator + :members: + +.. note:: + ``arg_index_iterator(sequence)`` generates the sequence of tuples:: + + (0, sequence[0]) + (1, sequence[1]) + ... + +Zip +--- + +.. doxygenclass:: rocprim::zip_iterator + :members: + +.. note:: + ``zip_iterator(sequence_X, sequence_Y)`` generates the sequence of tuples:: + + (sequence_X[0], sequence_Y[0]) + (sequence_X[1], sequence_Y[1]) + ... + +Discard +------- + +.. 
doxygenclass:: rocprim::discard_iterator + :members: + +Texture Cache +------------- + +.. doxygenclass:: rocprim::texture_cache_iterator + :members: diff --git a/docs/ops_summary.rst b/docs/ops_summary.rst new file mode 100644 index 000000000..c8233375c --- /dev/null +++ b/docs/ops_summary.rst @@ -0,0 +1,47 @@ +Summary of the Operations +========================= + +Basics +------ + +* ``transform`` applies a function to each element of the sequence, equivalent to the functional operation ``map`` +* ``select`` takes the first N elements of the sequence satisfying a condition (via a selection mask or a predicate function) +* ``unique`` +* ``histogram`` generates a summary of the statistical distribution of the sequence. + +Aggregation +----------- + +* ``reduce`` traverses the sequence while accumulating some data, equivalent to the functional operation ``fold_left``. +* ``scan`` is the cumulative version of ``reduce`` which returns the sequence of the intermediate values taken by the accumulator. + +Differentiation +--------------- + +* ``adjacent_difference`` computes the difference between the current element and the previous or next one in the sequence. +* ``discontinuity`` detects value change between the current element and the previous or next one in the sequence. + +Rearrangement +------------- + +* ``sort`` rearranges the sequence by sorting it. It could be according to a comparison operator or a value using a radix approach. +* ``exchange`` rearranges the elements according to a different stride configuration which is equivalent to a tensor axis transposition +* ``shuffle`` rotates the elements. + +Partition/Merge +--------------- + +* ``partition`` divides the sequence into two or more sequences according to a predicate while preserving some ordering properties. +* ``merge`` merges two ordered sequences into one while preserving the order. + +Data Movement +------------- + +* ``store`` stores the sequence to a continuous memory zone. 
There are variations to use an optimized path or to specify how to store the sequence to better fit the access patterns of the CUs. +* ``load`` the complementary operations of the above ones. + +Other operations +---------------- + +* ``run_length_encode`` generates a compact representation of a sequence +* ``binary_search`` finds for each element the index of an element with the same value in another sequence (which has to be sorted). diff --git a/docs/requirements.txt b/docs/requirements.txt deleted file mode 100644 index 9b5e91520..000000000 --- a/docs/requirements.txt +++ /dev/null @@ -1,3 +0,0 @@ -breathe<4.33.1 -exhale -sphinx_rtd_theme \ No newline at end of file diff --git a/docs/run_doc.sh b/docs/run_doc.sh deleted file mode 100755 index c173bd9ef..000000000 --- a/docs/run_doc.sh +++ /dev/null @@ -1,11 +0,0 @@ -#!/bin/bash - -set -eu - -# Make this directory the PWD -cd "$(dirname "${BASH_SOURCE[0]}")" - -# Build sphinx docs (Exhale will automatically run Doxygen) -make clean -make html -make latexpdf diff --git a/docs/thread_ops.rst b/docs/thread_ops.rst new file mode 100644 index 000000000..7c4538a38 --- /dev/null +++ b/docs/thread_ops.rst @@ -0,0 +1,21 @@ +Thread-Level Operations (Utilities) +=================================== + +Scan +---- + +exclusive +......... + +.. doxygenfunction:: thread_scan_exclusive(T (&input)[LENGTH], T (&output)[LENGTH], ScanOp scan_op, T prefix, bool apply_prefix = true) +.. doxygenfunction:: thread_scan_exclusive(T *input, T *output, ScanOp scan_op, T prefix, bool apply_prefix = true) +.. doxygenfunction:: thread_scan_exclusive(T inclusive, T exclusive, T *input, T *output, ScanOp scan_op, Int2Type) + +inclusive +......... + +.. doxygenfunction:: thread_scan_inclusive (T inclusive, T *input, T *output, ScanOp scan_op, Int2Type< LENGTH >) +.. doxygenfunction:: thread_scan_inclusive (T *input, T *output, ScanOp scan_op) +.. doxygenfunction:: thread_scan_inclusive (T(&input)[LENGTH], T(&output)[LENGTH], ScanOp scan_op) +.. 
doxygenfunction:: thread_scan_inclusive (T *input, T *output, ScanOp scan_op, T prefix, bool apply_prefix=true) +.. doxygenfunction:: thread_scan_inclusive (T(&input)[LENGTH], T(&output)[LENGTH], ScanOp scan_op, T prefix, bool apply_prefix=true) diff --git a/docs/warp_ops/exchange.rst b/docs/warp_ops/exchange.rst new file mode 100644 index 000000000..0410c83b5 --- /dev/null +++ b/docs/warp_ops/exchange.rst @@ -0,0 +1,5 @@ +Exchange +-------- + +.. doxygenclass:: rocprim::warp_exchange + :members: diff --git a/docs/warp_ops/index.rst b/docs/warp_ops/index.rst new file mode 100644 index 000000000..e8fda8ee2 --- /dev/null +++ b/docs/warp_ops/index.rst @@ -0,0 +1,12 @@ +Warp-Level Operations +===================== + +.. toctree:: + + load + store + reduce + scan + sort + shuffle + exchange diff --git a/docs/warp_ops/load.rst b/docs/warp_ops/load.rst new file mode 100644 index 000000000..26568c60f --- /dev/null +++ b/docs/warp_ops/load.rst @@ -0,0 +1,13 @@ +Load +---- + +Class +..... + +.. doxygenclass:: rocprim::warp_load + :members: + +Algorithms +.......... + +.. doxygenenum:: rocprim::warp_load_method diff --git a/docs/warp_ops/reduce.rst b/docs/warp_ops/reduce.rst new file mode 100644 index 000000000..fbc70f7d2 --- /dev/null +++ b/docs/warp_ops/reduce.rst @@ -0,0 +1,5 @@ +Reduce +------ + +.. doxygenclass:: rocprim::warp_reduce + :members: diff --git a/docs/warp_ops/scan.rst b/docs/warp_ops/scan.rst new file mode 100644 index 000000000..89c923bd9 --- /dev/null +++ b/docs/warp_ops/scan.rst @@ -0,0 +1,5 @@ +Scan +---- + +.. doxygenclass:: rocprim::warp_scan + :members: diff --git a/docs/warp_ops/shuffle.rst b/docs/warp_ops/shuffle.rst new file mode 100644 index 000000000..d5aa02ff6 --- /dev/null +++ b/docs/warp_ops/shuffle.rst @@ -0,0 +1,6 @@ +Shuffle +------- + +.. doxygenfunction:: rocprim::warp_shuffle (const T &input, const int src_lane, const int width) +.. 
doxygenfunction:: rocprim::warp_shuffle_down (const T &input, const unsigned int delta, const int width) +.. doxygenfunction:: rocprim::warp_shuffle_xor (const T &input, const int lane_mask, const int width) diff --git a/docs/warp_ops/sort.rst b/docs/warp_ops/sort.rst new file mode 100644 index 000000000..dee641a5b --- /dev/null +++ b/docs/warp_ops/sort.rst @@ -0,0 +1,5 @@ +Sort +---- + +.. doxygenclass:: rocprim::warp_sort + :members: diff --git a/docs/warp_ops/store.rst b/docs/warp_ops/store.rst new file mode 100644 index 000000000..e6ab791dd --- /dev/null +++ b/docs/warp_ops/store.rst @@ -0,0 +1,13 @@ +Store +----- + +Class +..... + +.. doxygenclass:: rocprim::warp_store + :members: + +Algorithms +.......... + +.. doxygenenum:: rocprim::warp_store_method diff --git a/rocprim/include/rocprim/block/block_histogram.hpp b/rocprim/include/rocprim/block/block_histogram.hpp index 3135806b8..426b0c268 100644 --- a/rocprim/include/rocprim/block/block_histogram.hpp +++ b/rocprim/include/rocprim/block/block_histogram.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2021 Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -94,7 +94,7 @@ struct select_block_histogram_impl /// /// \par Overview /// * block_histogram has two alternative implementations: \p block_histogram_algorithm::using_atomic -/// and block_histogram_algorithm::using_sort. +/// and \p block_histogram_algorithm::using_sort. /// /// \par Examples /// \parblock diff --git a/rocprim/include/rocprim/block/block_load.hpp b/rocprim/include/rocprim/block/block_load.hpp index 7b3df9929..83162fdd2 100644 --- a/rocprim/include/rocprim/block/block_load.hpp +++ b/rocprim/include/rocprim/block/block_load.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2021 Advanced Micro Devices, Inc. 
All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -100,11 +100,11 @@ enum class block_load_method /// /// \par Overview /// * The \p block_load class has a number of different methods to load data: -/// * [block_load_direct](\ref ::block_load_method::block_load_direct) -/// * [block_load_striped](\ref ::block_load_method::block_load_striped) -/// * [block_load_vectorize](\ref ::block_load_method::block_load_vectorize) -/// * [block_load_transpose](\ref ::block_load_method::block_load_transpose) -/// * [block_load_warp_transpose](\ref ::block_load_method::block_load_warp_transpose) +/// * \p block_load_method::block_load_direct +/// * \p block_load_method::block_load_striped +/// * \p block_load_method::block_load_vectorize +/// * \p ::block_load_method::block_load_transpose +/// * \p ::block_load_method::block_load_warp_transpose /// /// \par Example: /// \parblock diff --git a/rocprim/include/rocprim/block/block_load_func.hpp b/rocprim/include/rocprim/block/block_load_func.hpp index 83ebd5d84..23f53f03c 100644 --- a/rocprim/include/rocprim/block/block_load_func.hpp +++ b/rocprim/include/rocprim/block/block_load_func.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2021 Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -346,6 +346,7 @@ void block_load_direct_striped(unsigned int flat_id, /// \brief Loads data from continuous memory into a warp-striped arrangement of items /// across the thread block. 
/// +/// \ingroup blockmodule_warp_load_functions /// The warp-striped arrangement is assumed to be (\p WarpSize * \p ItemsPerThread) items /// across a thread block. Each thread uses a \p flat_id to load a range of /// \p ItemsPerThread into \p items. @@ -396,6 +397,7 @@ void block_load_direct_warp_striped(unsigned int flat_id, /// \brief Loads data from continuous memory into a warp-striped arrangement of items /// across the thread block, which is guarded by range \p valid. /// +/// \ingroup blockmodule_warp_load_functions /// The warp-striped arrangement is assumed to be (\p WarpSize * \p ItemsPerThread) items /// across a thread block. Each thread uses a \p flat_id to load a range of /// \p ItemsPerThread into \p items. @@ -453,6 +455,7 @@ void block_load_direct_warp_striped(unsigned int flat_id, /// across the thread block, which is guarded by range with a fall-back value /// for out-of-bound elements. /// +/// \ingroup blockmodule_warp_load_functions /// The warp-striped arrangement is assumed to be (\p WarpSize * \p ItemsPerThread) items /// across a thread block. Each thread uses a \p flat_id to load a range of /// \p ItemsPerThread into \p items. diff --git a/rocprim/include/rocprim/block/block_store.hpp b/rocprim/include/rocprim/block/block_store.hpp index 95daef942..1f35a502e 100644 --- a/rocprim/include/rocprim/block/block_store.hpp +++ b/rocprim/include/rocprim/block/block_store.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2021 Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. 
// // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -101,11 +101,11 @@ enum class block_store_method /// /// \par Overview /// * The \p block_store class has a number of different methods to store data: -/// * [block_store_direct](\ref ::block_store_method::block_store_direct) -/// * [block_store_striped](\ref ::block_store_method::block_store_striped) -/// * [block_store_vectorize](\ref ::block_store_method::block_store_vectorize) -/// * [block_store_transpose](\ref ::block_store_method::block_store_transpose) -/// * [block_store_warp_transpose](\ref ::block_store_method::block_store_warp_transpose) +/// * \p ::block_store_method::block_store_direct +/// * \p ::block_store_method::block_store_striped +/// * \p ::block_store_method::block_store_vectorize +/// * \p ::block_store_method::block_store_transpose +/// * \p ::block_store_method::block_store_warp_transpose /// /// \par Example: /// \parblock diff --git a/rocprim/include/rocprim/block/block_store_func.hpp b/rocprim/include/rocprim/block/block_store_func.hpp index 31c1e37cf..95ffd4ad3 100644 --- a/rocprim/include/rocprim/block/block_store_func.hpp +++ b/rocprim/include/rocprim/block/block_store_func.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2019 Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -274,6 +274,7 @@ void block_store_direct_striped(unsigned int flat_id, /// \brief Stores a warp-striped arrangement of items from across the thread block /// into a blocked arrangement on continuous memory. 
/// +/// \ingroup blockmodule_warp_store_functions /// The warp-striped arrangement is assumed to be (\p WarpSize * \p ItemsPerThread) items /// across a thread block. Each thread uses a \p flat_id to store a range of /// \p ItemsPerThread \p items to the thread block. @@ -328,6 +329,7 @@ void block_store_direct_warp_striped(unsigned int flat_id, /// \brief Stores a warp-striped arrangement of items from across the thread block /// into a blocked arrangement on continuous memory, which is guarded by range \p valid. /// +/// \ingroup blockmodule_warp_store_functions /// The warp-striped arrangement is assumed to be (\p WarpSize * \p ItemsPerThread) items /// across a thread block. Each thread uses a \p flat_id to store a range of /// \p ItemsPerThread \p items to the thread block. diff --git a/rocprim/include/rocprim/config.hpp b/rocprim/include/rocprim/config.hpp index b3561429a..47da62723 100644 --- a/rocprim/include/rocprim/config.hpp +++ b/rocprim/include/rocprim/config.hpp @@ -37,7 +37,7 @@ #error "rocPRIM requires at least C++14" #endif -#ifndef ROCPRIM_DEVICE +#if !defined(ROCPRIM_DEVICE) || defined(DOXYGEN_DOCUMENTATION_BUILD) #define ROCPRIM_DEVICE __device__ #define ROCPRIM_HOST __host__ #define ROCPRIM_HOST_DEVICE __host__ __device__ @@ -45,7 +45,7 @@ #ifdef WIN32 #define ROCPRIM_KERNEL __global__ static #else - #define ROCPRIM_KERNEL __global__ + #define ROCPRIM_KERNEL __global__ #endif // TODO: These parameters should be tuned for NAVI in the close future. #ifndef ROCPRIM_DEFAULT_MAX_BLOCK_SIZE @@ -54,13 +54,20 @@ #ifndef ROCPRIM_DEFAULT_MIN_WARPS_PER_EU #define ROCPRIM_DEFAULT_MIN_WARPS_PER_EU 1 #endif - // Currently HIP on Windows has a bug involving inline device functions generating - // local memory/register allocation errors during compilation. 
Current workaround is to - // use __attribute__((always_inline)) for the affected functions - #ifdef WIN32 - #define ROCPRIM_INLINE inline __attribute__((always_inline)) + + #ifndef DOXYGEN_DOCUMENTATION_BUILD + // Currently HIP on Windows has a bug involving inline device functions generating + // local memory/register allocation errors during compilation. Current workaround is to + // use __attribute__((always_inline)) for the affected functions + #ifdef _WIN32 + #define ROCPRIM_INLINE inline __attribute__((always_inline)) + #else + #define ROCPRIM_INLINE inline + #endif #else - #define ROCPRIM_INLINE inline + // Prefer simpler signatures to let Sphinx/Breathe parse them + #define ROCPRIM_FORCE_INLINE inline + #define ROCPRIM_INLINE inline #endif #define ROCPRIM_FORCE_INLINE __attribute__((always_inline)) #endif diff --git a/rocprim/include/rocprim/intrinsics/thread.hpp b/rocprim/include/rocprim/intrinsics/thread.hpp index f6ab44038..dd5c493bd 100644 --- a/rocprim/include/rocprim/intrinsics/thread.hpp +++ b/rocprim/include/rocprim/intrinsics/thread.hpp @@ -1,4 +1,4 @@ -// Copyright (c) 2017-2022 Advanced Micro Devices, Inc. All rights reserved. +// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved. // // Permission is hereby granted, free of charge, to any person obtaining a copy // of this software and associated documentation files (the "Software"), to deal @@ -101,6 +101,7 @@ unsigned int lane_id() } /// \brief Returns flat (linear, 1D) thread identifier in a multidimensional block (tile). +/// \ingroup intrinsicsmodule_flat_id ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_block_thread_id() { @@ -143,6 +144,7 @@ unsigned int flat_tile_thread_id() } /// \brief Returns warp id in a block (tile). +/// \ingroup intrinsicsmodule_warp_id ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id() { @@ -156,6 +158,7 @@ unsigned int warp_id(unsigned int flat_id) } /// \brief Returns warp id in a block (tile). 
Use template parameters to optimize 1D or 2D kernels. +/// \ingroup intrinsicsmodule_warp_id template ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int warp_id() @@ -164,6 +167,7 @@ unsigned int warp_id() } /// \brief Returns flat (linear, 1D) block identifier in a multidimensional grid. +/// \ingroup intrinsicsmodule_flat_id ROCPRIM_DEVICE ROCPRIM_INLINE unsigned int flat_block_id() {