Releases: Sandia-OpenSHMEM/SOS
Sandia OpenSHMEM v1.5.3
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.3. This full release includes the changes listed below from v1.5.3rc1 and the following additional changes:
- Fixed bugs in shmem_team_split_strided and shmem_team_split_2d operations.
- Improved detection of team wraparound sequences that cause undefined behavior.
- Included -lpmi_simple in LDFLAGS when simple PMI is enabled.
- Moved the warning that SOS could not detect any NICs with affinity to the process to SHMEM_DEBUG output.
- Additional bugfixes, including a fix for collectives with the OFI CXI provider and team-relative PE numbering for signal add and set.
Sandia OpenSHMEM v1.5.3rc1
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.3, release candidate 1. Below are some of the changes included in this release:
- Added several enhancements to better support SOS as a backend for Intel® SHMEM.
- Added extension support for GPU RDMA and external heap creation.
- Added support for multi-NIC configurations via libfabric. The feature is enabled by default. It can be disabled with the environment variable SHMEM_OFI_DISABLE_MULTIRAIL=1.
- Added initial support for multi-NIC topology optimizations via hwloc. Detection of hwloc is enabled by default. It can be disabled with the configuration flag --without-hwloc.
- Moved the "tests-sos" package of unit tests and performance benchmarks to a new Git submodule hosted at https://github.com/openshmem-org/tests-sos.
- Added shmemx_ibput and shmemx_ibget as extension APIs.
- Added shmemx_signal_add and shmemx_signal_set as extension APIs.
- Added several configuration flags to optimize for the CXI libfabric provider: --enable-ofi-inject and --enable-nonfetch-amo, which are enabled by default.
- Manpage generation is now disabled by default to shorten build times. It can be re-enabled during configuration with the --enable-manpages flag.
- Included multiple bugfixes, including in teams configuration, remote-virtual-addressing checks, buffer argument overlap checks, and more.
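The buffer argument overlap checks mentioned above boil down to an interval-intersection test. The following is a minimal sketch of such a test, not the check SOS performs: the function name buffers_overlap is hypothetical, and which routines forbid overlap (or permit identical source and destination buffers) is defined per routine by the OpenSHMEM specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not SOS code: returns 1 when the byte ranges
 * [a, a + a_len) and [b, b + b_len) share at least one byte, else 0.
 * Addresses are compared as integers to avoid relational comparison of
 * pointers into unrelated objects. */
int buffers_overlap(const void *a, size_t a_len, const void *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    if (a_len == 0 || b_len == 0)
        return 0; /* an empty range cannot overlap anything */
    uintptr_t a_start = (uintptr_t)a;
    uintptr_t b_start = (uintptr_t)b;
    /* Two half-open ranges intersect iff each starts before the other ends. */
    return a_start < b_start + b_len && b_start < a_start + a_len;
}
```

Adjacent ranges such as buf[0..4) and buf[4..8) are reported as non-overlapping, while identical ranges are reported as overlapping; a caller that allows in-place operation would need to special-case the latter.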
Sandia OpenSHMEM v1.5.2
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.2. This full release includes the changes listed below from v1.5.2rc1 and the following additional changes:
- Added configuration support for building Sandia OpenSHMEM with oneAPI compilers.
- Resolved a performance issue with fetching atomic operations when using the CXI libfabric provider.
- In the OFI transport, added the FI_RECV capability to contexts (for driving progress), resolving an issue with the Omni-Path Express (opx) provider.
Sandia OpenSHMEM v1.5.2rc1
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.2, release candidate 1. Below are some of the changes included in this release:
- Added support for the CXI libfabric provider, enabling SOS on the Slingshot interconnect. Setup instructions are on SOS's GitHub wiki.
- Added support for the shmem_team_ptr routine.
- Added support for negative strides in the OpenSHMEM teams APIs.
- Added checks for incorrect buffer overlap when error-checking is enabled.
- Added a configure option to enable deprecated tests, --enable-deprecated-tests, which is disabled by default.
- Added a configure option to enable libfabric manual progress, --enable-ofi-manual-progress, which is disabled by default.
- Added experimental support for a hint to shmem_malloc_with_hints, SHMEMX_MALLOC_NO_BARRIER, which removes the barrier at exit from the routine. Users must synchronize appropriately after such an operation.
- Initialized the OFI tx/rx capabilities appropriately to limit what is enabled by providers. This resolves an issue with the Omni-Path Express (opx) provider.
- Patched the symmetric data segment initialization to better support macOS.
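To illustrate what a negative stride means for a team, the translation between team-relative and parent-team PE numbers can be sketched as plain arithmetic. This is a minimal sketch under the assumption that a strided team is fully described by a (start, stride, size) triplet; the function names team_pe_to_parent and parent_pe_to_team are hypothetical and do not reflect SOS internals.

```c
#include <assert.h>

/* Hypothetical helpers, not SOS code. A strided team is described by the
 * parent-team PE of its first member (start), the step between members
 * (stride, which may be negative), and the member count (size). */

/* Parent-team PE number of member team_pe (0 <= team_pe < size). */
int team_pe_to_parent(int start, int stride, int size, int team_pe)
{
    assert(size > 0 && team_pe >= 0 && team_pe < size);
    return start + team_pe * stride;
}

/* Team-relative PE number of parent_pe, or -1 if it is not a member. */
int parent_pe_to_team(int start, int stride, int size, int parent_pe)
{
    assert(size > 0);
    int diff = parent_pe - start;
    if (stride == 0)
        return diff == 0 ? 0 : -1; /* a zero stride can only name the start PE */
    if (diff % stride != 0)
        return -1; /* parent_pe does not fall on the stride */
    int idx = diff / stride;
    return (idx >= 0 && idx < size) ? idx : -1;
}
```

For example, start = 6, stride = -2, size = 3 describes parent PEs 6, 4, and 2, which become team PEs 0, 1, and 2 in that order.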
Sandia OpenSHMEM v1.5.1
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.1. This full release includes the changes listed below from v1.5.1rc1 and the following additional change:
- Added a Dockerfile so users can create containers that provide a configurable Ubuntu sandbox for running SOS applications and benchmarks.
Sandia OpenSHMEM v1.5.1rc1
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.1, release candidate 1. Below are some of the changes included in this release:
- Added deprecation warnings for shmem_barrier, active-set reductions, and other deprecated routines, such as the 32/64-bit collectives.
- Removed usage of deprecated APIs in shmem_perf_suite, apps, and spec-examples.
- Added missing types for the OpenSHMEM v1.5 reductions (e.g., fixed-width integers, ptrdiff_t, size_t, and schar).
- Fixed an incorrect function signature for shmem_team_get_config().
- Resolved some critical issues with the UCX transport.
- Updated upstream configuration scripts from OpenMPI for UCX, PMI, etc.
- Fixed issues with the OpenSHMEM teams API (e.g. team-based broadcasts now update the destination object on all PEs including the root).
- Corrected the return value for shmem_test_all and shmem_test_all_vector.
- Added support for shmem_signal_wait_until.
- Resolved a critical build issue on Mac OSX.
- Fixed several issues in the SOS unit tests and updated them to use the OpenSHMEM v1.5 APIs.
- Migrated SOS continuous integration from Travis CI to GitHub Actions and Workflows.
Sandia OpenSHMEM v1.5.0
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.0. Below are some of the changes included in this release:
- Support for the OpenSHMEM 1.5 specification.
- This full release includes the changes listed below for v1.5.0rc1 and the following additional changes.
- Added a multiplier for scaling the number of trial iterations in performance test suite benchmarks, configurable by the environment variable SHMEM_PERF_SUITE_TRIALS_MULTIPLIER.
- Added an {SOS base}/examples directory with simple OpenSHMEM example programs and instructions on how to build and run them.
- With the recent fixes in libfabric v1.11.x, the RXM/Verbs provider support does not require setting the FI_MR_CACHE_MAX_COUNT environment variable to 0 when threading is used.
- Additional bugfixes, including profiling symbols for some team-based collectives, support for C99 compilation, and unit test failures with single PE.
Sandia OpenSHMEM v1.5.0rc1
The Sandia OpenSHMEM development team is pleased to announce SOS v1.5.0, release candidate 1. Below are some of the changes included in this release:
- Added support for the OpenSHMEM v1.5 specification.
- New features include: a teams API and teams-based collectives, put-with-signal routines, nonblocking atomic routines, multiple-element wait/test vector comparison routines, shmem_malloc_with_hints, and a profiling interface. See OpenSHMEM 1.5 Specification, Annex G for details.
- Deprecations include: active-set-based library constants and collective routines, shmem_barrier, and short/unsigned short variants for shmem_wait_until and shmem_test. See OpenSHMEM 1.5 Specification, Annex F for details.
- Added support for the UCX transport.
- Added shmem_malloc_with_hints. Currently, no hint values are supported.
- Added shmem_signal_fetch.
- Multiple bugfixes, including in teams resource management, put completion logic, and configure issues when multiple transports are detected.
Sandia OpenSHMEM v1.4.5
The Sandia OpenSHMEM development team is pleased to announce SOS v1.4.5. Below are some of the changes included in this release:
- Added a complete OpenSHMEM teams API to the shmemx interfaces.
- Added the OpenSHMEM wait/test vector API to the shmemx interfaces.
- Added support for shared memory atomic operations with the appropriate acquire/release memory ordering where required. This feature requires the --enable-shr-transport build flag.
- Improved experimental support for the RXM/Verbs provider stack with libfabric 1.8 and newer.
- Updated SOS to use the new OFI memory registration mode flags instead of the deprecated FI_MR_SCALABLE/FI_MR_BASIC modes.
- Updated the wait/test any/all/some "status" array semantics to reflect recent changes to the OpenSHMEM specification.
- Added a memory sync before returning from barriers to ensure shared memory updates and remote updates cached in the NIC are visible in memory.
- Updated the utility atomics (spinlocks and counters) to use __atomic built-in functions rather than the deprecated __sync builtins.
- Removed redundant shmem_barrier_all from the GUPS example application.
- Moved unit tests derived from OpenSHMEM specification examples to the test/spec-example directory.
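As an illustration of the move from __sync to __atomic built-ins noted above, a spinlock and counter written with the GCC/Clang __atomic built-ins might look like the sketch below. It is not the SOS implementation: the demo_ names are hypothetical, and the memory orderings shown (acquire on lock, release on unlock, relaxed for the counter) are one reasonable choice rather than a description of what SOS uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types and names, not SOS code. Requires a compiler that
 * provides the GCC/Clang __atomic built-ins. */
typedef struct {
    bool locked;
} demo_spinlock_t;

void demo_spinlock_init(demo_spinlock_t *lock)
{
    __atomic_clear(&lock->locked, __ATOMIC_RELAXED);
}

/* Returns true if the lock was acquired, false if it was already held. */
bool demo_spinlock_trylock(demo_spinlock_t *lock)
{
    /* test_and_set returns the previous value; acquire ordering keeps the
     * critical section from being reordered before the lock is taken. */
    return !__atomic_test_and_set(&lock->locked, __ATOMIC_ACQUIRE);
}

void demo_spinlock_lock(demo_spinlock_t *lock)
{
    while (!demo_spinlock_trylock(lock)) {
        /* spin until the holder releases the lock */
    }
}

void demo_spinlock_unlock(demo_spinlock_t *lock)
{
    assert(__atomic_load_n(&lock->locked, __ATOMIC_RELAXED)); /* must be held */
    /* release ordering publishes writes made inside the critical section */
    __atomic_clear(&lock->locked, __ATOMIC_RELEASE);
}

/* Atomically increments *counter and returns the new value. */
long demo_counter_inc(long *counter)
{
    return __atomic_add_fetch(counter, 1, __ATOMIC_RELAXED);
}
```

Unlike the older __sync built-ins, each __atomic call names its memory order explicitly, so the lock can request only the ordering it needs.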
Sandia OpenSHMEM v1.4.4
The Sandia OpenSHMEM development team is pleased to announce SOS v1.4.4. Below are some of the changes included in this release:
- Experimental support for RXM/Verbs provider stack with libfabric 1.8. Requires --enable-hard-polling and --enable-ofi-mr=basic build flags. The FI_MR_CACHE_MAX_COUNT environment variable should be set to 0 when threading is used. Depending on the libfabric build, setting SHMEM_OFI_PROVIDER="verbs;ofi_rxm" may be necessary to select the provider.
- Rewrite of node-level detection/management to enable improvements in on-node communication and resource management.
- Added a bandwidth-optimized ring reduction algorithm, selectable by setting the SHMEM_REDUCE_ALGORITHM environment variable to "ring". See README for details.
- Added the SHMEM_COLL_SIZE_CROSSOVER environment variable to control the message size at which collective communication transitions between latency and bandwidth optimization.
- Added the SHMEM_OFI_STX_AUTO environment variable to enable automatic partitioning of STX resources on node. See README for details.
- Updated unit tests and performance test suite to properly handle failed context creation.
- Updated OFI transport to support providers that do not support the FI_SHARED_CONTEXT TX attribute (STX) on endpoints.
- Updated OFI transport to bind a CQ to the target EP (required by some providers) and poll the target CQ when driving manual progress. As a result, manual progress no longer requires a target-side counter and can be used in conjunction with hard polling.
- Added support for decimal values in the SHMEM_SYMMETRIC_SIZE environment variable (e.g., 1.5G).
- Updated shmem_init_thread routine to return an error when library initialization fails (e.g., due to invalid SHMEM_SYMMETRIC_SIZE value).
- Fixed argument/context handling in Mandelbrot example program.
- Improved handling of context creation failure.
- Added missing pshmem_global_exit symbol to profiling interfaces.
- Updated shmem_ptr to return a pointer for the local process, even when inter-PE shared memory is not enabled.
- Updated fence, quiet, and context destroy to perform no operation for SHMEMX_CTX_INVALID.
- Additional bugfixes and improvements (see git log for details).
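To show what accepting a decimal value such as 1.5G for SHMEM_SYMMETRIC_SIZE involves, here is a minimal sketch of a size parser. It is not the parser SOS uses: the function name parse_symmetric_size is hypothetical, and the sketch assumes binary multipliers (K = 2^10, M = 2^20, G = 2^30, T = 2^40), case-insensitive single-letter suffixes, and that a bare number means bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parser, not SOS code. Returns the size in bytes, or -1 if
 * the string is malformed, negative, or too large. */
long long parse_symmetric_size(const char *text)
{
    assert(text != NULL);
    char *end;
    double value = strtod(text, &end);
    if (end == text || !(value >= 0.0))
        return -1; /* no number parsed, negative, or NaN */

    double scale = 1.0;
    switch (toupper((unsigned char)*end)) {
    case 'K': scale = 1024.0; end++; break;
    case 'M': scale = 1024.0 * 1024.0; end++; break;
    case 'G': scale = 1024.0 * 1024.0 * 1024.0; end++; break;
    case 'T': scale = 1024.0 * 1024.0 * 1024.0 * 1024.0; end++; break;
    case '\0': break; /* no suffix: plain bytes (an assumption) */
    default: return -1;
    }
    if (*end != '\0')
        return -1; /* trailing characters after the suffix */

    double bytes = value * scale;
    if (!(bytes < 9.0e18))
        return -1; /* infinite or does not fit in long long */
    return (long long)bytes; /* fractional bytes are truncated */
}
```

Under these assumptions, "1.5G" parses to 1610612736 bytes, and a malformed value such as "abc" yields -1, which a caller could turn into an initialization error.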