
6.0.x Feature List

Edgar Gabriel edited this page Nov 13, 2024 · 41 revisions

Timeline

Target date: end of CY24.

When should we plan to cut the 6.0.x branch? As late as possible, unless doing so blocks 7.0 changes (ABI).

Strikethrough means the feature is complete and committed to the Open MPI main branch.

What to get done by end of CY24 Q2

  • Extended Accelerator API:
    • CUDA support for IPC
  • Reduction op (and others) offload support (Joseph)
  • Collectives:
    • Merge XHC if its developers can commit to supporting it.
    • Merge acoll once it passes CI.
    • smdirect won't be merged; salvage it for parts.
    • Propose a JSON format for the tuning file.
    • Remove coll/sm (tuned is an OK fallback, XHC/acoll coming soon).
    • Performance-test Luke's han alltoall PR with UCX.
  • Remove:
    • GNI BTL
    • udreg rcache
    • pvfs2 components
  • Big Count:
    • Collective embiggening Phase 1 (everything except *v *w collectives)

What to get done by end of CY24 Q3

  • Collective embiggening Phase 2 (*v *w collectives)

What to get done by end of CY24 Q4

  • Switch over to forked PRRTe Phase 1
    • Documentation Changes
    • Remove prte binaries (Univ. Louisville)
    • Remove --with-prte configure option from ompi (Univ. Louisville)
    • Some MCAs (Univ. Louisville/rhc54)
  • Big Count:
    • API-level function generation (PR open and ready for review)
  • Memory Kind support:
    • Add memory-kind option
    • Return supported memory kinds
  • ROMIO Refresh
  • Remove:
    • TKR version of the Fortran use mpi module (the old NAG compiler complicates things)

Likely to miss the 6.0.0 release

  • Phase 2 PRRTE
    • MCA parameters move into ompi namespace.
    • prte_info is gone; move its output into ompi_info, perhaps via a prte-mca option?
  • BTL Self accelerator aware (probably defer to later release)

List of Features planned for the 6.0.x release stream

ABI:

  • If Jacob's ABI work is ready, having our implementation done might help solidify the standard.
    • Merge the ABI work into main, enable it only when requested, and stress in the documentation that it is experimental.

MPI 4.0 (critical):

  • Big count support
    • API-level functions (in progress, 1-2 months) (DONE, PR open)
    • Collective embiggening (discussed at F2F; stage in non-v/w functions first) (DONE)
    • Changes to datatype engine/combiner support (could be a challenge)
    • ROMIO refresh
    • Embiggen man pages and other documentation
    • Remove the hcoll component? (Its API doesn't support big count, and it's been superseded by UCC.)
  • PRRTE switch Phase 1

MPI 4.0 (tentative):

  • MPI_T events (probably won't do for 6.0.x).

Accelerator support:

  • Extended accelerator API functionality (IPC) and conversion of the last components to use the accelerator API (DONE for ROCm and CUDA, not ZE).
  • Level Zero (ZE) accelerator component (DONE: basic support; IPC not implemented, Howard)
  • Support for the MPI 4.1 memory kinds info object (assuming the PRRTE move is done; 1 month for basic support)
  • reduction op (and others) offload support (Joseph estimates 1-2 months to get in)
  • SMSC accelerator (Edgar - not sure yet about this one for 6.0.x)
    • Stream-aware datatype engine.
  • Datatype engine accelerator awareness (e.g. memcpy2d) (George).

What about smart pointers? This probably could not get into 6.0.x.

MPI 4.1:

  • implement memory allocation kind info. (see above for accelerator features)

Things to remove:

  • GNI BTL - no longer have access to systems to support this (Howard) (DONE)
  • UDREG Rcache - no longer have access to systems that can use this (Howard) (DONE)
  • FS/PVFS2 and FBTL/PVFS2 - no longer have access to systems to support this (Edgar) (DONE)
  • coll/sm (DONE)
  • Remove the TKR version of the use mpi module. (Howard)
    • This was deferred from 4.0.x because in April/May 2018 (and then deferred again from v5.0.x in October 2018), it was discovered that:
      1. The RHEL 7.x default gcc (4.8.5) still uses the TKR mpi module
      2. The NAG compiler still uses the TKR mpi module.

Collectives:

  • mca/coll: blocking reduction on accelerator (this is discussed above, Joseph)
  • mca/coll: hierarchical MPI_Alltoall(v), MPI_Gatherv, MPI_Scatterv. (various orgs working on this)
  • mca/coll: new algorithms (various orgs working on this)

There are quite a few open PRs related to collectives. Can some of these get merged? See the notes from the 2024 F2F meeting.

Random:

  • Sessions - add support for UCX PML (Howard, 2-3 weeks)
  • Sessions - various small fixes (Howard, 1 month)
  • ZE support for IPC (maybe)
  • Atomics - can we just rely on C11 and remove some of this code? We are currently using gcc atomics for performance reasons. Joseph would like to have a wrapper for atomic types and direct load/store access.
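As a rough illustration of the wrapper idea in the atomics item, here is a minimal sketch using C11 stdatomic.h: a struct that wraps an _Atomic 64-bit integer and exposes explicit load/store, fetch-add, and compare-exchange helpers. The sketch_atomic_* names are hypothetical (not existing Open MPI or OPAL API), and the memory orderings are an assumption chosen for illustration. It says nothing about whether C11 atomics can match the performance of the gcc atomics currently in use; that remains the open question.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper type; not an existing Open MPI/OPAL type. */
typedef struct {
    _Atomic int64_t value;
} sketch_atomic_int64_t;

/* Non-atomic initialization; must happen before the object is shared. */
static inline void sketch_atomic_init(sketch_atomic_int64_t *a, int64_t v)
{
    atomic_init(&a->value, v);
}

/* Direct load with relaxed ordering (atomicity only, no fencing). */
static inline int64_t sketch_atomic_load(sketch_atomic_int64_t *a)
{
    return atomic_load_explicit(&a->value, memory_order_relaxed);
}

/* Direct store with relaxed ordering. */
static inline void sketch_atomic_store(sketch_atomic_int64_t *a, int64_t v)
{
    atomic_store_explicit(&a->value, v, memory_order_relaxed);
}

/* Atomically adds delta and returns the value held before the addition. */
static inline int64_t sketch_atomic_fetch_add(sketch_atomic_int64_t *a,
                                              int64_t delta)
{
    return atomic_fetch_add_explicit(&a->value, delta, memory_order_relaxed);
}

/* If the current value equals *expected, stores desired and returns nonzero.
 * Otherwise writes the current value into *expected and returns 0.
 * Orderings (acq_rel on success, acquire on failure) are an assumption. */
static inline int sketch_atomic_compare_exchange(sketch_atomic_int64_t *a,
                                                 int64_t *expected,
                                                 int64_t desired)
{
    return atomic_compare_exchange_strong_explicit(&a->value, expected,
                                                   desired,
                                                   memory_order_acq_rel,
                                                   memory_order_acquire);
}
```

Wrapping the _Atomic member in a struct means the wrapper cannot be used in ordinary arithmetic (for example, a++ on a sketch_atomic_int64_t does not compile), so every access goes through a helper that names its memory order explicitly.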