
MPI_Type, MPI_Alltoallw, mpp_global_field update #5

Open

wants to merge 1 commit into master

Commits on Nov 10, 2021

  1. MPI_Type, MPI_Alltoallw, mpp_global_field update

    This patch contains three new features for FMS: support for MPI datatypes, an
    MPI_Alltoallw interface, and modifications to mpp_global_field that use these
    features for selected operations.
    
    These changes were primarily made to improve the stability of large
    (>4000-rank) MPI jobs under OpenMPI at NCI.
    
    The two methods differ in mpp_global_field performance, occasionally by a
    very large margin, but there is no consistent winner across MPI libraries:
    a method that is faster under one library can be slower under another, even
    across versions of the same library.  The MPI_Alltoallw method was generally
    faster on our system, but this is not a universal result, so the feature is
    controlled by a runtime flag.
    
    The inclusion of MPI_Type support may also be seen as an opportunity to
    introduce new MPI features into other operations, e.g. halo exchange.
    
    Detailed changes are summarised below.
    
    - MPI data transfer type ("MPI_Type") support has been added to FMS.  This is
      done with the following features:
    
      - A `mpp_type` derived type has been added, which manages the type details
        and hides the MPI internals from the model developer.  Types are tracked
        in an internal linked list, `datatypes` (a sketch follows at the end of
        this item).
    
        Note: The name `mpp_type` is very similar to the preprocessor variable
        `MPP_TYPE_` and should possibly be renamed to something else, e.g.
        `mpp_datatype`.
    
      - `mpp_type_create` and `mpp_type_free` are used to create and release these
        types within the MPI library.  They append `mpp_type`s to, and remove them
        from, the internal linked list, and use reference counters to manage
        duplicates.
    
      - A `mpp_byte` type is created as a module-level variable for default
        operations.
    
        NOTE: As the first element of the list, it also inadvertently provides
        access to the rest of `datatypes`, which is private, but there are
        probably ways to address this.
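
      As a rough sketch of how these pieces fit together (the field names here
      are illustrative, not the exact FMS declarations):

      ```fortran
      ! Illustrative sketch only; the actual FMS declaration may differ.
      type :: mpp_type
        integer :: counter        ! reference count for duplicate requests
        integer :: etype          ! element type (e.g. MPI_REAL8)
        integer :: id             ! handle of the committed MPI datatype
        type(mpp_type), pointer :: prev => null()  ! linked-list neighbours
        type(mpp_type), pointer :: next => null()
      end type mpp_type

      ! Head of the private `datatypes` list, also used for byte transfers.
      type(mpp_type), target :: mpp_byte
      ```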
    
    - An MPI_Alltoallw wrapper, using `mpp_type`s, has been added to the
      `mpp_alltoall` interface; the underlying call shape is sketched below.
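
      For reference, a minimal standalone example of the call being wrapped
      (plain MPI, not the FMS wrapper itself; all names are local to this
      example):

      ```fortran
      program alltoallw_sketch
        use mpi
        implicit none
        integer :: ierr, npes, rank, i
        integer, allocatable :: counts(:), displs(:), types(:)
        real(8), allocatable :: sbuf(:), rbuf(:)

        call MPI_Init(ierr)
        call MPI_Comm_size(MPI_COMM_WORLD, npes, ierr)
        call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

        allocate(counts(npes), displs(npes), types(npes), sbuf(npes), rbuf(npes))
        sbuf = real(rank, 8)

        ! One element to/from each rank.  Note that the displacements are in
        ! *bytes*, and each block carries its own MPI datatype; this is what
        ! lets alltoallw move differently shaped blocks in one collective.
        counts = 1
        types = MPI_REAL8
        displs = [ (8*(i - 1), i = 1, npes) ]

        call MPI_Alltoallw(sbuf, counts, displs, types, &
                           rbuf, counts, displs, types, MPI_COMM_WORLD, ierr)
        call MPI_Finalize(ierr)
      end program alltoallw_sketch
      ```

      The wrapper's job is then to build the count, displacement, and datatype
      arrays from the `mpp_type` handles on each rank.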
    
    - An implementation of mpp_global_field using MPI_Alltoallw and mpp_types has
      been added.  In addition to replacing the point-to-point operations with a
      collective, it also eliminates the need to use the internal MPP stack.
    
      Since MPI_Alltoallw requires that the input field be contiguous, it is only
      enabled for data domains (i.e. compute + halo).  This limitation could be
      overcome, either by copying or by more careful attention to layout, but that
      is left for a future patch.
    
      This method is enabled in the `mpp_domains_nml` namelist group by setting
      the `use_alltoallw` flag to `.true.`.
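
      For example:

      ```fortran
      &mpp_domains_nml
          use_alltoallw = .true.
      /
      ```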
    
    Provisional interfaces for SHMEM and serial ("nocomm") builds have been added,
    although they are untested and primarily serve as placeholders for now.
    
    This patch also includes the following changes to support this work.
    
    - In `get_peset`, the method used to generate MPI subcommunicators has been
      changed; specifically `MPI_Comm_create` has been replaced with
      `MPI_Comm_create_group`.  The former is blocking over all ranks, while the
      latter is only blocking over ranks in the subgroup.
    
      This was done to accommodate IO domains of a single rank, usually due to
      masking, which would result in no communication and cause a model hang.
    
      More recent changes in FMS around the handling of single-rank communicators
      appear to prevent this particular scenario, but I still consider
      `MPI_Comm_create_group` to be the more correct choice and have left the
      change in.
    
      `MPI_Comm_create_group` is an MPI 3.0 feature, so it may be an issue for
      older MPI libraries.
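
      A minimal sketch of the difference (variable names are illustrative):

      ```fortran
      integer :: parent_comm, group, tag, new_comm, ierr

      ! Old: every rank of parent_comm must make this call, even ranks
      ! outside of group, which receive MPI_COMM_NULL.
      call MPI_Comm_create(parent_comm, group, new_comm, ierr)

      ! New (MPI 3.0): only the ranks belonging to group make the call,
      ! so a single-rank IO group can proceed without the other ranks.
      call MPI_Comm_create_group(parent_comm, group, tag, new_comm, ierr)
      ```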
    
    - Interfaces for `logical` data have been added to `mpp_alltoall` and
      `mpp_alltoallv`.
    
    - Single-rank PE checks in mpp_alltoall were removed to prevent model hangs
      with the subcommunicators.
    
    - NULL_PE checks have been added to the original point-to-point implementation
      of mpp_global_field, although these may no longer be required after the
      subcommunicator changes.
    
      This work was by Nic Hannah, and may actually be part of an existing pull
      request.  (TODO: Check this!)
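
      The checks take roughly this shape (hypothetical variable and buffer
      names; `NULL_PE` marks a partner with no data to exchange):

      ```fortran
      ! Hypothetical sketch: skip transfers involving absent partners.
      if (from_pe /= NULL_PE) call mpp_recv(local_buf, msg_size, from_pe)
      if (to_pe /= NULL_PE) call mpp_send(field_buf, msg_size, to_pe)
      ```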
    
    - Timer events have been added to mpp_type_create and mpp_type_free, although
      the clocks are not yet initialized anywhere.
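
      Initialization would presumably follow the usual mpp clock pattern during
      startup, e.g. (the clock variable names here are hypothetical):

      ```fortran
      ! Hypothetical initialization of the currently unset clock ids.
      type_create_clock = mpp_clock_id('mpp_type_create')
      type_free_clock = mpp_clock_id('mpp_type_free')
      ```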
    
    - The diagnostic field count was increased from 150 to 250, to support the
      current needs of researchers.
    marshallward authored and aidanheerdegen committed Nov 10, 2021
    Commit e0e9a83