
WeeklyTelcon_20190702

Geoffrey Paulsen edited this page Jul 2, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Artem Polyakov (Mellanox)
  • Brendan Cunningham (Intel)
  • Dan Topa (LANL)
  • Geoff Paulsen (IBM)
  • George Bosilca (UTK)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joshua Ladd (Mellanox)
  • Matthew Dosanjh (Sandia)
  • Ralph Castain (Intel)
  • Thomas Naughton
  • Todd Kordenbrock

Not there today (I keep this list for easy cut-and-paste into future notes)

  • Akshay Venkatesh (nVidia)
  • Aravind Gopalakrishnan (Intel)
  • Arm (UTK)
  • Brandon Yates (Intel)
  • Brian Barrett (Amazon)
  • David Bernhold
  • Edgar Gabriel (UH)
  • Geoffroy Vallee
  • Jake Hemstad
  • Josh Hursey (IBM)
  • Matias Cabral
  • Michael Heinz (Intel)
  • Nathan Hjelm
  • Noah Evans (Sandia)
  • Peter Gottesman (Cisco)
  • Xin Zhao (Mellanox)
  • mohan

Agenda/New Business


Infrastructure

Transition website and email to AWS

  • Complete

Process enforcement bots

  • No update

Submodule prototype

  • Suggestion: do hwloc first (it is stable and not under much development)
  • No update

Release Branches

Review v3.0.x Milestones v3.0.4

Review v3.1.x Milestones v3.1.4

  • PRs for PMIx update.
    • Potential Mellanox CI issue; Mellanox is looking at it.

Review v4.0.x Milestones v4.0.2

  • New Issue 6785
    • UCX is not in all distros yet, so this is a blocker.
  • Second put-protocol issue: Issue 6568 (Vader deadlocking with 4 MB transfers)
  • New Datatype work https://github.com/open-mpi/ompi/pull/6695
    • Want for v4.0.2
  • https://github.com/open-mpi/ompi/issues/6568 - the put protocol has lost its pipelining.
    • Right now this only shows up in Vader, because all other BTLs prefer the get protocol.
    • Vader generates a large number of 32K fragments, so a 4 MB transfer overwhelms it.
    • Does NOT occur with single copy like CMA or KNEM.

Review Master Pull Requests

  • PRs 6556 and 6621 should go to the release branches.
    • No update
  • Good reminder that we now need to be careful about OPAL's ABI.

v5.0.0

  • When do we drop 32-bit support?
  • Still don't have any release manager.
    • Need to identify someone in next few months.

Dependencies

PMIx Update

  • PMIx v3.1.3 is ready to release.
    • Put a tarball into OMPI's v4.0.x branch for integration testing.
    • So far looking good.
    • One issue on Mellanox CI, probably a cluster or test configuration problem.
  • PMIx v2.2 update could be ready soon after that.
    • Doesn't have the MPIR fix.
    • Also missing something else; Ralph will audit.

ORTE/PRRTE

  • Take a look at Gilles' PRRTE work. He may have done SOME of this already. He should have done all of it in the PRRTE layer, so maybe only some MPI-layer work remains.

Next face-to-face

  • Need people to react and do things.
  • Fall face-to-face is canceled due to lack of agenda.
    • PRTE transition still requires dedicated discussion.
  • Might meet in New Mexico, University of Tennessee, or Dallas (IBM)
    • Should make a meeting prep page
    • Jeff will make a Doodle poll.
    • Two days

MTT

  • IBM has some new failures.
    • Geoff will get some time to look at this week.
  • AWS: scale testing; not sure of its status.
