WeeklyTelcon_20170314

Geoffrey Paulsen edited this page Jan 9, 2018 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Howard
  • Nathan Hjelm
  • Thomas Naughton
  • Jeff Squyres
  • Artem Polyakov
  • Brian Barrett
  • Ralph
  • Joshua Ladd

Agenda

  • Thought we are ready today, BUT ...
  • Nathan found a blocker: a bandwidth regression for OB1 RDMA with the openib and uGNI BTLs:
    • PR3164
    • OB1 and uGNI cache leave-pinned, which affects add_procs (somehow).
    • KNEM uses registration, so it would be affected.
    • Master and v2.1
    • Why is MTT not seeing this? Because we don't test performance.
  • At a minimum another MTT run, and another RC.
    • Earliest we could release is next Tuesday.
  • Do we really need another RC?
    • Concerned about issues creeping into other components.
    • It would affect "out of the box" experience.

v2.0.x

  • A couple of PRs waiting. We probably want to do a v2.0.x release, because v2.0.x has the madvise bug (serious).
  • Enough of a tipping point to begin a v2.0.3.
  • Some PRs waiting for review.
  • If SGE fix gets into v2.0.3 that'd be great. Not going to push for it on v2.1.0
  • PMIx - reason we're doing an accelerated v3.0
  • Whitelist Issue 3107
  • UCX has its own multithreading API that needs to be enabled inside the UCX PML. UCX itself is thread safe.
    • Allocator will be inside of OSHMEM.
    • Sounds reasonable (component level stuff).
    • Allocator is Merged into Master.
    • Multithreading is going through Mellanox internal review. So maybe end of this week.
  • Still think we should branch TODAY to get it over with.
  • People can add it to their MTT tests.
  • What else is on the chopping block for v3.0.0?
    • Remove Yoda.
    • MxM MTL can go, but Yalla must stay.
    • For OSes, we'll say whatever one is running on Nathan's little MTT box.
    • 32bit Linux is a challenge.
    • Should we remove ARM? Solaris? - Howard and Brian will reach out.
    • Compilers are a mix.
    • Which NVIDIA GPUs?
    • Should we remove SM/BTLs?
  • Brian: It's not in master right now, and it's not a bug affecting customers, so we should not do a bunch of code work to remove stuff for v3.0, given the aggressive schedule.
    • Others liked this concept.

PMIx v2.0 status

  • Almost feature complete.
  • Ralph is doing wiring in Orte at the moment.

AWS Testing

  • Should be able to test Torque, SLURM, SGE.
  • IBM committed to testing Platform LSF.
  • IBM will check if they care about LoadLeveler. Think it's just a legacy product.
    • If IBM doesn't care, recommend removing this one because of its clunky interface.
  • Eventually it'd be nicer to have CI for these things, but for v3.0.0, either CI or MTT might be enough.
    • Testing matrix of schedulers is more nightly MTT.
  • Don't break the build!!!

MTT Dev status:


Exceptional topics

  • We should begin thinking about scheduling our next face to face.
    • Geoff will put out a Doodle poll for June and July and begin to nail down a schedule.

Status Updates:

Status Update Rotation

  1. Cisco, ORNL, UTK, NVIDIA
  2. Mellanox, Sandia, Intel
  3. LANL, Houston, IBM, Fujitsu

Back to 2017 WeeklyTelcon-2017