WeeklyTelcon_20170314
Geoffrey Paulsen edited this page Jan 9, 2018
- Dialup Info: (Do not post to public mailing list or public wiki)
Attendees
- Geoff Paulsen
- Howard
- Nathan Hjelm
- Thomas Naughton
- Jeff Squyres
- Artem Polyakov
- Brian Barrett
- Ralph
- Joshua Ladd
Review All Open Blockers
Review Milestones v2.1.0
- We thought we were ready to release today, BUT ...
- Nathan found a blocker: a bandwidth regression in OB1 RDMA with the openib and ugni BTLs.
- PR #3164
- OB1 and ugni cache leave-pinned registrations, which (somehow) affects add_procs.
- KNEM uses registration, so it would be affected.
- Affects master and v2.1.
- Why didn't MTT catch this? Because we don't test performance in MTT.
- At a minimum, this needs another MTT run and another RC.
- The earliest we could release is next Tuesday.
- Do we really need another RC?
- Concerned about issues creeping into other components.
- It would affect the "out of the box" experience.
- A couple of PRs are waiting. We probably want to do a v2.0.x release, because v2.0.x has the (serious) madvise bug.
- That is enough of a tipping point to begin v2.0.3.
- Some PRs waiting for review.
- If the SGE fix gets into v2.0.3, that'd be great. We're not going to push for it in v2.1.0.
- PMIx is the reason we're doing an accelerated v3.0.
- Whitelist: Issue #3107
- UCX has its own multithreading API that needs to be enabled inside the UCX PML. UCX is thread safe.
- The allocator will be inside OSHMEM.
- Sounds reasonable (component level stuff).
- The allocator is merged into master.
- Multithreading is going through Mellanox internal review, so maybe by the end of this week.
- We still think we should branch TODAY to get it over with.
- People can add it to their MTT tests.
- What else is on the chopping block for v3.0.0?
- Remove Yoda.
- The MXM MTL can go, but Yalla must stay.
- For supported OSes, we'll list whatever is running on Nathan's little MTT box.
- 32bit Linux is a challenge.
- Should we remove ARM? Solaris? Howard and Brian will reach out.
- Compilers are a mix.
- Which NVIDIA GPUs?
- Should we remove SM/BTLs?
- Brian: if a removal is not in master right now and is not fixing a bug that affects customers, we should not do a bunch of code work to remove stuff for v3.0, due to the aggressive schedule.
- Others liked this concept.
- Almost feature complete.
- Ralph is doing wiring in ORTE at the moment.
- We should be able to test Torque, SLURM, and SGE.
- IBM committed to testing Platform LSF.
- IBM will check whether they care about LoadLeveler. We think it's just a legacy product.
- If IBM doesn't care, we recommend removing it because of its clunky interface.
- Eventually it'd be nice to have CI for these things, but for v3.0.0 either CI or MTT might be enough.
- Testing the full matrix of schedulers is better suited to nightly MTT.
Review Master Pull Requests
- Don't break the build!!!
Review Master MTT testing
- We should begin thinking about scheduling our next face-to-face meeting.
- Geoff will put out a Doodle poll for June and July and begin to nail down a schedule.
- Cisco, ORNL, UTK, NVIDIA
- Mellanox, Sandia, Intel
- LANL, Houston, IBM, Fujitsu