WeeklyTelcon_20200414
Geoffrey Paulsen edited this page Apr 16, 2020
- Dialup Info: (Do not post to public mailing list or public wiki)
- Akshay Venkatesh (NVIDIA)
- Austen Lauria (IBM)
- Brendan Cunningham (Intel)
- David Bernhold (ORNL)
- Edgar Gabriel (UH)
- Geoffrey Paulsen (IBM)
- George Bosilca (UTK)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Hursey (IBM)
- Joshua Ladd (Mellanox)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Intel)
- Noah Evans (Sandia)
- Ralph Castain (Intel)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- William Zhang (AWS)
- Artem Polyakov (Mellanox)
- Brian Barrett (AWS)
- Geoffroy Vallee (ARM)
- Scott Breyer (Sandia?)
- Erik Zeiske
- Shintaro Iwasaki
- Nathan Hjelm (Google)
- Charles Shereda (LLNL)
- Brandon Yates (Intel)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Xin Zhao (Mellanox)
- mohan (AWS)
- Be nice to update automation to accept this.
- Would need to update legal text saying that if users do this they're agreeing to the terms of the community.
- Use a new openmpi-params.ini instead of openmpi-mca-params.conf.
- Symlink for forward compatibility.
- This is similar to --tune <file>, which provides additional command-line and environment parameters from a file.
- Two parts to the "visible" rule:
- Command-line values take precedence over file values.
- If there's a conflict between things you don't see, that's an error.
- Already slated for v5.0.x; slightly different from v4.0.x.
- --net says take a single line from a parameter.
- --tune says take an entire parameter file.
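A minimal Python sketch of the two-part "visible" rule, assuming each parameter source can be modeled as a name-to-value dict, that files (the params file and any --tune file) are the sources a user "doesn't see", and that the command line is the visible source. resolve_params and ParamConflictError are hypothetical names for illustration, not Open MPI code; the sketch also assumes an explicit command-line value settles a conflict between files instead of raising an error.

```python
class ParamConflictError(ValueError):
    """Raised when two sources the user can't see disagree on a parameter."""


def resolve_params(cmdline, invisible_sources):
    """Merge parameters under the (assumed) "visible" rule.

    cmdline:           dict of parameters given on the command line (visible).
    invisible_sources: list of (source_name, dict) pairs, e.g. parameter files.
    """
    merged = {}
    origin = {}  # remembers which file set each merged value
    for source_name, params in invisible_sources:
        for key, value in params.items():
            if key in cmdline:
                # Rule 1: a command-line value takes precedence over file values.
                continue
            if key in merged and merged[key] != value:
                # Rule 2: two invisible sources disagree, so that's an error.
                raise ParamConflictError(
                    f"{key!r}: {origin[key]} sets {merged[key]!r}, "
                    f"{source_name} sets {value!r}"
                )
            merged[key] = value
            origin[key] = source_name
    merged.update(cmdline)
    return merged


# Hypothetical parameter names, for illustration only.
files = [("params.conf", {"foo_level": "1", "bar_mode": "fast"}),
         ("tune.txt", {"bar_mode": "fast"})]
print(resolve_params({"foo_level": "3"}, files))
```

Identical values from two files are tolerated in this sketch; only a disagreement that the command line does not override is treated as an error.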
- Writing manpages in nroff is painful
- Jeff wrote the MPI_T.5.md man page in Markdown.
- Make converts Markdown files to nroff via a new tool, pandoc.
- Don't want to require users to install pandoc, so the .md files will be converted to nroff in make dist.
- Configure will error if you don't have pandoc v1.19 in your path; will check if we can lower the requirement to v1.12.3 (which comes with CentOS 7.7).
- Initial testing looks good, verifying now.
- Native nroff and markdown can co-exist, so don't need to do them all at once.
- Can we suppress generation of manpages if there is no pandoc?
- No. Don't want to support dist without "full" contents.
- --without-manpages: maybe could make this work.
- Jeff will send something to packagers downstream
- Worst case, we could pull this for v5.0
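The configure-time pandoc check boils down to comparing a version string against a minimum. Below is a hypothetical Python sketch of that comparison only (Open MPI's real check lives in its configure script), assuming the first line of pandoc --version looks like "pandoc 1.19.2.1". MIN_PANDOC and the function names are illustrative.

```python
import re

# Minimum under discussion: v1.19 today, possibly lowered to v1.12.3
# (the version that ships with CentOS 7.7).
MIN_PANDOC = (1, 19)


def parse_pandoc_version(version_output):
    """Extract a version tuple from the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.match(r"pandoc(?:\.exe)?\s+(\d+(?:\.\d+)*)", first_line)
    if match is None:
        raise ValueError(f"unrecognized pandoc version line: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def pandoc_is_new_enough(version_output, minimum=MIN_PANDOC):
    """Compare integer tuples, e.g. (1, 19, 2, 1) >= (1, 19) is True."""
    return parse_pandoc_version(version_output) >= minimum
```

To check an installed pandoc, pass it the stdout of subprocess.run(["pandoc", "--version"], capture_output=True, text=True). Comparing integer tuples avoids the string-comparison trap where "1.9" sorts after "1.19".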
- MTT:
- Change your MTT setup to start up PRRTE at the beginning of the session, and then just use prun.
- Can see times cut in half or more.
- This is good, but we also need to test the mpirun wrapper.
- Cisco is converting half of its MPI installs to use PRRTE/prun.
- OMPI master submodule pointers are set up to track PMIx and PRRTE master.
Blockers: All Open Blockers
Review v4.0.x: Milestones v4.0.4
- v4.0.4 in the works.
- 7616 - ABI break introduced in OMPI v4.0.3 for some f08 symbols.
- May drive an earlier v4.0.4 to fix.
- 7617 - Howard is looking at this, may want for v4.0.4
- OLD - Do we want to integrate with latest PMIx v3.1 branch (commits after v3.1.5)?
- Open question for the RMs (release managers).
- Comm Spawn failure on v4.0.3, possibly related to a PMIx v3.1.4 commit.
- Ralph can't reproduce it. Complex app. Easy workaround: use PMIx v3.1.3 or earlier.
- Ralph is looking at it.
- If this is in PMIx, it might drive a new v3.1 release.
- Schedule:
- Feature Freeze: April 30
- Release: End of June
- Discussing features on a Google Sheets document (https://docs.google.com/spreadsheets/d/1OXxoxT9P_YLtepHg6vsW3-vp4pdzGQgyknNbkzenYvw/edit#gid=0), which were taken from the face-to-face wiki.
- PMIx v4.0.0: on track
- Schedule:
- PMIx won't release v4.0 in time for OMPI v5.0, but will drop a tag that Open MPI can use.
- PRRTE v2.0: on track
- A number of new MTT failures.
- Issues not tracked on the spreadsheet:
- libopal isn't slurped into Open-MPI correctly (related to 7560)
- Jeff and Brian will meet Friday.
- Hierarchical collectives
- If someone wants to do this, PMIx already has much of the needed information.
- Not too hard to do, and they're much faster. Will be in the next version of a competitor MPI.
- Probably not for v5.0.
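To illustrate why hierarchical collectives are faster, here is a simulated two-level sum-allreduce in plain Python (no MPI involved). It assumes a rank-to-node mapping is available, which is the kind of locality information the notes say PMIx already has; hierarchical_allreduce and its inputs are hypothetical. Only one partial sum per node crosses the simulated network instead of one per rank.

```python
from collections import defaultdict


def hierarchical_allreduce(values, node_of):
    """Simulate a two-level sum-allreduce.

    values:  dict rank -> local contribution
    node_of: dict rank -> node id (hypothetical locality input)
    Returns dict rank -> global sum, i.e. what every rank ends up with.
    """
    # Group ranks by node, in rank order.
    ranks_on_node = defaultdict(list)
    for rank in sorted(values):
        ranks_on_node[node_of[rank]].append(rank)

    # The lowest rank on each node acts as that node's leader.
    leaders = {node: ranks[0] for node, ranks in ranks_on_node.items()}

    # Step 1: intra-node reduce to each leader (shared memory on a real system).
    partial_at_leader = {
        leaders[node]: sum(values[r] for r in ranks)
        for node, ranks in ranks_on_node.items()
    }

    # Step 2: inter-node allreduce among leaders only; one value per node,
    # rather than one per rank, crosses the network.
    total = sum(partial_at_leader.values())

    # Step 3: each leader broadcasts the result to the ranks on its node.
    return {r: total for ranks in ranks_on_node.values() for r in ranks}
```

A real implementation would pick the intra-node and inter-node algorithms separately; the point of the sketch is only the three-phase structure.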
- Static linking is failing on master right now.
- Issue 7560
- May be an issue in static build support in PMIx and PRRTE, as well as in how we're pulling them in.
- Affects everything; just masked at the moment because static linking is broken.
- Jeff will investigate.
- No progress.
- SLURM PMIx plugin has been locked on PMIx v2 for some time.
- There are some NEW PMIx calls that SHOULD be added to bring it up to date.
- Ralph has started a PR, but needs help.
- So for now, some optional info won't be passed correctly.
- No OMPI_INFO for now.
- Ralph gets pinged occasionally.
- Not sure of the priority of this.
- MTT on master is looking pretty good.
- Deferred.
- Scale testing: PRs have to opt into it.
Review Master: Master Pull Requests
- CI testing only checks that the build succeeds and that the tests ran, but doesn't test HOW they ran.
- Environment setup can be a bit different.
- For example, no permissions in /tmp: a test might pass on one machine and fail on another without /tmp permissions.
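One way a test harness could tolerate the /tmp difference is to probe for a directory it can actually write to, rather than assuming /tmp works. A hypothetical Python sketch (usable_tmpdir is not part of any existing CI setup): it creates a real temporary file, because a check such as os.access() can report success even when the write would fail (notably on network filesystems), and it falls back to tempfile.gettempdir(), which honors TMPDIR.

```python
import tempfile


def usable_tmpdir(candidates=("/tmp",)):
    """Return the first candidate directory where a file can really be created.

    Falls back to tempfile.gettempdir() if no candidate works.
    """
    for path in candidates:
        try:
            # Creating (and auto-deleting) a real file is the actual test.
            with tempfile.NamedTemporaryFile(dir=path):
                return path
        except OSError:
            # Missing directory, permission denied, read-only mount, ...
            continue
    return tempfile.gettempdir()
```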