WeeklyTelcon_20201208
- Dialup Info: (Do not post to public mailing list or public wiki)
- NOT-YET-UPDATED
- v4.0.6 toward the end of the month.
- 7199 - RAS-based orted on the head node.
- No other blockers currently.
- Geoff will create a milestone issue for v4.0.6.
- PMIx 3.2
- We'll need an hwloc version of PR 8222.
- What should we do about Issue 8222?
- Gilles extended autotools to patch libtool.
- PR 8222 went into master late last night; it can easily be cherry-picked.
- Do we need similar PRs in hwloc and PMIx?
- Probably do.
- Want to take this back to v4.0.x and v4.1.x.
- PR 8187 - deterministic build.
- rhc will review.
- 4.0.x
- PMIx fix for RPM spec needed as well.
- Got ARM fix merged, and new PMIx fix
- Waiting on final word from Howard on ROMIO.
- Will do an RC without a refresh.
- We could do a future bugfix later in the v4.1 series.
- Will need to do some scale release
- Dave Love has a fix that doesn't seem to hit
- RC should go out today.
- Jeff Squyres wants the v5.0 RMs to generate a list of versions it will support, to document.
- Still need to coordinate on this. He'd like this done this week.
- PMIx v4.0 working on Tools, hopefully done soon.
- PMIx: going through the Python bindings.
- A new shmem component to replace
- Still working on it.
- Dave Wooten pushed up some PRRTE patches, and is making some progress there.
- Slow but steady progress.
- Once tool work is more stabilized on PMIx v4.0, will add some tool tests to CI.
- Probably won't start until first of the year.
- How are the submodule reference updates on Open-MPI master?
- Josh was still looking to see about adding some cross checking CI
- When making a PRTE PR, could add some comment to the PR and it'll trigger Open-MPI CI with that PR.
- Josh wanted to bring something up from PMIx call.
- What do we want to do about ROMIO in general?
- There have been some OMPI-specific changes put into ROMIO, meaning the upstream maintainers refuse to help us with it.
- Long Term we need to figure out what to do about this.
- We may be able to work with upstream to make a clear API between the two.
- Need to look at this treematch thing - an upstream package that is now inside of Open-MPI.
- ROMIO vs. OMPIO on the mailing list
- Edgar is looking at OMPIO on their system to ensure they're running it correctly.
- Also Issue
- How's the state of https://github.com/open-mpi/ompi-tests-public/?
- Putting new tests there
- Very little there so far, but working on adding some more.
- Should have some new Sessions tests
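As a rough illustration of what such a Sessions test might exercise, here is a minimal sketch using the MPI-4 Sessions API names (MPI_Session_init, MPI_Group_from_session_pset, MPI_Comm_create_from_group); the string tag below is an arbitrary placeholder, and the in-progress Open MPI implementation may expose these calls under MPIX_-prefixed names instead:

```c
/* Minimal sketch of a Sessions "hello world" test, using MPI-4 API names.
 * The string tag passed to MPI_Comm_create_from_group is arbitrary. */
#include <stdio.h>
#include <mpi.h>

int main(void)
{
    MPI_Session session;
    MPI_Group   group;
    MPI_Comm    comm;
    int         rank, size;

    /* Initialize a session instead of calling MPI_Init() */
    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

    /* Build a communicator from the built-in "mpi://WORLD" process set */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, "org.example.sessions-hello",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
    MPI_Group_free(&group);

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    printf("Hello from rank %d of %d (via Sessions)\n", rank, size);

    MPI_Comm_free(&comm);
    MPI_Session_finalize(&session);
    return 0;
}
```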
- What's going to be the state of the SM CUDA BTL and CUDA support in v5.0? (A runtime CUDA-awareness check sketch follows this topic.)
- What's the general state? Any known issues?
- AWS would like to get.
- Josh Ladd - will take it internally to see what they have to say.
- From NVIDIA/Mellanox, CUDA support is through UCX; SM CUDA isn't tested that much.
- Hessam Mirsadeg - all CUDA awareness goes through UCX.
- May ask George Bosilca about this.
- Don't want to remove a BTL if someone is interested in it.
- UCX also supports TCP via CUDA.
- PRRTE CLI on v5.0 will have some GPU functionality that Ralph is working on.
- Update 11/17/2020:
- UTK is interested in this BTL, and maybe others.
- Still a gap in the MTL use case.
- NVIDIA is not maintaining SM CUDA anymore. All CUDA support will be through UCX.
- What's the state of the shared memory in the BTL?
- This is the really old generation of shared memory, older than Vader.
- Was told that after a certain point, there would be no more development in SM CUDA.
- One option might be to
- Another option might be to bring that SM in SM CUDA to Vader (now SM).
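A minimal sketch of how an application can probe Open MPI's CUDA awareness at run time, using the MPIX_Query_cuda_support() extension from mpi-ext.h (available when Open MPI is built with CUDA support); note it only reports whether CUDA awareness is present, not which component (SM CUDA BTL or UCX) provides it:

```c
/* Sketch: check Open MPI's CUDA awareness at compile time and run time.
 * MPIX_CUDA_AWARE_SUPPORT and MPIX_Query_cuda_support() are Open MPI
 * extensions declared in mpi-ext.h. */
#include <stdio.h>
#include <mpi.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* Open MPI extensions */
#endif

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

#if defined(MPIX_CUDA_AWARE_SUPPORT) && MPIX_CUDA_AWARE_SUPPORT
    printf("Compile-time CUDA-aware support: yes\n");
#else
    printf("Compile-time CUDA-aware support: no (or unknown)\n");
#endif

#if defined(MPIX_CUDA_AWARE_SUPPORT)
    /* The run-time answer can differ from compile time, e.g. if the
     * supporting components were built as DSOs and are not loadable here. */
    printf("Run-time CUDA-aware support: %s\n",
           MPIX_Query_cuda_support() ? "yes" : "no");
#endif

    MPI_Finalize();
    return 0;
}
```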
- reStructuredText docs (more features than Markdown, including cross-references)
- Jeff had a first stab at this, but take a look. Sent it out to devel-list.
- All work for master / v5.0
- Might just be useful to do README for v4.1.? (don't block v4.1.0 for this)
- Sphinx is the tool to generate docs from reStructuredText.
- It can handle the current Markdown manpages together with new docs.
- readthedocs.io encourages the reStructuredText format over Markdown.
- They also support a hybrid for projects that have both.
- Thomas Naughton has done the restructured text, and it allows
- LICENSE question - what license would the docs be available under? Open-MPI BSD license, or
- Ralph tried the Instant-On at scale:
- 10,000 nodes x 32 PPN
- Ralph verified Open-MPI could do all of that in < 5 seconds, Instant-On.
- Through MPI_Init() (if using Instant-On)
- TCP and Slingshot (OFI provider private now)
- PRRTE with PMIx v4.0 support
- SLURM has some of the integration, but hasn't taken this patch yet.
- Discussion on:
- Draft request: Make default static - https://github.com/open-mpi/ompi/pull/8132
- One con is that many providers hard-link against libraries, which would then make libmpi dependent on them.
- Talking about amending it so that each MCA component can indicate whether it should be slurped in
- (i.e., whether the component hard-links or dlopens its libraries).
- Roadrunner experiments: the bottleneck in launching was I/O from loading all the .so files.
- Spindle and burst buffers reduce this, but still
- Still going through function pointers, no additional inlining.
- Can do this today.
- Still different from static (sharing the image across processes); it's just not calling dlopen that many times.
- New proposal is to have a third option where the component decides whether its default is to be slurped into libmpi.
- It's nice to have fabric providers not bring their dependencies into libmpi, so that the main libmpi can run on nodes that don't have the providers' dependencies installed.
- Low priority thing anyway, if we get it in for v5.0 it'd be nice, but not critical.
- George and Jeff are leading
- No new updates this week (see last week)