WeeklyTelcon_20220920
- Dialup Info: (Do not post to public mailing list or public wiki)
- Brendan Cunningham (Cornelis Networks)
- Christoph Niethammer (HLRS)
- David Bernholdt (ORNL)
- Geoffrey Paulsen (IBM)
- Harumi Kuno (HPE)
- Howard Pritchard (LANL)
- Jeff Squyres (Cisco)
- Joseph Schuchart
- Josh Fisher (Cornelis Networks)
- Thomas Naughton (ORNL)
- Todd Kordenbrock (Sandia)
- Tommy Janjusic (nVidia)
- William Zhang (AWS)
- Akshay Venkatesh (NVIDIA)
- Artem Polyakov (nVidia)
- Aurelien Bouteiller (UTK)
- Austen Lauria (IBM)
- Brandon Yates (Intel)
- Brian Barrett (AWS)
- Charles Shereda (LLNL)
- Edgar Gabriel (UoH)
- Erik Zeiske
- George Bosilca (UTK)
- Hessam Mirsadeghi (UCX/nVidia)
- Jan (Sandia - ULT support in Open MPI)
- Jingyin Tang
- Josh Hursey (IBM)
- Marisa Roman (Cornelis Networks)
- Mark Allen (IBM)
- Matias Cabral (Intel)
- Matthew Dosanjh (Sandia)
- Michael Heinz (Cornelis Networks)
- Nathan Hjelm (Google)
- Noah Evans (Sandia)
- Raghu Raja (AWS)
- Ralph Castain (Intel)
- Sam Gutierrez (LLNL)
- Scott Breyer (Sandia?)
- Shintaro Iwasaki
- Xin Zhao (nVidia)
- Multiple weeks on CVE from nvidia.
- v4.1.5
- Schedule: targeting ~6 months (targeting October)
- No driver on schedule yet.
- 10583 - Potential CVE from a 4-year-old libevent issue, but we might not need to do anything.
- Update: one company reported that their scanner didn't report anything.
- Waiting on confirmation that the patches to remove dead code were enough.
- An RC this week.
- Discuss MCA: https://github.com/open-mpi/ompi/pull/10793
- When you pass an MCA parameter to `prterun`, it has to figure out which MCA system (OMPI, PRRTE, or PMIx) it's going to.
- If you want to be sure, just use `--omca`, `--prtemca`, or `--pmixmca`.
- Jeff and Brian came up with a solution that they're working on.
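A minimal C sketch of the routing problem described above. It assumes only what the notes state: the prefixed flags name their target MCA system explicitly, and a plain `--mca` leaves the launcher to work out the owner. The enum and `mca_target_for_flag()` are hypothetical illustrations, not PRRTE's actual command-line parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: which MCA system a parameter is destined for. */
typedef enum {
    MCA_TARGET_UNKNOWN = 0, /* not an MCA flag at all */
    MCA_TARGET_AMBIGUOUS,   /* plain --mca: the launcher must work out the owner */
    MCA_TARGET_OMPI,        /* --omca */
    MCA_TARGET_PRRTE,       /* --prtemca */
    MCA_TARGET_PMIX         /* --pmixmca */
} mca_target_t;

/* Hypothetical helper: map a command-line flag to the MCA system it names.
 * Only the prefixed flags are unambiguous. */
static mca_target_t mca_target_for_flag(const char *flag)
{
    static const struct {
        const char  *flag;
        mca_target_t target;
    } table[] = {
        { "--omca",    MCA_TARGET_OMPI },
        { "--prtemca", MCA_TARGET_PRRTE },
        { "--pmixmca", MCA_TARGET_PMIX },
        { "--mca",     MCA_TARGET_AMBIGUOUS },
    };

    assert(flag != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); ++i) {
        if (strcmp(flag, table[i].flag) == 0) {
            return table[i].target;
        }
    }
    return MCA_TARGET_UNKNOWN;
}
```

The plain `--mca` case returns `MCA_TARGET_AMBIGUOUS`, so a user who needs certainty has to pick one of the prefixed forms.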
- Is this related to the submodule?
- Unrelated to `--mca`: we share a lot of replicated M4 code between OMPI, PMIx, and PRRTE.
- They have diverged in radical and subtle ways.
- Last week, we added another submodule pointer to OMPI.
- Took a handful of M4 macros and combined them there.
- More consolidation there over time.
- For the most part this is behind the scenes, but you will need to run `git submodule init`.
- The purpose is that it'll just be M4 files.
- `--mca` is how we've set OMPI MCA parameters in Open MPI.
- Could PRRTE just "do the right thing" for `--mca`?
- Agree.
- `--mca` is an Open MPI-specific option; when PRRTE and PMIx split off, they used prefixed options.
- They don't have ownership over MCA.
- At the end of the day, our docs can't change because...
- 10779 OPAL "core" library for internal usage
- NEED to see if it made its way to v5.
- Approach to separate out pieces of OPAL into core and top.
- All internal things, not exposed to users.
- Brian and George worked on it, and then Josh picked it up and PR'd 10779.
- Still in draft because he wants to resolve any high-level issues first.
- As far as code layout goes, we could move some things around, but if we do this too much, we're worried about dropping history...
- We'd have hundreds or thousands of
- Discuss `mca_base_env_list`: https://github.com/open-mpi/ompi/pull/10788
- Googled around, and this is documented at https://oar.imag.fr/wiki:passing_environment_variables_to_openmpi_nodes
- It mentions that `-x` is deprecated?
- Easy to fix the Mellanox CI, but SHOULD we?
- Let's remove the test and add it to Issue 10698.
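This hedged C sketch shows what a `mca_base_env_list` value carries by parsing one. It assumes the list is made of entries separated by a delimiter (`;` by default, adjustable through `mca_base_env_list_delimiter`). An entry of the form `NAME=value` sets a variable, and a bare `NAME` forwards the launcher's current value, much like `-x`. The names `env_list_parse()` and `env_entry_t` are hypothetical, and this is not Open MPI's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ENV_MAX_ENTRIES 16
#define ENV_MAX_LEN     128

/* One parsed entry: NAME=value sets a value (has_value == 1); a bare NAME
 * means "forward the launcher's current value" (has_value == 0). */
typedef struct {
    char name[ENV_MAX_LEN];
    char value[ENV_MAX_LEN];
    int  has_value;
} env_entry_t;

/* Hypothetical parser for a delimited list such as "FOO=bar;BAZ".
 * Empty segments (e.g. "A;;B" or a trailing ';') are skipped.
 * Returns the number of entries stored, or -1 if the list does not fit. */
static int env_list_parse(const char *list, char delim,
                          env_entry_t *out, int max_entries)
{
    assert(list != NULL && out != NULL);
    int n = 0;
    const char *p = list;
    while (*p != '\0') {
        const char *end = strchr(p, delim);
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0) {
            if (n == max_entries || len >= ENV_MAX_LEN) {
                return -1;
            }
            char item[ENV_MAX_LEN];
            memcpy(item, p, len);
            item[len] = '\0';
            env_entry_t *e = &out[n++];
            char *eq = strchr(item, '=');
            if (eq != NULL) {
                *eq = '\0'; /* split NAME from value */
                snprintf(e->name, sizeof e->name, "%s", item);
                snprintf(e->value, sizeof e->value, "%s", eq + 1);
                e->has_value = 1;
            } else {
                snprintf(e->name, sizeof e->name, "%s", item);
                e->value[0] = '\0';
                e->has_value = 0;
            }
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return n;
}
```

The sketch keeps "set" and "forward" entries apart. A launcher would handle them differently: it uses the given value for the first kind and reads its own environment for the second.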
- Discuss remaining PRRTE CLI issues (https://github.com/open-mpi/ompi/issues/10698)
- `-N`: document that it's an error if they try to use it with a conflicting `--map-by`.
- `--show-progress`: used to do the little `...` on the terminal as a progress display; now it doesn't do anything.
- DOE may set this by default in MCA parameters (it makes some users happy).
- `--display-topo`: generally we've tried to be backwards compatible.
- `-v`: version
- `-V`: verbose
- `-s|--preload-binary` <- functionally it works, but it gets messed up with `-n`.
- `rankfile` <- NOT deprecating
- `--mca` is Open MPI's framework.
- No `gprtemca`. It was created by PRRTE, but do we continue to support `--gpmixmca`?
- `--test-suicide` and the others are all prted daemon options, not exposed to users.
- They are passed to the PRRTE launcher.
- Posted Open MPI issue #10698 with about 13 issues that will need
- No longer trust the verbiage here, based on Ralph's comment.
- Not recognized by mpirun, but cited in `--help`.
- Some of these aren't possible??? And mpirun -> prterun (one-shot thing).
- Should mpirun be able to talk to an existing DVM???
- Or is it always a 1 shot thing?
- If we have it talk to an existing DVM,
- `prte` to start up `prted`s, and `prun`s at that.
- If you're using MPI front-end, and want to interact with DVM, how should we tell users to do that?
- What should they do?
- Go through mpirun, or go through prun (with ompi personality?)
- Thomas can look and see if you can get everything you need.
- There were some common things that were difficult when switching between the two.
- Was there an option for this in v4.1?
- Yes, but perhaps wasn't working much.
- Are there legacy command line options that we should support or alias?
- Are we dropping DVM support for v5?
- How did this work in v4?
- Howard thought you fired up an orte something, and that would provide a command line
- Couldn't do all of this with mpirun, it was a two stage process.
- Had to start DVM manually, and got back a URI
- But he thought if you sourced this schizo and gave it a URI, it would do all of the right things.
- Could add support if the user fired up the DVM using PRTE and got a URI back.
- Don't have ompi-dvm executable in v5, so this is already a deviation.
- What do we do?
- Support the same CLI options (and executables, etc.) as documented for v4.x.
- Don't support at all in v5, and if you want to do DVM things
- Maybe something in the middle.
- Does anyone care about DVM?
- Can we run the `ompi_schizo` personality with vanilla `prun`?
- Some people on call DO care about DVM.
- The early days of Sessions needed a DVM run (no longer needed in main/v5).
- Usually, if customers are interested in doing this, they're willing to do a bit more work.
- But if we want to get v5.0.0 out in near future, it'd be more likely if we
- Thomas has a lot of use cases with mini-tasks, some of which are MPI parallel.
- This is where DVM is useful because slamming lots of serial and parallel jobs in a short time.
- If they can do this via `prun` to get `ompi_schizo`, the path doesn't matter.
- Thomas will investigate proper options.
- Could do a CLI interface for mpirun in a future version to have mpirun not call prterun
- Don't want to rush this.
- Schedule:
- PMIx and PRRTE changes coming at end of August.
- PMIx v3.2 released.
- Try to have bugfixes PRed end of August, to give time to iterate and merged.
- Still using Critical v5.0.x Issues (https://github.com/open-mpi/ompi/projects/3) yesterday
- Docs
- `mpirun --help` is OUT OF DATE.
- Have to do this relatively quickly, before PRRTE releases.
- Austen, Geoff, and Tommy will be
- The REASON for this is that the mpirun command line is in PRRTE.
- The mpirun man page needs to be rewritten.
- Docs are online and can be updated asynchronously.
- Jeff posted PR to document runpath vs rpath
- Our configure checks some linker flags, but there might be default in linker or in system that really governs what happens.
- Symbol pollution - need an issue posted.
- OPAL_DECLSPEC - Do we have docs on this?
- No. Intent is where do you want a symbol available?
- Outside of your library, then use OPAL_DECLSPEC (like Windows DECLSPEC)
- I want you to export this symbol.
- need to clean up as much as possible.
- From the Open MPI community's perspective, our ABI is just the `MPI_` symbols.
- Still unfortunate. We need to clean up as much as possible.
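Here is a small sketch of what an `OPAL_DECLSPEC`-style export macro does. It assumes a GCC or Clang toolchain and a library compiled with `-fvisibility=hidden`, so that only explicitly marked symbols are exported. `MY_DECLSPEC`, `MY_HIDDEN`, and both functions are hypothetical. Open MPI's real macro is defined by its own headers and build system.

```c
#include <assert.h>

/* Hypothetical stand-ins for an export macro. With GCC/Clang, the
 * "default" visibility attribute exports a symbol from a shared library
 * even when the library is built with -fvisibility=hidden. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_DECLSPEC __attribute__((visibility("default")))
#  define MY_HIDDEN   __attribute__((visibility("hidden")))
#else
#  define MY_DECLSPEC
#  define MY_HIDDEN
#endif

/* Internal helper: callable from other files inside the library,
 * but not exported to applications that link against it. */
MY_HIDDEN int my_internal_helper(int x)
{
    assert(x >= 0);
    return x * 2;
}

/* Public entry point: exported from the shared library. */
MY_DECLSPEC int my_public_api(int x)
{
    return my_internal_helper(x) + 1;
}
```

With hidden visibility as the default, every unmarked symbol stays internal. That leaves only the deliberately exported names visible, which matches the cleanup direction above.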
- Case of QThreads, where they need a recursive lock.
- A configury problem was fixed.
- Just working on getting it ready for OMPI.
- converting structures to OPAL objects.
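A recursive lock is one that the owning thread can acquire again without deadlocking. The sketch below shows this behavior with a POSIX mutex of type `PTHREAD_MUTEX_RECURSIVE`. It does not reflect how the QThreads integration actually implements its lock, and `rlock_init()` and `nested_lock()` are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700 /* expose PTHREAD_MUTEX_RECURSIVE on strict builds */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t rlock;
static int depth = 0; /* how many times the current thread holds rlock */

/* Initialize rlock as a recursive mutex; returns 0 on success. */
static int rlock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&rlock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Acquire rlock `levels` times through nested calls and report the depth
 * reached. With a non-recursive mutex, the second acquisition by the same
 * thread would deadlock (or be undefined behavior). */
static int nested_lock(int levels)
{
    assert(levels >= 0);
    if (levels == 0) {
        return depth;
    }
    pthread_mutex_lock(&rlock);
    ++depth;
    int reached = nested_lock(levels - 1);
    --depth;
    pthread_mutex_unlock(&rlock);
    return reached;
}
```

Each lock must be matched by an unlock. The mutex is only released for other threads once the owner's count drops back to zero.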
- Also adding libcuda linking (instead of DLOPEN)
- William will test Jeff's PR [10763?] this week.
- In Jeff's roll-up of the docs:
- Called out accelerator and show-load-errors
- Not sure what distros will want to do, since some of these accelerators are not open.
- Packager building Open MPI,
- Example: say only 20% of nodes have accelerators, and the accelerator libraries are only installed on those nodes.
- This problem is why everything today is dlopened...
- You get scary warnings about failing to open components on some nodes.
- If you build accelerator components by default, they'll be part of libmpi.so
- But if you know you're heterogeneous in software/hardware (accelerators on only 20% of nodes),
- Build accelerator components as .so components.
- Can still run, but don't want scary warnings.
- Packager build accelerator components as SOs.
- Put SOs in sub package of Open MPI, and only that subpackage depends on ACCELERATOR LIBs
- WONT get scary message since SOs only on nodes that have these libs
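The packaging trade-off above depends on what happens when a component DSO cannot be opened. This hedged sketch uses POSIX `dlopen()`. On a node without the accelerator libraries the open fails, and a flag decides whether a warning is printed. The flag is loosely modeled on Open MPI's show-load-errors setting, whose real parameter name and accepted values may differ. `load_optional_component()` is hypothetical, and on older glibc you may need to link with `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical loader for an optional component DSO. Returns the handle,
 * or NULL if the DSO (or a library it depends on) cannot be opened. */
static void *load_optional_component(const char *path, int show_load_errors)
{
    assert(path != NULL);
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL && show_load_errors) {
        /* Only configurations that ask for it produce the "scary" warning. */
        fprintf(stderr, "warning: unable to open component %s: %s\n",
                path, dlerror());
    }
    return handle;
}
```

In the subpackage approach, the DSO is simply absent on nodes without the accelerator libraries. It is never found, so there is nothing to warn about.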
- Switching to builtin atomics,
- 10613 - Preferred PR. GCC / Clang should have that.
- Next step would be to refactor the atomics for post v5.0.
- Waiting on Brian's review and CI fixes.
- Joseph will post some additional info in the ticket.
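To illustrate the GCC/Clang builtins referred to above, here is a minimal sketch of fetch-and-add and compare-and-swap on a 64-bit integer using `__atomic_fetch_add()` and `__atomic_compare_exchange_n()`. It does not reproduce the contents of PR 10613. The wrapper names and the memory orders are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically add `value` to *addr and return the previous value.
 * Relaxed ordering: atomicity only, no ordering of surrounding accesses. */
static inline int64_t atomic_fetch_add_64(int64_t *addr, int64_t value)
{
    assert(addr != NULL);
    return __atomic_fetch_add(addr, value, __ATOMIC_RELAXED);
}

/* Compare-and-swap: if *addr == *expected, store `desired` and return true;
 * otherwise copy the current value of *addr into *expected and return false. */
static inline bool atomic_cas_64(int64_t *addr, int64_t *expected, int64_t desired)
{
    assert(addr != NULL && expected != NULL);
    return __atomic_compare_exchange_n(addr, expected, desired,
                                       false,            /* strong CAS */
                                       __ATOMIC_ACQ_REL, /* order on success */
                                       __ATOMIC_ACQUIRE  /* order on failure */);
}
```

The failure ordering must not be stronger than the success ordering, and it cannot be a release ordering. That rule is why `__ATOMIC_ACQUIRE` pairs with `__ATOMIC_ACQ_REL` here.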
- Jenkins is currently messed up. Brian is looking at it.
- New PRs will be stuck for a while.
- We're probably not getting together in person anytime soon.
- So we'll send around a doodle to have time to talk about our rules.
- Reflect the way we worked several years ago, but not really right now.
- We're to review the admin steering committee in July (per our rules):
- We're to review the technical steering committee in July (per our rules):
- We should also review all the OMPI github, slack, and coverity members during the month of July.
- Jeff will kick that off sometime this week or next week.
- In the call we mentioned this, but no real discussion.
- Wiki for face to face: https://github.com/open-mpi/ompi/wiki/Meeting-2022
- Might be better to do a half-day/day-long virtual working session.
- Due to company's travel policies, and convenience.
- Could do administrative tasks here too.