Exodus inside MPI fails #174

Open
PeriHub opened this issue Feb 20, 2024 · 6 comments

PeriHub commented Feb 20, 2024

Hi there, if I run this MPI example and load the Exodus package, MPI crashes:

using MPI
using Exodus

MPI.Init()

comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)

Run with:

mpiexec -n 3 julia --project script.jl

Error:

--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  opal_shmem_base_select failed
  --> Returned value -1 instead of OPAL_SUCCESS
--------------------------------------------------------------------------

I'm pretty sure the package used to work with MPI, but I have also tried older releases of the package without success.
Maybe I'm missing something. How can I keep using Exodus?

cmhamel (Owner) commented Feb 20, 2024

Can you let me know the version numbers of MPI, Exodus, julia, NetCDF_jll and HDF5_jll that are being used in this example?
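Something like this should print all of those plus the julia version (PKGMODE_MANIFEST so the indirect jll dependencies show up too; the exact invocation is just a suggestion):

using Pkg
Pkg.status(["MPI", "Exodus", "NetCDF_jll", "HDF5_jll"]; mode = Pkg.PKGMODE_MANIFEST)
println("julia ", VERSION)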

PeriHub (Author) commented Feb 20, 2024

  • MPI v0.20.19
  • Exodus v0.11.1
  • julia v1.10.1
  • NetCDF_jll v400.902.209+0
  • HDF5_jll v1.14.3+1

And I forgot to mention: if I put using Exodus below MPI.Init(), it doesn't crash (reordered script below).
Thank you for the quick response!
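To be concrete, here is the ordering that runs for me without the error (same script as above, just with using Exodus moved below MPI.Init()):

using MPI

MPI.Init()

# loading Exodus only after MPI has been initialized avoids the opal_init failure in my runs
using Exodus

comm = MPI.COMM_WORLD
println("Hello world, I am $(MPI.Comm_rank(comm)) of $(MPI.Comm_size(comm))")
MPI.Barrier(comm)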

cmhamel (Owner) commented Feb 20, 2024

There could potentially be conflicting MPI versions then. Have you tried running the example with mpiexecjl rather than mpiexec? See the MPI.jl docs if you're not sure what I mean.
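For reference, the MPI.jl docs install and use it roughly like this (project paths here are just an example):

julia --project -e 'using MPI; MPI.install_mpiexecjl()'
mpiexecjl --project=. -n 3 julia script.jl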

JTHesse commented Feb 20, 2024

Yes, you could be right; mpiexecjl does indeed work, but unfortunately we can't use it on our HPC system. Is there maybe another solution?

cmhamel (Owner) commented Feb 20, 2024

I'm no MPI expert, but I think the solution is to rebuild Exodus_jll.jl locally, built against your system MPI. This will likely involve modifying the build_tarballs.jl file for Exodus in Yggdrasil: https://github.com/JuliaPackaging/Yggdrasil

I don't think building jll packages with MPI via BinaryBuilder is well documented, but there are examples that can be followed, such as Trilinos or HDF5.

Alternatively, you can bypass Exodus_jll altogether, build Exodus locally from the SEACAS GitHub page, and then link things appropriately in a fork of Exodus.jl.
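If you go that route, a very rough sketch of the kind of change a fork would need (the library path and environment variable below are illustrative assumptions, not anything Exodus.jl currently defines):

using Libdl

# Illustrative: point at a locally built SEACAS Exodus library instead of the Exodus_jll artifact
const libexodus = get(ENV, "EXODUS_LIBRARY_PATH", "/opt/seacas/lib/libexodus.so")

# Sanity check that the local library (and the MPI/NetCDF/HDF5 it was built against) resolves;
# the fork's ccall wrappers would then reference this constant instead of the jll-provided library
handle = Libdl.dlopen(libexodus)
println("resolved ex_close at ", Libdl.dlsym(handle, :ex_close))
Libdl.dlclose(handle)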

I'm not sure if there's a better solution.

JTHesse commented Feb 20, 2024

Thank you, I will look into that.
