
Make Darshan work #35

Open · lyon-fnal opened this issue Dec 14, 2020 · 4 comments

@lyon-fnal (Owner) commented Dec 14, 2020

Try Darshan with Julia. It loads via LD_PRELOAD. We may need to change HDF5.jl and MPI.jl so that they don't explicitly give the library name in `ccall`.
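
For context, a hedged sketch of the difference that ccall change targets (the library name "libmpich.so" is only an illustrative stand-in for whatever MPI.jl resolves on a given system):

using Libdl

flag = Ref{Cint}(0)

# With an explicit library, ccall dlopens/dlsyms that specific .so, so an
# LD_PRELOAD'ed interposer such as libdarshan.so is bypassed:
ccall((:MPI_Initialized, "libmpich.so"), Cint, (Ref{Cint},), flag)

# Without the library name, the symbol is resolved process-wide (respecting
# LD_PRELOAD order), provided the real library was already opened with
# RTLD_GLOBAL, e.g. in the package's __init__:
Libdl.dlopen("libmpich.so", Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
ccall(:MPI_Initialized, Cint, (Ref{Cint},), flag)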

lyon-fnal self-assigned this Dec 14, 2020
@lyon-fnal (Owner) commented Dec 29, 2020

Problem in MPI.jl: the shared object needs to be loaded with Libdl.dlopen in MPI.__init__, because loading the .so is a runtime operation. But near the top of src/MPI.jl there's an include("implementations.jl"), which defines Get_library_version (a ccall to MPI_Get_library_version). That would be fine, except that the same file also calls Get_library_version() at top level. Since that runs before MPI.__init__, the library isn't loaded yet and the call fails. Fix: move the call into the __init__ function and make MPI_LIBRARY_VERSION_STRING a Ref. Do the same for Get_version further down in implementations.jl.
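
A minimal sketch of that pattern, with simplified names (this is not MPI.jl's actual code, and "libmpich.so" is a placeholder):

module MPISketch

using Libdl

const libmpi = "libmpich.so"                        # placeholder; MPI.jl resolves this per system
const MPI_LIBRARY_VERSION_STRING = Ref{String}("")  # now a Ref, filled in at runtime

function Get_library_version()
    buf = Vector{UInt8}(undef, 8192)
    len = Ref{Cint}(0)
    ccall((:MPI_Get_library_version, libmpi), Cint, (Ptr{UInt8}, Ref{Cint}), buf, len)
    return String(buf[1:len[]])
end

function __init__()
    # runtime: load the shared object and make its symbols visible process-wide
    Libdl.dlopen(libmpi, Libdl.RTLD_LAZY | Libdl.RTLD_GLOBAL)
    # only now is it safe to call into the library
    MPI_LIBRARY_VERSION_STRING[] = Get_library_version()
end

end # module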

In fact, I think all of implementations.jl (or at least all of its top-level calls) needs to be moved into the __init__ function.

Actually, scratch the work on implementations.jl: all it does is get versions and implementation types, so it's fine to do those ccalls with the explicit library name; that stuff should not be overridden anyway.

@lyon-fnal (Owner) commented Dec 31, 2020

Darshan works on NERSC Cori Haswell! Here's a summary of what I did...

  • I have my own build of Darshan 3.2.1 against cray-hdf5-parallel/1.12.0.0 and cray-mpich/7.7.10 (see here for configure and build instructions). The configure line includes --enable-hdf5-mod=$HDF5_DIR.
  • The Darshan build includes a module specification. Activate it with,
module rm darshan   # Unload the default Cori Darshan (currently an old 3.1.7)
module use $HOME/apps.cori-hsw/darshan-3.2.1/share/craype-2.x/modulefiles
module load darshan
  • Build my modified MPI.jl and HDF5.jl against system libraries (see above).
  • Run a simple script with srun --export=ALL,LD_PRELOAD=libdarshan.so julia --project tryit.jl (needs to run in batch or on a Cori compute node obtained with salloc); a sketch of such a script follows this list.
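
Here's a hypothetical sketch of what a minimal tryit.jl could look like (the real script isn't shown in this issue). It assumes the modified MPI.jl/HDF5.jl builds above and just does enough parallel HDF5 I/O for Darshan to record something:

using MPI, HDF5

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nrank = MPI.Comm_size(comm)

# open the file with the MPI-IO driver; each rank writes its own row
h5open("tryit.h5", "w", comm, MPI.Info()) do f
    dset = create_dataset(f, "x", datatype(Float64), dataspace((nrank, 10)))
    dset[rank + 1, :] = fill(Float64(rank), 10)
end

MPI.Finalize()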

Things to do:

  • Submit PRs for MPI.jl #450 and HDF5.jl #791
  • Try with a more complex script
  • Try with the package compiler (I don't think that will make a difference)

lyon-fnal added a commit that referenced this issue Jan 24, 2021
@lyon-fnal (Owner) commented

So I have this working with HDF5.jl. I've altered energyByCal.jl to make collective reads an option, but even with it turned on, Darshan reports no collective reads (MPIIO_COLL_READS is 0), and that's with H5D_USE_MPIIO_COLLECTIVE true for each column. Not sure what's going on there; it seems like HDF5 itself decides whether or not to actually use collective reads.
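
For reference, a hedged sketch of roughly how the collective option is requested (file and dataset names are made up, and it assumes the dxpl_mpio=:collective transfer-property keyword that HDF5.jl's parallel examples use with create_dataset is also accepted when opening a dataset):

using MPI, HDF5

MPI.Init()
comm = MPI.COMM_WORLD

h5open("energyByCal.h5", "r", comm, MPI.Info()) do f
    # ask for collective MPI-IO transfers on this dataset handle
    dset = open_dataset(f, "energy"; dxpl_mpio = :collective)
    # HDF5 may still break the request down into independent I/O internally,
    # which would leave Darshan's MPIIO_COLL_READS counter at 0
    chunk = dset[:, 1:10]
end

MPI.Finalize()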

mpio.jl DOES do collective reads. For that, each rank reads from its own chunk.

Trying mpio2.jl where ranks will not all read from their own chunk.

@lyon-fnal (Owner) commented

mpio2.jl (which is now a nice example of collective writes) does read many chunks. And it's doing a collective read (one per rank). Not sure why energyByCal.jl does no collective reads. Maybe this doesn't really matter.
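
A hedged sketch in the spirit of mpio2.jl (illustrative only; the real script lives in this repo): a chunked dataset written collectively, then read back so each rank pulls a chunk it did not write:

using MPI, HDF5

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nrank = MPI.Comm_size(comm)
M     = 100

h5open("mpio2_sketch.h5", "w", comm, MPI.Info()) do f
    # one chunk per rank; dxpl_mpio=:collective requests collective MPI-IO transfers
    dset = create_dataset(f, "data", datatype(Float64), dataspace((M, nrank));
                          chunk = (M, 1), dxpl_mpio = :collective)
    dset[:, rank + 1] = fill(Float64(rank), M)   # each rank writes its own column/chunk
end

h5open("mpio2_sketch.h5", "r", comm, MPI.Info()) do f
    dset = open_dataset(f, "data"; dxpl_mpio = :collective)
    neighbour = mod(rank + 1, nrank) + 1         # a column some other rank wrote
    col = dset[:, neighbour]                     # read it back, requesting collective I/O
end

MPI.Finalize()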

lyon-fnal added a commit that referenced this issue Mar 12, 2021