I cannot get anything with more than about 128 total cores to run on TACC's Vista Grace-Hopper system, including the 3darray test problem. The failure does not depend on the number of nodes or the cores-per-node individually, only on the total: e.g. 16 nodes with 16 cores/node fails, as does 4 nodes with 64 cores/node, but 4 nodes with 16 cores/node runs. I'm using v8.0.1 with the "mpi-linux-arm8" build target, with similar outcomes for both the GNU and NVIDIA compilers. A sketch of how I built and launched is below, followed by the output for the 4-node, 64 cores/node configuration:
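For completeness, roughly the build and launch commands used (a sketch: the `hello` binary is the array hello test from the Charm++ tree, and site-specific module loads and the batch script are omitted):

```sh
# Build the non-SMP MPI machine layer for ARM; the log below shows a
# debug build, so --with-production is omitted here as well
./build charm++ mpi-linux-arm8 -j8

# Launch inside a 4-node batch allocation; on the MPI layer,
# charmrun forwards to mpirun (as seen in the output below)
./charmrun +p256 ./hello 256
```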
Running as 256 OS processes: ./hello 256
charmrun> /usr/bin/setarch aarch64 -R mpirun -np 256 ./hello 256
Charm++> Running on MPI library: Open MPI v5.0.5, package: Open MPI [email protected] Distribution, ident: 5.0.5, repo rev: v5.0.5, Jul 22, 2024 (MPI standard: 3.1)
Charm++> Level of thread support used: MPI_THREAD_SINGLE (desired: MPI_THREAD_SINGLE)
Charm++> Running in non-SMP mode: 256 processes (PEs)
Converse/Charm++ Commit ID: v8.0.1
Charm++ built without optimization.
Do not use for performance benchmarking (build with --with-production to do so).
Charm++ built with internal error checking enabled.
Do not use for performance benchmarking (build without --enable-error-checking to do so).
[i618-032:899711:0:899822] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xa00028010060)
[i618-032:899684:0:899805] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xa00028010060)
[i618-032:899704:0:899791] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xa00028010060)
==== backtrace (tid: 899791) ====
0 /opt/apps/ucx/1.17.0/lib/libucs.so.0(ucs_handle_error+0x288) [0x40003765eb98]
1 /opt/apps/ucx/1.17.0/lib/libucs.so.0(+0x2ece8) [0x40003765ece8]
2 /opt/apps/ucx/1.17.0/lib/libucs.so.0(+0x2f07c) [0x40003765f07c]
3 linux-vdso.so.1(__kernel_rt_sigreturn+0) [0x4000363707f0]
4 /opt/apps/nvidia24/openmpi/5.0.5/lib/libpmix.so.2(pmix_gds_shmem2_fetch+0xf0) [0x400037873530]
5 /opt/apps/nvidia24/openmpi/5.0.5/lib/libpmix.so.2(+0x638b8) [0x4000377538b8]
6 /opt/apps/nvidia24/openmpi/5.0.5/lib/libevent_core-2.1.so.7(+0x24244) [0x400037984244]
7 /opt/apps/nvidia24/openmpi/5.0.5/lib/libevent_core-2.1.so.7(+0x23954) [0x400037983954]
8 /opt/apps/nvidia24/openmpi/5.0.5/lib/libevent_core-2.1.so.7(event_base_loop+0x1d4) [0x40003797cf14]
9 /opt/apps/nvidia24/openmpi/5.0.5/lib/libpmix.so.2(+0xb5b2c) [0x4000377a5b2c]
10 /lib64/libc.so.6(+0x82a38) [0x400037002a38]
11 /lib64/libc.so.6(+0x2bb9c) [0x400036fabb9c]
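Note that the crash is inside PMIx's shmem2 GDS component (`pmix_gds_shmem2_fetch`), not in Charm++ itself. As a speculative experiment, not a confirmed fix, it may be worth forcing PMIx to fall back to its hash GDS component via the standard `PMIX_MCA_` environment-variable mechanism before launching:

```sh
# Speculative workaround: steer PMIx away from the shmem2 GDS component
# implicated in the backtrace above
export PMIX_MCA_gds=hash
./charmrun +p256 ./hello 256
```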