When running a ULT job with 1024 PEs on 16 nodes with 8 ABT threads, SOS fails to initialize the transport endpoint.
e.g. isx_micro
This is the warning:
[0132] WARN: transport_ofi.c:621: bind_enable_ep_resources
[0132] fi_enable on endpoint failed
[0132] WARN: transport_ofi.c:1430: shmem_transport_ofi_ctx_init
[0132] context bind/enable endpoint failed (No space left on device)
The job hangs afterwards.
Parameters:
PMI_MAX_KVS_ENTRIES=10000000
SHMEM_SYMMETRIC_SIZE=6G
SHMEM_ADAPTIVE_THREAD_SCHEDULE=1
FI_PROVIDER=cxi
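For anyone trying to reproduce this, a minimal sketch of setting the parameters above in the job environment, assuming a POSIX shell. The launcher command and its arguments are not part of this report, so they are left out rather than guessed.

```shell
# Environment from the report; values copied verbatim.
export PMI_MAX_KVS_ENTRIES=10000000
export SHMEM_SYMMETRIC_SIZE=6G
export SHMEM_ADAPTIVE_THREAD_SCHEDULE=1
export FI_PROVIDER=cxi

# Assumption: the launcher is site-specific and not stated in the report.
# Start the 1024-PE, 16-node isx_micro run with it after these exports.
```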