https://github.com/Sandia-OpenSHMEM/SOS/actions/runs/10924530684/job/30323799111?pr=1146
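The failure seems intermittent and rare. In this run, one of the two PEs of the shmem_ctx spec-example test caught a bus error (signal 7) inside UCX while closing an endpoint, which failed make check: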
FAIL: shmem_ctx
===============
[0001] DEBUG: ../../src/init.c:376: shmem_internal_heap_postinit
[0001] Thread level=MULTIPLE, Num. PEs=2
[0001] Sym. heap=0x5555c0000000 len=537919488 -- data=0x555555558000 len=104
[0001] DEBUG: ../../src/init.c:457: shmem_internal_heap_postinit
[0001] Affinity to 4 processor cores: { 0 1 2 3 }
Sandia OpenSHMEM 1.5.3rc1
SHMEM_INFO 1 (type: bool, default: 0) Print library information message at startup
SHMEM_VERSION 0 (type: bool, default: 0) Print library version at startup
SHMEM_DEBUG 1 (type: bool, default: 0) Enable debugging messages
SHMEM_SYMMETRIC_SIZE 536870912 (type: size, default: 536870912) Symmetric heap size
Additional options:
make[6]: *** [Makefile:1180: test-suite.log] Error 1
make[5]: *** [Makefile:1288: check-TESTS] Error 2
make[4]: *** [Makefile:1495: check-am] Error 2
make[3]: *** [Makefile:469: check-recursive] Error 1
SHMEM_SYMMETRIC_HEAP_USE_HUGE_PAGES 0 (type: bool, default: 0) Use Linux huge pages for symmetric heap
SHMEM_SYMMETRIC_HEAP_PAGE_SIZE 2097152 (type: size, default: 2097152) Page size to use for huge pages
SHMEM_SYMMETRIC_HEAP_USE_MALLOC 0 (type: bool, default: 0) Allocate the symmetric heap using malloc
SHMEM_BOUNCE_SIZE 0 (type: size, default: 2048) Maximum message size to bounce buffer
SHMEM_MAX_BOUNCE_BUFFERS 128 (type: long, default: 128) Maximum number of bounce buffers per context
SHMEM_TRAP_ON_ABORT 0 (type: bool, default: 0) Generate trap if the program aborts or calls shmem_global_exit
SHMEM_TEAMS_MAX 10 (type: long, default: 10) Maximum number of teams per PE
SHMEM_TEAM_SHARED_ONLY_SELF 0 (type: bool, default: 0) Include only the self PE in SHMEM_TEAM_SHARED
SHMEM_BACKTRACE (type: string, default: ) Specify the mechanism to use for backtracing on failure
Collectives options:
SHMEM_COLL_CROSSOVER 4 (type: long, default: 4) Crossover between linear and tree collectives (num. PEs)
SHMEM_COLL_SIZE_CROSSOVER 16384 (type: size, default: 16384) Crossover between latency and bandwidth optimized collectives (msg. size)
SHMEM_COLL_RADIX 4 (type: long, default: 4) Radix for tree-based collectives
SHMEM_BARRIER_ALGORITHM auto (type: string, default: auto) Algorithm for barrier. Options are auto, linear, tree, dissem
SHMEM_BCAST_ALGORITHM auto (type: string, default: auto) Algorithm for broadcast. Options are auto, linear, tree
SHMEM_REDUCE_ALGORITHM auto (type: string, default: auto) Algorithm for reductions. Options are auto, linear, tree, recdbl
SHMEM_COLLECT_ALGORITHM auto (type: string, default: auto) Algorithm for collect. Options are auto, linear
SHMEM_FCOLLECT_ALGORITHM auto (type: string, default: auto) Algorithm for fcollect. Options are auto, linear, ring, recdbl
SHMEM_BARRIERS_FLUSH 0 (type: bool, default: 0) Flush stdout and stderr on barrier
Network transport: UCX
SHMEM_PROGRESS_INTERVAL 1000 (type: long, default: 1000) Polling interval for progress thread in microseconds (0 to disable)
On-node transport: Linux CMA
SHMEM_CMA_PUT_MAX 8192 (type: size, default: 8192) Size below which to use CMA for puts
SHMEM_CMA_GET_MAX 16384 (type: size, default: 16384) Size below which to use CMA for gets
Build information:
Git Version v1.5.3rc1-2-gb4c42c16 (HEAD)
Configure Args '--prefix=/home/runner/work/SOS/SOS/install/sos' '--with-ucx=/home/runner/work/SOS/SOS/install/ucx' '--with-cma' '--enable-error-checking' '--enable-profiling' '--enable-pmi-simple' '--disable-fortran' '--with-hwloc=no'
Build Date Wed Sep 18 14:45:16 UTC 2024
Build CC gcc
Build CFLAGS -std=gnu11 -g -O2 -Wall -rdynamic -fvisibility=hidden
[0000] DEBUG: ../../src/init.c:376: shmem_internal_heap_postinit
[0000] Thread level=MULTIPLE, Num. PEs=2
[0000] Sym. heap=0x5555c0000000 len=537919488 -- data=0x555555558000 len=104
[0000] DEBUG: ../../src/init.c:457: shmem_internal_heap_postinit
[0000] Affinity to 4 processor cores: { 0 1 2 3 }
[1726670878.702695] [fv-az565-923:63824:0] parser.c:1626 UCX WARN unused env variable: UCX_INSTALL_DIR (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
[1726670878.702702] [fv-az565-923:63825:0] parser.c:1626 UCX WARN unused env variable: UCX_INSTALL_DIR (set UCX_WARN_UNUSED_ENV_VARS=n to suppress this warning)
[0000] DEBUG: ../../src/transport_ucx.c:130: shmem_transport_init
[0000] UCX thread mode 2, requested 2
[0001] DEBUG: ../../src/transport_ucx.c:130: shmem_transport_init
[0001] UCX thread mode 2, requested 2
[0000] DEBUG: ../../src/init.c:483: shmem_internal_heap_postinit
[0000] Local rank=0, Num. local=1, Shr. rank=0, Num. shr=2
[0001] DEBUG: ../../src/init.c:483: shmem_internal_heap_postinit
[0001] Local rank=0, Num. local=1, Shr. rank=1, Num. shr=2
[0000] DEBUG: ../../src/shmem_team.c:139: shmem_internal_team_init
[0000] SHMEM_TEAM_SHARED: start=0, stride=1, size=1
[0000] DEBUG: ../../src/shmem_team.c:167: shmem_internal_team_init
[0000] SHMEMX_TEAM_NODE: start=0, stride=1, size=2
[0001] DEBUG: ../../src/shmem_team.c:139: shmem_internal_team_init
[0001] SHMEM_TEAM_SHARED: start=1, stride=1, size=1
[0001] DEBUG: ../../src/shmem_team.c:167: shmem_internal_team_init
[0001] SHMEMX_TEAM_NODE: start=0, stride=1, size=2
[fv-az565-923:63825:0:63825] Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace (tid: 63825) ====
0 /home/runner/work/SOS/SOS/install/ucx/lib/libucs.so.0(ucs_handle_error+0x2a4) [0x7ffff773e394]
1 /home/runner/work/SOS/SOS/install/ucx/lib/libucs.so.0(+0x2a56f) [0x7ffff773e56f]
2 /home/runner/work/SOS/SOS/install/ucx/lib/libucs.so.0(+0x2a856) [0x7ffff773e856]
3 /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7ffff76ff420]
4 /home/runner/work/SOS/SOS/install/ucx/lib/libuct.so.0(uct_mm_ep_flush+0x11) [0x7ffff76bf8c1]
5 /home/runner/work/SOS/SOS/install/ucx/lib/libucp.so.0(+0x39f61) [0x7ffff77a6f61]
6 /home/runner/work/SOS/SOS/install/ucx/lib/libucp.so.0(ucp_ep_flush_internal+0x12d) [0x7ffff77a7f2d]
7 /home/runner/work/SOS/SOS/install/ucx/lib/libucp.so.0(ucp_ep_close_nbx+0xde) [0x7ffff778b9be]
8 /home/runner/work/SOS/SOS/install/ucx/lib/libucp.so.0(ucp_ep_close_nb+0x49) [0x7ffff778b8b9]
9 /home/runner/work/SOS/SOS/build/src/.libs/libsma.so.0(+0x4b78f) [0x7ffff7ac578f]
10 /home/runner/work/SOS/SOS/build/src/.libs/libsma.so.0(+0x320d1) [0x7ffff7aac0d1]
11 /home/runner/work/SOS/SOS/build/modules/tests-sos/test/spec-example/.libs/shmem_ctx(+0x12f3) [0x5555555552f3]
12 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7ffff783e083]
13 /home/runner/work/SOS/SOS/build/modules/tests-sos/test/spec-example/.libs/shmem_ctx(_start+0x2e) [0x55555555532e]
=================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 63825 RUNNING AT fv-az565-923
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
FAIL shmem_ctx (exit status: 135)
============================================================================
Testsuite summary for Sandia OpenSHMEM 1.5.3rc1
============================================================================
# TOTAL: 18
# PASS: 17
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See modules/tests-sos/test/spec-example/test-suite.log
Please report to https://github.com/Sandia-OpenSHMEM/SOS/issues
============================================================================
make[6]: Leaving directory '/home/runner/work/SOS/SOS/build/modules/tests-sos/test/spec-example'
make[5]: Leaving directory '/home/runner/work/SOS/SOS/build/modules/tests-sos/test/spec-example'
make[4]: Leaving directory '/home/runner/work/SOS/SOS/build/modules/tests-sos/test/spec-example'
make[3]: Leaving directory '/home/runner/work/SOS/SOS/build/modules/tests-sos/test'
make[2]: *** [Makefile:471: check-recursive] Error 1
make[1]: *** [Makefile:469: check-recursive] Error 1
make: *** [Makefile:562: check-recursive] Error 1
make[2]: Leaving directory '/home/runner/work/SOS/SOS/build/modules/tests-sos'
make[1]: Leaving directory '/home/runner/work/SOS/SOS/build/modules'
Error: Process completed with exit code 2.
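The backtrace places the fault in uct_mm_ep_flush, in UCX's shared-memory (mm) transport, reached from ucp_ep_close_nb while an endpoint is being closed. On Linux, signal 7 reported as "nonexistent physical address" (si_code BUS_ADRERR) is what a process receives when it touches a page of a shared, file-backed mapping that no longer has backing store, for example because the file was truncated, or because a tmpfs such as /dev/shm could not allocate the page. This log does not show which of these, if any, happened here; one untested hypothesis is that the peer's shared-memory segment was already gone when this endpoint was flushed. The sketch below is POSIX-only, does not involve UCX, and uses a hypothetical helper name (dies_with_sigbus); it only reproduces the signal itself with a truncated temporary file.

```python
import signal
import subprocess
import sys

# Child program: map one page of a temporary file, shrink the file to zero
# bytes, then read the mapped page. The page no longer has backing store,
# so the kernel kills the child with SIGBUS instead of returning data.
CHILD = """
import mmap, os, tempfile
fd, path = tempfile.mkstemp()
os.unlink(path)
os.ftruncate(fd, mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE)
os.ftruncate(fd, 0)
m[0]
"""


def dies_with_sigbus():
    """Run CHILD in a separate interpreter; True if SIGBUS terminated it."""
    result = subprocess.run([sys.executable, "-c", CHILD])
    # subprocess reports "killed by signal N" as a return code of -N.
    return result.returncode == -signal.SIGBUS
```

The crash is confined to a child interpreter because CPython installs no SIGBUS handler, so the parent can observe the signal through the return code instead of dying itself.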
UCX + XPMEM is also affected; see https://github.com/Sandia-OpenSHMEM/SOS/actions/runs/11333460511/job/31521960164?pr=1154
UCX + pmi-simple (without CMA) can also be affected: https://github.com/Sandia-OpenSHMEM/SOS/actions/runs/11392002614/job/31697047000?pr=1156
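Because the failure is rare and has now shown up under several UCX configurations, one way to estimate its rate locally is to rerun the test many times and tally exit statuses. The following is a minimal sketch; the helper name rerun and the launcher command in the usage line are assumptions, not part of the SOS test suite.

```python
import subprocess
from collections import Counter


def rerun(cmd, runs):
    """Run `cmd` `runs` times and return a Counter of exit status -> count.

    A signal death can appear as 128 + signo when it passes through a
    launcher or shell (the CI log above reports 135 = 128 + SIGBUS), or as
    -signo when Python's direct child is itself killed by the signal.
    """
    counts = Counter()
    for _ in range(runs):
        # Output is discarded; only the exit status is tallied.
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        counts[result.returncode] += 1
    return counts
```

For example, assuming the SOS oshrun launcher is on PATH and the current directory is the spec-example build directory, print(rerun(["oshrun", "-np", "2", "./shmem_ctx"], 500)) would print the distribution of exit statuses over 500 runs.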