The new test for #2016, #2018, #1921, and #95, api.static_signal, is
failing once every hundred or so runs. The ctest output is a rank order
violation:
pre-DR stop
<rank order violation report_buf_lock(mutex)@/work/dr/git/src/core/utils.c:2053 acquired after memory_info_buf_lock(mutex)@/work/dr/git/src/core/unix/memquery_linux.c:71 in tid:7cbe>
<end of output>
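For context, in debug builds DR's deadlock-avoidance instrumentation assigns each lock a rank and flags acquisitions inconsistent with that ranking, which is what the report above is. A minimal sketch of that style of check, with invented names and structure (not DR's actual implementation), assuming a simple per-thread chain of held locks:

/* Hypothetical sketch of a rank-order deadlock-avoidance check; names and
 * structure are illustrative and are not DR's actual code. */
#include <assert.h>
#include <stddef.h>

typedef struct _sketch_lock_t {
    int rank;                    /* fixed rank assigned when the lock is created */
    const char *name;
    struct _sketch_lock_t *prev; /* lock this thread held just before this one */
} sketch_lock_t;

/* Innermost lock currently held by this thread. */
static __thread sketch_lock_t *innermost_held;

static void
sketch_lock_acquired(sketch_lock_t *lock)
{
    /* Locks must be acquired in increasing rank order; taking one whose rank
     * is not above the innermost held lock is the "X acquired after Y" report. */
    if (innermost_held != NULL && lock->rank <= innermost_held->rank) {
        /* report_rank_order_violation(lock->name, innermost_held->name); */
        assert(0 && "rank order violation");
    }
    lock->prev = innermost_held;
    innermost_held = lock;
}

static void
sketch_lock_released(sketch_lock_t *lock)
{
    assert(innermost_held == lock); /* this sketch assumes LIFO release */
    innermost_held = lock->prev;
    lock->prev = NULL;
}

Read that way, the report says report_buf_lock was acquired while memory_info_buf_lock was still held, in an order the ranking forbids.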
Trying to pause there does not work: that thread is killed immediately for
unknown reasons.
Running with -no_deadlock_avoidance instead hits an ASSERT(dynamo_exited) in
build_bb_ilist: so the dcontext is NULL. Adding some detach logging shows
that this happens after entering detach_on_permanent_stack() but before its
synchall finishes.
The 2nd thread in this app shouldn't be building new bbs at this point.
Could it be bb creation for translation (xl8) during the synch? But why
would it be missing a dcontext? And why does it die?
Finally got some data by un-limiting core dumps:
in dr_client_main
Sending SIGUSR1 pre-DR-start
Got SIGUSR1
pre-DR start
Sending SIGUSR1 under DR
Got SIGUSR1
pre-DR stop
<Detaching from application /work/dr/git/build_x64_dbg_tests/suite/tests/bin/api.static_signal (30151)>
<rank order violation report_buf_lock(mutex)@/work/dr/git/src/core/utils.c:2053 acquired after memory_info_buf_lock(mutex)@/work/dr/git/src/core/unix/memquery_linux.c:71 in tid:75c8>
Segmentation fault (core dumped)
Core was generated by `suite/tests/bin/api.static_signal'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
180 pop REG_XBX
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.21-13.fc22.x86_64
(gdb) p 0x75c8
$1 = 30152
(gdb) thread apply all bt
Thread 2 (Thread 0x7f03ddead700 (LWP 30151)):
#0 syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1 0x000000000080c6d0 in ?? ()
#2 0x00000000006ebacf in ksynch_wait (futex=0x4dfb90dc, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3 0x00000000006b8753 in os_thread_suspend (tr=0x4def0fa0) at /work/dr/git/src/core/unix/os.c:3173
#4 0x00000000005e4a79 in synch_with_thread (id=30152, block=false, hold_initexit_lock=true, caller_state=THREAD_SYNCH_NONE,
desired_state=THREAD_SYNCH_SUSPENDED_VALID_MCONTEXT, flags=2) at /work/dr/git/src/core/synch.c:926
#5 0x00000000005e636a in synch_with_all_threads (desired_synch_state=THREAD_SYNCH_SUSPENDED_VALID_MCONTEXT,
threads_out=0x7ffe20220f60, num_threads_out=0x7ffe20220f5c, cur_state=THREAD_SYNCH_NO_LOCKS_NO_XFER, flags=2)
at /work/dr/git/src/core/synch.c:1326
#6 0x00000000005e84e2 in detach_on_permanent_stack (internal=true, do_cleanup=true) at /work/dr/git/src/core/synch.c:1912
#7 0x000000000047401e in dr_app_stop_and_cleanup () at /work/dr/git/src/core/dynamo.c:2647
#8 0x000000000040c0a2 in main (argc=1, argv=0x7ffe202210c8) at /work/dr/git/src/suite/tests/api/static_signal.c:166
Thread 1 (Thread 0x7f03dd1cf700 (LWP 30152)):
#0 syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1 0x0000000000000006 in ?? ()
#2 0x00000000006ebacf in ksynch_wait (futex=0x7b74a0 <memory_info_buf_lock>, mustbe=1)
at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3 0x00000000006c7c4b in mutex_wait_contended_lock (lock=0x7b74a0 <memory_info_buf_lock>) at /work/dr/git/src/core/unix/os.c:8861
#4 0x00000000004fa79d in mutex_lock (lock=0x7b74a0 <memory_info_buf_lock>) at /work/dr/git/src/core/utils.c:888
#5 0x00000000006e2ed3 in memquery_iterator_start (iter=0x4dfe2430, start=0x4dfe2000 '\253' <repeats 200 times>..., may_alloc=false)
at /work/dr/git/src/core/unix/memquery_linux.c:139
#6 0x00000000006e37df in memquery_from_os (pc=0x4dfe2000 '\253' <repeats 200 times>..., info=0x4dfe2550, have_type=0x4dfe251e)
at /work/dr/git/src/core/unix/memquery_linux.c:328
#7 0x00000000006c7a99 in query_memory_ex_from_os (pc=0x4dfe2000 '\253' <repeats 200 times>..., info=0x4dfe2550)
at /work/dr/git/src/core/unix/os.c:8775
#8 0x00000000006c79f6 in get_memory_info (pc=0x4dfe2000 '\253' <repeats 200 times>..., base_pc=0x0, size=0x0, prot=0x4dfe25a4)
at /work/dr/git/src/core/unix/os.c:8752
#9 0x00000000006cf8ad in copy_frame_to_stack (dcontext=0x4df93980, sig=12, frame=0x4dfe2738, sp=0x4dfe2738 "\357\260i",
from_pending=false) at /work/dr/git/src/core/unix/signal.c:2682
#10 0x00000000006dc213 in sig_detach (dcontext=0x4df93980, frame=0x4dfe2738, detached=0x4dfb90f8)
at /work/dr/git/src/core/unix/signal.c:6217
#11 0x00000000006dc848 in handle_suspend_signal (dcontext=0x4df93980, ucxt=0x4dfe2740, frame=0x4dfe2738)
at /work/dr/git/src/core/unix/signal.c:6352
#12 0x00000000006d5747 in master_signal_handler_C (sig=12, siginfo=0x4dfe2870, ucxt=0x4dfe2740, xsp=0x4dfe2738 "\357\260i")
at /work/dr/git/src/core/unix/signal.c:4444
(gdb) info thread
Id Target Id Frame
2 Thread 0x7f03ddead700 (LWP 30151) syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
+ 1 Thread 0x7f03dd1cf700 (LWP 30152) syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
(gdb) info reg
rax 0xfffffffffffffffc -4
rbp 0x4dfe2340 0x4dfe2340
rsp 0x4dfe2300 0x4dfe2300
(gdb) x/10i $rip-5
0x6f3312 <syscall_ready>: mov %rcx,%r10
0x6f3315 <syscall_ready+3>: syscall
=> 0x6f3317 <syscall_ready+5>: pop %rbx
0x6f3318 <syscall_ready+6>: retq
So SYS_futex gets interrupted (-4 == -EINTR) and then crashes because its
stack is messed up?!?
Why is memory_info_buf_lock going down a contended path?
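Both questions point at the contended slow path visible in the backtrace: mutex_lock falls through to mutex_wait_contended_lock and a raw FUTEX_WAIT, and an EINTR return from that wait is normally harmless because the waiter just loops. A minimal, self-contained sketch of that pattern (illustrative only, not DR's mutex code):

/* Minimal futex-backed lock sketch (Linux): CAS fast path, FUTEX_WAIT slow
 * path that simply retries on EINTR.  Illustrative only, not DR's mutex. */
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef struct {
    atomic_int state; /* 0 = free, 1 = held */
} sketch_mutex_t;

static long
sketch_futex(atomic_int *uaddr, int op, int val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

static void
sketch_mutex_lock(sketch_mutex_t *m)
{
    int expected = 0;
    /* Fast path: uncontended acquire. */
    if (atomic_compare_exchange_strong(&m->state, &expected, 1))
        return;
    /* Contended slow path: park in the kernel until the holder wakes us.
     * An interrupted FUTEX_WAIT (EINTR, e.g. from a suspend signal) is
     * expected here and handled by looping, so EINTR by itself should not
     * crash anything. */
    for (;;) {
        expected = 0;
        if (atomic_compare_exchange_strong(&m->state, &expected, 1))
            return;
        if (sketch_futex(&m->state, FUTEX_WAIT, 1) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return; /* unexpected error; a real lock would assert */
    }
}

static void
sketch_mutex_unlock(sketch_mutex_t *m)
{
    atomic_store(&m->state, 0);
    sketch_futex(&m->state, FUTEX_WAKE, 1);
}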
Why is Thread 1 running sig_detach() already, when Thread 2 is just trying
to suspend it??
The state doesn't make sense: if Thread 1 was somehow already in the suspend
loop in handle_suspend_signal() when Thread 2 asked to suspend it here, and it
made its way to sig_detach, then ostd_resumed should now be 1, not 0. Plus,
suspend_count should prevent that from happening.
The suspend signal interrupted another EINTR syscall. It cannot be two
detaches: it would have asserted.
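To make the state argument concrete, here is a grossly simplified, hypothetical model of the suspend handshake; the field and function names are invented and the ordering is an assumption, not DR's actual synch code:

/* Hypothetical model only: invented names, assumed ordering. */
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int suspend_count; /* bumped by the suspender before signaling       */
    atomic_int suspended;     /* set by the target once parked in the handler   */
    atomic_int wakeup;        /* set by the suspender to release the target     */
    atomic_int resumed;       /* set by the target when it leaves the park loop */
} model_ostd_t;

static void
model_sig_detach(model_ostd_t *ostd)
{
    (void)ostd; /* stand-in for the real per-thread detach work */
}

/* Target thread, inside the suspend-signal handler. */
static void
model_handle_suspend(model_ostd_t *ostd, bool doing_detach)
{
    atomic_store(&ostd->suspended, 1);      /* "I am parked" */
    while (atomic_load(&ostd->wakeup) == 0) /* futex wait in the real code */
        ; /* spin */
    atomic_store(&ostd->resumed, 1);        /* leaving the park loop */
    /* If the thread reached the detach work by being parked and then resumed,
     * resumed would already be 1 here, yet the core dump shows it at 0, which
     * is why the "it was already in the suspend loop" explanation does not
     * hold together. */
    if (doing_detach)
        model_sig_detach(ostd);
}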
Theory: if the synch fails due to a bad xl8, it will try again, and the
resume in that retry path will then check doing_detach and call sig_detach.
The bug is using the master flag that is set up front: it has to use a local
flag set only after the synch suspends are finished.
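A sketch of the theorized bug and the direction of the fix, with purely hypothetical names (this is not DR's actual detach code, and the second flag is shown as a global only for simplicity where the text suggests a local flag): the resume path should key off something set only after the synchall's suspends have completed, not off the process-wide flag set at the start of detach.

/* Hypothetical sketch of the theorized bug and fix; variable and function
 * names are invented and do not reflect DR's actual detach code. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool doing_detach;         /* set at the very start of detach  */
static atomic_bool detach_resume_native; /* proposed: set only after synchall */

static void
sketch_sig_detach(void)
{
    /* stand-in for sending this thread back to native execution */
}

/* Resume path inside a target thread's suspend-signal handler. */
static void
sketch_on_resume(void)
{
    /* Buggy pattern: any resume while detach is in progress, including a
     * resume that is merely part of a failed-synch retry, goes straight to
     * sig_detach because doing_detach was set up front. */
    /* if (atomic_load(&doing_detach)) sketch_sig_detach(); */

    /* Theorized fix: only go native once the detacher has finished
     * suspending every thread and explicitly asks resumed threads to detach. */
    if (atomic_load(&detach_resume_native))
        sketch_sig_detach();
}

/* Detacher side. */
static void
sketch_detach_all(void)
{
    atomic_store(&doing_detach, true);
    /* synch_with_all_threads(...): may suspend a thread, fail translation,
     * resume it, and retry; those intermediate resumes must not trigger
     * sig_detach. */
    atomic_store(&detach_resume_native, true); /* only after the synch succeeds */
    /* ...resume all threads; each one now runs sketch_sig_detach()... */
}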