ASSERT dynamo_exited: on detach, thread suddenly dies #2021

Closed
derekbruening opened this issue Oct 13, 2016 · 1 comment

The new test for #2016, #2018, #1921, and #95, api.static_signal, is
failing once every hundred or so runs. The ctest output is a rank order
violation:

  pre-DR stop
  <rank order violation report_buf_lock(mutex)@/work/dr/git/src/core/utils.c:2053 acquired after memory_info_buf_lock(mutex)@/work/dr/git/src/core/unix/memquery_linux.c:71 in tid:7cbe>
  <end of output>
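
To make the rank checking concrete: under deadlock avoidance every DR lock
carries a rank, and acquiring a lock out of rank order is flagged because
some pair of threads could then take the two locks in opposite orders and
deadlock. A standalone sketch of the idea (the rank 86 matches
memory_info_buf_lock's rank in the dump further down; report_buf_lock's
rank and the increasing-rank convention here are assumptions, the real
checks live in core/utils.c):

  #include <pthread.h>
  #include <stdio.h>

  typedef struct {
      pthread_mutex_t lock;
      int rank;
      const char *name;
  } ranked_mutex_t;

  /* Highest rank currently held by this thread (simplified: DR tracks
   * the full list of owned locks per thread). */
  static __thread int highest_held_rank = -1;

  static void
  ranked_lock(ranked_mutex_t *m)
  {
      if (m->rank <= highest_held_rank)
          fprintf(stderr, "<rank order violation %s>\n", m->name);
      pthread_mutex_lock(&m->lock);
      highest_held_rank = m->rank;
  }

  int
  main(void)
  {
      ranked_mutex_t memory_info_buf =
          { PTHREAD_MUTEX_INITIALIZER, 86, "memory_info_buf_lock" };
      ranked_mutex_t report_buf =
          { PTHREAD_MUTEX_INITIALIZER, 50, "report_buf_lock" };
      ranked_lock(&memory_info_buf);
      ranked_lock(&report_buf); /* lower rank acquired second: flagged */
      return 0;
  }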

Trying to pause there does not work: that thread is killed immediately for
unknown reasons.

Running with -no_deadlock_avoidance hits an ASSERT(dynamo_exited) in
build_bb_ilist, so the dcontext is NULL. Adding some detach output
indicates that this happens after entering detach_on_permanent_stack() but
before its synchall finishes.

  pre-DR stop
  <Detaching from application /work/dr/git/build_x64_dbg_tests/suite/tests/bin/api.static_signal (24005)>
  <Application /work/dr/git/build_x64_dbg_tests/suite/tests/bin/api.static_signal (24005).  Internal Error: DynamoRIO debug check failure: /work/dr/git/src/core/arch/interp.c:3271 dynamo_exited
  (Error occurred @530 frags)
  version 6.2.17086, custom build
  -no_deadlock_avoidance -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct 
  0x000000004f2ba970 0x00000000004f8041
  0x000000004f2baad0 0x0000000000666061
  0x000000004f2bad30 0x000000000066ebf1
  0x000000004f2baf10 0x00000000004e1f59
  0x000000004f2baff0 0x000000004f20eec5
  0x00007f68e25008e0 0x00007f68e2ad7430
  0x00007f68e2500f00 0x000000000040bd6e
  0x00007f68e2500f50 0x00007f68e2ace555>

The 2nd thread in this app shouldn't be making new bbs at this point.
Could it be bb creation for xl8 during the synch? But why is it missing a
dcontext? And why does it die?
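
For context, the detach flow at issue, reduced to stubs (the helper names
are placeholders, not DR's real signatures; the actual sequence is
detach_on_permanent_stack() in core/synch.c):

  #include <stdbool.h>
  #include <stdio.h>

  static bool doing_detach; /* the "master" flag discussed below */

  static void synch_with_all_threads_stub(void) { puts("synchall: suspend every thread"); }
  static void resume_into_sig_detach_stub(void) { puts("each thread detaches itself on resume"); }
  static void cleanup_stub(void)                { puts("free DR state"); }

  static void
  detach_on_permanent_stack_sketch(void)
  {
      doing_detach = true;           /* set before the synchall starts */
      synch_with_all_threads_stub(); /* the crash happens in this window */
      resume_into_sig_detach_stub();
      cleanup_stub();
  }

  int
  main(void)
  {
      detach_on_permanent_stack_sketch();
      return 0;
  }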

Finally got some data by unlimiting core dumps (ulimit -c unlimited):

in dr_client_main
Sending SIGUSR1 pre-DR-start
Got SIGUSR1
pre-DR start
Sending SIGUSR1 under DR
Got SIGUSR1
pre-DR stop
<Detaching from application /work/dr/git/build_x64_dbg_tests/suite/tests/bin/api.static_signal (30151)>
<rank order violation report_buf_lock(mutex)@/work/dr/git/src/core/utils.c:2053 acquired after memory_info_buf_lock(mutex)@/work/dr/git/src/core/unix/memquery_linux.c:71 in tid:75c8>
Segmentation fault (core dumped)
Core was generated by `suite/tests/bin/api.static_signal'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
180             pop      REG_XBX
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.21-13.fc22.x86_64

(gdb) p 0x75c8
$1 = 30152
(gdb) thread apply all bt

Thread 2 (Thread 0x7f03ddead700 (LWP 30151)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x000000000080c6d0 in ?? ()
#2  0x00000000006ebacf in ksynch_wait (futex=0x4dfb90dc, mustbe=0) at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006b8753 in os_thread_suspend (tr=0x4def0fa0) at /work/dr/git/src/core/unix/os.c:3173
#4  0x00000000005e4a79 in synch_with_thread (id=30152, block=false, hold_initexit_lock=true, caller_state=THREAD_SYNCH_NONE, 
    desired_state=THREAD_SYNCH_SUSPENDED_VALID_MCONTEXT, flags=2) at /work/dr/git/src/core/synch.c:926
#5  0x00000000005e636a in synch_with_all_threads (desired_synch_state=THREAD_SYNCH_SUSPENDED_VALID_MCONTEXT, 
    threads_out=0x7ffe20220f60, num_threads_out=0x7ffe20220f5c, cur_state=THREAD_SYNCH_NO_LOCKS_NO_XFER, flags=2)
    at /work/dr/git/src/core/synch.c:1326
#6  0x00000000005e84e2 in detach_on_permanent_stack (internal=true, do_cleanup=true) at /work/dr/git/src/core/synch.c:1912
#7  0x000000000047401e in dr_app_stop_and_cleanup () at /work/dr/git/src/core/dynamo.c:2647
#8  0x000000000040c0a2 in main (argc=1, argv=0x7ffe202210c8) at /work/dr/git/src/suite/tests/api/static_signal.c:166

Thread 1 (Thread 0x7f03dd1cf700 (LWP 30152)):
#0  syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
#1  0x0000000000000006 in ?? ()
#2  0x00000000006ebacf in ksynch_wait (futex=0x7b74a0 <memory_info_buf_lock>, mustbe=1)
    at /work/dr/git/src/core/unix/ksynch_linux.c:120
#3  0x00000000006c7c4b in mutex_wait_contended_lock (lock=0x7b74a0 <memory_info_buf_lock>) at /work/dr/git/src/core/unix/os.c:8861
#4  0x00000000004fa79d in mutex_lock (lock=0x7b74a0 <memory_info_buf_lock>) at /work/dr/git/src/core/utils.c:888
#5  0x00000000006e2ed3 in memquery_iterator_start (iter=0x4dfe2430, start=0x4dfe2000 '\253' <repeats 200 times>..., may_alloc=false)
    at /work/dr/git/src/core/unix/memquery_linux.c:139
#6  0x00000000006e37df in memquery_from_os (pc=0x4dfe2000 '\253' <repeats 200 times>..., info=0x4dfe2550, have_type=0x4dfe251e)
    at /work/dr/git/src/core/unix/memquery_linux.c:328
#7  0x00000000006c7a99 in query_memory_ex_from_os (pc=0x4dfe2000 '\253' <repeats 200 times>..., info=0x4dfe2550)
    at /work/dr/git/src/core/unix/os.c:8775
#8  0x00000000006c79f6 in get_memory_info (pc=0x4dfe2000 '\253' <repeats 200 times>..., base_pc=0x0, size=0x0, prot=0x4dfe25a4)
    at /work/dr/git/src/core/unix/os.c:8752
#9  0x00000000006cf8ad in copy_frame_to_stack (dcontext=0x4df93980, sig=12, frame=0x4dfe2738, sp=0x4dfe2738 "\357\260i", 
    from_pending=false) at /work/dr/git/src/core/unix/signal.c:2682
#10 0x00000000006dc213 in sig_detach (dcontext=0x4df93980, frame=0x4dfe2738, detached=0x4dfb90f8)
    at /work/dr/git/src/core/unix/signal.c:6217
#11 0x00000000006dc848 in handle_suspend_signal (dcontext=0x4df93980, ucxt=0x4dfe2740, frame=0x4dfe2738)
    at /work/dr/git/src/core/unix/signal.c:6352
#12 0x00000000006d5747 in master_signal_handler_C (sig=12, siginfo=0x4dfe2870, ucxt=0x4dfe2740, xsp=0x4dfe2738 "\357\260i")
    at /work/dr/git/src/core/unix/signal.c:4444
(gdb) info thread
  Id   Target Id         Frame 
  2    Thread 0x7f03ddead700 (LWP 30151) syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
+ 1    Thread 0x7f03dd1cf700 (LWP 30152) syscall_ready () at /work/dr/git/src/core/arch/x86/x86_shared.asm:180
(gdb) info reg
rax            0xfffffffffffffffc       -4
rbp            0x4dfe2340       0x4dfe2340
rsp            0x4dfe2300       0x4dfe2300
(gdb) x/10i $rip-5
   0x6f3312 <syscall_ready>:    mov    %rcx,%r10
   0x6f3315 <syscall_ready+3>:  syscall 
=> 0x6f3317 <syscall_ready+5>:  pop    %rbx
   0x6f3318 <syscall_ready+6>:  retq   

So SYS_futex gets interrupted (rax == -4, i.e., -EINTR) and then crashes
because its stack is messed up?!?
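
The interruption by itself is routine: the raw syscall returns -EINTR (the
-4 in rax) and the wait loop simply retries. A standalone miniature of such
a retry loop (not DR's ksynch_wait; through the libc syscall() wrapper the
-EINTR shows up as -1 with errno set):

  #include <errno.h>
  #include <linux/futex.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static long
  futex_wait(int *futex, int mustbe)
  {
      long res;
      do {
          res = syscall(SYS_futex, futex, FUTEX_WAIT, mustbe, NULL, NULL, 0);
      } while (res == -1 && errno == EINTR); /* a signal arrived: wait again */
      return res;
  }

  int
  main(void)
  {
      int f = 0;
      /* Value mismatch (0 != 1), so the kernel returns EAGAIN at once
       * rather than blocking. */
      futex_wait(&f, 1);
      printf("futex_wait returned, errno=%d\n", errno);
      return 0;
  }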

Why is memory_info_buf_lock going down a contended path?

(gdb) p *lock
$2 = {
  lock_requests = 1, 
  contended_event = -1, 
  name = 0x77d248 "memory_info_buf_lock(mutex)@/work/dr/git/src/core/unix/memquery_linux.c:71", 
  rank = 86, 
  owner = 30152, 
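
What the contended path implies, as a sketch of a DR-style mutex fast/slow
split (field names match the dump above; the -1-free/0-held/N-waiters
convention is an assumption, the real code being mutex_lock in
core/utils.c and mutex_wait_contended_lock in core/unix/os.c):

  #include <stdatomic.h>
  #include <stdio.h>

  #define LOCK_FREE_STATE -1

  typedef struct {
      atomic_int lock_requests; /* -1 free; 0 held; >0 held plus waiters */
      int contended_event;      /* futex for waiters, created lazily */
  } sketch_mutex_t;

  static void
  mutex_wait_contended_stub(sketch_mutex_t *m)
  {
      (void)m;
      puts("slow path: futex wait, interruptible by a signal");
  }

  static void
  sketch_mutex_lock(sketch_mutex_t *m)
  {
      if (atomic_fetch_add(&m->lock_requests, 1) == LOCK_FREE_STATE)
          return;                   /* fast path: the lock was free */
      mutex_wait_contended_stub(m); /* slow path, as in Thread 1's backtrace */
  }

  int
  main(void)
  {
      sketch_mutex_t m = { LOCK_FREE_STATE, -1 };
      sketch_mutex_lock(&m); /* uncontended acquire */
      sketch_mutex_lock(&m); /* second acquire with no release: slow path,
                              * leaving lock_requests == 1 as in the dump */
      return 0;
  }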

Why is Thread 1 running sig_detach() already, when Thread 2 is just trying
to suspend it??

The state doesn't make sense: if Thread 1 was somehow already in the
suspend loop in handle_suspend_signal() when Thread 2 asked to suspend it
here, and it made its way to sig_detach, ostd->resumed should now be 1,
not 0. Plus, suspend_count should prevent that from happening.

(gdb) p /x *ostd
$5 = {
  stack_base = 0x0, 
  stack_top = 0x0, 
  suspend_lock = {
    lock_requests = 0xffffffff, 
    contended_event = 0xffffffff, 
    name = 0x774be8, 
    rank = 0x3d, 
    owner = 0x0, 
    owning_dcontext = 0x0, 
    prev_owned_lock = 0x0, 
    count_times_acquired = 0x29, 
    count_times_contended = 0x0, 
    count_times_spin_pause = 0x0, 
    max_contended_requests = 0x0, 
    count_times_spin_only = 0x0, 
    prev_process_lock = 0x7bb180, 
    next_process_lock = 0x7baa40, 
    callstack = {0x0, 0x0, 0x0, 0x0}, 
    app_lock = 0x0, 
    deleted = 0x0
  }, 
  suspend_count = 0x1, 
  fork_threads = 0x0, 
  fork_num_threads = 0x0, 
  suspended = 0x0, 
  wakeup = 0x0, 
  resumed = 0x0, 
  suspended_sigcxt = 0x0, 
  terminate = 0x0, 
  terminated = 0x0, 
  detached = 0x0, 
  retakeover = 0x0, 
  processing_signal = 0x0, 
  priv_lib_tls_base = 0x4dfa6700, 
  priv_alt_tls_base = 0x4dfa1000, 
  dr_tls_base = 0x4dfa1000, 
  app_thread_areas = 0x4dfb97a8
}
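
For reference, the handshake those suspended/wakeup/resumed futexes
implement, as a runnable miniature with C11 atomics standing in for DR's
ksynch futexes (a sketch of the protocol, not DR's code):

  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>

  static atomic_int suspended, wakeup, resumed;

  static void
  spin_until(atomic_int *flag, int val)
  {
      while (atomic_load(flag) != val)
          ; /* DR blocks on a futex via ksynch_wait() instead of spinning */
  }

  /* Stand-in for the target thread inside handle_suspend_signal(). */
  static void *
  target(void *arg)
  {
      (void)arg;
      atomic_store(&suspended, 1); /* tell the suspender we're parked */
      spin_until(&wakeup, 1);      /* wait to be released */
      atomic_store(&resumed, 1);   /* ack the resume; on a detach,
                                    * sig_detach() runs in this handler,
                                    * by which point resumed should be 1 */
      return NULL;
  }

  int
  main(void)
  {
      pthread_t t;
      pthread_create(&t, NULL, target, NULL);
      spin_until(&suspended, 1); /* os_thread_suspend(): wait for the ack */
      /* ...the suspender examines/translates the target's context... */
      atomic_store(&wakeup, 1);  /* release the target */
      spin_until(&resumed, 1);
      pthread_join(t, NULL);
      puts("handshake complete");
      return 0;
  }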

The suspend signal interrupted another syscall that had just returned EINTR:

(gdb) p /x *ucxt
$3 = {
  uc_flags = 0x1, 
  uc_link = 0x0, 
  uc_stack = {
    ss_sp = 0x4dfe1000, 
    ss_flags = 0x1, 
    ss_size = 0xe000
  }, 
  uc_mcontext = {
    r8 = 0x0, 
    r9 = 0x0, 
    r10 = 0x0, 
    r11 = 0x202, 
    r12 = 0x7ffe20220f3f, 
    r13 = 0x7f03dd1cf700, 
    r14 = 0x800000, 
    r15 = 0x0, 
    rdi = 0x7b74a0, 
    rsi = 0x0, 
    rbp = 0x4dfe2d40, 
    rbx = 0x6, 
    rdx = 0x1, 
    rax = 0xfffffffffffffffc, 
    rcx = 0x6f3317, 
    rsp = 0x4dfe2d00, 
    rip = 0x6f3317, 
    eflags = 0x202, 
    cs = 0x33, 
    gs = 0x0, 
    fs = 0x0, 
    __pad0 = 0xabab, 
    err = 0x0, 
    trapno = 0x0, 
    oldmask = 0xfffffffffffbf2ff, 
    cr2 = 0x0, 
    fpstate = 0x4dfe2900, 
    reserved1 = {0xabababababababab, 0xabababababababab, 0xabababababababab, 0xabababababababab, 0xabababababababab, 
      0xabababababababab, 0xabababababababab, 0xabababababababab}
  }, 
  uc_sigmask = {
    sig = {0xfffffffe7ffbfeff}
  }
}
(gdb) x/4i ucxt->uc_mcontext.rip
   0x6f3317 <syscall_ready+5>:  pop    %rbx
   0x6f3318 <syscall_ready+6>:  retq   
   0x6f3319 <dr_fpu_exception_init>:    fninit 
   0x6f331b <dr_fpu_exception_init+2>:  pushq  $0x1f80
(gdb) x/200wx 0x4dfe2d00
0x4dfe2d00:     0x00000006      0x00000000      0x006ebacf      0x00000000
0x4dfe2d10:     0x00000000      0x00000000      0x00000000      0x00000000
0x4dfe2d20:     0x4dfe2d40      0x00000001      0x007b74a0      0x00000000
0x4dfe2d30:     0x00000000      0xffffffff      0x00000006      0x00000002
0x4dfe2d40:     0x4dfe2d80      0x00000000      0x006c7c4b      0x00000000
0x4dfe2d50:     0x4dfe2d01      0x00000000      0x007b74a0      0x00000000
0x4dfe2d60:     0xabababab      0xabababab      0x007b74a0      0x00000000
0x4dfe2d70:     0x00000006      0x00000000      0x4df93980      0x00000000
0x4dfe2d80:     0x4dfe2db0      0x00000000      0x004fa79d      0x00000000
0x4dfe2d90:     0xabababab      0xabababab      0x007b74a0      0x00000000
0x4dfe2da0:     0xabababab      0xabababab      0x0100abab      0x00001b58
0x4dfe2db0:     0x4dfe2e00      0x00000000      0x006e2ed3      0x00000000
0x4dfe2dc0:     0xabababab      0xabababab      0xabababab      0xababab00
0x4dfe2dd0:     0x4dfe3000      0x00000000      0x4dfe2e30      0x00000000
0x4dfe2de0:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2df0:     0xabababab      0xabababab      0x4dfe2e61      0x00000000
0x4dfe2e00:     0x4dfe2ef0      0x00000000      0x006e37df      0x00000000
0x4dfe2e10:     0xabababab      0xabababab      0x4dfe2f1e      0x00000000
0x4dfe2e20:     0x4dfe2f50      0x00000000      0x4dfe3000      0x00000000
0x4dfe2e30:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e40:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e50:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e60:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e70:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e80:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2e90:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2ea0:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2eb0:     0xabababab      0xabababab      0xabababab      0xabababab
0x4dfe2ec0:     0xabababab      0xabababab      0xabababab      0x00ababab
0x4dfe2ed0:     0xffffffff      0xffffffff      0x00000000      0x00000000
0x4dfe2ee0:     0xabababab      0xabababab      0x00000006      0x00000000
0x4dfe2ef0:     0x4dfe2f20      0x00000000      0x006c7a99      0x00000000
0x4dfe2f00:     0x4dfe2f50      0x00000000      0x4dfe3000      0x00000000
0x4dfe2f10:     0x00000001      0x00000000      0x4dfe3000      0x00000000
0x4dfe2f20:     0x4dfe2f70      0x00000000      0x006c79f6      0x00000000
0x4dfe2f30:     0x4dfe2fa4      0x00000000      0x00000000      0x00000000
0x4dfe2f40:     0x00000000      0x00000000      0x4dfe3000      0x00000000
0x4dfe2f50:     0x4dfe3138      0x00000000      0x4dfe2f70      0x00000000
0x4dfe2f60:     0x006ceb52      0x00000000      0x4dfe3138      0x00000000
0x4dfe2f70:     0x4dfe2fe0      0x00000000      0x006cf8ad      0x00000000
0x4dfe2f80:     0x4dfe3138      0x00000000      0x4dfe3138      0x00000000
0x4dfe2f90:     0xababab00      0x0000000c      0x4df93980      0x00000000
0x4dfe2fa0:     0xabababab      0x0000000c      0x4df93980      0x00000000
0x4dfe2fb0:     0xabababab      0xabababab      0x4dfe3168      0x00000000
0x4dfe2fc0:     0x4dfb9140      0x00000538      0x000001b8      0x01000000
0x4dfe2fd0:     0x4dfb9140      0x00000000      0x4dfe3000      0x00000000
0x4dfe2fe0:     0x4dfe3040      0x00000000      0x006dc213      0x00000000
0x4dfe2ff0:     0xabababab      0xabababab      0x4dfb90f8      0x00000000
0x4dfe3000:     0x4dfe3138      0x00000000      0x4df93980      0x00000000
0x4dfe3010:     0x00000000      0x00000000      0x00000000      0x00000000
(gdb) x/1i 0x006ebacf
   0x6ebacf <ksynch_wait+127>:  add    $0x10,%rsp
(gdb) x/1i 0x006c7c4b
   0x6c7c4b <mutex_wait_contended_lock+175>:    mov    %rax,-0x18(%rbp)
(gdb) x/1i 0x006e2ed3
   0x6e2ed3 <memquery_iterator_start+95>:       mov    -0x8(%rbp),%rax
(gdb) x/1i 0x006c7a99
   0x6c7a99 <query_memory_ex_from_os+43>:       mov    %al,-0x1(%rbp)
(gdb) x/1i 0x006c79f6
   0x6c79f6 <get_memory_info+64>:       xor    $0x1,%eax
(gdb) x/1i 0x006c79f6-5
   0x6c79f1 <get_memory_info+59>:       callq  0x6c7a6e <query_memory_ex_from_os>
(gdb) x/1i get_memory_info
   0x6c79b6 <get_memory_info>:  push   %rbp
(gdb) x/1i 0x006cf8ad
   0x6cf8ad <copy_frame_to_stack+352>:  mov    %al,-0x29(%rbp)
(gdb) x/1i 0x006dc213
   0x6dc213 <sig_detach+254>:   mov    -0x10(%rbp),%rax

It cannot be two detaches: it would have asserted.

Theory: if the synch fails due to a bad xl8, it will try again, and the
resume in between will then check doing_detach and call sig_detach. The
bug is using the master flag that is set up front: it has to be a local
flag set only after the synch suspends are finished.
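
The gist of that fix, as a standalone sketch (doing_detach is real;
synch_suspends_finished and the stub are made-up names):

  #include <stdbool.h>
  #include <stdio.h>

  static bool doing_detach;            /* master flag, set at detach entry */
  static bool synch_suspends_finished; /* hypothetical per-detach flag */

  static void sig_detach_stub(void) { puts("sig_detach"); }

  /* Resume path in the suspend-signal handler, sketched. */
  static void
  on_thread_resume(void)
  {
      /* Buggy version: checking doing_detach alone fires even for the
       * resume that merely retries a failed synch attempt (bad xl8),
       * detaching the thread while the detacher is still synching. */
      if (doing_detach && synch_suspends_finished)
          sig_detach_stub(); /* fixed: only after the synchall completes */
  }

  int
  main(void)
  {
      doing_detach = true; /* detach has begun */
      on_thread_resume();  /* retry-resume during synch: must not detach */
      synch_suspends_finished = true;
      on_thread_resume();  /* resume after a successful synchall: detaches */
      return 0;
  }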

@derekbruening

After the fix:

# for ((i=0; i<10000; i++)); do suite/tests/bin/api.static_signal; done > /tmp/x 2>&1; grep -c 'all done' /tmp/x
10000
