
support re-attach after full detach with the same DR library instance #2157

Open · derekbruening opened this issue Jan 27, 2017 · 3 comments

derekbruening (Contributor) commented:
Split from #95

For something like a ptrace-based external attach with an injected DR library, the solution here would be to remove the library completely on detach, leaving no extra work for a re-attach. This issue covers instead a re-attach for a DR library that we cannot remove, as it is either statically linked with the app or was not loaded by us as part of the attach but rather by the system loader up front.

Xref discussion on needing to re-attach after a full detach for start/stop when stop always does a full detach: #95 (comment)

It's worth repeating the main paragraph there:

Supporting re-takeover when stopping is tied to full cleanup is problematic
as it requires that DR fully zero all static and global variables. There
are many cases of static variables scattered around, such as inside
DO_ONCE, in the initializers for Extensions (drmgr, etc.), in memoized
functions, etc. We'd have to make all those non-global-scope static vars
exposed to get access to them, or try to zero out the whole .data and .bss
(which by itself is not enough as there's a lot of non-zero-init stuff in
.data). This has performance implications for chains of short-lived
processes. We also have to deal with subtle things like #1271, where we
threw out the .1config file under the assumption that we wouldn't re-read
the options later. Plus, even if we make DR work in this model,
third-party Extensions are unlikely to follow this: we would have to
noisily demand a different programming model than is usually assumed.

Despite all of those problems, we have gotten such a re-attach to work for simple cases in the past. Even if the solution is fragile, "hacky", and does not cover all corner cases, best-effort support may still be worthwhile, as it removes a severe limitation on useful usage scenarios such as bursty traces.
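To make the static-variable problem concrete, here is a minimal, purely illustrative C sketch (not DR's actual DO_ONCE macro) of why a re-attach with the same library instance silently skips one-time initialization unless every such flag is found and reset:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative sketch: a DO_ONCE-style static flag keeps its value across
 * a detach, so a later re-attach skips one-time initialization unless
 * someone explicitly resets it.  Names here are hypothetical. */
static bool initialized_once;
static int init_count;

static void
do_once_init(void)
{
    if (!initialized_once) { /* survives detach: still true on re-attach */
        initialized_once = true;
        init_count++;
    }
}

/* A full re-init would have to locate and reset every such flag,
 * including ones hidden at function scope in third-party Extensions. */
static void
reset_for_reattach(void)
{
    initialized_once = false;
}
```

The difficulty described above is that flags like this are scattered across many translation units, often at non-global scope, so there is no single place to perform the reset.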

derekbruening (Contributor, Author) commented:
I put in initial best-effort support in 2dd9659

However, it ends up failing on Travis in tests that pass locally:

https://travis-ci.org/DynamoRIO/dynamorio/builds/201376528
debug-internal-32: 259 tests passed, **** 3 tests failed: ****
	code_api|tool.drcacheoff.burst_static =>    (16821).  Internal Error: DynamoRIO debug check failure: 
	code_api|tool.drcacheoff.burst_client =>    (16840).  Internal Error: DynamoRIO debug check failure: 
	code_api|api.static_detach =>  Application /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach (16921).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/os.c:8907 vsyscall_page_start == NULL 

I added a diagnostic via a pull request and got:

https://travis-ci.org/DynamoRIO/dynamorio/jobs/201391836
254: Test command: /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/bin32/runstats "-s" "90" "-killpg" "-silent" "-env" "LD_LIBRARY_PATH" "/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/lib32/debug:/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/ext/lib32/debug:" "-env" "DYNAMORIO_OPTIONS" "-stderr_mask 0xC -dumpcore_mask 0 -code_api" "/home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach"
254: Test timeout computed to be: 600
252: pre-DR stop
252: all done
254: pre-DR init
254: vsyscall_page_start is 0x00000000
254: in dr_client_main
254: pre-DR start
254: pre-DR detach
254: Saw some bb events
254: clearing vsyscall_page_start
254: re-attach attempt
254: vsyscall_page_start is 0x00000000
254: vsyscall_page_start is 0xf77bb000
254: <Application /home/travis/build/DynamoRIO/dynamorio/build_debug-internal-32/suite/tests/bin/api.static_detach (17053).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/os.c:8909 vsyscall_page_start == NULL
254: (Error occurred @457 frags)
254: version 6.2.17211, custom build
254: -stderr_mask 12 -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct 
254: 0xffae96ec 0x08136695
254: 0xffae991c 0x082ea30c
254: 0xffae9a20 0x081d9d31
254: 0xffae9aac 0x080b2f1c
254: 0xffaea2e8 0x080b65ee
254: 0xffaea300 0x080b6875
254: 0xffaea310 0x08051c86
254: 0xffaea328 0xf75c7ad3>
254/262 Test #254: code_api|api.static_detach .......................................***Failed  Required regular expression not found.Regex=[^pre-DR init

So either vdso is in the maps file twice, or find_executable_vm_areas is
called twice. Both are odd. I'm disabling the assert temporarily while I try to reproduce this or investigate further using pull requests.

derekbruening (Contributor, Author) commented:
I can repro in a 14.04.5 VM (but not in 15.04 or on Fedora). The vdso pages are split into two entries,
presumably by something DR did to them (the vsyscall hook, I suppose):

f7740000-f7741000 r-xp 00000000 00:00 0                                  [vdso]
f7741000-f7742000 r-xp 00000000 00:00 0                                  [vdso]
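The two [vdso] lines above are contiguous and have identical permissions, so one way to tolerate the split is to coalesce adjacent maps entries before asserting on them. A minimal sketch, where `map_entry_t` and `coalesce_regions` are hypothetical names and not DR's actual maps-iteration API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical maps-file entry; a real reader would parse /proc/pid/maps. */
typedef struct {
    size_t start, end;
    const char *perms;
    const char *name;
} map_entry_t;

/* Merge adjacent entries that touch and share name + permissions;
 * returns the new entry count. */
static size_t
coalesce_regions(map_entry_t *e, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && e[out - 1].end == e[i].start &&
            strcmp(e[out - 1].perms, e[i].perms) == 0 &&
            strcmp(e[out - 1].name, e[i].name) == 0) {
            e[out - 1].end = e[i].end; /* extend the previous region */
        } else {
            e[out++] = e[i];
        }
    }
    return out;
}
```

With this, the split vdso above collapses back into a single region and a "one vdso mapping" assert would hold.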

Carrotman42 added a commit that referenced this issue Sep 11, 2017
When doing_detach is false, the current stack frame is actually in the
heap, so unmapping causes a segfault.

Issue #2157
derekbruening pushed a commit that referenced this issue Sep 25, 2017
On UNIX we're on a permanent non-vmm stack at detach, so we can free
the full vmm region.

I also included a fix to vmm_heap_unit_init which accidentally left
vmh->alloc_start uninitialized in the branch related to reserving OS
memory at a preferred location.

Issue: #2157
Carrotman42 added a commit that referenced this issue Nov 15, 2017
Take care to set the registered_fault bool back to false after
event unregister so that it can be re-registered later.

Issue #2157
Carrotman42 added a commit that referenced this issue Nov 16, 2017
Take care to set the registered_fault bool back to false after
event unregister so that it can be re-registered later.

Issue #2157
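The registered-flag fix in the commits above follows a simple pattern: if the "already registered" bool is not cleared on unregister, a second attach's registration call becomes a silent no-op. A minimal sketch with hypothetical names (not drmgr's actual API):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: a guard flag for a one-time event registration. */
static bool registered_fault;
static int handler_count;

static bool
register_fault_event(void)
{
    if (registered_fault)
        return false; /* double registration refused */
    registered_fault = true;
    handler_count++;
    return true;
}

static void
unregister_fault_event(void)
{
    handler_count--;
    registered_fault = false; /* the fix: allow later re-registration */
}
```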
fhahn pushed a commit that referenced this issue Dec 4, 2017
On UNIX we're on a permanent non-vmm stack at detach, so we can free
the full vmm region.

I also included a fix to vmm_heap_unit_init which accidentally left
vmh->alloc_start uninitialized in the branch related to reserving OS
memory at a preferred location.

Issue: #2157
fhahn pushed a commit that referenced this issue Dec 4, 2017
Take care to set the registered_fault bool back to false after
event unregister so that it can be re-registered later.

Issue #2157
Carrotman42 added a commit that referenced this issue Dec 8, 2017
Fixes a reattach-based crash where this TLS leak caused drmgr to run out
of TLS slots.

Issue #2157
Carrotman42 added a commit that referenced this issue Dec 9, 2017
Fixes a reattach-based crash where this TLS leak caused drmgr to run out
of TLS slots.

Issue #2157
Carrotman42 added a commit that referenced this issue Dec 13, 2017
After 5 seconds of waiting for a thread to acknowledge a received
signal, os_thread_suspend now returns false so that the caller can
retry.

Issue #2157
Carrotman42 added a commit that referenced this issue Dec 15, 2017
On Unix, after 5 seconds of waiting for a thread to acknowledge a received
signal, os_thread_suspend now returns false so that the caller can
retry.

This fixes a hang related to creating a new application thread close to the time when
detach happens.

Issue: #2157
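The retry scheme above (bound the wait for a suspend acknowledgment and report failure so the caller can try again) can be sketched as follows. This is a simulation with hypothetical names: the "thread acknowledges after N polls" model stands in for the real signal-acknowledgment wait in DR's `os_thread_suspend`:

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated target thread: how many poll intervals until it acks. */
typedef struct {
    int polls_until_ack;
} fake_thread_t;

/* Returns true if the thread acknowledged within max_polls intervals,
 * else false so the caller can retry (instead of blocking forever). */
static bool
suspend_with_timeout(fake_thread_t *t, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (t->polls_until_ack <= 0)
            return true;
        t->polls_until_ack--; /* stand-in for sleeping one interval */
    }
    return t->polls_until_ack <= 0;
}
```

The design point is that a bounded wait converts a rare detach-time hang into a recoverable failure the caller can loop on.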
derekbruening (Contributor, Author) commented:
Xref #3065

derekbruening pushed a commit that referenced this issue Jun 22, 2018
This commit supplements PR #3050.
We also need to clear postcall_cache in drwrap_exit; otherwise
post_callback will not be invoked on re-attach. This is because the registration
of post_callback relies on pre_callback, and pre_callback checks postcall_cache
before registering post_callback. The stale data in postcall_cache prevents
post_callback from being registered in the hash table.

Issue: #3065, #2157 
Fixes #3049
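The stale-cache mechanism in that commit message can be sketched in miniature. Everything here is illustrative (a flat array stands in for drwrap's hash table; the function names mirror but are not drwrap's real internals): the pre-call hook only registers a post-call hook for addresses not already cached, so a cache surviving from the previous attach suppresses registration on re-attach unless exit clears it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 16
static size_t postcall_cache[CACHE_SIZE]; /* stand-in for the hash table */
static size_t cache_count;
static int registered_hooks;

static bool
cache_contains(size_t pc)
{
    for (size_t i = 0; i < cache_count; i++)
        if (postcall_cache[i] == pc)
            return true;
    return false;
}

static void
pre_callback(size_t pc)
{
    if (!cache_contains(pc)) { /* a stale entry skips this branch */
        if (cache_count < CACHE_SIZE)
            postcall_cache[cache_count++] = pc;
        registered_hooks++;
    }
}

static void
drwrap_exit_clear_cache(void)
{
    cache_count = 0; /* the fix: forget cached post-call sites on exit */
}
```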