support re-attach after full detach with the same DR library instance #2157
Comments
I put in initial best-effort support in 2dd9659. However, it fails on Travis in tests that pass locally:
I put a diagnostic into a pull request and:
So either vdso is in the maps file twice, or find_executable_vm_areas is |
I can repro in a 14.04.5 VM (but not in 15.04 or on Fedora). The vdso pages are split into two entries,
When doing_detach is false, the current stack frame is actually in the heap, so unmapping causes a segfault. Issue #2157
On UNIX we're on a permanent non-vmm stack at detach, so we can free the full vmm region. I also included a fix to vmm_heap_unit_init which accidentally left vmh->alloc_start uninitialized in the branch related to reserving OS memory at a preferred location. Issue: #2157
Take care to set the registered_fault bool back to false after event unregister so that it can be re-registered later. Issue #2157
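The flag reset described above can be sketched as follows. This is a minimal illustration, not the actual commit: only the name `registered_fault` comes from the commit message, while `register_fault_event`, `unregister_fault_event`, and `handlers_installed` are hypothetical stand-ins for an event registry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for an event registry; not DynamoRIO's API. */
static int handlers_installed; /* number of fault handlers currently active */
static bool registered_fault;  /* guards against double registration */

void register_fault_event(void)
{
    if (registered_fault)
        return; /* already registered during this attach */
    handlers_installed++;
    registered_fault = true;
}

void unregister_fault_event(void)
{
    if (!registered_fault)
        return;
    assert(handlers_installed > 0);
    handlers_installed--;
    /* Without this reset, the guard in register_fault_event() would skip
     * registration on the next attach and the handler would be missing. */
    registered_fault = false;
}
```

Any guard flag of this kind has to be treated as part of the state that detach tears down, or the second attach silently does less than the first.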
Fixes a reattach-based crash where this TLS leak caused drmgr to run out of TLS slots. Issue #2157
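To illustrate how a per-attach leak can exhaust a fixed pool of slots, here is a small sketch. The slot table, its size, and all function names are assumptions made for illustration; they do not reflect drmgr's real data structures or limits.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TLS_SLOTS 4 /* hypothetical limit, chosen only for illustration */

static bool slot_in_use[MAX_TLS_SLOTS];

/* Returns a free slot index, or -1 if every slot is taken. */
int tls_slot_alloc(void)
{
    for (int i = 0; i < MAX_TLS_SLOTS; i++) {
        if (!slot_in_use[i]) {
            slot_in_use[i] = true;
            return i;
        }
    }
    return -1;
}

void tls_slot_free(int idx)
{
    assert(idx >= 0 && idx < MAX_TLS_SLOTS);
    slot_in_use[idx] = false;
}

/* One attach/detach cycle. If release_on_exit is false, the slot leaks. */
bool attach_cycle(bool release_on_exit)
{
    int idx = tls_slot_alloc();
    if (idx < 0)
        return false; /* out of slots: the failure mode described above */
    if (release_on_exit)
        tls_slot_free(idx);
    return true;
}
```

With the release in place, any number of cycles succeeds; without it, the cycle after the pool is full fails.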
After 5 seconds of waiting for a thread to acknowledge a received signal, os_thread_suspend now returns false so that the caller can retry. Issue #2157
On Unix, after 5 seconds of waiting for a thread to acknowledge a received signal, os_thread_suspend now returns false so that the caller can retry. This fixes a problem related to creating a new application thread close to the time when detach happens. Issue: #2157
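The bounded-wait-then-retry pattern from this commit message can be sketched as below. This is a simulation under stated assumptions: a poll count stands in for the 5-second limit, and `target_t`, `suspend_with_timeout`, and `suspend_with_retry` are hypothetical names, not the signature of os_thread_suspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target state. In a real runtime the acknowledgment would
 * be set by the signal handler running in the target thread. */
typedef struct {
    int polls_until_ack; /* simulated delay before the thread acknowledges */
} target_t;

static bool target_acked(target_t *t)
{
    if (t->polls_until_ack > 0) {
        t->polls_until_ack--;
        return false;
    }
    return true;
}

/* Bounded wait: gives up after max_polls instead of blocking forever. */
bool suspend_with_timeout(target_t *t, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (target_acked(t))
            return true;
    }
    return false; /* the caller decides whether to retry */
}

/* Caller-side retry loop. Returns the attempt number that succeeded,
 * or 0 if all max_attempts attempts timed out. */
int suspend_with_retry(target_t *t, int max_polls, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (suspend_with_timeout(t, max_polls))
            return attempt;
    }
    return 0;
}
```

Returning false rather than waiting indefinitely lets the caller re-examine the thread list between attempts, which matters when a thread is created while detach is in progress.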
Xref #3065
This commit supplements PR #3050. We also need to clear postcall_cache in drwrap_exit; otherwise post_callback will not be invoked at re-attach. This is because registration of post_callback relies on pre_callback, and pre_callback checks postcall_cache before registering post_callback. The stale data in postcall_cache prevents post_callback from being registered in the hash table. Issue: #3065, #2157 Fixes #3049
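A simplified model of the stale-cache problem is sketched here. Only the name `postcall_cache` is taken from the commit message; the array-based table, `on_pre_call`, `post_call_registered`, and `wrap_exit` are assumptions for illustration and are not drwrap's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ADDRS 8 /* hypothetical table size */

static bool postcall_cache[MAX_ADDRS]; /* "this return address was seen" */
static bool post_table[MAX_ADDRS];     /* registered post-call sites */

/* Runs at function entry; registers the post-call site only on a cache miss. */
void on_pre_call(int retaddr_idx)
{
    assert(retaddr_idx >= 0 && retaddr_idx < MAX_ADDRS);
    if (postcall_cache[retaddr_idx])
        return; /* cache hit: assumes the site is already in post_table */
    post_table[retaddr_idx] = true;
    postcall_cache[retaddr_idx] = true;
}

bool post_call_registered(int retaddr_idx)
{
    assert(retaddr_idx >= 0 && retaddr_idx < MAX_ADDRS);
    return post_table[retaddr_idx];
}

/* Teardown. If clear_cache is false, the cache outlives the table. */
void wrap_exit(bool clear_cache)
{
    memset(post_table, 0, sizeof(post_table));
    if (clear_cache)
        memset(postcall_cache, 0, sizeof(postcall_cache));
}
```

After `wrap_exit(false)`, the next `on_pre_call` hits the stale cache and never re-registers the site, which matches the missing post callback described above.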
Split from #95
For something like a ptrace-based external attach with an injected DR library, the solution here would be to remove the library completely on detach, leaving no extra work for a re-attach. This issue covers instead a re-attach for a DR library that we cannot remove, as it is either statically linked with the app or was not loaded by us as part of the attach but rather by the system loader up front.
Xref discussion on needing to re-attach after a full detach for start/stop when stop always does a full detach: #95 (comment)
It's worth repeating the main paragraph there:
Supporting re-takeover when stopping is tied to full cleanup is problematic
as it requires that DR fully zero all static and global variables. There
are many cases of static variables scattered around, such as inside
DO_ONCE, in the initializers for Extensions (drmgr, etc.), in memoized
functions, etc. We'd have to make all those non-global-scope static vars
exposed to get access to them, or try to zero out the whole .data and .bss
(which by itself is not enough as there's a lot of non-zero-init stuff in
.data). This has performance impliciations for chains of short-lived
processes. We also have to deal with subtle things like #1271, where we
threw out the .1config file under the assumption that we wouldn't re-read
the options later. Plus, even if we make DR work in this model,
third-party Extensions are unlikely to follow this: we would have to
noisily demand a different programming model than is usually assumed.
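The function-scope static problem mentioned in the quoted paragraph can be sketched as follows. The `DO_ONCE` macro here is a simplified illustration in the spirit of the one named above, not DynamoRIO's definition, and `runtime_init`, `runtime_full_cleanup`, `runtime_use`, and `table_initialized` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified once-only macro: the guard is a static hidden inside the
 * enclosing function, so no other code can name it. */
#define DO_ONCE(stmt)          \
    do {                       \
        static bool done_once; \
        if (!done_once) {      \
            done_once = true;  \
            stmt;              \
        }                      \
    } while (0)

static int table_initialized; /* file-scope state that cleanup can reach */

void runtime_init(void)
{
    DO_ONCE(table_initialized = 1);
}

void runtime_full_cleanup(void)
{
    table_initialized = 0;
    /* The done_once guard inside runtime_init() cannot be reset from here. */
}

void runtime_use(void)
{
    assert(table_initialized && "runtime used before init");
}
```

Calling `runtime_init()` again after `runtime_full_cleanup()` leaves `table_initialized` at 0, because the hidden guard still reads true; a later `runtime_use()` would then abort.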
Despite all of those problems, we have gotten such a re-attach to work for simple cases in the past. Even if the solution is fragile and "hacky" and does not cover all corner cases, it may still be worth best-effort support, as it removes a severe limitation on useful usage scenarios such as bursty traces.