Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

currently-in-app-state thread now can't handle a signal #2128

Open
derekbruening opened this issue Jan 9, 2017 · 2 comments
Open

currently-in-app-state thread now can't handle a signal #2128

derekbruening opened this issue Jan 9, 2017 · 2 comments

Comments

@derekbruening
Copy link
Contributor

The latest DR TLS changes from #2089 break the DrM nudge test:

Trying to update the DR used by DrM results in the nudge test failing (I
had to add custom output to see the nudge-out file contents):

62: <received nudge mask=0x40000 id=0x00000000 arg=0x0000000000000000>
62: <ERROR: master_signal_handler with no siginfo (i#26?): tid=16408, sig=4>
62: |
62: 
62: CMake Error at runtest.cmake:249 (message):
62:   Timed out waiting for summary output
62: 
master_signal_handler: sig=4, retaddr=0xf776bbd0
handle_nudge_signal: sig=4 code=-1 errno=268697600
received nudge version=1 flags=0x0 mask=0x40000 id=0x00000000 arg=0x0000000000000000
SYSLOG_INFORMATION: received nudge mask=0x40000 id=0x00000000 arg=0x0000000000000000
        master_signal_handler 4 returning now

Exit from F1647(0x08048901).0xe79cbcc0 (shared) 
 (UNKNOWN DIRECT EXIT F1647.0xe79cbcc0->F1647)

dispatch: target = 0x08048901
synch with all threads my id = 17188 Giving 4 permission and seeking 4 state
Skipping synch with thread 17188
Finished synch with all threads: result=1
        returning holding initexit_lock and all_threads_synch_lock
os_switch_seg_to_base: switching to app, setting gs to 0x63
os_switch_seg_to_base to app: set_thread_area successful for thread 17188 base 0x0804cc80
TLS copy is 0xe77516e4
os_switch_seg_to_base: switching to dr, setting fs to 0x73
os_switch_seg_to_base to DR: set_thread_area successful for thread 17188 base 0xe77516e4
=> <failure here>
os_switch_seg_to_base: switching to dr, setting gs to 0x63
os_switch_seg_to_base to DR: set_thread_area successful for thread 17188 base 0xe77c2b70
os_switch_seg_to_base: switching to dr, setting fs to 0x73
os_switch_seg_to_base to DR: set_thread_area successful for thread 17188 base 0xe77bd000
os_file_exists failed: 0xfffffffe
os_file_exists failed: 0xfffffffe
os_file_exists failed: 0xfffffffe
Just flushed targetf, next_tag is 0x08048901
Entry into F1647(0x08048901).0xe79cbc77 (shared)

I can repro like this:

# tests/run_app_in_bg -out ./nudge-out /work/drmemory/git/build_x86_dbg/bin/drmemory -debug -dr_debug -dr /work/dr/git/exports -batch -dr_ops -loglevel -dr_ops 2 -callstack_style 0x27 -no_results_to_stderr -- tests/infloop
17294
# for ((i=0; i<2; i++)); do bin/drmemory -debug -dr_debug -dr /work/dr/git/exports -batch -nudge 17294; done

The failure happens when the 2nd nudge arrives when the thread is in app state.
The #2089 changes mean that a nudge signal arriving in a native thread
will always cause this assert. Previously dr_switch_to_app_state() did
nothing on Linux which is why nothing fired.

For the DrM failure, it's a leak scan nudge, which calls
dr_switch_to_app_state() for Windows TEB. For now I will just have DrM
only do the swap to app state on Windows. Longer term we need a better
signal-in-cur-app-thread strategy.

Xref the related #1921.

@derekbruening
Copy link
Contributor Author

We have another short-term problem too: alloc_replace.c calls dr_switch_to_app_state() all over the place.

Maybe dr_switch_to_app_state() for a full-control client shouldn't swap the DR TLS on Linux.
Though that won't help solve the start/stop signal handling it will avoid this regression. I can add a dr_state_flags_t controlling it, as a negative to keep compatibility, or only maintaining source compat.

This all means there's also a racy window pre-clone where a nudge or other signal will fail, so we want a broader solution.

@derekbruening
Copy link
Contributor Author

549ce61 disables DR TLS swapping by default for clients. Leaving this issue open for more general handling including covering signals during the mentioned race window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant