TEST HANG(10.0.19732 client.attach_blocking): Test hangs if yama.ptrace_scope != 0 #6558

Open
xdje42 opened this issue Jan 12, 2024 · 3 comments

Comments

@xdje42
Contributor

xdje42 commented Jan 12, 2024

Describe the bug

The client.attach_blocking test hangs if yama.ptrace_scope != 0.

To Reproduce

$ ctest -R client.attach_blocking
<hang>
Ctrl-Z
$ ps
    PID TTY          TIME CMD
3715888 pts/12   00:00:00 bash
3748099 pts/12   00:00:00 ctest
3748101 pts/12   00:00:00 cmake
3748103 pts/12   00:00:00 linux.infloop
3748479 pts/12   00:00:00 sleep
3748480 pts/12   00:00:00 ps
$ kill %1
$ sudo sysctl -w kernel.yama.ptrace_scope=0
$ ctest -R client.attach_blocking
<no hang>

Versions

  • What version of DynamoRIO are you using? HEAD as of 1/11/2024 (commit d28ac5b)
  • Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem? No
  • What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.) Linux, amd64 and aarch64
  • Is your application 32-bit or 64-bit? 64-bit

Additional context

Related issue: #37
Related issue: #38

@xdje42 changed the title from "TEST HANG(10.0.19732 client.attach_blocking): Test hands if yama.ptrace_scope != 0" to "TEST HANG(10.0.19732 client.attach_blocking): Test hangs if yama.ptrace_scope != 0" on Jan 12, 2024
@derekbruening
Contributor

The action item would be for the test config to read the procfs setting (/proc/sys/kernel/yama/ptrace_scope) and gate the test on it, as we do for the sudo tests but automated. Better still, drrun (or drinjectlib) could try to check privileges itself, though it may not be easy to cover every case of missing privileges.
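
A minimal sketch of what such a gate could look like, written in Python purely for illustration (the real test config is CMake, and the helper names `read_ptrace_scope` and `attach_likely_allowed` are hypothetical). It assumes the documented Yama meanings: 0 is classic ptrace permissions, 1 restricts attach to descendants, 2 requires CAP_SYS_PTRACE, and 3 disables attach entirely. Treating euid 0 as "has CAP_SYS_PTRACE" is an approximation and does not cover every case of missing privileges.

```python
import os

# Procfs path used in this thread to toggle the setting.
YAMA_PATH = "/proc/sys/kernel/yama/ptrace_scope"


def read_ptrace_scope(path=YAMA_PATH):
    """Return the Yama ptrace_scope as an int, or None if it cannot be read
    (for example, when the kernel was built without Yama)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def attach_likely_allowed(scope, euid):
    """Guess whether attaching to an unrelated same-user process can work.

    Approximation: root is assumed to hold CAP_SYS_PTRACE, which is not
    guaranteed inside containers or with dropped capabilities.
    """
    if scope is None or scope == 0:
        return True   # classic ptrace permissions apply
    if scope == 3:
        return False  # attach is disabled for everyone
    return euid == 0  # scopes 1 and 2: only a privileged caller can attach


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if attach_likely_allowed(scope, os.geteuid()):
        print("ptrace_scope=%s: attach tests can run" % scope)
    else:
        print("ptrace_scope=%s: attach tests should be skipped" % scope)
```

A test harness could run a check like this at configure time and skip the attach tests rather than letting them hang.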

@derekbruening
Contributor

Another action item is to shrink the timeout, which is excessive.

derekbruening added a commit that referenced this issue Jan 12, 2024
Now that we've enabled ptrace privileges on the a64 testing machine we
can remove the attach/detach tests from the flaky list as they no
longer hang.  The attach test passed 100x in a row for me so it
doesn't seem to be hitting flakes seen on other platforms like #6452.

Issue: #5740, #6558, #6127
Fixes #5740
derekbruening added a commit that referenced this issue Jan 12, 2024
Now that we've enabled ptrace privileges on the a64 testing machine we
can remove the attach/detach tests from the flaky list as they no longer
hang. The attach_test, attach_blocking, and detach_test each passed 200x
in a row on this machine so it doesn't seem to be hitting flakes seen on
other platforms like #6452.

Issue: #5740, #6558, #6127
Fixes #5740
derekbruening added a commit that referenced this issue Jan 12, 2024
Before, we relied on drrun -s for all test suite timeouts except for
runcmp tests where we set a CTest timeout.  This resulted in the
default 10 minute CTest timeout for all tests, which was the only
timeout for runall tests and caused long suite times on the AArch64
machine which accidentally had no ptrace privileges (#5740, #6558,

Here, we set the CTest timeout for runall in addition to runcmp, and for
all other tests with no timeout specified (which are presumably
relying on drrun -s) we set a timeout of the drrun timeout plus 30
seconds.
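
The rule for those remaining tests is simple arithmetic, sketched here in Python for illustration only (the actual logic lives in the CMake test configuration, and `ctest_timeout` is a hypothetical name). The 90-second value in the log below is consistent with a 60-second drrun timeout, although the drrun value itself is not stated in this thread.

```python
# Slack added on top of the drrun -s timeout, per the commit message.
CTEST_MARGIN_S = 30


def ctest_timeout(drrun_timeout_s, margin_s=CTEST_MARGIN_S):
    """Return the CTest TIMEOUT for a test that relies on drrun -s.

    The margin lets drrun's own timeout fire first, so CTest only
    kills the test when drrun itself is stuck.
    """
    if drrun_timeout_s <= 0:
        raise ValueError("drrun timeout must be positive")
    return drrun_timeout_s + margin_s


print(ctest_timeout(60))  # prints 90
```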

Tested on the attach test:

Before:
```
123: Test timeout computed to be: 1500
```
Now:
```
$ echo 1 | sudo tee /proc/sys/kernel/yama/ptrace_scope; /usr/bin/time ctest -V -R client.attach_test; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
...
    Start 121: code_api|client.attach_test
...
121: Test timeout computed to be: 90
1/1 Test #121: code_api|client.attach_test ......***Timeout  90.11 sec
...
The following tests FAILED:
	121 - code_api|client.attach_test (Timeout)
...
Command exited with non-zero status 8
1.13user 0.80system 1:30.14elapsed 2%CPU (0avgtext+0avgdata 13196maxresident)k
```

Fixes #6127
Issue: #6127, #6558, #5740
@derekbruening
Contributor

I am fixing the timeout issue in #6563

derekbruening added a commit that referenced this issue Jan 16, 2024
Before, we relied on drrun -s for all test suite timeouts except for
runcmp tests where we set a CTest timeout. This resulted in the default
10 minute CTest timeout for all tests, which was the only timeout for
runall tests and caused long suite times on the AArch64 machine which
accidentally had no ptrace privileges (#5740, #6558,

Here, we set the CTest timeout for runall in addition to runcmp, and for
all other tests with no timeout specified (which are presumably relying
on drrun -s) we set a timeout of the drrun timeout plus 30 seconds.

Tested on the attach test:

Before:
```
123: Test timeout computed to be: 1500
```
Now:
```
$ echo 1 | sudo tee /proc/sys/kernel/yama/ptrace_scope; /usr/bin/time ctest -V -R client.attach_test; echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
...
    Start 121: code_api|client.attach_test
...
121: Test timeout computed to be: 90
1/1 Test #121: code_api|client.attach_test ......***Timeout  90.11 sec
...
The following tests FAILED:
	121 - code_api|client.attach_test (Timeout)
...
Command exited with non-zero status 8
1.13user 0.80system 1:30.14elapsed 2%CPU (0avgtext+0avgdata 13196maxresident)k
```

The property being set on more tests was confirmed on debug x86-64:
Before:
```
$ grep -c TIMEOUT suite/tests/CTestTestfile.cmake
94
```
After:
```
$ grep -c TIMEOUT suite/tests/CTestTestfile.cmake
463
```
There seem to still be a few missing the property: the ones that don't
go through suite/. There were other efforts to avoid hangs on those such
as PR #6137.

Issue: #6127, #6558, #5740