i#5431: Support glibc's rseq support #5711

Merged
merged 42 commits into from
Nov 18, 2022

Conversation

abhinav92003
Contributor

@abhinav92003 abhinav92003 commented Oct 28, 2022

Fixes issues with DR's rseq handling in glibc 2.35+.

Glibc 2.35 added support for the Linux rseq feature; see
https://lwn.net/Articles/883104/ for details. TL;DR: glibc registers
its own struct rseq at init time and stores its offset from the
thread pointer in __rseq_offset. The glibc-registered struct rseq
lives inside struct pthread. If glibc's rseq support isn't
available, either because something went wrong or because the user
disabled it by exporting GLIBC_TUNABLES=glibc.pthread.rseq=0, glibc
sets __rseq_size to zero.
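
As an illustration of the layout described above (not DR's actual code), here is a minimal sketch of how the glibc-registered struct rseq could be located from the thread pointer. The helper name glibc_rseq_addr is hypothetical; its parameters stand in for the thread pointer and for the values of glibc's __rseq_offset and __rseq_size, passed in explicitly so that the sketch does not require glibc 2.35 to build.

```c
#include <stddef.h>

/* Hypothetical helper: returns the address of the glibc-registered
 * struct rseq, or NULL when glibc's rseq support is unavailable.
 * rseq_offset and rseq_size are assumed to hold the values of glibc's
 * __rseq_offset and __rseq_size.
 */
static void *
glibc_rseq_addr(void *thread_pointer, ptrdiff_t rseq_offset, unsigned int rseq_size)
{
    /* glibc sets __rseq_size to zero when it did not register a struct rseq. */
    if (rseq_size == 0)
        return NULL;
    /* The offset is relative to the thread pointer and may be negative. */
    return (char *)thread_pointer + rseq_offset;
}
```

On glibc 2.35+, a caller would presumably pass __builtin_thread_pointer() together with __rseq_offset and __rseq_size from <sys/rseq.h>.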

Improves the heuristic that finds the registered struct rseq. In the
glibc-support case, the struct is at a negative offset from the app
lib seg base on AArch64 and at a positive offset on x86. On both
AArch64 and x86, this sign is the opposite of what it would be
if the app had registered the struct rseq manually in its static TLS
(which happens with older glibc and when glibc's rseq support
is disabled).

Detects whether the glibc rseq support is enabled by looking at
the sign of the struct rseq offset.
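
The sign check described above can be sketched roughly as follows. This is an illustrative reading of the description, not the code in core/unix/rseq_linux.c; arch_t and rseq_offset_is_glibc are hypothetical names, and treating a zero offset as "not glibc" is an assumption made only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical architecture tag, used only for this sketch. */
typedef enum { ARCH_X86, ARCH_AARCH64 } arch_t;

/* Returns true when the sign of the struct rseq offset (relative to the
 * app lib seg base) matches the glibc-registered case:
 * negative on AArch64, positive on x86. The opposite sign corresponds to
 * a struct rseq that the app registered itself in its static TLS.
 * Assumption: an offset of zero is treated as not glibc-registered.
 */
static bool
rseq_offset_is_glibc(arch_t arch, ptrdiff_t offset)
{
    if (arch == ARCH_AARCH64)
        return offset < 0;
    return offset > 0;
}
```

For example, a positive offset on x86 would be classified as glibc-registered, while the same offset on AArch64 would be classified as app-registered.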

Removes the drrun -disable_rseq workaround added by #5695.

Adjusts the linux.rseq test to get the struct rseq registered by
glibc, when it's available. Also fixes some issues in the test.

Adds the Ubuntu_22 tag to rseq tests so that they are enabled.

Our Ubuntu-20 CI tests the case without rseq support in glibc,
where the app registers the struct rseq. This also helps test the
case where the app is not using glibc.

Our Ubuntu-22 CI tests the case with glibc rseq support.
I manually tested the case with glibc's rseq support disabled on
glibc 2.35, but am not adding a CI configuration for it.

Fixes #5431

@abhinav92003 abhinav92003 marked this pull request as ready for review November 2, 2022 02:39
@abhinav92003
Contributor Author

On AArch64 with glibc 2.35 (not in our CI), the tool.drcacheoff.rseq test is failing with

238:   Trace invariant failure in T-1 at ref # 0: Serial schedule entry count does
238:   not match trace

But I see the same failure without this change and glibc's rseq disabled. So it's probably unrelated.

@derekbruening
Contributor

> On AArch64 with glibc 2.35 (not in our CI), the tool.drcacheoff.rseq test is failing with
>
> 238:   Trace invariant failure in T-1 at ref # 0: Serial schedule entry count does
> 238:   not match trace
>
> But I see the same failure without this change and glibc's rseq disabled. So it's probably unrelated.

Please file an issue on this: never seen this before.

@abhinav92003
Contributor Author

I've created #5734 and #5733 for two test issues that happen even without this PR and seem unrelated to i#5431.

#5733 happens on the Jenkins A64 machine with the older glibc as well as on an A64 VM with the newer glibc.
#5734 happens with the newer glibc A64 VM only.

@abhinav92003
Contributor Author

run arm tests

@abhinav92003
Contributor Author

attach_blocking and attach_test are timing out on A64.

@abhinav92003
Contributor Author

run arm tests

@derekbruening
Contributor

> attach_blocking and attach_test are timing out on A64.

Please cite an existing issue or file a new one, as with any test failure.

@abhinav92003
Contributor Author

> Please cite an existing issue or file a new one, as with any test failure.

Wasn't sure before if it's my PR that's causing it. But I ran it without these changes on the Jenkins machine and they still time out. Maybe it's related to the Jenkins machine update.

@abhinav92003
Contributor Author

> Please cite an existing issue or file a new one, as with any test failure.
>
> Wasn't sure before if it's my PR that's causing it. But I ran it without these changes on the Jenkins machine and they still time out. Maybe it's related to the Jenkins machine update.

#5740.

@abhinav92003
Contributor Author

Run arm tests

@abhinav92003 abhinav92003 merged commit 5ef7b9e into master Nov 18, 2022
@abhinav92003 abhinav92003 deleted the i5431-glibc-rseq branch November 18, 2022 05:04
Successfully merging this pull request may close these issues.

CRASH from new glibc 2.35 rseq on any app (-disable_rseq solves)
2 participants