tool.drcacheoff.rseq failing with glibc 2.35 #5734

Open
abhinav92003 opened this issue Nov 13, 2022 · 5 comments

@abhinav92003
Contributor

The tool.drcacheoff.rseq test fails with an invariant error on an A64 VM with glibc 2.35 (with glibc's rseq disabled due to i#5431). The same failure occurs with the #5711 fix applied, both with and without glibc's rseq.

238:   Trace invariant failure in T-1 at ref # 0: Serial schedule entry count does
238:   not match trace

The test works fine on the A64 Jenkins machine (which has an older glibc).

@abhinav92003
Contributor Author

The same happens with -disable_rseq.

abhinav92003 added a commit that referenced this issue Dec 16, 2022
Adds -max_trace_size to api.rseq to avoid generating too much trace data, which
may sometimes cause us to run out of disk space. The issue of varying run times
(and therefore varying amounts of trace data) no longer reproduces on the Jenkins
machine, but with this change the trace size comes down from 130M+ to 512K, which
is better.

Fixes an issue in the invariant checker that affects error reporting when there's
no shard stream available. This came up in the tool.drcacheoff.rseq test.

Issue: #5733, #5734
abhinav92003 added a commit that referenced this issue Dec 19, 2022
Adds -max_trace_size to api.rseq to avoid generating too much trace data,
which may cause us to run out of disk space. The issue of varying run times
(and therefore varying amounts of trace data) no longer reproduces on the
Jenkins machine, but with this change the trace size comes down from 130M+
to 512K, which is better.

Fixes an issue in the invariant checker that causes a SIGSEGV crash during
error reporting when there's no shard stream available. This came up in the
tool.drcacheoff.rseq test during the check_schedule_data checks that
happen at the very end in print_results of the invariant checker, where
we do not have the shard stream anymore. The invariant error itself is yet
to be debugged (i#5734). For the missing shard stream, we now use a
default stream with manually set ref ordinal.

Issue: #5733, #5734
dolanzhao pushed a commit that referenced this issue Jan 30, 2023
joshua-warburton added a commit that referenced this issue Feb 24, 2023
The rseq test fails on various machines and in different
contexts, including during postcommit on the test runner,
but not on precommit. Ignoring this test by default due to
the flakiness.

issue: #5734
Change-Id: Ifada989df7e27a5bf638062c3cc7f1360badc5df
@derekbruening
Contributor

For the Jenkins disabling, the failures are different and do not come from glibc 2.35:

#5885 (comment)

No, it's still on 2.31 and the problem only occurs during postcommit testing, which is very odd.

abhinav92003 added a commit that referenced this issue Apr 4, 2023
Glibc 2.35 made some changes to the __rseq_offset. We have logic that uses a
heuristic to search for the struct rseq offset. But we also assert on the
expected known rseq offset of glibc in the linux.rseq test; this is to
proactively detect glibc changes in rseq handling.

This fixes all rseq tests on glibc 2.35 except tool.drcacheoff.rseq which is
covered by #5734.

Fixes: #5955
@abhinav92003 changed the title from "tool.drcacheoff.rseq failing on A64 VM with glibc 2.35" to "tool.drcacheoff.rseq failing with glibc 2.35" on Apr 4, 2023
@abhinav92003
Contributor Author

Saw this on x86 too with glibc 2.36.

@derekbruening
Contributor

> Saw this on x86 too with glibc 2.36.

tool.drcacheoff.rseq passes with glibc 2.36 on x86-64 for me (if I work around the up-front assert failure #5955 with the env var). Are you seeing this only once every N runs?

@abhinav92003
Contributor Author

This issue is about making the test work with glibc's rseq support enabled (that is, export GLIBC_TUNABLES=glibc.pthread.rseq=1). Since that is the default setting on newer glibc versions, I didn't explicitly mention it.

I tried running with glibc's rseq disabled, and it still fails consistently. I wonder if there's some other environment difference between your machine and mine.

abhinav92003 added a commit that referenced this issue Apr 5, 2023
Adds struct rseq offset for glibc 2.36, while preserving the existing ones for
older glibc since our CI workflows are still on the older versions.

Glibc 2.36 made some changes to the __rseq_offset. We have logic that uses a
heuristic to search for the struct rseq offset. But we also assert on the
expected known glibc rseq offset in the linux.rseq test; this is to
proactively detect glibc changes in rseq handling.

This fixes all rseq tests on glibc 2.36 except tool.drcacheoff.rseq which is
covered by #5734.

Fixes: #5955