Replay failed with "Expected syscall_bp_vm to be clear" #3285

Keno · 2022-06-23T19:22:14Z

In trace julia-9 from https://buildkite.com/organizations/julialang/pipelines/julia-master/builds/13170/jobs/01819131-f9a7-48e4-926f-23e48419c663/artifacts/018191c4-b7b8-461b-97c8-6f6f44bfc49f, we get the following when attempting to replay:

[FATAL src/ReplaySession.cc:632:enter_syscall()]
 (task 14658 (rec:1978) at time 2968298)
 -> Assertion `false' failed to hold. Expected syscall_bp_vm to be clear but it's 1978's address space with a breakpoint at 0x7ff02dca8744 while we're at 0x70000002
Tail of trace dump:

[snip]

  real_time:52858533.422724 global_time:2968294, event:`SYSCALL: epoll_wait' (state:EXITING_SYSCALL) tid:579, ticks:415456293513
rax:0x1 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x400 rsi:0x7f50b86166c0 rdi:0x5 rbp:0x0 rsp:0x681ffde0 r8:0x0 r9:0x7f50ccc3fca0 r10:0xbb9 r11:0x246 r12:0xbb9 r13:0x7f50d858006e r14:0x7f50b86166c0 r15:0x7f50d858306e rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7f50df703180 gs_base:0x0
  { tid:579, addr:0x7f50b86166c0, length:0x3000 }
}
{
  real_time:52858533.423096 global_time:2968295, event:`SYSCALLBUF_FLUSH' tid:579, ticks:415456303524
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'read', ret:0xfff, size:0x100f, desched:1 }
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'epoll_wait', ret:0x1, size:0x1c }
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'read', ret:0x6e4, size:0x6f4, desched:1 }
  { syscall:'clock_gettime', ret:0x0, size:0x20 }
  { syscall:'epoll_wait', ret:0x0, size:0x10 }
}
{
  real_time:52858533.423110 global_time:2968296, event:`SYSCALL: epoll_wait' (state:ENTERING_SYSCALL) tid:579, ticks:415456303524
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x400 rsi:0x7f50bcb6de60 rdi:0x5 rbp:0x0 rsp:0x681ffde0 r8:0x0 r9:0x7f50ccc3fca0 r10:0xa36 r11:0x246 r12:0xa36 r13:0x7f50d85817f6 r14:0x7f50bcb6de60 r15:0x7f50d85847f6 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe8 fs_base:0x7f50df703180 gs_base:0x0
}
{
  real_time:52858533.423128 global_time:2968297, event:`SYSCALLBUF_RESET' tid:579, ticks:415456303524
}
{
  real_time:52858533.423177 global_time:2968298, event:`SYSCALL: exit_group' (state:ENTERING_SYSCALL) tid:1978, ticks:372899180
rax:0xffffffffffffffda rbx:0x7ff02dd98760 rcx:0xffffffffffffffff rdx:0x8f rsi:0x3c rdi:0x8f rbp:0x8f rsp:0x7ff02157ef68 r8:0xe7 r9:0xffffffffffffff78 r10:0xfffffffffffff81f r11:0x246 r12:0x7ff02dd98760 r13:0x63d r14:0x7ff02dda1428 r15:0x0 rip:0x7ff02dca8746 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe7 fs_base:0x7ff02dbbf180 gs_base:0x0
}
{
  real_time:52858533.423188 global_time:2968299, event:`EXIT' tid:1978, ticks:372899180
}

Will look into this further.

khuey · 2022-06-23T19:34:09Z

This is usually a rarer way for a divergence to manifest (i.e. we missed the expected syscall breakpoint and ended up somewhere else).
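The addresses in the two reports in this thread fit that reading. In each one, the breakpoint address in the assertion is exactly two bytes below the `rip` recorded for the syscall-entry event at the failing time, which is the length of the x86-64 `syscall` instruction, while the task is actually stopped at a different address. A small Python check of that arithmetic (the helper name is made up for illustration; the hex values are copied from the dumps):

```python
SYSCALL_INSN_LEN = 2  # the x86-64 `syscall` instruction encodes as the two bytes 0f 05


def breakpoint_is_at_syscall_insn(breakpoint_addr: int, recorded_entry_rip: int) -> bool:
    """True if the breakpoint sits on the syscall instruction whose entry was recorded.

    At syscall entry the recorded rip already points past the instruction,
    so the instruction itself starts SYSCALL_INSN_LEN bytes earlier.
    """
    return recorded_entry_rip - SYSCALL_INSN_LEN == breakpoint_addr


# (breakpoint address, rip of the recorded syscall-entry event, address replay was at)
reports = [
    (0x7FF02DCA8744, 0x7FF02DCA8746, 0x70000002),      # exit_group, time 2968298
    (0x7F967E923059, 0x7F967E92305B, 0x7F967F069122),  # brk, time 33968
]

for bp, entry_rip, stopped_at in reports:
    print(hex(bp), breakpoint_is_at_syscall_insn(bp, entry_rip), stopped_at == bp)
```

So in both cases replay armed a breakpoint on the next recorded syscall instruction but entered a syscall from some other address first.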

Keno · 2022-06-23T22:51:39Z

Ok, I believe what happens here is that the tracee rewrites the ip in sigreturn to hijack execution, write out the error messages and then exit. This is safe to do in the context of the original application because the syscalls after the hijack are carefully controlled, but unfortunately for us, I believe this causes us to jump out of the syscallbuf, causing corrupted syscallbuf state the next time we enter the syscall.

Let me see if I can come up with a reproducer.

Keno · 2022-06-25T02:22:29Z

This one is tricky. Because of the possibility of switching away in the signal handler, I don't think we can really support taking signals in the syscallbuf at all, because that'll leave syscallbuf state undefined. Our current mechanism for deferring syscalls is also insufficient, because it doesn't handle syscall restarts properly. I think we may need to significantly rejigger the way we handle this to always force the tracee out of the syscallbuf before delivering a signal or restarting a desched syscall.
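As a rough illustration of both the failure and the proposed invariant, here is a toy Python model. It is not rr code: `ToyTracee`, `in_syscallbuf` and `unwind_before_signal` are invented names, and a single boolean stands in for all of the syscallbuf's internal state. A handler returning `False` models a tracee that rewrites the ip in its signal frame and never comes back to the interrupted syscall.

```python
class ToyTracee:
    """Deliberately simplified model of a tracee; not rr's real data structures."""

    def __init__(self, unwind_before_signal: bool):
        self.unwind_before_signal = unwind_before_signal
        self.in_syscallbuf = False  # stand-in for all syscallbuf state
        self.log = []

    def buffered_syscall(self, name, signal_handler=None):
        if self.in_syscallbuf:
            raise RuntimeError(f"{name}: syscallbuf entered with stale state")
        self.in_syscallbuf = True
        if signal_handler is not None:
            if self.unwind_before_signal:
                # Proposed invariant: leave the buffer before the handler runs.
                self.in_syscallbuf = False
            if not signal_handler():
                # Handler redirected execution; the interrupted call never resumes.
                return "hijacked"
        self.log.append(name)
        self.in_syscallbuf = False
        return "done"


def hijacking_handler() -> bool:
    return False  # models "sigreturn to somewhere else"


for unwind in (False, True):
    tracee = ToyTracee(unwind_before_signal=unwind)
    tracee.buffered_syscall("read", signal_handler=hijacking_handler)
    try:
        outcome = tracee.buffered_syscall("write")
    except RuntimeError as err:
        outcome = f"error: {err}"
    print(f"unwind_before_signal={unwind}: {outcome}")
```

Without the unwind step the second call trips over state left behind by the first; with it, the handler can go wherever it likes and the next syscall still starts clean.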

rocallahan · 2022-06-25T20:14:59Z

Yeah, for a long time I've considered that it might be good to handle signal interruption of buffered syscalls by unwinding the syscall buffering so signal handlers see the "correct" stack.

One issue with that, though, is that I think we would need to unpatch the syscall as well so the IP is correct and the syscall restarts normally. And that sounds expensive if we have to constantly unpatch and repatch syscalls.

rocallahan · 2022-06-25T20:17:26Z

Maybe we could get away with a set of special stubs that do "syscall; jmp back to application code" which we set the IP to after we've unwound the syscallbuf stack. IP wouldn't be in an application binary mapping, which could confuse some signal handlers, but registers and stack would at least be correct.

Keno · 2022-07-01T23:01:57Z

I've thought about this a fair bit and I think what I want to do here is to put an extra syscall instruction and jump back to the application hook into the extended jump patch. One core issue here is that we somewhere need to store the address to jump back to. I briefly considered using some of the padding inside the sigframe to store the value of stub_scratch_1 and restore it appropriately on sigreturn. I think that would also work, but I think it's cleaner to just use a bit of extra space in the jump patch and less likely to conflict with other applications that may do weird things with the sigframe.

rocallahan · 2022-07-01T23:29:19Z

That's pretty much what I had in mind.

Keno · 2022-07-05T00:34:55Z

So, I've got this basically working, but of course stack unwinding out of the bail path is a problem, because that's now in the extended jump patch rather than in the syscallbuf code for which we had previously carefully crafted the unwind info. The best thing I could come up with to fix that is to have librrpage create a bunch of empty pages that the return stubs go into, give them appropriate unwind info and use that to return. I thought about dynamic registration, but the problem is that all those unwinders have function based apis that are generally not re-entrant or signal safe and we have no guarantee about when we add an extended jump patch. A similar problem applies to dynamically loading a .so that has template space. As a result of this we'd only have a finite number of syscall locations that could have jump stubs with proper unwind info. That said, I think as long as we set that number high enough, we're probably unlikely to run into any problem, so it's more of a theoretical concern, but I'd like to hear your thoughts.
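To make the finite-capacity trade-off concrete, here is a minimal Python sketch of such a pool. Everything in it is hypothetical (the `StubSlotPool` name, the addresses, the slot size, and the choice to return `None` when the pool is exhausted); it only shows that each distinct syscall site consumes one pre-reserved slot and that sites beyond the reserved count get nothing.

```python
from typing import Dict, Optional


class StubSlotPool:
    """Fixed number of pre-reserved return-stub slots (hypothetical layout)."""

    def __init__(self, base: int, slot_size: int, capacity: int):
        self.base = base            # start of the reserved pages
        self.slot_size = slot_size  # bytes reserved per return stub
        self.capacity = capacity    # fixed when the pages are created
        self.next_free = 0
        self.by_site: Dict[int, int] = {}

    def slot_for(self, syscall_site: int) -> Optional[int]:
        """Return the stub address for a syscall site, or None if the pool is full."""
        if syscall_site in self.by_site:
            return self.by_site[syscall_site]
        if self.next_free == self.capacity:
            return None  # no slot with pre-built unwind info is left
        addr = self.base + self.next_free * self.slot_size
        self.next_free += 1
        self.by_site[syscall_site] = addr
        return addr


pool = StubSlotPool(base=0x70010000, slot_size=16, capacity=2)
for site in (0x401000, 0x402000, 0x401000, 0x403000):
    slot = pool.slot_for(site)
    print(hex(site), "->", hex(slot) if slot is not None else "pool exhausted")
```

Nothing is registered with an unwinder at patch time in this scheme, which is the point: the slots and their unwind info exist up front, and the only runtime step is handing out the next index.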

Keno · 2022-07-05T00:36:26Z

I guess I should say that there is an alternative where rr writes out its own unwind table format and we teach the various unwind libraries how to read that with a one-time registration, but of course that would force the application developers to upgrade or switch unwind libraries, which is probably a no-go.

yuyichao · 2022-07-05T00:39:43Z

Does the bail path have to be in the stub? I made sure the clone fallback path on aarch64 is in librrpreload for better unwind info....
OTOH, aarch64 also has a fallback path that makes a syscall in the stub already but that should only be used for invalid syscalls...

Keno · 2022-07-05T00:49:48Z

It doesn't if we can have a place to store the jump-back address that gets properly saved/restored across sigreturns. As I mentioned above, I considered using the signal frame padding for this purpose, but I'm not 100% comfortable with it, because we don't know what external code might do with it. We could potentially play games with r11, since that gets overwritten on syscall entry anyway, but that might be a bit too magic.

yuyichao · 2022-07-05T00:55:32Z

So a fixed address in thread local storage won't work because syscall in the signal handler might mess with it?

Keno · 2022-07-05T00:57:29Z

yes, the sigreturn may switch away and abandon the stack, never returning to the interrupted syscall.

Keno · 2022-07-05T01:00:11Z

That said, perhaps the best option is just to unpatch the syscall if we see a signal actually being delivered in the bail path, and fudge things out from under the signal handler to make it seem that the signal was delivered in the ordinary program. We could re-patch on the next execution as long as we add a special case to the sigreturn code to unpatch again if there's a sigreturn into a patched region.

Patching/unpatching is reasonably expensive, but it's not all that much more expensive than setting/unsetting breakpoints, which we already do in this situation, so perhaps it's not that bad.

yuyichao · 2022-07-05T01:01:47Z

Would RR know if the sig return finished without returning to where it started? If so then it seems that just using alt stack (or another stack accessed from thread local storage) would work and the tracer would need to manually reset it if an unusual return happens.

Keno · 2022-07-05T01:03:53Z

> Would RR know if the sig return finished without returning to where it started?

There isn't really anything that prevents the tracee from just capturing the state arbitrarily and jumping to it later at arbitrary points (and real applications rely on this). From the kernel's perspective, getting a signal is a setjmp and sigreturn is a longjmp with some slightly fancy edge cases, but we can't really rely on any sort of pairing.

yuyichao · 2022-07-05T01:04:14Z

I was mainly thinking that I'm pretty sure in an earlier version of the julia signal handler (may or may not be in the merged version) I was directly using assembly code to jump to a different target (also I think the SIGFPE handler still does that).

yuyichao · 2022-07-05T01:07:43Z

> but we can't really rely on any sort of pairing

yeah, so I was mainly wondering if we could notice on the next sigreturn that the previous one didn't run. But I guess there's nothing illegal about the application just spending hours in a signal handler before doing a normal sigreturn, and short of reliably unwinding the stack we wouldn't know if that has happened.

I was also thinking the value of the stack pointer could be an indication that the tracee has abandoned the signal frame but if the application does any sort of tricks with the stack then that won't work....

Keno · 2022-07-05T01:20:27Z

There's probably heuristics that could work reasonably well, but 99% probably isn't good enough here. So far I'm liking the patching and unpatching the best. There's some annoying edge cases, but it gives us the nice property that application signal handlers will never see any ip outside the application code. I think it'd be too expensive for every desched, but as long as it's only on delivered signals, I think we're probably ok.

Keno · 2022-07-05T01:22:59Z

Actually, maybe we don't even have to unpatch at all. What if we just fudge the ip in the signal frame to point into the original location where the syscall instruction would have been and fix that back up on sigreturns? The signal handler would be able to unwind correctly, because the CFI still matches what was there before. Sure, if it actually went and read instruction memory, it'd get confused, but that's the case with my unpatching scheme also.

rocallahan · 2022-07-07T21:38:12Z

> fix that back up on sigreturns.

Do you mean unpatch on sigreturns, or adjust the IP on sigreturns? I think the former would be safer, right?

Keno · 2022-07-07T21:58:32Z

I mean adjust the IP to point into the tail of the extended jump patch. I've got this working now and it seems to be working well. Just fighting with GDB a bit to give somewhat reasonable unwind info. Will have a PR up soon.

rocallahan · 2022-07-07T22:10:50Z

OK. I wonder how you handle the trailing instructions of the syscall hooks.

Keno · 2022-07-07T22:13:15Z

> OK. I wonder how you handle the trailing instructions of the syscall hooks.

Just copy them into the extended jump patch, so the patch looks like

<setup instructions>
callq syscall_hook
# Bail path returns here with stack already switched back
syscall
<trailing instructions>
# Normal path could return here (but in practice just jumps right through to make GDB happier)
jmpq *return_addr

This is a major redesign of the syscallbuf code with the goal of establishing the invariant that we never switch away from a tracee while it's in the syscallbuf code. Instead, we unwind the syscallbuf code completely and execute the syscall at a special syscall instruction now placed in the extended jump patch.

The primary motivation for this is that it fixes #3285, but I think the change is overall very beneficial. We have significant complexity in the recorder to deal with the possibility of interrupting the tracee during the syscallbuf. This commit does not yet remove this complexity (the change is already very big), but that should be easy to do as a follow up. Additionally, we used to be unable to perform syscall buffering for syscalls performed inside a signal handler that interrupted a syscall. This had performance implications on use cases like stack walkers, which often perform multiple memory-probing system calls for every frame to deal with the possibility of invalid unwind info.

There are many details here, but here's a high level overview. The layout of the new extended jump patch is:

```
<stack setup>
call <syscallbuf_hook>
// Bail path returns here
<stack restore>
syscall
<code from the original patch site>
// Non-bail path returns here.
jmp return_addr
```

One detail worth mentioning is what happens if a signal gets delivered once the tracee is out of the syscallbuf, but still in the extended jump patch (i.e. after the stack restore). In this case, rr will rewrite the ip of the signal frame to point to the equivalent ip in the original, now patched code section. Of course the instructions in question are no longer there, but the CFI will nevertheless be generally accurate for the current register state (excluding weird CFI that explicitly references the ip of course). This allows unwinders in the end-user-application to never have to unwind through any frame in the rr syscallbuf, which seems like a desirable property. Of course, `sigreturn` must perform the opposite transformation to avoid actually returning into a patched-out location.

The main drawback of this scheme is that while the application will never see a location without CFI, GDB does still lack unwind information in the extended jump stub. This is not a new problem, but syscall events are now in the extended jump stub, so they come up quite frequently. I don't think this is a huge problem - it's basically the same situation we used to have before the vdso changes. I believe the best way to fix this would be to establish some way of having rr inform gdb of its jump patches (in fact gdb already has this kind of mechanism for tracepoints, it's just not exposed for tracepoints initiated by the gdb server), but I don't intend to do this myself anytime in the near future. That said, I should note that doing this would not require any changes on the record side, so could be done anytime and start working retroactively for already recorded traces.
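The two ip transformations can be sketched as plain offset arithmetic. This is a minimal Python illustration under a simplifying assumption that is not taken from the commit: the bytes copied into the extended jump patch form one contiguous, byte-for-byte copy of the original patch site, so the same offset identifies the same instruction in both places. `PatchRegion`, its fields and the example addresses are all invented for this sketch and are not rr's data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatchRegion:
    """Hypothetical record of one patched syscall site."""

    site_start: int  # address of the original (now overwritten) instructions
    stub_start: int  # address of their copy inside the extended jump patch
    length: int      # number of bytes copied

    def to_application_ip(self, ip: int) -> Optional[int]:
        """On signal delivery: map an ip in the stub copy to the original site."""
        offset = ip - self.stub_start
        if 0 <= offset < self.length:
            return self.site_start + offset
        return None  # ip is not inside the copied region; leave it alone

    def to_stub_ip(self, ip: int) -> Optional[int]:
        """On sigreturn: map an ip in the patched site back into the stub copy."""
        offset = ip - self.site_start
        if 0 <= offset < self.length:
            return self.stub_start + offset
        return None  # ip is ordinary application code; leave it alone


region = PatchRegion(site_start=0x7F0000001000, stub_start=0x7F0000100040, length=7)
frame_ip = region.to_application_ip(0x7F0000100042)  # what the handler would see
resume_ip = region.to_stub_ip(frame_ip)              # what sigreturn would restore
print(hex(frame_ip), hex(resume_ip))
```

The round trip is what keeps an unmodified signal frame harmless: the handler only ever sees an address inside application code, and a plain `sigreturn` still resumes at the instruction that was actually interrupted.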

This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.

cebtenzzre · 2023-09-29T15:19:18Z

I built from the PR (after cherry-picking some commits for gcc 13 compatibility), and still get this crash:

[FATAL /usr/src/debug/rr-git/rr/src/ReplaySession.cc:635:enter_syscall()]
 (task 32428 (rec:32354) at time 33968)
 -> Assertion `false' failed to hold. Expected syscall_bp_vm to be clear but it's 32354's address space with a breakpoint at 0x7f967e923059 while we're at 0x7f967f069122
Tail of trace dump:
{
  real_time:8147.767345 global_time:33948, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:32354, ticks:319645440
rax:0xffffffffffffffda rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879efd rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879efd r13:0x0 r14:0x5617be870850 r15:0x5617be879f1a rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.767352 global_time:33949, event:`SYSCALLBUF_RESET' tid:32354, ticks:319645440
}
{
  real_time:8147.767387 global_time:33950, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:32354, ticks:319645440
rax:0xfffffffffffffffe rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879efd rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879efd r13:0x0 r14:0x5617be870850 r15:0x5617be879f1a rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.767441 global_time:33951, event:`SYSCALLBUF_FLUSH' tid:32354, ticks:319645502
  { syscall:'openat', ret:0xfffffffffffffffe, size:0x10, desched:1 }
}
{
  real_time:8147.767448 global_time:33952, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:32354, ticks:319645502
rax:0xffffffffffffffda rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879f1a rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879f1a r13:0x0 r14:0x5617be870850 r15:0x5617be879f4b rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.767454 global_time:33953, event:`SYSCALLBUF_RESET' tid:32354, ticks:319645502
}
{
  real_time:8147.767491 global_time:33954, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:32354, ticks:319645502
rax:0xfffffffffffffffe rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879f1a rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879f1a r13:0x0 r14:0x5617be870850 r15:0x5617be879f4b rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.767538 global_time:33955, event:`SYSCALLBUF_FLUSH' tid:32354, ticks:319645562
  { syscall:'openat', ret:0xfffffffffffffffe, size:0x10, desched:1 }
}
{
  real_time:8147.767545 global_time:33956, event:`SYSCALL: openat' (state:ENTERING_SYSCALL) tid:32354, ticks:319645562
rax:0xffffffffffffffda rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879f4b rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879f4b r13:0x0 r14:0x5617be870850 r15:0x5617be879f74 rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.767551 global_time:33957, event:`SYSCALLBUF_RESET' tid:32354, ticks:319645562
}
{
  real_time:8147.767587 global_time:33958, event:`SYSCALL: openat' (state:EXITING_SYSCALL) tid:32354, ticks:319645562
rax:0xfffffffffffffffe rbx:0x7f967cab128c rcx:0xffffffffffffffff rdx:0x90800 rsi:0x5617be879f4b rdi:0xffffff9c rbp:0x7ffe0e33fcc0 rsp:0x7ffe0e33fc30 r8:0x5617bd4df010 r9:0x0 r10:0x0 r11:0x246 r12:0x5617be879f4b r13:0x0 r14:0x5617be870850 r15:0x5617be879f74 rip:0x7f967f0690af eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x101 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.768371 global_time:33959, event:`SYSCALLBUF_FLUSH' tid:32354, ticks:319645858
  { syscall:'openat', ret:0x3, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0x22, size:0x32 }
}
{
  real_time:8147.768379 global_time:33960, event:`SYSCALL: fstatat' (state:ENTERING_SYSCALL) tid:32354, ticks:319645858
rax:0xffffffffffffffda rbx:0x3 rcx:0xffffffffffffffff rdx:0x7ffe0e33fbe0 rsi:0x7f967e9b8bd5 rdi:0x3 rbp:0x7ffe0e33fbe0 rsp:0x7ffe0e33fbd8 r8:0x0 r9:0x0 r10:0x1000 r11:0x246 r12:0x5617be879f74 r13:0x0 r14:0x5617be870850 r15:0x5617be879f96 rip:0x7f967e91d64e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x106 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.768386 global_time:33961, event:`SYSCALLBUF_RESET' tid:32354, ticks:319645858
}
{
  real_time:8147.768421 global_time:33962, event:`SYSCALL: fstatat' (state:EXITING_SYSCALL) tid:32354, ticks:319645858
rax:0x0 rbx:0x3 rcx:0xffffffffffffffff rdx:0x7ffe0e33fbe0 rsi:0x7f967e9b8bd5 rdi:0x3 rbp:0x7ffe0e33fbe0 rsp:0x7ffe0e33fbd8 r8:0x0 r9:0x0 r10:0x1000 r11:0x246 r12:0x5617be879f74 r13:0x0 r14:0x5617be870850 r15:0x5617be879f96 rip:0x7f967e91d64e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x106 fs_base:0x7f967efd1e80 gs_base:0x0
  { tid:32354, addr:0x7ffe0e33fbe0, length:0x90 }
}
{
  real_time:8147.768512 global_time:33963, event:`SYSCALLBUF_FLUSH' tid:32354, ticks:319647896
  { syscall:'getdents64', ret:0x68, size:0x78 }
  { syscall:'access', ret:0x0, size:0x10 }
  { syscall:'access', ret:0x0, size:0x10 }
  { syscall:'access', ret:0x0, size:0x10 }
  { syscall:'getdents64', ret:0x0, size:0x10 }
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'openat', ret:0x3, size:0x10, desched:1 }
  { syscall:'readlinkat', ret:0x42, size:0x52 }
}
{
  real_time:8147.768518 global_time:33964, event:`SYSCALL: fstatat' (state:ENTERING_SYSCALL) tid:32354, ticks:319647896
rax:0xffffffffffffffda rbx:0x5617bdacb4d0 rcx:0xffffffffffffffff rdx:0x7ffe0e340ae0 rsi:0x7f967e9b8bd5 rdi:0x3 rbp:0x7f967e9f5070 rsp:0x7ffe0e340ad8 r8:0x8 r9:0x1 r10:0x1000 r11:0x246 r12:0x7ffe0e340c80 r13:0x100 r14:0x7ffe0e340c80 r15:0x7f967efd1e80 rip:0x7f967e91d64e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x106 fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.768522 global_time:33965, event:`SYSCALLBUF_RESET' tid:32354, ticks:319647896
}
{
  real_time:8147.768545 global_time:33966, event:`SYSCALL: fstatat' (state:EXITING_SYSCALL) tid:32354, ticks:319647896
rax:0x0 rbx:0x5617bdacb4d0 rcx:0xffffffffffffffff rdx:0x7ffe0e340ae0 rsi:0x7f967e9b8bd5 rdi:0x3 rbp:0x7f967e9f5070 rsp:0x7ffe0e340ad8 r8:0x8 r9:0x1 r10:0x1000 r11:0x246 r12:0x7ffe0e340c80 r13:0x100 r14:0x7ffe0e340c80 r15:0x7f967efd1e80 rip:0x7f967e91d64e eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x106 fs_base:0x7f967efd1e80 gs_base:0x0
  { tid:32354, addr:0x7ffe0e340ae0, length:0x90 }
}
{
  real_time:8147.770177 global_time:33967, event:`SYSCALLBUF_FLUSH' tid:32354, ticks:319767963
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x1000, size:0x1010, desched:1 }
  { syscall:'read', ret:0x59a, size:0x5aa, desched:1 }
  { syscall:'read', ret:0x0, size:0x10, desched:1 }
  { syscall:'lseek', ret:0x1159a, size:0x10 }
  { syscall:'lseek', ret:0x0, size:0x10 }
  { syscall:'lseek', ret:0x0, size:0x10 }
  { syscall:'ioctl', ret:0x0, size:0x10 }
  { syscall:'read', ret:0x11000, size:0x10, desched:1 }
  { syscall:'read', ret:0x59a, size:0x5aa, desched:1 }
}
{
  real_time:8147.770183 global_time:33968, event:`SYSCALL: brk' (state:ENTERING_SYSCALL) tid:32354, ticks:319767963
rax:0xffffffffffffffda rbx:0x5617be8bb000 rcx:0xffffffffffffffff rdx:0x0 rsi:0x7f967e9f6ac0 rdi:0x5617be8bb000 rbp:0x5617be89a000 rsp:0x7ffe0e340808 r8:0x20 r9:0x1 r10:0x5617be899fe0 r11:0x246 r12:0x7f967e9fe090 r13:0x1000 r14:0x40 r15:0x21000 rip:0x7f967e92305b eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xc fs_base:0x7f967efd1e80 gs_base:0x0
}
{
  real_time:8147.770189 global_time:33969, event:`SYSCALLBUF_RESET' tid:32354, ticks:319767963
}
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x5e)[0x561de6223cce]
rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0xc9)[0x561de60b5609]
rr(+0xf2850)[0x561de60c9850]
rr(_ZN2rr21EmergencyDebugOstreamD1Ev+0x8f)[0x561de60c9c0f]
rr(_ZN2rr13ReplaySession13enter_syscallEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0x152)[0x561de618a1f2]
rr(_ZN2rr13ReplaySession18try_one_trace_stepEPNS_10ReplayTaskERKNS0_15StepConstraintsE+0xfe)[0x561de61939de]
rr(_ZN2rr13ReplaySession11replay_stepERKNS0_15StepConstraintsE+0x1c3)[0x561de6193f33]
rr(_ZN2rr14ReplayTimeline19replay_step_forwardENS_10RunCommandE+0xc1)[0x561de61b27b1]
rr(_ZN2rr9GdbServer14debug_one_stepERNS_10GdbRequestE+0xb71)[0x561de60b45f1]
rr(_ZN2rr9GdbServer12serve_replayERKNS0_15ConnectionFlagsE+0x7cb)[0x561de60b511b]
rr(+0x1ada36)[0x561de6184a36]
rr(_ZN2rr13ReplayCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x51c)[0x561de618558c]
rr(main+0x1dc)[0x561de603bd7c]
/usr/lib/libc.so.6(+0x27cd0)[0x7fddfbc45cd0]
/usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7fddfbc45d8a]
rr(_start+0x25)[0x561de603bfe5]
=== End rr backtrace
Launch gdb with
  gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:32428' /mnt/nobackup/rr/python-0/mmap_clone_5_python3.11

Keno mentioned this issue Jun 23, 2022

LibGit2/test: Print failed process PID on challenge prompt failure JuliaLang/julia#45798

Closed

Keno mentioned this issue Jun 25, 2022

rr invents spurious syscall restart #3287

Closed

Keno added a commit that referenced this issue Jul 8, 2022

Add test case for switching out of desched syscall

74ae10a

This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.

Keno linked a pull request Jul 8, 2022 that will close this issue

Redesign syscallbuf to always unwind on interruption #3322

Open

Keno added a commit that referenced this issue Jul 9, 2022

Add test case for switching out of desched syscall

8971638

This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.

Keno added a commit that referenced this issue Jul 9, 2022

Add test case for switching out of desched syscall

7eb33ab

This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.

Keno added a commit that referenced this issue Jul 17, 2022

Add test case for switching out of desched syscall

618715c

This adds a test case to model #3285, where the test case pokes the sigframe to force sigreturn to switch to a different function than that which incurred the signal.

Keno commented Jun 23, 2022

khuey commented Jun 23, 2022

Keno commented Jun 23, 2022

Keno commented Jun 25, 2022

rocallahan commented Jun 25, 2022

rocallahan commented Jun 25, 2022

Keno commented Jul 1, 2022

rocallahan commented Jul 1, 2022

Keno commented Jul 5, 2022

Keno commented Jul 5, 2022

yuyichao commented Jul 5, 2022

Keno commented Jul 5, 2022

yuyichao commented Jul 5, 2022

Keno commented Jul 5, 2022

Keno commented Jul 5, 2022

yuyichao commented Jul 5, 2022

Keno commented Jul 5, 2022

yuyichao commented Jul 5, 2022

yuyichao commented Jul 5, 2022

Keno commented Jul 5, 2022

Keno commented Jul 5, 2022 (edited)

rocallahan commented Jul 7, 2022

Keno commented Jul 7, 2022

rocallahan commented Jul 7, 2022

Keno commented Jul 7, 2022

cebtenzzre commented Sep 29, 2023
