Early WIP: rr/sampling profiler hybrid #1754

Open
wants to merge 4 commits into
base: master

Conversation

Keno
Member

@Keno Keno commented Jul 13, 2016

This is an idea I've been kicking around. When doing sampling profiling, you really want to minimize the amount of work you do while sampling, so that sampling a) is fast and b) does not disrupt the program too much. Unfortunately, this is of course in direct conflict with actually collecting anything useful. What I'm proposing here is to use rr to do the actual data collection as a post-processing step. This is done by sampling the current ip and the value of the retired branch counter, which hopefully allows us to find this position again during replay and do whatever we want there (backtrace, collect values, fancier things, etc.).
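As a rough illustration of the lookup step (not code from this patch), the sketch below models a sample as an (ip, retired branch count) pair and searches a replayed sequence of execution points for it. `Sample`, `ReplayStep`, and `find_sample` are hypothetical names invented for this example, and the replay is modelled as a plain in-memory vector rather than anything rr actually exposes. The pair is not guaranteed to identify a unique point in general, so the sketch simply returns the first match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record written at sampling time: deliberately tiny, so the
// sampling path stays cheap.
struct Sample {
  uint64_t ticks;  // retired branch counter value when the sample fired
  uint64_t ip;     // instruction pointer when the sample fired
};

// Hypothetical view of one execution point seen during replay.
struct ReplayStep {
  uint64_t ticks;
  uint64_t ip;
};

// Returns an iterator to the first replay step whose (ticks, ip) matches the
// sample, or replay.end() if there is none. Assumes ticks never decrease.
std::vector<ReplayStep>::const_iterator find_sample(
    const std::vector<ReplayStep>& replay, const Sample& s) {
  for (std::size_t i = 0; i < replay.size(); ++i) {
    // The counter only counts upwards, so steps must be ordered by ticks.
    assert(i == 0 || replay[i - 1].ticks <= replay[i].ticks);
    if (replay[i].ticks > s.ticks) {
      break;  // already past the sampled counter value: no match possible
    }
    if (replay[i].ticks == s.ticks && replay[i].ip == s.ip) {
      return replay.begin() + static_cast<std::ptrdiff_t>(i);
    }
  }
  return replay.end();
}
```

Once the matching step is found, the expensive work (backtrace, variable capture) would happen there, during replay, rather than in the sampling path.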

This is nowhere near done, but I figured people may have early feedback.

@bgirard
Contributor

bgirard commented Jul 13, 2016

Looks very interesting! Thanks for exploring this. You forgot to mention an important benefit from this approach which is having the option to 'Jump to the debugger' when looking at a profile to analyze what causes a slow path.

It looks like in the patch you're 'sampling' every 4k CPU cycles (for the current process?). AIUI this would effectively be a 'user/process' CPU time trigger rather than a wall clock trigger?

@rocallahan
Collaborator

Seems to me you could send a signal to the tracee, like the perf-event signal now, that interrupts the tracee and is treated like any other async signal by rr so you can easily replay to delivery of that signal using the existing logic.

@Keno
Member Author

Keno commented Jul 14, 2016

> Looks very interesting! Thanks for exploring this. You forgot to mention an important benefit from this approach which is having the option to 'Jump to the debugger' when looking at a profile to analyze what causes a slow path.

Yes, that's a little tricky of course with sampling. However, I do consider this essentially the same problem. In theory your sampling profiler could just walk the stack, record all variables, etc, but in practice nobody does because it would make sampling impractical.

> It looks like in the patch you're 'sampling' every 4k CPU cycles (for the current process?). AIUI this would effectively be a 'user/process' CPU time trigger rather than a wall clock trigger?

Not quite: by setting the freq field, I'm asking for samples at 4 kHz, i.e. wall clock time.
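For readers unfamiliar with the distinction, here is a minimal sketch of the two ways a `perf_event_attr` can be set up for sampling. It is not taken from the patch: the helper names `make_freq_attr` and `make_period_attr` are hypothetical, and the choice of the CPU cycles event and of `PERF_SAMPLE_IP` is an assumption for illustration. With `freq` set, the kernel reads `sample_freq` as a target rate in samples per second and adjusts the period itself; with `freq` clear, `sample_period` is a fixed number of events between samples.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <linux/perf_event.h>

// Common setup shared by both variants. The event choice (CPU cycles) is an
// assumption made for this example only.
static perf_event_attr make_base_attr() {
  perf_event_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.size = sizeof(attr);
  attr.type = PERF_TYPE_HARDWARE;
  attr.config = PERF_COUNT_HW_CPU_CYCLES;
  attr.sample_type = PERF_SAMPLE_IP;  // record the ip with each sample
  return attr;
}

// Frequency mode: ask for roughly `hz` samples per second.
perf_event_attr make_freq_attr(uint64_t hz) {
  assert(hz > 0);
  perf_event_attr attr = make_base_attr();
  attr.freq = 1;          // interpret the union below as sample_freq
  attr.sample_freq = hz;  // e.g. 4000 for 4 kHz
  return attr;
}

// Period mode: take one sample every `events` occurrences of the event.
perf_event_attr make_period_attr(uint64_t events) {
  assert(events > 0);
  perf_event_attr attr = make_base_attr();
  attr.freq = 0;                // interpret the union below as sample_period
  attr.sample_period = events;  // e.g. 4000 for "every 4k cycles"
  return attr;
}
```

Either struct would then be passed to the `perf_event_open` system call. The two fields share storage, so the `freq` bit alone decides whether 4000 means "4 kHz" or "every 4000 events".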

@Keno
Member Author

Keno commented Jul 14, 2016

> Seems to me you could send a signal to the tracee, like the perf-event signal now, that interrupts the tracee and is treated like any other async signal by rr so you can easily replay to delivery of that signal using the existing logic.

I was hoping to avoid the overhead of the extra context switch.

@GitMensch
Contributor

GitMensch commented Dec 29, 2021

@Keno Would you mind rebasing the changes on current master?
Are the changes to PerfCounters.cc "intrusive" and/or is the new command not usable? If yes, then it seems reasonable to convert this PR to a draft; otherwise it could perhaps go in as an experimental feature instead of lying around for another 5 years...
