-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Many Innovative FlameGraph using in Different Scenarios #325
Open
bboymimi
wants to merge
8
commits into
brendangregg:master
Choose a base branch
from
bboymimi:master
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Gavin Guo <[email protected]>
…ace too As the stackcollparse-tracecmd.pl is originally written for the trace-cmd output, however, the trace-cmd is from the Ftrace output. So, the patch is to improve the parsing of the format to apply on the Ftrace original output too(/sys/kerne/debug/tracing/trace). In the future, the parser can be improved in many perspectives: 1). memory: To base on the byte size/number of pages allocated/latency/order/pid/migrate type... 2). IO: to based on the latency. Signed-off-by: Gavin Guo <[email protected]>
vmlinux debug symbol When the user on the Ubuntu installs the debug sybmol, the "perf script" output would have the following format: gnome-shell 5190 [003] 26473.858920: 179427 cycles: ffffffff81aec99b _raw_spin_lock_irqsave+0xb (/usr/lib/debug/boot/vmlinux-5.4.0-28-lowlatency) ffffffffc05738e1 gen6_write32+0x51 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko) ffffffffc059317e i9xx_submit_request+0x3e (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko) ffffffffc05b9a83 submit_notify+0x63 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko) ffffffffc05784c0 __i915_sw_fence_complete+0x40 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko) The patch improves the regular expression technique to scan the vmlinux and ko on the Ubuntu platform. And the improvement can make the different color scheme work for the flamegraph.pl java palette. Signed-off-by: Gavin Guo <[email protected]>
When using Ftrace/trace-cmd to profile the memory allocation source, it will be much more useful to generate the memory allocation size flamegraph instead of the sampling count flamegraph. The steps: 1). sudo trace-cmd record -l __alloc_pages_nodemask -e kmem:mm_page_alloc -T sleep 5 2). trace-cmd report --cpu 1 > trace-cmd.log # generate the event only from CPU 1 3). FlameGraph/stackcollapse-tracecmd.pl --process_name --page_type --page_order ./trace-cmd.log | ~/os/FlameGraph/flamegraph.pl --countname kb > ./trace-cmd.svg 4). firefox ./trace-cmd.svg # or google-chrome-stable The --page_type is used to classify based on the migration/gfp_flags. The --page_order is to classify the callstack by the allcation order. The stackcollapse-tracecmd.pl supports both ftrace/trace-cmd format parsing. Signed-off-by: Gavin Guo <[email protected]>
The bt(backtrace) command is pretty useful to diagnose the hungtask/softlockup/panic issue inside the Kdump. Add the stackcollapse-crashbt.pl parser, and it can parse the output of "bt" command. The following example is parsing the output of "foreach UN bt" from the crashdump environment: ~/os/FlameGraph/stackcollapse-crashbt.pl ~/Downloads/foreach-UN-bt.log | grep -v idle | ~/os/FlameGraph/flamegraph.pl --title "Flamegraph of crashdump backtrace" --subtitle "foreach UN bt" --hash > foreach-UN-bt.svg https://people.canonical.com/~gavinguo/flamegraph/304304_foreach-UN-bt.svg https://people.canonical.com/~gavinguo/flamegraph/304304_foreach_bt.svg https://people.canonical.com/~gavinguo/flamegraph/304304_bt-a.svg Signed-off-by: Gavin Guo <[email protected]>
The kernel log may have a lot of call traces and is hard to diagnose by human reading. The parser will extract the OOM/hungtask/softlockup information. For the OOM, the order and GFP flags will be used as the classification. There are still some bugs needing to be fixed: 1). Some call traces cannot figure out the kernel version and type of call trace(oom/hungtask/softlockup). 2). The BUG() warning messages need to be handled instead of classifying into the unknown call trace. 3). The type rcu stall detection needs to be implemented. 4). The nested stack which is above the RIP line needs to be implemented. 5). The <IRQ> nested stack needs to be identified. The options can be showed by "./stackcollapse-kernel.pl -h": The important one to mention: "--all_calltrace": # Show all the call trace even there is no OOM/hungtask/soft lockup type error. It's useful when there are call traces captured inside the kern.log, and it doesn't match any types already implemented. For example: "./stackcollapse-kernel.pl --all_calltrace --no_offset ~/Downloads/306942-kern.log | ./flamegraph.pl --hash --bgcolor "#c3c7c7" > oom.svg" Signed-off-by: Gavin Guo <[email protected]>
flamegraph Latency Flamegraph is useful to identify the slow part of the function call. It's especially important in the IO operations. Ftrace with function_graph filter is good to generate the callgraph. However, it's lacking of the method to analysize easily. The Latency flamegraph can be used to visualize the maximum latency function and figure out the most slow part. sudo trace-cmd record -p function_graph -a -g __x64_sys_pread64 sleep 1 sudo trace-cmd report > pread64.log ./stackcollapse-ftracelat.pl ~/os/FlameGraph/pread64.log| ./flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "pread64" > pread64.svg $ grep -P '\:\s+[$@*#!+\s]+\s+\d+\.\d+\s+\w+\s+\|\s{2}\}' ./trace-physical-20210405-10000.log | sort -rn -k6 | head -n 10 <...>-69485 [023] 1075935.727961: funcgraph_exit: ! 330.199 us | } <...>-69485 [024] 1075918.004610: funcgraph_exit: ! 272.085 us | } <...>-69485 [016] 1075947.017290: funcgraph_exit: ! 264.160 us | } <...>-69485 [029] 1075904.191274: funcgraph_exit: ! 262.039 us | } <...>-69485 [021] 1075944.495187: funcgraph_exit: ! 260.546 us | } <...>-69485 [029] 1075904.372854: funcgraph_exit: ! 244.026 us | } <...>-69485 [029] 1075917.470951: funcgraph_exit: ! 217.430 us | } <...>-69485 [029] 1075916.952132: funcgraph_exit: ! 213.317 us | } <...>-69485 [029] 1075904.219169: funcgraph_exit: ! 210.568 us | } <...>-69485 [029] 1075911.971882: funcgraph_exit: ! 209.655 us | } $ trace-cmd report --profile -i ./trace-virtual-20210405-10000.dat | grep SyS_writev Event: func: SyS_writev() (10000) Total: 1387394948 Avg: 138739 Max: 6456840(ts:2877935.156972) Min:56775(ts:2877967.138703) Signed-off-by: Gavin Guo <[email protected]>
flamegraph and the symbol sorting table stackcollapse-tracecmd.py is to reimplemented the stackcollapse-tracecmd.pl. The Perl version has some bugs and also hard to do the statistics analysis compared to Python version. So, just reimplement the parser in Python and do the time/count statistics. $ sudo taskset -c 0 trace-cmd record -p function_graph -g proc_sys_write sysctl -w vm.nr_hugepages=2 $ trace-cmd report > test.log $ cp test.log test1.log $ python3 ./scan_symbol.py test.log test1.log -f > analysis.log $ ls -ltrh test* -rw-rw-r-- 1 gavin gavin 1.6K May 6 09:54 test.log -rw-rw-r-- 1 gavin gavin 1.6K May 7 14:29 test1.log -rw-rw-r-- 1 gavin gavin 516 May 27 15:55 test_flamegraph.log -rw-rw-r-- 1 gavin gavin 527 May 27 15:55 test1_flamegraph.log $ python3 ./scan_symbol.py test.log test1.log -f -m $ ls -ltrh merged_flamegraph.log -rw-rw-r-- 1 gavin gavin 1.1K May 27 15:56 merged_flamegraph.log $ cat merged_flamegraph.log | ~/os/FlameGraph/flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "test_merge" > test_merge.svg Signed-off-by: Gavin Guo <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I have implemented various Flamegraph applications that can fit in
different situation:
[stackcollapse] Add the stackcollapse-kernel.pl to parse dmesg/kern.log
bboymimi@bbd2a30
This is the step to generate the flamegraph.svg:
"./stackcollapse-kernel.pl --all_calltrace --no_offset
~/Downloads/306942-kern.log | ./flamegraph.pl --hash --bgcolor "#c3c7c7"
bboymimi@3f759f1
sudo trace-cmd record -p function_graph -a -g __x64_sys_pread64 sleep 1
sudo trace-cmd report > pread64.log
./stackcollapse-ftracelat.pl ~/os/FlameGraph/pread64.log| ./flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "pread64" > pread64.svg
It's based on the function latency instead of the sample count.
Example:
https://people.canonical.com/~gavinguo/flamegraph/304837/
~/os/FlameGraph/stackcollapse-crashbt.pl ~/Downloads/foreach-UN-bt.log | grep -v idle | ~/os/FlameGraph/flamegraph.pl --title "Flamegraph of crashdump backtrace" --subtitle "foreach UN bt" --hash > foreach-UN-bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_foreach-UN-bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_foreach_bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_bt-a.svg
[stackcollapse] Add the size based memory allocation flamegraph
bboymimi@c4612ab
1). sudo trace-cmd record -l __alloc_pages_nodemask -e kmem:mm_page_alloc -T sleep 5
2). trace-cmd report --cpu 1 > trace-cmd.log # generate the event only
from CPU 1
3). FlameGraph/stackcollapse-tracecmd.pl --process_name --page_type --page_order ./trace-cmd.log | ~/os/FlameGraph/flamegraph.pl --countname kb > ./trace-cmd.svg
4). firefox ./trace-cmd.svg # or google-chrome-stable
This is the method to generate the memory flamegraph. And of course, the memory flamegraph can be
expanded to the general usage by the trace-cmd record to trace a specific call path profiling.