Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many Innovative FlameGraph using in Different Scenarios #325

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

bboymimi
Copy link

I have implemented various Flamegraph applications that can fit in
different situation:

This is the step to generate the flamegraph.svg:
"./stackcollapse-kernel.pl --all_calltrace --no_offset
~/Downloads/306942-kern.log | ./flamegraph.pl --hash --bgcolor "#c3c7c7"

oom.svg"
example: https://people.canonical.com/~gavinguo/318898/

  • 2). Ftrace latency flamegraph parser
    bboymimi@3f759f1
    sudo trace-cmd record -p function_graph -a -g __x64_sys_pread64 sleep 1
    sudo trace-cmd report > pread64.log
    ./stackcollapse-ftracelat.pl ~/os/FlameGraph/pread64.log| ./flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "pread64" > pread64.svg

It's based on the function latency instead of the sample count.
Example:
https://people.canonical.com/~gavinguo/flamegraph/304837/

  • 3). kdump flamegraph (command: "foreach UN bt"):
    ~/os/FlameGraph/stackcollapse-crashbt.pl ~/Downloads/foreach-UN-bt.log | grep -v idle | ~/os/FlameGraph/flamegraph.pl --title "Flamegraph of crashdump backtrace" --subtitle "foreach UN bt" --hash > foreach-UN-bt.svg

https://people.canonical.com/~gavinguo/flamegraph/304304_foreach-UN-bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_foreach_bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_bt-a.svg

  • 4). Memory allocation flamegraph
    [stackcollapse] Add the size based memory allocation flamegraph
    bboymimi@c4612ab
    1). sudo trace-cmd record -l __alloc_pages_nodemask -e kmem:mm_page_alloc -T sleep 5
    2). trace-cmd report --cpu 1 > trace-cmd.log # generate the event only
    from CPU 1
    3). FlameGraph/stackcollapse-tracecmd.pl --process_name --page_type --page_order ./trace-cmd.log | ~/os/FlameGraph/flamegraph.pl --countname kb > ./trace-cmd.svg
    4). firefox ./trace-cmd.svg # or google-chrome-stable

This is the method to generate the memory flamegraph. And of course, the memory flamegraph can be
expanded to the general usage by the trace-cmd record to trace a specific call path profiling.

Gavin Guo added 8 commits March 30, 2021 14:53
…ace too

As the stackcollparse-tracecmd.pl is originally written for the
trace-cmd output, however, the trace-cmd is from the Ftrace output. So,
the patch is to improve the parsing of the format to apply on the Ftrace
original output too(/sys/kerne/debug/tracing/trace).

In the future, the parser can be improved in many perspectives:
1). memory: To base on the byte size/number of pages allocated/latency/order/pid/migrate type...
2). IO: to based on the latency.

Signed-off-by: Gavin Guo <[email protected]>
vmlinux debug symbol

When the user on the Ubuntu installs the debug sybmol, the "perf script"
output would have the following format:

gnome-shell  5190 [003] 26473.858920:     179427 cycles:
        ffffffff81aec99b _raw_spin_lock_irqsave+0xb (/usr/lib/debug/boot/vmlinux-5.4.0-28-lowlatency)
        ffffffffc05738e1 gen6_write32+0x51 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko)
        ffffffffc059317e i9xx_submit_request+0x3e (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko)
        ffffffffc05b9a83 submit_notify+0x63 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko)
        ffffffffc05784c0 __i915_sw_fence_complete+0x40 (/lib/modules/5.4.0-28-lowlatency/kernel/drivers/gpu/drm/i915/i915.ko)

The patch improves the regular expression technique to scan the vmlinux
and ko on the Ubuntu platform. And the improvement can make the
different color scheme work for the flamegraph.pl java palette.

Signed-off-by: Gavin Guo <[email protected]>
When using Ftrace/trace-cmd to profile the memory allocation source, it
will be much more useful to generate the memory allocation size
flamegraph instead of the sampling count flamegraph.

The steps:
1). sudo trace-cmd record -l __alloc_pages_nodemask -e kmem:mm_page_alloc  -T sleep 5
2). trace-cmd report --cpu 1 > trace-cmd.log # generate the event only
from CPU 1
3). FlameGraph/stackcollapse-tracecmd.pl --process_name --page_type --page_order  ./trace-cmd.log | ~/os/FlameGraph/flamegraph.pl --countname kb > ./trace-cmd.svg
4). firefox ./trace-cmd.svg # or google-chrome-stable

The --page_type is used to classify based on the migration/gfp_flags.
The --page_order is to classify the callstack by the allcation order.

The stackcollapse-tracecmd.pl supports both ftrace/trace-cmd format
parsing.

Signed-off-by: Gavin Guo <[email protected]>
The bt(backtrace) command is pretty useful to diagnose the
hungtask/softlockup/panic issue inside the Kdump. Add the
stackcollapse-crashbt.pl parser, and it can parse the output of "bt"
command. The following example is parsing the output of "foreach UN bt"
from the crashdump environment:

~/os/FlameGraph/stackcollapse-crashbt.pl ~/Downloads/foreach-UN-bt.log | grep -v idle | ~/os/FlameGraph/flamegraph.pl --title "Flamegraph of crashdump backtrace" --subtitle "foreach UN bt"   --hash   > foreach-UN-bt.svg

https://people.canonical.com/~gavinguo/flamegraph/304304_foreach-UN-bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_foreach_bt.svg
https://people.canonical.com/~gavinguo/flamegraph/304304_bt-a.svg

Signed-off-by: Gavin Guo <[email protected]>
The kernel log may have a lot of call traces and is hard to diagnose by
human reading. The parser will extract the OOM/hungtask/softlockup
information. For the OOM, the order and GFP flags will be used as the
classification.

There are still some bugs needing to be fixed:
1). Some call traces cannot figure out the kernel version and type of
call trace(oom/hungtask/softlockup).

2). The BUG() warning messages need to be handled instead of classifying
into the unknown call trace.

3). The type rcu stall detection needs to be implemented.

4). The nested stack which is above the RIP line needs to be implemented.

5). The <IRQ> nested stack needs to be identified.

The options can be showed by "./stackcollapse-kernel.pl -h":

The important one to mention:

"--all_calltrace": # Show all the call trace even there is no
OOM/hungtask/soft lockup type error. It's useful when there are call
traces captured inside the kern.log, and it doesn't match any types
already implemented.

For example:
"./stackcollapse-kernel.pl --all_calltrace --no_offset
~/Downloads/306942-kern.log | ./flamegraph.pl --hash --bgcolor "#c3c7c7"
> oom.svg"

Signed-off-by: Gavin Guo <[email protected]>
flamegraph

Latency Flamegraph is useful to identify the slow part of the function
call. It's especially important in the IO operations. Ftrace with
function_graph filter is good to generate the callgraph. However, it's
lacking of the method to analysize easily. The Latency flamegraph can be
used to visualize the maximum latency function and figure out the most
slow part.

sudo trace-cmd record -p function_graph -a -g __x64_sys_pread64 sleep 1
sudo trace-cmd report > pread64.log
./stackcollapse-ftracelat.pl ~/os/FlameGraph/pread64.log|  ./flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "pread64"  > pread64.svg

$ grep -P '\:\s+[$@*#!+\s]+\s+\d+\.\d+\s+\w+\s+\|\s{2}\}' ./trace-physical-20210405-10000.log | sort -rn -k6 | head -n 10
           <...>-69485 [023] 1075935.727961: funcgraph_exit:       ! 330.199 us |  }
           <...>-69485 [024] 1075918.004610: funcgraph_exit:       ! 272.085 us |  }
           <...>-69485 [016] 1075947.017290: funcgraph_exit:       ! 264.160 us |  }
           <...>-69485 [029] 1075904.191274: funcgraph_exit:       ! 262.039 us |  }
           <...>-69485 [021] 1075944.495187: funcgraph_exit:       ! 260.546 us |  }
           <...>-69485 [029] 1075904.372854: funcgraph_exit:       ! 244.026 us |  }
           <...>-69485 [029] 1075917.470951: funcgraph_exit:       ! 217.430 us |  }
           <...>-69485 [029] 1075916.952132: funcgraph_exit:       ! 213.317 us |  }
           <...>-69485 [029] 1075904.219169: funcgraph_exit:       ! 210.568 us |  }
           <...>-69485 [029] 1075911.971882: funcgraph_exit:       ! 209.655 us |  }

$ trace-cmd report --profile -i ./trace-virtual-20210405-10000.dat | grep SyS_writev
  Event: func: SyS_writev() (10000) Total: 1387394948 Avg: 138739 Max: 6456840(ts:2877935.156972) Min:56775(ts:2877967.138703)

Signed-off-by: Gavin Guo <[email protected]>
flamegraph and the symbol sorting table

stackcollapse-tracecmd.py is to reimplemented the
stackcollapse-tracecmd.pl. The Perl version has some bugs and also hard
to do the statistics analysis compared to Python version. So, just
reimplement the parser in Python and do the time/count statistics.

$ sudo taskset -c 0 trace-cmd record -p function_graph -g proc_sys_write
sysctl -w vm.nr_hugepages=2
$ trace-cmd report > test.log
$ cp test.log test1.log
$ python3 ./scan_symbol.py test.log test1.log -f > analysis.log
$ ls -ltrh test*
-rw-rw-r-- 1 gavin gavin 1.6K May  6 09:54 test.log
-rw-rw-r-- 1 gavin gavin 1.6K May  7 14:29 test1.log
-rw-rw-r-- 1 gavin gavin  516 May 27 15:55 test_flamegraph.log
-rw-rw-r-- 1 gavin gavin  527 May 27 15:55 test1_flamegraph.log

$ python3 ./scan_symbol.py test.log test1.log -f -m
$ ls -ltrh merged_flamegraph.log
-rw-rw-r-- 1 gavin gavin 1.1K May 27 15:56 merged_flamegraph.log

$ cat merged_flamegraph.log | ~/os/FlameGraph/flamegraph.pl --count ns --hash --bgcolor "#c3c7c7" --title "test_merge"  > test_merge.svg

Signed-off-by: Gavin Guo <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant