[clang] [analyzer][docs] Document how to use perf and uftrace to debug performance issues (PR #126520)
via cfe-commits
cfe-commits at lists.llvm.org
Mon Feb 10 06:15:26 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-clang-static-analyzer-1
Author: Balazs Benics (steakhal)
<details>
<summary>Changes</summary>
---
Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/126520.diff
3 Files Affected:
- (modified) clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst (+91)
- (added) clang/docs/analyzer/images/flamegraph.svg (+27641)
- (added) clang/docs/analyzer/images/uftrace_detailed.png ()
``````````diff
diff --git a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
index 3ee6e117a846528..c33853aca76168d 100644
--- a/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
+++ b/clang/docs/analyzer/developer-docs/PerformanceInvestigation.rst
@@ -5,6 +5,9 @@ Performance Investigation
Multiple factors contribute to the time it takes to analyze a file with Clang Static Analyzer.
A translation unit contains multiple entry points, each of which take multiple steps to analyze.
+Performance analysis using ``-ftime-trace``
+===========================================
+
You can add the ``-ftime-trace=file.json`` option to break down the analysis time into individual entry points and steps within each entry point.
You can explore the generated JSON file in a Chromium browser using the ``chrome://tracing`` URL,
or using `speedscope <https://speedscope.app>`_.
@@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
for a single entry point.
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble.
+
+.. code-block:: bash
+ :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+ # -F: Sampling frequency, use `-F max` for maximal frequency
+ # -g: Enable call-graph recording for both kernel and user space
+ perf record -F 99 -g -- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+ -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+ -verify ./clang/test/Analysis/string.c
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+ :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox.
+
+ perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+ /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
+ firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
+
+.. code-block:: bash
+ :caption: Recording with ``uftrace``, then dumping the result as a Chrome trace JSON.
+
+ uftrace record clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+ -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+ -verify ./clang/test/Analysis/string.c
+ uftrace dump --filter=".*::AnalysisConsumer::HandleTranslationUnit" --time-filter=300 --chrome > trace.json
+
+.. image:: ../images/uftrace_detailed.png
+
+In this picture, you can see the functions below the Static Analyzer's entry point, which takes at least 300 nanoseconds to run, visualized by Chrome's ``about:tracing`` page
+You can also see how deep function calls we may have due to AST visitors.
+
+Using different filters can reduce the number of functions to record.
+For the `common options <https://github.com/namhyung/uftrace/blob/master/doc/uftrace-record.md#common-options>`_, refer to the ``uftrace`` documentation.
+
+Similar filters could be applied for dumping too. That way you can reuse the same (detailed)
+recording to selectively focus on some special part using a refinement of the filter flags.
+Remember, the trace JSON needs to fit into Chrome's ``about:tracing`` or `speedscope <https://speedscope.app>`_,
+thus it needs to be of a limited size.
+In that case though, every dump operation would need to sieve through the whole recording if called repeatedly.
+
+If the trace JSON is still too large to load, have a look at the dump and look for frequent entries that refer to non-interesting parts.
+Once you have some of those, add them as ``--hide`` flags to the ``uftrace dump`` call.
+To see what functions appear frequently in the trace, use this command:
+
+.. code-block:: bash
+
+ cat trace.json | grep -Po '"name":"(.+)"' | sort | uniq -c | sort -nr | head -n 50
+
+``uftrace`` can also dump the report as a Flame graph using ``uftrace dump --framegraph``.
diff --git a/clang/docs/analyzer/images/flamegraph.svg b/clang/docs/analyzer/images/flamegraph.svg
new file mode 100644
index 000000000000000..d3c4a22c9ff536a
--- /dev/null
+++ b/clang/docs/analyzer/images/flamegraph.svg
@@ -0,0 +1,27641 @@
+<?xml version="1.0" standalone="no"?>
+<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
+<svg version="1.1" width="1200" height="934" onload="init(evt)" viewBox="0 0 1200 934" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+<!-- Flame graph stack visualization. See https://github.com/brendangregg/FlameGraph for latest version, and http://www.brendangregg.com/flamegraphs.html for examples. -->
+<!-- NOTES: -->
+<defs>
+ <linearGradient id="background" y1="0" y2="1" x1="0" x2="0" >
+ <stop stop-color="#eeeeee" offset="5%" />
+ <stop stop-color="#eeeeb0" offset="95%" />
+ </linearGradient>
+</defs>
+<style type="text/css">
+ text { font-family:Verdana; font-size:12px; fill:rgb(0,0,0); }
+ #search, #ignorecase { opacity:0.1; cursor:pointer; }
+ #search:hover, #search.show, #ignorecase:hover, #ignorecase.show { opacity:1; }
+ #subtitle { text-anchor:middle; font-color:rgb(160,160,160); }
+ #title { text-anchor:middle; font-size:17px}
+ #unzoom { cursor:pointer; }
+ #frames > *:hover { stroke:black; stroke-width:0.5; cursor:pointer; }
+ .hide { display:none; }
+ .parent { opacity:0.5; }
+</style>
+<script type="text/ecmascript">
+<![CDATA[
+ "use strict";
+ var details, searchbtn, unzoombtn, matchedtxt, svg, searching, currentSearchTerm, ignorecase, ignorecaseBtn;
+ function init(evt) {
+ details = document.getElementById("details").firstChild;
+ searchbtn = document.getElementById("search");
+ ignorecaseBtn = document.getElementById("ignorecase");
+ unzoombtn = document.getElementById("unzoom");
+ matchedtxt = document.getElementById("matched");
+ svg = document.getElementsByTagName("svg")[0];
+ searching = 0;
+ currentSearchTerm = null;
+
+ // use GET parameters to restore a flamegraphs state.
+ var params = get_params();
+ if (params.x && params.y)
+ zoom(find_group(document.querySelector('[x="' + params.x + '"][y="' + params.y + '"]')));
+ if (params.s) search(params.s);
+ }
+
+ // event listeners
+ window.addEventListener("click", function(e) {
+ var target = find_group(e.target);
+ if (target) {
+ if (target.nodeName == "a") {
+ if (e.ctrlKey === false) return;
+ e.preventDefault();
+ }
+ if (target.classList.contains("parent")) unzoom(true);
+ zoom(target);
+ if (!document.querySelector('.parent')) {
+ // we have basically done a clearzoom so clear the url
+ var params = get_params();
+ if (params.x) delete params.x;
+ if (params.y) delete params.y;
+ history.replaceState(null, null, parse_params(params));
+ unzoombtn.classList.add("hide");
+ return;
+ }
+
+ // set parameters for zoom state
+ var el = target.querySelector("rect");
+ if (el && el.attributes && el.attributes.y && el.attributes._orig_x) {
+ var params = get_params()
+ params.x = el.attributes._orig_x.value;
+ params.y = el.attributes.y.value;
+ history.replaceState(null, null, parse_params(params));
+ }
+ }
+ else if (e.target.id == "unzoom") clearzoom();
+ else if (e.target.id == "search") search_prompt();
+ else if (e.target.id == "ignorecase") toggle_ignorecase();
+ }, false)
+
+ // mouse-over for info
+ // show
+ window.addEventListener("mouseover", function(e) {
+ var target = find_group(e.target);
+ if (target) details.nodeValue = "Function: " + g_to_text(target);
+ }, false)
+
+ // clear
+ window.addEventListener("mouseout", function(e) {
+ var target = find_group(e.target);
+ if (target) details.nodeValue = ' ';
+ }, false)
+
+ // ctrl-F for search
+ // ctrl-I to toggle case-sensitive search
+ window.addEventListener("keydown",function (e) {
+ if (e.keyCode === 114 || (e.ctrlKey && e.keyCode === 70)) {
+ e.preventDefault();
+ search_prompt();
+ }
+ else if (e.ctrlKey && e.keyCode === 73) {
+ e.preventDefault();
+ toggle_ignorecase();
+ }
+ }, false)
+
+ // functions
+ function get_params() {
+ var params = {};
+ var paramsarr = window.location.search.substr(1).split('&');
+ for (var i = 0; i < paramsarr.length; ++i) {
+ var tmp = paramsarr[i].split("=");
+ if (!tmp[0] || !tmp[1]) continue;
+ params[tmp[0]] = decodeURIComponent(tmp[1]);
+ }
+ return params;
+ }
+ function parse_params(params) {
+ var uri = "?";
+ for (var key in params) {
+ uri += key + '=' + encodeURIComponent(params[key]) + '&';
+ }
+ if (uri.slice(-1) == "&")
+ uri = uri.substring(0, uri.length - 1);
+ if (uri == '?')
+ uri = window.location.href.split('?')[0];
+ return uri;
+ }
+ function find_child(node, selector) {
+ var children = node.querySelectorAll(selector);
+ if (children.length) return children[0];
+ }
+ function find_group(node) {
+ var parent = node.parentElement;
+ if (!parent) return;
+ if (parent.id == "frames") return node;
+ return find_group(parent);
+ }
+ function orig_save(e, attr, val) {
+ if (e.attributes["_orig_" + attr] != undefined) return;
+ if (e.attributes[attr] == undefined) return;
+ if (val == undefined) val = e.attributes[attr].value;
+ e.setAttribute("_orig_" + attr, val);
+ }
+ function orig_load(e, attr) {
+ if (e.attributes["_orig_"+attr] == undefined) return;
+ e.attributes[attr].value = e.attributes["_orig_" + attr].value;
+ e.removeAttribute("_orig_"+attr);
+ }
+ function g_to_text(e) {
+ var text = find_child(e, "title").firstChild.nodeValue;
+ return (text)
+ }
+ function g_to_func(e) {
+ var func = g_to_text(e);
+ // if there's any manipulation we want to do to the function
+ // name before it's searched, do it here before returning.
+ return (func);
+ }
+ function update_text(e) {
+ var r = find_child(e, "rect");
+ var t = find_child(e, "text");
+ var w = parseFloat(r.attributes.width.value) -3;
+ var txt = find_child(e, "title").textContent.replace(/\([^(]*\)$/,"");
+ t.attributes.x.value = parseFloat(r.attributes.x.value) + 3;
+
+ // Smaller than this size won't fit anything
+ if (w < 2 * 12 * 0.59) {
+ t.textContent = "";
+ return;
+ }
+
+ t.textContent = txt;
+ var sl = t.getSubStringLength(0, txt.length);
+ // check if only whitespace or if we can fit the entire string into width w
+ if (/^ *$/.test(txt) || sl < w)
+ return;
+
+ // this isn't perfect, but gives a good starting point
+ // and avoids calling getSubStringLength too often
+ var start = Math.floor((w/sl) * txt.length);
+ for (var x = start; x > 0; x = x-2) {
+ if (t.getSubStringLength(0, x + 2) <= w) {
+ t.textContent = txt.substring(0, x) + "..";
+ return;
+ }
+ }
+ t.textContent = "";
+ }
+
+ // zoom
+ function zoom_reset(e) {
+ if (e.attributes != undefined) {
+ orig_load(e, "x");
+ orig_load(e, "width");
+ }
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_reset(c[i]);
+ }
+ }
+ function zoom_child(e, x, ratio) {
+ if (e.attributes != undefined) {
+ if (e.attributes.x != undefined) {
+ orig_save(e, "x");
+ e.attributes.x.value = (parseFloat(e.attributes.x.value) - x - 10) * ratio + 10;
+ if (e.tagName == "text")
+ e.attributes.x.value = find_child(e.parentNode, "rect[x]").attributes.x.value + 3;
+ }
+ if (e.attributes.width != undefined) {
+ orig_save(e, "width");
+ e.attributes.width.value = parseFloat(e.attributes.width.value) * ratio;
+ }
+ }
+
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_child(c[i], x - 10, ratio);
+ }
+ }
+ function zoom_parent(e) {
+ if (e.attributes) {
+ if (e.attributes.x != undefined) {
+ orig_save(e, "x");
+ e.attributes.x.value = 10;
+ }
+ if (e.attributes.width != undefined) {
+ orig_save(e, "width");
+ e.attributes.width.value = parseInt(svg.width.baseVal.value) - (10 * 2);
+ }
+ }
+ if (e.childNodes == undefined) return;
+ for (var i = 0, c = e.childNodes; i < c.length; i++) {
+ zoom_parent(c[i]);
+ }
+ }
+ function zoom(node) {
+ var attr = find_child(node, "rect").attributes;
+ var width = parseFloat(attr.width.value);
+ var xmin = parseFloat(attr.x.value);
+ var xmax = parseFloat(xmin + width);
+ var ymin = parseFloat(attr.y.value);
+ var ratio = (svg.width.baseVal.value - 2 * 10) / width;
+
+ // XXX: Workaround for JavaScript float issues (fix me)
+ var fudge = 0.0001;
+
+ unzoombtn.classList.remove("hide");
+
+ var el = document.getElementById("frames").children;
+ for (var i = 0; i < el.length; i++) {
+ var e = el[i];
+ var a = find_child(e, "rect").attributes;
+ var ex = parseFloat(a.x.value);
+ var ew = parseFloat(a.width.value);
+ var upstack;
+ // Is it an ancestor
+ if (0 == 0) {
+ upstack = parseFloat(a.y.value) > ymin;
+ } else {
+ upstack = parseFloat(a.y.value) < ymin;
+ }
+ if (upstack) {
+ // Direct ancestor
+ if (ex <= xmin && (ex+ew+fudge) >= xmax) {
+ e.classList.add("parent");
+ zoom_parent(e);
+ update_text(e);
+ }
+ // not in current path
+ else
+ e.classList.add("hide");
+ }
+ // Children maybe
+ else {
+ // no common path
+ if (ex < xmin || ex + fudge >= xmax) {
+ e.classList.add("hide");
+ }
+ else {
+ zoom_child(e, xmin, ratio);
+ update_text(e);
+ }
+ }
+ }
+ search();
+ }
+ function unzoom(dont_update_text) {
+ unzoombtn.classList.add("hide");
+ var el = document.getElementById("frames").children;
+ for(var i = 0; i < el.length; i++) {
+ el[i].classList.remove("parent");
+ el[i].classList.remove("hide");
+ zoom_reset(el[i]);
+ if(!dont_update_text) update_text(el[i]);
+ }
+ search();
+ }
+ function clearzoom() {
+ unzoom();
+
+ // remove zoom state
+ var params = get_params();
+ if (params.x) delete params.x;
+ if (params.y) delete params.y;
+ history.replaceState(null, null, parse_params(params));
+ }
+
+ // search
+ function toggle_ignorecase() {
+ ignorecase = !ignorecase;
+ if (ignorecase) {
+ ignorecaseBtn.classList.add("show");
+ } else {
+ ignorecaseBtn.classList.remove("show");
+ }
+ reset_search();
+ search();
+ }
+ function reset_search() {
+ var el = document.querySelectorAll("#frames rect");
+ for (var i = 0; i < el.length; i++) {
+ orig_load(el[i], "fill")
+ }
+ var params = get_params();
+ delete params.s;
+ history.replaceState(null, null, parse_params(params));
+ }
+ function search_prompt() {
+ if (!searching) {
+ var term = prompt("Enter a search term (regexp " +
+ "allowed, eg: ^ext4_)"
+ + (ignorecase ? ", ignoring case" : "")
+ + "\nPress Ctrl-i to toggle case sensitivity", "");
+ if (term != null) search(term);
+ } else {
+ reset_search();
+ searching = 0;
+ currentSearchTerm = null;
+ searchbtn.classList.remove("show");
+ searchbtn.firstChild.nodeValue = "Search"
+ matchedtxt.classList.add("hide");
+ matchedtxt.firstChild.nodeValue = ""
+ }
+ }
+ function search(term) {
+ if (term) currentSearchTerm = term;
+
+ var re = new RegExp(currentSearchTerm, ignorecase ? 'i' : '');
+ var el = document.getElementById("frames").children;
+ var matches = new Object();
+ var maxwidth = 0;
+ for (var i = 0; i < el.length; i++) {
+ var e = el[i];
+ var func = g_to_func(e);
+ var rect = find_child(e, "rect");
+ if (func == null || rect == null)
+ continue;
+
+ // Save max width. Only works as we have a root frame
+ var w = parseFloat(rect.attributes.width.value);
+ if (w > maxwidth)
+ maxwidth = w;
+
+ if (func.match(re)) {
+ // highlight
+ var x = parseFloat(rect.attributes.x.value);
+ orig_save(rect, "fill");
+ rect.attributes.fill.value = "rgb(230,0,230)";
+
+ // remember matches
+ if (matches[x] == undefined) {
+ matches[x] = w;
+ } else {
+ if (w > matches[x]) {
+ // overwrite with parent
+ matches[x] = w;
+ }
+ }
+ searching = 1;
+ }
+ }
+ if (!searching)
+ return;
+ var params = get_params();
+ params.s = currentSearchTerm;
+ history.replaceState(null, null, parse_params(params));
+
+ searchbtn.classList.add("show");
+ searchbtn.firstChild.nodeValue = "Reset Search";
+
+ // calculate percent matched, excluding vertical overlap
+ var count = 0;
+ var lastx = -1;
+ var lastw = 0;
+ var keys = Array();
+ for (k in matches) {
+ if (matches.hasOwnProperty(k))
+ keys.push(k);
+ }
+ // sort the matched frames by their x location
+ // ascending, then width descending
+ keys.sort(function(a, b){
+ return a - b;
+ });
+ // Step through frames saving only the biggest bottom-up frames
+ // thanks to the sort order. This relies on the tree property
+ // where children are always smaller than their parents.
+ var fudge = 0.0001; // JavaScript floating point
+ for (var k in keys) {
+ var x = parseFloat(keys[k]);
+ var w = matches[keys[k]];
+ if (x >= lastx + lastw - fudge) {
+ count += w;
+ lastx = x;
+ lastw = w;
+ }
+ }
+ // display matched percent
+ matchedtxt.classList.remove("hide");
+ var pct = 100 * count / maxwidth;
+ if (pct != 100) pct = pct.toFixed(1)
+ matchedtxt.firstChild.nodeValue = "Matched: " + pct + "%";
+ }
+]]>
+</script>
+<rect x="0.0" y="0" width="1200.0" height="934.0" fill="url(#background)" />
+<text id="title" x="600.00" y="24" >Flame Graph</text>
+<text id="details" x="10.00" y="917" > </text>
+<text id="unzoom" x="10.00" y="24" class="hide">Reset Zoom</text>
+<text id="search" x="1090.00" y="24" >Search</text>
+<text id="ignorecase" x="1174.00" y="24" >ic</text>
+<text id="matched" x="1090.00" y="917" > </text>
+<g id="frames">
+<g >
+<title>[unknown] (140,334 samples, 0.09%)</title><rect x="311.8" y="469" width="1.1" height="15.0" fill="rgb(210,24,5)" rx="2" ry="2" />
+<text x="314.78" y="479.5" ></text>
+</g>
+<g >
+<title>clang::Sema::CheckSingleAssignmentConstraints (259,475 samples, 0.17%)</title><rect x="665.4" y="325" width="2.0" height="15.0" fill="rgb(254,226,54)" rx="2" ry="2" />
+<text x="668.36" y="335.5" ></text>...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/126520
More information about the cfe-commits
mailing list