[clang] [analyzer][docs] Document how to use perf and uftrace to debug performance issues (PR #126520)
Arseniy Zaostrovnykh via cfe-commits
cfe-commits at lists.llvm.org
Mon Feb 10 07:08:59 PST 2025
================
@@ -45,3 +48,91 @@ Note: Both Chrome-tracing and speedscope tools might struggle with time traces a
Luckily, in most cases the default max-steps boundary of 225 000 produces the traces of approximately that size
for a single entry point.
You can use ``-analyze-function=get_global_options`` together with ``-ftime-trace`` to narrow down analysis to a specific entry point.
+
+
+Performance analysis using ``perf``
+===================================
+
+`Perf <https://perfwiki.github.io/main/>`_ is an excellent tool for sampling-based profiling of an application.
+It's easy to start profiling, you only have 2 prerequisites.
+Build with ``-fno-omit-frame-pointer`` and debug info (``-g``).
+You can use release builds, but probably the easiest is to set the ``CMAKE_BUILD_TYPE=RelWithDebInfo``
+along with ``CMAKE_CXX_FLAGS="-fno-omit-frame-pointer"`` when configuring ``llvm``.
+Here is how to `get started <https://llvm.org/docs/CMake.html#quick-start>`_ if you are in trouble.
+
+.. code-block:: bash
+ :caption: Running the Clang Static Analyzer through ``perf`` to gather samples of the execution.
+
+ # -F: Sampling frequency, use `-F max` for maximal frequency
+ # -g: Enable call-graph recording for both kernel and user space
+ perf record -F 99 -g -- clang -cc1 -nostdsysteminc -analyze -analyzer-constraints=range \
+ -setup-static-analyzer -analyzer-checker=core,unix,alpha.unix.cstring,debug.ExprInspection \
+ -verify ./clang/test/Analysis/string.c
+
+Once you have the profile data, you can use it to produce a Flame graph.
+A Flame graph is a visual representation of the stack frames of the samples.
+Common stack frame prefixes are squashed together, making up a wider bar.
+The wider the bar, the more time was spent under that particular stack frame,
+giving a sense of how the overall execution time was spent.
+
+Clone the `FlameGraph <https://github.com/brendangregg/FlameGraph>`_ git repository,
+as we will use some scripts from there to convert the ``perf`` samples into a Flame graph.
+It's also useful to check out Brendan Gregg's (the author of FlameGraph)
+`homepage <https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html>`_.
+
+
+.. code-block:: bash
+ :caption: Converting the ``perf`` profile into a Flamegraph, then opening it in Firefox.
+
+ perf script | /path/to/FlameGraph/stackcollapse-perf.pl > perf.folded
+ /path/to/FlameGraph/flamegraph.pl perf.folded > perf.svg
+ firefox perf.svg
+
+.. image:: ../images/flamegraph.svg
+
+
+Performance analysis using ``uftrace``
+======================================
+
+`uftrace <https://github.com/namhyung/uftrace/wiki/Tutorial#getting-started>`_ is a great tool to generate rich profile data
+that you could use to focus and drill down into the timeline of your application.
+We will use it to generate Chromium trace JSON.
+In contrast to ``perf``, this approach statically instruments every function, so it should be more precise and through than the sampling-based approaches like ``perf``.
+In contrast to using `-ftime-trace`, functions don't need to opt-in to be profiled using ``llvm::TimeTraceScope``.
+All functions are profiled due to static instrumentation.
+
+There is only one prerequisite to use this tool.
+You need to build the binary you are about to instrument using ``-pg`` or ``-finstrument-functions``.
+This will make it run substantially slower but allows rich instrumentation.
----------------
necto wrote:
Would be interesting to include the typical slowdown factor.
Also, I think, it is important to note the substantial disk space requirement
https://github.com/llvm/llvm-project/pull/126520
More information about the cfe-commits
mailing list