[cfe-dev] [RFC] Instrumenting Clang/LLVM with Perfetto

Wed Jul 1 14:30:00 PDT 2020

Instrumenting Clang/LLVM with Perfetto


Overview

    Perfetto is an event-based tracer designed to replace chrome://tracing. It
    allows for fine-grained control over trace data and is currently in use by
    Chrome and Android.

    Instrumentation of Clang with Perfetto would give nicely formatted traces
    that are easily shareable by link. Compile time regression bugs could be
    filed with Perfetto links that clearly show the regression.

    Perfetto exposes a C++ library that allows arbitrary applications to record
    app-specific events. Trace events can be added to Clang by calling macros
    exposed by Perfetto.

    The trace events are sent to an in-process tracing service and are kept in
    memory until the trace is written to disk. The trace is written as a
    protobuf and can be opened in the Perfetto UI (https://ui.perfetto.dev/),
    which is built on the Perfetto trace processor.

    The Perfetto UI lets you visualize traces as flame graphs. The view can be
    navigated with the "WASD" keys. There is also a query engine built into
    the trace processor; queries are run by pressing CTRL+ENTER.

    The benefits of Perfetto:

    Shareable Perfetto links
        Traces can be easily shared without sending the trace file
    Traces can be easily aggregated with UNIX cat
    Fine-grained Tracing Control
        Trace events can span function boundaries (start a slice in one
        function, end it in another)
        Finer granularity than the function-level view that Linux perf gives
    Less tracing overhead
        Trace events are buffered in memory, not sent directly to disk
        Perfetto macros are optimized to minimize overhead
    Smaller trace sizes
        Strings and other reused data are interned
        Traces are stored as protobufs instead of JSON
        Roughly 3x smaller than -ftime-trace traces
    SQL queries for traces
        The Perfetto UI has a query language built in for data aggregation
    Works on Linux/macOS/Windows


Example Trace

    This is an example trace of compiling a single Linux kernel source file.
    https://ui.perfetto.dev/#!/?s=c7942d5118f3ccfe16f46d166b05a66d077eb61ef8e22184a7d7dfe87ba8ea

    This is an example trace of building the entire Linux kernel.
    Recorded with:
      make CC="clang-9" KCFLAGS="-perfetto" -j72
      find /tmp -name "*pftrace" -exec cat {} \; > trace.pftrace
    https://ui.perfetto.dev/#!/?s=10556b46b46aba46188a51478102a6ce21a9c767c218afa5b8429eac4cb9d4
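
    On systems without find and cat, the same aggregation can be sketched in
    C++. This is only an illustration of the "concatenate every trace file"
    step above: the function name concatTraces is hypothetical, and it matches
    files by the ".pftrace" extension rather than by the "*pftrace" glob.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: append every *.pftrace file found under Root to
// OutPath and return how many files were appended.
std::size_t concatTraces(const fs::path &Root, const fs::path &OutPath) {
  std::ofstream Out(OutPath, std::ios::binary | std::ios::trunc);
  std::size_t Count = 0;
  for (const auto &Entry : fs::recursive_directory_iterator(
           Root, fs::directory_options::skip_permission_denied)) {
    if (!Entry.is_regular_file() || Entry.path().extension() != ".pftrace")
      continue;
    // Do not read the output file back into itself.
    if (fs::equivalent(Entry.path(), OutPath))
      continue;
    // Streaming an empty buffer would set failbit on Out, so skip empty files.
    if (Entry.file_size() == 0)
      continue;
    std::ifstream In(Entry.path(), std::ios::binary);
    Out << In.rdbuf();
    ++Count;
  }
  return Count;
}
```

    A call such as concatTraces("/tmp", "trace.pftrace") would mirror the
    find/cat command above.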


Current Implementation

    These changes are behind a CMake flag (-DPERFETTO). When building Clang with
    the CMake flag enabled, the Perfetto GitHub repository is cloned into the
    build folder and linked against any code that uses Perfetto macros.

    The -ftime-trace and Perfetto trace events have been combined into one
    macro that expands to trace events for both. The behavior of -ftime-trace
    is unchanged.

    To run a Perfetto trace, pass the flag -perfetto to Clang (built with
    -DPERFETTO). The trace output file follows the convention set by
    -ftime-trace and uses the filename passed to -o to determine the trace
    filename.

    For example:
    clang -perfetto -c foo.c -o foo.o
    would generate foo.pftrace.
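
    The naming rule in this example can be sketched as follows. This is not
    the code from the patch: traceFileFor is a hypothetical helper, and
    "replace the extension of the -o argument with .pftrace" is an assumption
    inferred only from the foo.o example above.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: derive the trace filename from the -o argument by
// swapping its extension for ".pftrace" (assumed rule, see the note above).
std::string traceFileFor(const std::string &OutputFile) {
  std::filesystem::path P(OutputFile);
  P.replace_extension(".pftrace");
  return P.string();
}
```

    Under that assumption, traceFileFor("foo.o") yields "foo.pftrace".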

    Link to implementation: https://reviews.llvm.org/D82994


Tracing documentation

    LLVM_TRACE_BEGIN(name, detail)
    Begins a tracing slice if Perfetto or -ftime-trace is enabled.
      name : constexpr String
        This is what will be displayed on the tracing UI.
      detail : StringRef
        Additional detail to add to the trace slice. This expands to a lambda
        and is evaluated lazily, only if Perfetto or -ftime-trace is
        enabled.

    LLVM_TRACE_END()
    Ends the most recently started slice.

    LLVM_TRACE_SCOPE(name, detail)
    Begins a tracing slice and initializes an anonymous struct if Perfetto or
    -ftime-trace is enabled. When the struct goes out of scope, the tracing
    slice will end.
      name : constexpr String
        This is what will be displayed on the tracing UI.
      detail : StringRef
        Additional detail to add to the trace slice. This expands to a lambda
        and is evaluated lazily, only if Perfetto or -ftime-trace is
        enabled.

    Perfetto Documentation: https://perfetto.dev/
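
    To make the semantics above concrete, here is a minimal stand-in written
    for this description. It is not the implementation in D82994 and does not
    call Perfetto: DemoTracer, DemoScope and the DEMO_TRACE_* macros are
    hypothetical names that only mimic the documented behavior (detail
    wrapped in a lambda and evaluated lazily, END closing the most recently
    started slice, SCOPE ending its slice on scope exit).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the tracing backend (not Perfetto itself).
struct DemoTracer {
  bool Enabled = false;
  std::vector<std::string> Open;     // stack of currently open slices
  std::vector<std::string> Finished; // "name: detail" of closed slices

  // Returns true if a slice was actually started.
  template <typename DetailFn> bool begin(const char *Name, DetailFn Detail) {
    if (!Enabled)
      return false; // Detail() is never called, so its cost is skipped
    Open.push_back(std::string(Name) + ": " + Detail());
    return true;
  }
  // Ends the most recently started slice.
  void end() {
    if (!Enabled)
      return;
    assert(!Open.empty() && "end() without a matching begin()");
    Finished.push_back(std::move(Open.back()));
    Open.pop_back();
  }
};

inline DemoTracer &demoTracer() {
  static DemoTracer T;
  return T;
}

// RAII helper: ends its slice when it goes out of scope.
struct DemoScope {
  bool Active;
  template <typename DetailFn>
  DemoScope(const char *Name, DetailFn Detail)
      : Active(demoTracer().begin(Name, Detail)) {}
  ~DemoScope() {
    if (Active)
      demoTracer().end();
  }
};

// The detail expression is wrapped in a lambda so it is evaluated lazily.
#define DEMO_TRACE_BEGIN(name, detail)                                         \
  demoTracer().begin(name, [&] { return std::string(detail); })
#define DEMO_TRACE_END() demoTracer().end()
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)
#define DEMO_TRACE_SCOPE(name, detail)                                         \
  DemoScope DEMO_CONCAT(DemoScopeAtLine, __LINE__)(                            \
      name, [&] { return std::string(detail); })

// Example use: one slice spanning two functions, and one scoped slice.
static int ExpensiveCalls = 0;
static std::string expensiveDetail() {
  ++ExpensiveCalls; // counts how often the detail is really computed
  return "foo.c";
}
static void startParse() { DEMO_TRACE_BEGIN("Parse", expensiveDetail()); }
static void finishParse() { DEMO_TRACE_END(); }
static void runPass() { DEMO_TRACE_SCOPE("RunPass", expensiveDetail()); }
```

    With demoTracer().Enabled left false, calling startParse(), runPass() and
    finishParse() never invokes expensiveDetail(). With it set to true, the
    scoped "RunPass" slice closes before the surrounding "Parse" slice.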


FAQs

    Why not use Linux perf?
      Perfetto's event-based model allows for much finer-grained control over
      the trace.
      Linux perf is only available on Linux.
      Visualizing perf data requires post-processing with separate tools.
      perf requires kernel-version-specific dependencies.


    Why not use -ftime-trace?
      Perfetto has almost the same functionality as -ftime-trace, but with a
      few added benefits.

      Shareable links.
      Traces can be aggregated easily with UNIX cat.
      The query engine for trace analysis.
      The Perfetto UI is browser-agnostic and could be used the same way as
        godbolt.
      The resulting trace files are ~3x smaller.
        A trace of the Linux kernel is 50MB with Perfetto and 139MB with
        -ftime-trace.


Extra Notes

    Perfetto also has a system mode that interacts with Linux ftrace. It can
    record things like process scheduling, syscalls, memory usage and CPU
    usage.

    This type of trace probably records far more data than we need, but I
    recorded a sample trace anyway while testing.
    https://ui.perfetto.dev/#!/?s=18de7feb4f84ecd29519cb4ac136613ba891e4fd5e88a9e6511412ccfd210


Known Issues

    When -no-integrated-as is passed, traces are written to /tmp/. This is a
    bug in the current implementation of -ftime-trace. With the Perfetto
    change applied, the same bug also affects Perfetto traces.