[llvm-dev] RFC: XRay Profiling in LLVM

Dean Michael Berris via llvm-dev llvm-dev at lists.llvm.org
Wed May 2 19:11:57 PDT 2018


As an additional update, this is now a series of patches.

If you're interested in learning more about XRay in compiler-rt, please
feel free to dive in and review the code. Any insights/questions/feedback
would be most appreciated!

https://reviews.llvm.org/D45757 - Part 2: Function Call Trie
https://reviews.llvm.org/D45758 - Part 3: Profile Collector Service
https://reviews.llvm.org/D44620 - Part 4: Profiler Mode Wiring
https://reviews.llvm.org/D45998 - Part 5: Profile File Writing

Cheers
On Tue, Mar 27, 2018 at 12:10 PM Dean Michael Berris <dean.berris at gmail.com>
wrote:

> FYI: Patch is now available for review in https://reviews.llvm.org/D44620.


> On Thu, Feb 15, 2018 at 1:34 PM Dean Michael Berris
> <dean.berris at gmail.com>
wrote:

>> # Objective

>> Implement an XRay mode 'xray-profiling' that gathers stack trace
latencies/durations and builds histograms to provide basic statistics about
where time is going in an execution of the application.

>> # Background

>> XRay has two modes currently implemented in compiler-rt: a basic (nee
naive) mode and flight data recorder (FDR) mode.

>> Basic mode logging, when enabled, will collect traces and write verbose
logs to disk. This mode has some unsophisticated filtering and stack-depth
limiting support. This mode is useful for tracing short-lived applications,
so that execution can be traced throughout the lifetime of the application.
We've used basic mode logging to identify sources of latency in Clang/LLVM.

>> FDR mode logging is designed to record traces and aggressively filter
out events that fall below a latency threshold. This mode uses a fixed
amount of heap space to record events as they're happening, and only writes
out the logs on demand. FDR mode is used mostly for long-running
applications and services.

>> We also already have tools in LLVM that can turn these XRay traces into
other forms that are easier to digest. One tool turns XRay traces into
something that the Chrome Trace Viewer can render [1] and another
summarises it to generate latency flame graphs [2].

>> # Problem Statement

>> Both basic and FDR mode logging can create massive traces. One example,
where a fully-instrumented clang binary was traced with XRay's basic mode,
we found that it could generate an 89GB trace compiling the canonical
"Hello, World!" program.

>> One way to address this problem is to limit the amount of memory being
used by the implementation through flags. Unfortunately that also means
basic mode XRay will block more on writing out the log files.

>> Filtering at runtime also takes some CPU time, so the overheads of
writing out the log as well as filtering the records become non-trivial for
heavily instrumented programs.

>> # Proposed Solution

>> We'd like to propose a profiling mode to XRay, called 'xray-profiling'.
The intent is to implement handlers that will generate profile data [3]
that other tools can digest. We'll export the data in an XRay-specific
format for a more compact representation. We intend to implement conversion
tools to support the profile.proto format that pprof can handle as input.

>> To do this, we need to record the "stack" of functions that are XRay
instrumented, represented as a prefix tree (or forest, as we'll see later).
This prefix tree will represent the XRay instrumented functions that have
been entered, and then exited.

>> A() -> B() -> C()
>>      \
>>        -> C()

>> Each node in the tree will contain a histogram of the
latencies/durations of the function by subtracting the timestamp at the
time of exit from the timestamp at entry.

>> When the tracing functionality is finalised and flushed, we end up with
a data file that represents the prefix tree in a summarised form following
the profile.proto format.

>> # Logistics

>> As with the Basic and FDR modes, the profiling mode will be implemented
in compiler-rt and will be default-linked into any XRay instrumented binary.

>> In the future we can provide more targeted archives that can be
controlled in clang, such that users can determine which runtime
implementations they'd like to be linked into the final binary. This work
may be done in parallel, but is unrelated to the work involved with the
XRay profiling mode implementation.

>> # Call for Reviewers

>> One of the issues we're finding with the XRay work is there's very
little community knowledge about what's happening in the project. I would
like to change that by encouraging more reviewers, interested in XRay, to
please sign up to be a reviewer. Eventually, we'd like to grow the list of
people contributing to XRay from the community and this is an opportunity
to do so.

>> # Open Questions

>> * The triggering mechanism is still through programmatically turning
on/off XRay instrumentation in an XRay-instrumented binary. We're exploring
some other options based on ptrace (will send another RFC at some point
about this) and a tool that will explicitly control the XRay tracing
externally. The alternative to this is having a dedicated XRay thread that
will periodically check whether a signal has been tripped, and do the
standardised procedure for enabling/disabling/flushing trace data. Any
alternatives to this that we may not have considered yet?

>> Thanks in advance!


>> References

>> [0] https://github.com/google/pprof
>> [1] http://lists.llvm.org/pipermail/llvm-dev/2017-December/119529.html
>> [2] https://llvm.org/docs/XRayExample.html#flame-graph-generation
>> [3] https://github.com/google/pprof/blob/master/proto/profile.proto
>> --
>> -- Dean

> --
> -- Dean



-- 
Dean


More information about the llvm-dev mailing list