[llvm-dev] RFC: XRay Profiling in LLVM
Dean Michael Berris via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 14 18:34:38 PST 2018
Implement an XRay mode 'xray-profiling' that gathers stack trace
latencies/durations and builds histograms to provide basic statistics about
where time is going in an execution of the application.
XRay has two modes currently implemented in compiler-rt: a basic (nee
naive) mode and flight data recorder (FDR) mode.
Basic mode logging, when enabled, will collect traces and write verbose
logs to disk. This mode has some unsophisticated filtering and stack-depth
limiting support. This mode is useful for tracing short-lived applications,
so that execution can be traced throughout the lifetime of the application.
We've used basic mode logging to identify sources of latency in Clang/LLVM.
FDR mode logging is designed to record traces and aggressively filter out
events that fall below a latency threshold. This mode uses a fixed amount
of heap space to record events as they're happening, and only writes out
the logs on demand. FDR mode is used mostly for long-running applications
We also already have tools in LLVM that can turn these XRay traces into
other forms that are easier to digest. One tool turns XRay traces into
something that the Chrome Trace Viewer can render  and another
summarises it to generate latency flame graphs .
# Problem Statement
Both basic and FDR mode logging can create massive traces. One example,
where a fully-instrumented clang binary was traced with XRay's basic mode,
we found that it could generate an 89GB trace compiling the canonical
"Hello, World!" program.
One way to address this problem is to limit the amount of memory being used
by the implementation through flags. Unfortunately that also means basic
mode XRay will block more on writing out the log files.
Filtering at runtime also takes some CPU time, so the overheads of writing
out the log as well as filtering the records become non-trivial for heavily
# Proposed Solution
We'd like to propose a profiling mode to XRay, called 'xray-profiling'. The
intent is to implement handlers that will generate profile data  that
other tools can digest. We'll export the data in an XRay-specific format
for a more compact representation. We intend to implement conversion tools
to support the profile.proto format that pprof can handle as input.
To do this, we need to record the "stack" of functions that are XRay
instrumented, represented as a prefix tree (or forest, as we'll see later).
This prefix tree will represent the XRay instrumented functions that have
been entered, and then exited.
A() -> B() -> C()
Each node in the tree will contain a histogram of the latencies/durations
of the function by subtracting the timestamp at the time of exit from the
timestamp at entry.
When the tracing functionality is finalised and flushed, we end up with a
data file that represents the prefix tree in a summarised form following
the profile.proto format.
As with the Basic and FDR modes, the profiling mode will be implemented in
compiler-rt and will be default-linked into any XRay instrumented binary.
In the future we can provide more targeted archives that can be controlled
in clang, such that users can determine which runtime implementations
they'd like to be linked into the final binary. This work may be done in
parallel, but is unrelated to the work involved with the XRay profiling
# Call for Reviewers
One of the issues we're finding with the XRay work is there's very little
community knowledge about what's happening in the project. I would like to
change that by encouraging more reviewers, interested in XRay, to please
sign up to be a reviewer. Eventually, we'd like to grow the list of people
contributing to XRay from the community and this is an opportunity to do so.
# Open Questions
* The triggering mechanism is still through programmatically turning on/off
XRay instrumentation in an XRay-instrumented binary. We're exploring some
other options based on ptrace (will send another RFC at some point about
this) and a tool that will explicitly control the XRay tracing externally.
The alternative to this is having a dedicated XRay thread that will
periodically check whether a signal has been tripped, and do the
standardised procedure for enabling/disabling/flushing trace data. Any
alternatives to this that we may not have considered yet?
Thanks in advance!
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev