[llvm-dev] [XRay] RFC: LLVM-side Changes for nop-sleds
Dean Michael Berris via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 4 01:51:55 PDT 2016
On Mon, Jul 4, 2016 at 6:27 PM David Chisnall <david.chisnall at cl.cam.ac.uk>
wrote:
> On 4 Jul 2016, at 06:50, Dean Michael Berris via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > We've looked at the following alternatives, and we're looking to the
> community for feedback on both the current implementation and these
> I don’t think that I’ve yet seen an explanation of why you need the NOPs.
> DTrace stopped using them a long time ago, for two reasons:
> 1) The increased code size caused a noticeable increase in i-cache misses,
> even when instrumentation was not actively being used. This caused a
> noticeable probe effect (macroscopic observable performance artefacts even
> when no probes are active) and caused a lot of push-back in adoption.
> 2) On all of the architectures where we support DTrace (currently, I
> believe, x86, x86-64, AArch32, AArch64, MIPS64, and RISC-V) it’s possible
> to do the same thing by moving one of the instructions in the function
> prolog into the generated trampoline for the instrumentation.
> I could understand wanting something more like patchpoints if you want to
> be able to instrument in the middle of a function (along the lines of TESLA
> or CSI), but if you’re just tracing function entry and exit then it doesn’t
> seem like the best solution.
Thanks for the questions, David -- the short version of the answer is that
DTrace (last I checked) requires some help from the kernel, while XRay is
self-contained in the application.
All of your points above are valid, and DTrace is a really powerful tool
for debugging a lot of performance issues. XRay has a few things that
differentiate it from systems like DTrace though:
1) Because we insert the instrumentation sleds only in specific functions
that fit certain criteria (i.e. more selectively) instead of instrumenting
every function, we pay the cost of dormant instrumentation only in
functions that are instrumented. The front-end changes that support
attributes/annotations in the code to force or inhibit instrumentation
give control to the application developer and allow us to limit the cost
along a spectrum -- full coverage costs more, selective coverage can be
tuned, and explicit annotations provide precise control of the
instrumentation.
2) The cost of the instrumentation at run-time is on the order of 100
cycles for the "null-logging" case (mov + trampoline jump, atomic load and
check if not zero). When logging is on, all the cost of instrumentation
stays within the process's address space (in-memory log) -- no additional
overheads external to the
3) The runtime implementation for logging described in the white paper
allows us to balance the coverage (number of instrumentation events we get)
with overheads (the amount of resources used in the logging
implementation). Because we log only very specific things (function id, tsc
deltas in most cases, type of event) and have heuristics to condense the
information we keep (i.e. if entry-exit pairs are under epsilon, we can
omit the entry entirely), we don't need to be quite as complete when
logging and can instead move a lot of the logic into
reconstruction/analysis of the generated traces.
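To make points 2 and 3 a bit more concrete, here is a minimal sketch of the
"null-logging" check and the epsilon heuristic. This is not the XRay
runtime: every name in it (Handler, g_handler, dispatch, InMemoryLog) is
made up for illustration, the log stores raw TSC values rather than deltas
to keep it short, and I'm reading "omit the entry" as dropping the whole
entry-exit pair, which is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names throughout; this is not the XRay runtime API.
enum class EventType : std::uint8_t { Entry, Exit };

struct Record {
  std::int32_t func_id;
  EventType type;
  std::uint64_t tsc;  // raw timestamp here; a real log might store deltas
};

using Handler = void (*)(std::int32_t func_id, EventType type);

// Installed logging handler. nullptr models the "null-logging" case:
// the sleds are patched in, but nothing is listening.
std::atomic<Handler> g_handler{nullptr};

// Roughly what a trampoline would do after the sled jumps to it:
// one atomic load, one check, and a call only if a handler is set.
inline bool dispatch(std::int32_t func_id, EventType type) {
  Handler h = g_handler.load(std::memory_order_acquire);
  if (h == nullptr) return false;  // nothing installed: return right away
  h(func_id, type);
  return true;
}

// In-memory log that drops an entry-exit pair when the function ran for
// fewer than `epsilon` ticks and nothing was logged in between.
class InMemoryLog {
 public:
  explicit InMemoryLog(std::uint64_t epsilon) : epsilon_(epsilon) {}

  void on_entry(std::int32_t func_id, std::uint64_t tsc) {
    records_.push_back({func_id, EventType::Entry, tsc});
  }

  void on_exit(std::int32_t func_id, std::uint64_t tsc) {
    if (!records_.empty()) {
      const Record &last = records_.back();
      if (last.type == EventType::Entry && last.func_id == func_id) {
        assert(tsc >= last.tsc);  // timestamps are assumed monotonic
        if (tsc - last.tsc < epsilon_) {
          records_.pop_back();  // short call: forget the entry, skip the exit
          return;
        }
      }
    }
    records_.push_back({func_id, EventType::Exit, tsc});
  }

  const std::vector<Record> &records() const { return records_; }

 private:
  std::uint64_t epsilon_;
  std::vector<Record> records_;
};
```

With an epsilon of 100 ticks, an entry at 1000 followed by an exit at 1050
leaves the log empty, while an entry at 2000 and an exit at 2500 leaves two
records. Anything dropped this way has to be accounted for later, during
analysis of the trace.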
There are certainly other approaches to doing selective instrumentation,
and then externally signalling/trapping (with environment support) when
probing. XRay moves the needle towards having the instrumentation,
collection, and even signalling inside the application. This makes sense
if you're deploying the application on a system that doesn't have DTrace
and still want to be able to isolate the costs of instrumentation just to the
I'll admit that I'll need to read a lot more about how DTrace manages to
keep the costs of probes low enough that it could be turned on dynamically
without stopping the process, and without having to intercept more events
than actually necessary (i.e. only on certain functions, and only when it's
on) to be able to provide a more complete answer.
Does this help?