[llvm-dev] [XRay] RFC: LLVM-side Changes for nop-sleds

Sun Jul 3 22:50:47 PDT 2016

Hi llvm-dev (cc google-xray),

As a follow-up to the first XRay RFC [0] introducing the technology, I've
recently been able to implement a functional prototype of the major parts
of the XRay functionality [1]. This RFC is limited to exploring potential
alternatives to the current LLVM-side changes, with the aim of getting
clear guidance for landing the changes in LLVM first.

Background / Current Implementation
=============================

XRay relies on statically inserted instrumentation points (implemented as
nop-sleds) and a dynamic enable/disable mechanism implemented in a runtime
library. As of this writing, the XRay prototype adds two pseudo-instructions
(PATCHABLE_FUNCTION_ENTER, PATCHABLE_RET) that serve as placeholders for
where the nop-sleds are to be emitted when lowering.
PATCHABLE_FUNCTION_ENTER takes no operands and serves as a pure
placeholder. PATCHABLE_RET behaves as a return instruction (isReturn = true)
and wraps the original return instruction along with all of its operands; it
replaces the return instructions, and when lowered it unpacks into the
appropriate nop-sled-padded return sequence.

We rely on a MachineFunctionPass (XRayInstrumentation) to find IR functions
with XRay-specific attributes (function-instrument=xray-{always,never}, or
xray-instruction-threshold=N) and insert the pseudo-instructions into their
machine instructions, which are then lowered appropriately. While lowering,
we keep track of the instrumentation points marked by the lowered
pseudo-instructions and generate a per-function COMDAT/ELF group section,
merged into a special section (xray_instr_map). Currently we only implement
the lowering for x86_64 ELF.
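As a rough illustration of the attribute check described above, here is a
minimal, self-contained C++ sketch of how a pass like XRayInstrumentation
might decide whether to instrument a function. It does not use the LLVM API:
`FnAttrs` and `shouldInstrument` are hypothetical stand-ins, and the
precedence rules (xray-always forces instrumentation, xray-never suppresses
it, otherwise the function must have at least N machine instructions) are
assumptions for illustration, not a description of D19904.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an IR function's string attributes, e.g.
// {"function-instrument", "xray-always"}. Not the LLVM Attribute API.
using FnAttrs = std::map<std::string, std::string>;

// Sketch of the decision made before inserting PATCHABLE_FUNCTION_ENTER /
// PATCHABLE_RET. Assumed rules: "xray-always" forces instrumentation,
// "xray-never" suppresses it, and otherwise a function is instrumented only
// if it has at least `xray-instruction-threshold` machine instructions.
bool shouldInstrument(const FnAttrs &Attrs, unsigned NumMachineInstrs) {
  auto Mode = Attrs.find("function-instrument");
  if (Mode != Attrs.end()) {
    if (Mode->second == "xray-always")
      return true;
    if (Mode->second == "xray-never")
      return false;
  }
  auto Threshold = Attrs.find("xray-instruction-threshold");
  if (Threshold == Attrs.end())
    return false; // no XRay attributes: leave the function untouched
  assert(!Threshold->second.empty() && "threshold must be a number");
  return NumMachineInstrs >= std::stoul(Threshold->second);
}
```

Under these assumed rules, a threshold keeps very small functions (where a
sled's overhead would presumably dominate) uninstrumented unless they are
explicitly marked xray-always.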

All these changes are implemented in http://reviews.llvm.org/D19904.

Challenges
=========

This implementation approach poses two major challenges just on the LLVM
(core) side of the implementation:

1) The pseudo-instructions need to be handled specially on each platform
to which XRay is ported. At this time we're exploring implementing
(and accepting help from the community to complete) PPC and ARM support,
spelling the nop-sleds differently for those architectures. Since the
prototype only supports ELF sections, we're also looking for a portable,
clean way of finding and coalescing the instrumentation point locations.
Some choices made in the current implementation may not work, or may not
transfer cleanly, to other architectures or formats/OSes (MachO, COFF,
a.out (?)).
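To make the portability question concrete, the sketch below assumes a
hypothetical fixed-width record per instrumentation point (`SledEntry`, with
the sled address, the enclosing function's address, and the sled kind) and
shows the coalescing step: per-function groups, such as the prototype's
per-function COMDAT groups, are concatenated into one table sorted by sled
address. The record layout and function names are illustrative only. Nothing
in this step depends on ELF, which is the property a MachO or COFF port would
need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical kinds of instrumentation point.
enum class SledKind : uint8_t { FunctionEnter = 0, FunctionExit = 1 };

// Hypothetical, object-format-independent record for one sled. Fixed-width
// fields so the same layout could be placed in any section type.
struct SledEntry {
  uint64_t Address;  // address of the nop-sled
  uint64_t Function; // address of the enclosing function
  SledKind Kind;
};

// Concatenate the per-function groups and sort by sled address, giving a
// single table a runtime could walk (or binary-search by address).
std::vector<SledEntry>
coalesce(const std::vector<std::vector<SledEntry>> &PerFunction) {
  std::vector<SledEntry> Table;
  for (const auto &Group : PerFunction)
    Table.insert(Table.end(), Group.begin(), Group.end());
  std::sort(Table.begin(), Table.end(),
            [](const SledEntry &A, const SledEntry &B) {
              return A.Address < B.Address;
            });
  return Table;
}

// Linear scan for the sleds of one function (e.g. to patch just that
// function when it is enabled at runtime).
std::vector<SledEntry> sledsFor(const std::vector<SledEntry> &Table,
                                uint64_t FnAddr) {
  std::vector<SledEntry> Out;
  std::copy_if(Table.begin(), Table.end(), std::back_inserter(Out),
               [FnAddr](const SledEntry &E) { return E.Function == FnAddr; });
  return Out;
}
```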

2) We currently only instrument "normal" function entries and exits, with
a 1:1 correspondence between each type of instrumentation point and a
pseudo-instruction. This means that when we start supporting other exit
points (exception throwing, catch returns, tail calls, sibling calls), we
need to implement new pseudo-instructions and port them to every platform
where XRay is supported. This proliferation of pseudo-instructions seems
undesirable, and a different approach may scale better.
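One way to avoid a new pseudo-instruction per exit point is to carry the
kind of instrumentation point as an operand. The C++ sketch below is
hypothetical (`SledKind`, `PatchableEvent`, `x86_64Supports` and
`lowerEvents` are not part of D19904). A new kind becomes a new enumerator
plus a case in each target's lowering, and a target that cannot yet spell a
given kind can skip it.

```cpp
#include <cassert>
#include <vector>

// Kinds of instrumentation point; extending coverage means adding here.
enum class SledKind { FunctionEnter, FunctionExit, TailCall, ExceptionExit };

// A single pseudo-instruction shape for every instrumentation point: the
// kind is an operand rather than a separate opcode.
struct PatchableEvent {
  SledKind Kind;
};

// Per-target capability check (x86_64 shown). Only entry and normal exit
// are handled, matching what the prototype currently instruments.
bool x86_64Supports(SledKind K) {
  switch (K) {
  case SledKind::FunctionEnter:
  case SledKind::FunctionExit:
    return true;
  case SledKind::TailCall:
  case SledKind::ExceptionExit:
    return false; // not yet implemented for this target
  }
  return false;
}

// Lower a sequence of events, skipping unsupported kinds. Returns how many
// sleds would be emitted; a real backend would emit the sled bytes here.
unsigned lowerEvents(const std::vector<PatchableEvent> &Events) {
  unsigned Emitted = 0;
  for (const PatchableEvent &E : Events)
    if (x86_64Supports(E.Kind))
      ++Emitted;
  return Emitted;
}
```

A side benefit of switching over an `enum class` without a `default` is
that Clang and GCC (with -Wswitch, enabled by -Wall) warn at every target's
switch that has not handled a newly added kind.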

Alternatives
=========

We've looked at the following alternatives, and we're looking to the
community for feedback on both the current implementation and these
alternatives.

LLVM Functions
----------------------

Instead of using pseudo-instructions, use intrinsic functions [2] that are
part of the IR. These could be emitted at a higher level by front-ends
(like Clang) and threaded through the various IR transformations and
optimisations. This approach has pros and cons; the ones we know about are
listed below:

Pro:
+ We can encode variance in the sleds as function arguments (this scales
better to more kinds of instrumentation points).
+ The calls are visible in the IR instead of being magically inserted
when lowering (which could aid debugging, understanding, and reasoning).
+ If a platform doesn't yet support XRay instrumentation, we can
trivially remove the calls when lowering.

Cons:
- We're unsure whether we can still enforce the layout of emitted code,
especially for the return sleds. Since the return sleds (on x86_64) are
spelled as `ret; <10-byte nops>`, lowering and legalizing them may require
some acrobatics, potentially making this approach inferior to the
pseudo-instruction approach.
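For reference, the x86_64 return sled described above can be written out
byte by byte: 1 byte of `ret` plus 10 bytes of nops, 11 bytes in total. This
sketch rests on one assumption: the 10-byte NOP shown
(`66 2E 0F 1F 84 00 00 00 00 00`, i.e. `nopw %cs:0x0(%rax,%rax,1)`) is one
common encoding, and the prototype may emit a different but equivalent
sequence. What matters is that the `ret` and the nops are adjacent and in
this order. That is easy to guarantee when one pseudo-instruction expands
into both, and harder when they come from separate IR-level calls that later
passes may reorder or split.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// One common 10-byte x86-64 NOP encoding: nopw %cs:0x0(%rax,%rax,1).
// Assumption: the prototype may use a different, equivalent sequence.
constexpr uint8_t Nop10[10] = {0x66, 0x2E, 0x0F, 0x1F, 0x84,
                               0x00, 0x00, 0x00, 0x00, 0x00};

// Byte layout of an x86-64 return sled: `ret` immediately followed by
// 10 bytes of nops, which the runtime can later patch.
std::vector<uint8_t> x86_64ReturnSled() {
  std::vector<uint8_t> Sled;
  Sled.push_back(0xC3); // ret (near return)
  Sled.insert(Sled.end(), std::begin(Nop10), std::end(Nop10));
  return Sled;
}
```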

More Magic
----------------

Instead of using pseudo-instructions, we rely solely on the
presence/absence of attributes and then special-case the start-of-function
(prologue), end-of-function (epilogue), and return instruction lowering for
platforms where XRay is supported. This entails adding special-case
function calls in strategic places in the compiler, with all the logic
embedded in the LLVM code base (in lib/CodeGen, lib/Target, etc.). This
approach also has pros and cons:

Pro:
+ All XRay logic can be hidden behind an interface purely in LLVM code,
with no need to expose logic in the IR or in MC.
+ Sidesteps the issues of lowering pseudo-instructions on each platform,
since the correct instrumentation points are inserted on a
platform-by-platform basis.
+ Allows iterating on the implementation purely in LLVM code, testing
logic in isolation, and making incremental changes to internals.

Cons:
- This involves much more work, touching more places where instrumentation
points might be inserted. An initial attempt would involve teaching the
various stack adjustment routines, prologue/epilogue emission, return
instruction lowering, the legalizer, and late-stage optimisations how to
handle XRay-specific instrumentation.
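The following hypothetical C++ sketch illustrates the shape of the "More
Magic" alternative and why it touches so many places. There are no
pseudo-instructions, only hooks (`XRayHooks`, an invented interface) that
each codegen stage must remember to call when a function carries XRay
attributes. `emitFunction` is a toy stand-in for prologue emission and
return lowering; in LLVM the equivalent calls would be spread across
lib/CodeGen and lib/Target.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical hook interface: the only XRay-facing surface in this design.
class XRayHooks {
public:
  virtual ~XRayHooks() = default;
  virtual void onPrologue(const std::string &Fn) = 0;
  virtual void onReturn(const std::string &Fn) = 0;
};

// Recording implementation standing in for per-target sled emission, which
// also makes the hook logic testable in isolation.
class RecordingHooks : public XRayHooks {
public:
  std::vector<std::string> Log;
  void onPrologue(const std::string &Fn) override {
    Log.push_back(Fn + ":entry");
  }
  void onReturn(const std::string &Fn) override { Log.push_back(Fn + ":exit"); }
};

// Toy codegen driver. Each stage that can create an instrumentation point
// checks the attribute and calls the matching hook; any stage that forgets
// silently loses instrumentation.
void emitFunction(const std::string &Fn, unsigned NumReturns,
                  bool HasXRayAttr, XRayHooks &Hooks) {
  if (HasXRayAttr)
    Hooks.onPrologue(Fn); // prologue emission
  for (unsigned I = 0; I < NumReturns; ++I)
    if (HasXRayAttr)
      Hooks.onReturn(Fn); // return instruction lowering
}
```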

Open Questions
=============

There are some other open questions for the community at large:

* Looking at the current implementation, are there major objections to
committing it and iterating on it, knowing that it can evolve later as we
learn more about implementing XRay (and other instrumentation routines) in
LLVM?

* Are there other risks we haven't considered yet for having something like
XRay embedded as a supported instrumentation mechanism in LLVM?

* Given the current implementation in http://reviews.llvm.org/D19904, do
you have suggestions on how to partition it into smaller changes that could
be reviewed/landed more easily than a single patch?

Roadmap for Context
=================

Note that this RFC focuses only on the LLVM-side changes. To put this in
context, we're looking to land changes in the following order:

- LLVM Changes (subject of this RFC)
- Changes in compiler-rt (the runtime implementing dynamic patching and
in-memory logging)
- Changes in Clang to support emitting XRay-instrumented C/C++ (and maybe
Obj-C) binaries
- Tools for analysing XRay traces generated by XRay-instrumented binaries

I have some changes in the works to get the in-memory logging implementation
(a naive implementation) and a simple function call accounting tool working
on top of the existing public patches. Hopefully, as soon as we get clear
guidance on the subject of this RFC, more of the implementation described
in the white paper [2], in terms of the logging heuristics and runtime
enabling/disabling, can proceed in earnest.
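As an illustration of what a naive function call accounting tool might
compute, the sketch below assumes a hypothetical in-memory log of entry/exit
records (`Record`, holding a function id, an entry/exit flag and a
timestamp) from a single thread. It is not the runtime's actual log format.
Each exit is matched with the most recent entry, which yields per-function
call counts and inclusive time (time spent in callees included).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical log record; the real runtime's format may differ.
struct Record {
  int32_t FuncId;
  bool IsEntry;
  uint64_t TSC; // timestamp
};

struct Stats {
  uint64_t Calls = 0;
  uint64_t TotalTime = 0; // inclusive time
};

// Naive single-thread accounting using an explicit call stack.
std::map<int32_t, Stats> account(const std::vector<Record> &Log) {
  std::map<int32_t, Stats> Result;
  std::vector<Record> Stack;
  for (const Record &R : Log) {
    if (R.IsEntry) {
      Stack.push_back(R);
      continue;
    }
    // Ignore exits that don't match the innermost open call (for example,
    // when logging started in the middle of a call).
    if (Stack.empty() || Stack.back().FuncId != R.FuncId)
      continue;
    Stats &S = Result[R.FuncId];
    ++S.Calls;
    S.TotalTime += R.TSC - Stack.back().TSC;
    Stack.pop_back();
  }
  return Result;
}
```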

--- End of RFC ---

References:

[0] Original XRay RFC:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098901.html

[1] Three patches implement the XRay prototype, updated to track trunk of
LLVM, Clang, and compiler-rt:

http://reviews.llvm.org/D19904 (llvm)
http://reviews.llvm.org/D20352 (clang)
http://reviews.llvm.org/D21612 (compiler-rt)

[2] XRay: A Function Call Tracing System:
http://research.google.com/pubs/pub45287.html