[llvm-dev] [XRay] RFC: LLVM-side Changes for nop-sleds

Mon Jul 4 00:39:24 PDT 2016

I have a few meta questions here.

Why should LLVM (and from the patch it seems Clang) favor one
instrumentation system -- in this case the XRay instrumentation system
vs. many others that may be possible to add to upstream?

GCC has -finstrument-functions, which calls into cyg_.... functions.
Poor naming choice, but I suppose one option would be to use those
names. Or better yet, provide a way on the command line to say which
functions are called on entry and which on exit.
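
For reference, here is a minimal sketch of what the GCC-style hooks look
like from the user's side. The two hook names and signatures are the ones
GCC documents for -finstrument-functions; the counters and the trace
format are assumptions made up for illustration.

```c
#include <stdio.h>

/* Illustrative state (not part of any real runtime): current nesting
   depth of instrumented calls and the total number of entries seen. */
static int call_depth = 0;
static unsigned long total_entries = 0;

/* Called on entry to every function compiled with -finstrument-functions.
   The hook itself must be excluded from instrumentation, otherwise it
   would call itself recursively. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    fprintf(stderr, "%*senter %p\n", call_depth * 2, "", this_fn);
    ++call_depth;
    ++total_entries;
}

/* Called just before every instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)call_site;
    --call_depth;
    fprintf(stderr, "%*sexit  %p\n", call_depth * 2, "", this_fn);
}
```

Linking this file into a program built with -finstrument-functions is
enough for the hooks to fire. Note that the calls are always present in
the generated code, which is the main contrast with sleds that stay
inert until patched at runtime.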

How is this different from the hot patching that exists on Windows? I
suppose this feature makes it more accessible?

If this is added, I hope we can rename it to something generic that
doesn't tie us to the runtime libraries needed for XRay specifically.

On Sun, Jul 3, 2016 at 10:50 PM, Dean Michael Berris via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi llvm-dev (cc google-xray),
>
> As a follow-up to the first XRay RFC [0] introducing the technology, I've
> recently been able to implement a functional prototype of the major parts of
> the XRay functionality [1]. This RFC is limited to exploring potential
> alternatives to the current LLVM-side changes, in the interest of getting
> clear guidance for landing the changes first in LLVM.
>
> Background / Current Implementation
> =============================
>
> XRay relies on statically inserted instrumentation points (implemented as
> nop-sleds) and a dynamic enable/disable mechanism implemented in a runtime
> library. As of this writing the implementation of the XRay prototype
> involves adding two pseudo-instructions (PATCHABLE_FUNCTION_ENTER,
> PATCHABLE_RET) that serve as placeholders for where the nop-sleds are to be
> emitted when lowering. PATCHABLE_FUNCTION_ENTER is an instruction that takes
> no operands and serves as a pure placeholder. PATCHABLE_RET effectively
> behaves as a return instruction (isReturn = true) and wraps whatever the
> return instruction is, along with all operands -- this is used to replace
> the return instructions, and when lowered will unpack into the appropriate
> nop-sled-padded return sequence. We rely on a MachineFunctionPass
> (XRayInstrumentation) to observe IR functions with XRay-specific attributes
> (function-instrument=xray-{always,never}, or xray-instruction-threshold=N),
> and then insert the pseudo-instructions into the machine instructions, which
> get lowered appropriately. While lowering, we keep track of the
> instrumentation points marked by the lowered pseudo-instructions and
> generate a per-function COMDAT/ELF Group section, merged into a special
> section (xray_instr_map). We currently implement the lowering only for
> x86_64 ELF.
>
> All these changes are implemented in http://reviews.llvm.org/D19904.
>
> Challenges
> =========
>
> This implementation approach poses two major challenges just on the LLVM
> (core) side of the implementation:
>
> 1) The pseudo-instructions need to be handled specially for each platform
> to which XRay would be ported. At this time we're exploring implementing
> (and accepting help from the community to complete) PPC and ARM support,
> spelling the nop sleds differently for those architectures. Since the
> prototype only supports ELF sections, we're thinking about a portable/clean
> way of finding/coalescing the instrumentation point locations. We're unclear
> whether some choices made in the current implementation will work or
> transfer cleanly to other architectures or formats/OSes (MachO, COFF,
> a.out (?)).
>
> 2) We currently instrument only "normal" function entries and exits. We
> have a 1:1 correspondence between the type of instrumentation point and the
> pseudo-instructions. This means that when we start implementing various exit
> points (exception throwing, catch returns, tail calls, sibling calls) we
> need to implement new pseudo-instructions and port them to every platform
> XRay supports. This proliferation of pseudo-instructions seems hardly
> desirable, and a different approach might scale better.
>
> Alternatives
> =========
>
> We've looked at the following alternatives, and we're looking to the
> community for feedback on both the current implementation and these
> alternatives.
>
> LLVM Functions
> ----------------------
>
> Instead of using pseudo-instructions, use intrinsic functions [2] that are
> part of the IR. These could be emitted at a higher level by front-ends (like
> Clang) and threaded through the various IR transformations and
> optimisations. There are some pros and cons to this approach, and we're
> attempting to list the ones we know about:
>
> Pros:
> + We can encode variance in the sleds as function arguments (scales better
> to more kinds of instrumentation points we can insert).
> + The IR has the functions in-line, instead of being magically inserted when
> lowering (could be a better aid for debugging/understanding/reasoning).
> + In case the platform doesn't yet support XRay instrumentation, we can
> trivially remove the function calls when lowering.
>
> Cons:
> - We're unsure whether we can still enforce the layout of emitted code,
> especially in the special case of the return sleds. Since the return sleds
> (in x86_64) are spelled as `ret; <10-byte nops>`, there may be some
> acrobatics needed to lower and legalize this, making it potentially inferior
> to the pseudo-instructions approach.
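
To make the layout constraint concrete, here is a rough sketch of the
return sled quoted above as an 11-byte buffer, and of what patching it
could look like. Only the `ret; <10-byte nops>` shape comes from the RFC.
The patched form (mov r10d, imm32 followed by jmp rel32), the function
names, and the use of ten single-byte nops are assumptions for
illustration. The code only edits a byte array and never executes it.

```c
#include <stdint.h>
#include <string.h>

enum { SLED_SIZE = 11 }; /* 1-byte ret + 10 bytes of nop padding */

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_le32(uint8_t *dst, uint32_t v) {
    dst[0] = (uint8_t)(v & 0xFF);
    dst[1] = (uint8_t)((v >> 8) & 0xFF);
    dst[2] = (uint8_t)((v >> 16) & 0xFF);
    dst[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Unpatched sled: ret (0xC3) followed by ten single-byte nops (0x90). */
void sled_init(uint8_t sled[SLED_SIZE]) {
    sled[0] = 0xC3;
    memset(sled + 1, 0x90, SLED_SIZE - 1);
}

/* Hypothetical patched form: mov r10d, func_id ; jmp exit_handler.
   sled_addr is the address the sled would occupy in the running program.
   The jmp displacement is relative to the end of the jmp instruction,
   which is also the end of the sled. */
void sled_patch(uint8_t sled[SLED_SIZE], uint64_t sled_addr,
                uint32_t func_id, uint64_t exit_handler) {
    uint32_t rel = (uint32_t)(exit_handler - (sled_addr + SLED_SIZE));
    sled[0] = 0x41;          /* REX.B prefix                */
    sled[1] = 0xBA;          /* mov r10d, imm32             */
    put_le32(sled + 2, func_id);
    sled[6] = 0xE9;          /* jmp rel32                   */
    put_le32(sled + 7, rel);
}
```

The two instructions fill exactly 11 bytes, which is why the padding has
to stay glued to the ret: if anything gets scheduled between them, the
sled can no longer be overwritten in place. Calling sled_init again
restores the unpatched bytes. A real runtime would also have to worry
about page permissions and about other threads executing the sled while
it is being rewritten, which this sketch ignores.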
>
> More Magic
> ----------------
>
> Instead of using pseudo-instructions, we rely solely on the presence/absence
> of attributes, then special-case the start-of-function (prologue), end of
> function (epilogue), and return instruction lowering for platforms where
> XRay would be supported. This entails adding special-case function calls in
> strategic places in the compiler, the logic all being embedded in the LLVM
> code base (in lib/CodeGen, lib/Target, etc.). There are some pros and cons to
> this approach:
>
> Pros:
> + All XRay logic can be hidden in an interface purely in LLVM code, no need
> for exposing logic in IR or in MC.
> + Sidesteps all issues with lowering instructions in platforms, inserting
> the correct instrumentation points on a platform-by-platform basis.
> + Allows for iterating the implementation purely in LLVM code, testing logic
> in isolation, incremental changes to internals.
>
> Cons:
> - This involves much more work, touching more places where instrumentation
> points might be inserted. An initial attempt involves teaching the various
> stack adjustment routines, prologue/epilogue emission, return instruction
> lowering, the legalizer, and late-stage optimisations how to handle
> XRay-specific instrumentation.
>
> Open Questions
> =============
>
> There are some other open questions to the community at large:
>
> * Looking at the current implementation, are there major objections to
> committing it and iterating, with the knowledge that it can evolve later as
> we learn more about implementing XRay (and other instrumentation routines)
> in LLVM?
>
> * Are there other risks we haven't considered yet for having something like
> XRay embedded as a supported instrumentation mechanism in LLVM?
>
> * Given the current implementation in http://reviews.llvm.org/D19904, do you
> have suggestions on how to partition it into smaller changes that could be
> reviewed/landed more easily than a single patch?
>
> Roadmap for Context
> =================
>
> Note that this RFC focuses only on the LLVM-side changes. To put this in
> context, we're looking to land the changes in the following order:
>
> - LLVM Changes (subject of this RFC)
> - Changes in compiler-rt (the runtime implementing dynamic patching and
> in-memory logging)
> - Changes in Clang to support emitting XRay-instrumented C/C++ (and maybe
> Obj-C) binaries
> - Tools for analysing XRay traces generated by XRay-instrumented binaries
>
> I have some changes in the works to get the in-memory logging implementation
> (a naive implementation) and a simple function call accounting tool working
> on top of the existing public patches. Hopefully, as soon as we get clear
> guidance on the subject of this RFC, more of the implementation described in
> the white paper [2], in terms of the logging heuristics and runtime
> enabling/disabling, can proceed in earnest.
>
> --- End of RFC ---
>
> References:
>
> [0] Original XRay RFC:
> http://lists.llvm.org/pipermail/llvm-dev/2016-April/098901.html
>
> [1] There are three patches that implement the prototype XRay
> implementation, updated to track trunk of LLVM, Clang, and compiler-rt:
>
> http://reviews.llvm.org/D19904 (llvm)
> http://reviews.llvm.org/D20352 (clang)
> http://reviews.llvm.org/D21612 (compiler-rt)
>
> [2] XRay: A Function Call Tracing System:
> http://research.google.com/pubs/pub45287.html