[cfe-dev] RFC: Syringe -- A Dynamic Behavior Injection Framework

Wed Sep 5 20:05:16 PDT 2018

Is this project derived from a real-world use case, or is it just a
good-to-have framework?

On Thu, Sep 6, 2018 at 6:39 AM Paul Kirth via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> TL;DR: During my internship at Google I developed a proof of concept
> framework for supporting dynamic behavior injection. It allows users to
> specify alternate implementations of functions, and dynamically switch
> between the original and new behavior at runtime. It works by dispatching
> the original call through a function pointer that can either point to the
> original function body or the injected version. We would like feedback
> about our approach, and on the community’s interest in adding our framework
> to LLVM.
>
> -----
>
> Overview
>
> Syringe takes a different approach from other systems interested in
> modifying runtime behavior, such as Detours or XRay, and borrows
> inspiration from JIT compilers to inject new behaviors. JIT compilers often
> use stub functions to dispatch execution to specially optimized versions of
> function bodies. Our approach uses the idea of an indirect call through a
> stub function to instead dispatch control flow to either the original
> behavior or the newly injected behavior. Our runtime component exposes APIs
> to allow the user to modify this behavior during execution.
>
> We achieve this through the following steps:
>
>
>    1. Target functions are cloned and renamed.
>    2. The original function body is replaced with a stub.
>    3. The new stub makes an indirect call through a pointer controlled by
>       the Syringe runtime.
>    4. This implementation pointer will either point to the original
>       implementation or the new payload.
>    5. The payload function is given an alias that can be used to resolve its
>       address at link-time.
>    6. Callbacks into the runtime are used to toggle between the two
>       implementations of a target function.
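
A minimal hand-written sketch of steps 1 through 4 and 6, assuming a single target function with the signature `int(int)`. All names (`compute`, `compute_orig`, `compute_payload`, `compute_impl`, `toggle_compute`) are hypothetical; in the proposal the compiler pass generates the clone, stub, and pointer, and step 5 (the link-time alias) is not modeled here.

```cpp
#include <cassert>

// Step 1: the original body, cloned under a new (hypothetical) name.
int compute_orig(int x) { return x + 1; }

// The payload: the alternate behavior to inject (hypothetical).
int compute_payload(int x) { return x * 100; }

// Steps 3 and 4: the implementation pointer, which points at either
// the original implementation or the payload.
using ComputeFn = int (*)(int);
ComputeFn compute_impl = &compute_orig;

// Step 2: the original symbol now holds only a stub that makes an
// indirect call through the implementation pointer.
int compute(int x) {
    assert(compute_impl != nullptr);
    return compute_impl(x);
}

// Step 6: a callback that flips between the two implementations.
void toggle_compute() {
    compute_impl = (compute_impl == &compute_orig) ? &compute_payload
                                                   : &compute_orig;
}
```

Callers keep calling `compute` unchanged; only the value of `compute_impl` decides which body runs.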
>
>
> Syringe introduces the notion of injection sites (the target function
> whose behavior should be changed), and payloads (the new behavior to inject
> into the target function). These functions must always come in pairs, and
> will cause an error if they do not. However, this is a link time error, as
> Syringe payloads are designed such that they can exist in different
> translation units from their target function. Because these bindings don't
> get resolved until linking, we must tie a payload to its injection site. We
> achieve this by creating an alias for the payload based on its target
> function. The runtime registration functions can then reference this alias,
> and the dynamic linker can patch in the correct address when the Syringe
> runtime is being initialized.
>
> The Syringe runtime is very thin. It currently consists of a registration
> function, and a few APIs for changing the active variant for a target
> function. When calling into the runtime, the target function is looked up
> in the runtime metadata, and the implementation pointer’s value is changed
> to the other variant.
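
The lookup-and-swap behavior described above could be sketched roughly as follows. This is an illustration only, not the actual compiler-rt code: `SiteRecord`, `register_site`, and `toggle_site` are hypothetical names, every target is assumed to have the signature `int(int)`, and a linear search over a vector stands in for whatever metadata structure the real runtime uses.

```cpp
#include <cassert>
#include <vector>

using Fn = int (*)(int);

// Hypothetical per-target metadata record.
struct SiteRecord {
    Fn stub;       // address callers use; also the lookup key
    Fn original;   // cloned original body
    Fn payload;    // injected behavior
    Fn* impl_ptr;  // pointer the stub dispatches through
};

std::vector<SiteRecord>& registry() {
    static std::vector<SiteRecord> records;
    return records;
}

// Registration: record the pair and start with the original behavior.
void register_site(Fn stub, Fn original, Fn payload, Fn* impl_ptr) {
    assert(impl_ptr != nullptr);
    *impl_ptr = original;
    registry().push_back({stub, original, payload, impl_ptr});
}

// Toggle: look the target up by address and switch to the other variant.
// Returns false if the address was never registered.
bool toggle_site(Fn stub) {
    for (SiteRecord& r : registry()) {
        if (r.stub == stub) {
            *r.impl_ptr = (*r.impl_ptr == r.original) ? r.payload : r.original;
            return true;
        }
    }
    return false;
}

// A hypothetical instrumented target to exercise the registry.
int scale_orig(int x) { return x * 2; }
int scale_payload(int x) { return x * 3; }
Fn scale_impl = nullptr;
int scale(int x) { return scale_impl(x); }
```

After `register_site(&scale, &scale_orig, &scale_payload, &scale_impl)`, each call to `toggle_site(&scale)` switches `scale` between the two bodies.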
>
> Our current approach involves the following:
>
> 1. *Clang + LLVM*: Add support for function attributes
> (`[[clang::syringe_injection_site]]`,
> `[[clang::syringe_payload("target_function_name")]]`) to indicate whether a
> function should be considered a Syringe injection site or a Syringe payload,
> respectively. The payload attribute requires a parameter to bind the
> Syringe site and payload for use in the runtime.
>
> 2. *Clang*: Add flags (`-fsyringe`,
> `-fsyringe-config-file="/path/to/config.yml"`) to enable and control
> Syringe instrumentation.
>
> 3. *LLVM*: Add a transformation pass that instruments Syringe sites and
> payloads, and generates the necessary initialization functions. Function
> cloning, stub creation, and implementation pointer management are all
> implemented using existing code in LLVM’s ORC library.
>
> 4. *compiler-rt*: Implement a small library called “syringe” that exposes
> the required APIs for toggling between implementations:
>
> `__syringe__toggle_impl(&targetFunction)` uses the function address to
> toggle its implementation.
>
> `__syringe__cxx_toggle_impl(&Class::targetMethod, &class_instance)` looks
> up the target address in the class vtable according to the Itanium ABI, and
> uses that to change the runtime value of the target function pointer.
>
> `__syringe_registration(&orig_function, &orig_impl, &injected_func,
> &impl_ptr)` is used to initialize the runtime data used by Syringe.
>
> Improvements
>
> Our prototype works well in many cases, but has a few shortcomings which
> we would like to address.
>
> First, our implementation is almost completely contained in the LLVM
> backend, and thus has no real understanding of C++. While we can currently
> use the mangled names of functions to achieve our desired result, this is
> cumbersome and error prone. There are additional limitations when
> considering C++ Templates and class hierarchies. Right now, class methods
> can be instrumented and replaced by payloads in the same class hierarchy.
> In this case, the injected method’s class must inherit from the target
> method’s class and override the target function. This poses additional
> challenges if
> the function is virtual, since our runtime uses function addresses to
> resolve which function should be modified. As a result, we do not support
> injection of virtual methods outside of the Itanium ABI, where we can
> reliably index into the vtable and thus perform the correct behavior in the
> runtime. We consider C++ templates completely out of scope for
> the current implementation, chiefly because they are too cumbersome to use
> without support from the frontend.
>
> In light of these shortcomings, we believe that our current implementation
> should be extended with more support from the clang frontend to:
>
> 1. Alleviate the need to mangle function names
>
> 2. Directly support C++ class hierarchy
>
> 3. Add support for C++ Templates
>
> 4. Add new intrinsics to directly handle runtime lookups (i.e. directly
> insert real addresses for class methods without (ab)using the Itanium ABI)
>
> I have been exploring how to achieve this in Clang, and believe that it is
> possible to achieve these properties. Clang can correctly resolve the
> unmangled name and can add a new payload annotation with the mangled name
> if required. Since this is abstracted away from the user, there is little
> downside to directly tagging the functions in this way.
>
> Because Clang understands the class hierarchy, we can add a new annotation
> for class methods that will take the target base class as a parameter.
> Clang, in Sema, can look up the base class and add the correct payload
> annotation to the resulting LLVM function. Similarly for Templates, any
> instantiated template function, or dependent method, can have its payload
> forcibly instantiated, and have the new instantiation correctly tagged.
> This requires that for templates the target and payload definitions must
> appear in the same translation unit, so that their instantiations can be
> correctly resolved. While this forces a change to the actual source code
> (even if it is only an #include directive), it seems to be a reasonable way
> to offer support for a core language feature.
>
> Lastly, calls into the Syringe runtime currently use function addresses as
> keys to manipulate the target function pointer. It should be possible to
> use some new intrinsic(s) that can correctly resolve the address of
> functions and methods without relying on ABI details. Because the compiler
> will be aware of how Syringe works, it should be possible to have the
> compiler directly insert the correct address while providing an intuitive
> API to the user.
>
>
> Other considerations and future work:
>
> Currently Syringe modifies the global definition of a function for the
> entire program. While in some cases this behavior makes sense, there are
> several strong use cases for handling behavior injection on a per-thread
> basis. One solution here is to use thread-local storage to manage these
> pointers on a per-thread basis. It is also possible to manage a global set
> of metadata with per-thread information. Suggestions on approaches here are
> most welcome.
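
One way the thread-local option might look, sketched under the assumption that a per-thread override takes precedence over the program-wide pointer. The names (`negate`, `negate_global_impl`, `negate_thread_impl`, and the two helper functions) are hypothetical, and this is not part of the prototype.

```cpp
#include <cassert>

using Fn = int (*)(int);

int negate_orig(int x) { return x; }      // hypothetical original body
int negate_payload(int x) { return -x; }  // hypothetical payload

// Program-wide default, as in the current design.
Fn negate_global_impl = &negate_orig;

// Per-thread override; nullptr means "fall back to the global pointer".
thread_local Fn negate_thread_impl = nullptr;

// Stub: prefer this thread's override, otherwise use the global pointer.
int negate(int x) {
    Fn target = negate_thread_impl ? negate_thread_impl : negate_global_impl;
    assert(target != nullptr);
    return target(x);
}

// Enable or disable the payload for the calling thread only.
void inject_for_this_thread() { negate_thread_impl = &negate_payload; }
void clear_for_this_thread() { negate_thread_impl = nullptr; }
```

Because `negate_thread_impl` is `thread_local`, a call to `inject_for_this_thread()` changes the behavior of `negate` only on the calling thread; other threads keep following `negate_global_impl`. The cost is an extra load and branch in every stub.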
>
> Syringe was designed to help automate behavior injection by understanding
> a small set of trigger conditions that could be responsible for enabling
> and disabling the injected behavior. In our initial designs these triggers
> were often based on profiling counters that could be used to toggle the
> behavior after some threshold was exceeded. Currently, this is left up to
> the programmer, but our YAML configuration already supports these sorts of
> annotations. In principle, there is no reason why this quality-of-life
> instrumentation should not be implemented as the use and design of Syringe
> solidifies.
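
A counter-based trigger of the kind described could be written by hand roughly as below. This is a single-threaded sketch under stated assumptions: the names (`lookup`, `lookup_calls`, `kThreshold`) and the threshold value are hypothetical, and nothing here reflects the actual YAML annotations.

```cpp
#include <cassert>

using Fn = int (*)(int);

int lookup_orig(int x) { return x + 1; }        // hypothetical original body
int lookup_payload(int x) { return x + 1000; }  // hypothetical payload

Fn lookup_impl = &lookup_orig;
unsigned lookup_calls = 0;          // profiling-style call counter
constexpr unsigned kThreshold = 3;  // hypothetical trigger threshold

// Stub with a trigger: once the call count exceeds the threshold,
// switch the implementation pointer to the payload.
// Not thread-safe; a real version would need atomics or locking.
int lookup(int x) {
    assert(lookup_impl != nullptr);
    ++lookup_calls;
    if (lookup_calls > kThreshold) {
        lookup_impl = &lookup_payload;
    }
    return lookup_impl(x);
}
```

With `kThreshold` set to 3, the first three calls run the original body and the fourth and later calls run the payload.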
>
> Our Syringe prototype currently uses dynamic storage to manage runtime
> metadata. Future versions should transition from this to storing the
> required metadata in read-only memory. The indirect call stubs should also have
> additional CFI checks added, because we statically know that only two valid
> targets for any particular Syringe function pointer exist.
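
The two-valid-targets property can be illustrated with a manual check in the stub. This is only a sketch of the idea, with hypothetical names (`parse`, `is_valid_parse_target`); a compiler-inserted CFI scheme would emit its own checks and would not look like this.

```cpp
#include <cassert>
#include <cstdlib>

using Fn = int (*)(int);

int parse_orig(int x) { return x + 7; }     // hypothetical original body
int parse_payload(int x) { return x - 7; }  // hypothetical payload
int unrelated(int x) { return x; }          // not a valid target for parse

Fn parse_impl = &parse_orig;

// Only two targets are ever legitimate for this stub, so the check is
// a pair of pointer comparisons.
bool is_valid_parse_target(Fn f) {
    return f == &parse_orig || f == &parse_payload;
}

// Stub: refuse to make the indirect call if the pointer was corrupted.
int parse(int x) {
    if (!is_valid_parse_target(parse_impl)) {
        std::abort();
    }
    return parse_impl(x);
}
```

If `parse_impl` is ever overwritten with anything other than the two known addresses, the stub aborts instead of jumping to an attacker-chosen location.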
>
> Questions
>
>    1. Is this something the LLVM community is interested in having?
>    2. Do you have feedback on our proposed approach/improvements?
>
>
>
> --
> Paul Kirth

-- 
*Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this
mail are of my own and my employer has no take in it. *
Thank You.
Madhur D. Amilkanthwar