[cfe-dev] RFC: Syringe -- A Dynamic Behavior Injection Framework
    Paul Kirth via cfe-dev 
    cfe-dev at lists.llvm.org
       
    Thu Sep  6 15:01:44 PDT 2018
    
    
  
On Thu, Sep 6, 2018 at 4:05 AM Gábor Márton <martongabesz at gmail.com> wrote:
> TL;DR: I have worked on the same problem in the past.
> I even gave a presentation about it at the last LLVM dev conference in
> Bristol, https://www.youtube.com/watch?v=mv60fYkKNHc .
> Implementation: https://github.com/martong/finstrument_mock .
> White paper:
> https://martong.github.io/compile-time-fci-to-mock_llvm_2018.pdf .
> It seems like your implementation (Syringe) and my realization (fci_mock)
> have a lot in common.
> I think we should merge our efforts and cooperate to provide an
> industrial strength implementation which could land in the future in
> LLVM/Clang (**if the community is interested**) and which may
> revolutionize testing! :)
> I am really happy and excited to see that an industrial giant, Google,
> is interested in having a generic injection framework for C/C++!
>
> ----
>
>
First, thanks for your feedback. It's interesting to see alternative
approaches and discuss the trade-offs. It's unfortunate that I missed your
work in my literature review. I focused most of my attention on fault
injection, but did a review of call interception techniques as well. My
apologies for missing this.
> In my implementation (fci_mock) I faced issues similar to yours, and I
> chose different solutions for some of them. I think this is a good
> place to compare the two.
> Fci_mock has a similar architecture to Syringe.
> It consists of three parts: a compiler instrumentation module
> (backend), a runtime library and a language specific module
> (frontend).
>
> The *instrumentation* module modifies the code to check whether a
> function has to be replaced or not.
> During the code generation we modify each and every function call
> expression to call an auxiliary function.
> By instrumenting the call expressions (and not the function body), we
> have the convenience and benefit that we do not have to recompile a
> dependent library (e.g. glibc or libstdc++) as long as the call
> expression itself is outside of that library.
> This is done in Clang's CodeGen; however, it would be better
> handled as an LLVM pass.
> Having an LLVM pass in Syringe is a great benefit.
> In contrast to Syringe, we instrument all call expressions, but this
> way we don't have to modify anything in the production code.
>
>
Instrumenting call sites was also something we considered, but ultimately
decided against. There is an interesting trade-off between instrumenting
the function definition and its call sites. In the end, we felt modifying
the function itself made the most sense, and allowed us to only modify
targeted portions of the code. That said, there are some compelling use
cases that arise should we wish to enable per-call-site behavior
injection, where the behavior at each call site could be modified
independently. This of course makes the overall system more complex, but
presents some interesting opportunities to consider moving forward.
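For readers weighing the two options, the sketch below hand-writes what function-body instrumentation amounts to in C++. It assumes a Syringe-like lowering in which the original body is kept under a new name and the instrumented definition becomes a stub that calls through a global function pointer; the names (`parse_port`, `parse_port_slot`, `enable_payload`, and so on) are hypothetical and are not taken from the Syringe prototype.

```cpp
#include <cassert>

// Hypothetical result of body instrumentation: the original body is
// preserved under a new name...
int parse_port_original(int raw) { return raw > 0 && raw < 65536 ? raw : -1; }

// ...the injected behavior (payload) lives in a separate function...
int parse_port_payload(int) { return -1; }  // simulate a failure

// ...and a single global pointer decides which body runs.
int (*parse_port_slot)(int) = &parse_port_original;

// The instrumented definition is now just an indirect-call stub, so every
// caller observes the same active behavior.
int parse_port(int raw) { return parse_port_slot(raw); }

// Hypothetical runtime toggles.
void enable_payload() { parse_port_slot = &parse_port_payload; }
void disable_payload() { parse_port_slot = &parse_port_original; }
```

Because only the definition changes, every call site picks up the new behavior at once and callers need no recompilation. The flip side is that a function whose definition is not compiled with the instrumentation (for example, one living in a prebuilt library) cannot be targeted this way.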
> The *runtime library* provides functions to set up the replacements
> (_substitue_function for C, SUBSTITUTE macro for C++).
> This macro uses the new C++ intrinsic (__function_id), which yields a
> unique identifier for each C++ function, even if it is virtual.
> Here is a simple example of replacing a function template instantiation:
>   // unit_under_test.hpp
> template <typename T>
> T FunTemp(T t) {
>     return t;
> }
>   // test.cpp
> #include "unit_under_test.hpp"
> int fake_FunTemp(int p) { return p * 3; }
> TEST_F(FooFixture, FunT) {
>     SUBSTITUTE(FunTemp<int>, fake_FunTemp);
>     int p = 13;
>     auto res = FunTemp(p);
>     EXPECT_EQ(res, 39);
> }
>
> The *frontend* part is actually the implementation of __function_id.
> We modified the compiler to parse a new kind of unary expression when
> the __function_id literal is given and the test specific
> instrumentation is enabled.
> In case of free functions and static member functions this unary
> expression has the very same type which we would get in case of the
> "address of" unary expression:
>   void foo();
>   void bar() {
>     auto p = & foo; // void (*)()
>     auto q = __function_id foo; // void (*)()
>   }
> However in case of non-static member functions the two expressions
> yield different types:
>   struct X { void foo(); virtual void bar(); };
>   void bar() {
>     auto p = & X::foo; // void (X::*)()
>     auto q = __function_id X::foo; // void (*)()
>     auto r = __function_id X::bar; // void (*)()
>   }
>
>
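The call-site scheme described above can be emulated by hand to see where its costs come from. In this sketch every instrumented call expression becomes a call to an auxiliary function that looks the callee's address up in a substitution table. `substitute`, `call_site`, and the table are hypothetical stand-ins, not the fci_mock runtime; in fci_mock the rewritten calls are generated by the compiler rather than written out.

```cpp
#include <cassert>
#include <map>

// Type-erased function pointer used as the table key. Converting between
// function pointer types with reinterpret_cast and back is well-defined.
using AnyFn = void (*)();

// Hypothetical substitution table: original callee -> replacement.
std::map<AnyFn, AnyFn>& substitutions() {
  static std::map<AnyFn, AnyFn> table;
  return table;
}

// Register a replacement; roughly the job of a SUBSTITUTE-style macro.
template <typename R, typename... Args>
void substitute(R (*original)(Args...), R (*replacement)(Args...)) {
  substitutions()[reinterpret_cast<AnyFn>(original)] =
      reinterpret_cast<AnyFn>(replacement);
}

// What each instrumented call expression turns into: take the callee's
// address, look it up, and call whichever function is registered.
template <typename R, typename... Args>
R call_site(R (*callee)(Args...), Args... args) {
  auto it = substitutions().find(reinterpret_cast<AnyFn>(callee));
  if (it != substitutions().end())
    callee = reinterpret_cast<R (*)(Args...)>(it->second);
  return callee(args...);
}

// Stands in for a function defined in a precompiled library: its body is
// never touched, yet calls to it can still be redirected.
int library_checksum(int x) { return x * 2; }
int fake_checksum(int x) { return x * 3; }
```

The usage mirrors the `FunTemp` example: after `substitute(&library_checksum, &fake_checksum)`, `call_site(&library_checksum, 13)` returns 39 instead of 26. Since the real call target is only known after a runtime lookup, the optimizer cannot inline it, which is the inlining cost Gábor mentions later in the thread.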
> > First, our implementation is almost completely contained in the LLVM
> backend, and thus has no real understanding of C++. While we can currently
> use the mangled names of functions to achieve our desired result, this is
> cumbersome and error prone. There are additional limitations when
> considering C++ Templates and class hierarchies. Right now, class methods
> can be instrumented and replaced by payloads in the same class hierarchy.
> In this case, an injected method must inherit from the target method’s
> class and override the target function. This has additional challenges if
> the function is virtual, since our runtime uses function addresses to
> resolve which function should be modified. As a result, we do not support
> injection of virtual methods outside of the Itanium ABI, where we can
> reliably index into the vtable and thus perform the correct behavior in the
> runtime. We currently consider C++ templates completely out of scope for
> the current implementation, chiefly because they are too cumbersome to use
> without support from the frontend.
>
> In fci_mock, we first used the same Itanium ABI-dependent solution,
> but later we implemented the __function_id intrinsic.
>
> > 1. Alleviate the need to mangle function names
> > 3. Add support for C++ Templates
> Fci_mock works with direct function pointers and the substitution is
> happening at runtime, during the test setup. Thus, there is no need to
> use mangled names for the substitution.
> However, it seems like Syringe has the benefit that the replacement
> happens at load time, before runtime; am I right?
>
All of the work done in Syringe is done at compile time, with the exception
of runtime initialization and the actual callbacks into the runtime to
dynamically enable/disable the active behavior. The only part of the
runtime that needs to be bootstrapped is initializing the metadata into a
searchable list, which happens before main() begins execution. Ideally, we
could store the metadata in read-only data and avoid initializing the
runtime at all. This just wasn't fully implemented in my prototype.
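The bootstrapping step described here, turning compiler-emitted metadata into a searchable list before `main()` runs, might look roughly like the following. The entry layout ({instrumented function, original body, payload, pointer slot}) and the names `SyringeEntry`, `as_key`, and `syringe_set_active` are assumptions for illustration; the prototype's real metadata format may differ. As in the prototype, function addresses serve as the lookup keys.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Type-erased function pointer; reinterpret_cast round-trips are well-defined.
using AnyFn = void (*)();

template <typename F>
AnyFn as_key(F* fn) { return reinterpret_cast<AnyFn>(fn); }

// Hypothetical metadata record emitted per instrumented function.
struct SyringeEntry {
  AnyFn target;    // lookup key: address of the instrumented function
  AnyFn original;  // preserved original body
  AnyFn payload;   // injected behavior
  AnyFn* slot;     // pointer the instrumented stub calls through
};

// One instrumented function, written out by hand.
int read_sensor_original() { return 42; }
int read_sensor_payload() { return -1; }
AnyFn read_sensor_slot = as_key(&read_sensor_original);
int read_sensor() { return reinterpret_cast<int (*)()>(read_sensor_slot)(); }

// The searchable list, filled and sorted during static initialization,
// i.e. before main() starts.
std::vector<SyringeEntry> g_metadata;

struct SyringeInit {
  SyringeInit() {
    g_metadata.push_back({as_key(&read_sensor), as_key(&read_sensor_original),
                          as_key(&read_sensor_payload), &read_sensor_slot});
    std::sort(g_metadata.begin(), g_metadata.end(),
              [](const SyringeEntry& a, const SyringeEntry& b) {
                return std::less<AnyFn>{}(a.target, b.target);
              });
  }
} g_syringe_init;

// Runtime entry point: enable or disable the payload for `target`.
// Returns false if `target` is not an instrumented function.
template <typename F>
bool syringe_set_active(F* target, bool active) {
  AnyFn key = as_key(target);
  auto it = std::lower_bound(
      g_metadata.begin(), g_metadata.end(), key,
      [](const SyringeEntry& e, AnyFn k) { return std::less<AnyFn>{}(e.target, k); });
  if (it == g_metadata.end() || it->target != key) return false;
  *it->slot = active ? it->payload : it->original;
  return true;
}
```

With the list sorted, each enable/disable request is a binary search keyed by address. Emitting the table pre-sorted into read-only data could remove the need for the static constructor, which is the improvement mentioned above.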
> > 2. Directly support C++ class hierarchy
> > 4. Add new intrinsics to directly handle runtime lookups (i.e. directly
> insert real addresses for class methods without (ab)using the Itanium ABI)
> I think with the __function_id intrinsic these problems are handled/solved.
>
> > Because Clang understands the class hierarchy, we can add a new
> annotation for class methods that will take the target base class as a
> parameter. Clang, in Sema, can look up the base class and add the correct
> payload annotation to the resulting LLVM function. Similarly for Templates,
> any instantiated template function, or dependent method, can have its
> payload forcibly instantiated, and have the new instantiation correctly
> tagged. This requires that for templates the target and payload definitions
> must appear in the same translation unit, so that their instantiations can
> be correctly resolved. While this forces a change to the actual source code
> (even if it is only an #include directive), it seems to be a reasonable way
> to offer support for a core language feature.
>
> Fci_mock's call expression instrumentation forcibly fetches the address
> of every callee (even of function template instantiations). Thus we
> indirectly trigger an instantiation, so this problem does not arise
> there. This comes at a severe price, though: it kills inlining
> entirely; nothing is inlined.
>
>
This is a trade-off. Syringe tries to be minimally invasive and have zero
impact on uninstrumented code, but using indirect calls means that
instrumented functions cannot be inlined in a useful way (inlining a stub
isn't much benefit).  As an alternative we could change the way function
bodies are modified to try to help the inliner, i.e. avoiding indirect
calls and using global booleans instead of function pointers. This is
probably a better discussion to have once we know whether the community is
actually interested in having a framework like this, since this is a
performance/design problem rather than a fundamental limitation.
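The inliner-friendly alternative can be pictured by putting the two stub shapes side by side. This is an assumption about how such a lowering might look, not what the prototype emits, and the names are hypothetical. With a function pointer the optimizer cannot see the call target, whereas with a global boolean both branches are direct calls, so the original body can still be inlined into the stub and the stub into its callers.

```cpp
#include <cassert>

// Hypothetical original body and payload for an instrumented function.
inline int scale_original(int x) { return x * 10; }
inline int scale_payload(int) { return 0; }

// Variant 1 (pointer-based, like the prototype): an indirect call, so the
// optimizer cannot tell which function will run.
int (*scale_slot)(int) = &scale_original;
int scale_via_pointer(int x) { return scale_slot(x); }

// Variant 2 (flag-based): a global boolean selects between two direct
// calls. In the common, disabled case the compiler sees a plain call to
// scale_original and remains free to inline it.
bool scale_payload_active = false;
int scale_via_flag(int x) {
  if (scale_payload_active) return scale_payload(x);
  return scale_original(x);
}
```

A real lowering could additionally mark the payload branch as unlikely (for example with `__builtin_expect` in GCC and Clang) so that the disabled path stays on the fast, fall-through side.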
> > Lastly, calls into the Syringe runtime currently use function addresses
> as keys to manipulate the target function pointer. It should be possible to
> use some new intrinsic(s) that can correctly resolve the address of
> functions and methods without relying on ABI details. Because the compiler
> will be aware of how Syringe works, it should be possible to have the
> compiler directly insert the correct address while providing an intuitive
> API to the user.
>
> Yes, it is possible and already implemented: see __function_id
>
> https://github.com/martong/clang/compare/finstrument_mock_0...martong:finstrument_mock
>
> > Syringe was designed to help automate behavior injection by
> understanding a small set of trigger conditions that could be responsible
> for enabling and disabling the injected behavior. In our initial designs
> these triggers were often based on profiling counters that could be used to
> toggle the behavior after some threshold was exceeded. Currently, this is
> left up to the programmer, but our YAML configuration already supports
> these sorts of annotations. In principle, there is no reason why this
> quality-of-life instrumentation should not be implemented as the use and
> design of Syringe solidify.
>
> Fci_mock uses the test file, i.e., a separate translation unit, to set up
> the replacement configurations.
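A counter-based trigger of the kind mentioned above, one that switches the injected behavior on once a threshold is exceeded, could be as small as the sketch below. The email only says the YAML configuration can already express such annotations; the `ThresholdTrigger` type, the simple call counter standing in for a profiling counter, and the choice to evaluate the trigger inside the stub are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical trigger: report "not yet" for the first `threshold` calls,
// then keep firing on every call after that.
struct ThresholdTrigger {
  explicit ThresholdTrigger(std::uint64_t t) : threshold(t) {}
  bool fire() { return ++count > threshold; }

  std::uint64_t threshold;
  std::uint64_t count = 0;
};

int allocate_original(int n) { return n; }  // pretend success: returns size
int allocate_payload(int) { return -1; }    // injected failure

ThresholdTrigger allocate_trigger(3);

// Instrumented stub that consults the trigger on every call, e.g. to
// simulate a resource running out after a few successful allocations.
int allocate(int n) {
  return allocate_trigger.fire() ? allocate_payload(n) : allocate_original(n);
}
```

With a threshold of 3, the first three calls to `allocate` behave normally and every later call takes the injected failure path.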
>
> ----
>
> As I see it, there are a few open questions in both of our solutions:
> - How to handle constexpr functions? I have some ideas about that, but
> this is not trivial.
>
I might consider constexpr functions to be out of scope, as Syringe is a
framework for dynamically injecting new behaviors rather than modifying
compile time results. Syringe can still work for constexpr functions that
are evaluated at runtime, but that might be incongruent with compile-time
results elsewhere in the program. Compile time evaluation has a lot of
sharp edges, so I'm not sure of the best policy here. This is one of the
details that I think requires a broader discussion.
> - Can we replace a constructor / destructor? Destructors seem easier,
> and I had some early experiments with that, but getting the address of
> a constructor is hard because of injected class names.
>
>
I haven't given much thought to whether there are subtle issues with our
approach to constructors and destructors. This is one reason we wanted to
solicit feedback from the community: there are probably several subtle
issues that we have failed to fully consider. In our experiments thus far,
they worked as expected, but our tests may not be thorough enough to fully
exercise this problem space.
> My experience with fci_mock shows that it is possible to replace
> (almost) every function (function template instantiations and virtual
> member functions too), but this came at the price of killing inlining.
> The overall performance was therefore only slightly better than what we
> get with -finstrument-functions. And then there are still the
> constructors and destructors.
> Also, with these solutions (both fci_mock and Syringe) we can replace
> only functions, but I wanted to be able to replace types too. Thus, I
> looked for other solutions that work at compile time. One of my
> experimental ideas, with a prototype, reuses the Clang ASTImporter in a
> special way: https://martong.github.io/ast-mock_sqamia_2018.pdf
> My third idea is based on compile-time reflection, but that is far
> from mature (it will probably be published in my PhD dissertation).
>
> Cheers,
> Gabor
>
Overall I think this is a nice alternative/complementary approach to
behavior injection. Should there be sufficient interest I wouldn't be
opposed to broadening the framework to support call site instrumentation,
though I think doing so will be a fairly significant design change. Right
now Syringe is tiny (I think the LLVM pass is only ~350 lines plus some
boilerplate for adding annotations/pass registry/YAML support/etc.), and
the runtime is also very small. To me, this is a selling point, and even
after adding frontend support, Syringe should remain a small extension to
the existing compiler infrastructure. Adding call site based behavior
injection will probably require a redesign to keep Syringe reasonably small
and provide a natural interface for controlling instrumentation.
-- 
Paul Kirth