[cfe-dev] RFC: Syringe -- A Dynamic Behavior Injection Framework

Thu Sep 6 04:05:29 PDT 2018

TL;DR: I have worked on the same problem in the past.
I even gave a presentation about it at the last LLVM dev conference in
Bristol, https://www.youtube.com/watch?v=mv60fYkKNHc .
Implementation: https://github.com/martong/finstrument_mock .
White paper: https://martong.github.io/compile-time-fci-to-mock_llvm_2018.pdf .
It seems that your implementation (Syringe) and mine (fci_mock)
have a lot in common.
I think we should merge our efforts and cooperate to provide an
industrial strength implementation which could land in the future in
LLVM/Clang (**if the community is interested**) and which may
revolutionize testing! :)
I am really happy and excited to see that an industrial giant, Google,
is interested in having a generic injection framework for C/C++!

----

In my implementation (fci_mock) I faced issues similar to yours and I
chose different solutions for some of them. I think this is a good
place to compare the two.
Fci_mock has a similar architecture to Syringe.
It consists of three parts: a compiler instrumentation module
(backend), a runtime library and a language specific module
(frontend).

The *instrumentation* module modifies the code to check whether a
function has to be replaced or not.
During the code generation we modify each and every function call
expression to call an auxiliary function.
By instrumenting the call expressions (and not the function body) we
have the convenience and benefit that we do not have to recompile
dependent libraries (e.g. glibc or libstdc++) as long as the call
expression is in code outside of the library.
This is done in the CodeGen of Clang; however, it would be better
handled as an LLVM pass.
Having an LLVM pass in Syringe is a great benefit.
In contrast to Syringe, we instrument all call expressions, but this
way we don't have to modify anything in the production code.
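
To make the call-site instrumentation described above concrete, here
is a minimal hand-written sketch in plain C++ of what an instrumented
call conceptually does. All names (substitutions, lookup_callee,
substitute, call_site) are made up for illustration and are not the
fci_mock API; in the real thing the lookup is emitted by the compiler.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical runtime state: original function address -> replacement.
using AnyFn = void (*)();

static std::unordered_map<AnyFn, AnyFn>& substitutions() {
    static std::unordered_map<AnyFn, AnyFn> table;
    return table;
}

// Hypothetical setup function, called from the test.
template <typename Fn>
void substitute(Fn original, Fn replacement) {
    substitutions()[reinterpret_cast<AnyFn>(original)] =
        reinterpret_cast<AnyFn>(replacement);
}

// Hypothetical auxiliary function: returns the replacement if one is
// registered for this callee, otherwise the original callee.
template <typename Fn>
Fn lookup_callee(Fn original) {
    auto it = substitutions().find(reinterpret_cast<AnyFn>(original));
    return it == substitutions().end() ? original
                                       : reinterpret_cast<Fn>(it->second);
}

int triple(int x) { return 3 * x; }
int fake_triple(int) { return -1; }

// The programmer writes `triple(x)`; the instrumented call site
// behaves roughly like this instead.
int call_site(int x) { return lookup_callee(&triple)(x); }
```

Before any substitution call_site(2) returns 6; after
substitute(&triple, &fake_triple) it returns -1. Note that the body of
triple is never touched, only the call site is.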

The *runtime library* provides functions to set up the replacements
(_substitute_function for C, SUBSTITUTE macro for C++).
The macro uses the new C++ intrinsic (__function_id), which yields a
unique identifier for each C++ function, even if it is virtual.
Here is a simple example to replace a template function (instantiation):
  // unit_under_test.hpp
  template <typename T>
  T FunTemp(T t) {
      return t;
  }
  // test.cpp
  #include "unit_under_test.hpp"
  int fake_FunTemp(int p) { return p * 3; }
  TEST_F(FooFixture, FunT) {
      SUBSTITUTE(FunTemp<int>, fake_FunTemp);
      int p = 13;
      auto res = FunTemp(p);
      EXPECT_EQ(res, 39);
  }

The *frontend* part is actually the implementation of __function_id.
We modified the compiler to parse a new kind of unary expression when
the __function_id literal is given and the test specific
instrumentation is enabled.
In case of free functions and static member functions this unary
expression has the very same type which we would get in case of the
"address of" unary expression:
  void foo();
  void bar() {
    auto p = & foo; // void (*)()
    auto q = __function_id foo; // void (*)()
  }
However in case of non-static member functions the two expressions
yield different types:
  struct X { void foo(); virtual void bar(); };
  void bar() {
    auto p = & X::foo; // void (X::*)()
    auto q = __function_id X::foo; // void (*)()
    auto r = __function_id X::bar; // void (*)()
  }
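
As background for why a separate intrinsic is useful here, the
following sketch uses only standard C++17 to check that the ordinary
"address of" operator gives a plain function pointer for free
functions but a pointer-to-member for non-static member functions,
which cannot be converted to a plain function pointer or to void*.
The helper is_plain_function_pointer is a name I made up for this
example; __function_id itself is not used, since it needs the
modified compiler.

```cpp
#include <type_traits>

void foo();
struct X { void foo(); virtual void bar(); };

// Hypothetical helper: true only for pointers to (non-member) functions.
template <typename T>
constexpr bool is_plain_function_pointer =
    std::is_pointer_v<T> && std::is_function_v<std::remove_pointer_t<T>>;

// Free function: & yields void (*)().
static_assert(std::is_same_v<decltype(&foo), void (*)()>);
static_assert(is_plain_function_pointer<decltype(&foo)>);

// Non-static member functions: & yields void (X::*)(), virtual or not.
static_assert(std::is_same_v<decltype(&X::foo), void (X::*)()>);
static_assert(std::is_member_function_pointer_v<decltype(&X::bar)>);
static_assert(!is_plain_function_pointer<decltype(&X::foo)>);

// There is no implicit conversion to a plain function pointer or void*.
static_assert(!std::is_convertible_v<void (X::*)(), void (*)()>);
static_assert(!std::is_convertible_v<void (X::*)(), void*>);
```

So a runtime keyed on plain addresses cannot use &X::foo directly,
which is the gap __function_id fills.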

> First, our implementation is almost completely contained in the LLVM backend, and thus has no real understanding of C++. While we can currently use the mangled names of functions to achieve our desired result, this is cumbersome and error prone. There are additional limitations when considering C++ Templates and class hierarchies. Right now, class methods can be instrumented and replaced by payloads in the same class hierarchy. In this case, an injected method must inherit from the target method’s class and override the target function. This has additional challenges if the function is virtual, since our runtime uses function addresses to resolve which function should be modified. As a result, we do not support injection of virtual methods outside of the Itanium ABI, where we can reliably index into the vtable and thus perform the correct behavior in the runtime. We currently consider C++ templates completely out of scope for the current implementation, chiefly because they are too cumbersome to use without support from the frontend.

In fci_mock, we first used the same Itanium ABI dependent solution,
but then we implemented the __function_id intrinsic.

> 1. Alleviate the need to mangle function names
> 3. Add support for C++ Templates
Fci_mock works with direct function pointers and the substitution
happens at runtime, during the test setup. Thus, there is no need to
use mangled names for the substitution.
It seems, though, that Syringe has the benefit that the replacement
happens at load time, before runtime, am I right?

> 2. Directly support C++ class hierarchy
> 4. Add new intrinsics to directly handle runtime lookups (i.e. directly insert real addresses for class methods without (ab)using the Itanium ABI)
I think with the __function_id intrinsic these problems are handled/solved.

> Because Clang understands the class hierarchy, we can add a new annotation for class methods that will take the target base class as a parameter. Clang, in Sema, can look up the base class and add the correct payload annotation to the resulting LLVM function. Similarly for Templates, any instantiated template function, or dependent method, can have its payload forcibly instantiated, and have the new instantiation correctly tagged. This requires that for templates the target and payload definitions must appear in the same translation unit, so that their instantiations can be correctly resolved. While this forces a change to the actual source code (even if it is only an #include directive) it seems to be a reasonable way to offer support for a feature a core language feature.

The fci_mock call expression instrumentation forcibly fetches the
address of every callee (even of function template instantiations).
Thus we indirectly trigger the instantiation, so this problem does
not arise there. This has a severe price though: we kill inlining
completely, nothing is inlined.
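
A small standard C++ sketch of the two effects mentioned above:
taking the address of a function template specialization forces its
instantiation, and a call that goes through a pointer loaded at
runtime is generally much harder for the optimizer to inline. The
names fun_temp_int and call_through_pointer are made up for this
example; whether a particular compiler can still inline such a call
depends on the optimizer and is not something I claim here.

```cpp
#include <cassert>

template <typename T>
T FunTemp(T t) { return t; }

// Taking the address odr-uses FunTemp<int>, so the compiler has to
// instantiate it even if nobody calls it directly.
// The pointer is a mutable global on purpose: its value may change at
// runtime, similar to a callee that a test may substitute.
int (*fun_temp_int)(int) = &FunTemp<int>;

// An indirect call through the pointer, similar in spirit to an
// instrumented call expression.
int call_through_pointer(int p) { return fun_temp_int(p); }
```

call_through_pointer(13) returns 13 here, because the pointer still
refers to the original instantiation.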

> Lastly, calls into the Syringe runtime currently use function addresses as keys to manipulate the target function pointer. It should be possible to use some new intrinsic(s) that can correctly resolve the address of functions and methods without relying on ABI details. Because the compiler will be aware of how Syringe works, it should be possible to have the compiler directly insert the correct address while providing an intuitive API to the user.

Yes, it is possible and already implemented: see __function_id
https://github.com/martong/clang/compare/finstrument_mock_0...martong:finstrument_mock

> Syringe was designed to help automate behavior injection by understanding a small set of trigger conditions that could be responsible for enabling and disabling the injected behavior. In our initial designs these triggers were often based on profiling counters that could be used to toggle the behavior after some threshold was exceeded. Currently, this is left up to the programmer, but our YAML configuration already supports these sort of annotations. In principle there is no reason why these quality of life instrumentation should not be implemented as the use and design of Syringe solidifies.

Fci_mock uses the test file, i.e. a separate translation unit, to set
up the replacement configurations.

----

As I see, there are a few open questions in both of our solutions:
- How to handle constexpr functions? I have some ideas about that, but
this is not trivial.
- Can we replace a constructor / destructor? Destructors seem easier
and I had some early experiments with that, but getting the address of
a constructor is hard, because of injected class names.

My experience with fci_mock shows that it is possible to replace
(almost) every function (including function template instantiations
and virtual member functions), but at the price of killing inlining.
The overall performance was therefore only slightly better than what
we can have with -finstrument-functions. Constructors and destructors
also remain open.
Also, with these solutions (both fci_mock and Syringe) we can replace
only functions, but I wanted to be able to replace types too. Thus, I
looked for other solutions which work at compile time. One of my
experimental ideas, with a prototype, reuses the Clang ASTImporter in
a special way: https://martong.github.io/ast-mock_sqamia_2018.pdf
My third idea is based on compile-time reflection, but that is far
from mature (it will probably be published in my PhD dissertation).

Cheers,
Gabor