[cfe-dev] [RFC] Per-callsite inline intrinsics

Tue Sep 4 11:49:27 PDT 2018

On Tue, 4 Sep 2018 at 11:33, Jakub (Kuba) Kuderski via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hi folks,
>
> TL;DR: I propose adding 3 new C/C++ intrinsics for controlling inlining at
> the callsite:
> * __builtin_no_inline(Foo()) -- prevents the call to Foo() from being
> inlined at that particular callsite.
> * __builtin_always_inline(Foo()) -- inlines this call to Foo(), if
> possible.
> * __builtin_flatten_inline(Foo()) -- inlines this call to Foo() and
> (transitively) everything called within Foo’s body.
>
> These intrinsics apply to the outermost call-like expression and can be
> used with function calls, member function calls, operator calls,
> constructor calls, and indirect calls (through function pointers, member
> function pointers, and virtual calls).
>
>
> I have proposed a patch implementing the first two intrinsics here:
> https://reviews.llvm.org/D51200. I would really appreciate feedback on
> the proposed semantics and implementation. I don’t have much experience
> with Clang, and I’d appreciate any help with the technical problems I
> mentioned in the code review. Details below.
>
>
> Motivation:
> It’s often the case that the compiler misses an inlining opportunity or
> inlines a function call it should not have. In many cases, it’s possible to
> map a performance regression to a few wrong inlining decisions. When that
> happens, we can manually enforce the correct inlining decisions by:
> 1. Marking the callees of interest with __attribute__ ((noinline)),
> __attribute__ ((always_inline)), or gnu::flatten. This affects all call
> sites with such callees. For more fine-grained control over inlining, one
> workaround is to create a few copies (or proxies), each marked with a
> different attribute.
> 2. Globally changing the inline thresholds (e.g., -mllvm
> -inline-threshold=K).
> 3. Manually modifying the source in order to change the calculated
> inlining cost (e.g., splitting a function into a few smaller ones), or even
> inlining a function by hand by copy-pasting it into the callsite.
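Workaround (1) from the list above can be sketched with the attributes GCC and Clang already accept today. This is a minimal sketch, not code from the RFC: `foo` stands in for some callee, and the proxy names `foo_noinline`, `foo_always` and `foo_flatten` are hypothetical. Note that an always_inline proxy would only inline the proxy itself and leave `foo` to the normal heuristics, so forcing this one call requires a copy of `foo`'s body, which is the duplication cost listed under (1) and (3).

```cpp
#include <cassert>

// Hypothetical callee, standing in for a function we would rather not
// annotate directly (or cannot, e.g. because it lives in a library).
int foo(int x) { return 2 * x + 1; }

// noinline proxy: calls through it are not inlined, so foo's body is kept
// out of those callers (foo may still be inlined into the proxy itself).
__attribute__((noinline)) int foo_noinline(int x) { return foo(x); }

// Copy of foo's body, force-inlined wherever it is called. It must be kept
// in sync with foo by hand.
__attribute__((always_inline)) inline int foo_always(int x) { return 2 * x + 1; }

// flatten proxy: every call in its body (foo, and transitively whatever foo
// calls) is inlined into the proxy when possible.
__attribute__((flatten)) int foo_flatten(int x) { return foo(x); }

// Each call site picks a variant; the computed values are identical, only
// the inlining decisions differ.
int caller(int x) {
    int a = foo(x);           // default heuristics
    int b = foo_noinline(x);  // foo's body stays out of caller
    int c = foo_always(x);    // copied body is inlined into caller
    int d = foo_flatten(x);   // foo is inlined into the proxy
    assert(a == b && b == c && c == d);
    return a + b + c + d;
}
```

Every variant is a separate function that must be named, declared and maintained, which is what a per-callsite intrinsic would avoid.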
>
> Problems with the existing solutions:
> * (1) and (2) can affect inlining globally instead of only at
> the places where it matters.
> * (1) and (3) can have the disadvantage of duplicating code and thus
> making it less maintainable.
> * (1) and (3) sometimes cannot be applied if for some reason we cannot
> modify the inlined functions. This can be the case when these functions are
> declared in an external library.
>
>
> Proposed solution:
> I propose to introduce new Clang intrinsics for controlling inlining at
> the call-site level. This way, it’s possible to cleanly give the compiler a
> hint about what should happen at one particular function call. These
> intrinsics are also self-documenting, in the sense that they are easy for
> humans to reason about and appear directly in source code.
>
> The proposed intrinsics are __builtin_no_inline, __builtin_always_inline,
> and __builtin_flatten_inline.
>
> Example:
> int foo(int) { /* ... */ }
>
> void baz(int) { /* ... */ }
>
> struct S {
>   S();
>   void bar(int);
>   virtual void virt();
>   S operator++();
>   friend S operator+(const S &, const S &);
> };
>
> S *GetS();
>
> int main() {
>   // Inline the function call to foo(0) into main.
>   int x = __builtin_always_inline(foo(0));
>
>   // Prevent the constructor from being inlined into main.
>   S s = __builtin_no_inline(S());
>
>   // Force inline S::bar into main without forcing foo to be inlined.
>   __builtin_always_inline(s.bar(foo(x)));
>
>   // Force inline foo into main without forcing S::bar to be inlined.
>   s.bar(__builtin_always_inline(foo(x)));
>
>   // Force the outer call to baz to be inlined, then try to
>   // transitively inline every function call from baz's body.
>   // Does not force foo to be inlined.
>   __builtin_flatten_inline(baz(foo(x)));
>
>   // Force the operator call S + S to be inlined.
>   ++__builtin_always_inline(s + s);
>
>   // Try to inline the virtual call to virt, if possible.
>   __builtin_always_inline(GetS()->virt());
> }
>
>
> Syntax and semantics:
> The inline intrinsics can be applied to function calls, member function
> calls, constructor calls, virtual calls, function pointer and member
> function pointer calls, and operator calls. They always affect only the
> outermost call, never its subexpressions.
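The "outermost call only" rule can be illustrated with a toy expression model. This is a hypothetical sketch, not Clang's AST or the D51200 implementation: `Expr`, `InlineHint`, `make_call`, `make_op` and `apply_hint` are invented names, and treating a non-call operand as a no-op models just one of the options discussed below. It assumes C++17 (for `std::vector` of an incomplete element type).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy model only: an expression is either a call (callee name plus argument
// subexpressions) or some other operation (operands only).
enum class InlineHint { None, NoInline, AlwaysInline, Flatten };

struct Expr {
    bool is_call = false;
    std::string callee;                  // set only for calls
    std::vector<Expr> args;              // call arguments or operator operands
    InlineHint hint = InlineHint::None;
};

Expr make_call(std::string callee, std::vector<Expr> args = {}) {
    Expr e;
    e.is_call = true;
    e.callee = std::move(callee);
    e.args = std::move(args);
    return e;
}

Expr make_op(std::vector<Expr> operands = {}) {
    Expr e;
    e.args = std::move(operands);
    return e;
}

// Applying an intrinsic marks only the outermost expression; arguments keep
// whatever hints they already had. A non-call operand is left unchanged.
Expr apply_hint(InlineHint hint, Expr e) {
    if (e.is_call) e.hint = hint;
    return e;
}

// Mirrors __builtin_always_inline(s.bar(foo(x))) from the example above.
void check_outermost_only() {
    Expr e = apply_hint(InlineHint::AlwaysInline,
                        make_call("S::bar", {make_call("foo", {make_op()})}));
    assert(e.hint == InlineHint::AlwaysInline);                 // bar is marked
    assert(e.args.size() == 1 && e.args[0].hint == InlineHint::None);  // foo is not
}
```

Moving the wrapper onto the argument, as in s.bar(__builtin_always_inline(foo(x))), marks only the inner call instead.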
>
> All the intrinsics work on a “best-effort” basis: the requested inlining
> decision is applied whenever possible. This will not always be the case,
> e.g., if you wrap an indirect call with __builtin_always_inline and its
> target is not resolved during compilation.
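Because the intrinsics are only best-effort hints, code that wants to use them while still building with compilers that lack them could fall back to the plain call. A minimal sketch, assuming the builtin names from the proposal (they are not assumed to exist in any released compiler); the macro names `ALWAYS_INLINE_CALL` and `NO_INLINE_CALL` are hypothetical, while the `__has_builtin` fallback is the compatibility idiom from Clang's documentation.

```cpp
#include <cassert>

// Compatibility idiom for compilers that do not provide __has_builtin.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Use the proposed builtins when available; otherwise expand to the plain
// call, which is a legal outcome under best-effort semantics.
#if __has_builtin(__builtin_no_inline)
#define NO_INLINE_CALL(...) __builtin_no_inline(__VA_ARGS__)
#else
#define NO_INLINE_CALL(...) (__VA_ARGS__)
#endif

#if __has_builtin(__builtin_always_inline)
#define ALWAYS_INLINE_CALL(...) __builtin_always_inline(__VA_ARGS__)
#else
#define ALWAYS_INLINE_CALL(...) (__VA_ARGS__)
#endif

int foo(int x) { return x + 1; }

int use(int x) {
    int a = ALWAYS_INLINE_CALL(foo(x));  // hint: inline this call
    int b = NO_INLINE_CALL(foo(a));      // hint: keep this call out of line
    return b;
}
```

The macros are variadic so that a call containing unparenthesized commas, such as f<int, int>(x), still passes through as a single operand.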
>
> One thing I’m not sure about is what to do when the expression inside an
> inline intrinsic is not a call at all. It doesn’t make much sense to write
> something like __builtin_always_inline(1 + 3), but in a generic context
> (e.g., __builtin_always_inline(t + u)) it may not be known whether the
> expression will end up operating on primitive types or on user-defined
> types whose operators actually make function calls. In my opinion, it will
> make life easier if inline intrinsics over non-call-like expressions are
> treated as no-ops in any context, as the compiler can already reason about
> them and won’t perform any function calls. One option is to silently not
> inline when the compiler resolves the expression to a built-in operation,
> which would be consistent with the behavior of silently not inlining calls
> it cannot resolve. Alternatively, we could emit warnings, which would make
> maintaining code with these intrinsics easier.
> I’d really like to get feedback on this issue.
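The generic-context problem follows from the overloading rules: per [over.match.oper], if no operand of an operator has class or enumeration type, the built-in operator is used and no function call can happen. A minimal sketch of that check as a type trait; `may_call_user_operator`, `Money` and `add` are hypothetical names, and the trait is only a necessary condition (a class that merely converts to int can still end up using the built-in `+`).

```cpp
#include <type_traits>

// True if overload resolution could pick a user-defined operator for T op U,
// i.e. at least one operand has class or enumeration type.
template <typename T, typename U>
constexpr bool may_call_user_operator =
    std::is_class<std::remove_reference_t<T>>::value ||
    std::is_enum<std::remove_reference_t<T>>::value ||
    std::is_class<std::remove_reference_t<U>>::value ||
    std::is_enum<std::remove_reference_t<U>>::value;

struct Money { int cents; };
constexpr Money operator+(Money a, Money b) { return Money{a.cents + b.cents}; }

// Under the proposal this body might read __builtin_always_inline(t + u)
// (hypothetical builtin); whether that wraps a call depends on T and U.
template <typename T, typename U>
constexpr auto add(T t, U u) { return t + u; }

static_assert(!may_call_user_operator<int, double>, "int + double is always built-in");
static_assert(may_call_user_operator<Money, Money>, "Money + Money may call operator+");
```

The same template is a plain built-in addition for add(1, 2) and an operator call for add(Money{1}, Money{2}), which is why a diagnostic for non-call operands is easier to justify outside template instantiations.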
>
> Implementation:
> I have already partially implemented the first two intrinsics
> (__builtin_no_inline and __builtin_always_inline) here:
> https://reviews.llvm.org/D51200. Calls wrapped with the inline intrinsics
> are annotated with the appropriate attributes during code generation. LLVM
> already seems to handle callsites attributed with alwaysinline and
> noinline. I think it should also be possible to implement an appropriate
> attribute for flattening, as there’s already a gnu::flatten attribute for
> function declarations.
>

Thank you for the detailed description of the problem and the design
rationale. I think this is a reasonable and clean solution to the problem.

Regarding applying the builtins to a non-call: I think this deserves *at
least* an enabled-by-default warning. I'm not sure how compelling the use
cases are for applying these builtins within a template -- they seem like
very surgical tools for controlling inlining, and so applying them to a
family of functions is perhaps unwise -- and the case where the builtin is
applied to a call in some instantiations and to a built-in operator in
others does not immediately seem like a primary concern to me, so I'm not
too worried about making this diagnostic an error on that basis. Perhaps
we could make this an error in most cases and downgrade it to a warning in
template instantiations. (Are there also macro scenarios where you
anticipate it being unknown whether the operand is a function call?)