[llvm-dev] [RFC] Cheaper indirect calls via trampolines

Tue Mar 3 06:04:36 PST 2020

Taking the address of a function inhibits optimisations for that function.
Essentially, any ABI change is unavailable if we can't adjust every call
site to match. The case of interest here is a function that is called both
directly and indirectly, where we don't want the indirect calls to impose
a cost on the direct ones.

One approach to avoiding the ABI constraint cost is to extract/outline the
body of an address-taken function into a new function, then replace said
body with a direct call to the new function. This leaves us with two
functions that have the same semantic effect:
- One has its address taken, and may have external visibility. It just
calls the other.
- One does not have its address taken and has internal visibility.

Direct call sites to the outer wrapper/trampoline can be optimised to
direct calls to the new internal function, leaving no net change other than
enabling other optimisations. Uses of the address of the symbol are
unchanged as the original function is still present.
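
As a rough source-level illustration of the split described above, here is
a minimal C sketch. The real transform would operate on IR rather than on
source, and the names scale, scale_impl, call_direct and call_indirect are
made up for this example:

```c
/* Hypothetical new internal function: holds the original body. Its address
   is never taken, so later passes are free to change its ABI. */
static int scale_impl(int x, int factor) {
    return x * factor;
}

/* The original symbol, now a wrapper/trampoline. It keeps its signature,
   may be externally visible, and can still have its address taken. */
int scale(int x, int factor) {
    return scale_impl(x, factor);
}

/* A direct call site: formerly called scale(), now rewritten to call the
   internal function directly. */
int call_direct(int x, int factor) {
    return scale_impl(x, factor);
}

/* An indirect call site: unchanged, it reaches the body via the wrapper. */
int call_indirect(int (*fp)(int, int), int x, int factor) {
    return fp(x, factor);
}
```

Passing scale as the pointer to call_indirect gives the same result as
call_direct; only the route to the shared body differs.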

Indirect call sites now go through this trampoline to share the code.
There's the runtime cost of undoing whatever ABI optimisations we later
choose to make to the internal function, e.g. some argument
shuffling/discarding, then either a tail call or, if the return value also
needs adjustment, a normal call.
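
The two trampoline shapes mentioned above might look roughly like this in
C. This is only a sketch under assumptions: the functions sum and bump, and
the specific ABI changes (a dead argument dropped, an always-zero return
value removed), are invented for illustration, not taken from any existing
pass:

```c
#include <stddef.h>

/* Hypothetical internal function after a later pass dropped the unused
   third argument. */
static int sum_impl(int a, int b) {
    return a + b;
}

/* Trampoline with the original signature: discards the dead argument and
   forwards. The call is in tail position, so it can become a tail call. */
int sum(int a, int b, void *ctx) {
    (void)ctx;
    return sum_impl(a, b);
}

/* Hypothetical internal function whose return value (always 0 in this
   sketch) was removed. */
static void bump_impl(int *counter) {
    ++*counter;
}

/* Trampoline that has to rebuild the return value, so it makes a normal
   call and then returns the constant itself. */
int bump(int *counter) {
    bump_impl(counter);
    return 0;
}

/* An indirect caller that still sees only the original ABI. */
int call_sum_indirect(int (*fp)(int, int, void *), int a, int b) {
    return fp(a, b, NULL);
}
```

Direct callers would target sum_impl and bump_impl and skip this fix-up
work entirely; only the indirect path pays for it.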

That is, the proposed transform has made indirect calls slightly slower
(unless we inline the new function back in to make a clone, in which case
it's made code size bigger) in exchange for re-enabling all the
optimisations that we currently lose when the address is taken. The same
sort of reasoning applies if the function is external and must expose an
ABI-appropriate entry point for other translation units, but we'd like to
use a faster calling convention internally.

If at the end of a pipeline we didn't actually want to change the function
after all, we should be able to fold the two back together.

I think that's plausibly a win. Taking the address of a function no longer
thwarts other optimisations, in exchange for making the indirectly called
function slightly slower. Thoughts?

Jon