[llvm-dev] Function specialisation pass

Jon Chesterfield via llvm-dev llvm-dev at lists.llvm.org
Wed Mar 24 06:58:13 PDT 2021


>
> Date: Tue, 23 Mar 2021 19:44:49 +0000
> From: Sjoerd Meijer via llvm-dev <llvm-dev at lists.llvm.org>
> To: "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
> Subject: [llvm-dev] Function specialisation pass
>
>
> I am interested in adding a function specialisation(*) pass to LLVM ...
>
> Both previous attempts were parked at approximately the same point: the
> transformation was implemented but the cost-model to control compile-times
> and code-size was lacking ...
>

This sounds right. The transform is fairly mechanical - clone the function,
replace uses of the argument being specialised on with the value, and
redirect call sites that passed said value. Great to see there's already
work in that direction.

I'd be delighted to contribute to the implementation effort. It may even
qualify as a legitimate thing to do during work hours - it would let the
amdgpu openmp runtime back off on the enthusiasm for inlining everything.
Some initial thoughts below.

Thanks!

Eliding call overhead and specialising on known arguments are currently
bundled together as 'inlining', which also has a challenging cost model. If
we can reliably specialise, call sites that are presently inlined no longer
need to be. I'm sure the end point, with specialisation and inlining
working in harmony, would beat what we have now on compile time, code
size, and code quality.

There's a lot of work to get the heuristics and interaction with inlining
right though. I'd suggest an intermediate step of specialising based on
user annotations, kicking that problem down the road:

void example(int x, __attribute__((bikeshed)) int y) {... big function ...}

where this means a call site with a compile time known value for y, say 42,
gets specialised without reference to heuristics to a call to:

void example.carefully-named.42(int x) {constexpr int y = 42; ...}

We still have to get the machinery right - caching previous
specialisations, mapping multiple specialisations down to the same end
call, care with naming, maybe teaching thinlto about it and so forth.
However we postpone the problem of heuristically determining which
arguments and call sites are profitable.

The big motivating examples for me are things that take function pointers
(qsort style) and functions containing a large switch on one of the
arguments that is often known at the call site. An in-tree example of the
latter using 'the trick' from partial evaluation to get the same end result
is in llvm/openmp/libomptarget/plugins/amdgpu/impl/msgpack.h:

The entry point is:
template <typename F>
const unsigned char *handle_msgpack(byte_range bytes, F f);

It's going to switch on the first byte in the byte_range but the function
is too large to inline. So it is specialized, not with a convenient
attribute, but by introducing something like:

template <typename F, msgpack::type ty>
const unsigned char *func(byte_range bytes, F f) {
  switch (ty) {}
}

template <typename F>
const unsigned char *handle_msgpack(byte_range bytes, F f) {
  auto ty = bytes[0];
  switch (ty) {
  case 0: return func<F, 0>(...);
  case 1: return func<F, 1>(...);
  }
}

This is, alas, a slightly more complicated example than `void example(int
x, __attribute__((bikeshed)) int y)`, but I'm optimistic that explicitly
annotating a parameter as 'when this is known at compile time, specialise
on it' would still let me simplify that code.

Function pointer interfaces that are only called with a couple of different
pointers may be a more obvious win - OpenMP target regions, for example.