[llvm-dev] Function specialisation pass

Tue Mar 23 12:44:49 PDT 2021

I am interested in adding a function specialisation(*) pass to LLVM. This frequently
comes up as a reason for performance differences of generated code between
GCC and LLVM. In GCC this is implemented in [1] and gets enabled at
optimisation level -03 and up. There have been two previous attempts in adding
this to LLVM: a while back this was attempted in [2] and very recently in [3].

Both previous attempts were parked at approximately the same point: the
transformation was implemented but the cost-model to control compile-times
and code-size was lacking. This is crucial as the goal would be to have
function specialisation enabled by default (at some opt level) and function
cloning has of course great potential to increase code-size and the
interprocedural analysis isn't very cheap. Thus, I agree with previous comments
on both tickets that we probably need to have a solid story for this before it makes
sense to add this to the LLVM code-base.

Given that GCC has this enabled by default we are in an excellent position to
evaluate the compile-time/code-size overhead of function specialiation. For two
cases I have tried investigating this overhead (and didn't evaluate performance of
the generated code). First, I took SQLite's amalgamation version, i.e. 1 source file
that is "220,000 lines long and over 7.5 megabytes in size" [4] as that seemed like a
good stress test for an IPA transformation pass to me. Comparing GCC 10.2.0
with -O3 and "-O3 -fdisable-ipa-cp" to toggle this on and off, respectively,
showed a 0.3% - 1.5% compile-time difference on different systems and a
1.3% object file increase. Second, I timed compilation of LLVM, representing a
very different codebase. For LLMV compile time differences were all within noise
levels with only a tiny code-size increase. I haven't looked into details here, but I
guess that it is not doing much, which could be the benefit of a well-tuned
cost-model, so is a good result.

Another reason why I am widening the audience with this email is that
alternative approaches were floated. In [2] it was remarked that "we could
propagate constant sets indicating the functions indirect call sites could possibly
target. Although we would probably want to limit the size of the sets to something
small, the pass could attach the sets via metadata to the calls so that this information
could be consumed by later passes. Such metadata could be used for indirect call
promotion, intersecting the function attributes of the possible targets".

But I am reluctant to go for this let's say more experimental approach as the GCC
numbers are very encouraging. I.e., both compile-time and code-size seem very
reasonable and I don't have seen yet any reasons to abandon the approaches in
[2] and [3] which are very similar. So, the approach I would like to take is complement
[3] with an analysis of the added compile-time/code-size increase, and then propose a
cost-model which then hopefully gives results in the range of GCC; I think that is what
we should be aiming at. But I am interested to hear if there are more/other opinions on
this.

(*) Motivating examples for function specialisation in previous works were e.g.:

long plus(long x) {
  return x + 1;
}
long minus(long x) {
  return x - 1;
}
long compute(long x, long (*binop)(long) ) {
  return binop(x);
}
int foo(int x, int y) {
  if (y)
    return compute(x, plus);
  else
    return compute(x, minus);
}

If we specialise compute which gets inlined in foo, we end up with just a return x + 1
and return x -1 in foo. Other motivating examples pass constants to functions, which
can then get propagated after function specialisation.

[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/ipa-cp.c
[2] https://reviews.llvm.org/D36432
[3] https://reviews.llvm.org/D93838
[4] https://www.sqlite.org/amalgamation.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210323/ccbda7e4/attachment.html>