[LLVMdev] the PartialSpecialization pass (was Re: Is there a "callback optimization"?)

Kenneth Uildriks kennethuil at gmail.com
Tue Jun 8 05:35:03 PDT 2010


On Tue, Jun 8, 2010 at 6:53 AM, NAKAMURA Takumi <geek4civic at gmail.com> wrote:
> I think the pass still needs to be cleaned up and rewritten.
> Here are my two proposals for enhancement:
>
> 1) Separate the specialization itself (and the rewriting of
>  callsites) into another module.  It would be better if that module
>  were usable from other passes.
>
> 2) Separate out the methods that discover interesting candidates.
>  Various (optional) heuristics would be helpful there.
>
> Also, I would like to contribute to the above someday.

I think that the InlineCost code in Analysis has a lot of valuable
heuristics that we should be able to reuse for PartialSpecialization.
A good first step would be to refactor CodeMetrics a bit: take out the
parts that encode inlining-specific decisions and replace them with
raw facts about the function that are useful when making those
decisions (such as CallsSetjmp, IsRecursive, and HasIndirectBr instead
of a single NeverInline flag), and adjust the inliner accordingly.
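
As a very rough sketch of what that refactoring could look like
(illustrative only, not the actual CodeMetrics interface):

  // Illustrative sketch only: record raw facts about the function and let
  // each client (inliner, partial specializer, ...) turn them into a verdict.
  struct CodeMetricsSketch {
    bool CallsSetjmp;     // calls setjmp/longjmp
    bool IsRecursive;     // calls itself, directly or indirectly
    bool HasIndirectBr;   // contains an indirectbr instruction
    unsigned NumInsts;    // rough size estimate
    // ...no inlining-specific verdict such as NeverInline.
  };

  // The inliner would then derive its own "never inline" decision from the
  // raw facts (the exact rule here is just an example):
  static bool inlinerRefuses(const CodeMetricsSketch &M) {
    return M.CallsSetjmp || M.HasIndirectBr;
  }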

Then a new PartialSpecializationCost can be added to the Analysis
folder.  It is like InlineCost, except for the following differences:

1. CountCodeReductionForAlloca doesn't apply.
2. callIsSmall doesn't apply.
3. A function with only one return statement would not be able to lose
that return statement.
4. The noinline flag should not have any effect.
5. The NoreturnPenalty should not apply.
6. The presence of dynamic allocas should not be taken into account.
7. The parameter passing cost is reduced, not eliminated.

These differences of course stem from the fact that the function body
will still be in a separate function, rather than copied into the
caller.
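
A sketch of the resulting accounting, with every name made up for
illustration rather than taken from an existing interface, might be:

  // Hypothetical sketch of the cost accounting described above.
  struct SpecializationCandidate {
    int RoughSize;      // per-instruction size estimate, as InlineCost computes it
    int NumConstArgs;   // arguments that become constants in the specialized clone
  };

  static const int ArgCost = 5;   // made-up per-argument passing cost

  int partialSpecializationCost(const SpecializationCandidate &C) {
    int Cost = C.RoughSize;
    // Unlike InlineCost: no CountCodeReductionForAlloca, no callIsSmall bonus,
    // no single-return bonus, no noinline check, no noreturn penalty, and no
    // special handling of dynamic allocas -- the body stays in its own
    // function and is merely cloned with some arguments turned into constants.
    Cost -= C.NumConstArgs * (ArgCost / 2);  // passing cost reduced, not eliminated
    return Cost;
  }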

The PartialSpecialization pass should then use the
PartialSpecializationCost, and compare it to a threshold much like
Inliner does, with the following differences:

1. The threshold should be reduced (from InlineThreshold) by
CallPenalty, since a call is not being eliminated by
PartialSpecialization whereas it would be by Inlining.
2. The threshold should not be affected by the callsite's containing
function's OptimizeForSize attribute.  Perhaps the target function's
OptimizeForSize attribute should be taken into account instead, since
the target function's total footprint would essentially be multiplied
by the number of specializations done.  If Module ever gets an
OptimizeForSize attribute, we should of course take that into account
as well.
3. The parameter passing cost is reduced, not eliminated.
4. Most importantly, the threshold value should be multiplied by the
number of callsites that would use the same specialization. As long as
this threshold is met at any callsite, *every* callsite that can use
the specialization should use it, since redirecting the call costs
nothing.

#4 above implies that PartialSpecialization would happen in many cases
where inlining does not.  While inlining imposes a cost at every
callsite, PartialSpecialization imposes a cost only for the first
matching callsite, and all other matching callsites can be redirected
for free (actually a bit better than free, since they pass fewer
parameters).
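
In other words (a hypothetical sketch, not code from the pass), the
threshold test might look roughly like:

  // Hypothetical sketch of the threshold test described above.  The caller's
  // OptimizeForSize attribute is deliberately ignored; only the target
  // function's (or, someday, the module's) would matter.
  bool worthSpecializing(int SpecializationCost, int InlineThreshold,
                         int CallPenalty, int NumMatchingCallSites) {
    int Threshold = (InlineThreshold - CallPenalty) * NumMatchingCallSites;
    return SpecializationCost <= Threshold;
  }
  // If this succeeds, every matching callsite is redirected to the one
  // specialized clone, since redirecting additional callsites costs nothing.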

> I had been afraid to read the discussion at
> http://llvm.org/bugs/show_bug.cgi?id=3757
> I was doubtful then whether the patch might be worthless :p

It certainly was useless while that bug was in it!  But I can see it
giving us many of the benefits of C++ functor and integer template
functions without the complications - and while still accepting
dynamic parameters when needed - making it easier to write clean,
performant code.

One big win for me is that I can pass a (constant) allocator function
pointer to every method in my libraries and application code that
needs to allocate memory, without making any of them template
functions and without actually calling through a function pointer to
allocate the memory.  That makes memory management for my language and
runtime much easier without building or running a garbage collector,
and without templating the majority of my functions.  Since that
allocator will be a memory pool implementation about 90% of the time,
most functions will only be specialized once for it (with the
originals usually going away).
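
For instance (hypothetical code, just to illustrate the scenario;
pool_alloc and make_buffer are made-up names):

  #include <cstddef>
  #include <cstdlib>

  // Stand-in for the memory-pool allocator.
  void *pool_alloc(std::size_t n) { return std::malloc(n); }

  // Library code takes the allocator as an ordinary function pointer...
  int *make_buffer(std::size_t n, void *(*alloc)(std::size_t)) {
    return static_cast<int *>(alloc(n * sizeof(int)));
  }

  // ...and most callsites pass the same constant allocator:
  int *make_default_buffer() { return make_buffer(128, pool_alloc); }

  // PartialSpecialization could clone make_buffer with alloc fixed to
  // pool_alloc, replacing the indirect call with a direct (and inlinable)
  // call, while the original make_buffer still accepts dynamic allocators.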

Unless I hear objections, I'd like to start work on it fairly soon.



