[PATCH] instcombine: Only create library call simplifier once

Thu Mar 7 14:34:00 PST 2013

On Mar 7, 2013, at 2:23 PM, Meador Inge <meadori at codesourcery.com> wrote:

> On 03/07/2013 03:56 PM, Michael Ilseman wrote:
> 
>> On Mar 7, 2013, at 12:52 PM, Meador Inge <meadori at codesourcery.com> wrote:
>> 
>>> Nadav reported a performance regression due to the work I did to
>>> merge the library call simplifier into instcombine [1].  The issue
>>> is that a new LibCallSimplifier object is being created every time
>>> InstCombiner::runOnFunction is called.  The is very inefficient
>>> because every time a LibCallSimplifier object is new'd up it creates
>>> a hash table to map from a function name to an object that optimizes
>>> functions of that name.
>>> 
>> 
>> Actually, I think only one was new-ed up. Also, it was the
>> "LibCallSimplifierImpl::optimizeCall()" method which would call
>> "initOptimizations()" only if Optimizations wasn't already set up.
> 
> Yeah, but in 'InstCombiner::runOnFunction' we have:
> 
>  InstCombinerLibCallSimplifier TheSimplifier(TD, TLI, this);
>  Simplifier = &TheSimplifier;
> 
> So a new LibCallSimplifier object gets created *each* time
> 'InstCombiner::runOnFunction' is called.  Even though the hash table is not
> constructed until later when 'optimizeCall' is invoked we still start fresh
> with a new object each time.
> 
>>> The original library call simplifier avoided this problem by only
>>> initializing that table *once* in SimplifyLibCalls::runOnFunction.
>>> This patch fixes the problem by implementing a similar idea in
>>> instcombine.  LibCallSimplifier objects are only created the first
>>> time InstCombiner::runOnFunction is called.
>> 
>> Did you do any measurements, and if so what was the data?
> 
> I did.  I profiled it with 'perf' on an x86_64 Fedora system using a sample
> program that contained 20,000 small functions.  Before the patch about 32% of
> the time spent in LibCallSimplifier was in building the table.  After the patch
> it was basically 0%.
> 
>>> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp
>>> index c6115e3..327c70c 100644
>>> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>>> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>>> @@ -2456,10 +2456,9 @@ namespace {
>>> class InstCombinerLibCallSimplifier : public LibCallSimplifier {
>>>  InstCombiner *IC;
>>> public:
>>> -  InstCombinerLibCallSimplifier(const DataLayout *TD,
>>> -                                const TargetLibraryInfo *TLI,
>>> +  InstCombinerLibCallSimplifier(const TargetLibraryInfo *TLI,
>>>                                InstCombiner *IC)
>>> -    : LibCallSimplifier(TD, TLI, UnsafeFPShrink) {
>>> +    : LibCallSimplifier(TLI, UnsafeFPShrink) {
>>>    this->IC = IC;
>>>  }
>>> 
>>> @@ -2485,8 +2484,8 @@ bool InstCombiner::runOnFunction(Function &F) {
>>>               InstCombineIRInserter(Worklist));
>>>  Builder = &TheBuilder;
>>> 
>>> -  InstCombinerLibCallSimplifier TheSimplifier(TD, TLI, this);
>>> -  Simplifier = &TheSimplifier;
>>> +  if (!Simplifier)
>>> +    Simplifier = new InstCombinerLibCallSimplifier(TLI, this);
>>> 
>> 
>> Are you sure you want to keep pointers live across multiple compilations?
>> I would think you would have to destroy this in releaseMemory().
> 
> Hmmmm.  I will have to look into that.  I may have to go a different route.
> 
>> I'm also not sure what's gained here, the constructor is fairly trivial.
> 
> As explained above creating a new LibCallSimplifier object for every function is
> avoided (and thus we avoid having to rebuild the hash table whenever
> optimizeCall is invoked).
> 

Oh, I see, you're wanting to cache the hash table from the first call. Unfortunately, this method seems dangerous and contrary to the design of the PassManager. Is there some higher-level structure that it makes sense to pin it to? For example, maybe the TargetLibraryInfo, Module, LLVMContext, etc would make sense.

At the very least, perhaps we can use TargetLibraryInfo::getLibFunc() before making the hash table to make sure it's really necessary. We could early exit in optimizeCall() or initOptimizations() for functions that we won't even be able to analyze. We might want to tune TargetLibraryInfo::getLibFunc() while we're at it as well, e.g. add in preliminary range-checking or have a string table.

> Thanks for the review!
> 
> -- 
> Meador Inge
> CodeSourcery / Mentor Embedded