[PATCH] instcombine: Only create library call simplifier once
Nadav Rotem
nrotem at apple.com
Mon Mar 11 14:36:17 PDT 2013
Hi Meador,
I see nice compile time improvements with your patch (when compiling LLVM itself). LGTM!
Thanks,
Nadav
On Mar 11, 2013, at 9:26 AM, Michael Ilseman <milseman at apple.com> wrote:
> This is exactly the kind of fast-path for intrinsics I was hoping for and avoids cross-module caching. Awesome!
>
> Nadav brings up some great suggestions, and I hope that it makes sense to add a max-length to the TLI to further exit early. Are targets allowed to map lib calls from function names beginning with "_Z"?
>
> On Mar 11, 2013, at 9:01 AM, Nadav Rotem <nrotem at apple.com> wrote:
>
>> Hi Meador.
>>
>> Thanks for working on this. The new approach looks good to me, and I will benchmark it later today.
>>
>> I think that we can do a few things to make it a little faster. First, we can detect prefixes that are used by C++ mangled functions, such as "_Z", and exit early. We can also check whether the length of the incoming string is greater than that of any library function that we optimize.
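The two early exits suggested above can be sketched as a standalone filter. This is a minimal illustration, not the actual LLVM API: `mayBeLibcall` and the length bound `kMaxLibcallNameLen` are hypothetical names, and the bound's value is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed upper bound on the length of any libcall name in the
// optimization table (illustrative value, not taken from LLVM).
static const std::size_t kMaxLibcallNameLen = 23;

// Hypothetical fast-path filter: return false when the name cannot
// possibly match a known libcall, so the full lookup is skipped.
bool mayBeLibcall(const std::string &Name) {
  if (Name.size() > kMaxLibcallNameLen)
    return false; // longer than any entry in the table
  if (Name.compare(0, 2, "_Z") == 0)
    return false; // Itanium C++ mangling prefix, never a C libcall
  return true;    // fall through to the full lookup
}
```

A caller would invoke this before any per-call simplification work, e.g. `if (!mayBeLibcall(Callee->getName())) return nullptr;` in spirit.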
>>
>> Thanks,
>> Nadav
>>
>> On Mar 11, 2013, at 6:29 AM, Meador Inge <meadori at codesourcery.com> wrote:
>>
>>> On 03/07/2013 07:37 PM, Nadav Rotem wrote:
>>>
>>>> Hi Meador,
>>>>
>>>> I spoke with Bill about this and I now think that your proposal to initialize
>>>> the SimplifyLibCalls on the first invocation of runOnFunction is correct. The
>>>> only exception is LTO builds. In LTO builds the TargetData and TLI can change
>>>> per function. So, on each runOnFunction we need to check whether the current TLI
>>>> and DataLayout are the same as on the previous run. If they are not, then we
>>>> need to re-initialize the SimplifyLibCalls, rehash the table, etc.
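The per-run invalidation check described above amounts to caching the TLI and DataLayout seen on the previous run and rebuilding state when either changes. Below is a hedged sketch with placeholder types and names (`Simplifier`, `InitCount`, raw `const void *` pointers standing in for TargetLibraryInfo/DataLayout); none of this is the actual pass interface.

```cpp
#include <cassert>

// Hypothetical sketch of the LTO-aware re-initialization check.
// Pointers stand in for the TargetLibraryInfo and DataLayout objects.
struct Simplifier {
  const void *LastTLI = nullptr;
  const void *LastDL = nullptr;
  int InitCount = 0; // counts (re-)initializations, for illustration

  void runOnFunction(const void *TLI, const void *DL) {
    // In an LTO build TLI/DataLayout may differ per function, so compare
    // against the previous run's values and rebuild only when they change.
    if (TLI != LastTLI || DL != LastDL) {
      ++InitCount; // stands in for re-initializing and rehashing the table
      LastTLI = TLI;
      LastDL = DL;
    }
  }
};
```

The design point is that the common (non-LTO) case pays only two pointer comparisons per function rather than rebuilding the table every run.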
>>>
>>> Hi Nadav,
>>>
>>> After thinking about this a little more, caching the LibCallSimplifier object on
>>> the first call makes me a little nervous, and I agree with the original
>>> objections to it. This motivated me to try something different. The attached
>>> patch drops the hash table and does an explicit lookup for each of the lib call
>>> simplifier optimizers.
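The "explicit lookup" idea can be illustrated with a minimal dispatcher, assuming nothing about the real LibCallSimplifier interface: `Call` and `lookupOptimization` below are hypothetical stand-ins, and the returned strings merely name the optimization that would be selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a call instruction's relevant state.
struct Call {
  std::string CalleeName;
};

// Sketch of an explicit per-name lookup replacing a per-instance hash
// table: the callee name is matched directly against the known libcalls,
// so no table is built or destroyed when the simplifier is created.
const char *lookupOptimization(const Call &CI) {
  const std::string &N = CI.CalleeName;
  if (N == "strlen") return "StrLenOpt";
  if (N == "strcpy") return "StrCpyOpt";
  if (N == "memcpy") return "MemCpyOpt";
  return nullptr; // no simplification known for this callee
}
```

In the real patch an intrinsic fast path would run before any string matching; the sketch only shows the table-free dispatch itself.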
>>>
>>> This should alleviate Michael's concerns about caching the object in
>>> runOnFunction, short-circuit the intrinsic case Michael is interested in,
>>> avoid repeatedly building and destroying the hash table, and benefit all
>>> clients of LibCallSimplifier.
>>>
>>> There is a slight overhead for the new lookup function, but it is still much
>>> better than the current approach. On a benchmark containing 100,000 calls
>>> where *none* of them are simplified I noticed a 30% speedup. On a benchmark
>>> containing 100,000 calls where *all* of them are simplified I noticed an 8%
>>> speedup. The original LibCallSimplifier caching patch also obtained a 30%
>>> speedup in the case where nothing was simplified and a 13% speedup when all
>>> calls were simplified. I am comfortable with the new patch being slightly
>>> slower in some cases (these numbers are averages; I saw a few runs where the
>>> two approaches gave equivalent speedups).
>>>
>>> Comments? Would you all mind trying this patch for the benchmarks that you all
>>> are interested in? I personally like the new patch better.
>>>
>>> --
>>> Meador Inge
>>> CodeSourcery / Mentor Embedded
>>> <0001-LibCallSimplifier-optimize-speed-for-short-lived-ins.patch>