[PATCH] instcombine: Only create library call simplifier once
Michael Ilseman
milseman at apple.com
Mon Mar 11 09:26:37 PDT 2013
This is exactly the kind of fast-path for intrinsics I was hoping for and avoids cross-module caching. Awesome!
Nadav brings up some great suggestions, and I hope that it makes sense to add a max-length to the TLI to further exit early. Are targets allowed to map lib calls from function names beginning with "_Z"?
On Mar 11, 2013, at 9:01 AM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Meador.
>
> Thanks for working on this. The new approach looks good to me, and I will benchmark it later today.
>
> I think that we can do a few things to make it a little faster. First, we can detect prefixes that are used by c++ mangled functions, such as "_Z" and exit early. We can also check if the length of the incoming string is greater than any library function call that we optimize.
>
> Thanks,
> Nadav
>
> On Mar 11, 2013, at 6:29 AM, Meador Inge <meadori at codesourcery.com> wrote:
>
>> On 03/07/2013 07:37 PM, Nadav Rotem wrote:
>>
>>> Hi Meador,
>>>
>>> I spoke with Bill about this and I now think that your proposal to initialize
>>> the SimplifyLibCalls on the first invocation of runOnFunction is correct. The
>>> only exception is LTO builds. In an LTO build we can change the TargetData and
>>> TLI per function. So, on each runOnFunction we need to check whether the current
>>> TLI and DataLayout are the same as in the previous run. If they are not, then we
>>> need to re-initialize the SimplifyLibCalls, rehash the table, etc.
>>
>> Hi Nadav,
>>
>> After thinking about this a little more, caching the LibCallSimplifier object on
>> the first call makes me a little nervous, and I agree with the original
>> objections to it. This motivated me to try something different. The attached
>> patch drops the hash table and does an explicit lookup for each of the lib call
>> simplifier optimizers.
>>
>> This should alleviate Michael's concerns about caching the object in
>> runOnFunction, short-circuit the intrinsic case Michael is interested in,
>> avoid repeatedly building and destroying the hash table, and benefit all
>> clients of LibCallSimplifier.
>>
>> There is a slight overhead for the new lookup function, but it is still much
>> better than the current approach. On a benchmark containing 100,000 calls
>> where *none* of them are simplified I noticed a 30% speedup. On a benchmark
>> containing 100,000 calls where *all* of them are simplified I noticed an 8%
>> speedup. The original LibCallSimplifier caching patch also obtained a 30%
>> speedup in the case where nothing was simplified and a 13% speedup when all
>> calls were simplified. I am comfortable with the new patch being slightly
>> slower in some cases (these figures are averages; I saw a few runs where the
>> two approaches gave equivalent speedups).
>>
>> Comments? Would you all mind trying this patch on the benchmarks you are
>> interested in? I personally like the new patch better.
>>
>> --
>> Meador Inge
>> CodeSourcery / Mentor Embedded
>> <0001-LibCallSimplifier-optimize-speed-for-short-lived-ins.patch>
>