[llvm-dev] Saving Compile Time in InstCombine

Mikulin, Dmitry via llvm-dev llvm-dev at lists.llvm.org
Thu Apr 13 17:18:36 PDT 2017


I’m taking a first look at InstCombine performance. I picked up the caching patch and ran a few experiments on one of our larger C++ apps. The size of the *.0.2.internalize.bc IR (without debug info) is ~30 MB. Here are my observations so far.

Interestingly, caching produced a slight but measurable degradation in -O3 compile time.

InstCombine takes about 35% of total execution time, of which ~20% originates from CGPassManager.

computeKnownBits contributes 7.8% of total execution time, but calls originating from InstCombine account for only 2.6%. The caching patch covers only InstCombine's use of KnownBits. This may explain the limited gain, or even the slight degradation, if known bits are not recomputed as often as we thought.
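For reference, the caching idea boils down to a per-value memoization table. Below is a simplified standalone sketch; the `KnownBits` struct and the cache interface are illustrative stand-ins for the real LLVM types, not the actual patch:

```cpp
#include <cstdint>
#include <unordered_map>

// Illustrative stand-in for llvm::KnownBits.
struct KnownBits {
  uint64_t Zero = 0, One = 0;
};

// Hypothetical memoization cache: compute known bits for a value once,
// reuse the result on subsequent queries, and invalidate the entry when
// the instruction is modified.
class KnownBitsCache {
  std::unordered_map<const void *, KnownBits> Cache;

public:
  template <typename ComputeFn>
  const KnownBits &get(const void *V, ComputeFn Compute) {
    auto It = Cache.find(V);
    if (It != Cache.end())
      return It->second; // cache hit: skip recomputation
    return Cache.emplace(V, Compute()).first->second;
  }

  // Must be called whenever the corresponding instruction changes,
  // otherwise stale bits would be returned.
  void invalidate(const void *V) { Cache.erase(V); }
};
```

The correctness burden is entirely in the invalidation: any transformation that changes an operand chain has to drop the affected entries, which is one reason such a cache can end up costing more than it saves.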

Most of the time is spent in the instruction visitor routines. ICmpInst, LoadInst, CallInst, GetElementPtrInst and StoreInst are the top contributors:

ICmpInst          6.1%
LoadInst          5.5%
CallInst          2.1%
GetElementPtrInst 2.1%
StoreInst         1.6%

Of the 35% of time spent in InstCombine, about half goes to these top five visitor routines.
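For the record, "about half" is 17.4/35. A trivial standalone check of the figures above:

```cpp
// Sum of the top-5 visitor percentages reported above (each a share of
// total execution time), compared with InstCombine's 35% overall share.
double top5VisitorShare() {
  const double Visitors[] = {6.1, 5.5, 2.1, 2.1, 1.6}; // % of total time
  double Sum = 0;
  for (double P : Visitors)
    Sum += P;        // 17.4% of total time
  return Sum / 35.0; // ~0.50 of InstCombine's share
}
```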

I wanted to see what transformations InstCombine actually performs. Using the -debug option turned out not to be very scalable. Never mind the large output size of the trace; running "opt -debug -instcombine" on anything other than a small IR is excruciatingly slow. Out of curiosity I profiled that too: 96% of the time is spent decoding and printing instructions. Is this a known problem? If so, what are the alternatives for debugging large-scale problems? If not, it's possibly another item for the to-do list.
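Two lighter-weight alternatives I'm aware of (flag names from opt's standard options in an asserts build; exact output on a given build is worth double-checking): -stats prints aggregate per-pass counters instead of every instruction, and -debug-only restricts the debug trace to a single pass's DEBUG_TYPE:

```shell
# Aggregate transformation counters only; no per-instruction printing.
opt -instcombine -stats input.bc -o /dev/null

# Restrict -debug output to InstCombine's DEBUG_TYPE.
opt -instcombine -debug-only=instcombine input.bc -o /dev/null
```

Neither gives the full before/after trace, but both avoid most of the instruction-printing cost.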

Back to InstCombine: from the profile it does not appear there is an obvious magic bullet that would drastically improve performance. I will take a closer look at the visitor functions and see if there's anything that can be done.

Dmitry.


> On Mar 22, 2017, at 6:45 PM, Davide Italiano via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> On Wed, Mar 22, 2017 at 6:29 PM, Mikhail Zolotukhin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> 
>> In my testing results are not that impressive, but that's because I'm now focusing on Os. For me even complete disabling of all KnownBits-related patterns in InstCombine places the results very close to the noise level. In my original patch I also had some extra patterns moved under ExpensiveCombines - and that seems to make a difference too (without this part, or without the KnownBits part I get results below 1%, which are not reported as regressions/improvements).
>> 
> 
> Have you profiled a single InstCombine run to see where we actually
> spend our cycles (as Sanjay did for his reduced testcase)?
> 
>> I realize that InstCombine doesn't usually do any harm, if we don't care about compile time, but that's only the case for O3 (to some extent), not for other optimization levels.
> 
> Independently from what's the optimization level, I think compile-time
> is important. Note, for example, that we run a (kinda) similar
> pipeline at O3 and LTO (full, that is), where the impact of compile
> time is much more evident. Also, while people are not generally bitten
> by O3 compilation time, you may end up with terrible performances for
> large TUs (and I unfortunately learned this the hard way).
> 
> --
> Davide
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


