[llvm-dev] Saving Compile Time in InstCombine

Mon Mar 20 06:30:59 PDT 2017

The goal is to improve compile-time, so I think we should measure what's
taking the most time and work from there. I'm not sure how the code
coverage report was made, but that seems like it is based on execution
frequency rather than time?

Here are some Instruments (macOS) screenshots of a time profile of a
release build of opt running on a Haswell machine:
$ opt -O2 row_common.bc -S -o /dev/null

The input is derived from the Chrome source file attached to:
https://bugs.llvm.org//show_bug.cgi?id=32037

It's an implicit assumption in this experiment that 'opt' time is important
to overall compile-time, but as you can see in the bug report, that's not
really true for this case. Also, this profile may not be representative of
other programs or important vs. other programs, but to summarize:

1. 123 ms out of 326 ms (38%) of the -O2 opt time is going into
InstCombine. So yes, InstCombine is a big chunk of opt time.
2. Focusing on computeKnownBits() within the InstCombine subtree: > 48% of
the time consumed by InstCombine is spent in ValueTracking.

We know that most InstCombines don't use ValueTracking, so I think
transforms that use computeKnownBits are the ones that should be grouped
under "ExpensiveCombines". Maybe there's another category for
"RareCombines", or we split InstCombine into multiple passes to further
reduce compile-time?

In any case, the profile lines up with David's comment about
simplifyDemanded / computeKnownBits and previous anecdotes in the "llvm is
getting slower" threads. Simple matchers are relatively short strands of
compare and branch spaghetti. It's the few transforms that use
ValueTracking that have a high relative cost. If we attack that, the
profile says we could recover much more than 1% if -O2 middle-end time is
significant for most compiles.

On Sat, Mar 18, 2017 at 8:43 AM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> On 03/17/2017 08:12 PM, David Majnemer wrote:
>
> Honestly, I'm not a huge fan of this change as-is. The set of transforms
> that were added behind ExpensiveChecks seems awfully strange and many would
> not lead the reader to believe that they are expensive at all
> (the SimplifyDemandedInstructionBits and foldICmpUsingKnownBits calls
> being the obvious expensive routines).
>
> The purpose of many of InstCombine's xforms is to canonicalize the IR to
> make life easier for downstream passes and analyses.
>
> InstCombine internally relies on knowing what transforms it may or may not
> perform. This is important: canonicalizations may duel endlessly if we get
> this wrong; the order of the combines is also important for exactly the
> same reason (SelectionDAG deals with this problem in a different way with
> its pattern complexity field).
>
> Another concern with moving seemingly arbitrary combines under
> ExpensiveCombines is that it will make it that much harder to understand
> what is and is not canonical at a given point during the execution of the
> optimizer.
>
>
> I agree with this up to a point. If we have these kinds of
> canonicalization dependencies that depend on ValueTracking's depth, this
> seems very fragile. Even if we introduce caching, thus making the depth
> often infinite, if it will still be finite in the face of updates then we
> still need to be careful (plus, if we're worried about being able to
> understand the canonical form, then depending on known-bits analysis can
> make defining this form subtle).
>
>
> I'd be much more interested in a patch which caches the result of
> frequently called ValueTracking functionality like ComputeKnownBits,
> ComputeSignBit, etc. which often doesn't change but is not intelligently
> reused. I imagine that the performance win might be quite comparable. Such
> a patch would have the benefit of keeping the set of available transforms
> constant throughout the pipeline while bringing execution time down; I
> wouldn't be at all surprised if caching the ValueTracking functions
> resulted in a bigger time savings.
>
>
> I'd started working on this a few months ago; I didn't finish the patch
> (mostly because I discovered that there's also a need to invalidate the
> cache whenever to perform a transformation that drops nsw/nuw flags and
> I've never got to that part), but I'm happy to provide my work-in-progress
> to anyone interested. cc'ing Davide who had also expressed an interest in
> this.
>
>  -Hal
>
>
>
> On Fri, Mar 17, 2017 at 5:49 PM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
>>
>>
>> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi,
>>
>> One of the most time-consuming passes in LLVM middle-end is InstCombine
>> (see e.g. [1]). It is a very powerful pass capable of doing all the crazy
>> stuff, and new patterns are being constantly introduced there. The problem
>> is that we often use it just as a clean-up pass: it's scheduled 6 times in
>> the current pass pipeline, and each time it's invoked it checks all known
>> patterns. It sounds ok for O3, where we try to squeeze as much performance
>> as possible, but it is too excessive for other opt-levels. InstCombine has
>> an ExpensiveCombines parameter to address that - but I think it's underused
>> at the moment.
>>
>>
>> Yes, the “ExpensiveCombines” has been added recently (4.0? 3.9?) but I
>> believe has always been intended to be extended the way you’re doing it. So
>> I support this effort :)
>>
>>
>> +1
>>
>> Also, did your profiling reveal why the other combines are expensive?
>> Among other things, I'm curious if the expensive ones tend to spend a lot
>> of time in ValueTracking (getting known bits and similar)?
>>
>>  -Hal
>>
>>
>>
>> CC: David for the general direction on InstCombine though.
>>
>>
>> —
>> Mehdi
>>
>>
>>
>>
>> Trying to find out, which patterns are important, and which are rare, I
>> profiled clang using CTMark and got the following coverage report:
>> <InstCombine_covreport.html>
>> (beware, the file is ~6MB).
>>
>> Guided by this profile I moved some patterns under the "if
>> (ExpensiveCombines)" check, which expectedly happened to be neutral for
>> runtime performance, but improved compile-time. The testing results are
>> below (measured for Os).
>>
>> Performance Improvements - Compile Time Δ  Previous Current σ
>> CTMark/sqlite3/sqlite3
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2> -1.55%
>> 6.8155 6.7102 0.0081
>> CTMark/mafft/pairlocalalign
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.1=2> -1.05% 8.0407
>> 7.9559 0.0193
>> CTMark/ClamAV/clamscan
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2> -1.02%
>> 11.3893 11.2734 0.0081
>> CTMark/lencod/lencod
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.10=2> -1.01%
>> 12.8763 12.7461 0.0244
>> CTMark/SPASS/SPASS
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.5=2> -1.01%
>> 12.5048 12.3791 0.0340
>>
>> Performance Improvements - Compile Time Δ  Previous Current σ
>> External/SPEC/CINT2006/403.gcc/403.gcc
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.14=2> -1.64%
>> 54.0801 53.1930 -
>> External/SPEC/CINT2006/400.perlbench/400.perlbench
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.7=2> -1.25%
>> 19.1481 18.9091 -
>> External/SPEC/CINT2006/445.gobmk/445.gobmk
>> <http://michaelsmacmini.local/perf/v4/nts/2/graph?test.15=2> -1.01%
>> 15.2819 15.1274 -
>>
>>
>> Do such changes make sense? The patch doesn't change O3, but it does
>> change Os and potentially can change performance there (though I didn't see
>> any changes in my tests).
>>
>> The patch is attached for the reference, if we decide to go for it, I'll
>> upload it to phab:
>>
>> <0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>
>>
>>
>> Thanks,
>> Michael
>>
>> [1]: http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing listllvm-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________ LLVM Developers mailing
>> list llvm-dev at lists.llvm.org http://lists.llvm.org/cgi-bin/
>> mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170320/453a670b/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: optO2profile.png
Type: image/png
Size: 481437 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170320/453a670b/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: computeKnownBIts.png
Type: image/png
Size: 234772 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170320/453a670b/attachment-0003.png>