[llvm-dev] Saving Compile Time in InstCombine
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Wed Mar 22 06:36:25 PDT 2017
On 03/20/2017 11:51 PM, Gerolf Hoflehner wrote:
>
>> On Mar 17, 2017, at 6:12 PM, David Majnemer via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>> Honestly, I'm not a huge fan of this change as-is. The set of
>> transforms that were added behind ExpensiveChecks seems awfully
>> strange and many would not lead the reader to believe that they are
>> expensive at all (the SimplifyDemandedInstructionBits and
>> foldICmpUsingKnownBits calls being the obvious expensive routines).
>>
>> The purpose of many of InstCombine's xforms is to canonicalize the IR
>> to make life easier for downstream passes and analyses.
>
> As we get further along with compile-time improvements one question we
> need to ask ourselves more frequently is about the effectiveness of
> optimizations/passes. For example - in this case - how can we make an
> educated assessment that running the combiner N times is a good
> cost/benefit investment of compute resources? The questions below are
> meant to figure out what technologies/instrumentations/etc could help
> towards a more data-driven decision process when it comes to the
> effectiveness of optimizations. Instcombiner might just be an
> inspirational use case to see what is possible in that direction.
>
> The combiner is invoked in full multiple times. But is it really
> necessary to run all of it for that purpose? After instcombine has
> run once, is there a mapping from transformation -> combines? I
> suspect most transformations could invoke a subset of combines to
> re-canonicalize. Or, if there were a (cheap) verifier for canonical IR,
> it could invoke a specific canonicalization routine. Instrumenting the
> instcombiner and checking which patterns actually kick in (for
> different invocations) might give insight into how the combiner could
> be structured so that only a subset of patterns needs to be checked.
>>
>> InstCombine internally relies on knowing what transforms it may or
>> may not perform. This is important: canonicalizations may duel
>> endlessly if we get this wrong; the order of the combines is also
>> important for exactly the same reason (SelectionDAG deals with this
>> problem in a different way with its pattern complexity field).
>
> Can you elaborate on this “duel endlessly” with specific examples?
> This is out of curiosity. Presumably there are verifiers that check
> that this cannot happen, or an implementation strategy that guarantees
> it. GlobalISel will run into the same or a similar question when it
> gets far enough to replace SelectionDAG.
>>
>> Another concern with moving seemingly arbitrary combines under
>> ExpensiveCombines is that it will make it that much harder to
>> understand what is and is not canonical at a given point during the
>> execution of the optimizer.
>
>>
>> I'd be much more interested in a patch which caches the result of
>> frequently called ValueTracking functionality like ComputeKnownBits,
>> ComputeSignBit, etc. which often doesn't change but is not
>> intelligently reused. I imagine that the performance win might be
>> quite comparable.
>
> Can you back this up with measurements? Caching schemes are tricky. Is
> there a way to evaluate when the results of ComputeKnownBits etc. are
> actually effective, meaning the result is used and leads to faster
> instructions? E.g. it might well be that only the first instance of
> instcombine benefits from computing the bits.
To (hopefully) make it easier to answer this question, I've posted my
(work-in-progress) patch which adds a known-bits cache to InstCombine. I
rebased it yesterday, so it should be fairly easy to apply:
https://reviews.llvm.org/D31239 - Seeing what this does to the
performance of the benchmarks mentioned in this thread (among others)
would certainly be interesting.
-Hal
>
>
>> Such a patch would have the benefit of keeping the set of available
>> transforms constant throughout the pipeline while bringing execution
>> time down; I wouldn't be at all surprised if caching the
>> ValueTracking functions resulted in a bigger time savings.
>>
>> On Fri, Mar 17, 2017 at 5:49 PM, Hal Finkel via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>
>> On 03/17/2017 04:30 PM, Mehdi Amini via llvm-dev wrote:
>>>
>>>> On Mar 17, 2017, at 11:50 AM, Mikhail Zolotukhin via llvm-dev
>>>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> One of the most time-consuming passes in LLVM middle-end is
>>>> InstCombine (see e.g. [1]). It is a very powerful pass capable
>>>> of doing all the crazy stuff, and new patterns are being
>>>> constantly introduced there. The problem is that we often use
>>>> it just as a clean-up pass: it's scheduled 6 times in the
>>>> current pass pipeline, and each time it's invoked it checks all
>>>> known patterns. That sounds OK for O3, where we try to squeeze as
>>>> much performance as possible, but it is excessive for other
>>>> opt-levels. InstCombine has an ExpensiveCombines parameter to
>>>> address that - but I think it's underused at the moment.
>>>
>>> Yes, “ExpensiveCombines” was added recently (4.0? 3.9?), and I
>>> believe it has always been intended to be extended the way
>>> you’re doing it. So I support this effort :)
>>
>> +1
>>
>> Also, did your profiling reveal why the other combines are
>> expensive? Among other things, I'm curious if the expensive ones
>> tend to spend a lot of time in ValueTracking (getting known bits
>> and similar)?
>>
>> -Hal
>>
>>
>>>
>>> CC: David for the general direction on InstCombine though.
>>>
>>>
>>> —
>>> Mehdi
>>>
>>>
>>>
>>>>
>>>> Trying to find out which patterns are important and which are
>>>> rare, I profiled clang using CTMark and got the following
>>>> coverage report:
>>>> <InstCombine_covreport.html>
>>>> (beware, the file is ~6MB).
>>>>
>>>> Guided by this profile, I moved some patterns under the "if
>>>> (ExpensiveCombines)" check, which, as expected, was neutral for
>>>> runtime performance but improved compile time. The testing
>>>> results are below (measured for Os).
>>>>
>>>> Performance Improvements - Compile Time
>>>>
>>>>   Benchmark                       Δ        Previous  Current  σ
>>>>   CTMark/sqlite3/sqlite3          -1.55%    6.8155    6.7102  0.0081
>>>>   CTMark/mafft/pairlocalalign     -1.05%    8.0407    7.9559  0.0193
>>>>   CTMark/ClamAV/clamscan          -1.02%   11.3893   11.2734  0.0081
>>>>   CTMark/lencod/lencod            -1.01%   12.8763   12.7461  0.0244
>>>>   CTMark/SPASS/SPASS              -1.01%   12.5048   12.3791  0.0340
>>>>
>>>>
>>>> Performance Improvements - Compile Time
>>>>
>>>>   Benchmark                                            Δ        Previous  Current  σ
>>>>   External/SPEC/CINT2006/403.gcc/403.gcc               -1.64%   54.0801   53.1930  -
>>>>   External/SPEC/CINT2006/400.perlbench/400.perlbench   -1.25%   19.1481   18.9091  -
>>>>   External/SPEC/CINT2006/445.gobmk/445.gobmk           -1.01%   15.2819   15.1274  -
>>>>
>>>>
>>>>
>>>> Do such changes make sense? The patch doesn't change O3, but it
>>>> does change Os and can potentially affect performance there
>>>> (though I didn't see any changes in my tests).
>>>>
>>>> The patch is attached for reference; if we decide to go for
>>>> it, I'll upload it to phab:
>>>>
>>>> <0001-InstCombine-Move-some-infrequent-patterns-under-if-E.patch>
>>>>
>>>>
>>>> Thanks,
>>>> Michael
>>>>
>>>> [1]:
>>>> http://lists.llvm.org/pipermail/llvm-dev/2016-December/108279.html
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>>>
>>>
>>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory