[llvm] 6102310 - [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block

Fri Aug 28 14:21:48 PDT 2020

On Fri, Aug 28, 2020 at 10:53 PM Nikita Popov <nikita.ppv at gmail.com> wrote:

> On Thu, Aug 27, 2020 at 11:01 PM Roman Lebedev <lebedev.ri at gmail.com>
> wrote:
>
>> On Thu, Aug 27, 2020 at 10:43 PM Roman Lebedev <lebedev.ri at gmail.com>
>> wrote:
>> >
>> > On Thu, Aug 27, 2020 at 9:55 PM Nikita Popov <nikita.ppv at gmail.com>
>> wrote:
>> > >
>> > > On Thu, Aug 27, 2020 at 5:47 PM Roman Lebedev via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>> > >>
>> > >>
>> > >> Author: Roman Lebedev
>> > >> Date: 2020-08-27T18:47:04+03:00
>> > >> New Revision: 6102310d814ad73eab60a88b21dd70874f7a056f
>> > >>
>> > >> URL:
>> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f
>> > >> DIFF:
>> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f.diff
>> > >>
>> > >> LOG: [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic
>> block
>> > >>
>> > >> Apparently, we don't do this, neither in EarlyCSE, nor in
>> InstSimplify,
>> > >> nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
>> > >>
>> > >> While i could teach EarlyCSE how to hash PHI nodes,
>> > >> we can't really do much (anything?) even if we find two identical
>> > >> PHI nodes in different basic blocks, same-BB case is the interesting
>> one,
>> > >> and if we teach InstSimplify about it (which is what i wanted
>> originally,
>> > >> https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
>> > >>
>> > >> So i would think this is pretty uncontroversial.
>> > >>
>> > >> On vanilla llvm test-suite + RawSpeed, this has the following
>> effects:
>> > >> ```
>> > >> | statistic name                                     | baseline  |
>> proposed  |      Δ |        % |    \|%\| |
>> > >>
>> |----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
>> > >> | instsimplify.NumPHICSE                             | 0         |
>> 23779     |  23779 |    0.00% |    0.00% |
>> > >> | asm-printer.EmittedInsts                           | 7942328   |
>> 7942392   |     64 |    0.00% |    0.00% |
>> > >> | assembler.ObjectBytes                              | 273069192 |
>> 273084704 |  15512 |    0.01% |    0.01% |
>> > >> | correlated-value-propagation.NumPhis               | 18412     |
>> 18539     |    127 |    0.69% |    0.69% |
>> > >> | early-cse.NumCSE                                   | 2183283   |
>> 2183227   |    -56 |    0.00% |    0.00% |
>> > >> | early-cse.NumSimplify                              | 550105    |
>> 542090    |  -8015 |   -1.46% |    1.46% |
>> > >> | instcombine.NumAggregateReconstructionsSimplified  | 73        |
>> 4506      |   4433 | 6072.60% | 6072.60% |
>> > >> | instcombine.NumCombined                            | 3640264   |
>> 3664769   |  24505 |    0.67% |    0.67% |
>> > >> | instcombine.NumDeadInst                            | 1778193   |
>> 1783183   |   4990 |    0.28% |    0.28% |
>> > >> | instcount.NumCallInst                              | 1758401   |
>> 1758799   |    398 |    0.02% |    0.02% |
>> > >> | instcount.NumInvokeInst                            | 59478     |
>> 59502     |     24 |    0.04% |    0.04% |
>> > >> | instcount.NumPHIInst                               | 330557    |
>> 330533    |    -24 |   -0.01% |    0.01% |
>> > >> | instcount.TotalInsts                               | 8831952   |
>> 8832286   |    334 |    0.00% |    0.00% |
>> > >> | simplifycfg.NumInvokes                             | 4300      |
>> 4410      |    110 |    2.56% |    2.56% |
>> > >> | simplifycfg.NumSimpl                               | 1019808   |
>> 999607    | -20201 |   -1.98% |    1.98% |
>> > >> ```
>> > >> I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` ->
>> `call`
>> > >> transforms, and counter-intuitively results in *more* instructions
>> total.
>> > >>
>> > >> That being said, the PHI count doesn't decrease that much,
>> > >> and looking at some examples, it seems at least some of them
>> > >> were previously getting PHI CSE'd in SimplifyCFG of all places..
>> > >>
>> > >> I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same
>> time.
>> > >> As a comment in `InstCombinerImpl::visitPHINode()` already stated,
>> > >> there are no guarantees on the ordering of the operands of a PHI
>> node,
>> > >> so if we just naively compare them, we may false-negatively say that
>> > >> the nodes are not equal when the only difference is operand order,
>> > >> which is especially important since the fold is in InstSimplify,
>> > >> so we can't rely on InstCombine sorting them beforehand.
>> > >>
>> > >> Fixing this for the general case is costly (geomean +0.02%),
>> > >> and does not appear to catch anything in test-suite, but for
>> > >> the same-BB case, it's trivial, so let's fix at least that.
>> > >>
>> > >> As per
>> http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
>> > >> this appears to cause geomean +0.03% compile time increase
>> (regression),
>> > >> but geomean -0.01%..-0.04% code size decrease (improvement).
>> >
>> > > This transform seems to be breaking some new ground for InstSimplify,
>> in that it can
>> > > now simplify to instructions that are not def-use "reachable" from
>> the root instruction.
>> > True.
>> >
>> > > I'm a bit uncomfortable with this as a general direction.
>> > Please can you be more specific?
>> >
>> > > Would we lose out much transformation power if we did this in
>> InstCombine only instead?
>> > I'm not really sure, let me see how it affects -stats i guess.
>> Effect of moving it to instcombine: https://reviews.llvm.org/P8233
>> (again, vanilla llvm test-suite + RawSpeed)
>> So having it as a InstSimplify utility catches ~7% more PHI's.
>> Honestly i expected this number to be *much* lower.
>>
>
> Thanks for testing! I think it should be expected that doing something in
> InstSimplify will trigger more, simplify by dint of running earlier and
> more often (i.e., before other folds happen). It doesn't look like this
> change affects the motivating statistics (simplifycfg.NumInvokes and
> instcombine.NumAggregateReconstructionsSimplified remain the same).
>
> I think i'd like to better understand what breakage keeping it in
>> InstSimplify causes.
>>
>
> My concern here is more about the pass design than any actual breakage.
> Performing CSE is outside the scope of InstSimplify responsibility. We have
> other passes, such as EarlyCSE and GVN that are responsible for CSE.
>
> Now, InstCombine is of course also not responsible for performing CSE.
> However, InstCombine is already a rather pragmatic kitchen sink of random
> transforms and already contains quite a few CSE related optimizations, such
> as basic redundant load elimination. Adding Phi CSE support to it doesn't
> change the types of optimizations InstCombine performs in a fundamental
> way, just adds one more special case.
>
> Regards,
> Nikita
>

As for actual breakage: I have a strong suspicion that this change has
introduced compiler non-determinism. Looking at
https://llvm-compile-time-tracker.com/index.php?config=ReleaseThinLTO&stat=size-text
text size fluctuations have started occurring directly after this commit.

Regards,
Nikita
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200828/b2c6dfec/attachment.html>