[llvm] 6102310 - [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block

Fri Aug 28 13:53:00 PDT 2020

On Thu, Aug 27, 2020 at 11:01 PM Roman Lebedev <lebedev.ri at gmail.com> wrote:

> On Thu, Aug 27, 2020 at 10:43 PM Roman Lebedev <lebedev.ri at gmail.com>
> wrote:
> >
> > On Thu, Aug 27, 2020 at 9:55 PM Nikita Popov <nikita.ppv at gmail.com>
> wrote:
> > >
> > > On Thu, Aug 27, 2020 at 5:47 PM Roman Lebedev via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
> > >>
> > >>
> > >> Author: Roman Lebedev
> > >> Date: 2020-08-27T18:47:04+03:00
> > >> New Revision: 6102310d814ad73eab60a88b21dd70874f7a056f
> > >>
> > >> URL:
> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f
> > >> DIFF:
> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f.diff
> > >>
> > >> LOG: [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic
> block
> > >>
> > >> Apparently, we don't do this, neither in EarlyCSE, nor in
> InstSimplify,
> > >> nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
> > >>
> > >> While i could teach EarlyCSE how to hash PHI nodes,
> > >> we can't really do much (anything?) even if we find two identical
> > >> PHI nodes in different basic blocks, same-BB case is the interesting
> one,
> > >> and if we teach InstSimplify about it (which is what i wanted
> originally,
> > >> https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
> > >>
> > >> So i would think this is pretty uncontroversial.
> > >>
> > >> On vanilla llvm test-suite + RawSpeed, this has the following effects:
> > >> ```
> > >> | statistic name                                     | baseline  |
> proposed  |      Δ |        % |    \|%\| |
> > >>
> |----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
> > >> | instsimplify.NumPHICSE                             | 0         |
> 23779     |  23779 |    0.00% |    0.00% |
> > >> | asm-printer.EmittedInsts                           | 7942328   |
> 7942392   |     64 |    0.00% |    0.00% |
> > >> | assembler.ObjectBytes                              | 273069192 |
> 273084704 |  15512 |    0.01% |    0.01% |
> > >> | correlated-value-propagation.NumPhis               | 18412     |
> 18539     |    127 |    0.69% |    0.69% |
> > >> | early-cse.NumCSE                                   | 2183283   |
> 2183227   |    -56 |    0.00% |    0.00% |
> > >> | early-cse.NumSimplify                              | 550105    |
> 542090    |  -8015 |   -1.46% |    1.46% |
> > >> | instcombine.NumAggregateReconstructionsSimplified  | 73        |
> 4506      |   4433 | 6072.60% | 6072.60% |
> > >> | instcombine.NumCombined                            | 3640264   |
> 3664769   |  24505 |    0.67% |    0.67% |
> > >> | instcombine.NumDeadInst                            | 1778193   |
> 1783183   |   4990 |    0.28% |    0.28% |
> > >> | instcount.NumCallInst                              | 1758401   |
> 1758799   |    398 |    0.02% |    0.02% |
> > >> | instcount.NumInvokeInst                            | 59478     |
> 59502     |     24 |    0.04% |    0.04% |
> > >> | instcount.NumPHIInst                               | 330557    |
> 330533    |    -24 |   -0.01% |    0.01% |
> > >> | instcount.TotalInsts                               | 8831952   |
> 8832286   |    334 |    0.00% |    0.00% |
> > >> | simplifycfg.NumInvokes                             | 4300      |
> 4410      |    110 |    2.56% |    2.56% |
> > >> | simplifycfg.NumSimpl                               | 1019808   |
> 999607    | -20201 |   -1.98% |    1.98% |
> > >> ```
> > >> I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` -> `call`
> > >> transforms, and counter-intuitively results in *more* instructions
> total.
> > >>
> > >> That being said, the PHI count doesn't decrease that much,
> > >> and looking at some examples, it seems at least some of them
> > >> were previously getting PHI CSE'd in SimplifyCFG of all places..
> > >>
> > >> I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same
> time.
> > >> As a comment in `InstCombinerImpl::visitPHINode()` already stated,
> > >> there are no guarantees on the ordering of the operands of a PHI node,
> > >> so if we just naively compare them, we may false-negatively say that
> > >> the nodes are not equal when the only difference is operand order,
> > >> which is especially important since the fold is in InstSimplify,
> > >> so we can't rely on InstCombine sorting them beforehand.
> > >>
> > >> Fixing this for the general case is costly (geomean +0.02%),
> > >> and does not appear to catch anything in test-suite, but for
> > >> the same-BB case, it's trivial, so let's fix at least that.
> > >>
> > >> As per
> http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
> > >> this appears to cause geomean +0.03% compile time increase
> (regression),
> > >> but geomean -0.01%..-0.04% code size decrease (improvement).
> >
> > > This transform seems to be breaking some new ground for InstSimplify,
> in that it can
> > > now simplify to instructions that are not def-use "reachable" from the
> root instruction.
> > True.
> >
> > > I'm a bit uncomfortable with this as a general direction.
> > Please can you be more specific?
> >
> > > Would we lose out much transformation power if we did this in
> InstCombine only instead?
> > I'm not really sure, let me see how it affects -stats i guess.
> Effect of moving it to instcombine: https://reviews.llvm.org/P8233
> (again, vanilla llvm test-suite + RawSpeed)
> So having it as a InstSimplify utility catches ~7% more PHI's.
> Honestly i expected this number to be *much* lower.
>

Thanks for testing! I think it should be expected that doing something in
InstSimplify will trigger more, simplify by dint of running earlier and
more often (i.e., before other folds happen). It doesn't look like this
change affects the motivating statistics (simplifycfg.NumInvokes and
instcombine.NumAggregateReconstructionsSimplified remain the same).

I think i'd like to better understand what breakage keeping it in
> InstSimplify causes.
>

My concern here is more about the pass design than any actual breakage.
Performing CSE is outside the scope of InstSimplify responsibility. We have
other passes, such as EarlyCSE and GVN that are responsible for CSE.

Now, InstCombine is of course also not responsible for performing CSE.
However, InstCombine is already a rather pragmatic kitchen sink of random
transforms and already contains quite a few CSE related optimizations, such
as basic redundant load elimination. Adding Phi CSE support to it doesn't
change the types of optimizations InstCombine performs in a fundamental
way, just adds one more special case.

Regards,
Nikita

> > > Calling SimplifyInstruction on phi nodes is a very common pattern,
> > > and pretty much all of those uses have a comment above them
> > > that specifically says that the simplification call is being made
> > > to reduce the phi to a common operand.
> > Well, right, because that is what SimplifyPHINode() only knew to do.
> >
> > > It's not immediately clear that PHI CSE is desired and/or safe in all
> those cases.
> > I'm not really following.
> > If you have concrete examples i'd take a look.
> >
> > Roman
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200828/a66d9ec6/attachment.html>