[llvm] 6102310 - [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block
Nikita Popov via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 28 14:21:48 PDT 2020
On Fri, Aug 28, 2020 at 10:53 PM Nikita Popov <nikita.ppv at gmail.com> wrote:
> On Thu, Aug 27, 2020 at 11:01 PM Roman Lebedev <lebedev.ri at gmail.com>
> wrote:
>
>> On Thu, Aug 27, 2020 at 10:43 PM Roman Lebedev <lebedev.ri at gmail.com>
>> wrote:
>> >
>> > On Thu, Aug 27, 2020 at 9:55 PM Nikita Popov <nikita.ppv at gmail.com>
>> wrote:
>> > >
>> > > On Thu, Aug 27, 2020 at 5:47 PM Roman Lebedev via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>> > >>
>> > >>
>> > >> Author: Roman Lebedev
>> > >> Date: 2020-08-27T18:47:04+03:00
>> > >> New Revision: 6102310d814ad73eab60a88b21dd70874f7a056f
>> > >>
>> > >> URL:
>> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f
>> > >> DIFF:
>> https://github.com/llvm/llvm-project/commit/6102310d814ad73eab60a88b21dd70874f7a056f.diff
>> > >>
>> > >> LOG: [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic
>> block
>> > >>
>> > >> Apparently, we don't do this, neither in EarlyCSE, nor in
>> InstSimplify,
>> > >> nor in (old) GVN, but do in NewGVN and SimplifyCFG of all places..
>> > >>
>> > >> While i could teach EarlyCSE how to hash PHI nodes,
>> > >> we can't really do much (anything?) even if we find two identical
>> > >> PHI nodes in different basic blocks, same-BB case is the interesting
>> one,
>> > >> and if we teach InstSimplify about it (which is what i wanted
>> originally,
>> > >> https://reviews.llvm.org/D86530), we get EarlyCSE support for free.
>> > >>
>> > >> So i would think this is pretty uncontroversial.
>> > >>
>> > >> On vanilla llvm test-suite + RawSpeed, this has the following
>> effects:
>> > >> ```
>> > >> | statistic name | baseline |
>> proposed | Δ | % | \|%\| |
>> > >>
>> |----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
>> > >> | instsimplify.NumPHICSE | 0 |
>> 23779 | 23779 | 0.00% | 0.00% |
>> > >> | asm-printer.EmittedInsts | 7942328 |
>> 7942392 | 64 | 0.00% | 0.00% |
>> > >> | assembler.ObjectBytes | 273069192 |
>> 273084704 | 15512 | 0.01% | 0.01% |
>> > >> | correlated-value-propagation.NumPhis | 18412 |
>> 18539 | 127 | 0.69% | 0.69% |
>> > >> | early-cse.NumCSE | 2183283 |
>> 2183227 | -56 | 0.00% | 0.00% |
>> > >> | early-cse.NumSimplify | 550105 |
>> 542090 | -8015 | -1.46% | 1.46% |
>> > >> | instcombine.NumAggregateReconstructionsSimplified | 73 |
>> 4506 | 4433 | 6072.60% | 6072.60% |
>> > >> | instcombine.NumCombined | 3640264 |
>> 3664769 | 24505 | 0.67% | 0.67% |
>> > >> | instcombine.NumDeadInst | 1778193 |
>> 1783183 | 4990 | 0.28% | 0.28% |
>> > >> | instcount.NumCallInst | 1758401 |
>> 1758799 | 398 | 0.02% | 0.02% |
>> > >> | instcount.NumInvokeInst | 59478 |
>> 59502 | 24 | 0.04% | 0.04% |
>> > >> | instcount.NumPHIInst | 330557 |
>> 330533 | -24 | -0.01% | 0.01% |
>> > >> | instcount.TotalInsts | 8831952 |
>> 8832286 | 334 | 0.00% | 0.00% |
>> > >> | simplifycfg.NumInvokes | 4300 |
>> 4410 | 110 | 2.56% | 2.56% |
>> > >> | simplifycfg.NumSimpl | 1019808 |
>> 999607 | -20201 | -1.98% | 1.98% |
>> > >> ```
>> > >> I.e. it fires ~24k times, causes +110 (+2.56%) more `invoke` ->
>> `call`
>> > >> transforms, and counter-intuitively results in *more* instructions
>> total.
>> > >>
>> > >> That being said, the PHI count doesn't decrease that much,
>> > >> and looking at some examples, it seems at least some of them
>> > >> were previously getting PHI CSE'd in SimplifyCFG of all places..
>> > >>
>> > >> I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same
>> time.
>> > >> As a comment in `InstCombinerImpl::visitPHINode()` already stated,
>> > >> there are no guarantees on the ordering of the operands of a PHI
>> node,
>> > >> so if we just naively compare them, we may false-negatively say that
>> > >> the nodes are not equal when the only difference is operand order,
>> > >> which is especially important since the fold is in InstSimplify,
>> > >> so we can't rely on InstCombine sorting them beforehand.
>> > >>
>> > >> Fixing this for the general case is costly (geomean +0.02%),
>> > >> and does not appear to catch anything in test-suite, but for
>> > >> the same-BB case, it's trivial, so let's fix at least that.
>> > >>
>> > >> As per
>> http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
>> > >> this appears to cause geomean +0.03% compile time increase
>> (regression),
>> > >> but geomean -0.01%..-0.04% code size decrease (improvement).
>> >
>> > > This transform seems to be breaking some new ground for InstSimplify,
>> in that it can
>> > > now simplify to instructions that are not def-use "reachable" from
>> the root instruction.
>> > True.
>> >
>> > > I'm a bit uncomfortable with this as a general direction.
>> > Please can you be more specific?
>> >
>> > > Would we lose out much transformation power if we did this in
>> InstCombine only instead?
>> > I'm not really sure, let me see how it affects -stats i guess.
>> Effect of moving it to instcombine: https://reviews.llvm.org/P8233
>> (again, vanilla llvm test-suite + RawSpeed)
>> So having it as a InstSimplify utility catches ~7% more PHI's.
>> Honestly i expected this number to be *much* lower.
>>
>
> Thanks for testing! I think it should be expected that doing something in
> InstSimplify will trigger more, simplify by dint of running earlier and
> more often (i.e., before other folds happen). It doesn't look like this
> change affects the motivating statistics (simplifycfg.NumInvokes and
> instcombine.NumAggregateReconstructionsSimplified remain the same).
>
> I think i'd like to better understand what breakage keeping it in
>> InstSimplify causes.
>>
>
> My concern here is more about the pass design than any actual breakage.
> Performing CSE is outside the scope of InstSimplify responsibility. We have
> other passes, such as EarlyCSE and GVN that are responsible for CSE.
>
> Now, InstCombine is of course also not responsible for performing CSE.
> However, InstCombine is already a rather pragmatic kitchen sink of random
> transforms and already contains quite a few CSE related optimizations, such
> as basic redundant load elimination. Adding Phi CSE support to it doesn't
> change the types of optimizations InstCombine performs in a fundamental
> way, just adds one more special case.
>
> Regards,
> Nikita
>
As for actual breakage: I have a strong suspicion that this change has
introduced compiler non-determinism. Looking at
https://llvm-compile-time-tracker.com/index.php?config=ReleaseThinLTO&stat=size-text
text size fluctuations have started occurring directly after this commit.
Regards,
Nikita
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200828/b2c6dfec/attachment.html>
More information about the llvm-commits
mailing list