[PATCH] D99249: [PassManager] Run additional LICM before LoopRotate

Wed Mar 24 02:39:17 PDT 2021

lebedev.ri created this revision.
lebedev.ri added reviewers: thopre, fhahn, jeroen.dobbelaere, nikic, MaskRay, mkazantsev, reames, dmgreen, jdoerfert.
lebedev.ri added a project: LLVM.
Herald added subscribers: wenlei, kerbowa, steven_wu, hiraditya, nhaehnle, jvesely.
lebedev.ri requested review of this revision.

This is an alternative to D99204 <https://reviews.llvm.org/D99204>.
Better PhaseOrdering test TBD.

Loop rotation often has to perform code duplication
from header into preheader, which introduces PHI nodes.

In D99204 <https://reviews.llvm.org/D99204>, @thopre wrote:

> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will lead to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the loads can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issues. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this'
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp

Now, we could enhance LoopRotate to hoist loop-invariant code instead of
duplicating it when duplication is not needed, but wouldn't that itself be
code duplication, this time of LICM's logic? (*sic*)
We already have LICM, and in fact we already run it right after LoopRotation.
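
As a rough source-level analogy (the real transformation operates on LLVM IR, not on C++), the sketch below shows why the order matters. The `Buf` type and all function names are hypothetical, and the sketch assumes nothing inside the loop writes to `b->n`. Rotating first copies the header load into both the guard and the latch, so the value used in the body has two definitions that must be merged (a PHI in IR). Hoisting first leaves a single load for rotation to work with.

```cpp
#include <cassert>

// Hypothetical container; field names are illustrative only.
struct Buf {
  const int *data;
  int n;
};

// Original shape: the loop-invariant load of b->n sits in the loop header.
int sum_original(const Buf *b) {
  assert(b != nullptr);
  int s = 0, i = 0;
  while (true) {
    int n = b->n;              // header: invariant load
    if (!(i < n)) break;       // header: exit test
    s += b->data[n - 1 - i];   // body uses the loaded value
    ++i;
  }
  return s;
}

// Rotation without prior hoisting: the header is duplicated into a guard
// and into the latch, so `n` is defined in two places and merged at the
// top of the loop body.
int sum_rotated(const Buf *b) {
  assert(b != nullptr);
  int s = 0, i = 0;
  int n = b->n;                // copy 1: guard
  if (i < n) {
    do {
      s += b->data[n - 1 - i];
      ++i;
      n = b->n;                // copy 2: latch
    } while (i < n);
  }
  return s;
}

// Hoisting first, then rotating: one load, no merge of two definitions.
int sum_licm_then_rotated(const Buf *b) {
  assert(b != nullptr);
  int s = 0, i = 0;
  const int n = b->n;          // hoisted once, before the loop
  if (i < n) {
    do {
      s += b->data[n - 1 - i];
      ++i;
    } while (i < n);
  }
  return s;
}
```

All three functions compute the same result. They differ only in how many copies of the load exist and in whether the body sees a merged value.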

We could try to move it to before LoopRotation,
which is basically free from a compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at the stats, I think it isn't great that we would no longer run LICM after LoopRotation. In particular:

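| statistic name                                   | LoopRotate-LICM | LICM-LoopRotate | Δ     | %       | abs(%) |
| ------------------------------------------------ | --------------- | --------------- | ----: | ------: | -----: |
| asm-printer.EmittedInsts                         | 9015528         | 9015337         | -191  | 0.00%   | 0.00%  |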
| indvars.NumElimCmp                               | 3536            | 3544            | 8     | 0.23%   | 0.23%  |
| indvars.NumElimExt                               | 36724           | 36578           | -146  | -0.40%  | 0.40%  |
| indvars.NumElimIdentity                          | 143             | 136             | -7    | -4.90%  | 4.90%  |
| indvars.NumElimIV                                | 1197            | 1187            | -10   | -0.84%  | 0.84%  |
| indvars.NumElimRem                               | 4               | 5               | 1     | 25.00%  | 25.00% |
| indvars.NumLFTR                                  | 29841           | 29889           | 48    | 0.16%   | 0.16%  |
| indvars.NumReplaced                              | 2293            | 2225            | -68   | -2.97%  | 2.97%  |
| indvars.NumSimplifiedSDiv                        | 6               | 8               | 2     | 33.33%  | 33.33% |
| indvars.NumWidened                               | 26437           | 26327           | -110  | -0.42%  | 0.42%  |
| instcount.TotalBlocks                            | 1178323         | 1173825         | -4498 | -0.38%  | 0.38%  |
| instcount.TotalFuncs                             | 111826          | 111830          | 4     | 0.00%   | 0.00%  |
| instcount.TotalInsts                             | 9905337         | 9896028         | -9309 | -0.09%  | 0.09%  |
| lcssa.NumLCSSA                                   | 425869          | 423982          | -1887 | -0.44%  | 0.44%  |
| licm.NumHoisted                                  | 378363          | 378759          | 396   | 0.10%   | 0.10%  |
| licm.NumMovedCalls                               | 2196            | 2211            | 15    | 0.68%   | 0.68%  |
| licm.NumMovedLoads                               | 35900           | 31822           | -4078 | -11.36% | 11.36% |
| licm.NumPromoted                                 | 11178           | 11154           | -24   | -0.21%  | 0.21%  |
| licm.NumSunk                                     | 1878363         | 1876168         | -2195 | -0.12%  | 0.12%  |
| loop-delete.NumDeleted                           | 8547            | 8402            | -145  | -1.70%  | 1.70%  |
| loop-instsimplify.NumSimplified                  | 12874           | 11888           | -986  | -7.66%  | 7.66%  |
| loop-peel.NumPeeled                              | 1008            | 925             | -83   | -8.23%  | 8.23%  |
| loop-rotate.NumNotRotatedDueToHeaderSize         | 368             | 365             | -3    | -0.82%  | 0.82%  |
| loop-rotate.NumRotated                           | 42012           | 42000           | -12   | -0.03%  | 0.03%  |
| loop-simplifycfg.NumLoopBlocksDeleted            | 240             | 242             | 2     | 0.83%   | 0.83%  |
| loop-simplifycfg.NumLoopExitsDeleted             | 497             | 20              | -477  | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded            | 618             | 336             | -282  | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled                | 11028           | 11032           | 4     | 0.04%   | 0.04%  |
| loop-unroll.NumUnrolled                          | 12608           | 12529           | -79   | -0.63%  | 0.63%  |
| mem2reg.NumDeadAlloca                            | 10222           | 10221           | -1    | -0.01%  | 0.01%  |
| mem2reg.NumPHIInsert                             | 192110          | 192106          | -4    | 0.00%   | 0.00%  |
| mem2reg.NumSingleStore                           | 637650          | 637643          | -7    | 0.00%   | 0.00%  |
| scalar-evolution.NumBruteForceTripCountsComputed | 814             | 812             | -2    | -0.25%  | 0.25%  |
| scalar-evolution.NumTripCountsComputed           | 283207          | 283029          | -178  | -0.06%  | 0.06%  |
| scalar-evolution.NumTripCountsNotComputed        | 106714          | 106724          | 10    | 0.01%   | 0.01%  |
| simple-loop-unswitch.NumBranches                 | 5178            | 4752            | -426  | -8.23%  | 8.23%  |
| simple-loop-unswitch.NumCostMultiplierSkipped    | 914             | 503             | -411  | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches                 | 20              | 18              | -2    | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial                  | 183             | 95              | -88   | -48.09% | 48.09% |

... but if we instead have LICM both before and after LoopRotate:

| statistic name                                  | LoopRotate-LICM   | LICM-LoopRotate-LICM   | Δ        | %        | abs(%)   |
| ----------------------------------------------- | ----------------- | ---------------------- | -------: | -------: | -------: |
| asm-printer.EmittedInsts                        | 9015528           | 9014089                | -1439    | -0.02%   | 0.02%    |
| indvars.NumElimCmp                              | 3536              | 3546                   | 10       | 0.28%    | 0.28%    |
| indvars.NumElimExt                              | 36724             | 36680                  | -44      | -0.12%   | 0.12%    |
| indvars.NumElimIV                               | 1197              | 1185                   | -12      | -1.00%   | 1.00%    |
| indvars.NumElimIdentity                         | 143               | 146                    | 3        | 2.10%    | 2.10%    |
| indvars.NumElimRem                              | 4                 | 5                      | 1        | 25.00%   | 25.00%   |
| indvars.NumLFTR                                 | 29841             | 29898                  | 57       | 0.19%    | 0.19%    |
| indvars.NumReplaced                             | 2293              | 2299                   | 6        | 0.26%    | 0.26%    |
| indvars.NumSimplifiedSDiv                       | 6                 | 8                      | 2        | 33.33%   | 33.33%   |
| indvars.NumWidened                              | 26437             | 26403                  | -34      | -0.13%   | 0.13%    |
| instcount.TotalBlocks                           | 1178323           | 1173637                | -4686    | -0.40%   | 0.40%    |
| instcount.TotalFuncs                            | 111826            | 111830                 | 4        | 0.00%    | 0.00%    |
| instcount.TotalInsts                            | 9905337           | 9895348                | -9989    | -0.10%   | 0.10%    |
| lcssa.NumLCSSA                                  | 425869            | 425371                 | -498     | -0.12%   | 0.12%    |
| licm.NumHoisted                                 | 378363            | 383358                 | 4995     | 1.32%    | 1.32%    |
| licm.NumMovedCalls                              | 2196              | 2208                   | 12       | 0.55%    | 0.55%    |
| licm.NumMovedLoads                              | 35900             | 35756                  | -144     | -0.40%   | 0.40%    |
| licm.NumPromoted                                | 11178             | 11163                  | -15      | -0.13%   | 0.13%    |
| licm.NumSunk                                    | 1878363           | 2461967                | 583604   | 31.07%   | 31.07%   |
| loop-delete.NumDeleted                          | 8547              | 8538                   | -9       | -0.11%   | 0.11%    |
| loop-instsimplify.NumSimplified                 | 12874             | 12039                  | -835     | -6.49%   | 6.49%    |
| loop-peel.NumPeeled                             | 1008              | 924                    | -84      | -8.33%   | 8.33%    |
| loop-rotate.NumNotRotatedDueToHeaderSize        | 368               | 365                    | -3       | -0.82%   | 0.82%    |
| loop-rotate.NumRotated                          | 42012             | 42002                  | -10      | -0.02%   | 0.02%    |
| loop-simplifycfg.NumLoopBlocksDeleted           | 240               | 241                    | 1        | 0.42%    | 0.42%    |
| loop-simplifycfg.NumTerminatorsFolded           | 618               | 619                    | 1        | 0.16%    | 0.16%    |
| loop-unroll.NumCompletelyUnrolled               | 11028             | 11029                  | 1        | 0.01%    | 0.01%    |
| loop-unroll.NumUnrolled                         | 12608             | 12525                  | -83      | -0.66%   | 0.66%    |
| mem2reg.NumPHIInsert                            | 192110            | 192073                 | -37      | -0.02%   | 0.02%    |
| mem2reg.NumSingleStore                          | 637650            | 637652                 | 2        | 0.00%    | 0.00%    |
| scalar-evolution.NumTripCountsComputed          | 283207            | 283097                 | -110     | -0.04%   | 0.04%    |
| scalar-evolution.NumTripCountsNotComputed       | 106714            | 106693                 | -21      | -0.02%   | 0.02%    |
| simple-loop-unswitch.NumBranches                | 5178              | 5185                   | 7        | 0.14%    | 0.14%    |
| simple-loop-unswitch.NumCostMultiplierSkipped   | 914               | 925                    | 11       | 1.20%    | 1.20%    |
| simple-loop-unswitch.NumTrivial                 | 183               | 179                    | -4       | -2.19%   | 2.19%    |

I.e. we end up with fewer instructions, more LICM activity (+31% more instructions sunk out of loops!),
and less peeling.
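
For anyone re-checking the tables, the Δ and % columns are a plain difference and a percentage relative to the LoopRotate-LICM baseline. A minimal sketch of that arithmetic follows; the helper names `compare_stat` and `round2` are made up for illustration and are not part of any LLVM tooling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Result of comparing one statistic between two pipeline configurations.
struct StatDelta {
  std::int64_t delta;  // candidate - baseline
  double percent;      // delta as a percentage of the baseline
};

// Hypothetical helper: computes the Δ and % columns for a single row.
StatDelta compare_stat(std::int64_t baseline, std::int64_t candidate) {
  assert(baseline != 0);  // the percentage is undefined for a zero baseline
  const std::int64_t delta = candidate - baseline;
  const double percent =
      100.0 * static_cast<double>(delta) / static_cast<double>(baseline);
  return {delta, percent};
}

// Rounds to two decimal places, matching the precision used in the tables.
double round2(double value) { return std::round(value * 100.0) / 100.0; }
```

For example, `compare_stat(1878363, 2461967)` reproduces the licm.NumSunk row of the second table (Δ = 583604, about 31.07%), and `compare_stat(1008, 924)` reproduces the loop-peel.NumPeeled row (Δ = -84, about -8.33%).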

This does have an observable compile-time regression of about +0.5% geomean:
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but I think that's basically nothing.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D99249

Files:
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
  llvm/test/Other/opt-O2-pipeline.ll
  llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
  llvm/test/Other/opt-O3-pipeline.ll
  llvm/test/Other/opt-Os-pipeline.ll
  llvm/test/Other/pass-pipelines.ll
  llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
  llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll
  llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll
  llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-invariant-loads.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D99249.332901.patch
Type: text/x-patch
Size: 44005 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210324/3ede84b1/attachment.bin>