[PATCH] D99249: [PassManager] Run additional LICM before LoopRotate
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 24 02:39:17 PDT 2021
lebedev.ri created this revision.
lebedev.ri added reviewers: thopre, fhahn, jeroen.dobbelaere, nikic, MaskRay, mkazantsev, reames, dmgreen, jdoerfert.
lebedev.ri added a project: LLVM.
Herald added subscribers: wenlei, kerbowa, steven_wu, hiraditya, nhaehnle, jvesely.
lebedev.ri requested review of this revision.
This is an alternative to D99204 <https://reviews.llvm.org/D99204>.
Better PhaseOrdering test TBD.
Loop rotation often has to duplicate code from the loop header
into the preheader, which introduces PHI nodes.
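As a rough source-level illustration of what rotation does (the real transform works on LLVM IR, and the function names here are made up for this sketch), a `while` loop becomes a guarded `do`/`while` loop, with the header's exit test copied in front of the loop:

```cpp
#include <cassert>
#include <cstddef>

// Original shape: the exit test sits in the loop header and runs
// before every iteration.
int sum_while(const int *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  int sum = 0;
  std::size_t i = 0;
  while (i < n) {
    sum += data[i];
    ++i;
  }
  return sum;
}

// Rotated shape: the header's test is duplicated into a guard in front
// of the loop (the preheader side), and the loop itself tests at the
// bottom (the latch).
int sum_rotated(const int *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  int sum = 0;
  std::size_t i = 0;
  if (i < n) {          // copy of the header test
    do {
      sum += data[i];
      ++i;
    } while (i < n);    // the same test, now in the latch
  }
  return sum;
}
```

Both functions compute the same result; only the placement of the exit test differs. Anything else that lived in the header is copied the same way, which is where the extra PHI nodes come from.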
In D99204 <https://reviews.llvm.org/D99204>, @thopre wrote:
> With loop peeling, it is important that unnecessary PHIs be avoided or
> it will lead to spurious peeling. One source of such PHIs is loop
> rotation which creates PHIs for invariant loads. Those PHIs are
> particularly problematic since loop peeling is now run as part of simple
> loop unrolling before GVN is run, and are thus a source of spurious
> peeling.
>
> Note that while some of the loads can be hoisted and eventually
> eliminated by instruction combine, this is not always possible due to
> alignment issues. In particular, the motivating example [1] was a load
> inside a class instance which cannot be hoisted because the `this`
> pointer has an alignment of 1.
>
> [1] http://lists.llvm.org/pipermail/llvm-dev/attachments/20210312/4ce73c47/attachment.cpp
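The invariant-load problem described in the quote can be sketched at the source level. This is only an analogy for what happens in LLVM IR, and `Counter`, `limit` and the function names are hypothetical. When the header contains a load, rotating first leaves two copies of the load whose results must be merged; hoisting the load first leaves a single value and nothing to merge:

```cpp
#include <cassert>

struct Counter {
  unsigned limit;
};

// Original loop: the header loads c->limit before every iteration.
unsigned count_original(const Counter *c) {
  assert(c != nullptr);
  unsigned i = 0;
  while (i < c->limit)
    ++i;
  return i;
}

// Rotated without hoisting: the load exists twice, once in the guard
// and once in the latch. `lim` has two definitions that meet at the
// top of the loop, which corresponds to a PHI node in IR.
unsigned count_rotated(const Counter *c) {
  assert(c != nullptr);
  unsigned i = 0;
  unsigned lim = c->limit;    // copy of the header load (guard)
  if (i < lim) {
    do {
      ++i;
      lim = c->limit;         // header load, now in the latch
    } while (i < lim);
  }
  return i;
}

// Hoisted first, then rotated: one load, one definition, no merge.
unsigned count_hoisted_then_rotated(const Counter *c) {
  assert(c != nullptr);
  const unsigned lim = c->limit;  // loop-invariant load hoisted once
  unsigned i = 0;
  if (i < lim) {
    do {
      ++i;
    } while (i < lim);
  }
  return i;
}
```

All three variants return the same value as long as `limit` does not change while the loop runs, which is the loop-invariance assumption the hoisting relies on.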
Now, we could enhance LoopRotate to hoist loop-invariant code instead of
duplicating it when duplication is not needed, but wouldn't that just duplicate
functionality we already have? We have LICM, and in fact we already run it
right after LoopRotate. We could try to move it to before LoopRotate,
which is basically free from a compile-time perspective:
https://llvm-compile-time-tracker.com/compare.php?from=6c93eb4477d88af046b915bc955c03693b2cbb58&to=a4bee6d07732b1184c436da489040b912f0dc271&stat=instructions
But, looking at the stats, I don't think it is great that we would no longer run LICM after LoopRotate. In particular:
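| statistic name | LoopRotate-LICM | LICM-LoopRotate | Δ | % | abs(%) |
| ----------------------------------------------- | ----------------- | ----------------- | -------: | -------: | -------: |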
| asm-printer.EmittedInsts | 9015528 | 9015337 | -191 | 0.00% | 0.00% |
| indvars.NumElimCmp | 3536 | 3544 | 8 | 0.23% | 0.23% |
| indvars.NumElimExt | 36724 | 36578 | -146 | -0.40% | 0.40% |
| indvars.NumElimIdentity | 143 | 136 | -7 | -4.90% | 4.90% |
| indvars.NumElimIV | 1197 | 1187 | -10 | -0.84% | 0.84% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29841 | 29889 | 48 | 0.16% | 0.16% |
| indvars.NumReplaced | 2293 | 2225 | -68 | -2.97% | 2.97% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26437 | 26327 | -110 | -0.42% | 0.42% |
| instcount.TotalBlocks | 1178323 | 1173825 | -4498 | -0.38% | 0.38% |
| instcount.TotalFuncs | 111826 | 111830 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905337 | 9896028 | -9309 | -0.09% | 0.09% |
| lcssa.NumLCSSA | 425869 | 423982 | -1887 | -0.44% | 0.44% |
| licm.NumHoisted | 378363 | 378759 | 396 | 0.10% | 0.10% |
| licm.NumMovedCalls | 2196 | 2211 | 15 | 0.68% | 0.68% |
| licm.NumMovedLoads | 35900 | 31822 | -4078 | -11.36% | 11.36% |
| licm.NumPromoted | 11178 | 11154 | -24 | -0.21% | 0.21% |
| licm.NumSunk | 1878363 | 1876168 | -2195 | -0.12% | 0.12% |
| loop-delete.NumDeleted | 8547 | 8402 | -145 | -1.70% | 1.70% |
| loop-instsimplify.NumSimplified | 12874 | 11888 | -986 | -7.66% | 7.66% |
| loop-peel.NumPeeled | 1008 | 925 | -83 | -8.23% | 8.23% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42012 | 42000 | -12 | -0.03% | 0.03% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 242 | 2 | 0.83% | 0.83% |
| loop-simplifycfg.NumLoopExitsDeleted | 497 | 20 | -477 | -95.98% | 95.98% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 336 | -282 | -45.63% | 45.63% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11032 | 4 | 0.04% | 0.04% |
| loop-unroll.NumUnrolled | 12608 | 12529 | -79 | -0.63% | 0.63% |
| mem2reg.NumDeadAlloca | 10222 | 10221 | -1 | -0.01% | 0.01% |
| mem2reg.NumPHIInsert | 192110 | 192106 | -4 | 0.00% | 0.00% |
| mem2reg.NumSingleStore | 637650 | 637643 | -7 | 0.00% | 0.00% |
| scalar-evolution.NumBruteForceTripCountsComputed | 814 | 812 | -2 | -0.25% | 0.25% |
| scalar-evolution.NumTripCountsComputed | 283207 | 283029 | -178 | -0.06% | 0.06% |
| scalar-evolution.NumTripCountsNotComputed | 106714 | 106724 | 10 | 0.01% | 0.01% |
| simple-loop-unswitch.NumBranches | 5178 | 4752 | -426 | -8.23% | 8.23% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 503 | -411 | -44.97% | 44.97% |
| simple-loop-unswitch.NumSwitches | 20 | 18 | -2 | -10.00% | 10.00% |
| simple-loop-unswitch.NumTrivial | 183 | 95 | -88 | -48.09% | 48.09% |
... but if we instead run LICM both before and after LoopRotate:
| statistic name | LoopRotate-LICM | LICM-LoopRotate-LICM | Δ | % | abs(%) |
| ----------------------------------------------- | ----------------- | ---------------------- | -------: | -------: | -------: |
| asm-printer.EmittedInsts | 9015528 | 9014089 | -1439 | -0.02% | 0.02% |
| indvars.NumElimCmp | 3536 | 3546 | 10 | 0.28% | 0.28% |
| indvars.NumElimExt | 36724 | 36680 | -44 | -0.12% | 0.12% |
| indvars.NumElimIV | 1197 | 1185 | -12 | -1.00% | 1.00% |
| indvars.NumElimIdentity | 143 | 146 | 3 | 2.10% | 2.10% |
| indvars.NumElimRem | 4 | 5 | 1 | 25.00% | 25.00% |
| indvars.NumLFTR | 29841 | 29898 | 57 | 0.19% | 0.19% |
| indvars.NumReplaced | 2293 | 2299 | 6 | 0.26% | 0.26% |
| indvars.NumSimplifiedSDiv | 6 | 8 | 2 | 33.33% | 33.33% |
| indvars.NumWidened | 26437 | 26403 | -34 | -0.13% | 0.13% |
| instcount.TotalBlocks | 1178323 | 1173637 | -4686 | -0.40% | 0.40% |
| instcount.TotalFuncs | 111826 | 111830 | 4 | 0.00% | 0.00% |
| instcount.TotalInsts | 9905337 | 9895348 | -9989 | -0.10% | 0.10% |
| lcssa.NumLCSSA | 425869 | 425371 | -498 | -0.12% | 0.12% |
| licm.NumHoisted | 378363 | 383358 | 4995 | 1.32% | 1.32% |
| licm.NumMovedCalls | 2196 | 2208 | 12 | 0.55% | 0.55% |
| licm.NumMovedLoads | 35900 | 35756 | -144 | -0.40% | 0.40% |
| licm.NumPromoted | 11178 | 11163 | -15 | -0.13% | 0.13% |
| licm.NumSunk | 1878363 | 2461967 | 583604 | 31.07% | 31.07% |
| loop-delete.NumDeleted | 8547 | 8538 | -9 | -0.11% | 0.11% |
| loop-instsimplify.NumSimplified | 12874 | 12039 | -835 | -6.49% | 6.49% |
| loop-peel.NumPeeled | 1008 | 924 | -84 | -8.33% | 8.33% |
| loop-rotate.NumNotRotatedDueToHeaderSize | 368 | 365 | -3 | -0.82% | 0.82% |
| loop-rotate.NumRotated | 42012 | 42002 | -10 | -0.02% | 0.02% |
| loop-simplifycfg.NumLoopBlocksDeleted | 240 | 241 | 1 | 0.42% | 0.42% |
| loop-simplifycfg.NumTerminatorsFolded | 618 | 619 | 1 | 0.16% | 0.16% |
| loop-unroll.NumCompletelyUnrolled | 11028 | 11029 | 1 | 0.01% | 0.01% |
| loop-unroll.NumUnrolled | 12608 | 12525 | -83 | -0.66% | 0.66% |
| mem2reg.NumPHIInsert | 192110 | 192073 | -37 | -0.02% | 0.02% |
| mem2reg.NumSingleStore | 637650 | 637652 | 2 | 0.00% | 0.00% |
| scalar-evolution.NumTripCountsComputed | 283207 | 283097 | -110 | -0.04% | 0.04% |
| scalar-evolution.NumTripCountsNotComputed | 106714 | 106693 | -21 | -0.02% | 0.02% |
| simple-loop-unswitch.NumBranches | 5178 | 5185 | 7 | 0.14% | 0.14% |
| simple-loop-unswitch.NumCostMultiplierSkipped | 914 | 925 | 11 | 1.20% | 1.20% |
| simple-loop-unswitch.NumTrivial | 183 | 179 | -4 | -2.19% | 2.19% |
I.e. we end up with fewer instructions, more LICM activity (about 31% more instructions sunk out of loops!),
and less peeling.
This does have an observable compile-time regression of roughly +0.5% geomean:
https://llvm-compile-time-tracker.com/compare.php?from=7c5222e4d1a3a14f029e5f614c9aefd0fa505f1e&to=5d81826c3411982ca26e46b9d0aff34c80577664&stat=instructions
but I think that's basically nothing.
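For anyone re-checking the tables above, the Δ, % and abs(%) columns appear to be the plain difference and the change relative to the LoopRotate-LICM baseline. The sketch below assumes that definition (it reproduces the listed rows), and `StatDelta` and `compare_stat` are names invented for this illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// One row of the comparison: absolute change, relative change in
// percent, and the magnitude of the relative change.
struct StatDelta {
  std::int64_t delta;
  double percent;
  double abs_percent;
};

// Compare a statistic between the baseline pipeline and a candidate.
StatDelta compare_stat(std::int64_t baseline, std::int64_t candidate) {
  assert(baseline != 0 && "relative change is undefined for a zero baseline");
  const std::int64_t delta = candidate - baseline;
  const double percent =
      100.0 * static_cast<double>(delta) / static_cast<double>(baseline);
  return {delta, percent, std::fabs(percent)};
}
```

For example, `compare_stat(1878363, 2461967)` gives a delta of 583604 and roughly 31.07%, matching the licm.NumSunk row of the second table.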
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D99249
Files:
llvm/lib/Passes/PassBuilder.cpp
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
llvm/test/Other/new-pm-defaults.ll
llvm/test/Other/new-pm-thinlto-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
llvm/test/Other/opt-O2-pipeline.ll
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
llvm/test/Other/opt-O3-pipeline.ll
llvm/test/Other/opt-Os-pipeline.ll
llvm/test/Other/pass-pipelines.ll
llvm/test/Transforms/IndVarSimplify/X86/pr45360.ll
llvm/test/Transforms/PhaseOrdering/X86/vdiv.ll
llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll
llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-invariant-loads.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D99249.332901.patch
Type: text/x-patch
Size: 44005 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210324/3ede84b1/attachment.bin>