[PATCH] D112851: [PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 29 15:40:21 PDT 2021


lebedev.ri created this revision.
lebedev.ri added reviewers: aeubanks, asbirlea, reames, mkazantsev, fhahn, jdoerfert, nikic.
lebedev.ri added a project: LLVM.
Herald added subscribers: ormris, wenlei, steven_wu, javed.absar, hiraditya.
lebedev.ri requested review of this revision.

The test is thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
It was originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/

We manage to deduce that the answer does not require looping,
but we do that after the last `LoopDeletion` pass run,
so we end up being stuck with a dead loop.
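
To make the problem concrete, here is a minimal sketch of the kind of source pattern involved. It is an illustration written for this description, not the contents of the godbolt link or of the PhaseOrdering test; `Node`, `size`, `isEmptyViaSize` and `isEmptyDirect` are hypothetical names. The point is that once `size(head) == 0` is folded to a check on `head`, the counting loop computes nothing that is used, and (since C++ lets the compiler assume such a loop terminates) it becomes a candidate for `LoopDeletion`.

```cpp
#include <cstddef>

// Hypothetical hand-rolled singly linked list (an assumption for illustration).
struct Node {
  Node *next;
};

// Walks the whole list to count its nodes.
std::size_t size(const Node *head) {
  std::size_t n = 0;
  for (const Node *p = head; p != nullptr; p = p->next)
    ++n;
  return n;
}

// Only asks whether the count is zero. The answer is fully determined by
// whether `head` is null, so the loop in size() is not needed for it.
bool isEmptyViaSize(const Node *head) { return size(head) == 0; }

// The form we would like the optimizer to reduce isEmptyViaSize() to.
bool isEmptyDirect(const Node *head) { return head == nullptr; }
```

Both predicates return the same value for any finite list; the difference is only whether a loop survives in the generated code.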

As with all things SCEV, this comes with an expected compile-time regression (exact impact TBD):
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions

Looking at the transformation statistics over the vanilla test-suite, I think that is rather expected:

  | statistic name                                   |  baseline |  proposed |     Δ |      % | abs(%) |
  |--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
  | scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
  | scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
  | loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
  | regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
  | indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
  | indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
  | scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
  | loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
  | machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
  | globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
  | codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
  | loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
  | machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
  | phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
  | machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
  | branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
  | loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
  | branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
  | codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
  | instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
  | instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
  | instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
  | loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
  | branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
  | regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
  | regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
  | machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
  | machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
  | branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
  | codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
  | local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
  | loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |
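
For reference, the derived columns can be reproduced as sketched below, assuming Δ is `proposed - baseline`, % is Δ relative to the baseline, and the last column is the absolute value of % (which the rows appear to be sorted by). `StatDelta` and `compareStat` are hypothetical names, not part of any LLVM tooling.

```cpp
#include <cmath>

// Derived columns for one statistic row (hypothetical helper type).
struct StatDelta {
  long delta;        // proposed - baseline
  double percent;    // delta as a percentage of baseline
  double absPercent; // magnitude of percent
};

// Assumes baseline is non-zero, as it is for every row in the table.
StatDelta compareStat(long baseline, long proposed) {
  long delta = proposed - baseline;
  double percent =
      100.0 * static_cast<double>(delta) / static_cast<double>(baseline);
  return {delta, percent, std::fabs(percent)};
}
```

For example, `compareStat(8055, 8128)` corresponds to the `loop-delete.NumDeleted` row: a delta of 73, which is roughly 0.91% of the baseline.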

Note that I'm only changing the current PM, and not touching the obsolete (legacy) PM.

This is an alternative to the function simplification pipeline variant of the same change, D112840 <https://reviews.llvm.org/D112840>.
It has both less compile-time impact (since the number of additional SCEV trip count calculations
is far lower than with D112840 <https://reviews.llvm.org/D112840>), and it is much more powerful/impactful (almost 2x more loops deleted).
I have checked, and doing this after loop rotation is favorable (more loops deleted).


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D112851

Files:
  llvm/lib/Passes/PassBuilderPipelines.cpp
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D112851.383516.patch
Type: text/x-patch
Size: 8862 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211029/30890a60/attachment.bin>

