[all-commits] [llvm/llvm-project] 9c2469: [PassManager] `buildModuleOptimizationPipeline()`:...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Wed Nov 3 09:25:39 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 9c2469c1ddb34517de8dafd83d1940deada3fc22
https://github.com/llvm/llvm-project/commit/9c2469c1ddb34517de8dafd83d1940deada3fc22
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-11-03 (Wed, 03 Nov 2021)
Changed paths:
M llvm/lib/Passes/PassBuilderPipelines.cpp
M llvm/test/Other/new-pm-defaults.ll
M llvm/test/Other/new-pm-thinlto-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
M llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
M llvm/test/Transforms/PhaseOrdering/deletion-of-loops-that-became-side-effect-free.ll
Log Message:
-----------
[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/
We manage to deduce that the answer does not require looping,
but we do that after the last `LoopDeletion` pass run,
so we end up being stuck with a dead loop.
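As a rough illustration of the kind of source pattern involved (a hypothetical sketch, not the linked godbolt test case, and all names here are made up): the function below counts elements with a loop, but the caller only asks whether the count is zero. That answer depends only on whether the loop runs at all, so the loop itself can become dead once the comparison has been simplified. Whether a given compiler actually removes it depends on pass ordering, which is what this change is about.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts the elements in [begin, end) one at a time.
std::size_t count_elements(const int* begin, const int* end) {
    std::size_t n = 0;
    for (const int* p = begin; p != end; ++p)
        ++n;
    return n;
}

// Only asks whether the count is zero. The result is equivalent to
// `begin == end`, so the counting loop is not needed for the answer.
bool is_empty_via_count(const int* begin, const int* end) {
    return count_elements(begin, end) == 0;
}

// The form the optimizer would ideally reach: no loop at all.
bool is_empty_direct(const int* begin, const int* end) {
    return begin == end;
}

// Small self-check that both forms agree on an empty and a non-empty range.
void self_check() {
    int data[3] = {1, 2, 3};
    assert(is_empty_via_count(data, data) == is_empty_direct(data, data));
    assert(is_empty_via_count(data, data + 3) == is_empty_direct(data, data + 3));
}
```

Both functions return the same value for every valid range; the difference is only in how much work is left behind if the loop is not deleted.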
Now, as with all things SCEV, this has
a very expected ~`+0.12%` compile time performance regression:
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
(for comparison, doing that in the function simplification pipeline
would have been a ~`+0.5%` compile time performance regression, D112840)
Looking at the transformation stats over vanilla test-suite, I think it's rather expected:
```
| statistic name | baseline | proposed | Δ | % | abs(%) |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed | 789 | 888 | 99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed | 105592 | 117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken | 542 | 559 | 17 | 3.14% | 3.14% |
| regalloc.numExtends | 81 | 79 | -2 | -2.47% | 2.47% |
| indvars.NumFoldedUser | 408 | 400 | -8 | -1.96% | 1.96% |
| indvars.NumElimCmp | 3831 | 3758 | -73 | -1.91% | 1.91% |
| scalar-evolution.NumTripCountsComputed | 299759 | 304278 | 4519 | 1.51% | 1.51% |
| loop-delete.NumDeleted | 8055 | 8128 | 73 | 0.91% | 0.91% |
| machine-cse.NumCommutes | 111 | 110 | -1 | -0.90% | 0.90% |
| globaldce.NumFunctions | 1187 | 1192 | 5 | 0.42% | 0.42% |
| codegenprepare.NumSelectsExpanded | 277 | 278 | 1 | 0.36% | 0.36% |
| loop-unroll.NumRuntimeUnrolled | 13841 | 13791 | -50 | -0.36% | 0.36% |
| machinelicm.NumPostRAHoisted | 1168 | 1172 | 4 | 0.34% | 0.34% |
| phi-node-elimination.NumCriticalEdgesSplit | 83054 | 82879 | -175 | -0.21% | 0.21% |
| machine-cse.NumPREs | 3085 | 3079 | -6 | -0.19% | 0.19% |
| branch-folder.NumBranchOpts | 108122 | 107942 | -180 | -0.17% | 0.17% |
| loop-unroll.NumUnrolled | 40136 | 40067 | -69 | -0.17% | 0.17% |
| branch-folder.NumDeadBlocks | 130818 | 130607 | -211 | -0.16% | 0.16% |
| codegenprepare.NumBlocksElim | 92856 | 92714 | -142 | -0.15% | 0.15% |
| instsimplify.NumSimplified | 103263 | 103129 | -134 | -0.13% | 0.13% |
| instcombine.NumConstProp | 26070 | 26102 | 32 | 0.12% | 0.12% |
| instsimplify.NumExpand | 1716 | 1718 | 2 | 0.12% | 0.12% |
| loop-unroll.NumCompletelyUnrolled | 9236 | 9225 | -11 | -0.12% | 0.12% |
| branch-folder.NumHoist | 2773 | 2770 | -3 | -0.11% | 0.11% |
| regalloc.NumReloadsRemoved | 10822 | 10834 | 12 | 0.11% | 0.11% |
| regalloc.NumSnippets | 11394 | 11406 | 12 | 0.11% | 0.11% |
| machine-cse.NumCrossBBCSEs | 1052 | 1053 | 1 | 0.10% | 0.10% |
| machinelicm.NumCSEed | 99887 | 99784 | -103 | -0.10% | 0.10% |
| branch-folder.NumTailMerge | 72501 | 72435 | -66 | -0.09% | 0.09% |
| codegenprepare.NumExtUses | 22007 | 21987 | -20 | -0.09% | 0.09% |
| local.NumRemoved | 68232 | 68294 | 62 | 0.09% | 0.09% |
| loop-vectorize.LoopsAnalyzed | 75483 | 75413 | -70 | -0.09% | 0.09% |
```
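For reference, the derived columns in the table appear to follow the usual convention: Δ is `proposed - baseline`, and the percentage is Δ relative to the baseline. A minimal sketch of that arithmetic, assuming this convention (the `StatRow` type and function names are hypothetical, not part of any LLVM tooling):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical row type mirroring the table's first three columns.
struct StatRow {
    const char* name;
    long baseline;
    long proposed;
};

// Absolute change: proposed minus baseline (the Δ column).
long delta(const StatRow& r) {
    return r.proposed - r.baseline;
}

// Relative change in percent, measured against the baseline.
double percent_change(const StatRow& r) {
    return 100.0 * static_cast<double>(delta(r)) / static_cast<double>(r.baseline);
}

// Magnitude of the relative change, the column the table is sorted by.
double abs_percent_change(const StatRow& r) {
    return std::fabs(percent_change(r));
}

// Self-check against two rows copied from the table above.
void self_check() {
    StatRow deleted{"loop-delete.NumDeleted", 8055, 8128};
    StatRow elim_cmp{"indvars.NumElimCmp", 3831, 3758};
    assert(delta(deleted) == 73);
    assert(delta(elim_cmp) == -73);
    assert(std::fabs(percent_change(deleted) - 0.91) < 0.01);
    assert(std::fabs(percent_change(elim_cmp) - (-1.91)) < 0.01);
}
```

Checking a couple of rows this way reproduces the published values to two decimal places, which supports reading the last column as the magnitude of the percentage.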
Note that I'm only changing the current PM, and not touching the obsolete PM.
This is an alternative to the function simplification pipeline variant
of the same change, D112840. It has both less compile time impact
(since the additional number of SCEV trip count calculations
is far lower than with D112840), and it is
much more powerful/impactful (almost 2x more loops deleted).
I have checked, and doing this after loop rotation
is favorable (more loops deleted).
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D112851