[llvm] r293241 - [PM] Flesh out almost all of the late loop passes.

Thu Jan 26 16:50:21 PST 2017

Author: chandlerc
Date: Thu Jan 26 18:50:21 2017
New Revision: 293241

URL: http://llvm.org/viewvc/llvm-project?rev=293241&view=rev
Log:
[PM] Flesh out almost all of the late loop passes.

With this the per-module pass pipeline is *extremely* close to the
legacy PM. The missing pieces are:
- PruneEH (or some equivalent)
- ArgumentPromotion
- LoopLoadElimination
- LoopUnswitch

I'm going to work through those in essentially that order but this seems
like a worthwhile incremental step toward the end state.

One difference in what I have here from the legacy PM is that I've
consolidated some of the per-function passes at the very end of the
pipeline into the main optimization function pipeline. The intervening
passes are *really* uninteresting and so this seems very likely to have
any effect other than minor improvement to locality.

Note that there are still some failures in the test suite, but the
compiler doesn't crash or assert.

Differential Revision: https://reviews.llvm.org/D29114

Modified:
    llvm/trunk/lib/Passes/PassBuilder.cpp
    llvm/trunk/test/Other/new-pm-defaults.ll

Modified: llvm/trunk/lib/Passes/PassBuilder.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Passes/PassBuilder.cpp?rev=293241&r1=293240&r2=293241&view=diff
==============================================================================

--- llvm/trunk/lib/Passes/PassBuilder.cpp (original)
+++ llvm/trunk/lib/Passes/PassBuilder.cpp Thu Jan 26 18:50:21 2017
@@ -491,32 +491,60 @@ PassBuilder::buildPerModuleDefaultPipeli
   // Optimize the loop execution. These passes operate on entire loop nests
   // rather than on each loop in an inside-out manner, and so they are actually
   // function passes.
+
+  // First rotate loops that may have been un-rotated by prior passes.
+  OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopRotatePass()));
+
+  // Distribute loops to allow partial vectorization.  I.e. isolate dependences
+  // into separate loop that would otherwise inhibit vectorization.  This is
+  // currently only performed for loops marked with the metadata
+  // llvm.loop.distribute=true or when -enable-loop-distribute is specified.
   OptimizePM.addPass(LoopDistributePass());
+
+  // Now run the core loop vectorizer.
   OptimizePM.addPass(LoopVectorizePass());
+
   // FIXME: Need to port Loop Load Elimination and add it here.
+
+  // Cleanup after the loop optimization passes.
   OptimizePM.addPass(InstCombinePass());
 
+
+  // Now that we've formed fast to execute loop structures, we do further
+  // optimizations. These are run afterward as they might block doing complex
+  // analyses and transforms such as what are needed for loop vectorization.
+
   // Optimize parallel scalar instruction chains into SIMD instructions.
   OptimizePM.addPass(SLPVectorizerPass());
 
-  // Cleanup after vectorizers.
+  // Cleanup after all of the vectorizers.
   OptimizePM.addPass(SimplifyCFGPass());
   OptimizePM.addPass(InstCombinePass());
 
   // Unroll small loops to hide loop backedge latency and saturate any parallel
-  // execution resources of an out-of-order processor.
-  // FIXME: Need to add once loop pass pipeline is available.
-
-  // FIXME: Add the loop sink pass when ported.
-
-  // FIXME: Add cleanup from the loop pass manager when we're forming LCSSA
-  // here.
+  // execution resources of an out-of-order processor. We also then need to
+  // clean up redundancies and loop invariant code.
+  // FIXME: It would be really good to use a loop-integrated instruction
+  // combiner for cleanup here so that the unrolling and LICM can be pipelined
+  // across the loop nests.
+  OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollPass::create()));
+  OptimizePM.addPass(InstCombinePass());
+  OptimizePM.addPass(createFunctionToLoopPassAdaptor(LICMPass()));
 
   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
   OptimizePM.addPass(AlignmentFromAssumptionsPass());
 
-  // ADd the core optimizing pipeline.
+  // LoopSink pass sinks instructions hoisted by LICM, which serves as a
+  // canonicalization pass that enables other optimizations. As a result,
+  // LoopSink pass needs to be a very late IR pass to avoid undoing LICM
+  // result too early.
+  OptimizePM.addPass(LoopSinkPass());
+
+  // And finally clean up LCSSA form before generating code.
+  OptimizePM.addPass(InstSimplifierPass());
+
+  // Add the core optimizing pipeline.
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(OptimizePM)));
 
   // Now we need to do some global optimization transforms.

Modified: llvm/trunk/test/Other/new-pm-defaults.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Other/new-pm-defaults.ll?rev=293241&r1=293240&r2=293241&view=diff
==============================================================================
--- llvm/trunk/test/Other/new-pm-defaults.ll (original)
+++ llvm/trunk/test/Other/new-pm-defaults.ll Thu Jan 26 18:50:21 2017
@@ -129,6 +129,7 @@
 ; CHECK-O-NEXT: Running pass: ModuleToFunctionPassAdaptor<{{.*}}PassManager{{.*}}>
 ; CHECK-O-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: Float2IntPass
+; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running analysis: BlockFrequencyAnalysis
@@ -137,7 +138,12 @@
 ; CHECK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LoopUnrollPass
+; CHECK-O-NEXT: Running pass: InstCombinePass
+; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
+; CHECK-O-NEXT: Running pass: LoopSinkPass
+; CHECK-O-NEXT: Running pass: InstSimplifierPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-O-NEXT: Running pass: ConstantMergePass