[LLVMdev] Where in the pass pipeline run the loop optimizer?

Wed Dec 3 00:43:53 PST 2014

Maybe a stupid question, but what does lnt say when you run it on both
positions (if possible)?

On 12/03, Tobias Grosser wrote:
> On 30.11.2014 15:05, Hongbin Zheng wrote:
> >What should us consider when we try to determinate the point to run Polly?
> 
> I think we could benefit from some community feedback here.
> 
> Some context:
> 
> Since a couple of weeks we focus on tuning Polly to minimize
> compile time overhead. One important case for us is to minimize the compile
> time impact of Polly in case no loop optimizations are performed. One item
> that we would like to reconsider here is the position in the LLVM pass order
> where we execute Polly.
> 
> Currently, when '-O3 -polly' is run, we first schedule nine canonicalization
> passes (taken from the -O3 pass pipeline) to prepare the code for Polly,
> then run Polly itself and after this we run the full unmodified -O3
> sequence.
> 
> The reason for this was that we wanted full control on how we canonicalize
> the code for Polly and we also wanted to run the _entire_
> LLVM optimization chain to be sure the code is well optimized after. The
> reason for this choice was mostly to be able to experiment
> with loop optimizations without being constrained by compile time or
> pass order concerns.
> 
> This pass order is obviously not optimal due to the inherent compile time
> increase it implies and also because we do not currently run the inliner in
> our canonicalizer.
> 
> Hence, we would like to place the Polly optimizer inside the standard -O3
> optimization sequence. The question is now what would be the optimal
> location.
> 
> Here some considerations:
> 
> 1) After the IR has been canonicalized
> 
> This should remove the need for most of our -polly-canonicalize passes
> and eliminate the corresponding compile time cost.
> 
> 2) After inlining has been performed.
> 
> Inlining is essential to optimize C++ heavy code such as boost::ublas.
> 
> 3) Before the Loop/SLP vectorizer
> 
> We want to ensure that Polly is run at a position, where the vectorizer
> can continue to optimize the code generated by Polly (and to benefit from
> the parallelism, alias information and loop optimizations we performed).
> 
> 4) Minimize interactions with the inliner
> 
> At best Polly is run after all inlining has happened, such that Polly always
> sees the largest possible loop nest. If we optimize too early
> we may optimize code that later is inlined in a larger loop nest where
> the earlier optimization hinders the optimization of the larger loop nest.
> 
> Some open questions:
> 
> 5) Run it before / after LICM?
> 
> LICM may make more loop bounds analyzable by making the parameters in the
> loop bounds loop invariant.
> 
> On the other side, running it may introduce unnecessary scalar data
> dependences, which need to be eliminated again (Polly does not do this
> yet).
> 
> 6) Run it before loop rotate / loop unswitch?
> 
> Both of these passes introduce copies of code which Polly does not really
> benefit from. (Polly may e.g prove that a loop is executed at
> least once, such that the copy introduced by loop-rotate is not necessary
> any more). Hence running them before Polly may unnecessarily complicating
> the loop structure. On the other side, those passes may be helpful for LICM.
> 
> I currently have been considering two locations:
> 
> Option 1: Before the loop optimization passes
> 
> =============================================
> 
>   MPM.add(createReassociatePass()); // Reassociate expressions
> 
> << Add Polly before the loop optimization passes >>
> 
> // Or better call it EP_LoopCanonicalizerStart?
> + addExtensionsToPM(EP_LoopOptimizerStart, MPM);
> 
> 
>   MPM.add(createLoopRotatePass()); // Rotate Loop
>   MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
>   MPM.add(createInstructionCombiningPass());
>   MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
>   MPM.add(createLoopIdiomPass());             // Recognize idioms like
>                                                  memset.
>   MPM.add(createLoopDeletionPass());          // Delete dead loops
> 
> 
> Option 2: Right before the loop vectorizer
> =====================================
> 
>   MPM.add(createBarrierNoopPass());
> 
> << Add Polly at the beginning of the target specific optimizations >>
> + addExtensionsToPM(EP_TargetOptsStart, MPM);
> 
>   // Re-rotate loops in all our loop nests. These may have fallout out
>   // of rotated form due to GVN or other transformations, and the
>   // vectorizer relies on the rotated form.
>   if (ExtraVectorizerPasses)
>     MPM.add(createLoopRotatePass());
> 
>   MPM.add(createLoopVectorizePass(DisableUnrollLoops,
>         LoopVectorize));
> 
> Originally I was thinking of 'Option 1'. It is after the scalar
> optimization passes, but before LICM and LoopRotate. However, this location
> seems still to be still somehow a canonicalization phase. 'Option 2' places
> Polly after all canonicalization (is the inliner somehow finished at this
> point?) and right at the beginning of the target specific optimizations. It
> seems somehow preferable to me at the moment.
> 
> Does anybody have opinions on this?
> 
> Cheers,
> Tobias
> 
> 
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-- 

Johannes Doerfert
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.26

Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://www.cdl.uni-saarland.de/people/doerfert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 213 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141203/8c726950/attachment.sig>