[LLVMdev] Where in the pass pipeline run the loop optimizer?
doerfert at cs.uni-saarland.de
Wed Dec 3 00:43:53 PST 2014
Maybe a stupid question, but what does lnt say when you run it on both
positions (if possible)?
On 12/03, Tobias Grosser wrote:
> On 30.11.2014 15:05, Hongbin Zheng wrote:
> >What should us consider when we try to determinate the point to run Polly?
> I think we could benefit from some community feedback here.
> Some context:
> Since a couple of weeks we focus on tuning Polly to minimize
> compile time overhead. One important case for us is to minimize the compile
> time impact of Polly in case no loop optimizations are performed. One item
> that we would like to reconsider here is the position in the LLVM pass order
> where we execute Polly.
> Currently, when '-O3 -polly' is run, we first schedule nine canonicalization
> passes (taken from the -O3 pass pipeline) to prepare the code for Polly,
> then run Polly itself and after this we run the full unmodified -O3
> The reason for this was that we wanted full control on how we canonicalize
> the code for Polly and we also wanted to run the _entire_
> LLVM optimization chain to be sure the code is well optimized after. The
> reason for this choice was mostly to be able to experiment
> with loop optimizations without being constrained by compile time or
> pass order concerns.
> This pass order is obviously not optimal due to the inherent compile time
> increase it implies and also because we do not currently run the inliner in
> our canonicalizer.
> Hence, we would like to place the Polly optimizer inside the standard -O3
> optimization sequence. The question is now what would be the optimal
> Here some considerations:
> 1) After the IR has been canonicalized
> This should remove the need for most of our -polly-canonicalize passes
> and eliminate the corresponding compile time cost.
> 2) After inlining has been performed.
> Inlining is essential to optimize C++ heavy code such as boost::ublas.
> 3) Before the Loop/SLP vectorizer
> We want to ensure that Polly is run at a position, where the vectorizer
> can continue to optimize the code generated by Polly (and to benefit from
> the parallelism, alias information and loop optimizations we performed).
> 4) Minimize interactions with the inliner
> At best Polly is run after all inlining has happened, such that Polly always
> sees the largest possible loop nest. If we optimize too early
> we may optimize code that later is inlined in a larger loop nest where
> the earlier optimization hinders the optimization of the larger loop nest.
> Some open questions:
> 5) Run it before / after LICM?
> LICM may make more loop bounds analyzable by making the parameters in the
> loop bounds loop invariant.
> On the other side, running it may introduce unnecessary scalar data
> dependences, which need to be eliminated again (Polly does not do this
> 6) Run it before loop rotate / loop unswitch?
> Both of these passes introduce copies of code which Polly does not really
> benefit from. (Polly may e.g prove that a loop is executed at
> least once, such that the copy introduced by loop-rotate is not necessary
> any more). Hence running them before Polly may unnecessarily complicating
> the loop structure. On the other side, those passes may be helpful for LICM.
> I currently have been considering two locations:
> Option 1: Before the loop optimization passes
> MPM.add(createReassociatePass()); // Reassociate expressions
> << Add Polly before the loop optimization passes >>
> // Or better call it EP_LoopCanonicalizerStart?
> + addExtensionsToPM(EP_LoopOptimizerStart, MPM);
> MPM.add(createLoopRotatePass()); // Rotate Loop
> MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
> MPM.add(createLoopIdiomPass()); // Recognize idioms like
> MPM.add(createLoopDeletionPass()); // Delete dead loops
> Option 2: Right before the loop vectorizer
> << Add Polly at the beginning of the target specific optimizations >>
> + addExtensionsToPM(EP_TargetOptsStart, MPM);
> // Re-rotate loops in all our loop nests. These may have fallout out
> // of rotated form due to GVN or other transformations, and the
> // vectorizer relies on the rotated form.
> if (ExtraVectorizerPasses)
> Originally I was thinking of 'Option 1'. It is after the scalar
> optimization passes, but before LICM and LoopRotate. However, this location
> seems still to be still somehow a canonicalization phase. 'Option 2' places
> Polly after all canonicalization (is the inliner somehow finished at this
> point?) and right at the beginning of the target specific optimizations. It
> seems somehow preferable to me at the moment.
> Does anybody have opinions on this?
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.26
Tel. +49 (0)681 302-57521 : doerfert at cs.uni-saarland.de
Fax. +49 (0)681 302-3065 : http://www.cdl.uni-saarland.de/people/doerfert
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 213 bytes
Desc: not available
More information about the llvm-dev