[LLVMdev] Optimization passes organization and tradeoffs
Chris Lattner
sabre at nondot.org
Wed May 21 13:48:57 PDT 2008
On Wed, 21 May 2008, Nicolas Capens wrote:
> Thanks for the detailed explanations. I have a few remaining questions:
>
> Am I correct that ScalarReplAggregates is hardly more expensive than Mem2Reg
> and therefore generally preferable?
Right.
> What would be the code quality implications of using "-dce -simplifycfg"
> instead of -adce? As far as I understand the algorithms involved, -dce would
> hardly ever miss a dead instruction if dead stores have already been
> eliminated, right?
-dce -simplifycfg should be basically the same as -adce. ADCE was recently
changed to not delete output-free loops, and deleting such loops was the major
advantage it had over the -dce/-simplifycfg combination. We now have an
explicit loop deletion pass as part of the optimizer that covers that case.
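To make the comparison concrete, here is a toy sketch (hypothetical names, not LLVM's actual data structures) of the "trivial" dead-code elimination that -dce performs: an instruction with no side effects and no uses is deleted, and deleting it can make its operands dead in turn:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: each instruction may use up to two others. */
typedef struct Inst {
    const char *name;
    bool has_side_effects;   /* stores, calls, etc. are never deleted */
    struct Inst *ops[2];     /* operands (may be NULL) */
    int num_uses;            /* how many live instructions use this one */
    bool deleted;
} Inst;

/* Delete an instruction if it is trivially dead, then recurse into its
 * operands, which may have just lost their last use. */
static void erase_if_dead(Inst *i) {
    if (i == NULL || i->deleted) return;
    if (i->has_side_effects || i->num_uses > 0) return;
    i->deleted = true;
    for (int k = 0; k < 2; k++) {
        if (i->ops[k] != NULL) {
            i->ops[k]->num_uses--;
            erase_if_dead(i->ops[k]);
        }
    }
}
```

The key contrast with ADCE is the direction of reasoning: this sketch assumes everything is live until proven unused, whereas ADCE assumes everything is dead until proven live via (post-dominance-based) control dependence, which is what made it able to delete whole output-free loops.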
> For my application it is safe to reduce x+0.0 to x. I checked and my current
> set of passes keeps the add. Is there a way to control the floating-point
> model in some way? Lots of compilers can select 'strict', 'precise', 'fast',
> etc. Even adding and then subtracting the same value, which doesn't always
> result in exactly the original value, is almost always safe to eliminate for my
> app...
You can tell the code generator that it is safe to do this (see
llvm/Target/TargetOptions.h), but there isn't currently a way to tell the
optimizers about it. My tentative plan is to split "add" into
"fadd"/"iadd" and then make the "fp" operations have some sort of flags to
parameterize the optimizations allowed for them. This would allow us to
propagate down the equivalent of GCC's -ffast-math into the IR.
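Both hazards are easy to demonstrate in plain C (no LLVM involved; the helper names below are just for illustration). Folding x+0.0 to x changes the sign bit when x is -0.0, and add-then-subtract can round away the original value entirely:

```c
#include <math.h>

/* Would folding x + 0.0 -> x change the observable result?  It does for
 * x = -0.0: under IEEE 754 round-to-nearest, -0.0 + 0.0 == +0.0, so the
 * sign bit flips.  This is why the optimizer must keep the add by default. */
static int add_zero_flips_sign(double x) {
    return signbit(x) != signbit(x + 0.0);
}

/* (x + y) - y is not always x either: when y is much larger than x, the
 * addition rounds x's contribution away, so the round trip loses it. */
static double add_then_sub(double x, double y) {
    return (x + y) - y;
}
```

For example, add_then_sub(1.0, 1e16) returns 0.0 rather than 1.0, because 1e16 + 1.0 rounds back to 1e16 in double precision. A "fast"/"relaxed" mode is precisely a promise from the programmer that such differences don't matter.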
> What's the difference between GVN and GCSE, if they both perform common
> subexpression elimination?
GVN does more, and is a better algorithm. GCSE is basically deprecated
and should be removed at some point.
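As a rough illustration of the common-subexpression-elimination part (a toy local value-numbering sketch, not LLVM's GVN algorithm): identical (opcode, operand, operand) triples are assigned the same value number, so the second occurrence of an expression can reuse the first instead of recomputing it:

```c
#define MAX_VALUES 64

/* One table entry per distinct expression seen so far. */
typedef struct { char op; int lhs, rhs; } Expr;

static Expr vn_table[MAX_VALUES];
static int num_values = 0;

/* Return the value number for (op, lhs, rhs), reusing an existing number
 * when the same expression was already computed.  That reuse is exactly
 * the "common subexpression elimination" a CSE/GVN pass performs. */
static int value_number(char op, int lhs, int rhs) {
    for (int i = 0; i < num_values; i++) {
        if (vn_table[i].op == op && vn_table[i].lhs == lhs &&
            vn_table[i].rhs == rhs)
            return i;                /* CSE hit: reuse the earlier value */
    }
    vn_table[num_values].op = op;
    vn_table[num_values].lhs = lhs;
    vn_table[num_values].rhs = rhs;
    return num_values++;
}
```

GVN "does more" than this in several directions: it reasons across basic blocks, understands loads and stores, and can prove two differently-written expressions equal, not just textually identical ones.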
> So far LLVM is working great for me. I'm impressed by the overall
> architecture and the fair learning curve. I'm generating complex but
> well-optimized code in a matter of days. I haven't found any problems in the
> generated code yet, but I'll let you know as soon as I notice a glitch.
Awesome!
-Chris
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> Behalf Of Chris Lattner
> Sent: Tuesday, 20 May, 2008 20:28
> To: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Optimization passes organization and tradeoffs
>
>
> On May 20, 2008, at 8:57 AM, David Greene wrote:
>
>> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>>
>>> 1) Does ScalarReplAggregates totally supersede
>>> PromoteMemoryToRegister? I
>>
>> Nope, they are different. Mem2Reg is really important if you want
>> register
>> allocation.
>
> Actually SROA does fully subsume Mem2Reg. It iterates between breaking
> up aggregates and promoting them to registers.
>
>>> think I need it to optimize small arrays, but what is the expected
>>> added
>>> complexity?
>>
>> I shouldn't think it would be very expensive at all.
>
> Agreed.
>
>>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>>> adding/subtracting 0?
>>
>> That's probably more the purview of instcombine.
>
> Right.
>
>>> 4) Is DeadStoreElimination still necessary when we have
>>> AggressiveDCE?
>>
>> Probably, but I'll let others give the definitive answer.
>
> They are orthogonal. DSE touches stores, AggressiveDCE does mostly
> scalars. ADCE basically assumes all stores are live.
>
>>> 5) What are the tradeoffs between the different dead code elimination
>>> variants (why not always use the aggressive one)?
>>
>> Others can speak to this.
>
> ADCE is actually quite expensive from the compile-time perspective,
> because it requires building post dominance information. I'd stick
> with a combination of "-dce -simplifycfg" instead of -adce.
>
>>> 6) Is there a better place for CFGSimplification? Should I perform
>>> it at
>>> multiple points?
>>
>> I think once is probably enough. Earlier would probably be better
>> as it will
>> simplify later passes and potentially help them run faster.
>
> I'd suggest running it earlier as you suggest, but then also late to
> clean up. It's a very fast pass.
>
>>> Also, my code will frequently have vectors, that are either
>>> initialized to
>>> all 0.0 or 1.0. This offers a lot of opportunity for eliminating many
>>> multiplications and additions, but I was wondering which passes I
>>> need for
>>> this (probably a Reassociate pass, what else)? And I also wonder
>>> whether
>>> these passes actually work with vectors?
>>
>> I would assume they work with vectors. Anything expression-related
>> is good
>> to capture these opportunities (reassociation, folding, instcombine,
>> etc.).
>
> Note that a lot of optimizations don't apply to floating point
> values. E.g. "x+0.0" can't be eliminated (in general), because
> "-0.0+0.0" is 0.0, not -0.0. All the optimizations should tolerate
> vectors, but may miss certain optimizations on them. If you notice
> something (e.g. reassociation) that isn't being done aggressively with
> vectors, please file a bug. We do a lot of vector optimizations
> though, so we are probably only missing details, not whole classes of
> important optimizations.
>
>>> Is there any other highly recommended pass for this kind of
>>> applications
>>> that I'm forgetting? Any passes that I better avoid due to poor
>>> gains and
>>> long optimization time?
>>
> Another important optimization is GVN, which does common subexpression
> elimination (for example).
>
>>> Sorry for the many question marks. :-) I don't expect there is an
>>> absolute
>>> definite answer to each of them, but some guidelines and insights
>>> would be
>>> very welcome and much appreciated.
>>
>> Phase ordering is one of the trickiest parts to tune. It's often
>> highly
>> code-dependent. If your target sources are limited in number, you
>> might be
>> able to get away with a funky ordering that would be disastrous on
>> general-purpose integer codes, for example. It's often a matter of
>> trial and
>> error. Several people have done research on using genetic
>> algorithms and
>> other tricks to find an optimal phase ordering. A google search
>> should turn
>> up interesting stuff.
>
> "What he said" :). Also, if you see specific problems in the
> generated code, let us know.
>
> -Chris
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-Chris
--
http://nondot.org/sabre/
http://llvm.org/