[LLVMdev] Optimization passes organization and tradeoffs
Chris Lattner
clattner at apple.com
Tue May 20 11:28:15 PDT 2008
On May 20, 2008, at 8:57 AM, David Greene wrote:
> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>
>> 1) Does ScalarReplAggregates totally supersede
>> PromoteMemoryToRegister? I
>
> Nope, they are different. Mem2Reg is really important if you want
> register allocation.
Actually SROA does fully subsume Mem2Reg. It iterates between breaking
up aggregates and promoting them to registers.
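Something along these lines is enough (just a rough sketch, assuming the
old-style PassManager and the create*Pass functions from
llvm/Transforms/Scalar.h; adapt it to however you drive your passes):

    #include "llvm/PassManager.h"
    #include "llvm/Transforms/Scalar.h"

    // Sketch only: SROA both splits aggregates and promotes the resulting
    // scalars to SSA registers, so a separate mem2reg run is redundant.
    void addMemoryPasses(llvm::PassManager &PM) {
      PM.add(llvm::createScalarReplAggregatesPass());
      // PM.add(llvm::createPromoteMemoryToRegisterPass()); // subsumed by SROA
    }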
>> think I need it to optimize small arrays, but what is the expected
>> added complexity?
>
> I shouldn't think it would be very expensive at all.
Agreed.
>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>> adding/subtracting 0?
>
> That's probably more the purview of instcombine.
Right.
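SCCP is about propagating constants through the CFG; the algebraic
identities are instcombine's job. For example, with integers:

    // instcombine reduces the whole body to "return x;":
    // x * 1 == x, x + 0 == x, x - 0 == x.
    int identities(int x) {
      return (x * 1) + 0 - 0;
    }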
>> 4) Is DeadStoreElimination still necessary when we have
>> AggressiveDCE?
>
> Probably, but I'll let others give the definitive answer.
They are orthogonal. DSE touches stores; AggressiveDCE mostly handles
scalars. ADCE basically assumes all stores are live.
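A tiny illustration of the difference (my own example, at the source
level):

    // The first store is dead: it is overwritten before any load.
    // DSE can delete it; ADCE, which treats all stores as live, keeps it.
    void overwrite(int *p) {
      *p = 1;   // dead store, removed by -dse
      *p = 2;   // the only store that survives
    }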
>> 5) What are the tradeoffs between the different dead code elimination
>> variants (why not always use the aggressive one)?
>
> Others can speak to this.
ADCE is actually quite expensive from the compile-time perspective,
because it requires building post-dominance information. I'd stick
with a combination of "-dce -simplifycfg" instead of -adce.
>> 6) Is there a better place for CFGSimplification? Should I perform
>> it at multiple points?
>
> I think once is probably enough. Earlier would probably be better
> as it will simplify later passes and potentially help them run faster.
I'd suggest running it earlier as you suggest, but then also late to
clean up. It's a very fast pass.
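Putting that together with the -dce suggestion above, a pipeline sketch
(same assumed pass-creation API as the earlier snippet; the exact
ordering is only an illustration, not a recommendation set in stone):

    // Sketch only: -simplifycfg both early and late, and the cheap
    // -dce instead of -adce.
    void addCleanupPasses(llvm::PassManager &PM) {
      PM.add(llvm::createCFGSimplificationPass());   // early, cleaner CFG for later passes
      // ... the scalar passes from the earlier sketch go here ...
      PM.add(llvm::createInstructionCombiningPass());
      PM.add(llvm::createDeadCodeEliminationPass()); // -dce, cheaper than -adce
      PM.add(llvm::createCFGSimplificationPass());   // late, to clean up what the rest exposes
    }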
>> Also, my code will frequently have vectors that are either initialized
>> to all 0.0 or 1.0. This offers a lot of opportunity for eliminating
>> many multiplications and additions, but I was wondering which passes
>> I need for this (probably a Reassociate pass, what else)? And I also
>> wonder whether these passes actually work with vectors?
>
> I would assume they work with vectors. Anything expression-related
> is good to capture these opportunities (reassociation, folding,
> instcombine, etc.).
Note that a lot of optimizations don't apply to floating point
values. E.g. "x+0.0" can't be eliminated (in general), because
"-0.0+0.0" is 0.0, not -0.0. All the passes should tolerate
vectors, but may miss certain opportunities on them. If you notice
something (e.g. reassociation) that isn't being done aggressively with
vectors, please file a bug. We do a lot of vector optimizations
though, so we are probably only missing details, not whole classes of
important optimizations.
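You can see the signed-zero issue directly with a tiny standalone
program (assuming a C++ compiler with std::signbit available):

    #include <cmath>
    #include <cstdio>

    int main() {
      double x = -0.0;
      // Prints "1 0": x is negative zero, but x + 0.0 is positive zero,
      // so folding "x + 0.0" to "x" would change the result's sign.
      std::printf("%d %d\n", (int)std::signbit(x), (int)std::signbit(x + 0.0));
      return 0;
    }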
>> Is there any other highly recommended pass for this kind of
>> application that I'm forgetting? Any passes that I'd better avoid
>> due to poor gains and long optimization time?
>
Another important optimization is GVN, which among other things does
common subexpression elimination.
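For instance, in something like this (just an illustration), GVN's
value numbering reuses the first computation:

    // "a + b" is computed twice; GVN proves the second instance redundant
    // and replaces it with the first, so the addition happens only once.
    int cse(int a, int b) {
      int x = a + b;
      int y = a + b;   // redundant with x
      return x * y;
    }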
>> Sorry for the many question marks. :-) I don't expect there is an
>> absolute definite answer to each of them, but some guidelines and
>> insights would be very welcome and much appreciated.
>
> Phase ordering is one of the trickiest parts to tune. It's often
> highly code-dependent. If your target sources are limited in number,
> you might be able to get away with a funky ordering that would be
> disastrous on general-purpose integer codes, for example. It's often
> a matter of trial and error. Several people have done research on
> using genetic algorithms and other tricks to find an optimal phase
> ordering. A Google search should turn up interesting stuff.
"What he said" :). Also, if you see specific problems in the
generated code, let us know.
-Chris