[LLVMdev] Optimization passes organization and tradeoffs
Chris Lattner
clattner at apple.com
Tue May 20 11:28:15 PDT 2008
On May 20, 2008, at 8:57 AM, David Greene wrote:
> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>
>> 1) Does ScalarReplAggregates totally supersede
>> PromoteMemoryToRegister? I
>
> Nope, they are different. Mem2Reg is really important if you want
> register allocation.
Actually SROA does fully subsume Mem2Reg. It iterates between breaking
up aggregates and promoting them to registers.
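Something along these lines is enough (just a rough sketch, assuming the
old-style PassManager and the create*Pass functions from
llvm/Transforms/Scalar.h; adapt it to however you drive your passes):

    #include "llvm/PassManager.h"
    #include "llvm/Transforms/Scalar.h"

    // Sketch only: SROA both splits aggregates and promotes the resulting
    // scalars to SSA registers, so a separate mem2reg run is redundant.
    void addMemoryPasses(llvm::PassManager &PM) {
      PM.add(llvm::createScalarReplAggregatesPass());
      // PM.add(llvm::createPromoteMemoryToRegisterPass()); // subsumed by SROA
    }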
>> think I need it to optimize small arrays, but what is the expected
>> added complexity?
>
> I shouldn't think it would be very expensive at all.
Agreed.
>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>> adding/subtracting 0?
>
> That's probably more the purview of instcombine.
Right.
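SCCP is about propagating constants through the CFG; the algebraic
identities are instcombine's job. For example, with integers:

    // instcombine reduces the whole body to "return x;":
    // x * 1 == x, x + 0 == x, x - 0 == x.
    int identities(int x) {
      return (x * 1) + 0 - 0;
    }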
>> 4) Is DeadStoreElimination still necessary when we have
>> AggressiveDCE?
>
> Probably, but I'll let others give the definitive answer.
They are orthogonal. DSE touches stores; AggressiveDCE mostly handles
scalars. ADCE basically assumes all stores are live.
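A tiny illustration of the difference (my own example, at the source
level):

    // The first store is dead: it is overwritten before any load.
    // DSE can delete it; ADCE, which treats all stores as live, keeps it.
    void overwrite(int *p) {
      *p = 1;   // dead store, removed by -dse
      *p = 2;   // the only store that survives
    }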
>> 5) What are the tradeoffs between the different dead code elimination
>> variants (why not always use the aggressive one)?
>
> Others can speak to this.
ADCE is actually quite expensive from the compile-time perspective,
because it requires building post-dominance information. I'd stick
with a combination of "-dce -simplifycfg" instead of -adce.
>> 6) Is there a better place for CFGSimplification? Should I perform
>> it at multiple points?
>
> I think once is probably enough. Earlier would probably be better
> as it will simplify later passes and potentially help them run faster.
I'd suggest running it earlier as you suggest, but then also late to
clean up. It's a very fast pass.
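Putting that together with the -dce suggestion above, a pipeline sketch
(same assumed pass-creation API as the earlier snippet; the exact
ordering is only an illustration, not a recommendation set in stone):

    // Sketch only: -simplifycfg both early and late, and the cheap
    // -dce instead of -adce.
    void addCleanupPasses(llvm::PassManager &PM) {
      PM.add(llvm::createCFGSimplificationPass());   // early, cleaner CFG for later passes
      // ... the scalar passes from the earlier sketch go here ...
      PM.add(llvm::createInstructionCombiningPass());
      PM.add(llvm::createDeadCodeEliminationPass()); // -dce, cheaper than -adce
      PM.add(llvm::createCFGSimplificationPass());   // late, to clean up what the rest exposes
    }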
>> Also, my code will frequently have vectors that are either initialized
>> to all 0.0 or 1.0. This offers a lot of opportunity for eliminating
>> many multiplications and additions, but I was wondering which passes
>> I need for this (probably a Reassociate pass, what else)? And I also
>> wonder whether these passes actually work with vectors?
>
> I would assume they work with vectors. Anything expression-related
> is good to capture these opportunities (reassociation, folding,
> instcombine, etc.).
Note that a lot of optimizations don't apply to floating point
values. E.g. "x+0.0" can't be eliminated (in general), because
"-0.0+0.0" is 0.0, not -0.0. All the passes should tolerate
vectors, but may miss certain opportunities on them. If you notice
something (e.g. reassociation) that isn't being done aggressively with
vectors, please file a bug. We do a lot of vector optimizations
though, so we are probably only missing details, not whole classes of
important optimizations.
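You can see the signed-zero issue directly with a tiny standalone
program (assuming a C++ compiler with std::signbit available):

    #include <cmath>
    #include <cstdio>

    int main() {
      double x = -0.0;
      // Prints "1 0": x is negative zero, but x + 0.0 is positive zero,
      // so folding "x + 0.0" to "x" would change the result's sign.
      std::printf("%d %d\n", (int)std::signbit(x), (int)std::signbit(x + 0.0));
      return 0;
    }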
>> Is there any other highly recommended pass for this kind of
>> application that I'm forgetting? Any passes that I'd better avoid
>> due to poor gains and long optimization time?
>
Another important optimization is GVN, which among other things does
common subexpression elimination.
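For instance, in something like this (just an illustration), GVN's
value numbering reuses the first computation:

    // "a + b" is computed twice; GVN proves the second instance redundant
    // and replaces it with the first, so the addition happens only once.
    int cse(int a, int b) {
      int x = a + b;
      int y = a + b;   // redundant with x
      return x * y;
    }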
>> Sorry for the many question marks. :-) I don't expect there is an
>> absolute definite answer to each of them, but some guidelines and
>> insights would be very welcome and much appreciated.
>
> Phase ordering is one of the trickiest parts to tune. It's often
> highly code-dependent. If your target sources are limited in number,
> you might be able to get away with a funky ordering that would be
> disastrous on general-purpose integer codes, for example. It's often
> a matter of trial and error. Several people have done research on
> using genetic algorithms and other tricks to find an optimal phase
> ordering. A Google search should turn up interesting stuff.
"What he said" :). Also, if you see specific problems in the
generated code, let us know.
-Chris