[LLVMdev] Optimization passes organization and tradeoffs
Nicolas Capens
nicolas at capens.net
Wed May 21 02:29:32 PDT 2008
Hi Chris,
Thanks for the detailed explanations. I have a few remaining questions:
Am I correct that ScalarReplAggregates is hardly more expensive than Mem2Reg
and therefore generally preferable?
What would be the code quality implications of using "-dce -simplifycfg"
instead of -adce? As far as I understand the algorithms involved, -dce would
hardly ever miss a dead instruction if dead stores have already been
eliminated, right?
For my application it is safe to reduce x+0.0 to x, but I checked and my
current set of passes keeps the add. Is there a way to control the
floating-point model? Many compilers let you select 'strict', 'precise',
'fast', etc. Even adding and then subtracting the same value, which in general
does not give back exactly the original value because of rounding, is almost
always safe to eliminate for my app...
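For instance, under a 'fast' setting I would expect a contrived case like the
following to fold down to just returning x, even though rounding means it is
not an exact identity:

    float addSub(float x, float y) {
        return (x + y) - y;   // not exactly x in general, but close enough for my app
    }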
What's the difference between GVN and GCSE, if they both perform common
subexpression elimination?
So far LLVM is working great for me. I'm impressed by the overall
architecture and the fair learning curve. I'm generating complex but
well-optimized code in a matter of days. I haven't found any problems in the
generated code yet, but I'll let you know as soon as I notice a glitch.
Kind regards,
Nicolas
-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
Behalf Of Chris Lattner
Sent: Tuesday, 20 May, 2008 20:28
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Optimization passes organization and tradeoffs
On May 20, 2008, at 8:57 AM, David Greene wrote:
> On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
>
>> 1) Does ScalarReplAggregates totally supersede
>> PromoteMemoryToRegister? I
>
> Nope, they are different. Mem2Reg is really important if you want
> register
> allocation.
Actually SROA does fully subsume Mem2Reg. It iterates between breaking
up aggregates and promoting them to registers.
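For example (a contrived sketch), SROA first splits a local aggregate like the
one below into independent scalars and then promotes each of them to an SSA
register; Mem2Reg on its own cannot do that, because the alloca holds an
aggregate:

    struct Vec2 { float x, y; };

    float sum(float a, float b) {
      Vec2 v;           // a stack aggregate (an alloca of a struct in the IR)
      v.x = a;          // SROA splits v into two separate float allocas...
      v.y = b;          // ...and then promotes each one to an SSA value
      return v.x + v.y; // ends up as a plain fadd with no memory traffic
    }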
>> think I need it to optimize small arrays, but what is the expected
>> added
>> complexity?
>
> I shouldn't think it would be very expensive at all.
Agreed.
>> 2) Does SCCP also eliminate multiplying/dividing by 1 and
>> adding/subtracting 0?
>
> That's probably more the purview of instcombine.
Right.
>> 4) Is DeadStoreElimination still necessary when we have
>> AggressiveDCE?
>
> Probably, but I'll let others give the definitive answer.
They are orthogonal. DSE touches stores, AggressiveDCE does mostly
scalars. ADCE basically assumes all stores are live.
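A small illustration in (hypothetical) C++ source terms: the first store below
is dead and DSE deletes it, while ADCE keeps both, since it assumes stores are
live:

    void clobber(int *p) {
      *p = 1;   // dead: overwritten before any possible read; DSE removes this
      *p = 2;   // the only store that survives DSE
    }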
>> 5) What are the tradeoffs between the different dead code elimination
>> variants (why not always use the aggressive one)?
>
> Others can speak to this.
ADCE is actually quite expensive from the compile-time perspective,
because it requires building post dominance information. I'd stick
with a combination of "-dce -simplifycfg" instead of -adce.
>> 6) Is there a better place for CFGSimplification? Should I perform
>> it at
>> multiple points?
>
> I think once is probably enough. Earlier would probably be better
> as it will
> simplify later passes and potentially help them run faster.
I'd suggest running it earlier as you suggest, but then also late to
clean up. It's a very fast pass.
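Concretely, a per-function pipeline along these lines would match the advice
above. This is only a rough sketch against the current 2.x C++ API (it assumes
the FunctionPassManager/ModuleProvider setup; adjust it to however you already
drive your passes):

    #include "llvm/PassManager.h"
    #include "llvm/ModuleProvider.h"
    #include "llvm/Transforms/Scalar.h"
    using namespace llvm;

    // Sketch of a per-function pipeline: simplifycfg early and late,
    // SROA instead of mem2reg, and plain -dce instead of -adce.
    void optimizeFunction(ModuleProvider *MP, Function &F) {
      FunctionPassManager FPM(MP);
      FPM.add(createCFGSimplificationPass());     // early clean-up
      FPM.add(createScalarReplAggregatesPass());  // SROA (subsumes mem2reg)
      FPM.add(createInstructionCombiningPass());  // folds x*1, integer x+0, etc.
      FPM.add(createReassociatePass());
      FPM.add(createGVNPass());                   // redundancy elimination / CSE
      FPM.add(createDeadStoreEliminationPass());
      FPM.add(createDeadCodeEliminationPass());   // cheaper than ADCE
      FPM.add(createCFGSimplificationPass());     // late clean-up
      FPM.doInitialization();
      FPM.run(F);
      FPM.doFinalization();
    }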
>> Also, my code will frequently have vectors that are initialized to either
>> all 0.0 or all 1.0. This offers a lot of opportunity for eliminating many
>> multiplications and additions, but I was wondering which passes I
>> need for
>> this (probably a Reassociate pass, what else)? And I also wonder
>> whether
>> these passes actually work with vectors?
>
> I would assume they work with vectors. Anything expression-related
> is good
> to capture these opportunities (reassociation, folding, instcombine,
> etc.).
Note that a lot of optimizations don't apply to floating point
values. E.g. "x+0.0" can't be eliminated (in general), because
"-0.0+0.0" is 0.0, not -0.0. All the optimizations should tolerate
vectors, but may miss certain optimizations on them. If you notice
something (e.g. reassociation) that isn't being done aggressively with
vectors, please file a bug. We do a lot of vector optimizations
though, so we are probably only missing details, not whole classes of
important optimizations.
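A tiny stand-alone demonstration of the signed-zero point (plain C++, nothing
LLVM-specific):

    #include <cmath>
    #include <cstdio>

    int main() {
      double x = -0.0;
      double y = x + 0.0;   // IEEE 754: -0.0 + (+0.0) yields +0.0
      std::printf("%d %d\n", std::signbit(x) ? 1 : 0,
                             std::signbit(y) ? 1 : 0);   // prints "1 0"
      return 0;
    }

So folding x+0.0 to x would silently flip the sign of zero whenever x is -0.0.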
>> Is there any other highly recommended pass for this kind of
>> applications
>> that I'm forgetting? Any passes that I better avoid due to poor
>> gains and
>> long optimization time?
>
Another important optimization is GVN, which does common subexpression
elimination (for example).
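As a trivial (hypothetical) example of the kind of redundancy GVN removes:

    int f(int a, int b) {
      int x = a * b + 3;
      int y = a * b + 3;   // same value as x; GVN reuses the first computation
      return x + y;
    }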
>> Sorry for the many question marks. :-) I don't expect there is an
>> absolute
>> definite answer to each of them, but some guidelines and insights
>> would be
>> very welcome and much appreciated.
>
> Phase ordering is one of the trickiest parts to tune. It's often
> highly
> code-dependent. If your target sources are limited in number, you
> might be
> able to get away with a funky ordering that would be disastrous on
> general-purpose integer codes, for example. It's often a matter of
> trial and
> error. Several people have done research on using genetic
> algorithms and
> other tricks to find an optimal phase ordering. A google search
> should turn
> up interesting stuff.
"What he said" :). Also, if you see specific problems in the
generated code, let us know.
-Chris
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev