[LLVMdev] Optimization passes organization and tradeoffs
David Greene
dag at cray.com
Tue May 20 08:57:27 PDT 2008
On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
> 1) Does ScalarReplAggregates totally supersede PromoteMemoryToRegister? I
Nope, they are different. Mem2Reg is really important if you want register
allocation.
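Roughly speaking, scalarrepl splits small aggregate allocas (structs, short
arrays) into independent scalar allocas, and mem2reg then promotes scalar
allocas into SSA values, so you generally want both. A minimal sketch, assuming
the 2.x-era factory names declared in llvm/Transforms/Scalar.h (PM here is your
PassManager; there's a fuller example at the end of this mail):

  PM.add(createScalarReplAggregatesPass());     // split small aggregates apart
  PM.add(createPromoteMemoryToRegisterPass());  // promote scalar allocas to SSA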
> think I need it to optimize small arrays, but what is the expected added
> complexity?
I shouldn't think it would be very expensive at all.
> 2) Does SCCP also eliminate multiplying/dividing by 1 and
> adding/subtracting 0?
That's probably more the purview of instcombine.
> 3) Is it arbitrary where to place InstructionCombining? Is there a better
> order?
Typically you'll want to place it after various propagation passes (for
example, SCCP) have run. You can run it multiple times.
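For instance, something like this (same caveats about pass names as above,
untested):

  PM.add(createSCCPPass());                  // propagate and fold constants first
  PM.add(createInstructionCombiningPass());  // clean up what SCCP exposed (x*1, x+0, ...)
  // ... other passes ...
  PM.add(createInstructionCombiningPass());  // running it again later often pays off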
> 4) Is DeadStoreElimination still necessary when we have AggressiveDCE?
Probably, but I'll let others give the definitive answer.
> 5) What are the tradeoffs between the different dead code elimination
> variants (why not always use the aggressive one)?
Others can speak to this.
> 6) Is there a better place for CFGSimplification? Should I perform it at
> multiple points?
I think once is probably enough. Earlier is probably better, as it will
simplify the code for later passes and potentially help them run faster.
> Also, my code will frequently have vectors that are either initialized to
> all 0.0 or 1.0. This offers a lot of opportunity for eliminating many
> multiplications and additions, but I was wondering which passes I need for
> this (probably a Reassociate pass, what else)? And I also wonder whether
> these passes actually work with vectors?
I would assume they work with vectors. Anything expression-related is good
for capturing these opportunities (reassociation, folding, instcombine, etc.).
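As a rough sketch, the expression-level passes you'd schedule for that are:

  PM.add(createReassociatePass());           // canonicalize expression trees
  PM.add(createInstructionCombiningPass());  // should fold x * <1.0, 1.0, ...> down to x

One caveat to watch for: multiplying by a splat of 1.0 is an exact identity and
should fold away, but adding a splat of +0.0 is not (-0.0 + 0.0 == +0.0), so
those adds may stick around unless you relax the floating-point semantics.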
> Is there any other highly recommended pass for this kind of applications
> that I'm forgetting? Any passes that I better avoid due to poor gains and
> long optimization time?
The most expensive optimizations are typically scheduling and register
allocation. Regalloc is pretty important but it depends a lot on the machine
architecture. If you have a very fast cache, it becomes less important.
Scheduling is hit-or-miss. If your architecture is strictly in-order, it's
pretty important. You can always play games like having the scheduler
bail out if a basic block is too large, or not scheduling blocks outside of
loops. This can save significant compile time.
> Sorry for the many question marks. :-) I don't expect there is an absolute
> definite answer to each of them, but some guidelines and insights would be
> very welcome and much appreciated.
Phase ordering is one of the trickiest parts to tune. It's often highly
code-dependent. If your target sources are limited in number, you might be
able to get away with a funky ordering that would be disastrous on
general-purpose integer codes, for example. It's often a matter of trial and
error. Several people have done research on using genetic algorithms and
other tricks to find an optimal phase ordering. A Google search should turn
up interesting stuff.
Have a look at the opt tool sources to get an idea of where to start.
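To make that concrete, here is one plausible starting order, written against
the 2.x-era C++ API (PassManager plus the factories declared in
llvm/Transforms/Scalar.h); treat it as a sketch to tune against your own code,
not a recommendation:

  #include "llvm/PassManager.h"
  #include "llvm/Transforms/Scalar.h"
  using namespace llvm;

  static void addOptimizationPasses(PassManager &PM) {
    PM.add(createCFGSimplificationPass());        // clean up the CFG early, once
    PM.add(createScalarReplAggregatesPass());     // split small structs/arrays
    PM.add(createPromoteMemoryToRegisterPass());  // mem2reg: allocas -> SSA values
    PM.add(createSCCPPass());                     // sparse conditional constant prop
    PM.add(createInstructionCombiningPass());     // fold what SCCP exposed
    PM.add(createReassociatePass());              // canonicalize expression trees
    PM.add(createDeadStoreEliminationPass());     // drop dead stores
    PM.add(createAggressiveDCEPass());            // drop dead computation
    PM.add(createInstructionCombiningPass());     // final cleanup
  }

  // Usage: PassManager PM; addOptimizationPasses(PM); PM.run(*M);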
-Dave