[LLVMdev] Optimization passes organization and tradeoffs
David Greene
dag at cray.com
Tue May 20 08:57:27 PDT 2008
On Tuesday 20 May 2008 07:03, Nicolas Capens wrote:
> 1) Does ScalarReplAggregates totally supersede PromoteMemoryToRegister? I
Nope, they are different. Mem2Reg is really important if you want register
allocation.
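Roughly speaking, scalarrepl splits small aggregate allocas (structs, short
arrays) into independent scalar allocas, and mem2reg then promotes scalar
allocas into SSA values, so you generally want both. A minimal sketch, assuming
the 2.x-era factory names declared in llvm/Transforms/Scalar.h (PM here is your
PassManager; there's a fuller example at the end of this mail):

  PM.add(createScalarReplAggregatesPass());     // split small aggregates apart
  PM.add(createPromoteMemoryToRegisterPass());  // promote scalar allocas to SSA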
> think I need it to optimize small arrays, but what is the expected added
> complexity?
I shouldn't think it would be very expensive at all.
> 2) Does SCCP also eliminate multiplying/dividing by 1 and
> adding/subtracting 0?
That's probably more the purview of instcombine.
> 3) Is it arbitrary where to place InstructionCombining? Is there a better
> order?
Typically you'll want to place it after various propagation passes (for
example, SCCP) have run. You can run it multiple times.
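For instance, something like this (same caveats about pass names as above,
untested):

  PM.add(createSCCPPass());                  // propagate and fold constants first
  PM.add(createInstructionCombiningPass());  // clean up what SCCP exposed (x*1, x+0, ...)
  // ... other passes ...
  PM.add(createInstructionCombiningPass());  // running it again later often pays off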
> 4) Is DeadStoreElimination still necessary when we have AggressiveDCE?
Probably, but I'll let others give the definitive answer.
> 5) What are the tradeoffs between the different dead code elimination
> variants (why not always use the aggressive one)?
Others can speak to this.
> 6) Is there a better place for CFGSimplification? Should I perform it at
> multiple points?
I think once is probably enough. Earlier is probably better, as it will
simplify the code for later passes and potentially help them run faster.
> Also, my code will frequently have vectors that are either initialized to
> all 0.0 or 1.0. This offers a lot of opportunity for eliminating many
> multiplications and additions, but I was wondering which passes I need for
> this (probably a Reassociate pass, what else)? And I also wonder whether
> these passes actually work with vectors?
I would assume they work with vectors. Anything expression-related is good
for capturing these opportunities (reassociation, folding, instcombine, etc.).
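As a rough sketch, the expression-level passes you'd schedule for that are:

  PM.add(createReassociatePass());           // canonicalize expression trees
  PM.add(createInstructionCombiningPass());  // should fold x * <1.0, 1.0, ...> down to x

One caveat to watch for: multiplying by a splat of 1.0 is an exact identity and
should fold away, but adding a splat of +0.0 is not (-0.0 + 0.0 == +0.0), so
those adds may stick around unless you relax the floating-point semantics.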
> Is there any other highly recommended pass for this kind of applications
> that I'm forgetting? Any passes that I better avoid due to poor gains and
> long optimization time?
The most expensive optimizations are typically scheduling and register
allocation. Regalloc is pretty important but it depends a lot on the machine
architecture. If you have a very fast cache, it becomes less important.
Scheduling is hit-or-miss. If your architecture is strictly in-order, it's
pretty important. You can always play games like having the scheduler
bail out if a basic block is too large, or not scheduling blocks outside of
loops. This can save significant compile time.
> Sorry for the many question marks. :-) I don't expect there is an absolute
> definite answer to each of them, but some guidelines and insights would be
> very welcome and much appreciated.
Phase ordering is one of the trickiest parts to tune. It's often highly
code-dependent. If your target sources are limited in number, you might be
able to get away with a funky ordering that would be disastrous on
general-purpose integer codes, for example. It's often a matter of trial and
error. Several people have done research on using genetic algorithms and
other tricks to find an optimal phase ordering. A Google search should turn
up interesting stuff.
Have a look at the opt tool sources to get an idea of where to start.
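To make that concrete, here is one plausible starting order, written against
the 2.x-era C++ API (PassManager plus the factories declared in
llvm/Transforms/Scalar.h); treat it as a sketch to tune against your own code,
not a recommendation:

  #include "llvm/PassManager.h"
  #include "llvm/Transforms/Scalar.h"
  using namespace llvm;

  static void addOptimizationPasses(PassManager &PM) {
    PM.add(createCFGSimplificationPass());        // clean up the CFG early, once
    PM.add(createScalarReplAggregatesPass());     // split small structs/arrays
    PM.add(createPromoteMemoryToRegisterPass());  // mem2reg: allocas -> SSA values
    PM.add(createSCCPPass());                     // sparse conditional constant prop
    PM.add(createInstructionCombiningPass());     // fold what SCCP exposed
    PM.add(createReassociatePass());              // canonicalize expression trees
    PM.add(createDeadStoreEliminationPass());     // drop dead stores
    PM.add(createAggressiveDCEPass());            // drop dead computation
    PM.add(createInstructionCombiningPass());     // final cleanup
  }

  // Usage: PassManager PM; addOptimizationPasses(PM); PM.run(*M);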
-Dave