[LLVMdev] Optimization passes organization and tradeoffs

Tue May 20 05:03:17 PDT 2008

Hi all,

I'm getting more impressed by LLVM day by day, but what's still a bit unclear
to me is how the optimization passes should be ordered, and how expensive
each one is to run. I think I have a pretty solid understanding of what each
pass does at a high level, but I couldn't find any documentation about how
they interact at a lower level.

I'd like to use LLVM for generating high-performance stream processing code
at run-time. Obviously the resulting code should be as optimized as
possible, but performing the optimizations themselves should also be very
fast. The code I'm compiling is comparable to C (without any exception
handling or garbage collection, so none of the related passes are needed).
My first attempt at collecting useful optimizations looks like this:

passManager->add(new TargetData(*executionEngine->getTargetData()));
passManager->add(createScalarReplAggregatesPass());   // Convert to SSA form
passManager->add(createSCCPPass());                   // Propagate constants
passManager->add(createInstructionCombiningPass());   // Peephole optimization
passManager->add(createDeadStoreEliminationPass());   // Dead store elimination
passManager->add(createAggressiveDCEPass());          // Aggressive dead code elimination
passManager->add(createCFGSimplificationPass());      // Control-flow optimization

I have several questions about this:

1) Does ScalarReplAggregates totally supersede PromoteMemoryToRegister? I
think I need it to optimize small arrays, but what is the expected added
complexity?

2) Does SCCP also eliminate multiplying/dividing by 1 and adding/subtracting
0?

3) Is it arbitrary where to place InstructionCombining? Is there a better
order?

4) Is DeadStoreElimination still necessary when we have AggressiveDCE?

5) What are the tradeoffs between the different dead code elimination
variants (why not always use the aggressive one)? 

6) Is there a better place for CFGSimplification? Should I perform it at
multiple points?

Also, my code will frequently have vectors that are initialized to either
all 0.0 or all 1.0. This offers a lot of opportunity for eliminating
multiplications and additions, but I was wondering which passes I need for
this (probably a Reassociate pass; what else?). I also wonder whether these
passes actually work with vectors.
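
To make the kind of folding I'm after concrete, here is a minimal standalone
sketch in plain C++ (C++17). It does not use the LLVM API and says nothing
about what any LLVM pass actually does; Operand, constant, runtimeValue and
foldIdentity are hypothetical names made up for this illustration. It only
folds the identities that are exact for floating point: x + 0.0 is not an
identity when x is -0.0, so that case is behind an assumed "ignore signed
zeros" flag, and x * 0.0 is never folded because of NaN and infinity.

```cpp
#include <cmath>
#include <optional>

enum class Op { Add, Sub, Mul, Div };

// Toy operand: either a compile-time constant or an opaque run-time value.
struct Operand {
    bool isConstant;
    double value;  // meaningful only when isConstant is true
    int id;        // hypothetical label for a run-time value, -1 for constants
};

inline Operand constant(double v) { return {true, v, -1}; }
inline Operand runtimeValue(int id) { return {false, 0.0, id}; }

static bool isOne(const Operand& o) { return o.isConstant && o.value == 1.0; }

// True for a constant zero with the requested sign. When anySign is set,
// both +0.0 and -0.0 are accepted.
static bool isZero(const Operand& o, bool negative, bool anySign) {
    if (!o.isConstant || o.value != 0.0) return false;
    return anySign || std::signbit(o.value) == negative;
}

// Returns the operand that the whole operation is equal to, or nothing
// when no identity applies.
std::optional<Operand> foldIdentity(Op op, const Operand& lhs, const Operand& rhs,
                                    bool ignoreSignedZeros = false) {
    switch (op) {
    case Op::Mul:  // x * 1.0 == x and 1.0 * x == x
        if (isOne(rhs)) return lhs;
        if (isOne(lhs)) return rhs;
        break;
    case Op::Div:  // x / 1.0 == x (but 1.0 / x is not x)
        if (isOne(rhs)) return lhs;
        break;
    case Op::Add:  // x + (-0.0) == x always; x + (+0.0) maps -0.0 to +0.0
        if (isZero(rhs, /*negative=*/true, ignoreSignedZeros)) return lhs;
        if (isZero(lhs, /*negative=*/true, ignoreSignedZeros)) return rhs;
        break;
    case Op::Sub:  // x - (+0.0) == x always; x - (-0.0) behaves like x + (+0.0)
        if (isZero(rhs, /*negative=*/false, ignoreSignedZeros)) return lhs;
        break;
    }
    return std::nullopt;
}
```

For example, foldIdentity(Op::Mul, runtimeValue(7), constant(1.0)) yields the
run-time operand 7, while foldIdentity(Op::Add, runtimeValue(5), constant(0.0))
yields nothing unless the flag is set. What I'd like to know is which passes
perform this sort of simplification, and whether they do so element-wise for
vector constants.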

Is there any other highly recommended pass for this kind of application
that I'm forgetting? Are there any passes I should avoid because they offer
poor gains for a long optimization time?

Sorry for the many question marks. :-) I don't expect there is an absolute
definite answer to each of them, but some guidelines and insights would be
very welcome and much appreciated.

Thanks!

Nicolas Capens
