[LLVMdev] Making optimization passes do less

Tue May 27 23:25:16 PDT 2008

On May 26, 2008, at 4:23 AM, Matthijs Kooijman wrote:
> I'm currently struggling with a few optimization passes that change
> stuff I don't want to be changed.

Hehe ok.

> However, for the most part those passes (InstructionCombining
> and SimplifyCFG currently) do stuff that I do want, so disabling them
> altogether doesn't help me much.

Ok.

> The problem arises because the architecture I'm compiling for is quite
> non-standard. In particular, it has the ability to execute a lot of
> instructions in parallel, but at the same time can't execute
> everything you throw at it.

Ok, that is odd :)

> My problem with SimplifyCFG is the following: Whenever the if and
> else branch start with the same instruction, it gets hoisted up into
> the predecessor block. For my architecture, instructions in different
> blocks can't be run in parallel, so this optimization makes code
> either very inefficient or not compile at all.

There are two different issues here.  Passes like instcombine and  
simplifycfg [which is really "basic block combine" :) ] do two things:

1. They make changes that are clear wins, e.g. deleting unconditional  
branches and noop instrs.
2. They change code into more canonical form.

Merging repeated instructions is an important canonicalization because  
it can unlock other optimizations.  The fact that your target doesn't  
like code in this form is not a good reason for simplifycfg to stop  
doing it. :)
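
As an illustration only (this is not code from SimplifyCFG, and the
function names are made up for the example), the hoisting described
above looks roughly like this when written out in C++:

```cpp
// Before hoisting: both arms of the branch start with the same multiply.
int beforeHoist(bool cond, int a, int b) {
    int r;
    if (cond) {
        int t = a * b;   // first instruction of the "if" arm
        r = t + 1;
    } else {
        int t = a * b;   // identical first instruction of the "else" arm
        r = t - 1;
    }
    return r;
}

// After hoisting: the shared multiply lives in the predecessor block,
// so later passes see a single copy of it.
int afterHoist(bool cond, int a, int b) {
    int t = a * b;
    int r;
    if (cond) {
        r = t + 1;
    } else {
        r = t - 1;
    }
    return r;
}
```

Both functions compute the same value; only the block in which the
multiply is placed differs.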

> InstructionCombining has this habit of removing unneeded bits from
> constants. For example, if I do i & 63, where i is a loop counter
> that is always even, this gets replaced by i & 62. Which gives, of
> course, the same results when interpreted, but our backend cannot
> just use any constant as an & mask (in particular, it can only use a
> limited number of them).

Sure, this is another example of canonicalization.  Are you using the
LLVM code generator?  It has support for handling this specifically.
ARM and Alpha in particular have special instructions that only work
with very specific "and" masks.  If you write a pattern/instruction
that matches (and myreg, 255) for example, this will also match a dag
node for "(and myreg, 16)" if the code generator knows that the other
bits covered by the 255 mask are already zero.
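
The matching rule can be sketched in a few lines of standalone C++.
This is a simplified illustration of the idea, not the code
generator's actual interface: andMaskMatches and its parameters are
hypothetical names, and knownZero stands for whatever bits the code
generator has proven to be zero in the non-constant operand.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API). Returns true if a pattern
// written for `desiredMask` may be used for an AND node whose constant
// is `actualMask`, given that the bits in `knownZero` are known to be
// zero in the other operand.
bool andMaskMatches(std::uint64_t actualMask, std::uint64_t desiredMask,
                    std::uint64_t knownZero) {
    if (actualMask == desiredMask) {
        return true;
    }
    // The node must not keep any bit that the pattern would clear.
    if ((actualMask & ~desiredMask) != 0) {
        return false;
    }
    // Bits the pattern keeps but the node clears must already be zero,
    // otherwise the two masks could give different results.
    std::uint64_t needed = desiredMask & ~actualMask;
    return (needed & ~knownZero) == 0;
}
```

With this rule, a pattern for mask 63 accepts the node "i & 62" when
bit 0 of i is known to be zero, which is the always-even loop counter
case from the question.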

> I'd very much prefer to preserve the original value from the source
> here (I also assume that this optimization is in place to help
> further optimizations, because I can't really see any use of this
> change on regular architectures...).

This is folly.  If the user wrote the code in the "optimized" form  
that instcombine transforms it into, your code generator should still  
produce the optimized instructions.  You're trading one missed  
optimization for another one.

> I've been thinking a bit on how to achieve this, and I see a few
> options:

> None of these options seem too attractive to me, what do others
> think? Is there some other option I'm missing here?

I really don't like any of these options.  The best ways to go are:

1) teach your code generator how to do these optimizations, reversing  
the cases that you care about.
2) if #1 isn't feasible, write a canonicalization prepass (like  
codegen prepare) that transforms code from the "canonical optimizer  
form" into a happy form for your target.
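
For the "and" mask case, the core of such a prepass might look
something like the sketch below. Everything here is an assumption made
for illustration: pickTargetMask is a hypothetical helper, the list of
supported masks is supplied by the caller, and a real pass would get
knownZero from the compiler's known-bits analysis rather than as a
plain argument. It needs C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper for illustration. Given the mask the optimizer
// left behind, the operand bits known to be zero, and the masks the
// target can encode, return an encodable mask that gives the same
// result, or nothing if no such mask exists.
std::optional<std::uint64_t>
pickTargetMask(std::uint64_t mask, std::uint64_t knownZero,
               const std::vector<std::uint64_t>& supported) {
    for (std::uint64_t candidate : supported) {
        // The candidate may differ from the original mask only in
        // bits that are known to be zero in the operand.
        if (((candidate ^ mask) & ~knownZero) == 0) {
            return candidate;
        }
    }
    return std::nullopt;
}
```

For example, with supported masks {15, 63, 255}, the optimizer's mask
62, and bit 0 known to be zero, the helper selects 63. When nothing
fits, the pass would have to leave the instruction alone or expand it
some other way.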

-Chris