[llvm-commits] [llvm] r89187 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/BranchFolding.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h lib/Target/ARM/ARMSubtarget.cpp lib/Target/ARM/ARMSubtarget.h

Thu Nov 19 22:52:27 PST 2009

On Nov 19, 2009, at 10:28 PM, Chris Lattner wrote:
>
> I don't really buy this.  Are you really claiming that duplicating a  
> 10K instruction basic block is worth it?  In reality there has to be  
> a balance, even for ARM.  This is also likely to be a huge win for  
> X86 but this is just like jump threading: while eliminating  
> correlated branches is *always* a win from the dynamic instruction  
> count perspective, we balance the benefit with the code size cost.   
> I don't see how this case is any different.

The size of the block is definitely limited -- the whole point of the  
target hook is to adjust that limit.  What we don't limit is the  
number of predecessors into which that block may be duplicated.  More  
on this below.

> In practice, most jump table indirect gotos are preceded by a  
> conditional branch that checks the "range" of the table anyway, so  
> it won't matter.  However, if that weren't the case, this  
> optimization would be just as useful for switches as indbr's.   
> Ideally the same code *should* apply to both.

Not necessarily.  Our implementation of indirect branches artificially  
combines all the indirect branches in a function into a single  
branch.  That has a very bad effect on branch prediction.  The main  
reason we need to do this aggressive tail duplication for indirect  
branches is to essentially undo that transformation.

The same thing is not true of jump tables.  More below...

>
>>>>> Unless someone else has another idea, I'll get rid of the tail
>>>>> duplication target hook.  As you mention, we'll need a way to  
>>>>> identify
>>>>> indirect branches.  I'd prefer to add a new IsIndirectBranch  
>>>>> target
>>>>> hook.  This goes against your desire to avoid new target hooks,  
>>>>> but
>>>>> it's nice and simple.
>>>
>>> Using the extant isIndirectBranch flag would be best, but even  
>>> adding this sort of target hook would be somewhat ok.  At least  
>>> this would be a property of the architecture.  If we can avoid it,  
>>> I'd definitely prefer to of course.
>>
>> The isIndirectBranch flag would not allow us to distinguish jump  
>> table branches.
>
> I don't think we want to :).  Why do we want to?

Indirect branches (i.e., "computed gotos", not jump tables) are most  
often used for interpreters.  Undoing the front end's factoring of the  
CFG is not the only reason to treat indirect branches specially.  It is  
quite common for an interpreter to see recurring patterns in the  
operations it handles.  (This is especially true for certain  
benchmarks we care about.)  The  
typical interpreter loop has a chunk of code to handle each operation,  
ending with an indirect branch to go to the next operation.  When  
there are patterns in the order of interpreted operations, those  
indirect branches become predictable -- but only if they are  
duplicated into the separate chunks of code for each operation.
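
To make the two shapes concrete, here is a minimal source-level sketch of the interpreter loop described above, first with one shared indirect branch and then with the branch duplicated into each handler.  The opcodes and function names are hypothetical, invented for illustration, and the code relies on the GNU "labels as values" extension (GCC and Clang).  It only illustrates the control-flow shapes; what the compiler emits for either function depends on the optimizer.

```cpp
// Requires the GNU "labels as values" extension (GCC, Clang).
#include <cassert>
#include <cstddef>

// Hypothetical toy opcodes, not taken from any real interpreter.
enum Op : unsigned char { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT };

// Factored shape: every handler jumps back to one shared indirect branch,
// so a single branch has to predict the target for all operations.
long run_shared_dispatch(const Op *code) {
  assert(code != nullptr);
  void *const targets[] = {&&do_inc, &&do_dec, &&do_double, &&do_halt};
  long acc = 0;
  std::size_t pc = 0;
dispatch:
  goto *targets[code[pc++]];
do_inc:
  acc += 1;
  goto dispatch;
do_dec:
  acc -= 1;
  goto dispatch;
do_double:
  acc *= 2;
  goto dispatch;
do_halt:
  return acc;
}

// Duplicated shape: each handler ends with its own indirect branch, so each
// branch only has to predict what tends to follow that one operation.
long run_threaded_dispatch(const Op *code) {
  assert(code != nullptr);
  void *const targets[] = {&&do_inc, &&do_dec, &&do_double, &&do_halt};
  long acc = 0;
  std::size_t pc = 0;
  goto *targets[code[pc++]];
do_inc:
  acc += 1;
  goto *targets[code[pc++]];
do_dec:
  acc -= 1;
  goto *targets[code[pc++]];
do_double:
  acc *= 2;
  goto *targets[code[pc++]];
do_halt:
  return acc;
}
```

Both functions compute the same result for a program terminated by OP_HALT; only the number and placement of the indirect branches differ.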

Applying this intuition to the code size question above, if an  
interpreter loop handles 1000 different operations, we would still  
want to duplicate the indirect branches into every one of those 1000  
chunks of code, as long as the code being duplicated is "small  
enough".  (I am thinking here of processors that can predict those  
branches and where the branch misprediction penalty is significant.   
You would want to make a different tradeoff for a processor with no  
branch prediction.)

> And why is +2 the "right" amount?  Because it happens to be enough  
> to get one particular testcase that you care about, or because of  
> some fundamental property of the architecture?

It is for the same reason that -tail-merge-size defaults to "3".  ;-)

The default limit for tail duplication is "tail-merge-size" - 1.  That  
is also completely arbitrary.  We pick values that work well for the  
code we have measured and that we care about.  There's nothing  
fundamental about them.
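
For reference, the arithmetic behind those numbers can be sketched as follows.  This is not the code from the commit: the struct, the function names, and the "<=" comparison are assumptions made for illustration.  Only the default of 3 for -tail-merge-size, the "tail-merge-size - 1" default limit, and the +2 adjustment come from this thread; applying the +2 to blocks that end in an indirect branch is my reading of it.

```cpp
#include <cassert>

// Hypothetical summary of a candidate tail block (not an LLVM type).
struct TailBlock {
  unsigned numInstrs;          // instructions that would be duplicated
  bool endsInIndirectBranch;   // block ends with a computed-goto style branch
};

// Default of -tail-merge-size, as stated in the thread.
constexpr unsigned kTailMergeSize = 3;

// Default limit is tail-merge-size - 1; assume blocks ending in an indirect
// branch get the +2 bump discussed above.
unsigned tailDupLimit(const TailBlock &bb,
                      unsigned tailMergeSize = kTailMergeSize) {
  assert(tailMergeSize >= 1);
  unsigned limit = tailMergeSize - 1;
  if (bb.endsInIndirectBranch)
    limit += 2;
  return limit;
}

// Assumption: a block qualifies when its size does not exceed the limit.
// Note that nothing here limits the number of predecessors.
bool shouldTailDuplicate(const TailBlock &bb,
                         unsigned tailMergeSize = kTailMergeSize) {
  return bb.numInstrs <= tailDupLimit(bb, tailMergeSize);
}
```

With the defaults, an ordinary block may be duplicated at up to 2 instructions and a block ending in an indirect branch at up to 4.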