[llvm-commits] [llvm] r89187 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/BranchFolding.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h lib/Target/ARM/ARMSubtarget.cpp lib/Target/ARM/ARMSubtarget.h

Sun Nov 22 06:06:47 PST 2009

On Nov 21, 2009, at 1:01 PM, Bob Wilson wrote:

> 
> On Nov 21, 2009, at 6:49 AM, Chris Lattner wrote:
>> 
>> This is a completely acceptable tradeoff, I just want it factored the right way.  I'd be very happy if the code in taildupe said "if the block ends in an indirect goto operation, and if "it is profitable to dupliate indirect gotos for the target" then increase the threshold a bit.  I don't like asking the target how much to increase the threshold.  "How much to increase the threshold" is not asking a target property.  Asking "is it profitable to duplicate indirect gotos" is.
> 
> Factoring it in the way you suggest sounds good to me.  I will do that.

Thanks Bob.

> I see your point about switches being used for interpreters.  I wasn't trying to say otherwise.  I'm just not sure that this transformation will be a win for switches in general, and switches

This won't kick in for jump table switches because of the extra range check before them.  However, if that branch got jump threaded or deleted as redundant in the future, the xform could/should apply in theory.

> are used for lots of things besides interpreters.  My experience with indirect branches is that they're used much less frequently and interpreters are the most common use.  I was trying to reduce the risk of breaking something by limiting this change to the one specific case that I know about.

Don't live in fear, do the right thing :).  If it turns out to cause a problem in the future, we can track it down and reevaluate it then.

> I'm not sure we've really solved this problem yet, anyway.  The current solution seems to be very sensitive to minor changes in the code.  We run tail merging very late, so small differences in register allocation, scheduling, etc. will change what tail merging does.  If tail merging is very successful, then tail duplication has to be very aggressive to get the result we need.  For example, if tail merging creates a new block with a common tail of a dozen instructions, and that block is a predecessor of the indirect branch block, it is no problem to tail duplicate the indirect branch into the common tail, but what we really need is to get the new "common tail + indirect branch" duplicated back one more level.  It won't help to tell tail merging not to merge blocks ending with indirect branches, because the indirect branch is in a separate block when we run tail merging.  I suppose we could disable tail merging entirely at -O3, but I'm still trying to better understand the situation before proposing anything like that.

I don't know any of the details, but it sounds like the code should be considering the complete tail length (in # instrs), ignoring the fact that the tail may be broken across multiple blocks.  If you really want to handle indbr's, maybe start from indbrs scanning backward through all the fall throughs and look for any uncond brs that jump into the tail at any point.  At that case you could consider duplicating the entire tail.

If taildup is useful for other cases that are more than one instruction, it seems reasonable to make the algorithm always work this way.

-Chris