[llvm-commits] [llvm] r89187 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/BranchFolding.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h lib/Target/ARM/ARMSubtarget.cpp lib/Target/ARM/ARMSubtarget.h

Thu Nov 19 10:11:11 PST 2009

On Nov 19, 2009, at 9:35 AM, Chris Lattner wrote:

> Apologies for the last email being rambling, and in advance for this  
> one also being rambling. :)
>
> My basic perspective is that I want to sink as much logic into the  
> target-independent code as possible and have the fewest number of  
> target hooks reasonable.  This is for three reasons:
>
> 1. Porting to a new target is easier with fewer target hooks, and  
> they get better codegen 'out of the box' by specifying primitives  
> instead of having to tweak tons of knobs.

I disagree.  It is good to have knobs to tweak.  As long as the  
defaults are set to values that work well for most targets, having  
more knobs does not make it harder to port to a new target.  It does  
increase the overall complexity and documentation burden, though.

In this particular case, I left the default set to match the previous  
behavior.  We could easily change that once we find out what works  
best on most of the targets we support.

>
> 2. A single and well tuned implementation of something in target  
> independent code is better than having a few well tuned impls in  
> some targets and missing or just wrong impls in others.  For  
> example, this xform is definitely profitable on X86 chips as well,  
> probably many others.

Right.  There is a single implementation of tail duplication here.   
It's just a question of tuning it for different targets.  It is not  
possible (in general) to have it tuned ideally for all targets.  For  
that matter, we haven't really tuned it for ARM yet.  I've started  
looking at trying it for X86.

I suppose that after further experimentation, we may discover that in  
practice the same tuning works well enough on all the targets we care  
about.  I don't know whether that will be true or not, but based on  
your comments, let's assume that it will.  If we discover later that  
it doesn't work well, we can always add the target hook at that time.

>
> 3. Less target-specific code to maintain and push around as the  
> compiler moves forward.
>
> anyway, these are just general principles why I don't like new  
> target hooks when they can be avoided.  More details:
>
>> On Nov 18, 2009, at 9:29 AM, Bob Wilson wrote:
>>> The only disadvantage of what you suggest is code size.  There is
>>> currently no limit on the number of predecessors where a block may  
>>> be
>>> duplicated, so the effect of duplicating a small block can be
>>> magnified in the overall code size.
>
> Right.  However, the implementation for ARM doesn't consider this at  
> all.  Also, this cost is not something that should need target- 
> specific information to quantify.

It's easy to measure the change in code size but hard to weigh that  
against the performance benefit.  That part is very target-specific.

>
>>> (And, at least for duplicating
>>> indirect branches on ARM Cortex processors, we don't want to limit
>>> that.)
>
> Why not??

Because it is such a dramatic performance win for ARM Cortex, and  
perhaps more importantly, I can't think of a way to limit it that  
might not also miss cases where we really want to get the  
performance.  I don't expect indirect branches to be so common that  
the code size matters.

>
>>> On a small low-power device without sophisticated branch
>>> prediction, where code size typically matters more than usual, we'll
>>> be doing the wrong thing.  But, indirect branches are not very  
>>> common,
>>> so it probably doesn't really matter so much.
>
> They are fairly common in switch statements that codegen to jump  
> tables, but not anyway near as common as normal branches.

For ARM we currently use different instruction patterns for jump  
tables and indirect branches.  So, the special case we're doing for  
tail duplication does not apply to jump tables.  We might have to be  
more conservative if we can't distinguish those on other targets.

>
>>> Most modern high-performance processors will benefit from tail
>>> duplicating indirect branches, so that would be another reason to
>>> avoid the target hook.
>
> Right.  I would also like the target hook much better if it was of  
> the form "DoesBranchPredictionOfIndirectCallsBasedOnPC" or something  
> like that.  This predicate makes it *really really* obvious what it  
> doing.  By returning a 'cost' with no real units, the target author  
> has no "truth" they just fiddle around until the 'right thing  
> happens'.  Given the documentation, it's very unclear (as a target  
> author) why you'd want to specify the hook you added.
>
> I much prefer very simple hooks that return an _aspect of the micro  
> architecture_ rather than having hooks that _implement pieces of  
> codegen passes_.  The later is the approach that GCC took and is one  
> of the major reasons it is difficult to change the algorithms in the  
> GCC backend.

Doesn't that go against your desire to have fewer target hooks?  I  
tried to make the tail duplication hook very general, since it seems  
conceivable that we may want to adjust the cost metric based on other  
characteristics of the target.  If we add very specific target hooks,  
we may end up with more of them.

I admit that "obviousness" advantage of using very specific hooks.

>
>>>
>>> Unless someone else has another idea, I'll get rid of the tail
>>> duplication target hook.  As you mention, we'll need a way to  
>>> identify
>>> indirect branches.  I'd prefer to add a new IsIndirectBranch target
>>> hook.  This goes against your desire to avoid new target hooks, but
>>> it's nice and simple.
>
> Using the extant isIndirectBranch flag would be best, but even  
> adding this sort of target hook would be somewhat ok.  At least this  
> would be a property of the architecture.  If we can avoid it, I'd  
> definitely prefer to of course.

The isIndirectBranch flag would not allow us to distinguish jump table  
branches.

>
> Dan wrote:
>> However I don't know what the big problem is
>> with the target hook; it's neither arbitrary nor unitless. Perhaps
>> what he really meant is that the present use for the hook is too
>> obscure to justify having the clutter of the hook in TargetInstrInfo,
>> but I'm just guessing here.
>
> What is the units that it returns?  As a target author, how do I  
> know to return +2 or +4 or +1234?

It returns the maximum number of instructions in a basic block that  
will be tail duplicated.