[llvm-commits] [llvm] r89187 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/BranchFolding.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h lib/Target/ARM/ARMSubtarget.cpp lib/Target/ARM/ARMSubtarget.h

Thu Nov 19 15:53:52 PST 2009

Hi Chris,

An update so you don't think I'm ignoring you....  I want to try some  
experiments on x86 before changing anything.  I'm currently blocked by  
<rdar://problem/7410232>.  I'll take another look at it once that is  
resolved.

On Nov 19, 2009, at 10:11 AM, Bob Wilson wrote:

>
> On Nov 19, 2009, at 9:35 AM, Chris Lattner wrote:
>
>> Apologies for the last email being rambling, and in advance for this
>> one also being rambling. :)
>>
>> My basic perspective is that I want to sink as much logic into the
>> target-independent code as possible and have the fewest number of
>> target hooks reasonable.  This is for three reasons:
>>
>> 1. Porting to a new target is easier with fewer target hooks, and
>> they get better codegen 'out of the box' by specifying primitives
>> instead of having to tweak tons of knobs.
>
> I disagree.  It is good to have knobs to tweak.  As long as the
> defaults are set to values that work well for most targets, having
> more knobs does not make it harder to port to a new target.  It does
> increase the overall complexity and documentation burden, though.
>
> In this particular case, I left the default set to match the previous
> behavior.  We could easily change that once we find out what works
> best on most of the targets we support.
>
>>
>> 2. A single and well tuned implementation of something in target
>> independent code is better than having a few well tuned impls in
>> some targets and missing or just wrong impls in others.  For
>> example, this xform is definitely profitable on X86 chips as well,
>> probably many others.
>
> Right.  There is a single implementation of tail duplication here.
> It's just a question of tuning it for different targets.  It is not
> possible (in general) to have it tuned ideally for all targets.  For
> that matter, we haven't really tuned it for ARM yet.  I've started
> looking at trying it for X86.
>
> I suppose that after further experimentation, we may discover that in
> practice the same tuning works well enough on all the targets we care
> about.  I don't know whether that will be true or not, but based on
> your comments, let's assume that it will.  If we discover later that
> it doesn't work well, we can always add the target hook at that time.
>
>>
>> 3. Less target-specific code to maintain and push around as the
>> compiler moves forward.
>>
>> anyway, these are just general principles why I don't like new
>> target hooks when they can be avoided.  More details:
>>
>>> On Nov 18, 2009, at 9:29 AM, Bob Wilson wrote:
>>>> The only disadvantage of what you suggest is code size.  There is
>>>> currently no limit on the number of predecessors where a block may
>>>> be
>>>> duplicated, so the effect of duplicating a small block can be
>>>> magnified in the overall code size.
>>
>> Right.  However, the implementation for ARM doesn't consider this at
>> all.  Also, this cost is not something that should need target-
>> specific information to quantify.
>
> It's easy to measure the change in code size but hard to weigh that
> against the performance benefit.  That part is very target-specific.
>
>>
>>>> (And, at least for duplicating
>>>> indirect branches on ARM Cortex processors, we don't want to limit
>>>> that.)
>>
>> Why not??
>
> Because it is such a dramatic performance win for ARM Cortex, and
> perhaps more importantly, I can't think of a way to limit it that
> might not also miss cases where we really want to get the
> performance.  I don't expect indirect branches to be so common that
> the code size matters.
>
>>
>>>> On a small low-power device without sophisticated branch
>>>> prediction, where code size typically matters more than usual,  
>>>> we'll
>>>> be doing the wrong thing.  But, indirect branches are not very
>>>> common,
>>>> so it probably doesn't really matter so much.
>>
>> They are fairly common in switch statements that codegen to jump
>> tables, but not anyway near as common as normal branches.
>
> For ARM we currently use different instruction patterns for jump
> tables and indirect branches.  So, the special case we're doing for
> tail duplication does not apply to jump tables.  We might have to be
> more conservative if we can't distinguish those on other targets.
>
>>
>>>> Most modern high-performance processors will benefit from tail
>>>> duplicating indirect branches, so that would be another reason to
>>>> avoid the target hook.
>>
>> Right.  I would also like the target hook much better if it was of
>> the form "DoesBranchPredictionOfIndirectCallsBasedOnPC" or something
>> like that.  This predicate makes it *really really* obvious what it
>> doing.  By returning a 'cost' with no real units, the target author
>> has no "truth" they just fiddle around until the 'right thing
>> happens'.  Given the documentation, it's very unclear (as a target
>> author) why you'd want to specify the hook you added.
>>
>> I much prefer very simple hooks that return an _aspect of the micro
>> architecture_ rather than having hooks that _implement pieces of
>> codegen passes_.  The later is the approach that GCC took and is one
>> of the major reasons it is difficult to change the algorithms in the
>> GCC backend.
>
> Doesn't that go against your desire to have fewer target hooks?  I
> tried to make the tail duplication hook very general, since it seems
> conceivable that we may want to adjust the cost metric based on other
> characteristics of the target.  If we add very specific target hooks,
> we may end up with more of them.
>
> I admit that "obviousness" advantage of using very specific hooks.
>
>>
>>>>
>>>> Unless someone else has another idea, I'll get rid of the tail
>>>> duplication target hook.  As you mention, we'll need a way to
>>>> identify
>>>> indirect branches.  I'd prefer to add a new IsIndirectBranch target
>>>> hook.  This goes against your desire to avoid new target hooks, but
>>>> it's nice and simple.
>>
>> Using the extant isIndirectBranch flag would be best, but even
>> adding this sort of target hook would be somewhat ok.  At least this
>> would be a property of the architecture.  If we can avoid it, I'd
>> definitely prefer to of course.
>
> The isIndirectBranch flag would not allow us to distinguish jump table
> branches.
>
>>
>> Dan wrote:
>>> However I don't know what the big problem is
>>> with the target hook; it's neither arbitrary nor unitless. Perhaps
>>> what he really meant is that the present use for the hook is too
>>> obscure to justify having the clutter of the hook in  
>>> TargetInstrInfo,
>>> but I'm just guessing here.
>>
>> What is the units that it returns?  As a target author, how do I
>> know to return +2 or +4 or +1234?
>
> It returns the maximum number of instructions in a basic block that
> will be tail duplicated.
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits