[llvm-commits] [llvm] r89187 - in /llvm/trunk: include/llvm/Target/TargetInstrInfo.h lib/CodeGen/BranchFolding.cpp lib/Target/ARM/ARMBaseInstrInfo.cpp lib/Target/ARM/ARMBaseInstrInfo.h lib/Target/ARM/ARMSubtarget.cpp lib/Target/ARM/ARMSubtarget.h

Thu Nov 19 22:28:25 PST 2009

On Nov 19, 2009, at 10:11 AM, Bob Wilson wrote:
> On Nov 19, 2009, at 9:35 AM, Chris Lattner wrote:
>> Apologies for the last email being rambling, and in advance for this one also being rambling. :)
>> 
>> My basic perspective is that I want to sink as much logic into the target-independent code as possible and have the fewest number of target hooks reasonable.  This is for three reasons:
>> 
>> 1. Porting to a new target is easier with fewer target hooks, and they get better codegen 'out of the box' by specifying primitives instead of having to tweak tons of knobs.
> 
> I disagree.  It is good to have knobs to tweak.  As long as the defaults are set to values that work well for most targets, having more knobs does not make it harder to port to a new target.  It does increase the overall complexity and documentation burden, though.

On general principle, I disagree.  "Knobs" add to maintenance burden, documentation, and learning the code base.  If we can get the same effect with fewer knobs, we're better off.

>> I much prefer very simple hooks that return an _aspect of the micro architecture_ rather than having hooks that _implement pieces of codegen passes_.  The later is the approach that GCC took and is one of the major reasons it is difficult to change the algorithms in the GCC backend.
> 
> Doesn't that go against your desire to have fewer target hooks?  I tried to make the tail duplication hook very general, since it seems conceivable that we may want to adjust the cost metric based on other characteristics of the target.  If we add very specific target hooks, we may end up with more of them.

While I don't like things like isMoveInstr, I'm not opposed to them either.  While they are technically target hooks (and if you don't implement them, you get atrocious code!) isMoveInstr is well defined and exposes an aspect of the architecture to the target independent parts of the code.   I guess what I really don't like is target hooks that implement pieces of algorithms (of which I would like to have zero).  Beyond that I have a general goal of reducing target hooks, but the goal is to keep them to a reasonable minimum, not eliminate them completely.

If a new target hook adds value and describes the architecture, I'm not opposed to it.  Particularly if it could be derived from the .td files some day (like isMoveInstr).

>> 2. A single and well tuned implementation of something in target independent code is better than having a few well tuned impls in some targets and missing or just wrong impls in others.  For example, this xform is definitely profitable on X86 chips as well, probably many others.
> 
> Right.  There is a single implementation of tail duplication here.  It's just a question of tuning it for different targets.  It is not possible (in general) to have it tuned ideally for all targets.  For that matter, we haven't really tuned it for ARM yet.  I've started looking at trying it for X86.

Sure, but the hook you implemented for ARM is not well tuned yet, and will grow in complexity as/if it does.

>> 
>>>> (And, at least for duplicating
>>>> indirect branches on ARM Cortex processors, we don't want to limit
>>>> that.)
>> 
>> Why not??
> 
> Because it is such a dramatic performance win for ARM Cortex, and perhaps more importantly, I can't think of a way to limit it that might not also miss cases where we really want to get the performance.  I don't expect indirect branches to be so common that the code size matters.

I don't really buy this.  Are you really claiming that duplicating a 10K instruction basic block is worth it?  In reality there has to be a balance, even for ARM.  This is also likely to be a huge win for X86 but this is just like jump threading: while eliminating correlated branches is *always* a win from the dynamic instruction count perspective, we balance the benefit with the code size cost.  I don't see how this case is any different.

>>>> On a small low-power device without sophisticated branch
>>>> prediction, where code size typically matters more than usual, we'll
>>>> be doing the wrong thing.  But, indirect branches are not very common,
>>>> so it probably doesn't really matter so much.
>> 
>> They are fairly common in switch statements that codegen to jump tables, but not anyway near as common as normal branches.
> 
> For ARM we currently use different instruction patterns for jump tables and indirect branches.  So, the special case we're doing for tail duplication does not apply to jump tables.  We might have to be more conservative if we can't distinguish those on other targets.

In practice, must jump table indirect gotos are preceded by a conditional branch that checks the "range" of the table anyway, so it won't matter.  However, if that weren't the case, this optimization would be just as useful for switches as indbr's.  Ideally the same code *should* apply to both.

>>>> Unless someone else has another idea, I'll get rid of the tail
>>>> duplication target hook.  As you mention, we'll need a way to identify
>>>> indirect branches.  I'd prefer to add a new IsIndirectBranch target
>>>> hook.  This goes against your desire to avoid new target hooks, but
>>>> it's nice and simple.
>> 
>> Using the extant isIndirectBranch flag would be best, but even adding this sort of target hook would be somewhat ok.  At least this would be a property of the architecture.  If we can avoid it, I'd definitely prefer to of course.
> 
> The isIndirectBranch flag would not allow us to distinguish jump table branches.

I don't think we want to :).  Why do we want to?

>> Dan wrote:
>>> However I don't know what the big problem is
>>> with the target hook; it's neither arbitrary nor unitless. Perhaps
>>> what he really meant is that the present use for the hook is too
>>> obscure to justify having the clutter of the hook in TargetInstrInfo,
>>> but I'm just guessing here.
>> 
>> What is the units that it returns?  As a target author, how do I know to return +2 or +4 or +1234?
> 
> It returns the maximum number of instructions in a basic block that will be tail duplicated.

And why is +2 the "right" amount?  Because it happens to be enough to get one particular testcase that you care about, or because of some fundamental property of the architecture?

-Chris

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20091119/46eee448/attachment.html>