[llvm-commits] [llvm] r58426 - in /llvm/trunk: include/llvm/Target/TargetLowering.h lib/CodeGen/SelectionDAG/LegalizeDAG.cpp lib/CodeGen/SelectionDAG/LegalizeTypes.h lib/CodeGen/SelectionDAG/TargetLowering.cpp lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86ISelLowering.h

Sat Nov 1 13:43:23 PDT 2008

Hi Duncan,

You bring up several good points.  Since the DAG combiner removes the
unused values and most machines use a power-of-2 vector length, the
strategy of widening to a power of 2 when there is no legal wider
vector size makes sense and should generate superior results.  The
widening code uses a similar strategy to chop up a vector to store
it.  I'll make this change to how widening works.
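
To make the arithmetic concrete, here is a minimal sketch of the
widen-then-split count for the v10i32 example.  The names nextPowerOf2,
WidenPlan and planWidenThenSplit are hypothetical and are not taken from
the LLVM sources; the sketch assumes the legal vector length is itself a
power of 2 and only counts pieces, it does not model the real legalizer.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): smallest power of two >= n.
unsigned nextPowerOf2(unsigned n) {
  assert(n >= 1 && n <= (1u << 31) && "element count out of range");
  unsigned p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Illustrative summary of what widening followed by splitting produces.
struct WidenPlan {
  unsigned widenedElts; // element count after widening to a power of two
  unsigned numParts;    // legal-width pieces produced by repeated halving
  unsigned usefulParts; // pieces that hold at least one original element
};

// Assumption: legalElts is a power of two (e.g. 4 for v4i32).
WidenPlan planWidenThenSplit(unsigned numElts, unsigned legalElts) {
  assert(legalElts >= 1 && (legalElts & (legalElts - 1)) == 0 &&
         "legal vector length assumed to be a power of two");
  unsigned widened = nextPowerOf2(numElts);
  if (widened < legalElts)
    widened = legalElts; // never go below the legal width
  unsigned parts = widened / legalElts;
  // Ceiling division: pieces that overlap the original elements.
  unsigned useful = (numElts + legalElts - 1) / legalElts;
  return WidenPlan{widened, parts, useful};
}
```

Calling planWidenThenSplit(10, 4) gives 16 widened elements and 4 pieces
of which 3 are useful; the remaining piece holds only padding, and it is
the one the combiner would be expected to remove.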

   -- Mon Ping

On Oct 31, 2008, at 1:45 PM, Duncan Sands wrote:

> Hi Mon Ping,
>
>> So if we had a vector of 3 elements and there is no valid 4-element
>> vector type, we should just split and not try the next power of 2 and
>> then split down to 4 operations.  The rationale that I see for going
>> to the power of 2 is that it is more likely that the split logic
>> (where we split in half) will find the correct legal type.
>> The downside is that if we go to a power of 2, we might end up doing
>> work that doesn't benefit us.  Given the example above of v10i32, if
>> we promote to v16i32, we will break it down to 4 v4i32 where one of
>> the v4i32 is not useful.
>
> the pointless v4i32 should be eliminated by the DAG combiner.  It is
> similar to what happens for apints.  Consider i129 on a 32-bit machine.
> This i129 is promoted to i256, which is then expanded successively,
> resulting in eight i32.  Only the first five are needed to cover the
> 129 original bits.  So doesn't the final code get littered with
> pointless operations on the last three i32?  In fact it does not, they
> are eliminated by the combiner if they aren't being used for anything
> useful, typically because they have no users, or because they contain
> undef.  I never actually saw pointless code due to this in the final
> assembler, not once.  I expect it will be the same for vectors, and
> if it is not, that probably means the combiner needs to be made a
> little smarter.
>
> So I think that widening v10i32 to v16i32 and then splitting would
> work well.  In fact I think we should just give up on the idea of
> splitting non-power-of-two vectors, and always widen them instead.
> The reason is that there are several operations which are really
> hard to split for non-power-of-two vectors.  LegalizeTypes handles
> more cases than LegalizeDAG but there are some where I just gave
> up and added an assertion.  This is again similar to the situation
> with integers: why promote to a power of two size when you could
> do uneven expansion (expand into two unequal parts)?  The answer
> is that expansion is harder than promotion, and some operations
> would be just too hard for uneven expansion.  The promote-to-power-
> of-two then expand (maybe many times) logic sidesteps all that and
> works great.  I expect it to work pretty well for vectors too.
> After all, integers are just a special case of vectors: vectors of
> i1, right :)
>
> Ciao,
>
> Duncan.
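
As an illustration of the i129 example above, the following sketch
promotes a bit width to a power-of-two multiple of the legal width and
then halves it repeatedly, recording which pieces still overlap the
original bits.  Part, expand and promoteThenExpand are hypothetical names
chosen for this sketch, not LegalizeTypes functions, and the model only
tracks bit ranges rather than real DAG nodes.

```cpp
#include <cassert>
#include <vector>

// One legal-width piece of the promoted integer.
struct Part {
  unsigned lowBit; // index of the lowest bit covered by this piece
  unsigned width;  // width of the piece in bits
  bool needed;     // true if the piece overlaps the original bits
};

// Recursively halve [lowBit, lowBit + width) until pieces are legal.
void expand(unsigned lowBit, unsigned width, unsigned legalBits,
            unsigned origBits, std::vector<Part> &out) {
  if (width <= legalBits) {
    out.push_back(Part{lowBit, width, lowBit < origBits});
    return;
  }
  unsigned half = width / 2;
  expand(lowBit, half, legalBits, origBits, out);        // low half
  expand(lowBit + half, half, legalBits, origBits, out); // high half
}

// Assumption: legalBits is a power of two (e.g. 32 on a 32-bit machine).
std::vector<Part> promoteThenExpand(unsigned origBits, unsigned legalBits) {
  assert(origBits >= 1 && origBits <= (1u << 30) && "width out of range");
  assert(legalBits >= 1 && (legalBits & (legalBits - 1)) == 0 &&
         "legal width assumed to be a power of two");
  unsigned promoted = legalBits;
  while (promoted < origBits)
    promoted *= 2; // promote to the next power-of-two size
  std::vector<Part> parts;
  expand(0, promoted, legalBits, origBits, parts);
  return parts;
}
```

For promoteThenExpand(129, 32) this yields eight 32-bit pieces, with the
first five marked as needed and the last three not, matching the counts
in the message above.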