[llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Tue Jan 15 23:38:40 PST 2013

Duncan, Nadav,

Can I commit this patch? Or DAGCombine changes seem weird?

I'll try to explain a theory under this change.

we have a chain

{ illegal type node (1) } ->{ EXTEND to  legal type (2) } -> { EXTEND to  legal type (3) } -> { EXTEND to  illegal type (4) }

Type legalizer works better with (1) -> (2) and (3) -> (4) because one type in each pair is legal.
The operation (2)->(3) may have "Custom" legalization and be replaced with an optimal sequence for a given target.

If after type legalizer we still have a chain 
->{ EXTEND to  legal type (X) } -> { EXTEND to  legal type (Y) }->{ EXTEND to  legal type (Z) }
it still can be combined to 
->{ EXTEND to  legal type (X) } ->{ EXTEND to  legal type (Z) }

- Elena

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Demikhovsky, Elena
Sent: Tuesday, January 15, 2013 15:32
To: Duncan Sands
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

v8i16 -VPMOVSXWD> v8i32 works only for AVX2. On AVX machine I have only v8i16 -VPMOVSXWD> v4i32.

The logic of shuffles is implemented in LowerSIGN_EXTEND().

- Elena

-----Original Message-----
From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
Sent: Tuesday, January 15, 2013 15:24
To: Demikhovsky, Elena
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Hi Elena,

On 15/01/13 14:09, Demikhovsky, Elena wrote:
>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64
>
> DAGCombiner takes the last v8i32 ->v8i64, combines with v8i16 ->v8i32. Than it sees a new node again. Takes v8i16 -> v8i64 and combines with v8i8 -> v8i16.
> And I have an endless loop that splits and joins SEXT nodes.
> I looked for a "delimiter" node that can be inserted between nodes will disappear at the end, but I did not find such.

thanks for explaining.  Maybe you can generate
   v8i8 -> v8i16 -VPMOVSXWD> v8i32 ->v8i64 i.e. directly use a target instruction in the middle rather than an sext?

Ciao, Duncan.

>
> - Elena
>
> -----Original Message-----
> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan 
> Sands
> Sent: Tuesday, January 15, 2013 14:49
> To: Demikhovsky, Elena
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
> AVX2 - please review
>
> Hi Elena,
>
> On 15/01/13 13:42, Demikhovsky, Elena wrote:
>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>
>> I already have the code that does the work in LowerSIGN_EXTEND(). It is a little bit more than just VPMOVSX.
>> Combining load with "sext v8i8 -> v8i16" happens in the common DAGCombine. I don't want to duplicate it too.
>
> OK, I misunderstood the role of the load at first.
>
>>
>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>> I don't turn it off, just do it after type legalizer.
>
> I still didn't understand why this is needed.  I'm imagining this: the 
> DAG combiner runs, and calls the X86 combiner on v8i8 - > v8i64.  X86 
> turns it into v8i8 -> v8i16 ->v8i32 ->v8i64.  As the DAG/X86 combiner 
> runs until there is no more work to do, the DAG combiner should then 
> combine the load with the v8i8
> -> v8i16 and so on until you only have the target instructions you 
> -> described
> (VPMOVSXBW etc).  So no-one outside of the DAG combiner, in particular the type legalizer, should ever see the intermediate v8i8 -> v8i16 ->v8i32 ->v8i64 form.
> Is this not how it happens?
>
> Ciao, Duncan.
>
>>
>> - Elena
>>
>> -----Original Message-----
>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>> Duncan Sands
>> Sent: Tuesday, January 15, 2013 14:26
>> To: Demikhovsky, Elena
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>> AVX2 - please review
>>
>> Hi Elena,
>>
>> On 15/01/13 13:21, Demikhovsky, Elena wrote:
>>> I put 2 or 3 sequential sign-extend operations.
>>>
>>> original: v8i8 - > v8i64
>>
>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>
>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>
>> Ciao, Duncan.
>>
>>> new:      v8i8 -> v8i16 ->v8i32 ->v8i64
>>>
>>> I combine load with v8i8 -> v8i16 and use VPMOVSXBW instruction.
>>> I translate v8i16 ->v8i32 to 2 VPMOVSXWD instructions.
>>> And then I have 4 VPMOVSXDQ for  v8i32 ->v8i64.
>>>
>>> if I work with one SIGN_EXTEND operation with illegal types, I receive a scalar code after type legalizer.
>>>
>>> - Elena
>>>
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu 
>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
>>> Sent: Tuesday, January 15, 2013 13:58
>>> To: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>> AVX2 - please review
>>>
>>> Hi Elena,
>>>
>>> On 15/01/13 12:47, Demikhovsky, Elena wrote:
>>>> I optimized the following SEXT pairs:
>>>> v8i8 -> v8i64
>>>> v8i8->v8i32
>>>> v4i8 ->v4i64
>>>> v4i16->v4i64
>>>
>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 172422)
>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
>>>> @@ -4298,12 +4298,13 @@
>>>>       if (isa<ConstantSDNode>(N0))
>>>>         return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT, 
>>>> N0);
>>>>
>>>> -  // fold (sext (sext x)) -> (sext x)
>>>> -  // fold (sext (aext x)) -> (sext x)
>>>> -  if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>> -    return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>> -                       N0.getOperand(0));
>>>> -
>>>> +  if (Level >= AfterLegalizeTypes) {
>>>> +    // fold (sext (sext x)) -> (sext x)
>>>> +    // fold (sext (aext x)) -> (sext x)
>>>> +    if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>> +      return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>> +                         N0.getOperand(0));  }
>>>
>>> why is this needed?  As the transform doesn't introduce any illegal types that weren't there already it should be safe to perform at any level.
>>>
>>> If you are worried that it might create an illegal operation, then that is what you should check for, not the Level.
>>>
>>>>       if (N0.getOpcode() == ISD::TRUNCATE) {
>>>>         // fold (sext (truncate (load x))) -> (sext (smaller load x))
>>>>         // fold (sext (truncate (srl (load x), c))) -> (sext 
>>>> (smaller load (x+c/n)))
>>>
>>> ...
>>>
>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 172422)
>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
>>>> @@ -2554,9 +2554,7 @@
>>>>                 VT.getVectorNumElements() ==
>>>>                 Operand.getValueType().getVectorNumElements()) &&
>>>>                "Vector element count mismatch!");
>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
>>>> -      return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>>>> -    else if (OpOpcode == ISD::UNDEF)
>>>> +    if (OpOpcode == ISD::UNDEF)
>>>>           // sext(undef) = 0, because the top bits will all be the same.
>>>>           return getConstant(0, VT);
>>>>         break;
>>>
>>> Likewise, why are you getting rid of this folding?
>>>
>>> Ciao, Duncan.
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> --------------------------------------------------------------------
>>> -
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material 
>>> for the sole use of the intended recipient(s). Any review or 
>>> distribution by others is strictly prohibited. If you are not the 
>>> intended recipient, please contact the sender and delete all copies.
>>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
>>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
>

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.