[llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Demikhovsky, Elena elena.demikhovsky at intel.com
Wed Jan 16 02:24:14 PST 2013


I tried what you ask now in spite of LowerXX() functions usually intended for LowerOperation stage. 
I still need to remove SIGN_EXTEND joining in getNode().



- Elena

-----Original Message-----
From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
Sent: Wednesday, January 16, 2013 10:14
To: Demikhovsky, Elena
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Hi Elena,

On 15/01/13 14:32, Demikhovsky, Elena wrote:
> v8i16 -VPMOVSXWD> v8i32 works only for AVX2. On AVX machine I have only v8i16 -VPMOVSXWD> v4i32.
>
> The logic of shuffles is implemented in LowerSIGN_EXTEND().

can't you call LowerSIGN_EXTEND directly then?

Ciao, Duncan.

>
> - Elena
>
> -----Original Message-----
> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan 
> Sands
> Sent: Tuesday, January 15, 2013 15:24
> To: Demikhovsky, Elena
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and 
> AVX2 - please review
>
> Hi Elena,
>
> On 15/01/13 14:09, Demikhovsky, Elena wrote:
>>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64
>>
>> DAGCombiner takes the last v8i32 ->v8i64, combines with v8i16 ->v8i32. Than it sees a new node again. Takes v8i16 -> v8i64 and combines with v8i8 -> v8i16.
>> And I have an endless loop that splits and joins SEXT nodes.
>> I looked for a "delimiter" node that can be inserted between nodes will disappear at the end, but I did not find such.
>
> thanks for explaining.  Maybe you can generate
>     v8i8 -> v8i16 -VPMOVSXWD> v8i32 ->v8i64 i.e. directly use a target instruction in the middle rather than an sext?
>
> Ciao, Duncan.
>
>>
>> - Elena
>>
>> -----Original Message-----
>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>> Duncan Sands
>> Sent: Tuesday, January 15, 2013 14:49
>> To: Demikhovsky, Elena
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>> AVX2 - please review
>>
>> Hi Elena,
>>
>> On 15/01/13 13:42, Demikhovsky, Elena wrote:
>>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>
>>> I already have the code that does the work in LowerSIGN_EXTEND(). It is a little bit more than just VPMOVSX.
>>> Combining load with "sext v8i8 -> v8i16" happens in the common DAGCombine. I don't want to duplicate it too.
>>
>> OK, I misunderstood the role of the load at first.
>>
>>>
>>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>> I don't turn it off, just do it after type legalizer.
>>
>> I still didn't understand why this is needed.  I'm imagining this: 
>> the DAG combiner runs, and calls the X86 combiner on v8i8 - > v8i64.  
>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64.  As the DAG/X86 
>> combiner runs until there is no more work to do, the DAG combiner 
>> should then combine the load with the v8i8
>> -> v8i16 and so on until you only have the target instructions you 
>> -> described
>> (VPMOVSXBW etc).  So no-one outside of the DAG combiner, in particular the type legalizer, should ever see the intermediate v8i8 -> v8i16 ->v8i32 ->v8i64 form.
>> Is this not how it happens?
>>
>> Ciao, Duncan.
>>
>>>
>>> - Elena
>>>
>>> -----Original Message-----
>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>>> Duncan Sands
>>> Sent: Tuesday, January 15, 2013 14:26
>>> To: Demikhovsky, Elena
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>> AVX2 - please review
>>>
>>> Hi Elena,
>>>
>>> On 15/01/13 13:21, Demikhovsky, Elena wrote:
>>>> I put 2 or 3 sequential sign-extend operations.
>>>>
>>>> original: v8i8 - > v8i64
>>>
>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>
>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>>
>>> Ciao, Duncan.
>>>
>>>> new:      v8i8 -> v8i16 ->v8i32 ->v8i64
>>>>
>>>> I combine load with v8i8 -> v8i16 and use VPMOVSXBW instruction.
>>>> I translate v8i16 ->v8i32 to 2 VPMOVSXWD instructions.
>>>> And then I have 4 VPMOVSXDQ for  v8i32 ->v8i64.
>>>>
>>>> if I work with one SIGN_EXTEND operation with illegal types, I receive a scalar code after type legalizer.
>>>>
>>>> - Elena
>>>>
>>>> -----Original Message-----
>>>> From: llvm-commits-bounces at cs.uiuc.edu 
>>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
>>>> Sent: Tuesday, January 15, 2013 13:58
>>>> To: llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>>> AVX2 - please review
>>>>
>>>> Hi Elena,
>>>>
>>>> On 15/01/13 12:47, Demikhovsky, Elena wrote:
>>>>> I optimized the following SEXT pairs:
>>>>> v8i8 -> v8i64
>>>>> v8i8->v8i32
>>>>> v4i8 ->v4i64
>>>>> v4i16->v4i64
>>>>
>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>> ===================================================================
>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 172422)
>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
>>>>> @@ -4298,12 +4298,13 @@
>>>>>        if (isa<ConstantSDNode>(N0))
>>>>>          return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), 
>>>>> VT, N0);
>>>>>
>>>>> -  // fold (sext (sext x)) -> (sext x)
>>>>> -  // fold (sext (aext x)) -> (sext x)
>>>>> -  if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>> -    return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>> -                       N0.getOperand(0));
>>>>> -
>>>>> +  if (Level >= AfterLegalizeTypes) {
>>>>> +    // fold (sext (sext x)) -> (sext x)
>>>>> +    // fold (sext (aext x)) -> (sext x)
>>>>> +    if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>> +      return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>> +                         N0.getOperand(0));  }
>>>>
>>>> why is this needed?  As the transform doesn't introduce any illegal types that weren't there already it should be safe to perform at any level.
>>>>
>>>> If you are worried that it might create an illegal operation, then that is what you should check for, not the Level.
>>>>
>>>>>        if (N0.getOpcode() == ISD::TRUNCATE) {
>>>>>          // fold (sext (truncate (load x))) -> (sext (smaller load x))
>>>>>          // fold (sext (truncate (srl (load x), c))) -> (sext 
>>>>> (smaller load (x+c/n)))
>>>>
>>>> ...
>>>>
>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 172422)
>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
>>>>> @@ -2554,9 +2554,7 @@
>>>>>                  VT.getVectorNumElements() ==
>>>>>                  Operand.getValueType().getVectorNumElements()) &&
>>>>>                 "Vector element count mismatch!");
>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
>>>>> -      return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>>>>> -    else if (OpOpcode == ISD::UNDEF)
>>>>> +    if (OpOpcode == ISD::UNDEF)
>>>>>            // sext(undef) = 0, because the top bits will all be the same.
>>>>>            return getConstant(0, VT);
>>>>>          break;
>>>>
>>>> Likewise, why are you getting rid of this folding?
>>>>
>>>> Ciao, Duncan.
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>> -------------------------------------------------------------------
>>>> -
>>>> -
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material 
>>>> for the sole use of the intended recipient(s). Any review or 
>>>> distribution by others is strictly prohibited. If you are not the 
>>>> intended recipient, please contact the sender and delete all copies.
>>>>
>>>
>>> --------------------------------------------------------------------
>>> -
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material 
>>> for the sole use of the intended recipient(s). Any review or 
>>> distribution by others is strictly prohibited. If you are not the 
>>> intended recipient, please contact the sender and delete all copies.
>>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
>>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
>

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sext_3.diff
Type: application/octet-stream
Size: 9235 bytes
Desc: sext_3.diff
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130116/b9b41daa/attachment.obj>


More information about the llvm-commits mailing list