[llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Duncan Sands baldrick at free.fr
Tue Jan 15 05:24:21 PST 2013


Hi Elena,

On 15/01/13 14:09, Demikhovsky, Elena wrote:
>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64
>
> DAGCombiner takes the last v8i32 ->v8i64, combines with v8i16 ->v8i32. Than it sees a new node again. Takes v8i16 -> v8i64 and combines with v8i8 -> v8i16.
> And I have an endless loop that splits and joins SEXT nodes.
> I looked for a "delimiter" node that can be inserted between nodes will disappear at the end, but I did not find such.

thanks for explaining.  Maybe you can generate
   v8i8 -> v8i16 -VPMOVSXWD> v8i32 ->v8i64
i.e. directly use a target instruction in the middle rather than an sext?

Ciao, Duncan.

>
> - Elena
>
> -----Original Message-----
> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
> Sent: Tuesday, January 15, 2013 14:49
> To: Demikhovsky, Elena
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review
>
> Hi Elena,
>
> On 15/01/13 13:42, Demikhovsky, Elena wrote:
>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>
>> I already have the code that does the work in LowerSIGN_EXTEND(). It is a little bit more than just VPMOVSX.
>> Combining load with "sext v8i8 -> v8i16" happens in the common DAGCombine. I don't want to duplicate it too.
>
> OK, I misunderstood the role of the load at first.
>
>>
>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>> I don't turn it off, just do it after type legalizer.
>
> I still didn't understand why this is needed.  I'm imagining this: the DAG combiner runs, and calls the X86 combiner on v8i8 - > v8i64.  X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64.  As the DAG/X86 combiner runs until there is no more work to do, the DAG combiner should then combine the load with the v8i8
> -> v8i16 and so on until you only have the target instructions you
> -> described
> (VPMOVSXBW etc).  So no-one outside of the DAG combiner, in particular the type legalizer, should ever see the intermediate v8i8 -> v8i16 ->v8i32 ->v8i64 form.
> Is this not how it happens?
>
> Ciao, Duncan.
>
>>
>> - Elena
>>
>> -----Original Message-----
>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan
>> Sands
>> Sent: Tuesday, January 15, 2013 14:26
>> To: Demikhovsky, Elena
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>> AVX2 - please review
>>
>> Hi Elena,
>>
>> On 15/01/13 13:21, Demikhovsky, Elena wrote:
>>> I put 2 or 3 sequential sign-extend operations.
>>>
>>> original: v8i8 - > v8i64
>>
>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>
>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>
>> Ciao, Duncan.
>>
>>> new:      v8i8 -> v8i16 ->v8i32 ->v8i64
>>>
>>> I combine load with v8i8 -> v8i16 and use VPMOVSXBW instruction.
>>> I translate v8i16 ->v8i32 to 2 VPMOVSXWD instructions.
>>> And then I have 4 VPMOVSXDQ for  v8i32 ->v8i64.
>>>
>>> if I work with one SIGN_EXTEND operation with illegal types, I receive a scalar code after type legalizer.
>>>
>>> - Elena
>>>
>>> -----Original Message-----
>>> From: llvm-commits-bounces at cs.uiuc.edu
>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan Sands
>>> Sent: Tuesday, January 15, 2013 13:58
>>> To: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>> AVX2 - please review
>>>
>>> Hi Elena,
>>>
>>> On 15/01/13 12:47, Demikhovsky, Elena wrote:
>>>> I optimized the following SEXT pairs:
>>>> v8i8 -> v8i64
>>>> v8i8->v8i32
>>>> v4i8 ->v4i64
>>>> v4i16->v4i64
>>>
>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 172422)
>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
>>>> @@ -4298,12 +4298,13 @@
>>>>       if (isa<ConstantSDNode>(N0))
>>>>         return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>> N0);
>>>>
>>>> -  // fold (sext (sext x)) -> (sext x)
>>>> -  // fold (sext (aext x)) -> (sext x)
>>>> -  if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>> -    return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>> -                       N0.getOperand(0));
>>>> -
>>>> +  if (Level >= AfterLegalizeTypes) {
>>>> +    // fold (sext (sext x)) -> (sext x)
>>>> +    // fold (sext (aext x)) -> (sext x)
>>>> +    if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>> +      return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>> +                         N0.getOperand(0));  }
>>>
>>> why is this needed?  As the transform doesn't introduce any illegal types that weren't there already it should be safe to perform at any level.
>>>
>>> If you are worried that it might create an illegal operation, then that is what you should check for, not the Level.
>>>
>>>>       if (N0.getOpcode() == ISD::TRUNCATE) {
>>>>         // fold (sext (truncate (load x))) -> (sext (smaller load x))
>>>>         // fold (sext (truncate (srl (load x), c))) -> (sext (smaller
>>>> load (x+c/n)))
>>>
>>> ...
>>>
>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 172422)
>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
>>>> @@ -2554,9 +2554,7 @@
>>>>                 VT.getVectorNumElements() ==
>>>>                 Operand.getValueType().getVectorNumElements()) &&
>>>>                "Vector element count mismatch!");
>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
>>>> -      return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>>>> -    else if (OpOpcode == ISD::UNDEF)
>>>> +    if (OpOpcode == ISD::UNDEF)
>>>>           // sext(undef) = 0, because the top bits will all be the same.
>>>>           return getConstant(0, VT);
>>>>         break;
>>>
>>> Likewise, why are you getting rid of this folding?
>>>
>>> Ciao, Duncan.
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material for
>>> the sole use of the intended recipient(s). Any review or distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.
>>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>




More information about the llvm-commits mailing list