[llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Thu Jan 17 00:12:09 PST 2013

Hi Duncan,

Please look if the following comment gives the proper explanation.

  // Folding (sext (sext x)) is obvious, but we do it only after the type 
  // legalization phase. When the sequence is like {(T1->T2), (T2->T3)} and 
  // T1 or T3 (or the both) are illegal types, the TypeLegalizer may not 
  // give a good sequence for the (T1->T3) pair.
  // So we give a chance to target specific combiner to optimize T1->T2 and T2->T3
  // separately and may be fold them in a preceding of subsequent instruction.

Thanks,
- Elena

-----Original Message-----
From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
Sent: Thursday, January 17, 2013 09:26
To: Demikhovsky, Elena
Cc: Nadav Rotem; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Hi Elena,

On 17/01/13 08:03, Demikhovsky, Elena wrote:
> I optimize SEXT v8i8 -> v8i64, v8i8->v8i32, v4i8 -> v4i64, v4i16->v4i64 for AVX and AVX2.
>
> I don't like the second patch. I prefer the first one.

the first patch should at least get a big fat comment explaining why you are only doing the ext combine after type legalization.

Ciao, Duncan.

>
> Bug 14865.
>
>
> - Elena
>
> -----Original Message-----
> From: Nadav Rotem [mailto:nrotem at apple.com]
> Sent: Wednesday, January 16, 2013 23:01
> To: Demikhovsky, Elena
> Cc: Duncan Sands; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and 
> AVX2 - please review
>
> Hi Elena and Duncan,
>
> I understand that the type-legalizer scalarizes everything when it runs into illegal types, and if we want to support SEXT of illegal types (such as 4 x i8) then we need to have a pre-type-legalizer dag-combine optimization.  Okay.  But I prefer that we don't call Lower_XXX from within the DAGCombiner.  I know that (thanks to the work of Elena) most of our SEXT patterns are optimized.  I prefer not to optimize uncommon sequences such as <8 x i8> to <8 x i64> if it means that we add more complexity. Keep in mind that DAG-combine optimizations are NOT automatically detected by the cost model (which uses TLI), and every dag-combine optimization needs to be recorded in the cost-model tables.  Elena, which patterns are improved by this patch ?
>
> Thanks,
> Nadav
>
>
>
> On Jan 16, 2013, at 2:24 AM, "Demikhovsky, Elena" <elena.demikhovsky at intel.com> wrote:
>
>> I tried what you ask now in spite of LowerXX() functions usually intended for LowerOperation stage.
>> I still need to remove SIGN_EXTEND joining in getNode().
>>
>>
>>
>> - Elena
>>
>> -----Original Message-----
>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>> Duncan Sands
>> Sent: Wednesday, January 16, 2013 10:14
>> To: Demikhovsky, Elena
>> Cc: llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>> AVX2 - please review
>>
>> Hi Elena,
>>
>> On 15/01/13 14:32, Demikhovsky, Elena wrote:
>>> v8i16 -VPMOVSXWD> v8i32 works only for AVX2. On AVX machine I have only v8i16 -VPMOVSXWD> v4i32.
>>>
>>> The logic of shuffles is implemented in LowerSIGN_EXTEND().
>>
>> can't you call LowerSIGN_EXTEND directly then?
>>
>> Ciao, Duncan.
>>
>>>
>>> - Elena
>>>
>>> -----Original Message-----
>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>>> Duncan Sands
>>> Sent: Tuesday, January 15, 2013 15:24
>>> To: Demikhovsky, Elena
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>> AVX2 - please review
>>>
>>> Hi Elena,
>>>
>>> On 15/01/13 14:09, Demikhovsky, Elena wrote:
>>>>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64
>>>>
>>>> DAGCombiner takes the last v8i32 ->v8i64, combines with v8i16 ->v8i32. Than it sees a new node again. Takes v8i16 -> v8i64 and combines with v8i8 -> v8i16.
>>>> And I have an endless loop that splits and joins SEXT nodes.
>>>> I looked for a "delimiter" node that can be inserted between nodes will disappear at the end, but I did not find such.
>>>
>>> thanks for explaining.  Maybe you can generate
>>>     v8i8 -> v8i16 -VPMOVSXWD> v8i32 ->v8i64 i.e. directly use a target instruction in the middle rather than an sext?
>>>
>>> Ciao, Duncan.
>>>
>>>>
>>>> - Elena
>>>>
>>>> -----Original Message-----
>>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>>>> Duncan Sands
>>>> Sent: Tuesday, January 15, 2013 14:49
>>>> To: Demikhovsky, Elena
>>>> Cc: llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>>> AVX2 - please review
>>>>
>>>> Hi Elena,
>>>>
>>>> On 15/01/13 13:42, Demikhovsky, Elena wrote:
>>>>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>>>
>>>>> I already have the code that does the work in LowerSIGN_EXTEND(). It is a little bit more than just VPMOVSX.
>>>>> Combining load with "sext v8i8 -> v8i16" happens in the common DAGCombine. I don't want to duplicate it too.
>>>>
>>>> OK, I misunderstood the role of the load at first.
>>>>
>>>>>
>>>>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>>>> I don't turn it off, just do it after type legalizer.
>>>>
>>>> I still didn't understand why this is needed.  I'm imagining this:
>>>> the DAG combiner runs, and calls the X86 combiner on v8i8 - > v8i64.
>>>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64.  As the DAG/X86 
>>>> combiner runs until there is no more work to do, the DAG combiner 
>>>> should then combine the load with the v8i8
>>>> -> v8i16 and so on until you only have the target instructions you 
>>>> -> described
>>>> (VPMOVSXBW etc).  So no-one outside of the DAG combiner, in particular the type legalizer, should ever see the intermediate v8i8 -> v8i16 ->v8i32 ->v8i64 form.
>>>> Is this not how it happens?
>>>>
>>>> Ciao, Duncan.
>>>>
>>>>>
>>>>> - Elena
>>>>>
>>>>> -----Original Message-----
>>>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of 
>>>>> Duncan Sands
>>>>> Sent: Tuesday, January 15, 2013 14:26
>>>>> To: Demikhovsky, Elena
>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX 
>>>>> and
>>>>> AVX2 - please review
>>>>>
>>>>> Hi Elena,
>>>>>
>>>>> On 15/01/13 13:21, Demikhovsky, Elena wrote:
>>>>>> I put 2 or 3 sequential sign-extend operations.
>>>>>>
>>>>>> original: v8i8 - > v8i64
>>>>>
>>>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>>>
>>>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>>>>
>>>>> Ciao, Duncan.
>>>>>
>>>>>> new:      v8i8 -> v8i16 ->v8i32 ->v8i64
>>>>>>
>>>>>> I combine load with v8i8 -> v8i16 and use VPMOVSXBW instruction.
>>>>>> I translate v8i16 ->v8i32 to 2 VPMOVSXWD instructions.
>>>>>> And then I have 4 VPMOVSXDQ for  v8i32 ->v8i64.
>>>>>>
>>>>>> if I work with one SIGN_EXTEND operation with illegal types, I receive a scalar code after type legalizer.
>>>>>>
>>>>>> - Elena
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: llvm-commits-bounces at cs.uiuc.edu 
>>>>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan 
>>>>>> Sands
>>>>>> Sent: Tuesday, January 15, 2013 13:58
>>>>>> To: llvm-commits at cs.uiuc.edu
>>>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX 
>>>>>> and
>>>>>> AVX2 - please review
>>>>>>
>>>>>> Hi Elena,
>>>>>>
>>>>>> On 15/01/13 12:47, Demikhovsky, Elena wrote:
>>>>>>> I optimized the following SEXT pairs:
>>>>>>> v8i8 -> v8i64
>>>>>>> v8i8->v8i32
>>>>>>> v4i8 ->v4i64
>>>>>>> v4i16->v4i64
>>>>>>
>>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>>>> ===================================================================
>>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 172422)
>>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
>>>>>>> @@ -4298,12 +4298,13 @@
>>>>>>>        if (isa<ConstantSDNode>(N0))
>>>>>>>          return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), 
>>>>>>> VT, N0);
>>>>>>>
>>>>>>> -  // fold (sext (sext x)) -> (sext x)
>>>>>>> -  // fold (sext (aext x)) -> (sext x)
>>>>>>> -  if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>>>> -    return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>>>> -                       N0.getOperand(0));
>>>>>>> -
>>>>>>> +  if (Level >= AfterLegalizeTypes) {
>>>>>>> +    // fold (sext (sext x)) -> (sext x)
>>>>>>> +    // fold (sext (aext x)) -> (sext x)
>>>>>>> +    if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>>>> +      return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>>>> +                         N0.getOperand(0));  }
>>>>>>
>>>>>> why is this needed?  As the transform doesn't introduce any illegal types that weren't there already it should be safe to perform at any level.
>>>>>>
>>>>>> If you are worried that it might create an illegal operation, then that is what you should check for, not the Level.
>>>>>>
>>>>>>>        if (N0.getOpcode() == ISD::TRUNCATE) {
>>>>>>>          // fold (sext (truncate (load x))) -> (sext (smaller load x))
>>>>>>>          // fold (sext (truncate (srl (load x), c))) -> (sext 
>>>>>>> (smaller load (x+c/n)))
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 172422)
>>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
>>>>>>> @@ -2554,9 +2554,7 @@
>>>>>>>                  VT.getVectorNumElements() ==
>>>>>>>                  Operand.getValueType().getVectorNumElements()) &&
>>>>>>>                 "Vector element count mismatch!");
>>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
>>>>>>> -      return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>>>>>>> -    else if (OpOpcode == ISD::UNDEF)
>>>>>>> +    if (OpOpcode == ISD::UNDEF)
>>>>>>>            // sext(undef) = 0, because the top bits will all be the same.
>>>>>>>            return getConstant(0, VT);
>>>>>>>          break;
>>>>>>
>>>>>> Likewise, why are you getting rid of this folding?
>>>>>>
>>>>>> Ciao, Duncan.
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>> -----------------------------------------------------------------
>>>>>> -
>>>>>> -
>>>>>> -
>>>>>> -
>>>>>> Intel Israel (74) Limited
>>>>>>
>>>>>> This e-mail and any attachments may contain confidential material 
>>>>>> for the sole use of the intended recipient(s). Any review or 
>>>>>> distribution by others is strictly prohibited. If you are not the 
>>>>>> intended recipient, please contact the sender and delete all copies.
>>>>>>
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> -
>>>>> -
>>>>> -
>>>>> Intel Israel (74) Limited
>>>>>
>>>>> This e-mail and any attachments may contain confidential material 
>>>>> for the sole use of the intended recipient(s). Any review or 
>>>>> distribution by others is strictly prohibited. If you are not the 
>>>>> intended recipient, please contact the sender and delete all copies.
>>>>>
>>>>
>>>> -------------------------------------------------------------------
>>>> -
>>>> -
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material 
>>>> for the sole use of the intended recipient(s). Any review or 
>>>> distribution by others is strictly prohibited. If you are not the 
>>>> intended recipient, please contact the sender and delete all copies.
>>>>
>>>
>>> --------------------------------------------------------------------
>>> -
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material 
>>> for the sole use of the intended recipient(s). Any review or 
>>> distribution by others is strictly prohibited. If you are not the 
>>> intended recipient, please contact the sender and delete all copies.
>>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
>> <sext_3.diff>_______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for 
> the sole use of the intended recipient(s). Any review or distribution 
> by others is strictly prohibited. If you are not the intended 
> recipient, please contact the sender and delete all copies.
>

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.