[llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review

Thu Jan 17 00:25:24 PST 2013

Hi Elena, this is OK.

Ciao, Duncan.

On 17/01/13 09:12, Demikhovsky, Elena wrote:
> Hi Duncan,
>
> Please look if the following comment gives the proper explanation.
>
>    // Folding (sext (sext x)) is obvious, but we do it only after the type
>    // legalization phase. When the sequence is like {(T1->T2), (T2->T3)} and
>    // T1 or T3 (or the both) are illegal types, the TypeLegalizer may not
>    // give a good sequence for the (T1->T3) pair.
>    // So we give a chance to target specific combiner to optimize T1->T2 and T2->T3
>    // separately and may be fold them in a preceding of subsequent instruction.
>
> Thanks,
> - Elena
>
> -----Original Message-----
> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of Duncan Sands
> Sent: Thursday, January 17, 2013 09:26
> To: Demikhovsky, Elena
> Cc: Nadav Rotem; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and AVX2 - please review
>
> Hi Elena,
>
> On 17/01/13 08:03, Demikhovsky, Elena wrote:
>> I optimize SEXT v8i8 -> v8i64, v8i8->v8i32, v4i8 -> v4i64, v4i16->v4i64 for AVX and AVX2.
>>
>> I don't like the second patch. I prefer the first one.
>
> the first patch should at least get a big fat comment explaining why you are only doing the ext combine after type legalization.
>
> Ciao, Duncan.
>
>>
>> Bug 14865.
>>
>>
>> - Elena
>>
>> -----Original Message-----
>> From: Nadav Rotem [mailto:nrotem at apple.com]
>> Sent: Wednesday, January 16, 2013 23:01
>> To: Demikhovsky, Elena
>> Cc: Duncan Sands; llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>> AVX2 - please review
>>
>> Hi Elena and Duncan,
>>
>> I understand that the type-legalizer scalarizes everything when it runs into illegal types, and if we want to support SEXT of illegal types (such as 4 x i8) then we need to have a pre-type-legalizer dag-combine optimization.  Okay.  But I prefer that we don't call Lower_XXX from within the DAGCombiner.  I know that (thanks to the work of Elena) most of our SEXT patterns are optimized.  I prefer not to optimize uncommon sequences such as <8 x i8> to <8 x i64> if it means that we add more complexity. Keep in mind that DAG-combine optimizations are NOT automatically detected by the cost model (which uses TLI), and every dag-combine optimization needs to be recorded in the cost-model tables.  Elena, which patterns are improved by this patch ?
>>
>> Thanks,
>> Nadav
>>
>>
>>
>> On Jan 16, 2013, at 2:24 AM, "Demikhovsky, Elena" <elena.demikhovsky at intel.com> wrote:
>>
>>> I tried what you ask now in spite of LowerXX() functions usually intended for LowerOperation stage.
>>> I still need to remove SIGN_EXTEND joining in getNode().
>>>
>>>
>>>
>>> - Elena
>>>
>>> -----Original Message-----
>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of
>>> Duncan Sands
>>> Sent: Wednesday, January 16, 2013 10:14
>>> To: Demikhovsky, Elena
>>> Cc: llvm-commits at cs.uiuc.edu
>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>> AVX2 - please review
>>>
>>> Hi Elena,
>>>
>>> On 15/01/13 14:32, Demikhovsky, Elena wrote:
>>>> v8i16 -VPMOVSXWD> v8i32 works only for AVX2. On AVX machine I have only v8i16 -VPMOVSXWD> v4i32.
>>>>
>>>> The logic of shuffles is implemented in LowerSIGN_EXTEND().
>>>
>>> can't you call LowerSIGN_EXTEND directly then?
>>>
>>> Ciao, Duncan.
>>>
>>>>
>>>> - Elena
>>>>
>>>> -----Original Message-----
>>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of
>>>> Duncan Sands
>>>> Sent: Tuesday, January 15, 2013 15:24
>>>> To: Demikhovsky, Elena
>>>> Cc: llvm-commits at cs.uiuc.edu
>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>>> AVX2 - please review
>>>>
>>>> Hi Elena,
>>>>
>>>> On 15/01/13 14:09, Demikhovsky, Elena wrote:
>>>>>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64
>>>>>
>>>>> DAGCombiner takes the last v8i32 ->v8i64, combines with v8i16 ->v8i32. Than it sees a new node again. Takes v8i16 -> v8i64 and combines with v8i8 -> v8i16.
>>>>> And I have an endless loop that splits and joins SEXT nodes.
>>>>> I looked for a "delimiter" node that can be inserted between nodes will disappear at the end, but I did not find such.
>>>>
>>>> thanks for explaining.  Maybe you can generate
>>>>      v8i8 -> v8i16 -VPMOVSXWD> v8i32 ->v8i64 i.e. directly use a target instruction in the middle rather than an sext?
>>>>
>>>> Ciao, Duncan.
>>>>
>>>>>
>>>>> - Elena
>>>>>
>>>>> -----Original Message-----
>>>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of
>>>>> Duncan Sands
>>>>> Sent: Tuesday, January 15, 2013 14:49
>>>>> To: Demikhovsky, Elena
>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX and
>>>>> AVX2 - please review
>>>>>
>>>>> Hi Elena,
>>>>>
>>>>> On 15/01/13 13:42, Demikhovsky, Elena wrote:
>>>>>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>>>>
>>>>>> I already have the code that does the work in LowerSIGN_EXTEND(). It is a little bit more than just VPMOVSX.
>>>>>> Combining load with "sext v8i8 -> v8i16" happens in the common DAGCombine. I don't want to duplicate it too.
>>>>>
>>>>> OK, I misunderstood the role of the load at first.
>>>>>
>>>>>>
>>>>>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>>>>> I don't turn it off, just do it after type legalizer.
>>>>>
>>>>> I still didn't understand why this is needed.  I'm imagining this:
>>>>> the DAG combiner runs, and calls the X86 combiner on v8i8 - > v8i64.
>>>>> X86 turns it into v8i8 -> v8i16 ->v8i32 ->v8i64.  As the DAG/X86
>>>>> combiner runs until there is no more work to do, the DAG combiner
>>>>> should then combine the load with the v8i8
>>>>> -> v8i16 and so on until you only have the target instructions you
>>>>> -> described
>>>>> (VPMOVSXBW etc).  So no-one outside of the DAG combiner, in particular the type legalizer, should ever see the intermediate v8i8 -> v8i16 ->v8i32 ->v8i64 form.
>>>>> Is this not how it happens?
>>>>>
>>>>> Ciao, Duncan.
>>>>>
>>>>>>
>>>>>> - Elena
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Duncan Sands [mailto:duncan.sands at gmail.com] On Behalf Of
>>>>>> Duncan Sands
>>>>>> Sent: Tuesday, January 15, 2013 14:26
>>>>>> To: Demikhovsky, Elena
>>>>>> Cc: llvm-commits at cs.uiuc.edu
>>>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX
>>>>>> and
>>>>>> AVX2 - please review
>>>>>>
>>>>>> Hi Elena,
>>>>>>
>>>>>> On 15/01/13 13:21, Demikhovsky, Elena wrote:
>>>>>>> I put 2 or 3 sequential sign-extend operations.
>>>>>>>
>>>>>>> original: v8i8 - > v8i64
>>>>>>
>>>>>> why don't you have the X86 target turn this immediately into a VPMOVSXBW, VPMOVSXWD, VPMOVSXDQ combination, rather than going through all of these intermediate steps?
>>>>>>
>>>>>> I don't like turning off generally useful transforms like sext(sext)->sext.
>>>>>>
>>>>>> Ciao, Duncan.
>>>>>>
>>>>>>> new:      v8i8 -> v8i16 ->v8i32 ->v8i64
>>>>>>>
>>>>>>> I combine load with v8i8 -> v8i16 and use VPMOVSXBW instruction.
>>>>>>> I translate v8i16 ->v8i32 to 2 VPMOVSXWD instructions.
>>>>>>> And then I have 4 VPMOVSXDQ for  v8i32 ->v8i64.
>>>>>>>
>>>>>>> if I work with one SIGN_EXTEND operation with illegal types, I receive a scalar code after type legalizer.
>>>>>>>
>>>>>>> - Elena
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: llvm-commits-bounces at cs.uiuc.edu
>>>>>>> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Duncan
>>>>>>> Sands
>>>>>>> Sent: Tuesday, January 15, 2013 13:58
>>>>>>> To: llvm-commits at cs.uiuc.edu
>>>>>>> Subject: Re: [llvm-commits] Optimization for SIGN_EXTEND on AVX
>>>>>>> and
>>>>>>> AVX2 - please review
>>>>>>>
>>>>>>> Hi Elena,
>>>>>>>
>>>>>>> On 15/01/13 12:47, Demikhovsky, Elena wrote:
>>>>>>>> I optimized the following SEXT pairs:
>>>>>>>> v8i8 -> v8i64
>>>>>>>> v8i8->v8i32
>>>>>>>> v4i8 ->v4i64
>>>>>>>> v4i16->v4i64
>>>>>>>
>>>>>>>> Index: lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>>>>>>>> ===================================================================
>>>>>>>> --- lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(revision 172422)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/DAGCombiner.cpp	(working copy)
>>>>>>>> @@ -4298,12 +4298,13 @@
>>>>>>>>         if (isa<ConstantSDNode>(N0))
>>>>>>>>           return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(),
>>>>>>>> VT, N0);
>>>>>>>>
>>>>>>>> -  // fold (sext (sext x)) -> (sext x)
>>>>>>>> -  // fold (sext (aext x)) -> (sext x)
>>>>>>>> -  if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>>>>> -    return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>>>>> -                       N0.getOperand(0));
>>>>>>>> -
>>>>>>>> +  if (Level >= AfterLegalizeTypes) {
>>>>>>>> +    // fold (sext (sext x)) -> (sext x)
>>>>>>>> +    // fold (sext (aext x)) -> (sext x)
>>>>>>>> +    if (N0.getOpcode() == ISD::SIGN_EXTEND || N0.getOpcode() == ISD::ANY_EXTEND)
>>>>>>>> +      return DAG.getNode(ISD::SIGN_EXTEND, N->getDebugLoc(), VT,
>>>>>>>> +                         N0.getOperand(0));  }
>>>>>>>
>>>>>>> why is this needed?  As the transform doesn't introduce any illegal types that weren't there already it should be safe to perform at any level.
>>>>>>>
>>>>>>> If you are worried that it might create an illegal operation, then that is what you should check for, not the Level.
>>>>>>>
>>>>>>>>         if (N0.getOpcode() == ISD::TRUNCATE) {
>>>>>>>>           // fold (sext (truncate (load x))) -> (sext (smaller load x))
>>>>>>>>           // fold (sext (truncate (srl (load x), c))) -> (sext
>>>>>>>> (smaller load (x+c/n)))
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>> --- lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(revision 172422)
>>>>>>>> +++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp	(working copy)
>>>>>>>> @@ -2554,9 +2554,7 @@
>>>>>>>>                   VT.getVectorNumElements() ==
>>>>>>>>                   Operand.getValueType().getVectorNumElements()) &&
>>>>>>>>                  "Vector element count mismatch!");
>>>>>>>> -    if (OpOpcode == ISD::SIGN_EXTEND || OpOpcode == ISD::ZERO_EXTEND)
>>>>>>>> -      return getNode(OpOpcode, DL, VT, Operand.getNode()->getOperand(0));
>>>>>>>> -    else if (OpOpcode == ISD::UNDEF)
>>>>>>>> +    if (OpOpcode == ISD::UNDEF)
>>>>>>>>             // sext(undef) = 0, because the top bits will all be the same.
>>>>>>>>             return getConstant(0, VT);
>>>>>>>>           break;
>>>>>>>
>>>>>>> Likewise, why are you getting rid of this folding?
>>>>>>>
>>>>>>> Ciao, Duncan.
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>> -----------------------------------------------------------------
>>>>>>> -
>>>>>>> -
>>>>>>> -
>>>>>>> -
>>>>>>> Intel Israel (74) Limited
>>>>>>>
>>>>>>> This e-mail and any attachments may contain confidential material
>>>>>>> for the sole use of the intended recipient(s). Any review or
>>>>>>> distribution by others is strictly prohibited. If you are not the
>>>>>>> intended recipient, please contact the sender and delete all copies.
>>>>>>>
>>>>>>
>>>>>> ------------------------------------------------------------------
>>>>>> -
>>>>>> -
>>>>>> -
>>>>>> Intel Israel (74) Limited
>>>>>>
>>>>>> This e-mail and any attachments may contain confidential material
>>>>>> for the sole use of the intended recipient(s). Any review or
>>>>>> distribution by others is strictly prohibited. If you are not the
>>>>>> intended recipient, please contact the sender and delete all copies.
>>>>>>
>>>>>
>>>>> -------------------------------------------------------------------
>>>>> -
>>>>> -
>>>>> Intel Israel (74) Limited
>>>>>
>>>>> This e-mail and any attachments may contain confidential material
>>>>> for the sole use of the intended recipient(s). Any review or
>>>>> distribution by others is strictly prohibited. If you are not the
>>>>> intended recipient, please contact the sender and delete all copies.
>>>>>
>>>>
>>>> --------------------------------------------------------------------
>>>> -
>>>> Intel Israel (74) Limited
>>>>
>>>> This e-mail and any attachments may contain confidential material
>>>> for the sole use of the intended recipient(s). Any review or
>>>> distribution by others is strictly prohibited. If you are not the
>>>> intended recipient, please contact the sender and delete all copies.
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material for
>>> the sole use of the intended recipient(s). Any review or distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.
>>> <sext_3.diff>_______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>