[PATCH][X86] Teach the backend how to simplify/canonicalize dag nodes introduced during type legalization.

Fri May 30 16:26:38 PDT 2014

Thanks Nadav,
Committed revision 209930.

On Fri, May 30, 2014 at 10:08 PM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On May 30, 2014, at 1:58 PM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>
>> Hi Nadav,
>> thanks for the useful feedback.
>>
>> On Thu, May 29, 2014 at 8:55 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>> Hi Andrea,
>>>
>>> Thanks for working on this.
>>>
>>> +  // Type legalization might introduce new shuffles in the DAG.
>>> +  // Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))
>>> +  //   -> (shuffle (VBinOp (A, B)), Undef, Mask).
>>>
>>> There are other places in DAGCombine where we sink or hoist shuffles across binary operations. I think that SimplifyBinOpWithSameOpcodeHands does something similar.
>>>
>>
>> True, we do a similar transformation for AND/OR/XOR dag nodes.
>> However, it seems to me that method '
>> SimplifyBinOpWithSameOpcodeHands' is only safe to be called on
>> AND/OR/XOR dag nodes. For example, some of the rules work under the
>> assumption that AND/OR/XOR dag nodes are indifferent to swizzle
>> operation.
>> The rule I tried to add with this patch would be more generic (for
>> example, it can be used to simplify an ADD or a SUB). That's why I
>> thought it was reasonable to have it in 'SimplifyVBinOp’.
>
> Okay.
>
>>
>>> +  // During Type Legalization, when promoting illegal vector types,
>>> +  // the backend might introduce new shuffle dag nodes and bitcasts.
>>> +  //
>>> +  // This code performs the following transformation:
>>> +  // fold: (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
>>> +  //       (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
>>>
>>> I think that your approach is reasonable but there may be other solutions. Have you considered handling bit-casted binary operations before type legalization?
>>>
>>> Clang is bit-casting <2 x float> to a double to implement the calling convention. If you are working in a Jitted environment and you don’t care about calling external functions then you can disable the bitcasting at the clang level.
>>
>> Right, if I disable the bitcasting we don't get any shuffles.
>> When adding two <2 x i32> values, both CopyFromReg and CopyToReg use
>> v2i64 register types.
>> basically, rather than producing the sequence
>>  (v2i32: bitcast (f64,ch: CopyFromReg (Chain), vreg))
>> I get
>>  (v2i32: truncate (v2i64,ch: CopyFromReg (Chain), vreg)).
>>
>> I can try to see if it makes sense to have a target specific combine
>> that attempts to convert the sequence bitcast+copyfromreg to a
>> truncate+copyfromreg. I am not sure if that would be safe to do in
>> general; (I think) it should be safe in the case of a bitcast from f64
>> -> v2i32; I am not sure it would be ok in other cases though. Is this
>> what you had in mind when you said that there might be other
>> solutions?
>
> The solutions that I had in mind for the testcase that you sent were pre-type-legalization optimization or post-type-legalization optimization. I think that your approach makes sense and I don’t have any other comments.
>
> Thanks,
> Nadav
>
>>
>> Cheers,
>> Andrea
>>
>>>
>>> Thanks,
>>> Nadav
>>>
>>>
>>> On May 29, 2014, at 11:51 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> This patch teaches the backend how to simplify/canonicalize dag node
>>>> sequences normally introduced by the backend when promoting dag nodes
>>>> with an illegal vector types.
>>>>
>>>> Example:
>>>>
>>>> define double @foo(double %A, double %B) {
>>>> %1 = bitcast double %A to <2 x i32>
>>>> %2 = bitcast double %B to <2 x i32>
>>>> %add = add <2 x i32> %1, %2
>>>> %3 = bitcast <2 x i32> %add to double
>>>> ret double %3
>>>> }
>>>>
>>>> All the bitcasts in function @foo are promoted to type MVT::v2i64,
>>>> since type MVT::v2i32 is not a legal type. For the same reason, the
>>>> integer result of the vector add node also promoted.
>>>>
>>>> Type promotion might introduce new build_vector nodes (which are then
>>>> combined into shuffles) and bitcast operations. This is what happens
>>>> for example with function 'foo' that is compiled to the following
>>>> assembly sequence (using -mcpu=corei7):
>>>> pmovzxdq  %xmm0, %xmm0  # promotion from <2 x i32> to <2 x i64>
>>>> pmovzxdq  %xmm1, %xmm1  # promotion from <2 x i32> to <2 x i64>
>>>> paddq %xmm0, %xmm1         # promoted to a legal add of type <2 x i64>
>>>> pshufd $8, %xmm1, %xmm0  # xmm0 = xmm1[0,2,u,u]
>>>>
>>>> Ideally, the backend should be able to understand that the code
>>>> sequence above can be simplified into a single instruction:
>>>> paddd %xmm0, %xmm1
>>>>
>>>> This patch adds two new combine rules:
>>>> 1)
>>>> fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
>>>>      (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
>>>>
>>>> 2)
>>>> fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
>>>>      (shuffle (BINOP A, B), Undef, <Mask>).
>>>>
>>>> The goal is to simplify the dag node sequence when dealing with 64-bit
>>>> vector types.
>>>>
>>>> Both rules are only triggered on the type-legalized DAG.
>>>> In particular, rule 1. is a target specific combine rule that attempts
>>>> to sink a bitconvert into the operands of a binary operation.
>>>> Rule 2. is a target independet rule that attempts to move a shuffle
>>>> immediately after a binary operation.
>>>>
>>>> With this patch, all the functions from test 'combine-64bit-vbinop.ll'
>>>> (a new test) are now strongly simplified.
>>>> In the case of function @foo (from the example), the backend now
>>>> correctly generates a single 'paddd' instruction.
>>>>
>>>> Please let me know if ok to submit.
>>>>
>>>> Thanks,
>>>> Andrea Di Biagio
>>>> <patch-dagcombine.diff>
>>>
>