[PATCH][X86] Teach the backend how to simplify/canonicalize dag nodes introduced during type legalization.

Thu May 29 12:55:14 PDT 2014

Hi Andrea, 

Thanks for working on this. 

+  // Type legalization might introduce new shuffles in the DAG.
+  // Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))
+  //   -> (shuffle (VBinOp (A, B)), Undef, Mask).

There are other places in DAGCombine where we sink or hoist shuffles across binary operations. I think that SimplifyBinOpWithSameOpcodeHands does something similar. 
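A toy C++ model of the fold quoted above, (VBinOp (shuffle A, Mask), (shuffle B, Mask)) -> (shuffle (VBinOp A, B), Mask). Because both shuffles use the same mask, output lane i reads lane Mask[i] of A and lane Mask[i] of B. A lane-wise operation therefore gives the same result whether it runs before or after the shuffle. The types and helpers (V4, Mask4, shuffle, add, foldPreservesValue) are hypothetical and not part of the patch. Undef mask lanes are modeled as -1 and arbitrarily filled with 0.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane vector and single-input shuffle mask (second operand is Undef).
using V4 = std::array<std::uint32_t, 4>;
using Mask4 = std::array<int, 4>;

// shuffle(V, Undef, Mask): lane i takes V[Mask[i]]; -1 marks an undef lane (modeled as 0).
V4 shuffle(const V4 &v, const Mask4 &mask) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = mask[i] < 0 ? 0u : v[static_cast<std::size_t>(mask[i])];
  return r;
}

// A lane-wise binary operation: wrapping 32-bit add, one lane at a time.
V4 add(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = static_cast<std::uint32_t>(a[i] + b[i]);
  return r;
}

// Compares (add (shuffle A, M), (shuffle B, M)) against (shuffle (add A, B), M).
bool foldPreservesValue(const V4 &a, const V4 &b, const Mask4 &m) {
  return add(shuffle(a, m), shuffle(b, m)) == shuffle(add(a, b), m);
}
```

The precondition that matters is that both operands use the same mask. With different masks, lane i of the result would combine unrelated lanes of A and B.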

+  // During Type Legalization, when promoting illegal vector types,
+  // the backend might introduce new shuffle dag nodes and bitcasts.
+  //
+  // This code performs the following transformation:
+  // fold: (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
+  //       (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
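A small C++ check of why the bitcast-sinking fold needs some care. It assumes little-endian lane order, as on x86, so element 2*i of the <4 x i32> view is the low half of 64-bit lane i. For add, (bitcast (add A, B)) and (add (bitcast A), (bitcast B)) agree on the low halves. They can differ on the high halves, because the 64-bit add carries out of the low half. The fold is therefore only value-preserving for add when the mask reads low halves only, like the <0,2,u,u> mask in the example, or for carry-free operations such as and/or/xor. Whether the patch guards on exactly this condition is not visible in the quoted hunk. The helpers lo32, hi32, lowHalfAgrees and highHalfAgrees are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian view of one 64-bit lane as two 32-bit elements.
std::uint32_t lo32(std::uint64_t x) { return static_cast<std::uint32_t>(x); }
std::uint32_t hi32(std::uint64_t x) { return static_cast<std::uint32_t>(x >> 32); }

// Even element: low half of the 64-bit sum vs 32-bit sum of the low halves.
bool lowHalfAgrees(std::uint64_t a, std::uint64_t b) {
  return lo32(a + b) == static_cast<std::uint32_t>(lo32(a) + lo32(b));
}

// Odd element: the 64-bit add may carry into the high half; the 32-bit add never does.
bool highHalfAgrees(std::uint64_t a, std::uint64_t b) {
  return hi32(a + b) == static_cast<std::uint32_t>(hi32(a) + hi32(b));
}
```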

I think that your approach is reasonable, but there may be other solutions. Have you considered handling bitcast binary operations before type legalization?

Clang is bitcasting <2 x float> to a double to implement the calling convention. If you are working in a JIT environment and you don’t care about calling external functions, then you can disable the bitcasting at the Clang level.

Thanks,
Nadav

On May 29, 2014, at 11:51 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com> wrote:

> Hi,
> 
> This patch teaches the backend how to simplify/canonicalize DAG node
> sequences normally introduced by the backend when promoting DAG nodes
> with illegal vector types.
> 
> Example:
> 
> define double @foo(double %A, double %B) {
>  %1 = bitcast double %A to <2 x i32>
>  %2 = bitcast double %B to <2 x i32>
>  %add = add <2 x i32> %1, %2
>  %3 = bitcast <2 x i32> %add to double
>  ret double %3
> }
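For reference, a scalar C++ model of what @foo computes. It assumes a little-endian target such as x86. Each double is reinterpreted as two 32-bit lanes, each lane is added with 32-bit wraparound, and the result is reinterpreted as a double. std::memcpy is used because it is the well-defined way to express a bitcast in C++. The helpers toBits and fromBits are hypothetical conveniences for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: raw bit pattern <-> double.
std::uint64_t toBits(double d) { std::uint64_t u; std::memcpy(&u, &d, sizeof u); return u; }
double fromBits(std::uint64_t u) { double d; std::memcpy(&d, &u, sizeof d); return d; }

// Model of @foo: no carry crosses from lane 0 into lane 1, unlike a 64-bit add.
double foo(double A, double B) {
  std::uint32_t a[2], b[2], r[2];
  std::memcpy(a, &A, sizeof a);  // bitcast double %A to <2 x i32>
  std::memcpy(b, &B, sizeof b);  // bitcast double %B to <2 x i32>
  for (int i = 0; i < 2; ++i)
    r[i] = a[i] + b[i];          // add <2 x i32>, wrapping per lane
  double result;
  std::memcpy(&result, r, sizeof result);  // bitcast <2 x i32> to double
  return result;
}
```

For example, lanes {0xFFFFFFFF, 1} + {1, 1} produce {0, 2}. A plain 64-bit add of the same bit patterns would instead carry into the upper lane.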
> 
> All the bitcasts in function @foo are promoted to type MVT::v2i64,
> since type MVT::v2i32 is not a legal type. For the same reason, the
> integer result of the vector add node is also promoted.
> 
> Type promotion might introduce new build_vector nodes (which are then
> combined into shuffles) and bitcast operations. This is what happens,
> for example, with function 'foo', which is compiled to the following
> assembly sequence (using -mcpu=corei7):
>  pmovzxdq  %xmm0, %xmm0  # promotion from <2 x i32> to <2 x i64>
>  pmovzxdq  %xmm1, %xmm1  # promotion from <2 x i32> to <2 x i64>
>  paddq %xmm0, %xmm1         # promoted to a legal add of type <2 x i64>
>  pshufd $8, %xmm1, %xmm0  # xmm0 = xmm1[0,2,u,u]
> 
> Ideally, the backend should be able to understand that the code
> sequence above can be simplified into a single instruction:
>  paddd %xmm0, %xmm1
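A C++ emulation of the promoted four-instruction sequence on the two live 32-bit lanes, compared against a lane-wise 32-bit add like paddd. It assumes little-endian lane order. Under that assumption, pshufd $8 picks 32-bit elements 0 and 2, which are the low halves of the two 64-bit sums. Those low halves are unaffected by the zero-extension and by any carry into the high half. All type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2x32 = std::array<std::uint32_t, 2>;
using V2x64 = std::array<std::uint64_t, 2>;

// pmovzxdq: zero-extend two 32-bit lanes to 64 bits.
V2x64 zext(const V2x32 &v) { return {v[0], v[1]}; }

// paddq: 64-bit lane-wise add.
V2x64 addq(const V2x64 &a, const V2x64 &b) { return {a[0] + b[0], a[1] + b[1]}; }

// pshufd $8 ([0,2,u,u]): keep the low 32 bits of each 64-bit lane; upper lanes are don't-care.
V2x32 pshufd8(const V2x64 &v) {
  return {static_cast<std::uint32_t>(v[0]), static_cast<std::uint32_t>(v[1])};
}

// The sequence currently emitted for @foo.
V2x32 promotedAdd(const V2x32 &a, const V2x32 &b) { return pshufd8(addq(zext(a), zext(b))); }

// What a single paddd computes on the two live lanes.
V2x32 paddd(const V2x32 &a, const V2x32 &b) {
  return {static_cast<std::uint32_t>(a[0] + b[0]), static_cast<std::uint32_t>(a[1] + b[1])};
}
```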
> 
> This patch adds two new combine rules:
> 1)
>  fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) ->
>       (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>)
> 
> 2)
>  fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) ->
>       (shuffle (BINOP A, B), Undef, <Mask>).
> 
> The goal is to simplify the dag node sequence when dealing with 64-bit
> vector types.
> 
> Both rules are only triggered on the type-legalized DAG.
> In particular, rule 1 is a target-specific combine rule that attempts
> to sink a bitconvert into the operands of a binary operation.
> Rule 2 is a target-independent rule that attempts to move a shuffle
> immediately after a binary operation.
> 
> With this patch, all the functions from test 'combine-64bit-vbinop.ll'
> (a new test) are now significantly simplified.
> In the case of function @foo (from the example), the backend now
> correctly generates a single 'paddd' instruction.
> 
> Please let me know if ok to submit.
> 
> Thanks,
> Andrea Di Biagio
> <patch-dagcombine.diff>