[llvm-commits] [PATCH] AVX vmovaps + vxorps + vinsertf128 DAG combine to vmovaps
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Fri Dec 30 13:07:21 PST 2011
Hi Chad,
On Thu, Dec 22, 2011 at 12:12 AM, Chad Rosier <mcrosier at apple.com> wrote:
> This patch is for an AVX-specific DAG combine optimization.
>
> The following code:
>
> #include <immintrin.h>
>
> __m256 foo(float *f) {
>   return _mm256_castps128_ps256(_mm_load_ps(f));
> }
>
> generates this assembly:
>
> vmovaps (%rdi), %xmm0
> vxorps %ymm1, %ymm1, %ymm1
> vinsertf128 $0, %xmm0, %ymm1, %ymm0
>
> On AVX-enabled processors, a 128-bit vmovaps zeroes the upper bits (255:128) of the corresponding YMM register. The vxorps and vinsertf128 instructions are therefore unnecessary.
>
> This patch implements a DAG combine that removes the unnecessary vxorps and vinsertf128 instructions. Currently it only works as an enhancement to one of Bruno's DAG combines (r135727), but I plan to make it more general in the future.
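
If I'm reading the patch right, the idea is roughly the following. This is a
loose paraphrase, not the actual hunks; SubVec here is just a placeholder for
the 128-bit operand being inserted into the low half of the zero vector:

  // The value being inserted is a plain 128-bit load, and a 128-bit
  // vmovaps already clears bits 255:128, so a single zero-extending
  // vector load replaces the whole xor + insertf128 sequence.
  if (ISD::isNormalLoad(SubVec.getNode()) && SubVec.hasOneUse()) {
    LoadSDNode *Ld = cast<LoadSDNode>(SubVec);
    SDVTList Tys = DAG.getVTList(MVT::v4i64, MVT::Other);
    SDValue Ops[] = { Ld->getChain(), Ld->getBasePtr() };
    SDValue ResNode = DAG.getMemIntrinsicNode(X86ISD::VZEXT_LOAD128, dl, Tys,
                                              Ops, 2, Ld->getMemoryVT(),
                                              Ld->getMemOperand());
    return DAG.getNode(ISD::BITCAST, dl, VT, ResNode);
  }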
LGTM, just a few comments:
+ return DAG.getNode(ISD::BITCAST, dl, VT, ResNode);
Since you're returning early here,
+ } else {
+ // Emit a zeroed vector and insert the desired subvector on its
+ // first half.
+ SDValue Zeros = getZeroVector(VT, true /* HasXMMInt */, DAG, dl);
+ SDValue InsV = Insert128BitVector(Zeros, V1.getOperand(0),
+ DAG.getConstant(0, MVT::i32), DAG, dl);
+ return DCI.CombineTo(N, InsV);
+ }
you don't need the "else", per the LLVM coding style.
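In other words, since the if-path returns, the tail can simply read:

  // Emit a zeroed vector and insert the desired subvector on its
  // first half.
  SDValue Zeros = getZeroVector(VT, true /* HasXMMInt */, DAG, dl);
  SDValue InsV = Insert128BitVector(Zeros, V1.getOperand(0),
                                    DAG.getConstant(0, MVT::i32), DAG, dl);
  return DCI.CombineTo(N, InsV);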
+def X86vzload128 : SDNode<"X86ISD::VZEXT_LOAD128", SDTLoad,
+ [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
+ // VZEXT_LOAD128 - Load vector and zero extend.
+ VZEXT_LOAD128,
+
Why not reuse the existing X86vzload and VZEXT_LOAD instead of adding a new
node? You can use them and still match this correctly by using v4i64 in the pattern.
--
Bruno Cardoso Lopes
http://www.brunocardoso.cc