[PATCH][AVX] Lower v4i64->v4i32 ISD::TRUNCATE for minimal shuffles

Wed Mar 5 07:44:01 PST 2014

Hey guys,

For AVX, v4i64->v4i32 truncates are currently lowered into two
vpshufd shuffles plus a vmovlhps:

> vpshufd $8, %xmm1, %xmm1        ## xmm1 = xmm1[0,2,0,0]
> vpshufd $8, %xmm0, %xmm0        ## xmm0 = xmm0[0,2,0,0]
> vmovlhps %xmm1, %xmm0, %xmm0    ## xmm0 = xmm0[0],xmm1[0]
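
In case the immediates are not obvious: the $8 on vpshufd packs four
2-bit lane selectors, lowest lane first, which is where the [0,2,0,0]
in the asm comment comes from. A minimal sketch of that decoding
follows. The helper name decodeShuffleImm is made up for illustration
and is not an LLVM function.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not part of LLVM): decode an 8-bit shuffle
// immediate into four 2-bit lane selectors, lowest lane first.
// This is the imm8 layout used by PSHUFD and SHUFPS.
std::array<int, 4> decodeShuffleImm(int imm) {
  // The assembler may print the byte as a negative number; reduce it
  // to its unsigned 8-bit value before extracting the fields.
  std::uint8_t bits = static_cast<std::uint8_t>(imm);
  std::array<int, 4> sel{};
  for (int lane = 0; lane < 4; ++lane)
    sel[lane] = (bits >> (2 * lane)) & 3;
  return sel;
}
```

Calling decodeShuffleImm(8) gives the selectors {0, 2, 0, 0}, matching
the comment the assembler printer emits above.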

The same result can be produced with a single vshufps:

> vshufps $-120, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,2],xmm1[0,2]
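
To sanity-check that the two sequences agree, here is a small
standalone model of the shuffle masks from the patch ({0,2,0,0} twice
followed by {0,1,4,5}, versus a single {0,2,4,6}). It is only a
sketch: the names V4, shuffle, truncOld and truncNew are made up for
this mail, and the model treats each 128-bit half as four i32 lanes,
which is what the bitcast to v4i32 gives on little-endian x86.

```cpp
#include <array>
#include <cstdint>

using V4 = std::array<std::uint32_t, 4>;

// Two-input shuffle using the vector-shuffle mask convention:
// indices 0-3 select from a, indices 4-7 select from b.
V4 shuffle(const V4 &a, const V4 &b, const std::array<int, 4> &mask) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
  return r;
}

// Old lowering: {0,2,0,0} on each half, then {0,1,4,5} to merge.
// The second operand of the first two shuffles is undef in the real
// code; it is never selected, so passing the same vector is harmless.
V4 truncOld(const V4 &lo, const V4 &hi) {
  const std::array<int, 4> pshufdMask = {0, 2, 0, 0};
  const std::array<int, 4> movlhMask = {0, 1, 4, 5};
  return shuffle(shuffle(lo, lo, pshufdMask),
                 shuffle(hi, hi, pshufdMask), movlhMask);
}

// New lowering: one {0,2,4,6} shuffle over both halves.
V4 truncNew(const V4 &lo, const V4 &hi) {
  const std::array<int, 4> shufpsMask = {0, 2, 4, 6};
  return shuffle(lo, hi, shufpsMask);
}
```

For any lo and hi, truncOld(lo, hi) and truncNew(lo, hi) both return
{lo[0], lo[2], hi[0], hi[2]}, i.e. the low 32 bits of each of the four
i64 elements.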

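Please note that this does change the execution domain of the shuffle
(vpshufd is an integer-domain instruction, vshufps a floating-point
one), but as far as I can tell this should be okay. My understanding,
from looking at Agner Fog's instruction tables, is that a vshufps
costs about the same as a vpshufd, so the shuffles are a wash, and
dropping the vmovlhps is a win.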

Any insight into whether this change is a good idea?

TIA,
Cameron
-------------- next part --------------
Index: test/CodeGen/X86/avx-trunc.ll
===================================================================
--- test/CodeGen/X86/avx-trunc.ll	(revision 202981)
+++ test/CodeGen/X86/avx-trunc.ll	(working copy)
@@ -2,7 +2,7 @@
 
 define <4 x i32> @trunc_64_32(<4 x i64> %A) nounwind uwtable readnone ssp{
 ; CHECK: trunc_64_32
-; CHECK: pshufd
+; CHECK: shufps
   %B = trunc <4 x i64> %A to <4 x i32>
   ret <4 x i32>%B
 }
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp	(revision 202981)
+++ lib/Target/X86/X86ISelLowering.cpp	(working copy)
@@ -9134,24 +9134,14 @@
                          DAG.getIntPtrConstant(0));
     }
 
-    // On AVX, v4i64 -> v4i32 becomes a sequence that uses PSHUFD and MOVLHPS.
     SDValue OpLo = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i64, In,
                                DAG.getIntPtrConstant(0));
     SDValue OpHi = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v2i64, In,
                                DAG.getIntPtrConstant(2));
-
     OpLo = DAG.getNode(ISD::BITCAST, DL, MVT::v4i32, OpLo);
     OpHi = DAG.getNode(ISD::BITCAST, DL, MVT::v4i32, OpHi);
-
-    // The PSHUFD mask:
-    static const int ShufMask1[] = {0, 2, 0, 0};
-    SDValue Undef = DAG.getUNDEF(VT);
-    OpLo = DAG.getVectorShuffle(VT, DL, OpLo, Undef, ShufMask1);
-    OpHi = DAG.getVectorShuffle(VT, DL, OpHi, Undef, ShufMask1);
-
-    // The MOVLHPS mask:
-    static const int ShufMask2[] = {0, 1, 4, 5};
-    return DAG.getVectorShuffle(VT, DL, OpLo, OpHi, ShufMask2);
+    static const int ShufMask[] = {0, 2, 4, 6};
+    return DAG.getVectorShuffle(VT, DL, OpLo, OpHi, ShufMask);
   }
 
   if ((VT == MVT::v8i16) && (InVT == MVT::v8i32)) {