[llvm-commits] Please review - One more shuffle optimization for AVX

Rotem, Nadav nadav.rotem at intel.com
Sun Jun 24 07:10:25 PDT 2012


CHECK: test18
+; CHECK: vshufps
+; CHECK: vshufps
+; CHECK: vunpcklps

Check for 'ret' at the end of the test.
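Concretely, the suggestion is to anchor the end of the test on the function's return, e.g. (test name taken from the quoted lines above):

```
; CHECK: test18
; CHECK: vshufps
; CHECK: vshufps
; CHECK: vunpcklps
; CHECK: ret
```

Without the final CHECK, extra instructions emitted after the vunpcklps would go unnoticed.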

                                    DebugLoc dl) {
-  SDValue V = Insert128BitVector(DAG.getUNDEF(VT), V1, 0, DAG, dl);
-  return Insert128BitVector(V, V2, NumElems/2, DAG, dl);
+  SDValue V = DAG.getNode(ISD::UNDEF, dl, VT);
+
+  if (V1.getOpcode() != ISD::UNDEF) 
+    V = Insert128BitVector(V, V1, 0, DAG, dl);
+
+  if (V2.getOpcode() != ISD::UNDEF)
+    V = Insert128BitVector(V, V2, NumElems/2, DAG, dl);
+
+  return V;
 }

No need to do this. Craig changed Insert128BitVector so that it checks for undef values. 

+//
+// Some special combinations that can be optimized
+//

What is special about these combinations? Please end the comment with a period. Why is this function called Compact8x32ShuffleNode?

+  if (VT.is256BitVector() && (NumElts == 8)) {

You can check that VT == v8i32.

+    ArrayRef<int> Mask = SVOp->getMask();
+    if (isUndefOrEqual(Mask[0], 0) &&
+        isUndefOrEqual(Mask[1], 8) &&
+        isUndefOrEqual(Mask[2], 2) &&
+        isUndefOrEqual(Mask[3], 10) &&
+        isUndefOrEqual(Mask[4], 4) &&
+        isUndefOrEqual(Mask[5], 12) &&
+        isUndefOrEqual(Mask[6], 6) &&
+        isUndefOrEqual(Mask[7], 14)) {

Please create a local array and iterate over it in a loop. Calling a function 16 times bloats the code.

+      int CompactionMask[] = {0, 2, -1, -1, 4, 6, -1, -1};
+      SDValue Op0 = DAG.getVectorShuffle(VT, dl, SVOp->getOperand(0), 
+        DAG.getNode(ISD::UNDEF, dl, VT), CompactionMask);
+      SDValue Op1 = DAG.getVectorShuffle(VT, dl, SVOp->getOperand(1), 
+        DAG.getNode(ISD::UNDEF, dl, VT), CompactionMask);
+      int UnpackMask[] = {0, 8, 1, 9, 4, 12, 5, 13};

Undef can be created once, not 4 times. 


+}
+
+

Remove the extra line breaks. 

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Demikhovsky, Elena
Sent: Sunday, June 24, 2012 15:30
To: llvm-commits at cs.uiuc.edu
Subject: [llvm-commits] Please review - One more shuffle optimization for AVX

Hi,

I have a number of AVX and AVX2 optimizations that I did recently. Most of them show a significant performance speedup on real workloads.
I'll send them for review one by one, each accompanied by appropriate tests.

The current patch optimizes a frequently used shuffle pattern and gives the following reduction in the instruction sequence.
Before:
        vshufps         $-35, %xmm1, %xmm0, %xmm2 ## xmm2 = xmm0[1,3],xmm1[1,3]
        vpermilps       $-40, %xmm2, %xmm2 ## xmm2 = xmm2[0,2,1,3]
        vextractf128    $1, %ymm1, %xmm1
        vextractf128    $1, %ymm0, %xmm0
        vshufps         $-35, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[1,3],xmm1[1,3]
        vpermilps       $-40, %xmm0, %xmm0 ## xmm0 = xmm0[0,2,1,3]
        vinsertf128     $1, %xmm0, %ymm2, %ymm0
After:
        vshufps         $13, %ymm0, %ymm1, %ymm1 ## ymm1 = ymm1[1,3],ymm0[0,0],ymm1[5,7],ymm0[4,4]
        vshufps         $13, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,3,0,0,5,7,4,4]
        vunpcklps       %ymm1, %ymm0, %ymm0 ## ymm0 = ymm0[0],ymm1[0],ymm0[1],ymm1[1],ymm0[4],ymm1[4],ymm0[5],ymm1[5]

  Thank you

- Elena


---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------