[PATCH] D11518: [DAGCombiner] Convert constant AND masks to shuffle clear masks down to the byte level
Chandler Carruth
chandlerc at gmail.com
Sun Jul 26 22:09:13 PDT 2015
chandlerc added a comment.
By and large, really, really nice.
Comments inline.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12991-12994
@@ -12990,8 +12990,6 @@
- for (unsigned i = 0; i != NumElts; ++i) {
- SDValue Elt = RHS.getOperand(i);
- if (isAllOnesConstant(Elt))
- Indices.push_back(i);
- else if (isNullConstant(Elt))
- Indices.push_back(NumElts+i);
- else
+ // Attempt to create a valid clear mask, splitting the mask into
+ // sub elements and checking to see if each is
+ // all zeros or all ones - suitable for shuffle masking.
+ auto BuildClearMask = [&](unsigned Split) {
+ unsigned NumSubElts = NumElts * Split;
----------------
How annoying would it be to fully split this out into a helper function? The indentation is already fairly deep here.
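For illustration, the extraction might look something like this (the helper name and explicit parameter list are hypothetical, and since the quoted context cuts off before the per-sub-element constant check, that part is simplified to whole-element constants here):

  // Hypothetical free-function form of the quoted lambda; the lambda's
  // captures become explicit parameters. The real patch presumably also
  // splits a mixed constant into Split-sized chunks via SubIdx, which is
  // cut off above, so this sketch only accepts whole-element constants.
  static bool buildClearMask(SDValue RHS, unsigned NumElts, unsigned Split,
                             SmallVectorImpl<int> &Indices) {
    unsigned NumSubElts = NumElts * Split;
    for (unsigned i = 0; i != NumSubElts; ++i) {
      SDValue Elt = RHS.getOperand(i / Split);
      if (isAllOnesConstant(Elt))
        Indices.push_back(i);              // sub-element passes through
      else if (isNullConstant(Elt))
        Indices.push_back(NumSubElts + i); // sub-element is cleared to zero
      else
        return false; // not expressible as a shuffle clear mask
    }
    return true;
  }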
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:12999-13001
@@ +12998,5 @@
+ SmallVector<int, 8> Indices;
+ for (unsigned i = 0; i != NumSubElts; ++i) {
+ unsigned EltIdx = i / Split;
+ unsigned SubIdx = i % Split;
+ SDValue Elt = RHS.getOperand(EltIdx);
----------------
I would just consistently use 'int' here.
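A minimal sketch of what I mean (assuming the surrounding counts are made int as well, so everything feeding SmallVector<int, 8> Indices stays same-signed):

  int NumSubElts = NumElts * Split;
  for (int i = 0; i != NumSubElts; ++i) {
    int EltIdx = i / Split;
    int SubIdx = i % Split;
    SDValue Elt = RHS.getOperand(EltIdx);
    // ...
  }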
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:13118-13121
@@ -13079,2 +13117,6 @@
+ // Try to convert a constant mask AND into a shuffle clear mask.
+ if (SDValue Shuffle = XformToShuffleWithZero(N))
+ return Shuffle;
+
// Type legalization might introduce new shuffles in the DAG.
----------------
Is there a particular reason this needs to be adjusted this way?
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:6884-6885
@@ +6883,4 @@
+
+ if (SDValue Masked = lowerVectorShuffleAsBitMask(DL, VT, V1, V2, Mask, DAG))
+ return Masked;
+
----------------
Please send a separate review with just this (the bit-mask lowering) change?
It seems likely to be independently testable.
================
Comment at: test/CodeGen/X86/sse3.ll:272-275
@@ -271,4 +271,6 @@
; X64: ## BB#0: ## %entry
-; X64-NEXT: movddup {{.*#+}} xmm0 = mem[0,0]
-; X64-NEXT: andpd {{.*}}(%rip), %xmm0
+; X64-NEXT: movaps (%rax), %xmm0
+; X64-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0,0,1,1]
+; X64-NEXT: pxor %xmm1, %xmm1
+; X64-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
; X64-NEXT: retq
----------------
This seems like a pretty bad regression. Any hope of recovering it?
================
Comment at: test/CodeGen/X86/vec_cast2.ll:49-54
@@ -48,6 +48,8 @@
; CHECK: ## BB#0:
-; CHECK-NEXT: vpmovzxwd {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
-; CHECK-NEXT: vpunpckhwd {{.*#+}} xmm0 = xmm0[4,4,5,5,6,6,7,7]
-; CHECK-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
-; CHECK-NEXT: vandps LCPI2_0, %ymm0, %ymm0
+; CHECK-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm0[4,4,5,5,6,6,7,7]
+; CHECK-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; CHECK-NEXT: vpand %xmm2, %xmm1, %xmm1
+; CHECK-NEXT: vpmovzxwd {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero
+; CHECK-NEXT: vpand %xmm2, %xmm0, %xmm0
+; CHECK-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; CHECK-NEXT: vcvtdq2ps %ymm0, %ymm0
----------------
Same question as above...
================
Comment at: test/CodeGen/X86/vec_int_to_fp.ll:1762-1765
@@ -1771,2 +1761,6 @@
; AVX1-NEXT: vpunpckhwd {{.*#+}} xmm1 = xmm1[4,4,5,5,6,6,7,7]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vpand %xmm2, %xmm1, %xmm1
+; AVX1-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
+; AVX1-NEXT: vpand %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
----------------
And here, wow!
Repository:
rL LLVM
http://reviews.llvm.org/D11518