[PATCH] D37446: [x86] eliminate unnecessary vector compare for AVX masked store

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 6 10:34:00 PDT 2017


spatel updated this revision to Diff 114028.
spatel added a comment.

Patch updated:
Given that AVX512 requires different pattern matching and produces different output, I'm pushing back on trying to include that in this patch. It should be an independent improvement (and I'm not the right person to make that improvement).

I have updated the comments in the code and the test to reflect this. NFC relative to the previous revision of the patch.


https://reviews.llvm.org/D37446

Files:
  lib/Target/X86/X86ISelLowering.cpp
  test/CodeGen/X86/masked_memop.ll


Index: test/CodeGen/X86/masked_memop.ll
===================================================================
--- test/CodeGen/X86/masked_memop.ll
+++ test/CodeGen/X86/masked_memop.ll
@@ -1140,21 +1140,18 @@
   ret <8 x double> %res
 }
 
-; FIXME: The mask bit for each data element is the most significant bit of the mask operand, so a compare isn't needed.
+; The mask bit for each data element is the most significant bit of the mask operand, so a compare isn't needed.
+; FIXME: The AVX512 code should be improved to use 'vpmovd2m'. Add tests for 512-bit vectors when implementing that.
 
 define void @trunc_mask(<4 x float> %x, <4 x float>* %ptr, <4 x float> %y, <4 x i32> %mask) {
 ; AVX-LABEL: trunc_mask:
 ; AVX:       ## BB#0:
-; AVX-NEXT:    vpxor %xmm1, %xmm1, %xmm1
-; AVX-NEXT:    vpcmpgtd %xmm2, %xmm1, %xmm1
-; AVX-NEXT:    vmaskmovps %xmm0, %xmm1, (%rdi)
+; AVX-NEXT:    vmaskmovps %xmm0, %xmm2, (%rdi)
 ; AVX-NEXT:    retq
 ;
 ; AVX512F-LABEL: trunc_mask:
 ; AVX512F:       ## BB#0:
-; AVX512F-NEXT:    vpxor %xmm1, %xmm1, %xmm1
-; AVX512F-NEXT:    vpcmpgtd %xmm2, %xmm1, %xmm1
-; AVX512F-NEXT:    vmaskmovps %xmm0, %xmm1, (%rdi)
+; AVX512F-NEXT:    vmaskmovps %xmm0, %xmm2, (%rdi)
 ; AVX512F-NEXT:    retq
 ;
 ; SKX-LABEL: trunc_mask:
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -33206,8 +33206,33 @@
   if (Mst->isCompressingStore())
     return SDValue();
 
-  if (!Mst->isTruncatingStore())
-    return reduceMaskedStoreToScalarStore(Mst, DAG);
+  if (!Mst->isTruncatingStore()) {
+    if (SDValue ScalarStore = reduceMaskedStoreToScalarStore(Mst, DAG))
+      return ScalarStore;
+
+    // If the mask is checking (0 > X), we're creating a vector with all-zeros
+    // or all-ones elements based on the sign bits of X. AVX1 masked store only
+    // cares about the sign bit of each mask element, so eliminate the compare:
+    // mstore val, ptr, (pcmpgt 0, X) --> mstore val, ptr, X
+    // Note that by waiting to match an x86-specific PCMPGT node, we're
+    // eliminating potentially more complex matching of a setcc node which has
+    // a full range of predicates.
+    SDValue Mask = Mst->getMask();
+    if (Mask.getOpcode() == X86ISD::PCMPGT &&
+        ISD::isBuildVectorAllZeros(Mask.getOperand(0).getNode())) {
+      assert(Mask.getValueType() == Mask.getOperand(1).getValueType() &&
+             "Unexpected type for PCMPGT");
+      return DAG.getMaskedStore(
+          Mst->getChain(), SDLoc(N), Mst->getValue(), Mst->getBasePtr(),
+          Mask.getOperand(1), Mst->getMemoryVT(), Mst->getMemOperand());
+    }
+
+    // TODO: AVX512 targets should also be able to simplify something like the
+    // pattern above, but that pattern will be different. It will either need to
+    // match setcc more generally or match PCMPGTM later (in tablegen?).
+
+    return SDValue();
+  }
 
   // Resolve truncating stores.
   EVT VT = Mst->getValue().getValueType();
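
For reference, here is a sketch in LLVM IR of the kind of test that exercises this combine. The body of trunc_mask is not shown in the diff above, so treat this as an assumed reconstruction rather than the exact test source:

  define void @trunc_mask(<4 x float> %x, <4 x float>* %ptr, <4 x float> %y, <4 x i32> %mask) {
    ; icmp slt X, 0 lowers to the x86-specific (pcmpgt 0, X) node,
    ; which the combine above folds away.
    %bool_mask = icmp slt <4 x i32> %mask, zeroinitializer
    call void @llvm.masked.store.v4f32.p0v4f32(<4 x float> %x, <4 x float>* %ptr, i32 1, <4 x i1> %bool_mask)
    ret void
  }

  declare void @llvm.masked.store.v4f32.p0v4f32(<4 x float>, <4 x float>*, i32, <4 x i1>)

The fold is safe because vmaskmovps reads only the most significant bit of each mask element: (pcmpgt 0, X) produces an all-ones element exactly when the corresponding element of X is negative, so the sign bit of the compare result always equals the sign bit of X itself, and passing X directly stores the same lanes.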

