[PATCH] Masked Vector Load/Store Intrinsics
Adam Nemet
anemet at apple.com
Tue Nov 11 22:58:29 PST 2014
================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:273-278
@@ -272,1 +272,8 @@
+ /// \brief Return true if the target works with masked instructions.
+ /// AVX2 allows masks for consecutive load and store for i32 and i64 elements.
+ /// AVX-512 architecture will also allow masks for non-consecutive memory
+ /// accesses.
+ virtual bool isLegalPredicatedStore(Type *DataType, int Consecutive) const;
+ virtual bool isLegalPredicatedLoad (Type *DataType, int Consecutive) const;
+
----------------
Should Consecutive be bool?
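If the plan is for Consecutive to also take -1 for reverse-consecutive accesses, an enum would be clearer than a raw int; otherwise a bool reads better. A sketch of the latter (illustrative only; the call site below is hypothetical):

  virtual bool isLegalPredicatedStore(Type *DataType, bool Consecutive) const;
  virtual bool isLegalPredicatedLoad(Type *DataType, bool Consecutive) const;

  // Hypothetical use in the vectorizer's legality check:
  if (TTI->isLegalPredicatedLoad(LI->getType(), /*Consecutive=*/true))
    MaskedOp.insert(LI);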
================
Comment at: lib/Target/X86/X86InstrAVX512.td:2102-2136
@@ -2101,2 +2101,37 @@
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f32 VR256:$src)),
+ (VMOVUPSZmrk addr:$ptr,
+ (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)),
+ (INSERT_SUBREG (v16f32 (IMPLICIT_DEF)), VR256:$src, sub_ymm))>;
+
+def: Pat<(v8f32 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+ (v8f32 (EXTRACT_SUBREG (v16f32 (VMOVUPSZrmkz
+ (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)), addr:$ptr)), sub_ymm))>;
+
+def: Pat<(masked_store addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src)),
+ (VMOVUPSZmrk addr:$ptr, VK16WM:$mask, VR512:$src)>;
+
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src)),
+ (VMOVUPDZmrk addr:$ptr, VK8WM:$mask, VR512:$src)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, undef)),
+ (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask,
+ (bc_v16f32 (v16i32 immAllZerosV)))),
+ (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src0))),
+ (VMOVUPSZrmk VR512:$src0, VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+ (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask,
+ (bc_v8f64 (v16i32 immAllZerosV)))),
+ (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src0))),
+ (VMOVUPDZrmk VR512:$src0, VK8WM:$mask, addr:$ptr)>;
+
defm VMOVDQA32 : avx512_load_vl<0x6F, "vmovdqa32", "alignedload", "i", "32",
----------------
There's got to be a better way to write these patterns (and the store ones later).
My preference would be to only add a few (one?) for now in order to test the functionality. Then we can figure out a way to make this part of a new AVX512_maskable class (e.g. AVX512_maskable_trapping).
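For example, something shaped like the sketch below could cover the f32/f64 pairs; the multiclass name and parameter list are made up, not an existing class:

  multiclass avx512_masked_mem_pats<ValueType VT, RegisterClass RC,
                                    RegisterClass KRC, Instruction StInst,
                                    Instruction LdKZInst, Instruction LdKInst> {
    // Masked store: the mask selects which lanes are written.
    def : Pat<(masked_store addr:$ptr, KRC:$mask, (VT RC:$src)),
              (StInst addr:$ptr, KRC:$mask, RC:$src)>;
    // Masked load zeroing the inactive lanes.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, undef)),
              (LdKZInst KRC:$mask, addr:$ptr)>;
    // Masked load merging into an existing vector.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, (VT RC:$src0))),
              (LdKInst RC:$src0, KRC:$mask, addr:$ptr)>;
  }

  defm : avx512_masked_mem_pats<v16f32, VR512, VK16WM,
                                VMOVUPSZmrk, VMOVUPSZrmkz, VMOVUPSZrmk>;
  defm : avx512_masked_mem_pats<v8f64, VR512, VK8WM,
                                VMOVUPDZmrk, VMOVUPDZrmkz, VMOVUPDZrmk>;

(The v8f32 subregister dance and the immAllZerosV pass-through cases would still need separate handling or extra parameters.)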
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:886
@@ -873,2 +885,3 @@
SmallPtrSet<Value *, 8> StrideSet;
+ std::set<const Instruction*> MaskedOp;
};
----------------
Please add a comment explaining what this is for.
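Something along these lines, if I'm reading the rest of the patch right (the wording is only a suggestion):

  /// Memory instructions that execute conditionally in the loop and must
  /// therefore become masked loads/stores (instead of being scalarized
  /// with predication) when the loop is vectorized.
  std::set<const Instruction*> MaskedOp;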
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5352
@@ -5303,2 +5351,3 @@
// We might be able to hoist the load.
+
if (it->mayReadFromMemory()) {
----------------
Please don't add a new blank line here.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5355-5364
@@ -5305,5 +5354,12 @@
LoadInst *LI = dyn_cast<LoadInst>(it);
- if (!LI || !SafePtrs.count(LI->getPointerOperand()))
+ if (!LI)
return false;
+ if (!SafePtrs.count(LI->getPointerOperand())) {
+ if (canPredicateLoad(LI->getType(), LI->getPointerOperand())) {
+ MaskedOp.insert(LI);
+ continue;
+ }
+ return false;
+ }
}
----------------
I've read this far and I still don't understand the MaskedOp business. Can you please explain?
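My guess: a load lands in MaskedOp when its pointer isn't known to be unconditionally safe to dereference, so instead of giving up (or scalarizing with predication) the vectorizer promises to widen it into one of the new masked intrinsics, with the block predicate as the mask. Roughly like this sketch (createBlockInMask assumed to supply the predicate; CreateMaskedLoad being the IRBuilder helper this patch adds, if I'm reading it right):

  // Assumed widening for a load recorded in MaskedOp: the block-in mask
  // guards the lanes; undef is the pass-through for the inactive lanes.
  Value *Mask = createBlockInMask(LI->getParent());
  Value *WideLd = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask,
                                           UndefValue::get(WideVecTy));

If that's the idea, please say so in a comment at the MaskedOp declaration and here where it's populated.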
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5366
@@ -5309,3 +5365,3 @@
// We don't predicate stores at the moment.
if (it->mayWriteToMemory()) {
----------------
Stale comment.
http://reviews.llvm.org/D6191