[PATCH] Masked Vector Load/Store Intrinsics

Adam Nemet anemet at apple.com
Thu Nov 13 14:05:10 PST 2014


================
Comment at: lib/Target/X86/X86InstrAVX512.td:2102-2136
@@ -2101,2 +2101,37 @@
 
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f32 VR256:$src)),
+         (VMOVUPSZmrk addr:$ptr,
+         (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)),
+         (INSERT_SUBREG (v16f32 (IMPLICIT_DEF)), VR256:$src, sub_ymm))>;
+
+def: Pat<(v8f32 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+         (v8f32 (EXTRACT_SUBREG (v16f32 (VMOVUPSZrmkz 
+          (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)), addr:$ptr)), sub_ymm))>;
+
+def: Pat<(masked_store addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src)),
+         (VMOVUPSZmrk addr:$ptr, VK16WM:$mask, VR512:$src)>;
+
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src)),
+         (VMOVUPDZmrk addr:$ptr, VK8WM:$mask, VR512:$src)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, undef)),
+         (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask,
+                              (bc_v16f32 (v16i32 immAllZerosV)))),
+         (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src0))),
+         (VMOVUPSZrmk VR512:$src0, VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+         (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask,
+                             (bc_v8f64 (v16i32 immAllZerosV)))),
+         (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src0))),
+         (VMOVUPDZrmk VR512:$src0, VK8WM:$mask, addr:$ptr)>;
+
 defm VMOVDQA32 : avx512_load_vl<0x6F, "vmovdqa32", "alignedload", "i", "32",
----------------
delena wrote:
> anemet wrote:
> > There's got to be a better way to write this and the store later.
> > 
> > My preference would be to only add a few (one?) for now in order to test the functionality.  Then we figure out a way to have this be part of a new AVX512_maskable class (e.g. AVX512_maskable_trapping)
> I implemented load and store together; I had to generate code and check correctness, which is why I put the whole code in.
> I know that I still have a lot of work here.
> Let's optimize the .td file later, in one of the next patches.
OK.
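
For later reference, here is a rough sketch of the factoring I have in mind. The multiclass name and parameter list below are made up and this is untested, but it should fold the three pass-through variants of each load pattern (undef, zero, merge) into one place:

  // Hypothetical multiclass; the name and parameters are illustrative only.
  multiclass avx512_masked_load_pats<ValueType VT, RegisterClass RC,
                                     RegisterClass KRC, Instruction LoadKZ,
                                     Instruction LoadK, dag ZeroVec> {
    // Undef pass-through: masked-off lanes may hold anything.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, undef)),
              (LoadKZ KRC:$mask, addr:$ptr)>;
    // Zero pass-through: masked-off lanes are cleared.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, ZeroVec)),
              (LoadKZ KRC:$mask, addr:$ptr)>;
    // Merge pass-through: masked-off lanes are taken from $src0.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, (VT RC:$src0))),
              (LoadK RC:$src0, KRC:$mask, addr:$ptr)>;
  }

  defm : avx512_masked_load_pats<v16f32, VR512, VK16WM, VMOVUPSZrmkz,
                                 VMOVUPSZrmk,
                                 (bc_v16f32 (v16i32 immAllZerosV))>;
  defm : avx512_masked_load_pats<v8f64, VR512, VK8WM, VMOVUPDZrmkz,
                                 VMOVUPDZrmk,
                                 (bc_v8f64 (v16i32 immAllZerosV))>;

The stores and the 256-bit cases that go through COPY_TO_REGCLASS/INSERT_SUBREG would need a similar treatment, ideally tied into the AVX512_maskable hierarchy as suggested above.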

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:886
@@ -873,2 +885,3 @@
   SmallPtrSet<Value *, 8> StrideSet;
+  std::set<const Instruction*> MaskedOp;
 };
----------------
delena wrote:
> anemet wrote:
> > Please comment what this is for.
> I collect here the memory operations that have to be masked on vectorization. If a block is predicated, that does not mean that all of its loads and stores require masks: sometimes the pointer is known to be safe to access, and sometimes a configuration flag permits speculation. All of those options are checked first.
> The last check is whether the target is ready to work with masked operations; only in that case do I put the instruction into MaskedOp.
> I'll add a comment in the code.
I see, yes, a comment is necessary here.
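
Something along these lines could serve as the comment; the wording below is only a suggestion distilled from your explanation, not what the patch currently says:

  // Records the memory operations in predicated blocks that must be turned
  // into masked loads/stores when the loop is vectorized. Not every access
  // in a predicated block lands here: the pointer may be provably safe to
  // dereference, or a flag may permit speculation. An instruction is added
  // only once those options are exhausted and the target reports that it
  // can handle the corresponding masked operation.
  std::set<const Instruction *> MaskedOp;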

http://reviews.llvm.org/D6191