[PATCH] Masked Vector Load/Store Intrinsics
Adam Nemet
anemet at apple.com
Tue Nov 11 22:58:29 PST 2014
================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:273-278
@@ -272,1 +272,8 @@
+ /// \brief Return true if the target works with masked instructions.
+ /// AVX2 allows masks for consecutive load and store for i32 and i64 elements.
+ /// AVX-512 architecture will also allow masks for non-consecutive memory
+ /// accesses.
+ virtual bool isLegalPredicatedStore(Type *DataType, int Consecutive) const;
+ virtual bool isLegalPredicatedLoad (Type *DataType, int Consecutive) const;
+
----------------
Should Consecutive be bool?
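If the plan is for Consecutive to also take -1 for reverse-consecutive accesses, an enum would be clearer than a raw int; otherwise a bool reads better. A sketch of the latter (illustrative only; the call site below is hypothetical):

  virtual bool isLegalPredicatedStore(Type *DataType, bool Consecutive) const;
  virtual bool isLegalPredicatedLoad(Type *DataType, bool Consecutive) const;

  // Hypothetical use in the vectorizer's legality check:
  if (TTI->isLegalPredicatedLoad(LI->getType(), /*Consecutive=*/true))
    MaskedOp.insert(LI);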
================
Comment at: lib/Target/X86/X86InstrAVX512.td:2102-2136
@@ -2101,2 +2101,37 @@
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f32 VR256:$src)),
+ (VMOVUPSZmrk addr:$ptr,
+ (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)),
+ (INSERT_SUBREG (v16f32 (IMPLICIT_DEF)), VR256:$src, sub_ymm))>;
+
+def: Pat<(v8f32 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+ (v8f32 (EXTRACT_SUBREG (v16f32 (VMOVUPSZrmkz
+ (v16i1 (COPY_TO_REGCLASS VK8WM:$mask, VK16WM)), addr:$ptr)), sub_ymm))>;
+
+def: Pat<(masked_store addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src)),
+ (VMOVUPSZmrk addr:$ptr, VK16WM:$mask, VR512:$src)>;
+
+def: Pat<(masked_store addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src)),
+ (VMOVUPDZmrk addr:$ptr, VK8WM:$mask, VR512:$src)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, undef)),
+ (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask,
+ (bc_v16f32 (v16i32 immAllZerosV)))),
+ (VMOVUPSZrmkz VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v16f32 (masked_load addr:$ptr, VK16WM:$mask, (v16f32 VR512:$src0))),
+ (VMOVUPSZrmk VR512:$src0, VK16WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, undef)),
+ (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask,
+ (bc_v8f64 (v16i32 immAllZerosV)))),
+ (VMOVUPDZrmkz VK8WM:$mask, addr:$ptr)>;
+
+def: Pat<(v8f64 (masked_load addr:$ptr, VK8WM:$mask, (v8f64 VR512:$src0))),
+ (VMOVUPDZrmk VR512:$src0, VK8WM:$mask, addr:$ptr)>;
+
defm VMOVDQA32 : avx512_load_vl<0x6F, "vmovdqa32", "alignedload", "i", "32",
----------------
There's got to be a better way to write these patterns (and the store ones later).
My preference would be to only add a few (one?) for now in order to test the functionality. Then we can figure out a way to make this part of a new AVX512_maskable class (e.g. AVX512_maskable_trapping).
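For example, something shaped like the sketch below could cover the f32/f64 pairs; the multiclass name and parameter list are made up, not an existing class:

  multiclass avx512_masked_mem_pats<ValueType VT, RegisterClass RC,
                                    RegisterClass KRC, Instruction StInst,
                                    Instruction LdKZInst, Instruction LdKInst> {
    // Masked store: the mask selects which lanes are written.
    def : Pat<(masked_store addr:$ptr, KRC:$mask, (VT RC:$src)),
              (StInst addr:$ptr, KRC:$mask, RC:$src)>;
    // Masked load zeroing the inactive lanes.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, undef)),
              (LdKZInst KRC:$mask, addr:$ptr)>;
    // Masked load merging into an existing vector.
    def : Pat<(VT (masked_load addr:$ptr, KRC:$mask, (VT RC:$src0))),
              (LdKInst RC:$src0, KRC:$mask, addr:$ptr)>;
  }

  defm : avx512_masked_mem_pats<v16f32, VR512, VK16WM,
                                VMOVUPSZmrk, VMOVUPSZrmkz, VMOVUPSZrmk>;
  defm : avx512_masked_mem_pats<v8f64, VR512, VK8WM,
                                VMOVUPDZmrk, VMOVUPDZrmkz, VMOVUPDZrmk>;

(The v8f32 subregister dance and the immAllZerosV pass-through cases would still need separate handling or extra parameters.)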
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:886
@@ -873,2 +885,3 @@
SmallPtrSet<Value *, 8> StrideSet;
+ std::set<const Instruction*> MaskedOp;
};
----------------
Please add a comment explaining what this is for.
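Something along these lines, if I'm reading the rest of the patch right (the wording is only a suggestion):

  /// Memory instructions that execute conditionally in the loop and must
  /// therefore become masked loads/stores (instead of being scalarized
  /// with predication) when the loop is vectorized.
  std::set<const Instruction*> MaskedOp;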
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5352
@@ -5303,2 +5351,3 @@
// We might be able to hoist the load.
+
if (it->mayReadFromMemory()) {
----------------
Please don't add a new blank line here.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5355-5364
@@ -5305,5 +5354,12 @@
LoadInst *LI = dyn_cast<LoadInst>(it);
- if (!LI || !SafePtrs.count(LI->getPointerOperand()))
+ if (!LI)
return false;
+ if (!SafePtrs.count(LI->getPointerOperand())) {
+ if (canPredicateLoad(LI->getType(), LI->getPointerOperand())) {
+ MaskedOp.insert(LI);
+ continue;
+ }
+ return false;
+ }
}
----------------
I've read this far and I still don't understand the MaskedOp business. Can you please explain?
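My guess: a load lands in MaskedOp when its pointer isn't known to be unconditionally safe to dereference, so instead of giving up (or scalarizing with predication) the vectorizer promises to widen it into one of the new masked intrinsics, with the block predicate as the mask. Roughly like this sketch (createBlockInMask assumed to supply the predicate; CreateMaskedLoad being the IRBuilder helper this patch adds, if I'm reading it right):

  // Assumed widening for a load recorded in MaskedOp: the block-in mask
  // guards the lanes; undef is the pass-through for the inactive lanes.
  Value *Mask = createBlockInMask(LI->getParent());
  Value *WideLd = Builder.CreateMaskedLoad(VecPtr, Alignment, Mask,
                                           UndefValue::get(WideVecTy));

If that's the idea, please say so in a comment at the MaskedOp declaration and here where it's populated.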
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5366
@@ -5309,3 +5365,3 @@
// We don't predicate stores at the moment.
if (it->mayWriteToMemory()) {
----------------
Stale comment.
http://reviews.llvm.org/D6191