[PATCH] Using Masked Load / Store intrinsics in Loop Vectorizer

Michael Zolotukhin mzolotukhin at apple.com
Mon Dec 15 13:59:42 PST 2014


Hi Elena,

Thank you for checking-in the first (small) part of the changes.

The code looks good to me except some minor nits (see inline), but I'd wait for a LGTM from Nadav or Arnold.


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1884-1885
@@ +1883,4 @@
+      if (Legal->isMaskRequired(SI)) {
+        Type *I8PtrTy =
+        Builder.getInt8PtrTy(PartPtr->getType()->getPointerAddressSpace());
+
----------------
Please fix the indentation here.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1896-1897
@@ +1895,4 @@
+        NewSI = Builder.CreateMaskedStore(Ops);
+      }
+      else 
+        NewSI = Builder.CreateAlignedStore(StoredVal[Part], VecPtr, Alignment);
----------------
'}' and 'else' should be on the same line.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1932-1933
@@ +1931,4 @@
+      NewLI = Builder.CreateMaskedLoad(Ops);
+    }
+    else {
+      Value *VecPtr = Builder.CreateBitCast(PartPtr,
----------------
And here too.

================
Comment at: test/Transforms/LoopVectorize/X86/masked_load_store.ll:8
@@ +7,3 @@
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc_linux"
+
----------------
Do we really need the target triple, or could it be omitted?

================
Comment at: test/Transforms/LoopVectorize/X86/masked_load_store.ll:12-19
@@ +11,10 @@
+;
+;void foo1(int *A, int *B, int *trigger) {
+;
+;  for (int i=0; i<10000; i++) {
+;    if (trigger[i] < 100) {
+;          A[i] = B[i] + trigger[i];
+;    }
+;  }
+;}
+
----------------
Do we need a test for i64 case as well?

================
Comment at: test/Transforms/LoopVectorize/X86/masked_load_store.ll:26
@@ +25,3 @@
+;AVX2: call void @llvm.masked.store.v8i32
+;AVX2: ret void
+
----------------
I believe this is unnecessary. We check for AVX2-LABEL here and later in other functions, so all AVX2-instructions should be between the two labels. Thus, the next label would designate the end of the current function, and there is no need to explicitly look for 'ret void'.

Also, I wonder if it's possible that basic blocks are printed in a different order - in this case we might get a fail even though the loop is correctly vectorized.

http://reviews.llvm.org/D6527

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list