[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Michael Zolotukhin mzolotukhin at apple.com
Thu May 21 18:55:21 PDT 2015


Hi Hao,

Please find more remarks from me inline, andI think this is the last bunch of comments from me. With them properly addressed I'm fine with committing the patch. However, I'd also like to hear from Adam regarding the LoopAccessAnalysis part, since he's been working actively on it.

And again, thanks for working on this, it's a much-needed feature!

Michael


================
Comment at: include/llvm/Analysis/LoopAccessAnalysis.h:408
@@ +407,3 @@
+private:
+  int Delta; // Interleave Factor.
+  bool Reverse;
----------------
Should it be `unsigned`?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:988
@@ +987,3 @@
+
+    for (auto II = std::next(I); II != E; ++II) {
+      Instruction *B = II->first;
----------------
I think you missed my question earlier on this part.

This algorithm is clearly quadratic, which is dangerous for compile time. We need to either make it linear, or add some limits (e.g. if number of accesses > 100, give up and don't even try to group them).

Also, if we separate it into two phases: 1) grouping accesses, 2) finding write-after-write pairs, I think we can make it more efficient. Since it's sufficient to compare against any access from a group, we'll be quadratic on the number of groups, not the number of accesses. And when we look for WaW pairs, we can only look only inside a group, not across all accesses in the loop. Does it sound reasonable, or am I missing something?


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:703
@@ -695,3 +702,3 @@
 
-  const LoopAccessInfo *getLAI() const {
-    return LAI;
+  const LoopAccessInfo *getLAI() const { return LAI; }
+
----------------
This change is unnecessary.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1676
@@ +1675,3 @@
+// Get a mask to interleave \p NumVec vectors into a wide vector.
+// I.E  <0, VF, VF*2, ..., VF*(NumVec-1), 1, VF+1, VF*2+1, ...>
+// E.g. For 2 interleaved vectors, if VF is 4, the mask is:
----------------
Nitpick: s/I.E/I.e./

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1756-1757
@@ +1755,4 @@
+    for (unsigned i = 0; i < NumVec / 2; i++)
+      TmpList.push_back(
+          ConcatenateTwoVectors(Builder, VecList[2 * i], VecList[2 * i + 1]));
+
----------------
Maybe it's worth mentioning here that though `ConcatenateTwoVectors` could extend a vector with UNDEFs, that only could happens with the last vector in the set.

Currently implementation guarantees that, but with the code evolving in future it can be easily violated if it's not stated clearly. Maybe an assertion could be even better here (like if it's not the last pair of vectors, then their sizes should be the same), but it might be an overkill.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1764
@@ +1763,3 @@
+    VecList.clear();
+    VecList.append(TmpList.begin(), TmpList.end());
+    NumVec = VecList.size();
----------------
Could we just assign `TmpList` to `VecList`?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1823
@@ +1822,3 @@
+  unsigned Index = Group->getIndex(Instr);
+  for (unsigned Part = 0; Part < UF; Part++) {
+    // Notice current instruction could be any index. Need to adjust the address
----------------
Should we iterate to `VF` instead of `UF` here? `PtrParts` contains `VF` elements, right?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1835-1836
@@ +1834,4 @@
+    // Current pointer is pointed to A[i+2], adjust it to A[i].
+    Value *NewPtr =
+        Builder.CreateExtractElement(PtrParts[Part], Builder.getInt32(0));
+    NewPtr = Builder.CreateGEP(NewPtr, Builder.getInt32(-Index));
----------------
I'm not sure it's correct if we vectorize *and* unroll the loop. Don't we need to adjust the pointer according to which unrolling-part we are at (instead of using `0`)?

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1857
@@ +1856,3 @@
+    for (unsigned Part = 0; Part < UF; Part++) {
+      Instruction *CallI = Builder.CreateAlignedLoad(
+          NewPtrs[Part], Group->getAlign(), "wide.vec");
----------------
mzolotukhin wrote:
> Name `CallI` is very misleading.
Name `CallI` is very misleading.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1857-1858
@@ +1856,4 @@
+    for (unsigned Part = 0; Part < UF; Part++) {
+      Instruction *CallI = Builder.CreateAlignedLoad(
+          NewPtrs[Part], Group->getAlign(), "wide.vec");
+
----------------
Name `CallI` is very misleading.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1904-1906
@@ +1903,5 @@
+
+      // If this member has different type, cast it to a unified type.
+      if (StoredVec->getType() != SubVT)
+        StoredVec = Builder.CreateBitOrPointerCast(StoredVec, SubVT);
+
----------------
Could you please provide an example when this triggers? I'm a bit concerned about legality of this cast.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1919-1920
@@ +1918,4 @@
+
+    Instruction *CallI =
+        Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlign());
+    propagateMetadata(CallI, Instr);
----------------
The name is misleading.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4871
@@ +4870,3 @@
+      // Calculate the cost of the whole interleaved group.
+      // FIXME: The interleaved load group with a huge gap may be even expensive
+      // than scalar operations. Then we need to replace it with scalar cost.
----------------
s/even expensive/even more expensive/

================
Comment at: test/Transforms/LoopVectorize/interleaved-accesses.ll:1
@@ +1,2 @@
+; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaving=true -runtime-memory-check-threshold=24 < %s | FileCheck %s
+
----------------
Is it possible to use `noalias` attributes for arguments and drop `-runtime-memory-check-threshold=24`?

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list