[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Hao Liu Hao.Liu at arm.com
Thu May 7 22:46:11 PDT 2015


I've attached a new patch Diff25284 according to the comments from Michael and Renato.

This patch also handles two things:

  (1) Add a new function in BasicTTIImpl.h for the default cost model.
  (2) Handle the redundant load/store case. (The collection phase will return early if there are redundant accesses)

Review please.

Thanks,
-Hao


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:533
@@ -532,2 +532,3 @@
           *Ptr << " SCEV: " << *PtrScev << "\n");
+    return 0;
   }
----------------
rengolin wrote:
> HaoLiu wrote:
> > hfinkel wrote:
> > > Is this just a drive-by bug fix?
> > Yes. I just came across this minor bug.
> > As it's quite obvious and simple, should I just commit this fix?
> Yes, if this is an unrelated fix, please submit a separate diff with a full description of what it is and a test.
OK. I'll do that in another patch.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:782
@@ +781,3 @@
+  unsigned NumIterations = ForcedFactor * ForcedUnroll;
+  // The number of vectorized & interleaved iterations should not be less than 2
+  if (NumIterations < 2)
----------------
rengolin wrote:
> HaoLiu wrote:
> > rengolin wrote:
> > > why not?
> > Sorry that my comment is confusing.
> > 
> > I mean the "ForcedFactor * ForcedUnroll" will never be 1. 
> >     (1) If the VectorizationFactor and VectorizationInterleave are both set to be 1 by user, this is the same to disable the loop vectorizer. It will never execute to this point.
> >     (2) If either of the Factors is forced, the number of iterations will also never be 1. See if user set "-force-vector-interleaved=1", the vector factor will be at least 2. Or if user set "-force-vector-width=1", the unroll factor will be at least 2.
> > 
> > So I set the number of iterations to be 2 if either factors is forced. This can simplify the following distance checks.
> > 
> > I've changed the comments to make it more clear.
> > 
> > 
> If one of the variables must be > 1 at this point, I don't see why you need the check below.
If one parameter is not forced, it is initialized to be 0. If either of the parameter is forced, we need to set it properly.
Anyway, I've changed the logic again to make it more clear.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:776-779
@@ -779,8 +775,6 @@
                            VectorizerParams::VectorizationInterleave : 1);
-
-  // The distance must be bigger than the size needed for a vectorized version
-  // of the operation and the size of the vectorized operation must not be
-  // bigger than the currrent maximum size.
-  if (Distance < 2*TypeByteSize ||
-      2*TypeByteSize > MaxSafeDepDistBytes ||
-      Distance < TypeByteSize * ForcedUnroll * ForcedFactor) {
+  unsigned NumIterations = ForcedFactor * ForcedUnroll;
+  // The minimum number of vectorized & interleaved iterations is 2.
+  if (NumIterations < 2)
+    NumIterations = 2;
+
----------------
mzolotukhin wrote:
> From your and Renato's comments I think it's more appropriate to have an assert here. Otherwise, this is very misleading.
> 
Please have a look at the new code. It could be 0 or 1 if either of the parameter is not forced. In that case, we need to assign a minimum number of 2.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:581
@@ +580,3 @@
+    SmallSet<InterleavedAccess *, 4> DelSet;
+    // Collecting all the pointers to avoid delete a pointer twice,
+    for (auto &I : InterleavedAccessMap)
----------------
mzolotukhin wrote:
> Nitpick: s/twice,/twice./
Fixed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1972
@@ +1971,3 @@
+  }
+  return;
+}
----------------
mzolotukhin wrote:
> This looks redundant.
Removed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3763-3764
@@ +3762,4 @@
+
+static bool isConsecutiveAccess(Instruction *A, Instruction *B,
+                                ScalarEvolution *SE, const DataLayout &DL) {
+  Value *PtrA = getPointerOperand(A);
----------------
mzolotukhin wrote:
> This code is a dupe of the one from SLP vectorizer. Could we do anything about it?
As Renato suggested, I've move such code into Transforms/Utils/VectorUtils, which consists utilities for vectorizer.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3833-3835
@@ +3832,5 @@
+    SmallVector<Instruction *, 4> &Queue, int Stride, bool IsLoad) {
+  unsigned QSize = Queue.size();
+  if (QSize < 2)
+    return;
+
----------------
mzolotukhin wrote:
> It'd better be
> ```
> if (Queue.size() < 2)
>   return;
> ```
> since `QSize` isn't used anywhere else.
Done.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3837-3852
@@ +3836,18 @@
+
+  SetVector<Instruction *> Heads;
+  SetVector<Instruction *> Tails;
+  SmallDenseMap<Instruction *, Instruction *> ConsecutiveChain;
+
+  // Collect all the consecutive pairs.
+  const DataLayout &DL = TheLoop->getHeader()->getModule()->getDataLayout();
+  for (auto &H : Queue) {
+    for (auto &T : Queue) {
+      if (H == T || !isConsecutiveAccess(H, T, SE, DL))
+        continue;
+
+      Heads.insert(H);
+      Tails.insert(T);
+      ConsecutiveChain[H] = T;
+    }
+  }
+
----------------
mzolotukhin wrote:
> This is very similar to the code from SLP. Any chance to reuse it?
Yes, but it's still some differences.
If we want to reuse it, also need to modify SLP a lot. I think it's better to clean up this with a separate patch.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3868
@@ +3867,3 @@
+    unsigned Size = ConsecutiveList.size();
+    unsigned Num = std::abs(Stride);
+    if (Size < Num)
----------------
mzolotukhin wrote:
> This name is misleading.
Renamed both "Size" and "Num".

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3890-3891
@@ +3889,4 @@
+      }
+      // The alignment of the interleaved access is the max member alignment.
+      IA->Align = MaxAlign;
+
----------------
mzolotukhin wrote:
> Shouldn't it be minimal alignment instead?
Ah, right.
I've changed this logic to be min align.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3921-3922
@@ +3920,4 @@
+  }
+
+  return;
+}
----------------
mzolotukhin wrote:
> This is redundant.
Removed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3943-3945
@@ +3942,5 @@
+  //        load A[i], store A[i], load A[i+1], store A[i+1].
+  for (Loop::block_iterator bb = TheLoop->block_begin(),
+                            be = TheLoop->block_end();
+       bb != be; ++bb) {
+    if (blockNeedsPredication(*bb))
----------------
mzolotukhin wrote:
> Auto + range loops could be used here.
Refactored.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3951
@@ +3950,3 @@
+    SmallVector<Instruction *, 4> AccessQueue;
+    bool TryMerge = false;
+    Value *CurBase = nullptr;
----------------
mzolotukhin wrote:
> When I first read this code, this name arose a question "Merge with what?" to me.
Removed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3955-3956
@@ +3954,4 @@
+    bool IsLoad = false;
+    for (BasicBlock::iterator it = (*bb)->begin(), e = (*bb)->end(); it != e;
+         ++it) {
+      if (TryMerge) {
----------------
mzolotukhin wrote:
> mzolotukhin wrote:
> > Same here.
> This loop is much more complicated than it should be I think.
> 
> If we can combine ` if (!StridedAccesses.count(it))` with ` if (CurBase != Base || CurStride != Stride || IsLoad != Read)`, we'll be able to get rid of `TryMerge` and of all these jumps to beginning of the loop.
> 
Done.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3955-3963
@@ +3954,11 @@
+    bool IsLoad = false;
+    for (BasicBlock::iterator it = (*bb)->begin(), e = (*bb)->end(); it != e;
+         ++it) {
+      if (TryMerge) {
+        collectInterleavedAccess(AccessQueue, CurStride, IsLoad);
+        AccessQueue.clear();
+        CurBase = nullptr;
+        TryMerge = false;
+      }
+
+      if (!it->mayReadOrWriteMemory())
----------------
HaoLiu wrote:
> mzolotukhin wrote:
> > mzolotukhin wrote:
> > > Same here.
> > This loop is much more complicated than it should be I think.
> > 
> > If we can combine ` if (!StridedAccesses.count(it))` with ` if (CurBase != Base || CurStride != Stride || IsLoad != Read)`, we'll be able to get rid of `TryMerge` and of all these jumps to beginning of the loop.
> > 
> Done.
Done with the new patch.
I changed the layout of this loop, now we only call "collect" function once in the loop. But after the loop, still need to call it once to handle the tail.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3988
@@ +3987,3 @@
+
+      // This instructure doesn't fit the current queue. Try to merge current
+      // queue and insert it again.
----------------
mzolotukhin wrote:
> s/instructure/instruction/
Fixed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4004-4006
@@ +4003,5 @@
+      collectInterleavedAccess(AccessQueue, CurStride, IsLoad);
+      AccessQueue.clear();
+      CurBase = nullptr;
+      TryMerge = false;
+    }
----------------
mzolotukhin wrote:
> This seems to be redundant.
Removed.

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list