[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Wed May 27 04:04:35 PDT 2015

Updated a new patch refactored the LoopAccessAnalysis about dependence check on strided accesses.

It's difficult to understand the logic about dependences, so I added many comments to describe it.

Review please.

Thanks,
-Hao


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:781-782
@@ -785,1 +780,4 @@
                            VectorizerParams::VectorizationInterleave : 1);
+  // The number of iterations to be vectorized and unrolled.
+  unsigned NumIter = std::max(ForcedFactor * ForcedUnroll, 2U);
+
----------------
anemet wrote:
> Please split out of this part of the change -- a cleanup, and commit it separately (and upate this patch for easier read)
Hi Adma,

I Agree. I'll do that when commiting the patch.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:785-787
@@ +784,5 @@
+  unsigned Stride = std::abs(StrideAPtr);
+  // Safe when either one below is true
+  //     Distance < Stride * TypeByteSize
+  //     Distance >= TypeByteSize * NumIter * Stride
+  if (!(Distance < Stride * TypeByteSize ||
----------------
anemet wrote:
> I think that these two cases are quite different.  The first is essentially proving that accesses like A[2*i] and A[2*i + 1] are independent.  This is true in general not just for vectorization.  Thus we should be returning NoDep rather than BackwardVectorizable.
> 
> Please also add more comments/examples, it took me a long time to figure this out (hopefully I did at the end).  Please also add LAA tests.
Hi Adam,

You are right. I've updated a new patch refactored this logic. I added a lot of new test cases as well.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:795-806
@@ -786,16 +794,14 @@
 
-  // The distance must be bigger than the size needed for a vectorized version
-  // of the operation and the size of the vectorized operation must not be
-  // bigger than the currrent maximum size.
-  if (Distance < 2*TypeByteSize ||
-      2*TypeByteSize > MaxSafeDepDistBytes ||
-      Distance < TypeByteSize * ForcedUnroll * ForcedFactor) {
-    DEBUG(dbgs() << "LAA: Failure because of Positive distance "
-        << Val.getSExtValue() << '\n');
+  // Safe when positive distance is not greater than the max safe distance.
+  if (Distance > MaxSafeDepDistBytes) {
+    DEBUG(dbgs() << "LAA: Failure because positive distance "
+                 << Val.getSExtValue() << " is greater than max safe distance "
+                 << MaxSafeDepDistBytes << "\n");
     return Dependence::Backward;
   }
 
-  // Positive distance bigger than max vectorization factor.
-  MaxSafeDepDistBytes = Distance < MaxSafeDepDistBytes ?
-    Distance : MaxSafeDepDistBytes;
+  // If Distance < Stride * TypeByteSize, it is always safe. just keep the
+  // current MaxSafeDepDistBytes. Otherwise, update the MaxSafeDepDistBytes.
+  if (Distance >= Stride * TypeByteSize)
+    MaxSafeDepDistBytes = Distance;
 
----------------
anemet wrote:
> I am not sure I follow why you change the structure of the MaxSafeDepDistBytes logic here.  Why aren't you simply comparing MaxSafeDepDistBytes with TypeByteSize * NumIter * Stride?
> 
> Looks like this will modify the existing behavior for stride=1 which is probably not what you intend.
I think preivous logic is not correct. If "Distance > MaxSafeDepDistBytes", it is not safe to do vectorization.

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/