[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Hao Liu Hao.Liu at arm.com
Sun May 31 23:56:02 PDT 2015


Refactored the patch according to Adam's comments.

I also found a bug that write after load to the same location like:

  A[i] = a;
  b = A[i];

is not looked as store-load forwarding currently. As this could affect the correctness, I fixed this with slight modification. 2 new test cases are added.

Review please.

Thanks,
-Hao


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:684-705
@@ +683,24 @@
+
+  // (1) If the scaled distance is less than the stride, no dependence. It's
+  // still considered as no dependence if the distance is 0.
+  // E.g.
+  //      for (i = 0; i < 1024 ; i += 4)
+  //        A[i+2] = A[i] + 1;
+  //
+  // Two accesses in memory (scaled distance is 2, stride is 4):
+  //     | A[0] |      |      |      | A[4] |      |      |      |
+  //     |      |      | A[2] |      |      |      | A[6] |      |
+  //
+  // (2) Otherwise, no dependence if the greatest common divisor is 1.
+  // E.g.
+  //      for (i = 0; i < 1024 ; i += 3)
+  //        A[i+4] = A[i] + 1;
+  //
+  // Two accesses in memory (scaled distance is 4, stride is 3):
+  //     | A[0] |      |      | A[3] |      |      | A[6] |      |      |
+  //     |      |      |      |      | A[4] |      |      | A[7] |      |
+  if (ScaledDistance < Stride)
+    return true;
+  else
+    return GreatestCommonDivisor64(ScaledDistance, Stride) == 1;
+}
----------------
anemet wrote:
> I don't think this is GCD.  Consider the values Stride=4 and ScaledDistance=6:
> 
> 4 * i = 4 * j + 6 can not be valid for any integer values of i and j.
> 
> I think the Diophantine equation is this: ScaledDistance/Stride = (i - j) where i - j has to be integer.
> 
> So the two accesses can only reference the same element if ScaledDistance is a multiple of Stride.
Ah, not GCD. 
You are right!

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:713
@@ +712,3 @@
+/// \returns true if they are independent.
+static bool StrideAccessesIndependent(unsigned Distance, unsigned Stride,
+                                      unsigned TypeByteSize) {
----------------
anemet wrote:
> areStridedAccessIndependent is the conforming name (funtions are lower-case).  Also not the the 'd' at the end of Strided.  Same for the previous function. 
Agree.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:722-740
@@ +721,21 @@
+
+  // If the distance is not multiple of size, we couldn't get a scaled distance.
+  // Consider about two adjacent scaled distances.
+  //
+  // E.g. Assume one char is 1 byte in memory and one int is 4 bytes.
+  //      foo(int *A) {
+  //        int *B = (int *)((char *)A + 5);
+  //        for (i = 0 ; i < 1024 ; i += 2)
+  //          B[i] = A[i] + 1;
+  //      }
+  //
+  // Two accesses in memory (distance is 5 in bytes, stride is 2):
+  //     | A[0] |      | A[2] |      | A[4] |      | A[6] |      |
+  //              | B[0] |      | B[2] |      | B[4] |      | B[6] |
+  //
+  // The distance is 5, we check two adjacent scaled distances: 1 and 2.
+  // In this case it is independent when the scaled distance is 1 but it is
+  // dependent when the scaled distance is 2.
+  return StrideAccessesIndependentOnScaledDist(ScaledDistance, Stride) &&
+         StrideAccessesIndependentOnScaledDist(ScaledDistance + 1, Stride);
+}
----------------
anemet wrote:
> If you want to handle this case I prefer we come back to it later as a separate LAA-only patch.  Let's focus on the basics here that's needed for LV to handle the basic cases. 
Agree. Also this is rare case. I think it won't affect about the performance.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:855
@@ +854,3 @@
+
+  // The minimum size needed for a vectroized/unrolled version. Vectorizing one
+  // iteration in front needs TypeByteSize * Stride. Vectorizing the last
----------------
anemet wrote:
> vectorized is misspelled
Fixed.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:858
@@ +857,3 @@
+  // iteration needs TypeByteSize (No need to plus the last gap size).
+  unsigned MinSizeNeeded =
+      TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
----------------
HaoLiu wrote:
> anemet wrote:
> > anemet wrote:
> > > I think this is correct but I wonder if the example was less contrived if you used:
> > > 
> > >   for (i = 0; i < 1024; i+= 3)
> > >     A[i + 4] = A[i] + 1
> > > 
> > > MinSizeNeeded is 4 * 3 * (2 - 1) + 4 = 16 which is equal to the distance.
> > > 
> > > Also a nit: most or all of this comment is explaining why MinSizeNeeded is computed the way it is so the comment should be before the computation.
> > MinDistanceNeeded is probably a better name.
> This case is vectorizable as MinDistanceNeeded is exactly equal to the distance. If the distance is smaller (even 1 byte smaller), it can not be vectorizable.
> 
> Actually we already has a similar case in memdep.ll:
>    for (i = 0; i < 1024; ++i)
>        A[i+2] = A[i] + 1;
> The MinDistanceNeeded is 2*4 = 8. The distance is also 8.
Fixed

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:858-883
@@ +857,28 @@
+  // iteration needs TypeByteSize (No need to plus the last gap size).
+  unsigned MinSizeNeeded =
+      TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
+
+  // It's not vectorizable if the distance is smaller than the minimum size
+  // needed for a vectroized/unrolled version.
+  //
+  // E.g. Assume one char is 1 byte in memory and one int is 4 bytes.
+  //      foo(int *A) {
+  //        int *B = (int *)((char *)A + 14);
+  //        for (i = 0 ; i < 1024 ; i += 2)
+  //          B[i] = A[i] + 1;
+  //      }
+  //
+  // Two accesses in memory (stride is 2):
+  //     | A[0] |      | A[2] |      | A[4] |      | A[6] |      |
+  //                              | B[0] |      | B[2] |      | B[4] |
+  //
+  // The distance is 14 in bytes from B[i] to A[i].
+  // The minimum size needed is: 4 * 2 * (MinNumIter - 1) + 4.
+  //
+  // If the minimum number of iterations is 2, it is
+  // vectorizable as the minimum size needed is 12 which is less than distance.
+  //
+  // If the minimum number of iterations is 4 (Say if a user forces the
+  // vectorization factor to be 4), the minimum size needed is 28 which is
+  // greater than distance. It is not safe.
+  if (MinSizeNeeded > Distance) {
----------------
anemet wrote:
> anemet wrote:
> > I think this is correct but I wonder if the example was less contrived if you used:
> > 
> >   for (i = 0; i < 1024; i+= 3)
> >     A[i + 4] = A[i] + 1
> > 
> > MinSizeNeeded is 4 * 3 * (2 - 1) + 4 = 16 which is equal to the distance.
> > 
> > Also a nit: most or all of this comment is explaining why MinSizeNeeded is computed the way it is so the comment should be before the computation.
> MinDistanceNeeded is probably a better name.
This case is vectorizable as MinDistanceNeeded is exactly equal to the distance. If the distance is smaller (even 1 byte smaller), it can not be vectorizable.

Actually we already has a similar case in memdep.ll:
   for (i = 0; i < 1024; ++i)
       A[i+2] = A[i] + 1;
The MinDistanceNeeded is 2*4 = 8. The distance is also 8.

================
Comment at: test/Analysis/LoopAccessAnalysis/analyze-stride-access-depedences.ll:216-228
@@ +215,15 @@
+
+; Following cases check that strided accesses can be vectorized.
+
+; void vectorizable_Read_Write(int *A) {
+;   int *B = A + 4;
+;   for (unsigned i = 0; i < 1024; i+=2)
+;     B[i] = A[i] + 1;
+; }
+
+; CHECK: function 'vectorizable_Read_Write':
+; CHECK:     Interesting Dependences:
+; CHECK:       BackwardVectorizable:
+; CHECK:           %0 = load i32, i32* %arrayidx, align 4 -> 
+; CHECK:           store i32 %add, i32* %arrayidx2, align 4
+
----------------
anemet wrote:
> As a follow-on patch, it would probably be a good idea to print MaxSafeDepDistance in LAI::analyze and check its value here.
MaxSafeDepDistance can be printed out in "-debug-only". I don't know how to print it properly. Also as my comments on MaxSafeDepDistance say, I think we should refactor MaxSafeDepDistance as it could prevent some vectorization opportunities.

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list