# [PATCH] [LoopVectorize] Teach Loop Vectorizer about interleaved memory access

Thu Jun 4 00:59:40 PDT 2015

```
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:858-883
@@ +857,28 @@
+  // iteration needs TypeByteSize (no need to add the last gap size).
+  unsigned MinSizeNeeded =
+      TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
+
+  // It's not vectorizable if the distance is smaller than the minimum size
+  // needed for a vectorized/unrolled version.
+  //
+  // E.g. Assume one char is 1 byte in memory and one int is 4 bytes.
+  //      foo(int *A) {
+  //        int *B = (int *)((char *)A + 14);
+  //        for (i = 0 ; i < 1024 ; i += 2)
+  //          B[i] = A[i] + 1;
+  //      }
+  //
+  // Two accesses in memory (stride is 2):
+  //     | A[0] |      | A[2] |      | A[4] |      | A[6] |      |
+  //                              | B[0] |      | B[2] |      | B[4] |
+  //
+  // The distance is 14 in bytes from B[i] to A[i].
+  // The minimum size needed is: 4 * 2 * (MinNumIter - 1) + 4.
+  //
+  // If the minimum number of iterations is 2, it is
+  // vectorizable as the minimum size needed is 12 which is less than distance.
+  //
+  // If the minimum number of iterations is 4 (Say if a user forces the
+  // vectorization factor to be 4), the minimum size needed is 28 which is
+  // greater than distance. It is not safe.
+  if (MinSizeNeeded > Distance) {
----------------
HaoLiu wrote:
> HaoLiu wrote:
> > anemet wrote:
> > > anemet wrote:
> > > > I think this is correct, but I wonder if the example would be less contrived if you used:
> > > >
> > > >   for (i = 0; i < 1024; i+= 3)
> > > >     A[i + 4] = A[i] + 1
> > > >
> > > > MinSizeNeeded is 4 * 3 * (2 - 1) + 4 = 16 which is equal to the distance.
> > > >
> > > > Also a nit: most or all of this comment explains why MinSizeNeeded is computed the way it is, so the comment should come before the computation.
> > > MinDistanceNeeded is probably a better name.
> > This case is vectorizable as MinDistanceNeeded is exactly equal to the distance. If the distance is smaller (even 1 byte smaller), it cannot be vectorized.
> >
> > Actually, we already have a similar case in memdep.ll:
> >    for (i = 0; i < 1024; ++i)
> >        A[i+2] = A[i] + 1;
> > The MinDistanceNeeded is 2*4 = 8. The distance is also 8.
> Fixed
I didn't quite understand what you were saying here, but it looks like you didn't change the comment, so I guess you disagree that my example is better.  Your example is vectorizable if MinNumIter is 2, and so is mine, so I don't understand your reply.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:693-714
@@ +692,24 @@
+
+  // (1) If the scaled distance is less than the stride.
+  // E.g.
+  //      for (i = 0; i < 1024 ; i += 4)
+  //        A[i+2] = A[i] + 1;
+  //
+  // Two accesses in memory (scaled distance is 2, stride is 4):
+  //     | A[0] |      |      |      | A[4] |      |      |      |
+  //     |      |      | A[2] |      |      |      | A[6] |      |
+  //
+  // (2) Otherwise, no dependence if the scaled distance is not multiple of
+  //     the stride.
+  // E.g.
+  //      for (i = 0; i < 1024 ; i += 3)
+  //        A[i+4] = A[i] + 1;
+  //
+  // Two accesses in memory (scaled distance is 4, stride is 3):
+  //     | A[0] |      |      | A[3] |      |      | A[6] |      |      |
+  //     |      |      |      |      | A[4] |      |      | A[7] |      |
+  if (ScaledDist < Stride)
+    return true;
+  else
+    return ScaledDist % Stride;
+}
----------------
Now that I look at this again, the second case covers the first as well (i.e. it is a superset), so there is room to simplify this.

================
Comment at: test/Analysis/LoopAccessAnalysis/stride-access-dependence.ll:343-344
@@ +342,4 @@
+
+; FIXME: This case looks like previous case @vectorizable_Read_Write. It sould
+; to be vectorizable.
+
----------------
"It should be vectorizable."

http://reviews.llvm.org/D9368


```
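The arithmetic discussed in this thread can be checked with a small standalone sketch. This is not the code from the patch: the helper names `minDistanceNeeded`, `isDistanceSafe`, `stridedAccessesIndependent`, and `checkReviewExamples` are hypothetical, and the simplified independence check assumes the scaled distance is non-zero, which is what makes the second case in the quoted hunk a superset of the first.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name not taken from the patch). Mirrors the formula
// quoted in the review:
//   TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize
// i.e. the bytes spanned by MinNumIter strided accesses, without the gap
// that would follow the last access.
constexpr uint64_t minDistanceNeeded(uint64_t TypeByteSize, uint64_t Stride,
                                     uint64_t MinNumIter) {
  return TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
}

// The quoted hunk rejects the loop when MinSizeNeeded > Distance, so a
// distance exactly equal to the minimum is still treated as safe.
constexpr bool isDistanceSafe(uint64_t Distance, uint64_t TypeByteSize,
                              uint64_t Stride, uint64_t MinNumIter) {
  return minDistanceNeeded(TypeByteSize, Stride, MinNumIter) <= Distance;
}

// Simplified form of the two-case check, ASSUMING ScaledDist > 0.
// If 0 < ScaledDist < Stride, then ScaledDist % Stride == ScaledDist != 0,
// so the modulo test alone covers case (1) as well as case (2).
constexpr bool stridedAccessesIndependent(uint64_t ScaledDist,
                                          uint64_t Stride) {
  return ScaledDist % Stride != 0;
}

// Replays the numbers quoted in the thread.
inline void checkReviewExamples() {
  // Patch comment: int elements, stride 2, distance 14 bytes.
  assert(minDistanceNeeded(4, 2, 2) == 12);
  assert(isDistanceSafe(14, 4, 2, 2));   // 12 <= 14
  assert(minDistanceNeeded(4, 2, 4) == 28);
  assert(!isDistanceSafe(14, 4, 2, 4));  // 28 > 14

  // Reviewer's example: A[i + 4] = A[i] + 1 with i += 3, distance 16 bytes.
  assert(minDistanceNeeded(4, 3, 2) == 16);
  assert(isDistanceSafe(16, 4, 3, 2));   // equal, still safe

  // memdep.ll-style example: A[i+2] = A[i] + 1 with ++i, distance 8 bytes.
  assert(minDistanceNeeded(4, 1, 2) == 8);
  assert(isDistanceSafe(8, 4, 1, 2));

  // Strided independence: (scaled distance 2, stride 4) and (4, 3).
  assert(stridedAccessesIndependent(2, 4));
  assert(stridedAccessesIndependent(4, 3));
}
```

Calling `checkReviewExamples()` from any test driver replays the worked examples. A scaled distance that is a multiple of the stride, such as 4 with stride 2, makes `stridedAccessesIndependent` return false, which is the case where the accesses can touch the same elements.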