[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access
Adam Nemet
anemet at apple.com
Fri May 29 12:53:47 PDT 2015
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:684-705
@@ +683,24 @@
+
+ // (1) If the scaled distance is less than the stride, no dependence. It's
+ // still considered as no dependence if the distance is 0.
+ // E.g.
+ // for (i = 0; i < 1024 ; i += 4)
+ // A[i+2] = A[i] + 1;
+ //
+ // Two accesses in memory (scaled distance is 2, stride is 4):
+ // | A[0] | | | | A[4] | | | |
+ // | | | A[2] | | | | A[6] | |
+ //
+ // (2) Otherwise, no dependence if the greatest common divisor is 1.
+ // E.g.
+ // for (i = 0; i < 1024 ; i += 3)
+ // A[i+4] = A[i] + 1;
+ //
+ // Two accesses in memory (scaled distance is 4, stride is 3):
+ // | A[0] | | | A[3] | | | A[6] | | |
+ // | | | | | A[4] | | | A[7] | |
+ if (ScaledDistance < Stride)
+ return true;
+ else
+ return GreatestCommonDivisor64(ScaledDistance, Stride) == 1;
+}
----------------
I don't think this is GCD. Consider the values Stride=4 and ScaledDistance=6:
4 * i = 4 * j + 6 can not be valid for any integer values of i and j.
I think the Diophantine equation is this: ScaledDistance/Stride = (i - j) where i - j has to be integer.
So the two accesses can only reference the same element if ScaledDistance is a multiple of Stride.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:713
@@ +712,3 @@
+/// \returns true if they are independent.
+static bool StrideAccessesIndependent(unsigned Distance, unsigned Stride,
+ unsigned TypeByteSize) {
----------------
areStridedAccessIndependent is the conforming name (funtions are lower-case). Also not the the 'd' at the end of Strided. Same for the previous function.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:722-740
@@ +721,21 @@
+
+ // If the distance is not multiple of size, we couldn't get a scaled distance.
+ // Consider about two adjacent scaled distances.
+ //
+ // E.g. Assume one char is 1 byte in memory and one int is 4 bytes.
+ // foo(int *A) {
+ // int *B = (int *)((char *)A + 5);
+ // for (i = 0 ; i < 1024 ; i += 2)
+ // B[i] = A[i] + 1;
+ // }
+ //
+ // Two accesses in memory (distance is 5 in bytes, stride is 2):
+ // | A[0] | | A[2] | | A[4] | | A[6] | |
+ // | B[0] | | B[2] | | B[4] | | B[6] |
+ //
+ // The distance is 5, we check two adjacent scaled distances: 1 and 2.
+ // In this case it is independent when the scaled distance is 1 but it is
+ // dependent when the scaled distance is 2.
+ return StrideAccessesIndependentOnScaledDist(ScaledDistance, Stride) &&
+ StrideAccessesIndependentOnScaledDist(ScaledDistance + 1, Stride);
+}
----------------
If you want to handle this case I prefer we come back to it later as a separate LAA-only patch. Let's focus on the basics here that's needed for LV to handle the basic cases.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:855
@@ +854,3 @@
+
+ // The minimum size needed for a vectroized/unrolled version. Vectorizing one
+ // iteration in front needs TypeByteSize * Stride. Vectorizing the last
----------------
vectorized is misspelled
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:858
@@ +857,3 @@
+ // iteration needs TypeByteSize (No need to plus the last gap size).
+ unsigned MinSizeNeeded =
+ TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
----------------
anemet wrote:
> I think this is correct but I wonder if the example was less contrived if you used:
>
> for (i = 0; i < 1024; i+= 3)
> A[i + 4] = A[i] + 1
>
> MinSizeNeeded is 4 * 3 * (2 - 1) + 4 = 16 which is equal to the distance.
>
> Also a nit: most or all of this comment is explaining why MinSizeNeeded is computed the way it is so the comment should be before the computation.
MinDistanceNeeded is probably a better name.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:858-883
@@ +857,28 @@
+ // iteration needs TypeByteSize (No need to plus the last gap size).
+ unsigned MinSizeNeeded =
+ TypeByteSize * Stride * (MinNumIter - 1) + TypeByteSize;
+
+ // It's not vectorizable if the distance is smaller than the minimum size
+ // needed for a vectroized/unrolled version.
+ //
+ // E.g. Assume one char is 1 byte in memory and one int is 4 bytes.
+ // foo(int *A) {
+ // int *B = (int *)((char *)A + 14);
+ // for (i = 0 ; i < 1024 ; i += 2)
+ // B[i] = A[i] + 1;
+ // }
+ //
+ // Two accesses in memory (stride is 2):
+ // | A[0] | | A[2] | | A[4] | | A[6] | |
+ // | B[0] | | B[2] | | B[4] |
+ //
+ // The distance is 14 in bytes from B[i] to A[i].
+ // The minimum size needed is: 4 * 2 * (MinNumIter - 1) + 4.
+ //
+ // If the minimum number of iterations is 2, it is
+ // vectorizable as the minimum size needed is 12 which is less than distance.
+ //
+ // If the minimum number of iterations is 4 (Say if a user forces the
+ // vectorization factor to be 4), the minimum size needed is 28 which is
+ // greater than distance. It is not safe.
+ if (MinSizeNeeded > Distance) {
----------------
I think this is correct but I wonder if the example was less contrived if you used:
for (i = 0; i < 1024; i+= 3)
A[i + 4] = A[i] + 1
MinSizeNeeded is 4 * 3 * (2 - 1) + 4 = 16 which is equal to the distance.
Also a nit: most or all of this comment is explaining why MinSizeNeeded is computed the way it is so the comment should be before the computation.
================
Comment at: test/Analysis/LoopAccessAnalysis/analyze-stride-access-depedences.ll:216-228
@@ +215,15 @@
+
+; Following cases check that strided accesses can be vectorized.
+
+; void vectorizable_Read_Write(int *A) {
+; int *B = A + 4;
+; for (unsigned i = 0; i < 1024; i+=2)
+; B[i] = A[i] + 1;
+; }
+
+; CHECK: function 'vectorizable_Read_Write':
+; CHECK: Interesting Dependences:
+; CHECK: BackwardVectorizable:
+; CHECK: %0 = load i32, i32* %arrayidx, align 4 ->
+; CHECK: store i32 %add, i32* %arrayidx2, align 4
+
----------------
As a follow-on patch, it would probably be a good idea to print MaxSafeDepDistance in LAI::analyze and check its value here.
================
Comment at: test/Analysis/LoopAccessAnalysis/analyze-stride-access-depedences.ll:224-228
@@ +223,7 @@
+
+; CHECK: function 'vectorizable_Read_Write':
+; CHECK: Interesting Dependences:
+; CHECK: BackwardVectorizable:
+; CHECK: %0 = load i32, i32* %arrayidx, align 4 ->
+; CHECK: store i32 %add, i32* %arrayidx2, align 4
+
----------------
Use CHECK-NEXTs where you can. Same later.
================
Comment at: test/Analysis/LoopAccessAnalysis/analyze-stride-access-depedences.ll:331
@@ +330,3 @@
+
+; FIXME: This case should like @vectorizable_Read_Write to be vectorizable.
+; CHECK: function 'vectorizable_unscaled_Read_Write':
----------------
should *look* like?
http://reviews.llvm.org/D9368
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list