[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access
Hao Liu
Hao.Liu at arm.com
Thu Jun 4 22:52:46 PDT 2015
I've updated the patch according to Michael's comments. My comments are inline.
Review please.
Thanks,
-Hao
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:838-839
@@ +837,4 @@
+ // Two accesses in memory (stride is 2):
+ // | A[0] | | A[2] | | A[4] | | A[6] | |
+ // | B[0] | | B[2] | | B[4] |
+ //
----------------
mzolotukhin wrote:
> Something went wrong with the indentation here.
No, The indentation is intended here to show the distance and overlap between the accesses to array A and B.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:667
@@ +666,3 @@
+private:
+ unsigned Delta; // Interleave Factor.
+ bool Reverse;
----------------
mzolotukhin wrote:
> Could we rename `Delta` to `InterleaveFactor` or something like this. When roaming through the code it'd be much easier to understand what it stands for - especially when the code is in-tree and one doesn't see the context of the current patch.
Reasonable. Renamed.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:690
@@ +689,3 @@
+///
+/// Call this class to analyze interleaved accesses only when we can vectorize
+/// a loop. Otherwise it's meaningless to do analysis as the vectorization
----------------
mzolotukhin wrote:
> s/Call this class/Use this class/
Fixed.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4455
@@ +4454,3 @@
+ if (std::abs(Stride) < 2 ||
+ std::abs(Stride) > static_cast<int>(MaxInterleaveStride))
+ continue;
----------------
mzolotukhin wrote:
> static_cast isn't needed here.
std::abs() returns a signed int. Comparing signed to unsigned will cause a warning from the compiler.
To avoid static_cast, I use a temperaral value for "std::abs(Stride)" in the new patch.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4552-4556
@@ +4551,7 @@
+
+ // Previous dependence check guarantees no dependence between two accesses
+ // if the distance is not multiple of the size. So just ignore such cases
+ // and won't worry about dependences.
+ if (DistanceToA % static_cast<int>(DesA.Size))
+ continue;
+
----------------
mzolotukhin wrote:
> What is the previous check you are referring to? And if it's guaranteed, could this `if` be replaced with an assertion?
It can not be assertion. This comment is hard to be understood. I've replaced it with another comment, which describe the thing we do right here.
A case @interleaved_stores in stride-access-dependence.ll checks for this situation. So no need to clarify this again.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4560
@@ +4559,3 @@
+ int IndexB =
+ Group->getIndex(A) + DistanceToA / static_cast<int>(DesA.Size);
+
----------------
mzolotukhin wrote:
> `static_cast` isn't necessary here (and if we want to cast everything `getIndex` should also be casted from `unsigned`).
No, the division between signed and unsigned can not be casted.
I tested with a small case:
int a = -4;
unsigned b = 4;
int c = a / b;
The result c is a very large numer (1073741823). Modulo is simialr, I once fixed a bug about "signed % unsigned" (didn't cast unsigned data) in SeparateConstOffsetFromGEP.cpp.
This is different from ADD/SUB.
================
Comment at: test/Transforms/LoopVectorize/interleaved-accesses.ll:1
@@ +1,2 @@
+; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -runtime-memory-check-threshold=24 < %s | FileCheck %s
+
----------------
mzolotukhin wrote:
> Which of the tests needs `-runtime-memory-check-threshold=24`? Could we get rid of this flag by adding `noalias` attribute, or aliasing metadata?
We couldn't. Still too many runtime checks for accesses related to one pointer (Even though they are noalis).
E.g.
for( i = 0; i < 1024; i+=3) {
A[i] += a;
A[i+1] += b;
A[i+2] += c
}
It has 3 loads and 3 stores, which need 11 runtime checks (greater than the current threshold 8). The 3 stores will be compared with each other (3 checks), and each load will be compared with the 3 stores (3 * 3 checks).
This could be improved in the future. No need to compare accesses in the same group. But currently, this is the work around to test.
================
Comment at: test/Transforms/LoopVectorize/interleaved-accesses.ll:39
@@ +38,3 @@
+ %arrayidx0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %indvars.iv
+ %0 = load i32, i32* %arrayidx0, align 4
+ %1 = or i64 %indvars.iv, 1
----------------
mzolotukhin wrote:
> Please run the tests through `opt -instnamer` to replace %[0-9]+ names - they are really hard to deal with when one need to modify the test.
Renamed anonymous names. This is really a helpful command.
http://reviews.llvm.org/D9368
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list