[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Hao Liu Hao.Liu at arm.com
Thu Jun 4 22:52:46 PDT 2015


I've updated the patch according to Michael's comments. My comments are inline.

Review please.

Thanks,
-Hao


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:838-839
@@ +837,4 @@
+  // Two accesses in memory (stride is 2):
+  //     | A[0] |      | A[2] |      | A[4] |      | A[6] |      |
+  //                              | B[0] |      | B[2] |      | B[4] |
+  //
----------------
mzolotukhin wrote:
> Something went wrong with the indentation here.
No, The indentation is intended here to show the distance and overlap between the accesses to array A and B.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:667
@@ +666,3 @@
+private:
+  unsigned Delta; // Interleave Factor.
+  bool Reverse;
----------------
mzolotukhin wrote:
> Could we rename `Delta` to `InterleaveFactor` or something like this. When roaming through the code it'd be much easier to understand what it stands for - especially when the code is in-tree and one doesn't see the context of the current patch.
Reasonable. Renamed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:690
@@ +689,3 @@
+///
+/// Call this class to analyze interleaved accesses only when we can vectorize
+/// a loop. Otherwise it's meaningless to do analysis as the vectorization
----------------
mzolotukhin wrote:
> s/Call this class/Use this class/
Fixed.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4455
@@ +4454,3 @@
+    if (std::abs(Stride) < 2 ||
+        std::abs(Stride) > static_cast<int>(MaxInterleaveStride))
+      continue;
----------------
mzolotukhin wrote:
> static_cast isn't needed here.
std::abs() returns a signed int. Comparing signed to unsigned will cause a warning from the compiler.

To avoid static_cast, I use a temperaral value for "std::abs(Stride)" in the new patch.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4552-4556
@@ +4551,7 @@
+
+      // Previous dependence check guarantees no dependence between two accesses
+      // if the distance is not multiple of the size. So just ignore such cases
+      // and won't worry about dependences.
+      if (DistanceToA % static_cast<int>(DesA.Size))
+        continue;
+
----------------
mzolotukhin wrote:
> What is the previous check you are referring to? And if it's guaranteed, could this `if` be replaced with an assertion?
It can not be assertion. This comment is hard to be understood. I've replaced it with another comment, which describe the thing we do right here.

A case @interleaved_stores in stride-access-dependence.ll checks for this situation. So no need to clarify this again.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4560
@@ +4559,3 @@
+      int IndexB =
+          Group->getIndex(A) + DistanceToA / static_cast<int>(DesA.Size);
+
----------------
mzolotukhin wrote:
> `static_cast` isn't necessary here (and if we want to cast everything  `getIndex` should also be casted from `unsigned`).
No, the division between signed and unsigned can not be casted.
I tested with a small case:
    int a = -4;
    unsigned b = 4;
    int c = a / b;
The result c is a very large numer (1073741823). Modulo is simialr, I once fixed a bug about "signed % unsigned" (didn't cast unsigned data) in SeparateConstOffsetFromGEP.cpp.

This is different from ADD/SUB.

================
Comment at: test/Transforms/LoopVectorize/interleaved-accesses.ll:1
@@ +1,2 @@
+; RUN: opt -S -loop-vectorize -instcombine -force-vector-width=4 -force-vector-interleave=1 -enable-interleaved-mem-accesses=true -runtime-memory-check-threshold=24 < %s | FileCheck %s
+
----------------
mzolotukhin wrote:
> Which of the tests needs `-runtime-memory-check-threshold=24`? Could we get rid of this flag by adding `noalias` attribute, or aliasing metadata?
We couldn't. Still too many runtime checks for accesses related to one pointer (Even though they are noalis).
E.g. 
       for( i = 0; i < 1024; i+=3) {
         A[i] += a;
         A[i+1] += b;
         A[i+2] += c
       }
It has 3 loads and 3 stores, which need 11 runtime checks (greater than the current threshold 8). The 3 stores will be compared with each other (3 checks), and each load will be compared with the 3 stores (3 * 3 checks). 

This could be improved in the future. No need to compare accesses in the same group. But currently, this is the work around to test.

================
Comment at: test/Transforms/LoopVectorize/interleaved-accesses.ll:39
@@ +38,3 @@
+  %arrayidx0 = getelementptr inbounds [1024 x i32], [1024 x i32]* @AB, i64 0, i64 %indvars.iv
+  %0 = load i32, i32* %arrayidx0, align 4
+  %1 = or i64 %indvars.iv, 1
----------------
mzolotukhin wrote:
> Please run the tests through `opt -instnamer` to replace %[0-9]+ names - they are really hard to deal with when one need to modify the test.
Renamed anonymous names. This is really a helpful command.

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list