[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Hao Liu Hao.Liu at arm.com
Tue May 12 05:35:10 PDT 2015


Hi Renato,

> Also, do we have those patterns recognised in the back end as VLDn / VSTn? If so, would be good to emit ASM in the arch-specific tests and check for them. If not, we should add tests here to that effect when we do have them.


Yes, currently there is no such patterns in trunk. But I have a patch to match the generated patterns into VLDn/VSTn. As it depends on this patch. I'll send it out after this patch.


================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:772
@@ +771,3 @@
+  // If one passed-in parameter is not forced, notice that the number is 0. Then
+  // we need to adjust it.
+  unsigned NumIter = ForcedFactor * ForcedUnroll;
----------------
rengolin wrote:
> I think you made this code more complicated for no reason.
> 
> Just adding...
> 
>     unsigned NumIter = std::max(ForcedFactor * ForcedUnroll, 2);
> 
> after the original lines would do what you want.
You are right. I get your point. Thanks!

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:687
@@ +686,3 @@
+    bool IsLoad;
+    unsigned Align;
+    // To avoid breaking the dependence, the new vectorized interleaved access
----------------
rengolin wrote:
> Since this is a struct, there's no way to guarantee that any of these members will be used after initialization. Please modify the constructors to initialise all members with their default valures (0, 1, nullptr, etc).
Agree.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1903
@@ +1902,3 @@
+    NewPtr =
+        Builder.CreateGEP(NewPtr, Builder.getInt32(-static_cast<int>(Idx)));
+
----------------
rengolin wrote:
> Why -Idx?
The wide vector load/store uses the address equal to the access of index 0.
E.g. If we have two interleaved loads:
    load A[i+1]  // index 1 (insert position)
    load A[i]      // index 0
We need to use the address of A[i] to load: {A[i], A[i+1], A[i+2], A[i+3], ...}
So the current pointer for A[i+1] needs to be sub by 1.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1955
@@ +1954,3 @@
+    Value *IVec = Builder.CreateShuffleVector(
+        WideVec, UndefValue::get(WideVecTy), IMask, "interleaved.vec");
+
----------------
rengolin wrote:
> on loads, it's called "strided.vec", on stores "interleaved.vec". Any reason to call them differently?
I think the loaded vectors are separately:
   V0:  A[i], A[i+2], A[i+4], ...
   V1:  A[i+1], A[i+3], A[i+5], ...
To call V0/V1 "interleaved.vec" sounds not properly. Each is "strided" vector.

But the vector to be stored is actually interleaved from several vectors:
   V0[0], V1[0], V0[1], V1[1], ...
So I think it's reasonable to call it "interleaved" vector.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3758
@@ +3757,3 @@
+
+  SetVector<Instruction *> Heads;
+  SetVector<Instruction *> Tails;
----------------
rengolin wrote:
> An explanation of what heads and tails are would be good. If I read correctly, it's not just the first and last accesses, but every first/last access in the list of consecutive pairs. So, in your example:
> 
>     A[i]
>     A[i+1]
>     A[i+2]
> 
> Heads:
> 
>     A[i], A[i+1]
> 
> Respective tails:
> 
>     A[i+1], A[i+2]
> 
> right?
Yes, exactly.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3787
@@ +3786,3 @@
+
+  for (auto &I : Heads) {
+    if (Tails.count(I))
----------------
rengolin wrote:
> I'd call this auto variable "Head", so you can use I for the load/store loops at the end.
Reasonable.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3788
@@ +3787,3 @@
+  for (auto &I : Heads) {
+    if (Tails.count(I))
+      continue;
----------------
rengolin wrote:
> Haven't you checked for this already up there?
Not checked yet. A Head may also be a Tail of another Head, so it is not the first in the consecutive chain. This is to find the real Head in the top of the chain.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3793
@@ +3792,3 @@
+    // Collect a chain with consecutive accesses.
+    auto Inst = I;
+    while (Tails.count(Inst) || Heads.count(Inst)) {
----------------
rengolin wrote:
> You don't seem to be using I or Inst any more, you could declare Inst inside the loop, like:
> 
>     for (auto Inst = I; Tails.count(Inst) || Heads.count(Inst); ) {
>       ...
>     }
> 
> However, I don't understand why you didn't just:
> 
>     for (auto Inst = I; ConsecutivePairs.count(Inst); ) {
>       ...
>     }
> 
Reasonable.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3803
@@ +3802,3 @@
+    unsigned IALen = std::abs(Stride);
+    if (ChainLen < IALen)
+      continue;
----------------
rengolin wrote:
> Shouldn't we guarantee that:
> 
>     ChainLen % IALen == 0
> 
> ?
I think it is unnecessary. The tailed nodes will be ignored as they are miss some members to form an interleaved access.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3881
@@ +3880,3 @@
+       ++BB) {
+    if (blockNeedsPredication(*BB))
+      continue;
----------------
rengolin wrote:
> delena wrote:
> > Why predicated block is not considered?
> > 
> >   for (i = 0; i < N; i+=2) {
> >    if (B[i]) {
> >          A[i] += 1;
> >          A[i+1] += 2;
> >     }
> >    }
> I think for the sake of simplicity in the first implementation. Maybe a TODO/FIXME comment would clarify better.
Agree.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3896
@@ +3895,3 @@
+      bool IsLoad = false;
+      if (StridedAccesses.count(I)) {
+        Base = getBasePointer(I, DL);
----------------
delena wrote:
> You can put Stride, Base and IsLoad in one structure.
Agree. Thanks.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3911
@@ +3910,3 @@
+        // and memory operation type.
+        if (CurBase == Base && CurStride == Stride && CurLoad == IsLoad) {
+          Queue.push_back(I);
----------------
delena wrote:
> You assume that you have only one strided access in bb. Why?
No. after analyze the current Queue, it will clear and keep on analyzing in BB.

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/






More information about the llvm-commits mailing list