[PATCH] [LoopVectorize] Teach Loop Vectorizer about interleaved memory access

Hao Liu Hao.Liu at arm.com
Wed May 13 03:47:13 PDT 2015


I've attached a patch refactored according to the comments from Elena, Renato and Michael.

Thanks,
-Hao


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1844
@@ +1843,3 @@
+
+// E.g. Translate following interleaved loads (VF is 4):
+// for (i = 0; i < N; i+=3) {
----------------
mzolotukhin wrote:
> I'd suggest starting the comment by describing what the function does, not with an example. Also, please add some comments about what is passed in the arguments. For instance, it's not obvious at first glance what `Ptr` and `Instr` are.
> 
> Also, if `Ptr` is always `Instr->getPointerOperand()` (`Instr` being a `LoadInst` or `StoreInst`), is there any sense in passing it along with `Instr`?
> 
> The same comment applies to `VecTy` - I'd rather compute it one more time than introduce a new argument to the function.
> 
Reasonable.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1853
@@ +1852,3 @@
+// To:
+//   %wide.vec = load <12 x i32>, <12 x i32>* %ptr           ; read R,G,B
+//   %R.vec = shufflevector %wide.vec, undef, <0, 3, 6, 9>
----------------
mzolotukhin wrote:
> It's not 'read R,G,B', it's 'read 4 tuples of R,G,B'. I realize that it might be clear from the code though.
Yes, that's clearer.
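
To illustrate how the strided shuffle masks in that comment follow from the interleave factor, here is a minimal standalone sketch. It is not code from the patch: `makeStridedMask` and its parameters are hypothetical names used only for this example, and it models the mask as a plain `std::vector<int>` rather than an LLVM constant.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the patch): builds the shuffle mask that
// selects member `Index` from each of the VF tuples stored in a wide vector
// of VF * Factor interleaved elements.
std::vector<int> makeStridedMask(unsigned Index, unsigned Factor, unsigned VF) {
  assert(Index < Factor && "member index must lie within one tuple");
  std::vector<int> Mask;
  Mask.reserve(VF);
  for (unsigned I = 0; I < VF; ++I)
    Mask.push_back(static_cast<int>(Index + I * Factor));
  return Mask;
}
```

With Factor = 3 and VF = 4, index 0 gives <0, 3, 6, 9> (the R mask in the comment), and by the same rule index 1 gives <1, 4, 7, 10> and index 2 gives <2, 5, 8, 11>.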

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:1903
@@ +1902,3 @@
+    NewPtr =
+        Builder.CreateGEP(NewPtr, Builder.getInt32(-static_cast<int>(Idx)));
+
----------------
mzolotukhin wrote:
> HaoLiu wrote:
> > rengolin wrote:
> > > Why -Idx?
> > The wide vector load/store uses the address of the access with index 0.
> > E.g. If we have two interleaved loads:
> >     load A[i+1]  // index 1 (insert position)
> >     load A[i]    // index 0
> > We need to use the address of A[i] to load: {A[i], A[i+1], A[i+2], A[i+3], ...}
> > So the current pointer for A[i+1] needs to be decremented by 1.
> What if we have
> ```
> load A[i]
> load A[i+1]
> ```
> or
> ```
> load A[i+2]
> load A[i+1]
> load A[i]
> ```
> ?
For
    load A[i]
    load A[i+1]
The insert position (i.e. the first load) has index 0, so no adjustment is needed.

For
    load A[i+2]
    load A[i+1]
    load A[i]
Since the insert position is "load A[i+2]" (which has index 2), the pointer needs to be adjusted to the address of "A[i]" (i.e. by an offset of "-2").

I'll add more comments here.
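
As a rough illustration of the adjustment, here is a plain C++ model using ordinary pointer arithmetic instead of `Builder.CreateGEP`. It is only a sketch under that assumption, not the patch code; `groupBaseAddress` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model (not LLVM API): given the pointer used by the access at
// the insert position and that access's index within the interleave group,
// return the address of the index-0 member, which is where the wide
// load/store has to start.
const int *groupBaseAddress(const int *InsertPosPtr, unsigned Idx) {
  assert(InsertPosPtr != nullptr && "expected a valid element pointer");
  return InsertPosPtr - static_cast<std::ptrdiff_t>(Idx);
}
```

For the second case above, the insert position is A[i+2] with index 2, so the model returns &A[i]; for the first case the index is 0 and the pointer is returned unchanged.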

http://reviews.llvm.org/D9368
