[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Thu May 21 02:10:08 PDT 2015

In http://reviews.llvm.org/D9368#176086, @mzolotukhin wrote:

> Hi Hao,
>
> Thanks for updating the patch, another bunch of comments from me inline.
>
> Also, sorry for that I give the feedback in parts - it's hard to find enough time to review the entire patch all at once. I know that it might be a bit frustrating sometimes, but in fact I really appreciate your work and see how things converge:)
>
> Thanks,
> Michael


Hi Michael,

Thanks for your comments, which really help me a lot. I've updated a new patch accordingly.

Review please.

Thanks,
-Hao


================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:447-457
@@ -446,1 +446,13 @@
 
+  /// \return The cost of the interleaved memory operation.
+  /// \p Opcode is the memory operation code
+  /// \p VecTy is the vector type of the interleaved access.
+  /// \p Delta is the interleave factor
+  /// \p Indices is the indices for interleaved load members (as interleaved
+  ///    loads allow gaps)
+  unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
+                                      unsigned Delta,
+                                      ArrayRef<unsigned> Indices,
+                                      unsigned Alignment,
+                                      unsigned AddressSpace) const;
+
----------------
mzolotukhin wrote:
> Nitpick: `Alignment` and `AddressSpace` aren't documented.
Done.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:918-926
@@ -902,1 +917,11 @@
 
+struct StrideDescriptor {
+  StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size, unsigned Align)
+      : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
+
+  int Stride;
+  const SCEV *Scev;
+  unsigned Size;
+  unsigned Align;
+};
+
----------------
mzolotukhin wrote:
> Please add some comments on this struct and its members.
Done.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:961-962
@@ +960,4 @@
+
+  if (!StrideAccess.size())
+    return;
+
----------------
mzolotukhin wrote:
> Using `empty()` is more efficient than `size()==0`.
Good idea!

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1051-1052
@@ +1050,4 @@
+
+  for (auto BB = TheLoop->block_begin(), BE = TheLoop->block_end(); BB != BE;
+       ++BB) {
+    bool IsPred = LoopAccessInfo::blockNeedsPredication(*BB, TheLoop, DT);
----------------
mzolotukhin wrote:
> Could we use a range loop here?
Done.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1055
@@ +1054,3 @@
+
+    for (auto I = (*BB)->begin(), E = (*BB)->end(); I != E; ++I) {
+      if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
----------------
mzolotukhin wrote:
> Range-based loop?
Done.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1059-1060
@@ +1058,4 @@
+      // FIXME: As currently we can't handle predicated access, return directly.
+      if (IsPred)
+        return;
+
----------------
mzolotukhin wrote:
> Why not to return right after we discovered the block needs predication?
It's slightly different. 
We still support a predicted block if it doesn't have load/store. Because the vectorization of interleaved loads/stores won't break any dependences.
But if a predicted block has loads/stores, we need to handle an interleave group contains mixed normal loads/stores and predicted loads/stores.



================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1075-1076
@@ +1074,4 @@
+
+  // Filter eligible groups and set the insert position. We only keep the load
+  // group that has small gaps (less than half Delta) and the fully interleaved
+  // store group (has no gap).
----------------
mzolotukhin wrote:
> What's the rationale behind this? It might be inefficient to vectorize a group with a huge gap, but isn't it a question for cost-model rather than for the analysis?
This is just a conservative way. I think generally if a group missing more than half members, it is not very beneficial to do vectorization. Also it sounds not reasonable to still call it "interleaved" group.

But I agree with you that it's cost-model's work to decide whether it is beneficial.
The new patch keeps all load groups.
There is also a new situation that: the cost of a load group with a huge gap may even be expensive than the scalar operations. Then I think we need to replace with scalar operations. I've added a FIXME in the new patch as I haven't found an efficient way to fix this.



================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1118-1126
@@ +1117,11 @@
+
+    for (auto I = InstrList.rbegin(), E = InstrList.rend(); I != E; ++I) {
+      if (*I == PosB)
+        break;
+      // If the position of A is after the position of B, it is unsafe.
+      if (*I == PosA) {
+        IsSafe = false;
+        break;
+      }
+    }
+
----------------
mzolotukhin wrote:
> Could we just check if `PosA` dominates `PosB`?
A good idea!

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/