[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access

Wed May 20 19:14:52 PDT 2015

Hi Hao,

Thanks for updating the patch, another bunch of comments from me inline.

Also, sorry for that I give the feedback in parts - it's hard to find enough time to review the entire patch all at once. I know that it might be a bit frustrating sometimes, but in fact I really appreciate your work and see how things converge:)

Thanks,
Michael


================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:447-457
@@ -446,1 +446,13 @@
 
+  /// \return The cost of the interleaved memory operation.
+  /// \p Opcode is the memory operation code
+  /// \p VecTy is the vector type of the interleaved access.
+  /// \p Delta is the interleave factor
+  /// \p Indices is the indices for interleaved load members (as interleaved
+  ///    loads allow gaps)
+  unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
+                                      unsigned Delta,
+                                      ArrayRef<unsigned> Indices,
+                                      unsigned Alignment,
+                                      unsigned AddressSpace) const;
+
----------------
Nitpick: `Alignment` and `AddressSpace` aren't documented.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:918-926
@@ -902,1 +917,11 @@
 
+struct StrideDescriptor {
+  StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size, unsigned Align)
+      : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
+
+  int Stride;
+  const SCEV *Scev;
+  unsigned Size;
+  unsigned Align;
+};
+
----------------
Please add some comments on this struct and its members.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:961-962
@@ +960,4 @@
+
+  if (!StrideAccess.size())
+    return;
+
----------------
Using `empty()` is more efficient than `size()==0`.

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1051-1052
@@ +1050,4 @@
+
+  for (auto BB = TheLoop->block_begin(), BE = TheLoop->block_end(); BB != BE;
+       ++BB) {
+    bool IsPred = LoopAccessInfo::blockNeedsPredication(*BB, TheLoop, DT);
----------------
Could we use a range loop here?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1055
@@ +1054,3 @@
+
+    for (auto I = (*BB)->begin(), E = (*BB)->end(); I != E; ++I) {
+      if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
----------------
Range-based loop?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1059-1060
@@ +1058,4 @@
+      // FIXME: As currently we can't handle predicated access, return directly.
+      if (IsPred)
+        return;
+
----------------
Why not to return right after we discovered the block needs predication?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1075-1076
@@ +1074,4 @@
+
+  // Filter eligible groups and set the insert position. We only keep the load
+  // group that has small gaps (less than half Delta) and the fully interleaved
+  // store group (has no gap).
----------------
What's the rationale behind this? It might be inefficient to vectorize a group with a huge gap, but isn't it a question for cost-model rather than for the analysis?

================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1118-1126
@@ +1117,11 @@
+
+    for (auto I = InstrList.rbegin(), E = InstrList.rend(); I != E; ++I) {
+      if (*I == PosB)
+        break;
+      // If the position of A is after the position of B, it is unsafe.
+      if (*I == PosA) {
+        IsSafe = false;
+        break;
+      }
+    }
+
----------------
Could we just check if `PosA` dominates `PosB`?

http://reviews.llvm.org/D9368

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/