[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access
Michael Zolotukhin
mzolotukhin at apple.com
Wed May 20 19:14:52 PDT 2015
Hi Hao,
Thanks for updating the patch, another bunch of comments from me inline.
Also, sorry for that I give the feedback in parts - it's hard to find enough time to review the entire patch all at once. I know that it might be a bit frustrating sometimes, but in fact I really appreciate your work and see how things converge:)
Thanks,
Michael
================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:447-457
@@ -446,1 +446,13 @@
+ /// \return The cost of the interleaved memory operation.
+ /// \p Opcode is the memory operation code
+ /// \p VecTy is the vector type of the interleaved access.
+ /// \p Delta is the interleave factor
+ /// \p Indices is the indices for interleaved load members (as interleaved
+ /// loads allow gaps)
+ unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
+ unsigned Delta,
+ ArrayRef<unsigned> Indices,
+ unsigned Alignment,
+ unsigned AddressSpace) const;
+
----------------
Nitpick: `Alignment` and `AddressSpace` aren't documented.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:918-926
@@ -902,1 +917,11 @@
+struct StrideDescriptor {
+ StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size, unsigned Align)
+ : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
+
+ int Stride;
+ const SCEV *Scev;
+ unsigned Size;
+ unsigned Align;
+};
+
----------------
Please add some comments on this struct and its members.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:961-962
@@ +960,4 @@
+
+ if (!StrideAccess.size())
+ return;
+
----------------
Using `empty()` is more efficient than `size()==0`.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1051-1052
@@ +1050,4 @@
+
+ for (auto BB = TheLoop->block_begin(), BE = TheLoop->block_end(); BB != BE;
+ ++BB) {
+ bool IsPred = LoopAccessInfo::blockNeedsPredication(*BB, TheLoop, DT);
----------------
Could we use a range loop here?
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1055
@@ +1054,3 @@
+
+ for (auto I = (*BB)->begin(), E = (*BB)->end(); I != E; ++I) {
+ if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
----------------
Range-based loop?
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1059-1060
@@ +1058,4 @@
+ // FIXME: As currently we can't handle predicated access, return directly.
+ if (IsPred)
+ return;
+
----------------
Why not to return right after we discovered the block needs predication?
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1075-1076
@@ +1074,4 @@
+
+ // Filter eligible groups and set the insert position. We only keep the load
+ // group that has small gaps (less than half Delta) and the fully interleaved
+ // store group (has no gap).
----------------
What's the rationale behind this? It might be inefficient to vectorize a group with a huge gap, but isn't it a question for cost-model rather than for the analysis?
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1118-1126
@@ +1117,11 @@
+
+ for (auto I = InstrList.rbegin(), E = InstrList.rend(); I != E; ++I) {
+ if (*I == PosB)
+ break;
+ // If the position of A is after the position of B, it is unsafe.
+ if (*I == PosA) {
+ IsSafe = false;
+ break;
+ }
+ }
+
----------------
Could we just check if `PosA` dominates `PosB`?
http://reviews.llvm.org/D9368
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list