[PATCH] [LoopVectorize]Teach Loop Vectorizer about interleaved memory access
Hao Liu
Hao.Liu at arm.com
Thu May 21 02:10:08 PDT 2015
In http://reviews.llvm.org/D9368#176086, @mzolotukhin wrote:
> Hi Hao,
>
> Thanks for updating the patch, another bunch of comments from me inline.
>
> Also, sorry for that I give the feedback in parts - it's hard to find enough time to review the entire patch all at once. I know that it might be a bit frustrating sometimes, but in fact I really appreciate your work and see how things converge:)
>
> Thanks,
> Michael
Hi Michael,
Thanks for your comments, which really help me a lot. I've updated a new patch accordingly.
Review please.
Thanks,
-Hao
================
Comment at: include/llvm/Analysis/TargetTransformInfo.h:447-457
@@ -446,1 +446,13 @@
+ /// \return The cost of the interleaved memory operation.
+ /// \p Opcode is the memory operation code
+ /// \p VecTy is the vector type of the interleaved access.
+ /// \p Delta is the interleave factor
+ /// \p Indices is the indices for interleaved load members (as interleaved
+ /// loads allow gaps)
+ unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
+ unsigned Delta,
+ ArrayRef<unsigned> Indices,
+ unsigned Alignment,
+ unsigned AddressSpace) const;
+
----------------
mzolotukhin wrote:
> Nitpick: `Alignment` and `AddressSpace` aren't documented.
Done.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:918-926
@@ -902,1 +917,11 @@
+struct StrideDescriptor {
+ StrideDescriptor(int Stride, const SCEV *Scev, unsigned Size, unsigned Align)
+ : Stride(Stride), Scev(Scev), Size(Size), Align(Align) {}
+
+ int Stride;
+ const SCEV *Scev;
+ unsigned Size;
+ unsigned Align;
+};
+
----------------
mzolotukhin wrote:
> Please add some comments on this struct and its members.
Done.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:961-962
@@ +960,4 @@
+
+ if (!StrideAccess.size())
+ return;
+
----------------
mzolotukhin wrote:
> Using `empty()` is more efficient than `size()==0`.
Good idea!
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1051-1052
@@ +1050,4 @@
+
+ for (auto BB = TheLoop->block_begin(), BE = TheLoop->block_end(); BB != BE;
+ ++BB) {
+ bool IsPred = LoopAccessInfo::blockNeedsPredication(*BB, TheLoop, DT);
----------------
mzolotukhin wrote:
> Could we use a range loop here?
Done.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1055
@@ +1054,3 @@
+
+ for (auto I = (*BB)->begin(), E = (*BB)->end(); I != E; ++I) {
+ if (!isa<LoadInst>(I) && !isa<StoreInst>(I))
----------------
mzolotukhin wrote:
> Range-based loop?
Done.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1059-1060
@@ +1058,4 @@
+ // FIXME: As currently we can't handle predicated access, return directly.
+ if (IsPred)
+ return;
+
----------------
mzolotukhin wrote:
> Why not to return right after we discovered the block needs predication?
It's slightly different.
We still support a predicted block if it doesn't have load/store. Because the vectorization of interleaved loads/stores won't break any dependences.
But if a predicted block has loads/stores, we need to handle an interleave group contains mixed normal loads/stores and predicted loads/stores.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1075-1076
@@ +1074,4 @@
+
+ // Filter eligible groups and set the insert position. We only keep the load
+ // group that has small gaps (less than half Delta) and the fully interleaved
+ // store group (has no gap).
----------------
mzolotukhin wrote:
> What's the rationale behind this? It might be inefficient to vectorize a group with a huge gap, but isn't it a question for cost-model rather than for the analysis?
This is just a conservative way. I think generally if a group missing more than half members, it is not very beneficial to do vectorization. Also it sounds not reasonable to still call it "interleaved" group.
But I agree with you that it's cost-model's work to decide whether it is beneficial.
The new patch keeps all load groups.
There is also a new situation that: the cost of a load group with a huge gap may even be expensive than the scalar operations. Then I think we need to replace with scalar operations. I've added a FIXME in the new patch as I haven't found an efficient way to fix this.
================
Comment at: lib/Analysis/LoopAccessAnalysis.cpp:1118-1126
@@ +1117,11 @@
+
+ for (auto I = InstrList.rbegin(), E = InstrList.rend(); I != E; ++I) {
+ if (*I == PosB)
+ break;
+ // If the position of A is after the position of B, it is unsafe.
+ if (*I == PosA) {
+ IsSafe = false;
+ break;
+ }
+ }
+
----------------
mzolotukhin wrote:
> Could we just check if `PosA` dominates `PosB`?
A good idea!
http://reviews.llvm.org/D9368
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
More information about the llvm-commits
mailing list