[PATCH] D19501: Add LoadStoreVectorizer pass
Justin Lebar via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 13 13:54:16 PDT 2016
jlebar added a comment.
Looks pretty good to me; mostly nits / comment suggestions, but I have one nontrivial concern about how we ensure that we always build the longest vectors we can.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:70
@@ +69,3 @@
+
+ /// Checks if it is a consecutive access.
+ bool isConsecutiveAccess(Value *A, Value *B);
----------------
Nit, if this is all the comment says, do we need it? It seems just to repeat what's already in the function name.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:73
@@ +72,3 @@
+
+ /// Reorder the user of I after vectorization.
+ void reorder(Instruction *I);
----------------
Nit, "Reorders" would be consistent with the rest of the comments in this file. Also, do you mean s/user/users/?
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:73
@@ +72,3 @@
+
+ /// Reorder the user of I after vectorization.
+ void reorder(Instruction *I);
----------------
jlebar wrote:
> Nit, "Reorders" would be consistent with the rest of the comments in this file. Also, do you mean s/user/users/?
It's not obvious to me how this function reorders the users of I without looking at the source. Something like "Reorders instructions to ensure that I dominates all of its users." or something would have helped me.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:87
@@ +86,3 @@
+ /// Checks if there are any memory instructions which may be alias between
+ /// From and To.
+ bool isVectorizable(ArrayRef<Value *> &Chain, BasicBlock::iterator From,
----------------
Perhaps, "Checks if there are any instructions which may affect the value returned by a load/store in Chain between From and To." Since this function checks not just for aliasing memory instructions, but also side-effecting instructions.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:95
@@ +94,3 @@
+ /// Processes the collected instructions, the Map.
+ bool vectorizeChains(ValueListMap &Map);
+
----------------
Probably worth indicating that you expect all of the elements of Map to be loads, or all of them to be stores.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:97
@@ +96,3 @@
+
+ /// Finds the consecutive instructions and vectorizes them.
+ bool vectorizeInstructions(ArrayRef<Value *> Instrs);
----------------
Suggest "Finds loads/stores to consecutive memory addresses and vectorizes them." Otherwise it can be read as saying that the instructions themselves must appear consecutively in the BB. Probably also worth indicating that all elements of Instrs must be loads, or all must be stores.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:152-154
@@ +151,5 @@
+
+ // Don't vectorize when the attribute NoImplicitFloat is used.
+ if (F.hasFnAttribute(Attribute::NoImplicitFloat))
+ return false;
+
----------------
arsenm wrote:
> mzolotukhin wrote:
> > Should we bail out on this for all potential targets? E.g. why is it a problem for, e.g., vectorizing integer stores on x86 using `movdqa` integer instructions?
> I don't really know what this attribute does, it was just there
It came from r187340, back in 2013, by Nadav Rotem <nrotem at apple.com>. There's no additional information in the patch.
AFAICT the intent is specifically not to vectorize loads/stores of fp types when noimplicitfloat is set. Which sort of makes sense, based on the tiny information I can glean about noimplicitfloat.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:168
@@ +167,3 @@
+ I != E; ++I) {
+ collectInstructions(*I);
+ Changed |= vectorizeChains(LoadRefs);
----------------
s/I/BBI/ would have made this a lot more obvious to me, since we use I for values/instructions most everywhere else.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:208
@@ +207,3 @@
+ if (PtrA == PtrB ||
+ DL.getTypeStoreSize(PtrATy) != DL.getTypeStoreSize(PtrBTy) ||
+ DL.getTypeStoreSize(PtrATy->getScalarType()) !=
----------------
It's not clear to me why we need this line, although all the cases I can think of where this is false seem quite bizarre, so maybe it doesn't matter.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:230
@@ +229,3 @@
+
+ // Otherwise compute the distance with SCEV between the base pointers.
+ const SCEV *PtrSCEVA = SE.getSCEV(PtrA);
----------------
Suggest s/Otherwise// -- it's not clear that this is contrasting with "Check if they are based on the same pointer."
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:242
@@ +241,3 @@
+
+ // Look through GEPs after proving they're the same except for the last
+ // index.
----------------
Nit, suggest s/proving/checking/ -- "proving" is kind of a strong term for what we do, sort of implies it's harder than it is (so it's confusing when I see it's so simple).
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:259
@@ +258,3 @@
+
+ // Look through a ZExt/SExt
+ if (!isa<SExtInst>(OpA) && !isa<ZExtInst>(OpA))
----------------
Suggest something like
// Only look through ZExt/SExt
to make it clearer why we're doing an early return.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:305
@@ +304,3 @@
+ for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {
+ if (std::find(Chain.begin(), Chain.end(), &*I) == Chain.end())
+ continue;
----------------
Nit, consider llvm::is_contained()
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:325
@@ +324,3 @@
+ Instrs.push_back(cast<Instruction>(V));
+ if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(PtrOperand))
+ Instrs.push_back(GEP);
----------------
Why this special case? This isn't as strong as a general DCE, so it seems to me like it gives us a false sense of security? i.e. should we just say "run DCE after this pass"?
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:338
@@ +337,3 @@
+std::pair<ArrayRef<Value *>, ArrayRef<Value *>>
+Vectorizer::bisect(ArrayRef<Value *> &Chain, unsigned ElementSize) {
+ unsigned elemSizeInBytes = ElementSize / 8;
----------------
s/ElementSize/ElementSizeBits/? (I don't care as much what the local vars here are called, but if the arg is called ElementSize, I might reasonably pass a value in bytes.)
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:338
@@ +337,3 @@
+std::pair<ArrayRef<Value *>, ArrayRef<Value *>>
+Vectorizer::bisect(ArrayRef<Value *> &Chain, unsigned ElementSize) {
+ unsigned elemSizeInBytes = ElementSize / 8;
----------------
jlebar wrote:
> s/ElementSize/ElementSizeBits/? (I don't care as much what the local vars here are called, but if the arg is called ElementSize, I might reasonably pass a value in bytes.)
I'm not sure "bisect" is quite the right word for this? I expected this to split Chain in half, but it looks like it actually splits off the rightmost 1, 2, or 3 bytes' worth of elements, so that the left part has a size that's a multiple of 4 bytes (ish). I might call that "splitIntoRoundSize" or something.
A postcondition assert() would probably also help clarify this.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:355
@@ +354,3 @@
+ if (isa<LoadInst>(I) || isa<StoreInst>(I)) {
+ if (std::find(Chain.begin(), Chain.end(), &*I) == Chain.end())
+ MemoryInstrs.push_back({ &*I, Idx });
----------------
Consider llvm::if_contains
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:377
@@ +376,3 @@
+ // vectorize it (the loads are inserted at the location of the first
+ // load).
+ if (isa<StoreInst>(V) && isa<LoadInst>(VV) && VVIdx < VIdx)
----------------
suggest "(the vectorized load is inserted at the location of the first load in the chain)."
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:388
@@ +387,3 @@
+ Value *Ptr0 = isa<LoadInst>(M0) ? M0->getOperand(0) : M0->getOperand(1);
+ Value *Ptr1 = isa<LoadInst>(M1) ? M1->getOperand(0) : M1->getOperand(1);
+ unsigned S0 =
----------------
Not GetPointerOperand?
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:414
@@ +413,3 @@
+ LoadRefs.clear();
+ StoreRefs.clear();
+
----------------
To me this would be a lot more clear if we returned the relevant maps (or, I suppose, explicitly took them as params). Otherwise it's not clear from the signature (or the comments) where we collect the instructions into.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:436
@@ +435,3 @@
+ if (isa<VectorType>(Ty) &&
+ !std::all_of(LI->user_begin(), LI->user_end(), [LI](const User *U) {
+ const Instruction *UI = cast<Instruction>(U);
----------------
Consider llvm::all_of(LI->users(), ...);
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:499
@@ +498,3 @@
+ DEBUG(dbgs() << "LSV: Vectorizing " << Instrs.size()
+ << " instructions.\n");
+ SmallSetVector<unsigned, 16> Heads, Tails;
----------------
Nit, It might be helpful to print out at least the first instruction in the chain. At least, that was helpful to me when I was debugging it.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:502
@@ +501,3 @@
+ unsigned ConsecutiveChain[64];
+ SmallPtrSet<Value *, 16> VectorizedValues;
+ bool Changed = false;
----------------
Can we move this closer to where it's used?
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:508
@@ +507,3 @@
+ for (int i = 0, e = Instrs.size(); i < e; ++i) {
+ ConsecutiveChain[i] = ~0U;
+ for (int j = e - 1; j >= 0; --j) {
----------------
Nit, we know that Instrs is <= 64 elements, so could we just make ConsecutiveChain an array of signed ints? Then we could use -1 instead of ~0U (which implied to me that we were going to do some bit twiddling below).
Also I think you could initialize the array with
int ConsecutiveChain[64] = {-1};
which would be clearer than setting each element in the loop, imo.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:517
@@ +516,3 @@
+ unsigned NewDistance = std::abs((int)ConsecutiveChain[i] - j);
+ if (j < i || NewDistance > CurDistance)
+ ShouldInsert = false;
----------------
I cannot for the life of me figure out what invariant you're trying to preserve with these checks. I guess a comment is in order. :)
(I get that we prefer to vectorize chains of scalar instructions that appear in order (j > i), but I can't figure out what CurDistance and NewDistance are checking. It would make sense if you wanted to prefer chains of scalar instructions that are close to each other, but then NewDistance should be abs(j - i), right?)
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:518
@@ +517,3 @@
+ if (j < i || NewDistance > CurDistance)
+ ShouldInsert = false;
+ }
----------------
Not sure, but this might be clearer if you simply used a "continue" here instead of a flag variable.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:523
@@ +522,3 @@
+ Heads.insert(i);
+ ConsecutiveChain[i] = j;
+ }
----------------
If ConsecutiveChain was not -1, don't we need to remove ConsecutiveChain[i] from Tails? (Maybe it would be better to build Heads and Tails, or at least Tails, in a separate loop, over ConsecutiveChain, so we don't have to worry about the case where one instr is a tail for two heads and then gets overwritten just one time.)
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:529
@@ +528,3 @@
+
+ for (unsigned Head : Heads) {
+ if (Tails.count(Head))
----------------
The intent here is to build the largest vectors we can, right? (Or at least, we want to build up vectors of the max vector width, if possible? Beyond that doesn't make a difference.)
I don't see how we ensure this happens, unless there is some implicit requirement on the order of Instrs that I'm missing. For example, if the elements of Instrs are
- load p[3]
- load p[2]
- load p[1]
- load p[0]
then it seems that the first element of Heads will be "load p[2]", implying a two-wide vector load. I don't see what here would cause us to wait on this one and do the four-wide load starting at p[0].
Sorry if I'm missing something obvious.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:537
@@ +536,3 @@
+ unsigned I = Head;
+ while (I != ~0U && (Tails.count(I) || Heads.count(I))) {
+ if (VectorizedValues.count(Instrs[I]))
----------------
Actually, I'm not sure why we need the (Tails.count(I) || Heads.count(I)) part of this predicate, and therefore I'm not sure why we need Tails at all. That probably deserves a comment at least.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:577
@@ +576,3 @@
+ return vectorizeStoreChain(Chain.slice(0, ChainSize - 1));
+ auto Chains = bisect(Chain, Sz);
+ return vectorizeStoreChain(Chains.first) |
----------------
Hm, looks like an additional constraint on bisect() is that the array passed should have more than 4 elements -- probably worth saying that somewhere.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:629
@@ +628,3 @@
+ BasicBlock::iterator First = Boundaries.first;
+ BasicBlock::iterator Last = Boundaries.second;
+
----------------
If you like, you could write this as
BasicBlock::iterator First, Last;
std::tie(First, Last) = getBoundaryInstrs(Chain);
Same for the other call to getBoundaryInstrs.
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:753
@@ +752,3 @@
+ // Set insert point.
+ Builder.SetInsertPoint(&*Last);
+
----------------
A correctness-critical assumption above is that
> the loads are inserted at the location of the first load
but here it looks like the opposite?
================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:763
@@ +762,3 @@
+
+ SmallVector<Instruction *, 16> InstrsToReorder;
+ if (VecLoadTy) {
----------------
Nit, I'd prefer to move this into the if and else blocks, so it's clear that we don't use it elsewhere.
http://reviews.llvm.org/D19501
More information about the llvm-commits
mailing list