[PATCH] D19501: Add LoadStoreVectorizer pass

Thu Jun 16 21:12:30 PDT 2016

arsenm added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:38
@@ +37,3 @@
+#define DEBUG_TYPE "load-store-vectorizer"
+STATISTIC(NumVectorInstructions, "Number of vector instructions generated");
+
----------------
jlebar wrote:
> Nit, number of scalar instructions removed might be a more interesting metric, because it's notable whether we manage to vectorize in groups of 2 vs 4.
I added this as well, I think both might be useful

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:107
@@ +106,3 @@
+
+class LoadStoreVectorizer : public FunctionPass {
+public:
----------------
mzolotukhin wrote:
> Bikeshed: I'd rename `Vectorizer` to `LoadStoreVectorizer` and `LoadStoreVectorizer` to `LoadStoreVectorizerPass` os something like this.
Then you end up with the macro functions adding another "Pass" to the name, so you end up with initializeLoadStoreVectorizerPassPass etc., so it's better to not include Pass in the class name.

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:168
@@ +167,3 @@
+       I != E; ++I) {
+    collectInstructions(*I);
+    Changed |= vectorizeChains(LoadRefs);
----------------
jlebar wrote:
> s/I/BBI/ would have made this a lot more obvious to me, since we use I for values/instructions most everywhere else.
Better, use a range loop

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:208
@@ +207,3 @@
+  if (PtrA == PtrB ||
+      DL.getTypeStoreSize(PtrATy) != DL.getTypeStoreSize(PtrBTy) ||
+      DL.getTypeStoreSize(PtrATy->getScalarType()) !=
----------------
jlebar wrote:
> It's not clear to me why we need this line, although all the cases I can think of where this is false seem quite bizarre, so maybe it doesn't matter.
I think this is to fix cases like trying to merge i1 into i8. This was already crashing earlier though, so I'm just having sub-byte types never considered.

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:286
@@ +285,3 @@
+
+void Vectorizer::reorder(Instruction *I) {
+  for (User *U : I->users())
----------------
mzolotukhin wrote:
> Some comments would be helpful here (and for other functions).
The comments are already on the declarations. Usually for private functions like this I would put them on the body. Should I move them?

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:325
@@ +324,3 @@
+    Instrs.push_back(cast<Instruction>(V));
+    if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(PtrOperand))
+      Instrs.push_back(GEP);
----------------
jlebar wrote:
> Why this special case?  This isn't as strong as a general DCE, so it seems to me like it gives us a false sense of security?  i.e. should we just say "run DCE after this pass"?
It's run after each vectorization. I'm guessing the additional uses might somehow interfere with vectorization of other chains? This can be investigated later.

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:499
@@ +498,3 @@
+  DEBUG(dbgs() << "LSV: Vectorizing " << Instrs.size()
+        << " instructions.\n");
+  SmallSetVector<unsigned, 16> Heads, Tails;
----------------
jlebar wrote:
> Nit, It might be helpful to print out at least the first instruction in the chain.  At least, that was helpful to me when I was debugging it.
I'm not sure why this one is even needed. The full chain is already printed out right above this by the "LSV: Loads to vectorize" DEBUG

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:508
@@ +507,3 @@
+  for (int i = 0, e = Instrs.size(); i < e; ++i) {
+    ConsecutiveChain[i] = ~0U;
+    for (int j = e - 1; j >= 0; --j) {
----------------
jlebar wrote:
> Nit, we know that Instrs is <= 64 elements, so could we just make ConsecutiveChain an array of signed ints?  Then we could use -1 instead of ~0U (which implied to me that we were going to do some bit twiddling below).
> 
> Also I think you could initialize the array with
> 
>   int ConsecutiveChain[64] = {-1};
> 
> which would be clearer than setting each element in the loop, imo.
Can't do that. it needs to be reset after the j loop

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:537
@@ +536,3 @@
+    unsigned I = Head;
+    while (I != ~0U && (Tails.count(I) || Heads.count(I))) {
+      if (VectorizedValues.count(Instrs[I]))
----------------
jlebar wrote:
> Actually, I'm not sure why we need the (Tails.count(I) || Heads.count(I)) part of this predicate, and therefore I'm not sure why we need Tails at all.  That probably deserves a comment at least.
I'm more likely to break something trying to refactor this compared to all of the style fixes, so I'll leave looking at that for later.

================
Comment at: lib/Transforms/Vectorize/LoadStoreVectorizer.cpp:753
@@ +752,3 @@
+  // Set insert point.
+  Builder.SetInsertPoint(&*Last);
+
----------------
jlebar wrote:
> A correctness-critical assumption above is that
> 
> > the loads are inserted at the location of the first load
> 
> but here it looks like the opposite?
I had another test which I apparently forgot to include which shows the alias insertion order behavior. I've also added another which shows the insertion point.

With the insertion point as Last, the load store load store case is correct and vectorizes as
store
load v2
store.

If I change the land insertion point to be before the first, it regresses and produces the incorrect
load v2
store v2

Maybe the comment is backwards?

http://reviews.llvm.org/D19501