[PATCH] D28959: [SLP] Improve horizontal vectorization for non-power-of-2 number of instructions.

Fri Jan 20 16:41:24 PST 2017

mkuper added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4275
     Builder.setFastMathFlags(Unsafe);
-    unsigned i = 0;
+    unsigned I = 0;

----------------
As long as you're changing this, maybe name this "Idx" (or leave it as "i" - this is against convention in the strict sense, but that's one of the common exceptions). But I'm also ok with "I" if you think it's better.

Also, this is exactly an example of what I was talking about on the other patch - you could commit the name change separately, as a preparatory patch, and then this patch would only include functional changes, which makes it much easier to review.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4291
       // Estimate cost.
-      int Cost = V.getTreeCost() + getReductionCost(TTI, ReducedVals[i]);
+      int Cost = V.getTreeCost() + getReductionCost(TTI, VL[0]);
       if (Cost >= -SLPCostThreshold)
----------------
Can you explain why you changed ReducedVals[i] to VL[0] here and below?
I'm not sure whether this is a correct change or not, but it seems orthogonal to this patch.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4310
         VectorizedTree = ReducedSubTree;
+      ReduxWidth = std::max<unsigned>(4, PowerOf2Floor(NumReducedVals - I));
     }
----------------
This whole ReduxWidth stuff is fishy.
We use it internally in matchAssociativeReduction(), and then overwrite it just before calling tryToReduce(). Why does tryToReduce use this member variable, instead of having a local (and not having this silliness overwriting it just before calling tryToReduce())?

If this makes sense, then changing this can be another preparatory NFC patch before this one. If not, then why not?

================
Comment at: test/Transforms/SLPVectorizer/X86/horizontal-list.ll:572
 }

----------------
Could you add a test for something like width 31, so that we get 16 + 8 + 4 + 3?

https://reviews.llvm.org/D28959