[PATCH] D22554: [SLP] Vectorize loads in horizontal reductions when they happen to be in the reverse order

Thu Jul 21 13:18:28 PDT 2016

mssimpso added a comment.

Hi Michael,

Thanks for working on this. The issue of unordered loads seems like an arbitrary restriction to me that needs fixing. The patch basically looks okay; I inlined a few comments.

I agree with most of your high-level points. We shouldn't really need to pay the cost of rebuilding the entire tree when we find that the loads are unordered. In fact, except for loads and stores, the order of the instructions shouldn't really matter at all. The fact that we assume an order in some cases is a problem that should be fixable. For example, I think VL[0] is commonly used as a representative of the bundle for things like getting the type, not to imply an ordering. I bet we could eliminate most instances of VL[0] with appropriate additions to TreeEntry. And we keep track of a Lane index for ExternalUsers just so we can extract the correct value from the vectors. But this is already kind of redundant since ExternalUser also holds a pointer to the scalar value and we maintain a list of the scalar values in TreeEntry.

So The difficult case seems to be the store-rooted trees with unordered loads. For that, we could use shuffles like you mentioned.

I think it's fine to only check for the reverse-consecutive case since that's all we are currently doing, but a more general solution would eventually be nice to have. Compile-time might be a concern there, though.

Matt.


================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:884
@@ -883,3 +883,3 @@
 
   // Number of load-bundles of size 2, which are consecutive loads if reversed.
   int NumLoadsWantToChangeOrder;
----------------
Please update the comment since we now track reversed bundles of size greater than 2.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1158
@@ -1158,1 +1157,3 @@
+
+      // Check if the loads are consecutive or reversed.
       for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
----------------
We also check for the isSimple case here as well. It would be nice to make the comment more detailed since you're already changing it.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:1174
@@ +1173,3 @@
+        if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE))
+          Consecutive = false;
+
----------------
We can break out of the loop when setting Consecutive to false to avoid unneeded calls to isConsecutiveAccess.

But I'm wondering if there other ways to make this faster. We are looking for the all consecutive or all reverse-consecutive cases. So for example if we find at least one consecutive pair in this first loop, there's no chance the list can be all reverse-consecutive.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3818
@@ -3803,3 +3817,3 @@
     if (allowReorder && R.shouldReorder()) {
       assert(Ops.size() == 2);
       assert(BuildVectorSlice.empty());
----------------
This assert is confusing now. I think it only makes sense as-is because allowReorder is only true when coming from tryToVectorizePair. But you've removed the size-equal-2 restriction when checking for the reverse consecutive case. I think we should add a more explanatory comment. What do you think?

================
Comment at: test/Transforms/SLPVectorizer/X86/reduction_loads.ll:4-6
@@ +3,5 @@
+; CHECK-LABEL: @test
+; CHECK: %0 = bitcast i32* %p to <8 x i32>*
+; CHECK: %1 = load <8 x i32>, <8 x i32>* %0, align 4
+; CHECK: %2 = mul <8 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>, %1
+
----------------
Please use regular expressions for the unnamed instructions to avoid future breakage.


https://reviews.llvm.org/D22554