[PATCH] D21755: Refine the set of UniformAfterVectorization instructions

Mon Jun 27 17:38:25 PDT 2016

wmi added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4947
@@ +4946,3 @@
+
+  // Loops will not be added into Worklist above. Bundle the instructions in
+  // the loop together and check their users as a group.
----------------
mkuper wrote:
> Did you mean "PHIs will not be added"? Also, can you explain here why they're not added?
> (I mean, this doesn't even apply to all PHIs, IIRC? Some PHIs will be added, it depends on whether the branch depends on the PHI itself, or on UpdateV, right?)
Yes, all PHI will not be added in the above loop. That is because PHI must exist in a dep loop, when we expand Worklist in topological order, all the nodes inside dep loop are impossible to be added into Worklist. 

That is why we want to handle induction PHI specially below.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4962
@@ +4961,3 @@
+      Worklist.insert(PN);
+      Worklist.insert(UpdateV);
+    }
----------------
mkuper wrote:
> This looks a bit weird. If the PHI has a non-uniform use, but the update doesn't, then do we expect to have a vector version of the update?
> More generally, it's a bit odd that we look at exactly one instruction backwards past the Phi. But this looks like it's a net improvement over how it worked before.
If the PHI has a non-uniform use, but the update doesn't, neither of them will be added into uniform set because we will have vector versions for both pre-increment-induction var (%vec.ind) and post-increment-induction var (%step.add).

I only look at one instruction backwords past PHI because I think it covers most of the cases. 

for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
    ......
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, %step
   br i1 %exitcond, label %for.end, label %for.body
Like the loop here, all the recursive dependents of induction PHI only includes %indvars.iv.next, %indvars.iv and %step. Because it is a vectorizable loop, %step is a loop invariant and should be defined out of loop. Then we only care whether %indvars.iv and %indvars.iv.next (i.e., PN and UpdateV) are uniform.

================
Comment at: test/Transforms/LoopVectorize/X86/uniform-phi.ll:31
@@ +30,3 @@
+; CHECK-LABEL: foo
+; CHECK-DAG: LV: Found uniform instruction:   %cond = icmp eq i64 %i.next, %n
+; CHECK-DAG: LV: Found uniform instruction:   %tmp1 = getelementptr inbounds i32, i32* %a, i32 %tmp0
----------------
mkuper wrote:
> You probably want a CHECK-NOT for %i being found uniform, right?
You are right. I will add it.

Repository:
  rL LLVM

http://reviews.llvm.org/D21755