[PATCH] D20474: when calculating RegUsages, ignore instructions which are uniformed after vectorization

Thu Jun 23 16:52:53 PDT 2016

wmi added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:6247-6249
@@ -6238,15 +6246,5 @@
 
-    // Check that the PHI is only used by the induction increment (UpdateV) or
-    // by GEPs. Then check that UpdateV is only used by a compare instruction or
-    // the loop header PHI.
-    // FIXME: Need precise def-use analysis to determine if this instruction
-    // variable will be vectorized.
-    if (std::all_of(PN->user_begin(), PN->user_end(),
-                    [&](const User *U) -> bool {
-                      return U == UpdateV || isa<GetElementPtrInst>(U);
-                    }) &&
-        std::all_of(UpdateV->user_begin(), UpdateV->user_end(),
-                    [&](const User *U) -> bool {
-                      return U == PN || isa<ICmpInst>(U);
-                    })) {
+    if (std::all_of(PN->user_begin(), PN->user_end(), [&](User *U) -> bool {
+          return Legal->isUniformAfterVectorization(cast<Instruction>(U));
+        }))
       VecValuesToIgnore.insert(PN);
----------------
mkuper wrote:
> wmi wrote:
> > mssimpso wrote:
> > > Hi Wei,
> > > 
> > > I'm joining this review a bit late, so I apologize if I'm not quite up-to-speed yet. But I'm not sure I follow this. Please correct me if I'm missing something!
> > > 
> > > If I take the following test case:
> > > 
> > > ```
> > > define void @test(i32* %a, i64 %n) {
> > > entry:
> > >   br label %for.body
> > > 
> > > for.body:
> > >   %i = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
> > >   %0 = trunc i64 %i to i32 
> > >   %1 = getelementptr inbounds i32, i32* %a, i32 %0
> > >   store i32 %0, i32* %1, align 4
> > >   %i.next = add nuw nsw i64 %i, 1
> > >   %cond = icmp eq i64 %i.next, %n
> > >   br i1 %cond, label %for.end, label %for.body
> > > 
> > > for.end:
> > >   ret void
> > > }
> > > ```
> > > 
> > > We generate vectorized induction variables so that for the store, we have:
> > > 
> > > ```
> > > store <4 x i32> %vec.ind1, <4 x i32>* %3, align 4
> > > ```
> > > 
> > > However, with your change, it looks to me like we will add the induction variable to VecValuesToIngore where previously we wouldn't have. Is this right?
> > > 
> > Thanks for pointing out the problem. For your testcase, %1 is only used in instruction %0 = ... which is a uniform instruction so it will be put into VecValuesToIngore. However, it may be problematic for %0 to be uniform because actually it has both scalar and vector version after vectorization.
> > 
> > I will add your testcase into consideration. Thanks.
> > 
> >  
> > 
> >   
> So the problem isn't here, it's in collectLoopUniforms(), right?
> That is, the problem is that we're tracing back from the consecutive pointer, and adding all of the operands of the store to the worklist, instead of just the pointer operand?
Looks like all of the operands of the store should be added to the worklist, but only those operands not being used in any nonUniform instruction should be regarded as uniform. But that requires collectLoopUniforms algorithm to use some topological order to do uniform check for the values in worklist. Do you think it is the right way to go for collectLoopUniform?



Repository:
  rL LLVM

http://reviews.llvm.org/D20474