[PATCH] D50778: [LV] Vectorize loops where non-phi instructions used outside loop

Mon Aug 20 09:11:57 PDT 2018

anna added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3727
+      if (isa<Instruction>(IncomingValue)) 
+          LastLane = Cost->isUniformAfterVectorization(
+                         cast<Instruction>(IncomingValue), VF)
----------------
anna wrote:
> Ayal wrote:
> > anna wrote:
> > > I have trouble getting a test case that exercises this path. Tried couple of different uniform instructions (GEPs with constant operands). In all cases, we were vectorizing these and then extracting the VF -1 th element. 
> > > Any ideas on what will be a good test case for this? Thanks.
> > > 
> > Yes, the GEPs themselves are immune, because they are checked by `UsersAreMemAccesses()` to see if all their users are loads/stores, so if they have a loop-closed phi as a user they will not be regarded as uniform. The following should work:
> > 
> > ```
> > ; RUN: opt < %s -loop-vectorize -S
> > 
> > target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"
> > 
> > @tab = common global [32 x i8] zeroinitializer, align 1
> > 
> > define i32 @uniform_live_out() {
> > entry:
> >   br label %for.body
> > 
> > for.body:                                         ; preds = %for.body, %entry
> >   %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> >   %i.09 = add i32 %i.08, 7
> >   %arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.09
> >   %0 = load i8, i8* %arrayidx, align 1
> >   %bump = add i8 %0, 1
> >   store i8 %bump, i8* %arrayidx, align 1
> >   %inc = add nsw i32 %i.08, 1
> >   %exitcond = icmp eq i32 %i.08, 20000
> >   br i1 %exitcond, label %for.end, label %for.body
> > 
> > for.end:                                          ; preds = %for.body
> >   %lcssa = phi i32 [%i.09, %for.body]
> >   %arrayidx.out = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %lcssa
> >   store i8 42, i8* %arrayidx.out, align 1
> >   ret i32 0
> > }
> > ```
> > 
> > Sorry about this. It's really due to the way uniform values are stored. Would be better to have an API for obtaining the scalar value corresponding to the last VF-1 lane, regardless of how it's stored, instead of having users check if VF-1 or 0 should be passed.
> > 
> > But wait, this test shows that `LoopVectorizationCostModel::collectLoopUniforms()` must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.
> thanks Ayal. I've made the change in `collectLoopUniforms` to avoid incorrectly recognizing instructions with outside uses as uniform (unless those uses themselves were uniform). So, `%i.09 = add i32 %i.08, 7` is no longer scalarized and we extract the VF-1th element. Without that change, we were recognizing i.09 as uniform. 
> I will add this test case and find another one where the live-out is of a really uniform instruction.
> But wait, this test shows that LoopVectorizationCostModel::collectLoopUniforms() must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.

IIUC, the change in the latest diff I have will recognize the following as uniform instructions:
1. the IVs and bumps which satisfy certain conditions as uniforms (no change there),
2.  uses of operands of uniform instruction which satisfy certain conditions - already marked uniform or is a uniform pointer operand of load/store (topologically identifying uniform instructions)

Instructions that are used outside and don't satisfy either of above are not marked uniform: For example "%i.09 = add i32 %i.08, 7". 

But then this patch will not have any incoming non-induction phis that need to be updated through `fixLCSSA` and are uniformAfterVectorization. 

I think I might be missing something in what you're stating?

Repository:
  rL LLVM

https://reviews.llvm.org/D50778