[PATCH] D50778: [LV] Vectorize loops where non-phi instructions used outside loop

Mon Aug 20 13:16:06 PDT 2018

Ayal accepted this revision.
Ayal added inline comments.
This revision is now accepted and ready to land.

================
Comment at: lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:728
+          continue;
+        }
         ORE->emit(createMissedAnalysis("ValueUsedOutsideLoop", &I)
----------------
anna wrote:
> Ayal wrote:
> > It would now be good to reason about and mark all exits that are **not allowed** to be live-out, rather than those that are :-). Specifically, Reduction Phi's are on this "not allowed" list, as they represent the one-before-last value, which is not available when vectorized (could possibly be computed by "subtracting" from the final reduced value), unlike the Reduction bump instructions. Induction Phi's and their bumps are, as pointed out here, also on this "not allowed" list when the SCEV predicates cannot be used outside the loop, because the latter are used to pre-compute the live-out values. But such cases could possibly be handled alternatively by extracting from within the loop, similar to internal Instructions, rather than pre-computing. FirstOrderRecurrence Phi's are also currently one the "not allowed" list, unlike their "Previous" value, and could also possibly be handled using extraction. Would be good to confirm that this enumeration is exhaustive.
> I prefer this being a follow-on NFC. By having it as a follow-on "intended" NFC, it also confirms the enumeration for the "not allowed list" is exhaustive :)
> 
> 
OK, sure. May be worth leaving a TODO behind.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:3727
+      if (isa<Instruction>(IncomingValue)) 
+          LastLane = Cost->isUniformAfterVectorization(
+                         cast<Instruction>(IncomingValue), VF)
----------------
anna wrote:
> anna wrote:
> > Ayal wrote:
> > > anna wrote:
> > > > I have trouble getting a test case that exercises this path. Tried couple of different uniform instructions (GEPs with constant operands). In all cases, we were vectorizing these and then extracting the VF -1 th element. 
> > > > Any ideas on what will be a good test case for this? Thanks.
> > > > 
> > > Yes, the GEPs themselves are immune, because they are checked by `UsersAreMemAccesses()` to see if all their users are loads/stores, so if they have a loop-closed phi as a user they will not be regarded as uniform. The following should work:
> > > 
> > > ```
> > > ; RUN: opt < %s -loop-vectorize -S
> > > 
> > > target datalayout = "E-m:e-p:32:32-i64:32-f64:32:64-a:0:32-n32-S128"
> > > 
> > > @tab = common global [32 x i8] zeroinitializer, align 1
> > > 
> > > define i32 @uniform_live_out() {
> > > entry:
> > >   br label %for.body
> > > 
> > > for.body:                                         ; preds = %for.body, %entry
> > >   %i.08 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> > >   %i.09 = add i32 %i.08, 7
> > >   %arrayidx = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %i.09
> > >   %0 = load i8, i8* %arrayidx, align 1
> > >   %bump = add i8 %0, 1
> > >   store i8 %bump, i8* %arrayidx, align 1
> > >   %inc = add nsw i32 %i.08, 1
> > >   %exitcond = icmp eq i32 %i.08, 20000
> > >   br i1 %exitcond, label %for.end, label %for.body
> > > 
> > > for.end:                                          ; preds = %for.body
> > >   %lcssa = phi i32 [%i.09, %for.body]
> > >   %arrayidx.out = getelementptr inbounds [32 x i8], [32 x i8]* @tab, i32 0, i32 %lcssa
> > >   store i8 42, i8* %arrayidx.out, align 1
> > >   ret i32 0
> > > }
> > > ```
> > > 
> > > Sorry about this. It's really due to the way uniform values are stored. Would be better to have an API for obtaining the scalar value corresponding to the last VF-1 lane, regardless of how it's stored, instead of having users check if VF-1 or 0 should be passed.
> > > 
> > > But wait, this test shows that `LoopVectorizationCostModel::collectLoopUniforms()` must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.
> > thanks Ayal. I've made the change in `collectLoopUniforms` to avoid incorrectly recognizing instructions with outside uses as uniform (unless those uses themselves were uniform). So, `%i.09 = add i32 %i.08, 7` is no longer scalarized and we extract the VF-1th element. Without that change, we were recognizing i.09 as uniform. 
> > I will add this test case and find another one where the live-out is of a really uniform instruction.
> > But wait, this test shows that LoopVectorizationCostModel::collectLoopUniforms() must be updated too, to refrain from recognizing instructions with live-outs as being uniform; except for instructions that are inherently uniform/invariant (i.e., all VF values will be the same), and induction variables with their bumps - which are pre-computed rather than have their last-iteration value taken.
> 
> IIUC, the change in the latest diff I have will recognize the following as uniform instructions:
> 1. the IVs and bumps which satisfy certain conditions as uniforms (no change there),
> 2.  uses of operands of uniform instruction which satisfy certain conditions - already marked uniform or is a uniform pointer operand of load/store (topologically identifying uniform instructions)
> 
> Instructions that are used outside and don't satisfy either of above are not marked uniform: For example "%i.09 = add i32 %i.08, 7". 
> 
> But then this patch will not have any incoming non-induction phis that need to be updated through `fixLCSSA` and are uniformAfterVectorization. 
> 
> I think I might be missing something in what you're stating?
This looks fine to me now! - no Uniforms are expected by `fixLCSSA`, so it should indeed continue to always ask for lane VF-1. I.e., w/o the `LastLane` change I suggested above.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4516
   // out of scope. It ensures a uniform instruction will only be used
   // by uniform instructions or out of scope instructions.
   unsigned idx = 0;
----------------
Above comment deserves an update. The original "out of scope" term was somewhat misleading, as isOutOfScope() refers to irrelevant operands rather than live-out users.

Repository:
  rL LLVM

https://reviews.llvm.org/D50778