[PATCH] D115109: [LV] Do not fold the tail when an IV is used outside of the loop.

Sun Feb 13 17:36:25 PST 2022

rickyz marked 3 inline comments as done.
rickyz added a comment.

Apologies for the super long delay on responding to this!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:549

-  // Both the PHI node itself, and the "post-increment" value feeding
-  // back into the PHI node may have external users.
-  // We can allow those uses, except if the SCEVs we have for them rely
-  // on predicates that only hold within the loop, since allowing the exit
-  // currently means re-using this SCEV outside the loop (see PR33706 for more
-  // details).
+  // The induction PHI node may have external users.
+  AllowedExit.insert(Phi);
----------------
fhahn wrote:
> While this indeed fixes the issue for the test case, I think this is not really the right fix.
> 
> IIUC `AllowedExit` is supposed to contain the set of instructions that are *allowed* to have outside users. So we need to check *all* instructions that have outsides users are also in this set.
> 
> But that's not what is happening at the moment. We won't check reduction or induction phis against this set in general.
> 
> For tail folding, we need to make sure that there are no users outside the loop for any induction, independent of whether they are allowed to have outside users or not (i.e. no IV is allowed to have outside users at the moment for tail folding).
> 
> A direct way to fix this would be to use the following
> 
> ```
> --- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
> +++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
> @@ -1292,6 +1292,17 @@ bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
>    for (auto &Reduction : getReductionVars())
>      ReductionLiveOuts.insert(Reduction.second.getLoopExitInstr());
> 
> +  for (auto &Induction : getInductionVars())
> +    if (any_of(Induction.first->users(), [this](User *U) {
> +      return !TheLoop->contains(cast<Instruction>(U));
> +    })) {
> +      LLVM_DEBUG(
> +          dbgs()
> +          << "LV: Cannot fold tail by masking, loop has an outside user for "
> +          << *Induction.first << "\n");
> +      return false;
> +    }
> +
> ```
> 
> Note that it might be possible to compute the final induction value even when tail-folding similar to how we compute the final reduction value, but that's only a potential follow up.
> 
> 
> 
> Independent of that, we may be able to allow users of the phi itself even if there is a SCEV predicate in general. 
> 
> Note that the current logic does not prevent vectorization if the induction phi is used outside in the predicated case. Without tail folding, we never check if an induction phi has users outside the loop. A possible justification is the following: the phi value is guaranteed to be valid for each iteration of the loop as per the SCEV predicate, so even if it is used outside the loop, the value is only used if the predicate holds. For the incremented value this is not true. The value computed in the last iteration may be poison and should not be used outside the scope of the predicate.
Thank you for the detailed explanation and suggestion - that makes sense to me. I'll split out the AllowedExit change into a separate change (it should become a NFC change after this fix).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115109/new/

https://reviews.llvm.org/D115109