[PATCH] D115109: [LV] Fix logic preventing tail-folding when an IV is used outside of the loop

Wed Dec 22 03:54:04 PST 2021

fhahn requested changes to this revision.
fhahn added a comment.
This revision now requires changes to proceed.

Thanks for the patch!



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:549
 
-  // Both the PHI node itself, and the "post-increment" value feeding
-  // back into the PHI node may have external users.
-  // We can allow those uses, except if the SCEVs we have for them rely
-  // on predicates that only hold within the loop, since allowing the exit
-  // currently means re-using this SCEV outside the loop (see PR33706 for more
-  // details).
+  // The induction PHI node may have external users.
+  AllowedExit.insert(Phi);
----------------
While this indeed fixes the issue for the test case, I think this is not really the right fix.

IIUC `AllowedExit` is supposed to contain the set of instructions that are *allowed* to have outside users. So we need to check *all* instructions that have outsides users are also in this set.

But that's not what is happening at the moment. We won't check reduction or induction phis against this set in general.

For tail folding, we need to make sure that there are no users outside the loop for any induction, independent of whether they are allowed to have outside users or not (i.e. no IV is allowed to have outside users at the moment for tail folding).

A direct way to fix this would be to use the following

```

--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1292,6 +1292,17 @@ bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
   for (auto &Reduction : getReductionVars())
     ReductionLiveOuts.insert(Reduction.second.getLoopExitInstr());

+  for (auto &Induction : getInductionVars())
+    if (any_of(Induction.first->users(), [this](User *U) {
+      return !TheLoop->contains(cast<Instruction>(U));
+    })) {
+      LLVM_DEBUG(
+          dbgs()
+          << "LV: Cannot fold tail by masking, loop has an outside user for "
+          << *Induction.first << "\n");
+      return false;
+    }
+
```

Note that it might be possible to compute the final induction value even when tail-folding similar to how we compute the final reduction value, but that's only a potential follow up.



Independent of that, we may be able to allow users of the phi itself even if there is a SCEV predicate in general. 

Note that the current logic does not prevent vectorization if the induction phi is used outside in the predicated case. Without tail folding, we never check if an induction phi has users outside the loop. A possible justification is the following: the phi value is guaranteed to be valid for each iteration of the loop as per the SCEV predicate, so even if it is used outside the loop, the value is only used if the predicate holds. For the incremented value this is not true. The value computed in the last iteration may be poison and should not be used outside the scope of the predicate.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:570
   /// Set up the values of the IVs correctly when exiting the vector loop.
   void fixupIVUsers(PHINode *OrigPhi, const InductionDescriptor &II,
+                    Value *VectorTripCount, Value *EndValue,
----------------
All changes in this file seem to be a cleanup independent of fixing the issue. If that's the case, could you submit them as a separate patch for review?


================
Comment at: llvm/test/Transforms/LoopVectorize/pr52335.ll:5
+;
+; #include <stddef.h>
+; #include <stdint.h>
----------------
The C example here is mostly irrelevant as the IR generated for it may change over time and depends on the flags provided and so on. The IR contains all the relevant information and is compact. Ideally the comment for the function would describe the issue. The C source is available at the bug report, which should be sufficient.


================
Comment at: llvm/test/Transforms/LoopVectorize/pr52335.ll:19
+
+target triple = "x86_64-pc-linux-gnu"
+
----------------
this should not be needed with `-force-vector-width=4`. To be safe, also pass `-force-vector-interleave=1`.


================
Comment at: llvm/test/Transforms/LoopVectorize/pr52335.ll:38
+
+for.cond.for.cond.cleanup_crit_edge:
+  %conv12.lcssa = phi i64 [ %conv12, %for.body ]
----------------
nit: moving the exit blocks after the loop slightly increases readability. Also the names of the blocks could be a bit more compact.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115109/new/

https://reviews.llvm.org/D115109