[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 13 13:36:28 PDT 2020


Ayal added a comment.

There are indeed loops which can be vectorized only with a scalar remainder loop, i.e., w/o foldTail. But the tests attached could easily be made an exception: LV handles induction variables that are live-out by pre-computing their values in the pre-header. So foldTail should work fine in their presence, provided the live-out value is computed using the original trip-count rather than the one foldTail rounds up.

Other live-outs could also be made to work under foldTail, such as selecting the last live element from the last live vector instead of assuming it's the last element of the last part.



================
Comment at: llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h:237
+  /// Abandons tail-folding by masking: clears the sets of masked operations and
+  /// conditional assumes.
+  void abandonTailFoldingByMasking() {
----------------
But they need to be reset to reflect original conditional blocks, as set by `canVectorizeWithIfConvert()` when it initially called `blockCanBePredicated()`.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1233
 
-bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
+bool LoopVectorizationLegality::prepareToFoldTailByMasking(bool ReportFailure) {
 
----------------
Rather than if/how to report failure, probably better to clone this method into a `canFoldTailByMasking()` (as it was originally called iirc) const method, w/o having any side-effects that may later need to be abandoned, and a `prepareToFoldTailByMasking()` which need not return any value.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1240
   for (auto &Reduction : getReductionVars())
     ReductionLiveOuts.insert(Reduction.second.getLoopExitInstr());
 
----------------
Try also adding getInductionVars() Phi's, and the values that feed them across the backedge, to the set of live outs allowed by fold tail?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1257
+            "LiveOutFoldingTailByMasking", ORE, TheLoop, UI);
+      } else {
+        LLVM_DEBUG(
----------------
Should an "analysis" message be reported instead of a "failure" one?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1262
+            << *UI << "\n");
+      }
       return false;
----------------
Can dump this note regardless of ReportFailure.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1273
   for (BasicBlock *BB : TheLoop->blocks()) {
     if (!blockCanBePredicated(BB, SafePointers, /* MaskAllLoads= */ true)) {
+      if (ReportFailure) {
----------------
In order (for `canFoldTailByMasking()`) to only check if tail can be folded, this call to `blockCanBePredicated()` should be replaced with one that does not have side-effects.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp:1281
+        LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking\n");
+      }
       return false;
----------------
ditto.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4974
+
+    if (Legal->prepareToFoldTailByMasking(/*ReportFailure=*/false))
+      FoldTailByMasking = true;
----------------
Fold this under the following switch, possibly falling through to the CM_ScalarEpilogueAllowed case?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5027
+    // Abandon tail folding if needed, so we don't generate masked load/stores
+    // for this loop (they're not needed).
+    if (FoldTailByMasking) {
----------------
They (masked load/stores) are needed, but only in blocks that were originally conditional, rather than all blocks of the loop.


================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/tail-folding-scalar-epilogue-fallback.ll:33
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i32 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, i8* [[PTR0]], i32 [[N_VEC]]
+; CHECK-NEXT:    [[IND_END4:%.*]] = sub i32 [[DEC62]], [[N_VEC]]
----------------
IND_END pre-computes the value of %incdec.ptr.


================
Comment at: llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll:32
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i32 [[TMP0]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[IND_END:%.*]] = getelementptr i8, i8* [[PTR0]], i32 [[N_VEC]]
+; CHECK-NEXT:    [[IND_END4:%.*]] = sub i32 [[DEC62]], [[N_VEC]]
----------------
ditto.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79783/new/

https://reviews.llvm.org/D79783





More information about the llvm-commits mailing list