[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails

Tue May 12 09:39:18 PDT 2020

Pierre-vh marked 2 inline comments as done.
Pierre-vh added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5021
+    // Cleanup if we already planned to fold the tail.
+    if (FoldTailByMasking) {
+      Legal->abandonTailFoldingByMasking();
----------------
SjoerdMeijer wrote:
> Do we need this if there is no tail? I haven't reminded myself and checked the flow, but can this condition be true?
Yes, it can happen when tail-folding is enabled and the loop's trip count (TC) is a multiple of the VF (e.g. TC=64, VF=16).
In those cases, we do the preparation for tail-folding earlier (line 4968) but then realize here that the loop has no tail, so we must revert it (abandon tail folding and clear the flag).
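A minimal C++ sketch of the revert described above, under stated assumptions: `TailFoldingState` and `finalizeTailFolding` are hypothetical stand-ins, only the names `FoldTailByMasking` and `abandonTailFoldingByMasking` come from the patch, and the real logic is split between the cost model and the legality analysis.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the cost-model/legality state discussed above;
// this is not the real LoopVectorizationCostModel or LoopVectorizationLegality.
struct TailFoldingState {
  bool FoldTailByMasking = false; // set when tail-folding is planned early
  bool MaskingPrepared = false;   // models the legality-side preparation

  void prepareTailFolding() {
    FoldTailByMasking = true;
    MaskingPrepared = true;
  }
  // Models Legal->abandonTailFoldingByMasking(): undo the preparation.
  void abandonTailFoldingByMasking() { MaskingPrepared = false; }
};

// Called once VF is known. TC == 0 means "trip count unknown" in this sketch.
// If the known TC is a multiple of VF there is no tail, so revert both the
// preparation and the flag; otherwise keep the tail-folding plan.
void finalizeTailFolding(TailFoldingState &S, uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "VF must be non-zero");
  if (TC > 0 && TC % VF == 0 && S.FoldTailByMasking) {
    S.abandonTailFoldingByMasking();
    S.FoldTailByMasking = false;
  }
}
```

With TC=64 and VF=16 the plan is reverted; with TC=70 (or an unknown TC) it is kept, because a tail may remain.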

We need this because otherwise it would generate masked loads/stores for such loops, which isn't optimal (normal loads/stores are better). It would also cause an assertion failure in the MVETailPredication pass (TC cannot be a multiple of the VF in a tail-predicated loop).
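To see why the masks are wasted when there is no tail, the sketch below models the lane predicate of a tail-folded loop, assuming lane `L` of the vector iteration starting at induction value `IV` is active iff `IV + L < TC`. The function `countMaskedOffLanes` is hypothetical and only counts lanes; it does not model MVE codegen.

```cpp
#include <cassert>
#include <cstdint>

// Counts how many lanes are masked off over the whole tail-folded loop.
// Each vector iteration covers lanes IV .. IV+VF-1; a lane is active only
// while IV + L < TC.
uint64_t countMaskedOffLanes(uint64_t TC, uint64_t VF) {
  assert(VF > 0 && "VF must be non-zero");
  uint64_t MaskedOff = 0;
  for (uint64_t IV = 0; IV < TC; IV += VF)
    for (uint64_t L = 0; L < VF; ++L)
      if (!(IV + L < TC))
        ++MaskedOff;
  return MaskedOff;
}
```

For TC=64, VF=16 it returns 0: every mask is all-true, so plain loads/stores do the same work. For TC=70, VF=16 it returns 10, all in the final vector iteration.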

================
Comment at: llvm/test/Transforms/LoopVectorize/ARM/tail-folding-scalar-epilogue-fallback.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -S -loop-vectorize -mattr=+armv8.1-m.main,+mve.fp -disable-mve-tail-predication=false < %s | FileCheck %s
+; RUN: opt -S -loop-vectorize -mattr=+armv8.1-m.main,+mve.fp -disable-mve-tail-predication=true < %s | FileCheck %s
----------------
SjoerdMeijer wrote:
> SjoerdMeijer wrote:
> > Sorry for being a bit lazy, but this is a big example, but why is this rejected for tail-predication? Would be good to indicate the reason somewhere, e.g. function name, in the IR.
> > And could this example be reduced?
> This test is enabling MVE tail-predication, meaning that here we "enable" masked loads/stores that enable tail-folding. Thus, since no other options are used, this relies on TTI hook preferPredicateOverEpilogue to set CM_ScalarEpilogueNotNeededUsePredicate. And so, this hook determines for this test that tail-folding was possible, which we then overrule later, is that correct? 
> Sorry for being a bit lazy, but this is a big example, but why is this rejected for tail-predication? Would be good to indicate the reason somewhere, e.g. function name, in the IR.
> And could this example be reduced?

If I remember correctly, this test is rejected because of `store i8* %incdec.ptr13, i8** %pos, align 4`: it is a user outside the loop of `%incdec.ptr13`, which is defined inside the loop.
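For illustration, a C++ loop of roughly this shape produces such a live-out: a cursor is advanced inside the loop and its final value is written back through a pointer after the loop. The function `copyAndAdvance` is hypothetical and is not the source of the test; it only shows the pattern.

```cpp
#include <cassert>
#include <cstddef>

// The cursor `p` is incremented on every iteration, and its final value is
// stored back to *pos after the loop, so the loop-computed pointer has a
// user outside the loop (compare the store of %incdec.ptr13 to %pos).
void copyAndAdvance(char **pos, const char *src, std::size_t n) {
  assert(pos != nullptr && *pos != nullptr);
  char *p = *pos;
  for (std::size_t i = 0; i < n; ++i)
    *p++ = src[i]; // p advanced inside the loop
  *pos = p;        // live-out: used after the loop
}
```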

I'll try to reduce the test a bit more, and I'll add a comment explaining why it should be rejected.

> This test is enabling MVE tail-predication, meaning that here we "enable" masked loads/stores that enable tail-folding. Thus, since no other options are used, this relies on TTI hook preferPredicateOverEpilogue to set CM_ScalarEpilogueNotNeededUsePredicate. And so, this hook determines for this test that tail-folding was possible, which we then overrule later, is that correct?

That is correct: this test is the same as the other one, except that it relies on `preferPredicateOverEpilogue` to set the flag.
Should I add something to check that `preferPredicateOverEpilogue` accepts the loop? (That way the test won't silently pass if the TTI hook stops accepting it.)

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D79783