[PATCH] D75746: [LoopVectorizer] Simplify branch in the remainder loop for trivial cases
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 3 07:29:59 PDT 2020
Ayal added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3107
+ // If VFxUF is 2 and vector loop is not skipped then remainder executes once.
+ if (VF * UF == 2 && !areSafetyChecksAdded()) {
+ if (BasicBlock *Latch = OrigLoop->getLoopLatch())
----------------
danilaml wrote:
> Ayal wrote:
> > danilaml wrote:
> > > Ayal wrote:
> > > > danilaml wrote:
> > > > > I think this check is enough unless there are other cases in which "remainder loop has `N % (VF*UF)` iterations doesn't hold.
> > > > Note that the trip count of the remainder loop may be **equal** to VF*UF, when loop `requiresScalarEpilogue()`; so in the above special case of VF*UF==2 remainder loop may iterate once **or twice**.
> > > >
> > > > Note that `emitMinimumIterationCountCheck()` also takes care of the case where adding 1 to the backedge-taken count overflows, leading to an incorrect trip count of zero; here too "remainder" loop iterates (much) more than once.
> > > Thanks. I knew I might've missed something. This makes me more skeptical about potential llvm.assume solution.
> > > Am I understanding your note correctly, that adding requiresScalarEpilogue check is enough?
> > Adding `requiresScalarEpilogue()` check is enough to handle the first case, but not the second case.
> > One way to try and handle the second case is to change `getOrCreateVectorTripCount()` so that it relies on `PSE::getBackedgeTakenCount()` w/o adding 1 to it, as this addition (done by `getOrCreateTripCount()`) may overflow to zero.
> > See r209854, and the `max_i32_backedgetaken()` test it added to test/Transforms/LoopVectorize/induction.ll.
> >
> > Another way may be to check if/when adding 1 is known not to overflow.
> I haven't figured out how to make VectorTripCount reliant solely on getBackedgeTakenCount without introducing extra checks/instructions (which seems to be the rationale behind the current code, instead of the one in r209854, at the expense of not vectorizing the rare UINT_MAX loops).
>
> Instead, I've opted in checking whether the overflow might've occurred by using getConstantMaxBackedgeTakenCount. If I understood things correctly, it would return -1 (all ones) in the overflow case.
A sketch of making VectorTripCount work correctly for loops with BTC= UINTMAX as well:
```
Step = VF*UF
BTC = PSE::BackedgeTakenCount()
N = BTC + 1 // could overflow to 0 so do not compute N % Step
if (foldTail) N = N + (Step-1) // for rounding up
R = BTC % Step // Fits foldTail: (N+Step-1)%Step == (BTC+1+Step-1)%Step == (BTC+Step)%Step == BTC%Step
if !(foldTail) { R = R + 1 // Fits requiresScalarEpilog: produces 0 < R <= Step
if !(requiresScalarEpilog) R = (R == Step ? 0 : R) == R % Step}
VectorTripCount = N - R
```
which could be optimized into
```
Step = VF*UF
BTC = PSE::BackedgeTakenCount()
R = BTC % Step
VTC = BTC - R
if !(requiresScalarEpilog) VTC = (R == Step-1 ? VTC + Step : VTC)
if (foldTail) VTC = VTC + Step
VectorTripCount = VTC
```
This also allows foldTail to work with Steps (i.e., UF's) that are not a power of 2.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75746/new/
https://reviews.llvm.org/D75746
More information about the llvm-commits
mailing list