[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails
Sjoerd Meijer via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 23 09:37:37 PDT 2020
SjoerdMeijer added a comment.
In D79783#2057310 <https://reviews.llvm.org/D79783#2057310>, @SjoerdMeijer wrote:
> Hi @Ayal , with a bit of delay, I am almost ready to pick this up. Regarding your comments on rG9529597cf4562c64034943dacc29a4ff4fe93d86 <https://reviews.llvm.org/rG9529597cf4562c64034943dacc29a4ff4fe93d86>, I've copied them here below as I thought that would be more convenient, but let me know if you would like to move it to elsewhere. Anyway, I am now going to look into your comments, these ones:
>
> > This omission of Primary Induction Phi's from Scalars in collectLoopScalars(), under foldTail:
> >
> > // An induction variable will remain scalar if all users of the induction
> > // variable and induction variable update remain scalar.
> > ...
> > + // If tail-folding is applied, the primary induction variable will be used
> > + // to feed a vector compare.
> > + if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())
> > + continue;
> >
> >
> > should have led widenIntOrFpInduction() to take the following early-exit for Phi's that are not in Scalars:
> >
> > // Try to create a new independent vector induction variable. If we can't
> > // create the phi node, we will splat the scalar induction variable in each
> > // loop iteration.
> > if (!shouldScalarizeInstruction(EntryVal)) {
> >
> >
> > rather than reaching CreateSplatIV() at the end.
> >
> > Perhaps the regressions occur when EntryVal is the Trunc rather than the IV Phi?
> > If so, one remedy may be to also omit Trunc from Scalars; another may be to check if (!shouldScalarizeInstruction(IV) instead-of or in addition to if (!shouldScalarizeInstruction(EntryVal)).
> >
> > Is there a short reproducer?
Yep, function `@foo_optsize()` in `test/Transforms/LoopVectorize/X86/optsize.ll` is a minimal example. It will be miscompiled with these 2 lines commented out that I added:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ec756671ea6..d39795627ec 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -1917,8 +1917,8 @@ void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc) {
// vectorised IV. Except when we tail-fold, then the splat IV feeds the
// predicate used by the masked loads/stores.
Value *ScalarIV = CreateScalarIV(Step);
- if (!Cost->isScalarEpilogueAllowed())
- CreateSplatIV(ScalarIV, Step);
+ //if (!Cost->isScalarEpilogueAllowed())
+ // CreateSplatIV(ScalarIV, Step);
buildScalarSteps(ScalarIV, Step, EntryVal, ID);
}
The miscompilation is that we don't emit:
%induction = add <64 x i32> %broadcast.splat, <i32 0, i32 1, ...
and the icmp is incorrectly performed against:
%2 = icmp ule <64 x i32> %broadcast.splat, <i32 202, i32 202, ...
which should have been:
%2 = icmp ule <64 x i32> %induction, <i32 202, i32 202, ...
it indeed doesn't take the early exit:
// Try to create a new independent vector induction variable. If we can't
// create the phi node, we will splat the scalar induction variable in each
// loop iteration.
if (!shouldScalarizeInstruction(EntryVal)) {
I need to look again if this makes sense or not.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79783/new/
https://reviews.llvm.org/D79783
More information about the llvm-commits
mailing list