[PATCH] D79783: [LoopVectorize] Fallback to a scalar epilogue when TP fails

Tue Jun 23 09:37:37 PDT 2020

SjoerdMeijer added a comment.

In D79783#2057310 <https://reviews.llvm.org/D79783#2057310>, @SjoerdMeijer wrote:

> Hi @Ayal , with a bit of delay, I am almost ready to pick this up. Regarding your comments on rG9529597cf4562c64034943dacc29a4ff4fe93d86 <https://reviews.llvm.org/rG9529597cf4562c64034943dacc29a4ff4fe93d86>, I've copied them here below as I thought that would be more convenient, but let me know if you would like to move it to elsewhere. Anyway, I am now going to look into your comments, these ones:
>
> > This omission of Primary Induction Phi's from Scalars in collectLoopScalars(), under foldTail:
> > 
> >   // An induction variable will remain scalar if all users of the induction
> >   // variable and induction variable update remain scalar.
> >   ...
> >   +    // If tail-folding is applied, the primary induction variable will be used
> >   +    // to feed a vector compare.
> >   +    if (Ind == Legal->getPrimaryInduction() && foldTailByMasking())
> >   +      continue;
> >    
> > 
> > should have led widenIntOrFpInduction() to take the following early-exit for Phi's that are not in Scalars:
> > 
> >   // Try to create a new independent vector induction variable. If we can't
> >   // create the phi node, we will splat the scalar induction variable in each
> >   // loop iteration.
> >   if (!shouldScalarizeInstruction(EntryVal)) {
> >    
> > 
> > rather than reaching CreateSplatIV() at the end.
> > 
> > Perhaps the regressions occur when EntryVal is the Trunc rather than the IV Phi?
> >  If so, one remedy may be to also omit Trunc from Scalars; another may be to check if (!shouldScalarizeInstruction(IV) instead-of or in addition to if (!shouldScalarizeInstruction(EntryVal)).
> > 
> > Is there a short reproducer?

Yep, function `@foo_optsize()` in `test/Transforms/LoopVectorize/X86/optsize.ll` is a minimal example. It  will be miscompiled with these 2 lines commented out that I added:

  diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  index ec756671ea6..d39795627ec 100644
  --- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  +++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  @@ -1917,8 +1917,8 @@ void InnerLoopVectorizer::widenIntOrFpInduction(PHINode *IV, TruncInst *Trunc) {
     // vectorised IV. Except when we tail-fold, then the splat IV feeds the
     // predicate used by the masked loads/stores.
     Value *ScalarIV = CreateScalarIV(Step);
  -  if (!Cost->isScalarEpilogueAllowed())
  -    CreateSplatIV(ScalarIV, Step);
  +  //if (!Cost->isScalarEpilogueAllowed())
  +  //  CreateSplatIV(ScalarIV, Step);
     buildScalarSteps(ScalarIV, Step, EntryVal, ID);
   }

The miscompilation is that we don't emit:

  %induction = add <64 x i32> %broadcast.splat, <i32 0, i32 1, ...

and the icmp is incorrectly performed against:

  %2 = icmp ule <64 x i32> %broadcast.splat, <i32 202, i32 202, ...

which should have been:

  %2 = icmp ule <64 x i32> %induction, <i32 202, i32 202, ...

it indeed doesn't take the early exit:

  // Try to create a new independent vector induction variable. If we can't
  // create the phi node, we will splat the scalar induction variable in each              
  // loop iteration.
  if (!shouldScalarizeInstruction(EntryVal)) {

I need to look again if this makes sense or not.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79783/new/

https://reviews.llvm.org/D79783