[PATCH] D92132: [LV] Support widened induction variables in epilogue vectorization.

Wed Oct 5 14:00:23 PDT 2022

Ayal added a comment.

Good improvement, adding various nits.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7883

   // Keep track of bypass blocks, as they feed start values to the induction
   // phis in the scalar loop preheader.
----------------
nit: and reduction?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7908
+    // reduction phis only.
+    if (all_of(Phi->blocks(), [&](BasicBlock *IncB) {
+          return EPI.EpilogueIterationCountCheck != IncB;
----------------
nit: none_of?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10502
+        // before vectorizing the epilogue loop.
         for (VPRecipeBase &R : Header->phis()) {
           if (auto *ReductionPhi = dyn_cast<VPReductionPHIRecipe>(&R)) {
----------------
nit: can cast R into VPHeaderPHIRecipe and have it provide SetStartValue() to be used by all below.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10508
               ReductionPhi->setOperand(0, StartVal);
             }
           }
----------------
early continue? Or have all cases issue a common R.setOperand(0, StartVal)?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10522
+            ID = &Ind->getInductionDescriptor();
+          }
+          if (!IndPhi)
----------------
else it's an FOR and continue - they resume ok?

Rather than check and continue, better assert (IndPhi && "...") for inductions - whose start/resume value must always be fixed?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10527
+              IndPhi, *ID, {EPI.MainLoopIterationCountCheck});
+          R.setOperand(0, BestEpiPlan.getOrAddExternalDef(ResumeVal));
         }
----------------
Compare with how other "live-ins" are handled by creating empty VPValues registered with VPlan to be filled with Values created during VPlan::prepareToExecute(). Are these live-ins better handled here, inside  LoopVectorizePass::processLoop() which is already overburdened with 400 LOC? Can leave behind a TODO...

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll:173
 ; AVX512-LABEL: @foo2(
-; AVX512-NEXT:  entry:
+; AVX512-NEXT:  iter.check:
 ; AVX512-NEXT:    br label [[VECTOR_BODY:%.*]]
----------------
nit: why this redundant label change, here and additional multiple instances below?

================
Comment at: llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll:12
 define signext i32 @f2(i8* noalias %A, i32 signext %n) {
+; CHECK-LABEL: @f2(
+; CHECK-NEXT:  entry:
----------------
Adding assertions to show we can vectorize main loop but not epilog?

================
Comment at: llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-limitations.ll:89
   ret i32 %i.0.lcssa
 }
----------------
worth retaining f3 testcase to demonstrate that widened/truncated inductions do get epilog vectorized, or becomes redundant?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92132/new/

https://reviews.llvm.org/D92132