[PATCH] D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Dec 25 03:21:17 PST 2021


Ayal added a comment.

General direction is fine! Sharing thoughts and specific suggestions.



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:510
   /// is provided, the integer induction variable will first be truncated to
   /// the corresponding type.
   void widenIntOrFpInduction(PHINode *IV, const InductionDescriptor &ID,
----------------
Document \p PrimInd

Rename PrimInd to CanonicalIV or vice-versa rename CanonicalIV to PrimaryIV, throughout?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:575
                     BasicBlock *MiddleBlock);
 
+  void createLatchTerminator(Loop *L);
----------------
Worth documenting createLatchTerminator()?
Is it really needed, or can branch be created (terminator replaced) along with its compare?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:770
   PHINode *OldInduction = nullptr;
 
+  Value *IVStartValue = nullptr;
----------------
Document IVStartValue, which being canonical is zero by default (if left as null), but can be set to some other offset to support epilog loop vectorization?

Rename CanonicalIVStartValue?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1039
 /// Return a value for Step multiplied by VF.
 static Value *createStepForVF(IRBuilder<> &B, Type *Ty, ElementCount VF,
                               int64_t Step) {
----------------
Same createStepForVF() function now implemented also in VPlan.cpp?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1132
       if (isa<VPWidenMemoryInstructionRecipe>(CurRec) ||
-          isa<VPInterleaveRecipe>(CurRec))
+          isa<VPInterleaveRecipe>(CurRec) || isa<VPCanonicalIVRecipe>(CurRec))
         continue;
----------------
(Backward slice traversal should stop/continue at any header phi recipe? Possibly in a potential separate patch)


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2445
   auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
   auto *Br = cast<BranchInst>(LoopVectorLatch->getTerminator());
+  LastInduction->moveBefore(Br);
----------------
(This change is required because the condition is now being created later.)
(When this last step is given a separate recipe, it should be placed at the end of the latch VPBB, rather than sinking it here during code-gen.)


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2711
+              ? Builder.CreateSExtOrTrunc(PrimInd, IV->getType())
+              : Builder.CreateCast(Instruction::SIToFP, PrimInd, IV->getType());
       ScalarIV = emitTransformedIndex(Builder, ScalarIV, PSE.getSE(), DL, ID,
----------------
Use ScalarIV instead of PrimInd above, similar to resetting ScalarIV below?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3120
 
-  // Create i+1 and fill the PHINode.
-  //
-  // If the tail is not folded, we know that End - Start >= Step (either
-  // statically or through the minimum iteration checks). We also know that both
-  // Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
-  // %Step == %End. Hence we must exit the loop before %IV + %Step unsigned
-  // overflows and we can mark the induction increment as NUW.
-  Value *Next = B.CreateAdd(Induction, Step, "index.next",
-                            /*NUW=*/!Cost->foldTailByMasking(), /*NSW=*/false);
-  Induction->addIncoming(Start, L->getLoopPreheader());
-  Induction->addIncoming(Next, Latch);
   // Create the compare.
+  B.CreateCondBr(B.getTrue(), L->getUniqueExitBlock(), Header);
----------------
Above comment is obsolete.
Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3637
   // The loop step is equal to the vectorization factor (num of SIMD elements)
   // times the unroll factor (num of SIMD instructions).
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
----------------
Above comment should be moved along with createStepForVF?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3638
   // times the unroll factor (num of SIMD instructions).
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
----------------
Still need to set insertion point here?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3639
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
-  Value *Step = createStepForVF(Builder, IdxTy, VF, UF);
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
+  createLatchTerminator(Lp);
----------------
Set CountRoundDown next to its use, or have createInductionResumeValues() get or create the vector trip count.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7956
+    VPBasicBlock *Header = BestVPlan.getEntry()->getEntryBasicBlock();
+    auto *IV = cast<VPCanonicalIVRecipe>(&*Header->begin());
+    VPValue *VPV = new VPValue(ILV.IVStartValue);
----------------
Worth explaining that if IVStartValue is not set then CanonicalIV will use zero as its start value, as set initially by its constructor. Or perhaps set it to zero here inside an 'else'?

`IV = BestPlan->getCanonicalIV();`?

This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()

Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8935
 
+static void addInduction(VPlan &Plan, Type *IdxTy, Instruction *DLInst,
+                         bool HasNUW, bool IsVPlanNative) {
----------------
addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)

Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8950
+  } else
+    cast<VPBasicBlock>(Header)->appendRecipe(PrimaryInd);
+
----------------
In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8952
+
+  auto *InductionIncrement = cast<VPInstruction>(
+      new VPInstruction(HasNUW ? VPInstruction::InductionIncrementNUW
----------------
Is the cast needed?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8961
+    EB = cast<VPBasicBlock>(EB->getSinglePredecessor());
+  VPBasicBlock *Exit = cast<VPBasicBlock>(EB);
+  Exit->appendRecipe(InductionIncrement);
----------------
is the cast needed?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:663
   switch (getOpcode()) {
+  case Instruction::PHI: {
+    break;
----------------
Which VPInstruction uses this opcode and should be ignored here?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:728
+  case VPInstruction::InductionIncrementNUW: {
+    if (Part == 0) {
+      bool IsNUW = getOpcode() == VPInstruction::InductionIncrementNUW;
----------------
early-break if Part != 0


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:734
+      Value *Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
+      State.set(this, Next, Part);
+    }
----------------
Only user is Phi, can hook-up back to it now?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:824
 void VPlan::execute(VPTransformState *State) {
   // -1. Check if the backedge taken count is needed, and if so build it.
   if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) {
----------------
Should all invariant values that are generated in the pre-header be modelled as VPValues similar to BackedgeTakenCount, until the preheader is modelled as VPBB with recipes? I.e., treat VectorTripCount, [Canonical]IVStartValue and "CanonicalIVStepValue" (returned by createStepForVF()) similar to VPlan's BackedgeTakenCount, which is hooked-up to its State->TripCount IR Value here?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:903
 
+  // Fix the latch value of reduction and first-order recurrences phis in the
+  // vector loop.
----------------
and canonical IV phis


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:910
+  }
+  for (VPRecipeBase &R : Header->phis()) {
+    if (auto *Ind = dyn_cast<VPCanonicalIVRecipe>(&R)) {
----------------
Start the loop with the early-continue?

```
    auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
    if (!PhiR || !(isa<VPCanonicalIVRecipe>(&R) ||
                   isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
                   isa<VPReductionPHIRecipe>(&R)))
      continue;
```


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:914
+      auto *P = cast<PHINode>(State->get(Ind, VPIteration(0, 0)));
+      P->setName("index");
+      auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
----------------
The name was already set to "index" initially when `Ind` executed and generated P?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:916
+      auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
+      P->addIncoming(State->get(BackedgeValue, 0), LatchBB);
+      auto *Next = cast<Instruction>(P->getIncomingValueForBlock(LatchBB));
----------------
Fold addIncoming into that of FOR and Reductions below? For Canonical IV's, `SinglePartNeeded` should be true, but `Val` needs to get `Part` (i.e., 0, rather than UF-1).



================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:921
+      auto *ICmp = cast<Instruction>(
+          State->Builder.CreateICmpEQ(Next, State->VectorTripCount));
+      TermBr->setCondition(ICmp);
----------------
Would it be simpler to have a recipe that takes care of generating the canonical IV increment, the compare, and the branch? Rather than have Increment[NUW] VPInstructions take care of the former only, have the compare generated here at stage 3, and have the branch generated by createLatchTerminator() at the outset when building the skeleton?
I.e., have VPCanonicalIVRecipe be the first recipe in a (non-replicating) region, and say VPBranchOnCountRecipe be the last - analogous to BCT instructions and complementing VPBranchOnMaskRecipe?
VPCanonicalIVRecipe can alternatively be named VPLoopCountPhiRecipe.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1322
+  EntryPart->setDebugLoc(DL);
+  State.set(this, EntryPart, VPIteration(0, 0));
+}
----------------
Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:350
 
+  Value *VectorTripCount;
+
----------------
Document, place next to TripCount?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1605
 
+class VPCanonicalIVRecipe : public VPRecipeBase, public VPValue {
+  DebugLoc DL;
----------------
Document

Relationship between VPCanonicalIVRecipe and VPWidenCanonicalIVRecipe? This one is the VP[Scalar]CanonicalIVRecipe?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1620
+
+  /// Generate a canonical vector induction variable of the vector loop, with
+  void execute(VPTransformState &State) override;
----------------
"a canonical [vector] induction"

with what?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2297
+
+  VPCanonicalIVRecipe *getCanonicalIV() {
+    VPBasicBlock *EntryVPBB = getEntry()->getEntryBasicBlock();
----------------
Start by calling getVectorLoopRegion()?

Associate getCanonicalIV() with a (non replicating) Region, instead of 'entire' VPlan?
Cache a pointer to VPCanonicalIVRecipe in Region?
(IV's of replicating regions are eliminated by complete unrolling.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113223/new/

https://reviews.llvm.org/D113223



More information about the llvm-commits mailing list