[PATCH] D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Dec 26 10:58:53 PST 2021


fhahn added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:575
                     BasicBlock *MiddleBlock);
 
+  void createLatchTerminator(Loop *L);
----------------
Ayal wrote:
> Worth documenting createLatchTerminator()?
> Is it really needed, or can branch be created (terminator replaced) along with its compare?
I added a doc-comment.

I think it is still needed for now, as `widenIntOrFpInduction sets the incoming value for vector phis, and for that the backedge needs to be already set up.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:770
   PHINode *OldInduction = nullptr;
 
+  Value *IVStartValue = nullptr;
----------------
Ayal wrote:
> Document IVStartValue, which being canonical is zero by default (if left as null), but can be set to some other offset to support epilog loop vectorization?
> 
> Rename CanonicalIVStartValue?
I updated the code so it is not stored in ILV directly. the skeleton creation methods return it instead, so it is more local.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1039
 /// Return a value for Step multiplied by VF.
 static Value *createStepForVF(IRBuilder<> &B, Type *Ty, ElementCount VF,
                               int64_t Step) {
----------------
Ayal wrote:
> Same createStepForVF() function now implemented also in VPlan.cpp?
Updated to be shared.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1132
       if (isa<VPWidenMemoryInstructionRecipe>(CurRec) ||
-          isa<VPInterleaveRecipe>(CurRec))
+          isa<VPInterleaveRecipe>(CurRec) || isa<VPCanonicalIVRecipe>(CurRec))
         continue;
----------------
Ayal wrote:
> (Backward slice traversal should stop/continue at any header phi recipe? Possibly in a potential separate patch)
I'll do that in a separate patch.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2445
   auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
   auto *Br = cast<BranchInst>(LoopVectorLatch->getTerminator());
+  LastInduction->moveBefore(Br);
----------------
Ayal wrote:
> (This change is required because the condition is now being created later.)
> (When this last step is given a separate recipe, it should be placed at the end of the latch VPBB, rather than sinking it here during code-gen.)
Yes, that should be doable once we have separate increment recipes for vector inductions,


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2711
+              ? Builder.CreateSExtOrTrunc(PrimInd, IV->getType())
+              : Builder.CreateCast(Instruction::SIToFP, PrimInd, IV->getType());
       ScalarIV = emitTransformedIndex(Builder, ScalarIV, PSE.getSE(), DL, ID,
----------------
Ayal wrote:
> Use ScalarIV instead of PrimInd above, similar to resetting ScalarIV below?
update, thanks


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3120
 
-  // Create i+1 and fill the PHINode.
-  //
-  // If the tail is not folded, we know that End - Start >= Step (either
-  // statically or through the minimum iteration checks). We also know that both
-  // Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
-  // %Step == %End. Hence we must exit the loop before %IV + %Step unsigned
-  // overflows and we can mark the induction increment as NUW.
-  Value *Next = B.CreateAdd(Induction, Step, "index.next",
-                            /*NUW=*/!Cost->foldTailByMasking(), /*NSW=*/false);
-  Induction->addIncoming(Start, L->getLoopPreheader());
-  Induction->addIncoming(Next, Latch);
   // Create the compare.
+  B.CreateCondBr(B.getTrue(), L->getUniqueExitBlock(), Header);
----------------
Ayal wrote:
> Above comment is obsolete.
> Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?
> Above comment is obsolete.

removed.


> Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

Unfortunately `widenIntOrFpInduction` still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header. 


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3637
   // The loop step is equal to the vectorization factor (num of SIMD elements)
   // times the unroll factor (num of SIMD instructions).
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
----------------
Ayal wrote:
> Above comment should be moved along with createStepForVF?
moved to VPlan.cpp


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3638
   // times the unroll factor (num of SIMD instructions).
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
----------------
Ayal wrote:
> Still need to set insertion point here?
nope, removed


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3639
   Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
-  Value *Step = createStepForVF(Builder, IdxTy, VF, UF);
   Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
+  createLatchTerminator(Lp);
----------------
Ayal wrote:
> Set CountRoundDown next to its use, or have createInductionResumeValues() get or create the vector trip count.
sunk to `createInductionResumeValues`


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7956
+    VPBasicBlock *Header = BestVPlan.getEntry()->getEntryBasicBlock();
+    auto *IV = cast<VPCanonicalIVRecipe>(&*Header->begin());
+    VPValue *VPV = new VPValue(ILV.IVStartValue);
----------------
Ayal wrote:
> Worth explaining that if IVStartValue is not set then CanonicalIV will use zero as its start value, as set initially by its constructor. Or perhaps set it to zero here inside an 'else'?
> 
> `IV = BestPlan->getCanonicalIV();`?
> 
> This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()
> 
> Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?
> IV = BestPlan->getCanonicalIV();?


Updated to use that.

> This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()

I moved it to the the end of stage 1, with a comment saying why and when we need to replace the start value.

> Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?

reordered!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8935
 
+static void addInduction(VPlan &Plan, Type *IdxTy, Instruction *DLInst,
+                         bool HasNUW, bool IsVPlanNative) {
----------------
Ayal wrote:
> addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)
> 
> Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?
> addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)


renamed & added a comment

> Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?

done!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8950
+  } else
+    cast<VPBasicBlock>(Header)->appendRecipe(PrimaryInd);
+
----------------
Ayal wrote:
> In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?
Yes, but unfortunately in the native path it looks like there's a pre-header block already. Perhaps we should make sure the native path also wraps the vector loop in a region.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8952
+
+  auto *InductionIncrement = cast<VPInstruction>(
+      new VPInstruction(HasNUW ? VPInstruction::InductionIncrementNUW
----------------
Ayal wrote:
> Is the cast needed?
removed


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8961
+    EB = cast<VPBasicBlock>(EB->getSinglePredecessor());
+  VPBasicBlock *Exit = cast<VPBasicBlock>(EB);
+  Exit->appendRecipe(InductionIncrement);
----------------
Ayal wrote:
> is the cast needed?
No, also dropped the variable.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:663
   switch (getOpcode()) {
+  case Instruction::PHI: {
+    break;
----------------
Ayal wrote:
> Which VPInstruction uses this opcode and should be ignored here?
looks like a leftover from an earlier iteration, removed


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:734
+      Value *Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
+      State.set(this, Next, Part);
+    }
----------------
Ayal wrote:
> Only user is Phi, can hook-up back to it now?
I'm not sure, modifying an instruction created by a different recipe here seems like a bit of a stretch.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:824
 void VPlan::execute(VPTransformState *State) {
   // -1. Check if the backedge taken count is needed, and if so build it.
   if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) {
----------------
Ayal wrote:
> Should all invariant values that are generated in the pre-header be modelled as VPValues similar to BackedgeTakenCount, until the preheader is modelled as VPBB with recipes? I.e., treat VectorTripCount, [Canonical]IVStartValue and "CanonicalIVStepValue" (returned by createStepForVF()) similar to VPlan's BackedgeTakenCount, which is hooked-up to its State->TripCount IR Value here?
I am not sure if it is worth it at the moment, as long as they are only used by a single recipe. Specifically, the start value is mostly a constant (0), the step is currently only used by the induction increment and the vector trip count is currently only used in post-processing. VectorTripCount is moved to a dedicated VPValue in the patch that introduces recipes for the exit condition. The step value could be moved out as follow up. WDYT? 


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:903
 
+  // Fix the latch value of reduction and first-order recurrences phis in the
+  // vector loop.
----------------
Ayal wrote:
> and canonical IV phis
updated the comment


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:914
+      auto *P = cast<PHINode>(State->get(Ind, VPIteration(0, 0)));
+      P->setName("index");
+      auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
----------------
Ayal wrote:
> The name was already set to "index" initially when `Ind` executed and generated P?
Yes, it can be removed.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:916
+      auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
+      P->addIncoming(State->get(BackedgeValue, 0), LatchBB);
+      auto *Next = cast<Instruction>(P->getIncomingValueForBlock(LatchBB));
----------------
Ayal wrote:
> Fold addIncoming into that of FOR and Reductions below? For Canonical IV's, `SinglePartNeeded` should be true, but `Val` needs to get `Part` (i.e., 0, rather than UF-1).
> 
Updated the code as suggested, but a few more knobs are needed.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1322
+  EntryPart->setDebugLoc(DL);
+  State.set(this, EntryPart, VPIteration(0, 0));
+}
----------------
Ayal wrote:
> Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.
Done!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113223/new/

https://reviews.llvm.org/D113223



More information about the llvm-commits mailing list