[PATCH] D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Dec 26 14:14:57 PST 2021


Ayal added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8938
+
+  auto *PrimaryInd = new VPCanonicalIVRecipe(StartV, DL);
+  VPRegionBlock *TopRegion = Plan.getVectorLoopRegion();
----------------
PrimaryInd - let's settle for CanonicalIV throughout?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3120
 
-  // Create i+1 and fill the PHINode.
-  //
-  // If the tail is not folded, we know that End - Start >= Step (either
-  // statically or through the minimum iteration checks). We also know that both
-  // Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
-  // %Step == %End. Hence we must exit the loop before %IV + %Step unsigned
-  // overflows and we can mark the induction increment as NUW.
-  Value *Next = B.CreateAdd(Induction, Step, "index.next",
-                            /*NUW=*/!Cost->foldTailByMasking(), /*NSW=*/false);
-  Induction->addIncoming(Start, L->getLoopPreheader());
-  Induction->addIncoming(Next, Latch);
   // Create the compare.
+  B.CreateCondBr(B.getTrue(), L->getUniqueExitBlock(), Header);
----------------
fhahn wrote:
> Ayal wrote:
> > Above comment is obsolete.
> > Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?
> > Above comment is obsolete.
> 
> removed.
> 
> 
> > Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?
> 
> Unfortunately `widenIntOrFpInduction` still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header. 
>>Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?

> Unfortunately widenIntOrFpInduction still creates and sets the incoming values for vector inductions. To set the incoming value from the backedge for a phi, the latch already needs to be connected to the header.

ok, and D113183 is where widenIntOrFpInduction will stop doing that, and join other header phi's which hook-up to their backedge values at the end of VPlan::execute.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8950
+  } else
+    cast<VPBasicBlock>(Header)->appendRecipe(PrimaryInd);
+
----------------
fhahn wrote:
> fhahn wrote:
> > Ayal wrote:
> > > In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?
> > Yes, but unfortunately in the native path it looks like there's a pre-header block already. Perhaps we should make sure the native path also wraps the vector loop in a region.
> I looked into this and wrapping the loops in the native path into regions is a bit more tricky, because of the explicit edge from latch to header; this means the header cannot simply be used as entry block to the loop region. Once we have branch-exit instructions, we should be able to address this.
Sorry, didn't mean to modify the native path, just wondering if something like this would work:

```
  if (IsVPlanNative)
    Header = Header->getSingleSuccessor();
  CanonicalIV->insertBefore(&*cast<VPBasicBlock>(Header)->begin());
```


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:902
+  for (VPRecipeBase &R : Header->phis()) {
+    auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
+    if (!isa<VPCanonicalIVRecipe>(&R) &&
----------------
Have VPCanonicalIVRecipe also inherit from VPWidenPHIRecipe, possibly renaming the latter VPHeaderPHIRecipe, so that it too could benefit from getBackedgeValue()?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:919
+      auto *ICmp = cast<Instruction>(
+          State->Builder.CreateICmpEQ(Next, State->VectorTripCount));
+      TermBr->setCondition(ICmp);
----------------
Introducing the ICmpEQ feeding TermBr should ideally be done by an appropriate (BCT) recipe in the normal course of VPlan::execute() stage 2 above, rather than during this backedge-rewiring stage 3.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:928
+      LastPartForNewPhi = SinglePartNeeded ? 1 : State->UF;
+      BackedgeValue = PhiR->getBackedgeValue();
+    }
----------------
This hopefully could fold into

```
      // Is a single part fed across the backedge, or all UF parts.
      IsSinglePart = isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
                     isa<VPCanonicalIVRecipe>(&R) ||
                     cast<VPReductionPHIRecipe>(&R)->isOrdered();
      SinglePart = isa<VPCanonicalIVRecipe>(&R) ? 0 : State->UF-1;
      NumPartsForNewPhi = IsSinglePart ? 1 : State->UF;
      BackedgeValue = PhiR->getBackedgeValue();
```
Note that if a single part is fed across the backedge, it must be the last part UF-1 for ordered reductions and FOR, but only the first part 0 is set for the Canonical IV backedge value (it could also set part UF-1...).


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:933
+                       ? State->get(R.getVPSingleValue(), VPIteration(Part, 0))
+                       : State->get(PhiR, Part);
+      Value *Val =
----------------
Storing per-part 0 rather than per-lane 0 and common base class should lead to a common State->get(PhiR, Part) for all, so the loop becomes

```
    for (unsigned Part = 0; Part < NumPartsForNewPhi; ++Part) {
      Value *Phi = State->get(PhiR, Part);
      Value *Val = State->get(BackedgeValue, IsSinglePart ? SinglePart : Part);
      cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
    }
```


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1322
+  EntryPart->setDebugLoc(DL);
+  State.set(this, EntryPart, VPIteration(0, 0));
+}
----------------
fhahn wrote:
> Ayal wrote:
> > Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.
> Done!
`    State.set(this, EntryPart, Part);`
and elsewhere?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1628
+
+  /// Generate the canonical vector induction phi of the vector loop.
+  void execute(VPTransformState &State) override;
----------------
"vector" still needs to be dropped too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113223/new/

https://reviews.llvm.org/D113223



More information about the llvm-commits mailing list