[PATCH] D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.

Mon Dec 27 04:14:35 PST 2021

fhahn marked an inline comment as done.
fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8938
+
+  auto *PrimaryInd = new VPCanonicalIVRecipe(StartV, DL);
+  VPRegionBlock *TopRegion = Plan.getVectorLoopRegion();
----------------
Ayal wrote:
> PrimaryInd - let's settle for CanonicalIV throughout?
Thanks, I missed this instance. Updated!

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8950
+  } else
+    cast<VPBasicBlock>(Header)->appendRecipe(PrimaryInd);
+
----------------
Ayal wrote:
> fhahn wrote:
> > fhahn wrote:
> > > Ayal wrote:
> > > > In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?
> > > Yes, but unfortunately in the native path it looks like there's a pre-header block already. Perhaps we should make sure the native path also wraps the vector loop in a region.
> > I looked into this and wrapping the loops in the native path into regions is a bit more tricky, because of the explicit edge from latch to header; this means the header cannot simply be used as entry block to the loop region. Once we have branch-exit instructions, we should be able to address this.
> Sorry, didn't mean to modify the native path, just wondering if something like this would work:
> 
> ```
>   if (IsVPlanNative)
>     Header = Header->getSingleSuccessor();
>   CanonicalIV->insertBefore(&*cast<VPBasicBlock>(Header)->begin());
> ```
Ah I see, thanks! Updated!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:902
+  for (VPRecipeBase &R : Header->phis()) {
+    auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
+    if (!isa<VPCanonicalIVRecipe>(&R) &&
----------------
Ayal wrote:
> Have VPCanonicalIVRecipe also inherit from VPWidenPHIRecipe, possibly renaming the latter VPHeaderPHIRecipe, so that it too could benefit from getBackedgeValue()?
I think just renaming `VPWidenPHIRecipe` may not be the best way forward, as the VPCanonicalIVPRecipe will stay scalar. I put up D116304 that moves the common code to a new abstract base class `VPHeaderPHIRecipe`.

I also renamed the recipe to `VPCanonicalIVPHIRecipe`, so it fits with the other phi recipes.

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:919
+      auto *ICmp = cast<Instruction>(
+          State->Builder.CreateICmpEQ(Next, State->VectorTripCount));
+      TermBr->setCondition(ICmp);
----------------
Ayal wrote:
> Introducing the ICmpEQ feeding TermBr should ideally be done by an appropriate (BCT) recipe in the normal course of VPlan::execute() stage 2 above, rather than during this backedge-rewiring stage 3.
I added a TODO and this should be addressed soon in a follow-up patch. Or should those changes also be pulled in here?

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:928
+      LastPartForNewPhi = SinglePartNeeded ? 1 : State->UF;
+      BackedgeValue = PhiR->getBackedgeValue();
+    }
----------------
Ayal wrote:
> This hopefully could fold into
> 
> ```
>       // Is a single part fed across the backedge, or all UF parts.
>       IsSinglePart = isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
>                      isa<VPCanonicalIVRecipe>(&R) ||
>                      cast<VPReductionPHIRecipe>(&R)->isOrdered();
>       SinglePart = isa<VPCanonicalIVRecipe>(&R) ? 0 : State->UF-1;
>       NumPartsForNewPhi = IsSinglePart ? 1 : State->UF;
>       BackedgeValue = PhiR->getBackedgeValue();
> ```
> Note that if a single part is fed across the backedge, it must be the last part UF-1 for ordered reductions and FOR, but only the first part 0 is set for the Canonical IV backedge value (it could also set part UF-1...).
Updated, thanks!

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:933
+                       ? State->get(R.getVPSingleValue(), VPIteration(Part, 0))
+                       : State->get(PhiR, Part);
+      Value *Val =
----------------
Ayal wrote:
> Storing per-part 0 rather than per-lane 0 and common base class should lead to a common State->get(PhiR, Part) for all, so the loop becomes
> 
> ```
>     for (unsigned Part = 0; Part < NumPartsForNewPhi; ++Part) {
>       Value *Phi = State->get(PhiR, Part);
>       Value *Val = State->get(BackedgeValue, IsSinglePart ? SinglePart : Part);
>       cast<PHINode>(Phi)->addIncoming(Val, VectorLatchBB);
>     }
> ```
Storing a scalar value in the 'vector' part of State seems not ideal, but it does indeed simplify things a lot here. Updated

================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1322
+  EntryPart->setDebugLoc(DL);
+  State.set(this, EntryPart, VPIteration(0, 0));
+}
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.
> > Done!
> `    State.set(this, EntryPart, Part);`
> and elsewhere?
Updated

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113223/new/

https://reviews.llvm.org/D113223