[PATCH] D113223: [VPlan] Add VPCanonicalIVRecipe, partly retire createInductionVariable.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Dec 25 03:21:17 PST 2021
Ayal added a comment.
General direction is fine! Sharing thoughts and specific suggestions.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:510
/// is provided, the integer induction variable will first be truncated to
/// the corresponding type.
void widenIntOrFpInduction(PHINode *IV, const InductionDescriptor &ID,
----------------
Document \p PrimInd
Rename PrimInd to CanonicalIV or vice-versa rename CanonicalIV to PrimaryIV, throughout?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:575
BasicBlock *MiddleBlock);
+ void createLatchTerminator(Loop *L);
----------------
Worth documenting createLatchTerminator()?
Is it really needed, or can branch be created (terminator replaced) along with its compare?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:770
PHINode *OldInduction = nullptr;
+ Value *IVStartValue = nullptr;
----------------
Document IVStartValue, which being canonical is zero by default (if left as null), but can be set to some other offset to support epilog loop vectorization?
Rename CanonicalIVStartValue?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1039
/// Return a value for Step multiplied by VF.
static Value *createStepForVF(IRBuilder<> &B, Type *Ty, ElementCount VF,
int64_t Step) {
----------------
Same createStepForVF() function now implemented also in VPlan.cpp?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:1132
if (isa<VPWidenMemoryInstructionRecipe>(CurRec) ||
- isa<VPInterleaveRecipe>(CurRec))
+ isa<VPInterleaveRecipe>(CurRec) || isa<VPCanonicalIVRecipe>(CurRec))
continue;
----------------
(Backward slice traversal should stop/continue at any header phi recipe? Possibly in a potential separate patch)
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2445
auto *LoopVectorLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
auto *Br = cast<BranchInst>(LoopVectorLatch->getTerminator());
+ LastInduction->moveBefore(Br);
----------------
(This change is required because the condition is now being created later.)
(When this last step is given a separate recipe, it should be placed at the end of the latch VPBB, rather than sinking it here during code-gen.)
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2711
+ ? Builder.CreateSExtOrTrunc(PrimInd, IV->getType())
+ : Builder.CreateCast(Instruction::SIToFP, PrimInd, IV->getType());
ScalarIV = emitTransformedIndex(Builder, ScalarIV, PSE.getSE(), DL, ID,
----------------
Use ScalarIV instead of PrimInd above, similar to resetting ScalarIV below?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3120
- // Create i+1 and fill the PHINode.
- //
- // If the tail is not folded, we know that End - Start >= Step (either
- // statically or through the minimum iteration checks). We also know that both
- // Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
- // %Step == %End. Hence we must exit the loop before %IV + %Step unsigned
- // overflows and we can mark the induction increment as NUW.
- Value *Next = B.CreateAdd(Induction, Step, "index.next",
- /*NUW=*/!Cost->foldTailByMasking(), /*NSW=*/false);
- Induction->addIncoming(Start, L->getLoopPreheader());
- Induction->addIncoming(Next, Latch);
// Create the compare.
+ B.CreateCondBr(B.getTrue(), L->getUniqueExitBlock(), Header);
----------------
Above comment is obsolete.
Create the latch terminator later along with its compare? Seems more like "replace" latch terminator than "create" it?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3637
// The loop step is equal to the vectorization factor (num of SIMD elements)
// times the unroll factor (num of SIMD instructions).
Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
----------------
Above comment should be moved along with createStepForVF?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3638
// times the unroll factor (num of SIMD instructions).
Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
----------------
Still need to set insertion point here?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3639
Builder.SetInsertPoint(&*Lp->getHeader()->getFirstInsertionPt());
- Value *Step = createStepForVF(Builder, IdxTy, VF, UF);
Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
+ createLatchTerminator(Lp);
----------------
Set CountRoundDown next to its use, or have createInductionResumeValues() get or create the vector trip count.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7956
+ VPBasicBlock *Header = BestVPlan.getEntry()->getEntryBasicBlock();
+ auto *IV = cast<VPCanonicalIVRecipe>(&*Header->begin());
+ VPValue *VPV = new VPValue(ILV.IVStartValue);
----------------
Worth explaining that if IVStartValue is not set then CanonicalIV will use zero as its start value, as set initially by its constructor. Or perhaps set it to zero here inside an 'else'?
`IV = BestPlan->getCanonicalIV();`?
This seems to be better placed at the end of stage 1, than beginning of stage 2? Or stage -1 of VPlan::execute()
Slightly better to first wrap IVStartValue with VPV and add it as an externalDef, and then look up IV to finally replace its first operand with VPV?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8935
+static void addInduction(VPlan &Plan, Type *IdxTy, Instruction *DLInst,
+ bool HasNUW, bool IsVPlanNative) {
----------------
addInduction >> addCanonicalIV? (starting from zero by default; and add its increment)
Pass a DebugLoc instead of DLInst, as it is passed to recipe constructor?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8950
+ } else
+ cast<VPBasicBlock>(Header)->appendRecipe(PrimaryInd);
+
----------------
In both cases PrimaryInd (CanonicalIV?) should be inserted as the first recipe of Header?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8952
+
+ auto *InductionIncrement = cast<VPInstruction>(
+ new VPInstruction(HasNUW ? VPInstruction::InductionIncrementNUW
----------------
Is the cast needed?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8961
+ EB = cast<VPBasicBlock>(EB->getSinglePredecessor());
+ VPBasicBlock *Exit = cast<VPBasicBlock>(EB);
+ Exit->appendRecipe(InductionIncrement);
----------------
is the cast needed?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:663
switch (getOpcode()) {
+ case Instruction::PHI: {
+ break;
----------------
Which VPInstruction uses this opcode and should be ignored here?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:728
+ case VPInstruction::InductionIncrementNUW: {
+ if (Part == 0) {
+ bool IsNUW = getOpcode() == VPInstruction::InductionIncrementNUW;
----------------
early-break if Part != 0
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:734
+ Value *Next = Builder.CreateAdd(Phi, Step, "index.next", IsNUW, false);
+ State.set(this, Next, Part);
+ }
----------------
Only user is Phi, can hook-up back to it now?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:824
void VPlan::execute(VPTransformState *State) {
// -1. Check if the backedge taken count is needed, and if so build it.
if (BackedgeTakenCount && BackedgeTakenCount->getNumUsers()) {
----------------
Should all invariant values that are generated in the pre-header be modelled as VPValues similar to BackedgeTakenCount, until the preheader is modelled as VPBB with recipes? I.e., treat VectorTripCount, [Canonical]IVStartValue and "CanonicalIVStepValue" (returned by createStepForVF()) similar to VPlan's BackedgeTakenCount, which is hooked-up to its State->TripCount IR Value here?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:903
+ // Fix the latch value of reduction and first-order recurrences phis in the
+ // vector loop.
----------------
and canonical IV phis
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:910
+ }
+ for (VPRecipeBase &R : Header->phis()) {
+ if (auto *Ind = dyn_cast<VPCanonicalIVRecipe>(&R)) {
----------------
Start the loop with the early-continue?
```
auto *PhiR = dyn_cast<VPWidenPHIRecipe>(&R);
if (!PhiR || !(isa<VPCanonicalIVRecipe>(&R) ||
isa<VPFirstOrderRecurrencePHIRecipe>(&R) ||
isa<VPReductionPHIRecipe>(&R)))
continue;
```
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:914
+ auto *P = cast<PHINode>(State->get(Ind, VPIteration(0, 0)));
+ P->setName("index");
+ auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
----------------
The name was already set to "index" initially when `Ind` executed and generated P?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:916
+ auto *LatchBB = State->LI->getLoopFor(State->CFG.PrevBB)->getLoopLatch();
+ P->addIncoming(State->get(BackedgeValue, 0), LatchBB);
+ auto *Next = cast<Instruction>(P->getIncomingValueForBlock(LatchBB));
----------------
Fold addIncoming into that of FOR and Reductions below? For Canonical IV's, `SinglePartNeeded` should be true, but `Val` needs to get `Part` (i.e., 0, rather than UF-1).
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:921
+ auto *ICmp = cast<Instruction>(
+ State->Builder.CreateICmpEQ(Next, State->VectorTripCount));
+ TermBr->setCondition(ICmp);
----------------
Would it be simpler to have a recipe that takes care of generating the canonical IV increment, the compare, and the branch? Rather than have Increment[NUW] VPInstructions take care of the former only, have the compare generated here at stage 3, and have the branch generated by createLatchTerminator() at the outset when building the skeleton?
I.e., have VPCanonicalIVRecipe be the first recipe in a (non-replicating) region, and say VPBranchOnCountRecipe be the last - analogous to BCT instructions and complementing VPBranchOnMaskRecipe?
VPCanonicalIVRecipe can alternatively be named VPLoopCountPhiRecipe.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:1322
+ EntryPart->setDebugLoc(DL);
+ State.set(this, EntryPart, VPIteration(0, 0));
+}
----------------
Store the Phi value per-part 0, rather than per-lane 0? That's how uniform (UAV) scalars are stored, and also how the induction increment is stored. Admittedly there should be a 'once/singleton' option in addition to per-lane and per-part.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:350
+ Value *VectorTripCount;
+
----------------
Document, place next to TripCount?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1605
+class VPCanonicalIVRecipe : public VPRecipeBase, public VPValue {
+ DebugLoc DL;
----------------
Document
Relationship between VPCanonicalIVRecipe and VPWidenCanonicalIVRecipe? This one is the VP[Scalar]CanonicalIVRecipe?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:1620
+
+ /// Generate a canonical vector induction variable of the vector loop, with
+ void execute(VPTransformState &State) override;
----------------
"a canonical [vector] induction"
with what?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2297
+
+ VPCanonicalIVRecipe *getCanonicalIV() {
+ VPBasicBlock *EntryVPBB = getEntry()->getEntryBasicBlock();
----------------
Start by calling getVectorLoopRegion()?
Associate getCanonicalIV() with a (non replicating) Region, instead of 'entire' VPlan?
Cache a pointer to VPCanonicalIVRecipe in Region?
(IV's of replicating regions are eliminated by complete unrolling.)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D113223/new/
https://reviews.llvm.org/D113223
More information about the llvm-commits
mailing list