[PATCH] D158779: [VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Sep 17 14:37:12 PDT 2023
Ayal added a comment.
Trying to (further) simplify and clarify the logic as it gets refactored.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:139
+ VPValue *createOverflowingOp(unsigned Opcode,
+ std::initializer_list<VPValue *> Operands,
----------------
nit: these “create op”’s should arguably return an op/recipe/vpinstruction rather than a vpvalue, unless they’re expected to potentially fold constants?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8617
// Add the necessary canonical IV and branch recipes required to control the
// loop.
+static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
----------------
nit: canonical IV also serves as a basis for derived IV's.
Note that the CanonicalIVPHI - CanonicalIVIncrement cycle introduced here along with BranchOnCount produce a single value/instruction per vector iteration, with StartV producing a single live-in (scalar) value. In contrast to subsequent IV/AVL recipes producing values per part.
Note that the three recipes are created and introduced into header and latch directly, w/o VPBuilder.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8724
// so this function is better to be conservative, rather than to split
// it up into different VPlans.
bool IVUpdateMayOverflow = false;
----------------
nit: worth a TODO to revisit this note?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8730
DebugLoc DL = getDebugLocFromInstOrOperands(Legal->getPrimaryInduction());
- addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DL,
- CM.getTailFoldingStyle(IVUpdateMayOverflow));
+ bool HasNUW = CM.getTailFoldingStyle() == TailFoldingStyle::None;
+ addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
----------------
This call of getTailFoldingStyle() need not pass IVUpdateMayOverflow?
Suffice to call getTailFoldingStyle() once and record its result in `Style`, to be reused below?
nit: worth a comment explaining why NUW corresponds with not folding tail.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8942
- addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),
- CM.getTailFoldingStyle());
+ addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), true,
+ DebugLoc());
----------------
nit: comment that `true` stands for HasNUW and why that is chosen here.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2084
}
+ Type *getScalarType() { return getOperand(0)->getLiveInIRValue()->getType(); }
----------------
Where/is this (non-const variant) needed?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:518
}
-
VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
----------------
nit: is this related?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:821
+ auto *CanonicalIVPHI = Plan.getCanonicalIV();
+ VPValue *StartV = CanonicalIVPHI->getOperand(0);
+
----------------
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:823
+
+ // Create the active lane mask instruction in the vplan preheader.
+ auto *CanonicalIVIncrement =
----------------
Place this comment below, before actually creating EntryALM and its Increment-for-part?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:825
+ auto *CanonicalIVIncrement =
+ cast<VPInstruction>(CanonicalIVPHI->getOperand(1));
+ CanonicalIVIncrement->dropPoisonGeneratingFlags();
----------------
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:830
+ // we have to take unrolling into account. Each part needs to start at
+ // Part * VF
+ VPBuilder Builder;
----------------
This depends on how ALM recipe is defined - it could generate a mask per part based on using a single value per vector iteration of its operand, effectively folding the increment-per-part recipe into the ALM recipe. This could first be set up w/o unrolling in mind, i.e., when UF=1, CanonicalIVIncrementForPart is redundant; and then extend to multiple parts, eventually to replicate them in VPlan.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:833
+ auto *VecPreheader =
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+ Builder.setInsertPoint(VecPreheader);
----------------
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:834
+ cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+ Builder.setInsertPoint(VecPreheader);
+
----------------
nit: worth having a constructor of VPBuilder that takes an insertion point as parameter.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:852
+ {TC}, DL);
+ IncrementValue = CanonicalIVPHI;
+ }
----------------
nit: keep the order of setting IncrementValue and TripCount consistent across both then and else.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:868
+ // Create the active lane mask for the next iteration of the loop.
+ Builder.setInsertPoint(EB, EB->back().getIterator());
+ auto *CanonicalIVIncrementParts =
----------------
nit: clarify that new Inc-for-part/ALM/Not/BrOnCond sequence is generated before BrOnCount, which is then eliminated.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:869
+ Builder.setInsertPoint(EB, EB->back().getIterator());
+ auto *CanonicalIVIncrementParts =
+ Builder.createOverflowingOp(VPInstruction::CanonicalIVIncrementForPart,
----------------
`CanonicalIVIncrementParts` is essentially the opcode. Better distinguish it from the other CanonicalIVIncrementForPart above, called `EntryIncrement`. How about `InLoopIncrement`?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:880
+ auto *NotMask = Builder.createNot(ALM, DL);
+ Builder.createNaryOp(VPInstruction::BranchOnCond, {NotMask}, DL);
+ EB->getTerminator()->eraseFromParent();
----------------
Maybe worth having both BranchOnCondTrue and BranchOnCondFalse.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:908
+
+ // Walk users of the canonical induction and replace all compares of the form
+ // (ICMP_ULE, wide canonical IV, backedge-taken-count) with an
----------------
This def-use walk over all in/direct users of the canonical induction may end up scanning all recipes.
To visit all (ICMP_ULE, wide canonical IV, backedge-taken-count) patterns, suffice to look at all direct users of BTC - looking for ICMP_ULE's whose other operand is a wide canonical IV?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912
+ vp_for_all_users(Plan.getCanonicalIV(), [&](VPUser *U) {
+ auto *VPI = dyn_cast<VPInstruction>(U);
+ if (!VPI || VPI->getOpcode() != Instruction::ICmp ||
----------------
VPI >> CompareToReplace ?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:924
+ nullptr, "active.lane.mask");
+ cast<VPRecipeBase>(LaneMask->getDefiningRecipe())
+ ->insertAfter(cast<VPWidenCanonicalIVRecipe>(VPI->getOperand(0)));
----------------
Define LaneMask as a RecipeBase instead of VPValue, to avoid VPInstruction->VPValue->VPRecipeBase?
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:814
+
+// Add a VPActiveLaneMaskPHIRecipe to \p Plan.
+static VPActiveLaneMaskPHIRecipe *
----------------
fhahn wrote:
> Ayal wrote:
> > This adds much more than a VPActiveLaneMaskPHIRecipe. One part adds recipes in preheader and this phi, returning it for further/any use. Another part replaces the loop branch with several recipes including an ActiveLaneMask. The two parts are apparently independent of each other.
> Clarified the name & comment. The branch part only adds a Not and Branch recipe, so not sure if it is worth splitting this off?
Worth explaining what is returned.
Worth sketching the Inc-for-part(StartV)/ALM at preheader plus Inc-for-part(CanInc/CanPhi)/ALM both feeding ALM-Phi, the latter also feeding Not/BrOnCond which replaces BrOnCount.
Worth mentioning that StartV/CanPhi/CanInc and their users remain intact, only BrOnCount is replaced, and ALM-phi is returned.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:917
+ // Walk users of the canonical induction and replace all compares of the form
+ // (ICMP_ULE, wide canonical IV, backedge-taken-count) with an
+ // active-lane-mask.
----------------
fhahn wrote:
> Ayal wrote:
> > How many instances of the same "WidenIV <= BTC" are we expecting?
> > Why replace each with a separate copy of ActiveLaneMask(IV, TC)?
> > Suffice to find_if one exists and process it? Can then verify find_if again does not find another.
> Updated to re-use created lane mask. At the moment there should be only one, but not sure if it is worth relying on that here.
In order to re-use a lane mask, need to make sure it is placed such that it dominates all uses? When its an ALM phi its fine, but here it is placed after a wide canonical IV, which it uses.
Seems clearer to generate a (wide canonical IV and) ALM and place LaneMask in the header at the outset, to be reused by any ICMP_ULE pattern, or collected by DCE later if no such pattern exists(*)? I.e., something like
```
if (UseActiveLaneMaskForControlFlow)
LaneMask = addVPLaneMaskPhiAndUpdateExitBranch();
// Can update exit branch here, subsequently.
else { // generate it here
LaneMask = ...;
}
```
(*) Note that addALM is (currently) called only if tail is folded, so at-least one ICMP_ULE pattern is expected, and could be asserted.
================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.h:72
+ /// active-lane-mask recipe. If \p UseActiveLaneMaskForControlFlow is true,
+ /// introduce an VPActiveLaneMaskPHIRecipe.
+ static void addActiveLaneMask(VPlan &Plan,
----------------
Can more explicitly explain that (ICMP_ULE, WideCanonIIV, BTC) is replaced with (ActiveLaneMask, IV, BTC).
Worth also explaining here about DataAndControlFlowWithoutRuntimeCheck.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D158779/new/
https://reviews.llvm.org/D158779
More information about the llvm-commits
mailing list