[PATCH] D158779: [VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.

Ayal Zaks via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Sep 17 14:37:12 PDT 2023


Ayal added a comment.

Trying to (further) simplify and clarify the logic as it gets refactored.



================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:139
 
+  VPValue *createOverflowingOp(unsigned Opcode,
+                               std::initializer_list<VPValue *> Operands,
----------------
nit: these “create op”’s should arguably return an op/recipe/vpinstruction rather than a vpvalue, unless they’re expected to potentially fold constants?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8617
 // Add the necessary canonical IV and branch recipes required to control the
 // loop.
+static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
----------------
nit: canonical IV also serves as a basis for derived IV's.

Note that the CanonicalIVPHI - CanonicalIVIncrement cycle introduced here along with BranchOnCount produce a single value/instruction per vector iteration, with StartV producing a single live-in (scalar) value. In contrast to subsequent IV/AVL recipes producing values per part.

Note that the three recipes are created and introduced into header and latch directly, w/o VPBuilder.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8724
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
   bool IVUpdateMayOverflow = false;
----------------
nit: worth a TODO to revisit this note?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8730
   DebugLoc DL = getDebugLocFromInstOrOperands(Legal->getPrimaryInduction());
-  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DL,
-                        CM.getTailFoldingStyle(IVUpdateMayOverflow));
+  bool HasNUW = CM.getTailFoldingStyle() == TailFoldingStyle::None;
+  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
----------------
This call of getTailFoldingStyle() need not pass IVUpdateMayOverflow?
Suffice to call getTailFoldingStyle() once and record its result in `Style`, to be reused below?
nit: worth a comment explaining why NUW corresponds with not folding tail. 


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8942
 
-  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),
-                        CM.getTailFoldingStyle());
+  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), true,
+                        DebugLoc());
----------------
nit: comment that `true` stands for HasNUW and why that is chosen here.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2084
   }
+  Type *getScalarType() { return getOperand(0)->getLiveInIRValue()->getType(); }
 
----------------
Where/is this (non-const variant) needed?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:518
   }
-
   VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
----------------
nit: is this related?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:821
+  auto *CanonicalIVPHI = Plan.getCanonicalIV();
+  VPValue *StartV = CanonicalIVPHI->getOperand(0);
+
----------------



================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:823
+
+  // Create the active lane mask instruction in the vplan preheader.
+  auto *CanonicalIVIncrement =
----------------
Place this comment below, before actually creating EntryALM and its Increment-for-part?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:825
+  auto *CanonicalIVIncrement =
+      cast<VPInstruction>(CanonicalIVPHI->getOperand(1));
+  CanonicalIVIncrement->dropPoisonGeneratingFlags();
----------------



================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:830
+  // we have to take unrolling into account. Each part needs to start at
+  //   Part * VF
+  VPBuilder Builder;
----------------
This depends on how ALM recipe is defined - it could generate a mask per part based on using a single value per vector iteration of its operand, effectively folding the increment-per-part recipe into the ALM recipe. This could first be set up w/o unrolling in mind, i.e., when UF=1, CanonicalIVIncrementForPart is redundant; and then extend to multiple parts, eventually to  replicate them in VPlan.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:833
+  auto *VecPreheader =
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+  Builder.setInsertPoint(VecPreheader);
----------------



================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:834
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+  Builder.setInsertPoint(VecPreheader);
+
----------------
nit: worth having a constructor of VPBuilder that takes an insertion point as parameter.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:852
+                                     {TC}, DL);
+    IncrementValue = CanonicalIVPHI;
+  }
----------------
nit: keep the order of setting IncrementValue and TripCount consistent across both then and else.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:868
+  // Create the active lane mask for the next iteration of the loop.
+  Builder.setInsertPoint(EB, EB->back().getIterator());
+  auto *CanonicalIVIncrementParts =
----------------
nit: clarify that new Inc-for-part/ALM/Not/BrOnCond sequence is generated before BrOnCount, which is then eliminated.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:869
+  Builder.setInsertPoint(EB, EB->back().getIterator());
+  auto *CanonicalIVIncrementParts =
+      Builder.createOverflowingOp(VPInstruction::CanonicalIVIncrementForPart,
----------------
`CanonicalIVIncrementParts` is essentially the opcode. Better distinguish it from the other CanonicalIVIncrementForPart above, called `EntryIncrement`. How about `InLoopIncrement`?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:880
+  auto *NotMask = Builder.createNot(ALM, DL);
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {NotMask}, DL);
+  EB->getTerminator()->eraseFromParent();
----------------
Maybe worth having both BranchOnCondTrue and BranchOnCondFalse.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:908
+
+  // Walk users of the canonical induction and replace all compares of the form
+  // (ICMP_ULE, wide canonical IV, backedge-taken-count) with an
----------------
This def-use walk over all in/direct users of the canonical induction may end up scanning all recipes.
To visit all (ICMP_ULE, wide canonical IV, backedge-taken-count) patterns, suffice to look at all direct users of BTC - looking for ICMP_ULE's whose other operand is a wide canonical IV?




================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912
+  vp_for_all_users(Plan.getCanonicalIV(), [&](VPUser *U) {
+    auto *VPI = dyn_cast<VPInstruction>(U);
+    if (!VPI || VPI->getOpcode() != Instruction::ICmp ||
----------------
VPI >> CompareToReplace ?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:924
+                                   nullptr, "active.lane.mask");
+      cast<VPRecipeBase>(LaneMask->getDefiningRecipe())
+          ->insertAfter(cast<VPWidenCanonicalIVRecipe>(VPI->getOperand(0)));
----------------
Define LaneMask as a RecipeBase instead of VPValue, to avoid VPInstruction->VPValue->VPRecipeBase?


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:814
+
+// Add a VPActiveLaneMaskPHIRecipe to \p Plan.
+static VPActiveLaneMaskPHIRecipe *
----------------
fhahn wrote:
> Ayal wrote:
> > This adds much more than a VPActiveLaneMaskPHIRecipe. One part adds recipes in preheader and this phi, returning it for further/any use. Another part replaces the loop branch with several recipes including an ActiveLaneMask. The two parts are apparently independent of each other.
> Clarified the name & comment. The branch part only adds a Not and Branch recipe, so not sure if it is worth splitting this off?
Worth explaining what is returned.

Worth sketching the Inc-for-part(StartV)/ALM at preheader plus Inc-for-part(CanInc/CanPhi)/ALM both feeding ALM-Phi, the latter also feeding Not/BrOnCond which replaces BrOnCount.

Worth mentioning that StartV/CanPhi/CanInc and their users remain intact, only BrOnCount is replaced, and ALM-phi is returned.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:917
+  // Walk users of the canonical induction and replace all compares of the form
+  // (ICMP_ULE, wide canonical IV, backedge-taken-count) with an
+  // active-lane-mask.
----------------
fhahn wrote:
> Ayal wrote:
> > How many instances of the same "WidenIV <= BTC" are we expecting?
> > Why replace each with a separate copy of ActiveLaneMask(IV, TC)?
> > Suffice to find_if one exists and process it? Can then verify find_if again does not find another.
> Updated to re-use created lane mask. At the moment there should be only one, but not sure if it is worth relying on that here.
In order to re-use a lane mask, need to make sure it is placed such that it dominates all uses? When its an ALM phi its fine, but here it is placed after a wide canonical IV, which it uses.

Seems clearer to generate a (wide canonical IV and) ALM and place LaneMask in the header at the outset, to be reused by any ICMP_ULE pattern, or collected by DCE later if no such pattern exists(*)? I.e., something like

```
  if (UseActiveLaneMaskForControlFlow)
    LaneMask = addVPLaneMaskPhiAndUpdateExitBranch();
    // Can update exit branch here, subsequently.
  else { // generate it here
    LaneMask = ...;
  }
```

(*) Note that addALM is (currently) called only if tail is folded, so at-least one ICMP_ULE pattern is expected, and could be asserted.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.h:72
+  /// active-lane-mask recipe. If \p UseActiveLaneMaskForControlFlow is true,
+  /// introduce an VPActiveLaneMaskPHIRecipe.
+  static void addActiveLaneMask(VPlan &Plan,
----------------
Can more explicitly explain that (ICMP_ULE, WideCanonIIV, BTC) is replaced with (ActiveLaneMask, IV, BTC).

Worth also explaining here about DataAndControlFlowWithoutRuntimeCheck.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158779/new/

https://reviews.llvm.org/D158779



More information about the llvm-commits mailing list