[PATCH] D158779: [VPlan] Add active-lane-mask as VPlan-to-VPlan transformation.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 19 08:22:27 PDT 2023


fhahn added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h:139
 
+  VPValue *createOverflowingOp(unsigned Opcode,
+                               std::initializer_list<VPValue *> Operands,
----------------
Ayal wrote:
> nit: these “create op”’s should arguably return an op/recipe/vpinstruction rather than a vpvalue, unless they’re expected to potentially fold constants?
Updated for the new one, can adjust the others as follow-up cleanup, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8617
 // Add the necessary canonical IV and branch recipes required to control the
 // loop.
+static void addCanonicalIVRecipes(VPlan &Plan, Type *IdxTy, bool HasNUW,
----------------
Ayal wrote:
> nit: canonical IV also serves as a basis for derived IV's.
> 
> Note that the CanonicalIVPHI - CanonicalIVIncrement cycle introduced here along with BranchOnCount produce a single value/instruction per vector iteration, with StartV producing a single live-in (scalar) value. In contrast to subsequent IV/AVL recipes producing values per part.
> 
> Note that the three recipes are created and introduced into header and latch directly, w/o VPBuilder.
Should the comment be clarified separately with more details here,  after landing the change now that it is much simpler to spell out what is going on?


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8724
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
   bool IVUpdateMayOverflow = false;
----------------
Ayal wrote:
> nit: worth a TODO to revisit this note?
Can do separately, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8730
   DebugLoc DL = getDebugLocFromInstOrOperands(Legal->getPrimaryInduction());
-  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DL,
-                        CM.getTailFoldingStyle(IVUpdateMayOverflow));
+  bool HasNUW = CM.getTailFoldingStyle() == TailFoldingStyle::None;
+  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), HasNUW, DL);
----------------
Ayal wrote:
> This call of getTailFoldingStyle() need not pass IVUpdateMayOverflow?
> Suffice to call getTailFoldingStyle() once and record its result in `Style`, to be reused below?
> nit: worth a comment explaining why NUW corresponds with not folding tail. 
Adjusted and added a comment, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8942
 
-  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), DebugLoc(),
-                        CM.getTailFoldingStyle());
+  addCanonicalIVRecipes(*Plan, Legal->getWidestInductionType(), true,
+                        DebugLoc());
----------------
Ayal wrote:
> nit: comment that `true` stands for HasNUW and why that is chosen here.
Commented and use variable to attach name, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2084
   }
+  Type *getScalarType() { return getOperand(0)->getLiveInIRValue()->getType(); }
 
----------------
Ayal wrote:
> Where/is this (non-const variant) needed?
Not needed in the current version. Removed, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:518
   }
-
   VPScalarIVStepsRecipe *Steps = new VPScalarIVStepsRecipe(ID, BaseIV, Step);
----------------
Ayal wrote:
> nit: is this related?
Removed , thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:821
+  auto *CanonicalIVPHI = Plan.getCanonicalIV();
+  VPValue *StartV = CanonicalIVPHI->getOperand(0);
+
----------------
Ayal wrote:
> 
Adjusted, thanks! Also update to use `getBackedgeValue` below.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:823
+
+  // Create the active lane mask instruction in the vplan preheader.
+  auto *CanonicalIVIncrement =
----------------
Ayal wrote:
> Place this comment below, before actually creating EntryALM and its Increment-for-part?
Moved, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:825
+  auto *CanonicalIVIncrement =
+      cast<VPInstruction>(CanonicalIVPHI->getOperand(1));
+  CanonicalIVIncrement->dropPoisonGeneratingFlags();
----------------
Ayal wrote:
> 
Adjusted, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:830
+  // we have to take unrolling into account. Each part needs to start at
+  //   Part * VF
+  VPBuilder Builder;
----------------
Ayal wrote:
> This depends on how ALM recipe is defined - it could generate a mask per part based on using a single value per vector iteration of its operand, effectively folding the increment-per-part recipe into the ALM recipe. This could first be set up w/o unrolling in mind, i.e., when UF=1, CanonicalIVIncrementForPart is redundant; and then extend to multiple parts, eventually to  replicate them in VPlan.
I've a set of patches that introduce explicit unrolling, will share that soon. To keep the current patch as simple as possible, it would be great to keep the increment recipe as is for now.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:833
+  auto *VecPreheader =
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+  Builder.setInsertPoint(VecPreheader);
----------------
Ayal wrote:
> 
Adjusted, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:834
+      cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
+  Builder.setInsertPoint(VecPreheader);
+
----------------
Ayal wrote:
> nit: worth having a constructor of VPBuilder that takes an insertion point as parameter.
Added, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:852
+                                     {TC}, DL);
+    IncrementValue = CanonicalIVPHI;
+  }
----------------
Ayal wrote:
> nit: keep the order of setting IncrementValue and TripCount consistent across both then and else.
reordered, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:868
+  // Create the active lane mask for the next iteration of the loop.
+  Builder.setInsertPoint(EB, EB->back().getIterator());
+  auto *CanonicalIVIncrementParts =
----------------
Ayal wrote:
> nit: clarify that new Inc-for-part/ALM/Not/BrOnCond sequence is generated before BrOnCount, which is then eliminated.
Expanded comment and also clarified the code by using `EB->getTerminator()`.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:869
+  Builder.setInsertPoint(EB, EB->back().getIterator());
+  auto *CanonicalIVIncrementParts =
+      Builder.createOverflowingOp(VPInstruction::CanonicalIVIncrementForPart,
----------------
Ayal wrote:
> `CanonicalIVIncrementParts` is essentially the opcode. Better distinguish it from the other CanonicalIVIncrementForPart above, called `EntryIncrement`. How about `InLoopIncrement`?
Renamed, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:880
+  auto *NotMask = Builder.createNot(ALM, DL);
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {NotMask}, DL);
+  EB->getTerminator()->eraseFromParent();
----------------
Ayal wrote:
> Maybe worth having both BranchOnCondTrue and BranchOnCondFalse.
Maybe it would be better to just have `BranchOnCond` to use the blocks `Successor` to set destinations for the branch, then this can be controlled by adjusting the order of successors? In any case, would be good as follow up.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:908
+
+  // Walk users of the canonical induction and replace all compares of the form
+  // (ICMP_ULE, wide canonical IV, backedge-taken-count) with an
----------------
Ayal wrote:
> This def-use walk over all in/direct users of the canonical induction may end up scanning all recipes.
> To visit all (ICMP_ULE, wide canonical IV, backedge-taken-count) patterns, suffice to look at all direct users of BTC - looking for ICMP_ULE's whose other operand is a wide canonical IV?
> 
> 
Adjusted, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:912
+  vp_for_all_users(Plan.getCanonicalIV(), [&](VPUser *U) {
+    auto *VPI = dyn_cast<VPInstruction>(U);
+    if (!VPI || VPI->getOpcode() != Instruction::ICmp ||
----------------
Ayal wrote:
> VPI >> CompareToReplace ?
Renamed, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:924
+                                   nullptr, "active.lane.mask");
+      cast<VPRecipeBase>(LaneMask->getDefiningRecipe())
+          ->insertAfter(cast<VPWidenCanonicalIVRecipe>(VPI->getOperand(0)));
----------------
Ayal wrote:
> Define LaneMask as a RecipeBase instead of VPValue, to avoid VPInstruction->VPValue->VPRecipeBase?
Adjusted, thanks! Trade-off is the need to use `getVPSingleValue`. Really need a base class for recipes defining a single VPValue....


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.h:72
+  /// active-lane-mask recipe. If \p UseActiveLaneMaskForControlFlow is true,
+  /// introduce an VPActiveLaneMaskPHIRecipe.
+  static void addActiveLaneMask(VPlan &Plan,
----------------
Ayal wrote:
> Can more explicitly explain that (ICMP_ULE, WideCanonIIV, BTC) is replaced with (ActiveLaneMask, IV, BTC).
> 
> Worth also explaining here about DataAndControlFlowWithoutRuntimeCheck.
Expanded comment, thanks! Also added that `DataAndControlFlowWithoutRuntimeCheck` implies `UseActiveLaneMaskForControlFlow` ( and added an assert)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158779/new/

https://reviews.llvm.org/D158779



More information about the llvm-commits mailing list