[llvm] [VPlan] Refactor VPlan creation, add transform introducing region (NFC). (PR #128419)

Florian Hahn via llvm-commits llvm-commits at lists.llvm.org
Sun Mar 9 08:05:09 PDT 2025


https://github.com/fhahn updated https://github.com/llvm/llvm-project/pull/128419

From 4a3ec74861317c64359bdf5222ec96f64af16fe1 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sun, 23 Feb 2025 10:16:00 +0000
Subject: [PATCH 1/6] [VPlan] Refactor VPlan creation, add transform
 introducing region (NFC).

Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (without a top-level region). The top-level
region is then introduced by a separate VPlan transform. This replaces
the previous approach of creating the vector loop region before building
the VPlan CFG for the input loop.
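To make the new two-step ordering concrete, here is a toy C++ model: step 1 builds a plain CFG with an explicit backedge and no region, and step 2 is a separate transform that detaches the loop, adds vector.ph, appends an empty vector.latch, wraps the loop in a region and re-parents the loop blocks. ToyBlock, ToyVPlan, addEdge/removeEdge, buildPlainCFG and introduceTopLevelRegion are hypothetical, heavily simplified stand-ins, not the LLVM VPlan API; recipes, the trip count and the scalar blocks are omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPlan block; NOT LLVM's VPBlockBase.
// A block acts as a "region" when RegionEntry/RegionExiting are set.
struct ToyBlock {
  std::string Name;
  std::vector<ToyBlock *> Succs, Preds;
  ToyBlock *Parent = nullptr;        // enclosing region, if any
  ToyBlock *RegionEntry = nullptr;   // only set for regions
  ToyBlock *RegionExiting = nullptr; // only set for regions
  explicit ToyBlock(std::string N) : Name(std::move(N)) {}
};

static void addEdge(ToyBlock *From, ToyBlock *To) {
  From->Succs.push_back(To);
  To->Preds.push_back(From);
}

static void removeEdge(ToyBlock *From, ToyBlock *To) {
  auto Erase = [](std::vector<ToyBlock *> &V, ToyBlock *B) {
    V.erase(std::remove(V.begin(), V.end(), B), V.end());
  };
  Erase(From->Succs, To);
  Erase(To->Preds, From);
}

// Owns all blocks; starts with only an entry block (the "empty VPlan").
struct ToyVPlan {
  std::vector<std::unique_ptr<ToyBlock>> Blocks;
  ToyBlock *Entry = nullptr;
  ToyVPlan() { Entry = create("entry"); }
  ToyBlock *create(const std::string &N) {
    Blocks.push_back(std::make_unique<ToyBlock>(N));
    return Blocks.back().get();
  }
};

// Step 1: plain CFG for the top-level loop, explicit backedge, no region.
// The latch->header edge is added before entry->header.
static void buildPlainCFG(ToyVPlan &P) {
  ToyBlock *Header = P.create("loop.header");
  ToyBlock *Body = P.create("loop.body");
  ToyBlock *Latch = P.create("loop.latch");
  addEdge(Header, Body);
  addEdge(Body, Latch);
  addEdge(Latch, Header); // backedge
  addEdge(P.Entry, Header);
}

// Step 2: a separate transform that wraps the loop in a region.
static ToyBlock *introduceTopLevelRegion(ToyVPlan &P) {
  ToyBlock *Header = P.Entry->Succs.front();
  removeEdge(P.Entry, Header);
  assert(Header->Preds.size() == 1 && "only the backedge should remain");
  ToyBlock *OrigLatch = Header->Preds.front();
  removeEdge(OrigLatch, Header);

  ToyBlock *VecPH = P.create("vector.ph");
  addEdge(P.Entry, VecPH);

  // New, empty latch appended after the original latch.
  ToyBlock *NewLatch = P.create("vector.latch");
  addEdge(OrigLatch, NewLatch);

  ToyBlock *Region = P.create("vector loop");
  Region->RegionEntry = Header;
  Region->RegionExiting = NewLatch;

  // Re-parent every block reachable from the header (the loop body).
  std::vector<ToyBlock *> Worklist{Header};
  std::set<ToyBlock *> Seen;
  while (!Worklist.empty()) {
    ToyBlock *B = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(B).second)
      continue;
    B->Parent = Region;
    for (ToyBlock *S : B->Succs)
      Worklist.push_back(S);
  }

  addEdge(VecPH, Region);
  addEdge(Region, P.create("middle.block"));
  return Region;
}
```

Calling buildPlainCFG and then introduceTopLevelRegion on a fresh ToyVPlan yields entry -> vector.ph -> "vector loop" -> middle.block, with the loop blocks parented to the region. Keeping the two steps separate is what lets the region construction evolve as a transform independently of the builder.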

This simplifies the HCFG builder (which should probably be renamed) and
moves forward along the roadmap ('buildLoop') outlined in [1].

As a follow-up, I plan to also preserve the exit branches in the initial
VPlan produced by the CFG builder, including connections to the exit blocks.

The conversion from a plain CFG with potentially multiple exits to a
single-entry/single-exit region will be done as a VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently, early-exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
To switch to VPlan-based predication, we have to preserve them in the
initial plain CFG, so the exit conditions are explicitly available when
we convert to single-entry/single-exit regions.

Another follow-up is updating the outer-loop handling to also introduce
VPRegionBlocks for nested loops as a transform. Currently, the existing
logic in the builder creates VPRegionBlocks for nested loops, but not
for the top-level loop.
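The split between builder-created regions (nested loops) and the transform-created top-level region comes down to the updated early return in getOrCreateVPBB. The toy predicate below restates that condition, assuming doesContainLoop(LoopOfBB, TheLoop) asks whether LoopOfBB is nested inside TheLoop. ToyLoop, isWithin and builderPlacesInNestedRegion are hypothetical illustrations, not LLVM code.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Loop; only the parent link is modeled.
struct ToyLoop {
  const ToyLoop *Parent = nullptr;
};

// True if Inner is Outer itself or is nested (at any depth) inside Outer.
static bool isWithin(const ToyLoop *Inner, const ToyLoop *Outer) {
  for (const ToyLoop *L = Inner; L; L = L->Parent)
    if (L == Outer)
      return true;
  return false;
}

// Negation of the early-return condition in getOrCreateVPBB: blocks outside
// the loop nest and blocks of the top-level loop itself stay plain; only
// blocks of strictly nested loops are placed into a builder-created region.
static bool builderPlacesInNestedRegion(const ToyLoop *LoopOfBB,
                                        const ToyLoop *TheLoop) {
  return LoopOfBB && LoopOfBB != TheLoop && isWithin(LoopOfBB, TheLoop);
}
```

For a two-level nest, only blocks of the inner loop get a region from the builder; the outer (top-level) loop stays a plain CFG until the new transform runs.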

[1] https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf
---
 .../Transforms/Vectorize/LoopVectorize.cpp    | 16 ++--
 llvm/lib/Transforms/Vectorize/VPlan.cpp       | 91 ++-----------------
 llvm/lib/Transforms/Vectorize/VPlan.h         | 17 +---
 .../Transforms/Vectorize/VPlanHCFGBuilder.cpp | 53 ++++-------
 .../Transforms/Vectorize/VPlanTransforms.cpp  | 76 ++++++++++++++++
 .../Transforms/Vectorize/VPlanTransforms.h    | 15 +++
 .../Transforms/Vectorize/VPlanTestBase.h      |  6 +-
 7 files changed, 133 insertions(+), 141 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 226fc23888f02..71ffa7f7a8516 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9312,14 +9312,15 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
             return !CM.requiresScalarEpilogue(VF.isVector());
           },
           Range);
-  VPlanPtr Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(),
-                                            PSE, RequiresScalarEpilogueCheck,
-                                            CM.foldTailByMasking(), OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG.
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, RequiresScalarEpilogueCheck,
+      CM.foldTailByMasking(), OrigLoop);
+
   // Don't use getDecisionAndClampRange here, because we don't know the UF
   // so this function is better to be conservative, rather than to split
   // it up into different VPlans.
@@ -9615,13 +9616,14 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
   assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
 
   // Create new empty VPlan
-  auto Plan = VPlan::createInitialVPlan(Legal->getWidestInductionType(), PSE,
-                                        true, false, OrigLoop);
-
+  auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
+  VPlanTransforms::introduceTopLevelVectorLoopRegion(
+      *Plan, Legal->getWidestInductionType(), PSE, true, false, OrigLoop);
+
   for (ElementCount VF : Range)
     Plan->addVF(VF);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index d3c195d4a70ea..6aee9514ff8ed 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -880,85 +880,6 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-VPlanPtr VPlan::createInitialVPlan(Type *InductionTy,
-                                   PredicatedScalarEvolution &PSE,
-                                   bool RequiresScalarEpilogueCheck,
-                                   bool TailFolded, Loop *TheLoop) {
-  auto Plan = std::make_unique<VPlan>(TheLoop);
-  VPBlockBase *ScalarHeader = Plan->getScalarHeader();
-
-  // Connect entry only to vector preheader initially. Entry will also be
-  // connected to the scalar preheader later, during skeleton creation when
-  // runtime guards are added as needed. Note that when executing the VPlan for
-  // an epilogue vector loop, the original entry block here will be replaced by
-  // a new VPIRBasicBlock wrapping the entry to the epilogue vector loop after
-  // generating code for the main vector loop.
-  VPBasicBlock *VecPreheader = Plan->createVPBasicBlock("vector.ph");
-  VPBlockUtils::connectBlocks(Plan->getEntry(), VecPreheader);
-
-  // Create SCEV and VPValue for the trip count.
-  // We use the symbolic max backedge-taken-count, which works also when
-  // vectorizing loops with uncountable early exits.
-  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
-  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
-         "Invalid loop count");
-  ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
-                                                       InductionTy, TheLoop);
-  Plan->TripCount =
-      vputils::getOrCreateVPValueForSCEVExpr(*Plan, TripCount, SE);
-
-  // Create VPRegionBlock, with empty header and latch blocks, to be filled
-  // during processing later.
-  VPBasicBlock *HeaderVPBB = Plan->createVPBasicBlock("vector.body");
-  VPBasicBlock *LatchVPBB = Plan->createVPBasicBlock("vector.latch");
-  VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
-  auto *TopRegion = Plan->createVPRegionBlock(
-      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-
-  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
-  VPBasicBlock *MiddleVPBB = Plan->createVPBasicBlock("middle.block");
-  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
-
-  VPBasicBlock *ScalarPH = Plan->createVPBasicBlock("scalar.ph");
-  VPBlockUtils::connectBlocks(ScalarPH, ScalarHeader);
-  if (!RequiresScalarEpilogueCheck) {
-    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-    return Plan;
-  }
-
-  // If needed, add a check in the middle block to see if we have completed
-  // all of the iterations in the first vector loop.  Three cases:
-  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
-  //    Thus if tail is to be folded, we know we don't need to run the
-  //    remainder and we can set the condition to true.
-  // 2) If we require a scalar epilogue, there is no conditional branch as
-  //    we unconditionally branch to the scalar preheader.  Do nothing.
-  // 3) Otherwise, construct a runtime check.
-  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
-  VPIRBasicBlock *VPExitBlock = Plan->getExitBlock(IRExitBlock);
-  // The connection order corresponds to the operands of the conditional branch.
-  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
-  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-
-  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
-  // Here we use the same DebugLoc as the scalar loop latch terminator instead
-  // of the corresponding compare because they may have ended up with
-  // different line numbers and we want to avoid awkward line stepping while
-  // debugging. Eg. if the compare has got a line number inside the loop.
-  VPBuilder Builder(MiddleVPBB);
-  VPValue *Cmp =
-      TailFolded
-          ? Plan->getOrAddLiveIn(ConstantInt::getTrue(
-                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
-          : Builder.createICmp(CmpInst::ICMP_EQ, Plan->getTripCount(),
-                               &Plan->getVectorTripCount(),
-                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
-  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
-                       ScalarLatchTerm->getDebugLoc());
-  return Plan;
-}
-
 void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,
                              VPTransformState &State) {
   Type *TCTy = TripCountV->getType();
@@ -1135,11 +1056,13 @@ void VPlan::printLiveIns(raw_ostream &O) const {
   }
 
   O << "\n";
-  if (TripCount->isLiveIn())
-    O << "Live-in ";
-  TripCount->printAsOperand(O, SlotTracker);
-  O << " = original trip-count";
-  O << "\n";
+  if (TripCount) {
+    if (TripCount->isLiveIn())
+      O << "Live-in ";
+    TripCount->printAsOperand(O, SlotTracker);
+    O << " = original trip-count";
+    O << "\n";
+  }
 }
 
 LLVM_DUMP_METHOD
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 1f1af7f87e554..7021c0263d06c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3505,21 +3505,6 @@ class VPlan {
     VPBB->setPlan(this);
   }
 
-  /// Create initial VPlan, having an "entry" VPBasicBlock (wrapping
-  /// original scalar pre-header) which contains SCEV expansions that need
-  /// to happen before the CFG is modified (when executing a VPlan for the
-  /// epilogue vector loop, the original entry needs to be replaced by a new
-  /// one); a VPBasicBlock for the vector pre-header, followed by a region for
-  /// the vector loop, followed by the middle VPBasicBlock. If a check is needed
-  /// to guard executing the scalar epilogue loop, it will be added to the
-  /// middle block, together with VPBasicBlocks for the scalar preheader and
-  /// exit blocks. \p InductionTy is the type of the canonical induction and
-  /// used for related values, like the trip count expression.
-  static VPlanPtr createInitialVPlan(Type *InductionTy,
-                                     PredicatedScalarEvolution &PSE,
-                                     bool RequiresScalarEpilogueCheck,
-                                     bool TailFolded, Loop *TheLoop);
-
   /// Prepare the plan for execution, setting up the required live-in values.
   void prepareToExecute(Value *TripCount, Value *VectorTripCount,
                         VPTransformState &State);
@@ -3589,6 +3574,8 @@ class VPlan {
     TripCount = NewTripCount;
   }
 
+  void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
+
   /// The backedge taken count of the original loop.
   VPValue *getOrCreateBackedgeTakenCount() {
     if (!BackedgeTakenCount)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index dcf1057b991ee..6499565400724 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -180,7 +180,7 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
 
   // Get or create a region for the loop containing BB.
   Loop *LoopOfBB = LI->getLoopFor(BB);
-  if (!LoopOfBB || !doesContainLoop(LoopOfBB, TheLoop))
+  if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
     return VPBB;
 
   auto *RegionOfVPBB = Loop2Region.lookup(LoopOfBB);
@@ -353,29 +353,6 @@ void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
 // Main interface to build the plain CFG.
 void PlainCFGBuilder::buildPlainCFG(
     DenseMap<VPBlockBase *, BasicBlock *> &VPB2IRBB) {
-  // 0. Reuse the top-level region, vector-preheader and exit VPBBs from the
-  // skeleton. These were created directly rather than via getOrCreateVPBB(),
-  // revisit them now to update BB2VPBB. Note that header/entry and
-  // latch/exiting VPBB's of top-level region have yet to be created.
-  VPRegionBlock *TheRegion = Plan.getVectorLoopRegion();
-  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
-  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
-         "Unexpected loop preheader");
-  auto *VectorPreheaderVPBB =
-      cast<VPBasicBlock>(TheRegion->getSinglePredecessor());
-  // ThePreheaderBB conceptually corresponds to both Plan.getPreheader() (which
-  // wraps the original preheader BB) and Plan.getEntry() (which represents the
-  // new vector preheader); here we're interested in setting BB2VPBB to the
-  // latter.
-  BB2VPBB[ThePreheaderBB] = VectorPreheaderVPBB;
-  Loop2Region[LI->getLoopFor(TheLoop->getHeader())] = TheRegion;
-
-  // The existing vector region's entry and exiting VPBBs correspond to the loop
-  // header and latch.
-  VPBasicBlock *VectorHeaderVPBB = TheRegion->getEntryBasicBlock();
-  VPBasicBlock *VectorLatchVPBB = TheRegion->getExitingBasicBlock();
-  BB2VPBB[TheLoop->getHeader()] = VectorHeaderVPBB;
-  VectorHeaderVPBB->clearSuccessors();
 
   // 1. Scan the body of the loop in a topological order to visit each basic
   // block after having visited its predecessor basic blocks. Create a VPBB for
@@ -386,6 +363,9 @@ void PlainCFGBuilder::buildPlainCFG(
 
   // Loop PH needs to be explicitly visited since it's not taken into account by
   // LoopBlocksDFS.
+  BasicBlock *ThePreheaderBB = TheLoop->getLoopPreheader();
+  assert((ThePreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
+         "Unexpected loop preheader");
   for (auto &I : *ThePreheaderBB) {
     if (I.getType()->isVoidTy())
       continue;
@@ -406,18 +386,16 @@ void PlainCFGBuilder::buildPlainCFG(
     } else {
       // BB is a loop header, set the predecessor for the region, except for the
       // top region, whose predecessor was set when creating VPlan's skeleton.
-      assert(isHeaderVPBB(VPBB) && "isHeaderBB and isHeaderVPBB disagree");
-      if (TheRegion != Region)
+      if (LoopForBB != TheLoop)
         setRegionPredsFromBB(Region, BB);
     }
 
     // Create VPInstructions for BB.
     createVPInstructionsForVPBB(VPBB, BB);
 
-    if (TheLoop->getLoopLatch() == BB) {
-      VPBB->setOneSuccessor(VectorLatchVPBB);
-      VectorLatchVPBB->clearPredecessors();
-      VectorLatchVPBB->setPredecessors({VPBB});
+    if (BB == TheLoop->getLoopLatch()) {
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
       continue;
     }
 
@@ -449,16 +427,22 @@ void PlainCFGBuilder::buildPlainCFG(
     VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
     VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
     if (BB == LoopForBB->getLoopLatch()) {
-      // For a latch we need to set the successor of the region rather than that
-      // of VPBB and it should be set to the exit, i.e., non-header successor,
+      // For a latch we need to set the successor of the region rather
+      // than that
+      // of VPBB and it should be set to the exit, i.e., non-header
+      // successor,
       // except for the top region, whose successor was set when creating
       // VPlan's skeleton.
-      assert(TheRegion != Region &&
+      assert(LoopForBB != TheLoop &&
              "Latch of the top region should have been handled earlier");
       Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
                                                        : Successor0);
       Region->setExiting(VPBB);
       continue;
+
+      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
+      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
+      continue;
     }
 
     // Don't connect any blocks outside the current loop except the latch for
@@ -482,6 +466,9 @@ void PlainCFGBuilder::buildPlainCFG(
   // corresponding VPlan operands.
   fixHeaderPhis();
 
+  VPBlockUtils::connectBlocks(Plan.getEntry(),
+                              getOrCreateVPBB(TheLoop->getHeader()));
+
   for (const auto &[IRBB, VPB] : BB2VPBB)
     VPB2IRBB[VPB] = IRBB;
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 13ef3029023f1..7ea4379ac486a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -32,6 +32,82 @@
 
 using namespace llvm;
 
+void VPlanTransforms::introduceTopLevelVectorLoopRegion(
+    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+  auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
+  VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
+
+  VPBasicBlock *OriginalLatch =
+      cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
+  VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
+  VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
+  VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+
+  // Create SCEV and VPValue for the trip count.
+  // We use the symbolic max backedge-taken-count, which works also when
+  // vectorizing loops with uncountable early exits.
+  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+         "Invalid loop count");
+  ScalarEvolution &SE = *PSE.getSE();
+  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+                                                       InductionTy, TheLoop);
+  Plan.setTripCount(
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+  // Create VPRegionBlock, with empty header and latch blocks, to be filled
+  // during processing later.
+  VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
+  VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
+  auto *TopRegion = Plan.createVPRegionBlock(
+      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
+    VPBB->setParent(TopRegion);
+  }
+
+  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
+  VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
+  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+  if (!RequiresScalarEpilogueCheck) {
+    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+    return;
+  }
+
+  // If needed, add a check in the middle block to see if we have completed
+  // all of the iterations in the first vector loop.  Three cases:
+  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can set the condition to true.
+  // 2) If we require a scalar epilogue, there is no conditional branch as
+  //    we unconditionally branch to the scalar preheader.  Do nothing.
+  // 3) Otherwise, construct a runtime check.
+  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
+  auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
+  // The connection order corresponds to the operands of the conditional branch.
+  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
+  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+
+  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+  // Here we use the same DebugLoc as the scalar loop latch terminator instead
+  // of the corresponding compare because they may have ended up with
+  // different line numbers and we want to avoid awkward line stepping while
+  // debugging. Eg. if the compare has got a line number inside the loop.
+  VPBuilder Builder(MiddleVPBB);
+  VPValue *Cmp =
+      TailFolded
+          ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
+                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
+          : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
+                               &Plan.getVectorTripCount(),
+                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
+                       ScalarLatchTerm->getDebugLoc());
+}
+
 void VPlanTransforms::VPInstructionsToVPRecipes(
     VPlanPtr &Plan,
     function_ref<const InductionDescriptor *(PHINode *)>
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 3dd476a8526d6..d1e825e987848 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -52,6 +52,21 @@ struct VPlanTransforms {
       verifyVPlanIsValid(Plan);
   }
 
+  /// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
+  /// in this function, \p Plan's top-level loop is modeled using a plain CFG.
+  /// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
+  /// top-level loop and creates a VPValue expressions for the original trip
+  /// count. It will also introduce a dedicated VPBasicBlock for the vector
+  /// pre-header as well a VPBasicBlock as exit block of the region
+  /// (middle.block). If a check is needed to guard executing the scalar
+  /// epilogue loop, it will be added to the middle block, together with
+  /// VPBasicBlocks for the scalar preheader and exit blocks. \p InductionTy is
+  /// the type of the canonical induction and used for related values, like the
+  /// trip count expression.
+  static void introduceTopLevelVectorLoopRegion(
+      VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+      bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop);
+
   /// Replaces the VPInstructions in \p Plan with corresponding
   /// widen recipes.
   static void
diff --git a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
index 8d03e91fb26c3..caf5d2357411d 100644
--- a/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
+++ b/llvm/unittests/Transforms/Vectorize/VPlanTestBase.h
@@ -14,6 +14,7 @@
 
 #include "../lib/Transforms/Vectorize/VPlan.h"
 #include "../lib/Transforms/Vectorize/VPlanHCFGBuilder.h"
+#include "../lib/Transforms/Vectorize/VPlanTransforms.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/BasicAliasAnalysis.h"
 #include "llvm/Analysis/LoopInfo.h"
@@ -70,10 +71,11 @@ class VPlanTestIRBase : public testing::Test {
 
     Loop *L = LI->getLoopFor(LoopHeader);
     PredicatedScalarEvolution PSE(*SE, *L);
-    auto Plan = VPlan::createInitialVPlan(IntegerType::get(*Ctx, 64), PSE, true,
-                                          false, L);
+    auto Plan = std::make_unique<VPlan>(L);
     VPlanHCFGBuilder HCFGBuilder(L, LI.get(), *Plan);
     HCFGBuilder.buildHierarchicalCFG();
+    VPlanTransforms::introduceTopLevelVectorLoopRegion(
+        *Plan, IntegerType::get(*Ctx, 64), PSE, true, false, L);
     return Plan;
   }
 };
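The comment in introduceTopLevelVectorLoopRegion lists three cases for the middle block. The sketch below models them with plain unsigned integers instead of SCEV/VPValues: the trip count is the backedge-taken count plus one, and "cmp.n" compares it against the vector trip count. The helper names are hypothetical, and Step is assumed to stand for VF * UF when the vector trip count is later computed; this patch itself only references getVectorTripCount().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not LLVM APIs; unsigned arithmetic stands in for SCEV.

// Trip count from the backedge-taken count (BTC + 1). The toy ignores the
// wrap-around that the real computation has to account for.
static uint64_t tripCountFromBTC(uint64_t BTC) { return BTC + 1; }

// Vector trip count without tail folding: TC rounded down to a multiple of
// Step (assumed to be VF * UF).
static uint64_t vectorTripCount(uint64_t TC, uint64_t Step) {
  assert(Step != 0 && "step must be non-zero");
  return TC - TC % Step;
}

enum class MiddleSucc { ExitBlock, ScalarPreheader };

// Where the middle block continues once the vector loop is done.
static MiddleSucc middleBlockSuccessor(uint64_t TC, uint64_t Step,
                                       bool RequiresScalarEpilogueCheck,
                                       bool TailFolded) {
  // Case 2: a scalar epilogue is always required, so branch unconditionally.
  if (!RequiresScalarEpilogueCheck)
    return MiddleSucc::ScalarPreheader;
  // Case 1: with tail folding the condition is simply true.
  // Case 3: otherwise compare the trip counts at runtime ("cmp.n").
  bool CmpN = TailFolded || TC == vectorTripCount(TC, Step);
  // Successor order: exit block on true, scalar preheader on false.
  return CmpN ? MiddleSucc::ExitBlock : MiddleSucc::ScalarPreheader;
}
```

For example, with a backedge-taken count of 9 and Step 4, the trip count is 10 and the vector trip count is 8, so without tail folding the middle block falls through to scalar.ph to run the remaining two iterations.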

From 99ad49e53680bc23f10b2dbb367dac51e37972ab Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sat, 1 Mar 2025 16:53:15 +0000
Subject: [PATCH 2/6] !fixup address latest comments, thanks

---
 .../Transforms/Vectorize/LoopVectorize.cpp    |  2 ++
 llvm/lib/Transforms/Vectorize/VPlan.h         | 10 ++++--
 .../Transforms/Vectorize/VPlanHCFGBuilder.cpp | 33 +++++++------------
 .../Transforms/Vectorize/VPlanTransforms.cpp  |  9 ++---
 .../Transforms/Vectorize/VPlanTransforms.h    |  6 ++--
 5 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 571d54300df65..7825add06e2dc 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9324,6 +9324,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG.
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
+  // TODO: Convert to VPlan-transform and consolidate all transforms for VPlan
+  // creation.
   HCFGBuilder.buildHierarchicalCFG();
 
   VPlanTransforms::introduceTopLevelVectorLoopRegion(
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index a0c34e92ef1b6..8a6597c9bbcba 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3602,11 +3602,17 @@ class VPlan {
   /// the original trip count have been replaced.
   void resetTripCount(VPValue *NewTripCount) {
     assert(TripCount && NewTripCount && TripCount->getNumUsers() == 0 &&
-           "TripCount always must be set");
+           "TripCount must be set when resetting");
     TripCount = NewTripCount;
   }
 
-  void setTripCount(VPValue *NewTripCount) { TripCount = NewTripCount; }
+  // Set the trip count assuming it is currently null; if it is not - use
+  // resetTripCount().
+  void setTripCount(VPValue *NewTripCount) {
+    assert(!TripCount && NewTripCount && "TripCount should not be set yet.");
+
+    TripCount = NewTripCount;
+  }
 
   /// The backedge taken count of the original loop.
   VPValue *getOrCreateBackedgeTakenCount() {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 6499565400724..acc7f28c427ae 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -178,7 +178,8 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
   VPBasicBlock *VPBB = Plan.createVPBasicBlock(Name);
   BB2VPBB[BB] = VPBB;
 
-  // Get or create a region for the loop containing BB.
+  // Get or create a region for the loop containing BB, except for the top
+  // region of TheLoop which is created later.
   Loop *LoopOfBB = LI->getLoopFor(BB);
   if (!LoopOfBB || LoopOfBB == TheLoop || !doesContainLoop(LoopOfBB, TheLoop))
     return VPBB;
@@ -194,12 +195,8 @@ VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
   assert(!RegionOfVPBB &&
          "First visit of a header basic block expects to register its region.");
   // Handle a header - take care of its Region.
-  if (LoopOfBB == TheLoop) {
-    RegionOfVPBB = Plan.getVectorLoopRegion();
-  } else {
-    RegionOfVPBB = Plan.createVPRegionBlock(Name.str(), false /*isReplicator*/);
-    RegionOfVPBB->setParent(Loop2Region[LoopOfBB->getParentLoop()]);
-  }
+  RegionOfVPBB = Plan.createVPRegionBlock(Name.str(), false /*isReplicator*/);
+  RegionOfVPBB->setParent(Loop2Region[LoopOfBB->getParentLoop()]);
   RegionOfVPBB->setEntry(VPBB);
   Loop2Region[LoopOfBB] = RegionOfVPBB;
   return VPBB;
@@ -383,11 +380,10 @@ void PlainCFGBuilder::buildPlainCFG(
     // Set VPBB predecessors in the same order as they are in the incoming BB.
     if (!isHeaderBB(BB, LoopForBB)) {
       setVPBBPredsFromBB(VPBB, BB);
-    } else {
-      // BB is a loop header, set the predecessor for the region, except for the
-      // top region, whose predecessor was set when creating VPlan's skeleton.
-      if (LoopForBB != TheLoop)
-        setRegionPredsFromBB(Region, BB);
+    } else if (Region) {
+      // BB is a loop header and there's a corresponding region, set the
+      // predecessor for it.
+      setRegionPredsFromBB(Region, BB);
     }
 
     // Create VPInstructions for BB.
@@ -427,22 +423,15 @@ void PlainCFGBuilder::buildPlainCFG(
     VPBasicBlock *Successor0 = getOrCreateVPBB(IRSucc0);
     VPBasicBlock *Successor1 = getOrCreateVPBB(IRSucc1);
     if (BB == LoopForBB->getLoopLatch()) {
-      // For a latch we need to set the successor of the region rather
-      // than that
-      // of VPBB and it should be set to the exit, i.e., non-header
-      // successor,
-      // except for the top region, whose successor was set when creating
-      // VPlan's skeleton.
+      // For a latch we need to set the successor of the region rather than that
+      // of VPBB and it should be set to the exit, i.e., non-header successor,
+      // except for the top region, which is handled elsewhere.
       assert(LoopForBB != TheLoop &&
              "Latch of the top region should have been handled earlier");
       Region->setOneSuccessor(isHeaderVPBB(Successor0) ? Successor1
                                                        : Successor0);
       Region->setExiting(VPBB);
       continue;
-
-      VPBasicBlock *HeaderVPBB = getOrCreateVPBB(LoopForBB->getHeader());
-      VPBlockUtils::connectBlocks(VPBB, HeaderVPBB);
-      continue;
     }
 
     // Don't connect any blocks outside the current loop except the latch for
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 7bc682f83caa0..53bfa86d3ae09 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -35,6 +35,7 @@ using namespace llvm;
 void VPlanTransforms::introduceTopLevelVectorLoopRegion(
     VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
     bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+  // TODO: Generalize to introduce all loop regions.
   auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
   VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
 
@@ -43,6 +44,7 @@ void VPlanTransforms::introduceTopLevelVectorLoopRegion(
   VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
   VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
   VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+  assert(OriginalLatch->getNumSuccessors() == 0 && "expected no successors");
 
   // Create SCEV and VPValue for the trip count.
   // We use the symbolic max backedge-taken-count, which works also when
@@ -56,15 +58,14 @@ void VPlanTransforms::introduceTopLevelVectorLoopRegion(
   Plan.setTripCount(
       vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
 
-  // Create VPRegionBlock, with empty header and latch blocks, to be filled
-  // during processing later.
+  // Create VPRegionBlock, with existing header and new empty latch block, to be
+  // filled later.
   VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
   VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
   auto *TopRegion = Plan.createVPRegionBlock(
       HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB)) {
+  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
     VPBB->setParent(TopRegion);
-  }
 
   VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
   VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index def8a035f9c0c..fab74aafb469c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -53,9 +53,9 @@ struct VPlanTransforms {
   }
 
   /// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
-  /// in this function, \p Plan's top-level loop is modeled using a plain CFG.
-  /// This transforms replaces the plain CFG with a VPRegionBlock wrapping the
-  /// top-level loop and creates a VPValue expressions for the original trip
+  /// into this function, \p Plan's top-level loop is modeled using a plain CFG.
+  /// This transform wraps the plain CFG of the top-level loop within a
+  /// VPRegionBlock and creates a VPValue expressions for the original trip
   /// count. It will also introduce a dedicated VPBasicBlock for the vector
   /// pre-header as well a VPBasicBlock as exit block of the region
   /// (middle.block). If a check is needed to guard executing the scalar

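Patch 2's doc comment says the transform creates a VPValue expression for the original trip count. In the code, that trip count comes from the symbolic max backedge-taken count via `ScalarEvolution::getTripCountFromExitCount`. Conceptually, the trip count is the backedge-taken count plus one, evaluated in the induction type, so the addition can wrap.

The sketch below shows only that arithmetic, with no SCEV involved. It assumes an 8-bit induction type so the wrap-around case is visible. `tripCountFromBTC` is a hypothetical helper, not an LLVM API.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): convert a backedge-taken count into a
// trip count, evaluated in an assumed 8-bit induction type. The loop header
// executes once more than the backedge is taken, so TC = BTC + 1 (mod 2^8).
constexpr std::uint8_t tripCountFromBTC(std::uint8_t BackedgeTakenCount) {
  return static_cast<std::uint8_t>(BackedgeTakenCount + 1);
}

// `for (i = 0; i < 8; ++i)` takes its backedge 7 times and runs 8 iterations.
static_assert(tripCountFromBTC(7) == 8, "8 iterations");
// 255 backedges mean 256 iterations, which do not fit in 8 bits: wraps to 0.
static_assert(tripCountFromBTC(255) == 0, "trip count wraps");
```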
>From f605097ca6df611303f1f5f94bde434ecc3172db Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sat, 1 Mar 2025 19:51:49 +0000
Subject: [PATCH 3/6] !fixup move new code to VPlanConstruction.cpp

---
 llvm/lib/Transforms/Vectorize/CMakeLists.txt  |  1 +
 .../Vectorize/VPlanConstruction.cpp           | 98 +++++++++++++++++++
 .../Transforms/Vectorize/VPlanTransforms.cpp  | 77 ---------------
 3 files changed, 99 insertions(+), 77 deletions(-)
 create mode 100644 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp

diff --git a/llvm/lib/Transforms/Vectorize/CMakeLists.txt b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
index 38670ba304e53..f4e1d7c952675 100644
--- a/llvm/lib/Transforms/Vectorize/CMakeLists.txt
+++ b/llvm/lib/Transforms/Vectorize/CMakeLists.txt
@@ -22,6 +22,7 @@ add_llvm_component_library(LLVMVectorize
   VectorCombine.cpp
   VPlan.cpp
   VPlanAnalysis.cpp
+  VPlanConstruction.cpp
   VPlanHCFGBuilder.cpp
   VPlanRecipes.cpp
   VPlanSLP.cpp
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
new file mode 100644
index 0000000000000..fdc2ac779daed
--- /dev/null
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -0,0 +1,98 @@
+//===-- VPlanConstruction.cpp - Transforms for initial VPlan construction -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements transforms for initial VPlan construction
+///
+//===----------------------------------------------------------------------===//
+
+#include "LoopVectorizationPlanner.h"
+#include "VPlan.h"
+#include "VPlanCFG.h"
+#include "VPlanTransforms.h"
+#include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/ScalarEvolution.h"
+
+using namespace llvm;
+
+void VPlanTransforms::introduceTopLevelVectorLoopRegion(
+    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
+    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
+  // TODO: Generalize to introduce all loop regions.
+  auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
+  VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
+
+  VPBasicBlock *OriginalLatch =
+      cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
+  VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
+  VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
+  VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
+  assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
+
+  // Create SCEV and VPValue for the trip count.
+  // We use the symbolic max backedge-taken-count, which works also when
+  // vectorizing loops with uncountable early exits.
+  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
+  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
+         "Invalid loop count");
+  ScalarEvolution &SE = *PSE.getSE();
+  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
+                                                       InductionTy, TheLoop);
+  Plan.setTripCount(
+      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
+
+  // Create VPRegionBlock, with existing header and new empty latch block, to be
+  // filled
+  VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
+  VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
+  auto *TopRegion = Plan.createVPRegionBlock(
+      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
+    VPBB->setParent(TopRegion);
+
+  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
+  VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
+  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
+  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
+  if (!RequiresScalarEpilogueCheck) {
+    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+    return;
+  }
+
+  // If needed, add a check in the middle block to see if we have completed
+  // all of the iterations in the first vector loop.  Three cases:
+  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can set the condition to true.
+  // 2) If we require a scalar epilogue, there is no conditional branch as
+  //    we unconditionally branch to the scalar preheader.  Do nothing.
+  // 3) Otherwise, construct a runtime check.
+  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
+  auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
+  // The connection order corresponds to the operands of the conditional branch.
+  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
+  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
+
+  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+  // Here we use the same DebugLoc as the scalar loop latch terminator instead
+  // of the corresponding compare because they may have ended up with
+  // different line numbers and we want to avoid awkward line stepping while
+  // debugging. Eg. if the compare has got a line number inside the loop.
+  VPBuilder Builder(MiddleVPBB);
+  VPValue *Cmp =
+      TailFolded
+          ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
+                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
+          : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
+                               &Plan.getVectorTripCount(),
+                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
+  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
+                       ScalarLatchTerm->getDebugLoc());
+}
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 53bfa86d3ae09..b09933cd0e186 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -32,83 +32,6 @@
 
 using namespace llvm;
 
-void VPlanTransforms::introduceTopLevelVectorLoopRegion(
-    VPlan &Plan, Type *InductionTy, PredicatedScalarEvolution &PSE,
-    bool RequiresScalarEpilogueCheck, bool TailFolded, Loop *TheLoop) {
-  // TODO: Generalize to introduce all loop regions.
-  auto *HeaderVPBB = cast<VPBasicBlock>(Plan.getEntry()->getSingleSuccessor());
-  VPBlockUtils::disconnectBlocks(Plan.getEntry(), HeaderVPBB);
-
-  VPBasicBlock *OriginalLatch =
-      cast<VPBasicBlock>(HeaderVPBB->getSinglePredecessor());
-  VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
-  VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
-  VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
-  assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
-
-  // Create SCEV and VPValue for the trip count.
-  // We use the symbolic max backedge-taken-count, which works also when
-  // vectorizing loops with uncountable early exits.
-  const SCEV *BackedgeTakenCountSCEV = PSE.getSymbolicMaxBackedgeTakenCount();
-  assert(!isa<SCEVCouldNotCompute>(BackedgeTakenCountSCEV) &&
-         "Invalid loop count");
-  ScalarEvolution &SE = *PSE.getSE();
-  const SCEV *TripCount = SE.getTripCountFromExitCount(BackedgeTakenCountSCEV,
-                                                       InductionTy, TheLoop);
-  Plan.setTripCount(
-      vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
-
-  // Create VPRegionBlock, with existing header and new empty latch block, to be
-  // filled
-  VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
-  VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
-  auto *TopRegion = Plan.createVPRegionBlock(
-      HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
-  for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
-    VPBB->setParent(TopRegion);
-
-  VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
-  VPBasicBlock *MiddleVPBB = Plan.createVPBasicBlock("middle.block");
-  VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
-
-  VPBasicBlock *ScalarPH = Plan.createVPBasicBlock("scalar.ph");
-  VPBlockUtils::connectBlocks(ScalarPH, Plan.getScalarHeader());
-  if (!RequiresScalarEpilogueCheck) {
-    VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-    return;
-  }
-
-  // If needed, add a check in the middle block to see if we have completed
-  // all of the iterations in the first vector loop.  Three cases:
-  // 1) If (N - N%VF) == N, then we *don't* need to run the remainder.
-  //    Thus if tail is to be folded, we know we don't need to run the
-  //    remainder and we can set the condition to true.
-  // 2) If we require a scalar epilogue, there is no conditional branch as
-  //    we unconditionally branch to the scalar preheader.  Do nothing.
-  // 3) Otherwise, construct a runtime check.
-  BasicBlock *IRExitBlock = TheLoop->getUniqueLatchExitBlock();
-  auto *VPExitBlock = Plan.getExitBlock(IRExitBlock);
-  // The connection order corresponds to the operands of the conditional branch.
-  VPBlockUtils::insertBlockAfter(VPExitBlock, MiddleVPBB);
-  VPBlockUtils::connectBlocks(MiddleVPBB, ScalarPH);
-
-  auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
-  // Here we use the same DebugLoc as the scalar loop latch terminator instead
-  // of the corresponding compare because they may have ended up with
-  // different line numbers and we want to avoid awkward line stepping while
-  // debugging. Eg. if the compare has got a line number inside the loop.
-  VPBuilder Builder(MiddleVPBB);
-  VPValue *Cmp =
-      TailFolded
-          ? Plan.getOrAddLiveIn(ConstantInt::getTrue(
-                IntegerType::getInt1Ty(TripCount->getType()->getContext())))
-          : Builder.createICmp(CmpInst::ICMP_EQ, Plan.getTripCount(),
-                               &Plan.getVectorTripCount(),
-                               ScalarLatchTerm->getDebugLoc(), "cmp.n");
-  Builder.createNaryOp(VPInstruction::BranchOnCond, {Cmp},
-                       ScalarLatchTerm->getDebugLoc());
-}
-
 void VPlanTransforms::VPInstructionsToVPRecipes(
     VPlanPtr &Plan,
     function_ref<const InductionDescriptor *(PHINode *)>

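The moved `introduceTopLevelVectorLoopRegion` mostly rewires blocks. It performs these steps:

- Detach the header from the entry and from the original latch.
- Add `vector.ph` and a new empty `vector.latch`.
- Mark everything reachable from the header as part of the region.
- Hang `middle.block` and `scalar.ph` off the region.

The sketch below replays those steps on a toy graph. `ToyCFG` and `introduceTopLevelRegion` are hypothetical names, not the VPlan API. Regions are modeled as plain nodes, and blocks record their enclosing region by name.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model, NOT the VPlan API: named blocks with ordered
// successor lists; a region is just another node, and blocks record it as
// their parent ("" means top level).
struct ToyCFG {
  std::map<std::string, std::vector<std::string>> Succs;
  std::map<std::string, std::string> Parent;

  void connect(const std::string &From, const std::string &To) {
    Succs[From].push_back(To);
    Succs[To]; // make sure To exists as a node
  }
  void disconnect(const std::string &From, const std::string &To) {
    auto &S = Succs[From];
    S.erase(std::remove(S.begin(), S.end(), To), S.end());
  }
  std::vector<std::string> preds(const std::string &B) const {
    std::vector<std::string> P;
    for (const auto &[From, Targets] : Succs)
      if (std::find(Targets.begin(), Targets.end(), B) != Targets.end())
        P.push_back(From);
    return P;
  }
  // Depth-first walk from B, assigning every reachable block to Region. This
  // covers exactly the loop body because the backedge was removed and the
  // new latch has no successors.
  void setParentFrom(const std::string &B, const std::string &Region) {
    if (Parent[B] == Region)
      return;
    Parent[B] = Region;
    for (const auto &S : Succs[B])
      setParentFrom(S, Region);
  }
};

// Sketch of the rewiring, assuming the plain CFG is
// `entry -> header -> ... -> latch -> header` and nothing else.
void introduceTopLevelRegion(ToyCFG &G, bool RequiresScalarEpilogueCheck) {
  assert(G.Succs["entry"].size() == 1 && "entry must have one successor");
  std::string Header = G.Succs["entry"].front();
  G.disconnect("entry", Header);

  std::vector<std::string> Preds = G.preds(Header);
  assert(Preds.size() == 1 && "header must have only the latch as predecessor");
  std::string Latch = Preds.front();
  G.disconnect(Latch, Header);
  G.connect("entry", "vector.ph");
  assert(G.Succs[Latch].empty() && "plan should end at top-level latch");

  // New empty latch; the region spans Header .. vector.latch.
  G.connect(Latch, "vector.latch");
  G.setParentFrom(Header, "vector loop");

  G.connect("vector.ph", "vector loop");
  G.connect("vector loop", "middle.block");
  G.connect("scalar.ph", "scalar.header");
  if (!RequiresScalarEpilogueCheck) {
    G.connect("middle.block", "scalar.ph"); // unconditional branch
    return;
  }
  // Successor order matches the branch-on-cond operands: exit block first.
  G.connect("middle.block", "exit");
  G.connect("middle.block", "scalar.ph");
}
```

In the real code, `VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch)` would also move the latch's successors to the new block. The preceding assertion guarantees there are none, so the sketch models that step as a plain `connect`.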
>From 64751d2525243ddce58aed80b765a37e5884f21e Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sat, 1 Mar 2025 21:26:43 +0000
Subject: [PATCH 4/6] !fixup update printing test.

---
 .../Transforms/Vectorize/VPlanHCFGBuilder.cpp |  1 +
 .../vplan-printing-outer-loop.ll              | 71 ++++++-------------
 2 files changed, 22 insertions(+), 50 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index 47efb47d32995..74477e86e9a0b 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -23,6 +23,7 @@
 
 #include "VPlanHCFGBuilder.h"
 #include "LoopVectorizationPlanner.h"
+#include "VPlanCFG.h"
 #include "llvm/Analysis/LoopIterator.h"
 
 #define DEBUG_TYPE "loop-vectorize"
diff --git a/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll b/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
index 52b2bcd9aac11..625a32c098f94 100644
--- a/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
+++ b/llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll
@@ -8,62 +8,33 @@
 define void @foo(i64 %n) {
 ; CHECK:      VPlan 'HCFGBuilder: Plain CFG
 ; CHECK-NEXT: {
-; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
-; CHECK-NEXT: Live-in ir<8> = original trip-count
 ; CHECK-EMPTY:
 ; CHECK-NEXT: ir-bb<entry>:
-; CHECK-NEXT: Successor(s): vector.ph
+; CHECK-NEXT: Successor(s): vector.body
 ; CHECK-EMPTY:
-; CHECK-NEXT: vector.ph:
-; CHECK-NEXT: Successor(s): vector loop
+; CHECK-NEXT: vector.body:
+; CHECK-NEXT:   WIDEN-PHI ir<%outer.iv> = phi ir<0>, ir<%outer.iv.next>
+; CHECK-NEXT:   EMIT ir<%gep.1> = getelementptr ir<@arr2>, ir<0>, ir<%outer.iv>
+; CHECK-NEXT:   EMIT store ir<%outer.iv>, ir<%gep.1>
+; CHECK-NEXT:   EMIT ir<%add> = add ir<%outer.iv>, ir<%n>
+; CHECK-NEXT: Successor(s): inner
 ; CHECK-EMPTY:
-; CHECK-NEXT: <x1> vector loop: {
-; CHECK-NEXT:   vector.body:
-; CHECK-NEXT:     WIDEN-PHI ir<%outer.iv> = phi ir<0>, ir<%outer.iv.next>
-; CHECK-NEXT:     EMIT ir<%gep.1> = getelementptr ir<@arr2>, ir<0>, ir<%outer.iv>
-; CHECK-NEXT:     EMIT store ir<%outer.iv>, ir<%gep.1>
-; CHECK-NEXT:     EMIT ir<%add> = add ir<%outer.iv>, ir<%n>
-; CHECK-NEXT:   Successor(s): inner
-; CHECK-EMPTY:
-; CHECK-NEXT:   <x1> inner: {
-; CHECK-NEXT:     inner:
-; CHECK-NEXT:       WIDEN-PHI ir<%inner.iv> = phi ir<0>, ir<%inner.iv.next>
-; CHECK-NEXT:       EMIT ir<%gep.2> = getelementptr ir<@arr>, ir<0>, ir<%inner.iv>, ir<%outer.iv>
-; CHECK-NEXT:       EMIT store ir<%add>, ir<%gep.2>
-; CHECK-NEXT:       EMIT ir<%inner.iv.next> = add ir<%inner.iv>, ir<1>
-; CHECK-NEXT:       EMIT ir<%inner.ec> = icmp ir<%inner.iv.next>, ir<8>
-; CHECK-NEXT:       EMIT branch-on-cond ir<%inner.ec>
-; CHECK-NEXT:   No successors
-; CHECK-NEXT:  }
-; CHECK-NEXT:  Successor(s): outer.latch
-; CHECK-EMPTY:
-; CHECK-NEXT:  outer.latch:
-; CHECK-NEXT:     EMIT ir<%outer.iv.next> = add ir<%outer.iv>, ir<1>
-; CHECK-NEXT:     EMIT ir<%outer.ec> = icmp ir<%outer.iv.next>, ir<8>
-; CHECK-NEXT:  Successor(s): vector.latch
-; CHECK-EMPTY:
-; CHECK-NEXT:   vector.latch:
+; CHECK-NEXT: <x1> inner: {
+; CHECK-NEXT:   inner:
+; CHECK-NEXT:     WIDEN-PHI ir<%inner.iv> = phi ir<0>, ir<%inner.iv.next>
+; CHECK-NEXT:     EMIT ir<%gep.2> = getelementptr ir<@arr>, ir<0>, ir<%inner.iv>, ir<%outer.iv>
+; CHECK-NEXT:     EMIT store ir<%add>, ir<%gep.2>
+; CHECK-NEXT:     EMIT ir<%inner.iv.next> = add ir<%inner.iv>, ir<1>
+; CHECK-NEXT:     EMIT ir<%inner.ec> = icmp ir<%inner.iv.next>, ir<8>
+; CHECK-NEXT:     EMIT branch-on-cond ir<%inner.ec>
 ; CHECK-NEXT:   No successors
-; CHECK-NEXT:  }
-; CHECK-NEXT: Successor(s): middle.block
-; CHECK-EMPTY:
-; CHECK-NEXT: middle.block:
-; CHECK-NEXT:   EMIT vp<[[C:%.+]]> = icmp eq ir<8>, vp<[[VTC]]>
-; CHECK-NEXT:   EMIT branch-on-cond vp<[[C]]>
-; CHECK-NEXT: Successor(s): ir-bb<exit>, scalar.ph
-; CHECK-EMPTY:
-; CHECK-NEXT: scalar.ph:
-; CHECK-NEXT: Successor(s): ir-bb<outer.header>
-; CHECK-EMPTY:
-; CHECK-NEXT: ir-bb<outer.header>:
-; CHECK-NEXT:   IR   %outer.iv = phi i64 [ 0, %entry ], [ %outer.iv.next, %outer.latch ]
-; CHECK-NEXT:   IR   %gep.1 = getelementptr inbounds [8 x i64], ptr @arr2, i64 0, i64 %outer.iv
-; CHECK-NEXT:   IR   store i64 %outer.iv, ptr %gep.1, align 4
-; CHECK-NEXT:   IR   %add = add nsw i64 %outer.iv, %n
-; CHECK-NEXT: No successors
+; CHECK-NEXT: }
+; CHECK-NEXT: Successor(s): outer.latch
 ; CHECK-EMPTY:
-; CHECK-NEXT: ir-bb<exit>:
-; CHECK-NEXT: No successors
+; CHECK-NEXT: outer.latch:
+; CHECK-NEXT:   EMIT ir<%outer.iv.next> = add ir<%outer.iv>, ir<1>
+; CHECK-NEXT:   EMIT ir<%outer.ec> = icmp ir<%outer.iv.next>, ir<8>
+; CHECK-NEXT: Successor(s): vector.body
 ; CHECK-NEXT: }
 entry:
   br label %outer.header

>From e5ef6d39f83f324af54adeebcce07fc143be3ec9 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Tue, 4 Mar 2025 13:34:54 +0000
Subject: [PATCH 5/6] !fixup address latest comments, thanks

---
 llvm/lib/Transforms/Vectorize/LoopVectorize.cpp   |  4 ++--
 llvm/lib/Transforms/Vectorize/VPlan.h             | 15 +++++++--------
 .../Transforms/Vectorize/VPlanConstruction.cpp    |  7 +++++--
 .../lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp |  2 +-
 llvm/lib/Transforms/Vectorize/VPlanTransforms.h   |  2 +-
 5 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7265eef295edf..bd405f9080f58 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -9323,9 +9323,9 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
           Range);
   auto Plan = std::make_unique<VPlan>(OrigLoop);
   // Build hierarchical CFG.
-  VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
-  // TODO: Convert to VPlan-transform and consoliate all transforms for VPlan
+  // Convert to VPlan-transform and consolidate all transforms for VPlan
   // creation.
+  VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
   HCFGBuilder.buildHierarchicalCFG();
 
   VPlanTransforms::introduceTopLevelVectorLoopRegion(
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 11cd1327e9f43..94c49b574c0ad 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -3570,6 +3570,13 @@ class VPlan {
     return TripCount;
   }
 
+  /// Set the trip count assuming it is currently null; if it is not - use
+  /// resetTripCount().
+  void setTripCount(VPValue *NewTripCount) {
+    assert(!TripCount && NewTripCount && "TripCount should not be set yet.");
+    TripCount = NewTripCount;
+  }
+
   /// Resets the trip count for the VPlan. The caller must make sure all uses of
   /// the original trip count have been replaced.
   void resetTripCount(VPValue *NewTripCount) {
@@ -3578,14 +3585,6 @@ class VPlan {
     TripCount = NewTripCount;
   }
 
-  // Set the trip count assuming it is currently null; if it is not - use
-  // resetTripCount().
-  void setTripCount(VPValue *NewTripCount) {
-    assert(!TripCount && NewTripCount && "TripCount should not be set yet.");
-
-    TripCount = NewTripCount;
-  }
-
   /// The backedge taken count of the original loop.
   VPValue *getOrCreateBackedgeTakenCount() {
     if (!BackedgeTakenCount)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index fdc2ac779daed..02586b4b50910 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -32,7 +32,8 @@ void VPlanTransforms::introduceTopLevelVectorLoopRegion(
   VPBlockUtils::disconnectBlocks(OriginalLatch, HeaderVPBB);
   VPBasicBlock *VecPreheader = Plan.createVPBasicBlock("vector.ph");
   VPBlockUtils::connectBlocks(Plan.getEntry(), VecPreheader);
-  assert(OriginalLatch->getNumSuccessors() == 0 && "expected no predecessors");
+  assert(OriginalLatch->getNumSuccessors() == 0 &&
+         "Plan should end at top level latch");
 
   // Create SCEV and VPValue for the trip count.
   // We use the symbolic max backedge-taken-count, which works also when
@@ -47,11 +48,13 @@ void VPlanTransforms::introduceTopLevelVectorLoopRegion(
       vputils::getOrCreateVPValueForSCEVExpr(Plan, TripCount, SE));
 
   // Create VPRegionBlock, with existing header and new empty latch block, to be
-  // filled
+  // filled.
   VPBasicBlock *LatchVPBB = Plan.createVPBasicBlock("vector.latch");
   VPBlockUtils::insertBlockAfter(LatchVPBB, OriginalLatch);
   auto *TopRegion = Plan.createVPRegionBlock(
       HeaderVPBB, LatchVPBB, "vector loop", false /*isReplicator*/);
+  // All VPBBs reachable shallowly from HeaderVPBB belong to the top-level loop,
+  // because the VPlan is expected to end at the top-level latch.
   for (VPBlockBase *VPBB : vp_depth_first_shallow(HeaderVPBB))
     VPBB->setParent(TopRegion);
 
diff --git a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
index b7ce0869104e3..4b8a2420b3037 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
@@ -382,7 +382,7 @@ void PlainCFGBuilder::buildPlainCFG(
     if (!isHeaderBB(BB, LoopForBB)) {
       setVPBBPredsFromBB(VPBB, BB);
     } else if (Region) {
-      // BB is a loop header and there's a corresponding region , set the
+      // BB is a loop header and there's a corresponding region, set the
       // predecessor for it.
       setRegionPredsFromBB(Region, BB);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index fab74aafb469c..8a9a81e00fe23 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -55,7 +55,7 @@ struct VPlanTransforms {
   /// Introduce the top-level VPRegionBlock for the main loop in \p Plan. Coming
   /// into this function, \p Plan's top-level loop is modeled using a plain CFG.
   /// This transform wraps the plain CFG of the top-level loop within a
-  /// VPRegionBlock and creates a VPValue expressions for the original trip
+  /// VPRegionBlock and creates a VPValue expression for the original trip
   /// count. It will also introduce a dedicated VPBasicBlock for the vector
   /// pre-header as well a VPBasicBlock as exit block of the region
   /// (middle.block). If a check is needed to guard executing the scalar

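The doc comment refers to the check that guards executing the scalar remainder. The comment in `introduceTopLevelVectorLoopRegion` lists three cases for that check:

- With the tail folded, the condition is the constant true.
- When a scalar epilogue is always required, no conditional branch is emitted.
- Otherwise, `cmp.n` compares the trip count against the vector trip count.

The sketch below evaluates the first and third cases with plain integers; the second case has no condition to model. It assumes the vector trip count is the trip count rounded down to a multiple of `Step` (the `VF` of the comment). `middleBlockBranchesToExit` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Returns the value of the middle
// block's branch-on-cond: true means "branch to the exit block", false means
// "run the remaining iterations in the scalar loop via scalar.ph".
bool middleBlockBranchesToExit(std::uint64_t TripCount, std::uint64_t Step,
                               bool TailFolded) {
  assert(Step > 0 && "step must be non-zero");
  // Tail folded into the vector loop: nothing remains, condition is true.
  if (TailFolded)
    return true;
  // Runtime check: cmp.n = icmp eq TripCount, VectorTripCount, where the
  // vector trip count is assumed to be N - N % Step.
  std::uint64_t VectorTripCount = TripCount - TripCount % Step;
  return TripCount == VectorTripCount;
}
```

For example, with a trip count of 18 and a step of 4, the vector loop covers 16 iterations. The check fails, and the last 2 iterations run in the scalar loop.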
>From d7a023a9366a3bf5646a13ec6bf6be7dce835769 Mon Sep 17 00:00:00 2001
From: Florian Hahn <flo at fhahn.com>
Date: Sun, 9 Mar 2025 15:04:58 +0000
Subject: [PATCH 6/6] Update VPlanConstruction.cpp

---
 llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
index 02586b4b50910..f58f0290b5fa9 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanConstruction.cpp
@@ -7,7 +7,7 @@
 //===----------------------------------------------------------------------===//
 ///
 /// \file
-/// This file implements transforms for initial VPlan construction
+/// This file implements transforms for initial VPlan construction.
 ///
 //===----------------------------------------------------------------------===//
 


