[PATCH] D147964: [VPlan] Introduce new entry block to VPlan for early SCEV expansion.

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 30 08:50:27 PDT 2023


fhahn added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:502
   /// epilogue vectorization, this function is overriden to handle the more
   /// complex control flow around the loops.
+  virtual std::pair<BasicBlock *, Value *>
----------------
Ayal wrote:
> Worth mentioning \p Plan and State?
The latest version has been updated to not pass them.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8925
+  VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
+  VPBlockUtils::insertBlockAfter(VecPreheader, Preheader);
 
----------------
Ayal wrote:
> fhahn wrote:
> > Ayal wrote:
> > > Should Preheader be detached from VecPreheader (and any other block), given that they represent disparate blocks and also get executed separately? Retaining the latter as Plan's entryBlock, keeping the former as Plan's (currently) isolated block for this and all other ExpandSCEV recipes.
> > > Perhaps worth verifying that all ExpandSCEV recipes reside in this unique Preheader VPBB.
> > I updated the code to have the preheader separate (pre-header before skeleton creation). I *think* at the moment the skeleton creation code leaves the CFG in a valid state, so for now SCEV expansions only needed by the vector body can remain where they are at the moment.
> Better place all SCEV expansions at preheader to ensure their correct generation before any CFG change, and potential reuse by epilog vector loop? It may incur an overhead when the vector loop is bypassed - but some subsequent pass may sink the expanded code into VecPreheader - if not reused by epilog vector loop. Otherwise how to set the insertion block of ExpandSCEV recipes needs to be explained.
I've retained to placement of the expansion only needed in the vector loop region in the vector preheader for now, to avoid extra test changes.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:614
+
+  void setTripCount(Value *TC) { TripCount = TC; }
 
----------------
Ayal wrote:
> Ayal wrote:
> > nit: can do w/o setTripCount()? TripCount should conceptually be fixed before/when building VPlan and remain invariant when transforming it.
> Ah, this is needed for setting the IR Value TripCount of ILV, for main loop only, during LVP::executePlan() after ILV's construction.
> Deserves documentation.
Yes, added a comment, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:3288
 
-       [ ] <-- loop iteration number check.
-    /   |
+       [ ] <-- loop iteration number check and SCEVs in Plans preheader are
+    /   |      expanded here.
----------------
Ayal wrote:
> nit: add its name "(old) preheader"? Note another block called "new preheader" below.
Done, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7705
+  // before making any changes to the CFG.
+  VPTransformState State{BestVF, BestUF, LI, DT, ILV.Builder, &ILV, &BestVPlan};
+  State.CFG.PrevBB = OrigLoop->getLoopPreheader();
----------------
Ayal wrote:
> nit: State should be defined before 0, its PrevBB and insertion point set after checking if preheader is not empty right before executing it.
Adjusted, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8748
     // Create the active lane mask instruction in the vplan preheader.
-    VPBasicBlock *Preheader = Plan.getEntry()->getEntryBasicBlock();
+    VPBasicBlock *Preheader =
+        cast<VPBasicBlock>(Plan.getVectorLoopRegion()->getSinglePredecessor());
----------------
Ayal wrote:
> nit: VecPreheader?
Renamed in 4583d7ef7c33, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8903-8907
+  // Create initial VPlan skeleton, starting with a block for the pre-header
+  // which contains SCEV expansions that need to happen before the CFG is
+  // modified, followed by the vector pre-header, followed by a region for the
+  // vector loop, followed by the middle block. The skeleton vector loop region
+  // contains a header and latch block.
----------------
Ayal wrote:
> 
Thanks, updated!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8917
+  VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
+  auto Plan = std::make_unique<VPlan>(VecPreheader, Preheader, TripCount);
 
----------------
Ayal wrote:
> nit: perhaps Preheader should appear before VecPreheader, and maybe TripCount before both?
> 
> nit: would be good to outline this low-level part if reasonable as method is exceeding 250 LOC.
> nit: perhaps Preheader should appear before VecPreheader, and maybe TripCount before both?

Adjusted, thanks!

> nit: would be good to outline this low-level part if reasonable as method is exceeding 250 LOC.

Agreed, but I think it would be good to do this as follow-up and also trying to unify this part with the native path.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9110
 
   // Create new empty VPlan
   auto Plan = std::make_unique<VPlan>();
----------------
Ayal wrote:
> nit: first create Trip Count, then Preheader, then define Plan given both - which should ideally be set at VPlan construction time w/o being (re)set later. HCFGBuilder will then set VecPreheader.
> Can share outlined code with tryToBuildVPlanWithVPRecipes()?
moved to a helper, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10432
+        // When vectorizing the epilogue loop, re-use the expanded values from
+        // the main vector loop.
+        for (auto &R : make_early_inc_range(*BestEpiPlan.getPreheader()))
----------------
Ayal wrote:
> This erasure empties the preheader thereby preventing re-execution of its recipes, and is probably responsible for checking if preheader is empty upon execution.
> 
> How do the users of the erased TripCount ExpandSCEV recipe get to re-use the (already) expanded values from the main vector loop, now that the recipe-VPValue feeding them is being deleted?
> How do the users of the erased TripCount ExpandSCEV recipe get to re-use the (already) expanded values from the main vector loop, now that the recipe-VPValue feeding them is being deleted?

The only users should be through ILV.TripCount. Updated the comment and add assert that the removed values don't have any users.

Down the road we really need to find a way to gradually transition the epilogue handling into a single VPlan, so we can avoid all the workarounds here...


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.cpp:596
   for (VPValue *VPV : VPLiveInsToFree)
     delete VPV;
   if (BackedgeTakenCount)
----------------
Ayal wrote:
> Where is TripCount deleted?
> Where is Preheader deleted?
If they are defined by a recipe, they will be deleted as part of recipe deletion; if they are live-in VPValues, they will be cleaned up as part of that cleanup. No special handling should be necessary.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2220
   VPBasicBlock *Entry;
 
+  VPBasicBlock *Preheader;
----------------
Ayal wrote:
> /// ...
Added comment, thanks


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2264
 
 public:
+  VPlan(VPBasicBlock *Entry = nullptr, VPBasicBlock *Preheader = nullptr,
----------------
Ayal wrote:
> /// ... \p Entry ..
Added comment, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2267
+        VPValue *TripCount = nullptr)
+      : Entry(Entry), Preheader(Preheader), TripCount(TripCount) {
     if (Entry)
----------------
Ayal wrote:
> nits: Entry >> VecPreheader? Should Preheader and TripCount drop their defaults and asserted to be non-null?
Updated to require all arguments. I also added a version that doesn't take any arguments as this is used across different unit tests.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2293
 
+  void setPreheader(VPBasicBlock *PH) {
+    Preheader = PH;
----------------
Ayal wrote:
> nit: can be dropped? Is Preheader expected to change?
it's not needed in the latest version, removed!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2299
   /// The trip count of the original loop.
-  VPValue *getOrCreateTripCount() {
-    if (!TripCount)
-      TripCount = new VPValue();
+  VPValue *getTripCount() {
+    assert(TripCount && "trip count needs to be set before accessing it");
----------------
Ayal wrote:
> nit: can add a const version, is it useful now?
A `const` version alone is sufficient, added `const`, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2304
 
+  void setTripCount(VPValue *V) {
+    assert(!TripCount);
----------------
Ayal wrote:
> nit: can be dropped? Is TripCount expected to change?
Removed in the latest version, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/VPlan.h:2763
 /// pre-header already contains a recipe expanding \p Expr, return it. If not,
 /// create a new one.
+VPValue *getOrCreateVPValueForSCEVExpr(VPlan &Plan, VPBasicBlock *Block,
----------------
Ayal wrote:
> /// <How does one decide what \p Block should be>
> In all cases, it is set to Plan.getEntry(), because ExpandSCEV for TripCount (placed in Preheader) is created directly rather through this API.
In the latest version, it is either the vector pre-header (VPlan.getEntry()) or the prehader (getPreheader()). This avoids causing test changes due to different expansion points.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:34
   ReversePostOrderTraversal<VPBlockDeepTraversalWrapper<VPBlockBase *>> RPOT(
-      Plan->getEntry());
+      Plan->getEntry()->getSingleSuccessor());
   for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(RPOT)) {
----------------
Ayal wrote:
> Should this still be (Plan->getEntry())?
Yep, updated.


================
Comment at: llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp:400
+    if (PredVPBB && PredVPBB->getNumSuccessors() == 1 &&
+        PredVPBB != Plan.getEntry())
       WorkList.push_back(VPBB);
----------------
Ayal wrote:
> Is this additional check needed - can PredVPBB be Plan.getEntry()?
Restored original code, it's not needed with the latest version.


================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll:161
 ; CHECK:       vec.epilog.vector.body:
-; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
+; CHECK-NEXT:    [[INDEX7:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], [[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT11:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[VEC_IND8:%.*]] = phi <2 x i64> [ [[INDUCTION]], [[VEC_EPILOG_PH]] ], [ [[VEC_IND_NEXT10:%.*]], [[VEC_EPILOG_VECTOR_BODY]] ]
----------------
Ayal wrote:
> nit: are these OFFSET_IDX >> INDEX# changes needed?
Not needed, undone!



================
Comment at: llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll:90
 ; CHECK-NEXT:    [[TMP30:%.*]] = icmp eq i64 [[INDEX_NEXT16]], [[N_VEC4]]
-; CHECK-NEXT:    br i1 [[TMP30]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
+; CHECK-NEXT:    br i1 [[TMP30]], label [[VEC_EPILOG_MIDDLE_BLOCK:%.*]], label [[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ; CHECK:       vec.epilog.middle.block:
----------------
Not needed, undone!



================
Comment at: llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll:866
 ; AVX512-NEXT:    [[TMP8:%.*]] = add nuw nsw i64 [[TMP7]], 8
-; AVX512-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[DEST:%.*]], i64 [[TMP8]]
+; AVX512-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[DEST:%.*]], i64 [[TMP8]]
 ; AVX512-NEXT:    [[TMP9:%.*]] = shl nuw i64 [[TMP6]], 2
----------------
Ayal wrote:
> nit: are these UGLY >> SCEV changes needed?
Not needed, undone!


================
Comment at: llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll:900
 ; UNROLL-NO-IC-NEXT:    [[E_015:%.*]] = phi i32 [ poison, [[ENTRY]] ], [ [[E_1_LCSSA:%.*]], [[FOR_COND_CLEANUP3]] ]
-; UNROLL-NO-IC-NEXT:    [[TMP0:%.*]] = add i32 [[INDVAR]], 1
-; UNROLL-NO-IC-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[TMP0]], 8
+; UNROLL-NO-IC-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i32 [[I_016]], 8
 ; UNROLL-NO-IC-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
----------------
Ayal wrote:
> Nice LSR-type phi redundancy elimination as a by-product?
Yes it looks like it!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147964/new/

https://reviews.llvm.org/D147964



More information about the llvm-commits mailing list