[PATCH] D157037: [VPlan] Create mask for tail-folding up-front (NFCI).

Sat Aug 5 04:57:30 PDT 2023

Ayal added a comment.

> Split off mask creation for tail folding and always create the mask for
> the header block.

"always" >> "proactively"?

Down the road tail folding should be considered as a VPlan2VPlan transformation along with masking/predication.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8148

 VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlan &Plan) {
   assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
----------------
nit (independent of this patch): more accurately prefixed getOrCreate.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8161
     if (!CM.blockNeedsPredicationForAnyReason(BB))
       return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
+  assert (OrigLoop->getHeader() != BB);
----------------
nit: indent

The comment about "Loop incoming mask" is now obsolete, it refers to BB being the header.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8162
       return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
-
-    assert(CM.foldTailByMasking() && "must fold the tail");
-
-    // If we're using the active lane mask for control flow, then we get the
-    // mask from the active lane mask PHI that is cached in the VPlan.
-    TailFoldingStyle TFStyle = CM.getTailFoldingStyle();
-    if (useActiveLaneMaskForControlFlow(TFStyle))
-      return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();
-
-    // Introduce the early-exit compare IV <= BTC to form header block mask.
-    // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-    // constructing the desired canonical IV in the header block as its first
-    // non-phi instructions.
-
-    VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-    auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
-    auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
-    HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());
-
-    VPBuilder::InsertPointGuard Guard(Builder);
-    Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
-    if (useActiveLaneMask(TFStyle)) {
-      VPValue *TC = Plan.getTripCount();
-      BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC},
-                                       nullptr, "active.lane.mask");
-    } else {
-      VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-      BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
-    }
-    return BlockMaskCache[BB] = BlockMask;
-  }
+  assert (OrigLoop->getHeader() != BB);

----------------
add assert message; a header mask is handled earlier by createTailFoldHeaderMask(), and now must have been found in cache.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8181

+VPValue *VPRecipeBuilder::createTailFoldHeaderMask(VPlan &Plan) {
+  BasicBlock *Header = OrigLoop->getHeader();
----------------
Have it return void?

Note (independent of this patch): this is truly `create`, except for useActiveLaneMaskForControlFlow case...

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8185
+  BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(Header);
+  if (BCEntryIt != BlockMaskCache.end())
+    return BCEntryIt->second;
----------------
assert that we create it only once?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8189
+  // All-one mask is modelled as no-mask following the convention for masked
+  // load/store/gather/scatter. Initialize BlockMask to no-mask.
+  VPValue *BlockMask = nullptr;
----------------
nit: here a null BlockMask will not be used. Better define it later closer to where it's actually set and used.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8192
+
+  assert(CM.foldTailByMasking() && "must fold the tail");
+
----------------
nit: best assert earlier, first thing.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8201
+  // Introduce the early-exit compare IV <= BTC to form header block mask.
+  // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
+  // constructing the desired canonical IV in the header block as its first
----------------
nit (independent of this patch): some explanation of using active lane mask (below, not for control flow - above) which does use IV and TC, is missing here.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8916
                         CM.getTailFoldingStyle(IVUpdateMayOverflow));

+  if (CM.foldTailByMasking())
----------------
Worth a comment to explain that the mask of the header block is created here proactively, if needed - when folding tail, leaving  the masks of other blocks to be created on-demand, using the header block mask.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8984
           HeaderVPBB->getFirstNonPhi() != VPBB->end()) {
         // Move VPWidenIntOrFpInductionRecipes for optimized truncates to the
         // phi section of HeaderVPBB.
----------------
Why/Is this change needed/related to tail folding? Any (header) phi recipe should indeed be placed along with other phi's rather than be appended after non-phi recipes. But are there now other non-phi, non-trunc Instr's that generate non-widen-int-or-fp-induction header phi recipes?

If needed, above comment needs update.

Perhaps better optimize truncating IV's later as a VPlan2VPlan transformation, as in optimizeInductions().

================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:137
   /// A helper function that computes the predicate of the block BB, assuming
   /// that the header block of the loop is set to True. It returns the *entry*
   /// mask for the block BB.
----------------
(nit: this assumption will now hold ;-)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157037/new/

https://reviews.llvm.org/D157037