[PATCH] D157037: [VPlan] Create mask for tail-folding up-front (NFCI).

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Aug 5 06:49:50 PDT 2023


fhahn added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8161
     if (!CM.blockNeedsPredicationForAnyReason(BB))
       return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
+  assert (OrigLoop->getHeader() != BB);
----------------
Ayal wrote:
> nit: indent
> 
> The comment about "Loop incoming mask" is now obsolete, it refers to BB being the header.
Fixed indent and moved this code up, also more accurately checking for `BB == OrigLoop->getHeader()` which should be the only relevant case.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8162
       return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
-
-    assert(CM.foldTailByMasking() && "must fold the tail");
-
-    // If we're using the active lane mask for control flow, then we get the
-    // mask from the active lane mask PHI that is cached in the VPlan.
-    TailFoldingStyle TFStyle = CM.getTailFoldingStyle();
-    if (useActiveLaneMaskForControlFlow(TFStyle))
-      return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();
-
-    // Introduce the early-exit compare IV <= BTC to form header block mask.
-    // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
-    // constructing the desired canonical IV in the header block as its first
-    // non-phi instructions.
-
-    VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
-    auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
-    auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
-    HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());
-
-    VPBuilder::InsertPointGuard Guard(Builder);
-    Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
-    if (useActiveLaneMask(TFStyle)) {
-      VPValue *TC = Plan.getTripCount();
-      BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC},
-                                       nullptr, "active.lane.mask");
-    } else {
-      VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
-      BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
-    }
-    return BlockMaskCache[BB] = BlockMask;
-  }
+  assert (OrigLoop->getHeader() != BB);
 
----------------
Ayal wrote:
> add assert message; a header mask is handled earlier by createTailFoldHeaderMask(), and now must have been found in cache.
Assert removed, as we now handle `OrigLoop->getHeader() == BB` directly.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8181
 
+VPValue *VPRecipeBuilder::createTailFoldHeaderMask(VPlan &Plan) {
+  BasicBlock *Header = OrigLoop->getHeader();
----------------
Ayal wrote:
> Have it return void?
> 
> Note (independent of this patch): this is truly `create`, except for useActiveLaneMaskForControlFlow case...
Updated to return void


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8185
+  BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(Header);
+  if (BCEntryIt != BlockMaskCache.end())
+    return BCEntryIt->second;
----------------
Ayal wrote:
> assert that we create it only once?
This isn't needed as `createTailFoldHeaderMask` is called once and nothing will be cached then. Remove the code.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8189
+  // All-one mask is modelled as no-mask following the convention for masked
+  // load/store/gather/scatter. Initialize BlockMask to no-mask.
+  VPValue *BlockMask = nullptr;
----------------
Ayal wrote:
> nit: here a null BlockMask will not be used. Better define it later closer to where it's actually set and used.
Moved, also dropped the comment; there always will be a non-all-one mask.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8192
+
+  assert(CM.foldTailByMasking() && "must fold the tail");
+
----------------
Ayal wrote:
> nit: best assert earlier, first thing.
Moved, thanks!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8916
                         CM.getTailFoldingStyle(IVUpdateMayOverflow));
 
+  if (CM.foldTailByMasking())
----------------
Ayal wrote:
> Worth a comment to explain that the mask of the header block is created here proactively, if needed - when folding tail, leaving  the masks of other blocks to be created on-demand, using the header block mask.
Thanks, added a comment!


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8984
           HeaderVPBB->getFirstNonPhi() != VPBB->end()) {
         // Move VPWidenIntOrFpInductionRecipes for optimized truncates to the
         // phi section of HeaderVPBB.
----------------
Ayal wrote:
> Why/Is this change needed/related to tail folding? Any (header) phi recipe should indeed be placed along with other phi's rather than be appended after non-phi recipes. But are there now other non-phi, non-trunc Instr's that generate non-widen-int-or-fp-induction header phi recipes?
> 
> If needed, above comment needs update.
> 
> Perhaps better optimize truncating IV's later as a VPlan2VPlan transformation, as in optimizeInductions().
I updated the comment for now. Doing as a VPlan2VPlan transform should be feasible in general, but it would require TTI to check if the truncate would be free.


================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:137
   /// A helper function that computes the predicate of the block BB, assuming
   /// that the header block of the loop is set to True. It returns the *entry*
   /// mask for the block BB.
----------------
Ayal wrote:
> (nit: this assumption will now hold ;-)
Clarified comment that the header mask will either be set to true or the loop mask when tail-folding.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157037/new/

https://reviews.llvm.org/D157037



More information about the llvm-commits mailing list