[PATCH] D157037: [VPlan] Create mask for tail-folding up-front (NFCI).
Ayal Zaks via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Aug 5 04:57:30 PDT 2023
Ayal added a comment.
> Split off mask creation for tail folding and always create the mask for
> the header block.
"always" >> "proactively"?
Down the road tail folding should be considered as a VPlan2VPlan transformation along with masking/predication.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8148
VPValue *VPRecipeBuilder::createBlockInMask(BasicBlock *BB, VPlan &Plan) {
assert(OrigLoop->contains(BB) && "Block is not a part of a loop");
----------------
nit (independent of this patch): more accurately prefixed getOrCreate.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8161
if (!CM.blockNeedsPredicationForAnyReason(BB))
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
+ assert (OrigLoop->getHeader() != BB);
----------------
nit: indent
The comment about "Loop incoming mask" is now obsolete, it refers to BB being the header.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8162
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
-
- assert(CM.foldTailByMasking() && "must fold the tail");
-
- // If we're using the active lane mask for control flow, then we get the
- // mask from the active lane mask PHI that is cached in the VPlan.
- TailFoldingStyle TFStyle = CM.getTailFoldingStyle();
- if (useActiveLaneMaskForControlFlow(TFStyle))
- return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();
-
- // Introduce the early-exit compare IV <= BTC to form header block mask.
- // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
- // constructing the desired canonical IV in the header block as its first
- // non-phi instructions.
-
- VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
- auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
- auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
- HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());
-
- VPBuilder::InsertPointGuard Guard(Builder);
- Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
- if (useActiveLaneMask(TFStyle)) {
- VPValue *TC = Plan.getTripCount();
- BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC},
- nullptr, "active.lane.mask");
- } else {
- VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
- BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
- }
- return BlockMaskCache[BB] = BlockMask;
- }
+ assert (OrigLoop->getHeader() != BB);
----------------
add assert message; a header mask is handled earlier by createTailFoldHeaderMask(), and now must have been found in cache.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8181
+VPValue *VPRecipeBuilder::createTailFoldHeaderMask(VPlan &Plan) {
+ BasicBlock *Header = OrigLoop->getHeader();
----------------
Have it return void?
Note (independent of this patch): this is truly `create`, except for useActiveLaneMaskForControlFlow case...
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8185
+ BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(Header);
+ if (BCEntryIt != BlockMaskCache.end())
+ return BCEntryIt->second;
----------------
assert that we create it only once?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8189
+ // All-one mask is modelled as no-mask following the convention for masked
+ // load/store/gather/scatter. Initialize BlockMask to no-mask.
+ VPValue *BlockMask = nullptr;
----------------
nit: here a null BlockMask will not be used. Better define it later closer to where it's actually set and used.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8192
+
+ assert(CM.foldTailByMasking() && "must fold the tail");
+
----------------
nit: best assert earlier, first thing.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8201
+ // Introduce the early-exit compare IV <= BTC to form header block mask.
+ // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
+ // constructing the desired canonical IV in the header block as its first
----------------
nit (independent of this patch): some explanation of using active lane mask (below, not for control flow - above) which does use IV and TC, is missing here.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8916
CM.getTailFoldingStyle(IVUpdateMayOverflow));
+ if (CM.foldTailByMasking())
----------------
Worth a comment to explain that the mask of the header block is created here proactively, if needed - when folding tail, leaving the masks of other blocks to be created on-demand, using the header block mask.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8984
HeaderVPBB->getFirstNonPhi() != VPBB->end()) {
// Move VPWidenIntOrFpInductionRecipes for optimized truncates to the
// phi section of HeaderVPBB.
----------------
Why/Is this change needed/related to tail folding? Any (header) phi recipe should indeed be placed along with other phi's rather than be appended after non-phi recipes. But are there now other non-phi, non-trunc Instr's that generate non-widen-int-or-fp-induction header phi recipes?
If needed, above comment needs update.
Perhaps better optimize truncating IV's later as a VPlan2VPlan transformation, as in optimizeInductions().
================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:137
/// A helper function that computes the predicate of the block BB, assuming
/// that the header block of the loop is set to True. It returns the *entry*
/// mask for the block BB.
----------------
(nit: this assumption will now hold ;-)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157037/new/
https://reviews.llvm.org/D157037
More information about the llvm-commits
mailing list