[PATCH] D157037: [VPlan] Create mask for tail-folding up-front (NFCI).
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Aug 5 06:49:50 PDT 2023
fhahn added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8161
if (!CM.blockNeedsPredicationForAnyReason(BB))
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
+ assert (OrigLoop->getHeader() != BB);
----------------
Ayal wrote:
> nit: indent
>
> The comment about "Loop incoming mask" is now obsolete, it refers to BB being the header.
Fixed indent and moved this code up, also more accurately checking for `BB == OrigLoop->getHeader()` which should be the only relevant case.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8162
return BlockMaskCache[BB] = BlockMask; // Loop incoming mask is all-one.
-
- assert(CM.foldTailByMasking() && "must fold the tail");
-
- // If we're using the active lane mask for control flow, then we get the
- // mask from the active lane mask PHI that is cached in the VPlan.
- TailFoldingStyle TFStyle = CM.getTailFoldingStyle();
- if (useActiveLaneMaskForControlFlow(TFStyle))
- return BlockMaskCache[BB] = Plan.getActiveLaneMaskPhi();
-
- // Introduce the early-exit compare IV <= BTC to form header block mask.
- // This is used instead of IV < TC because TC may wrap, unlike BTC. Start by
- // constructing the desired canonical IV in the header block as its first
- // non-phi instructions.
-
- VPBasicBlock *HeaderVPBB = Plan.getVectorLoopRegion()->getEntryBasicBlock();
- auto NewInsertionPoint = HeaderVPBB->getFirstNonPhi();
- auto *IV = new VPWidenCanonicalIVRecipe(Plan.getCanonicalIV());
- HeaderVPBB->insert(IV, HeaderVPBB->getFirstNonPhi());
-
- VPBuilder::InsertPointGuard Guard(Builder);
- Builder.setInsertPoint(HeaderVPBB, NewInsertionPoint);
- if (useActiveLaneMask(TFStyle)) {
- VPValue *TC = Plan.getTripCount();
- BlockMask = Builder.createNaryOp(VPInstruction::ActiveLaneMask, {IV, TC},
- nullptr, "active.lane.mask");
- } else {
- VPValue *BTC = Plan.getOrCreateBackedgeTakenCount();
- BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
- }
- return BlockMaskCache[BB] = BlockMask;
- }
+ assert (OrigLoop->getHeader() != BB);
----------------
Ayal wrote:
> add assert message; a header mask is handled earlier by createTailFoldHeaderMask(), and now must have been found in cache.
Assert removed, as we now handle `OrigLoop->getHeader() == BB` directly.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8181
+VPValue *VPRecipeBuilder::createTailFoldHeaderMask(VPlan &Plan) {
+ BasicBlock *Header = OrigLoop->getHeader();
----------------
Ayal wrote:
> Have it return void?
>
> Note (independent of this patch): this is truly `create`, except for useActiveLaneMaskForControlFlow case...
Updated to return void
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8185
+ BlockMaskCacheTy::iterator BCEntryIt = BlockMaskCache.find(Header);
+ if (BCEntryIt != BlockMaskCache.end())
+ return BCEntryIt->second;
----------------
Ayal wrote:
> assert that we create it only once?
This isn't needed as `createTailFoldHeaderMask` is called once and nothing will be cached then. Remove the code.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8189
+ // All-one mask is modelled as no-mask following the convention for masked
+ // load/store/gather/scatter. Initialize BlockMask to no-mask.
+ VPValue *BlockMask = nullptr;
----------------
Ayal wrote:
> nit: here a null BlockMask will not be used. Better define it later closer to where it's actually set and used.
Moved, also dropped the comment; there always will be a non-all-one mask.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8192
+
+ assert(CM.foldTailByMasking() && "must fold the tail");
+
----------------
Ayal wrote:
> nit: best assert earlier, first thing.
Moved, thanks!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8916
CM.getTailFoldingStyle(IVUpdateMayOverflow));
+ if (CM.foldTailByMasking())
----------------
Ayal wrote:
> Worth a comment to explain that the mask of the header block is created here proactively, if needed - when folding tail, leaving the masks of other blocks to be created on-demand, using the header block mask.
Thanks, added a comment!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8984
HeaderVPBB->getFirstNonPhi() != VPBB->end()) {
// Move VPWidenIntOrFpInductionRecipes for optimized truncates to the
// phi section of HeaderVPBB.
----------------
Ayal wrote:
> Why/Is this change needed/related to tail folding? Any (header) phi recipe should indeed be placed along with other phi's rather than be appended after non-phi recipes. But are there now other non-phi, non-trunc Instr's that generate non-widen-int-or-fp-induction header phi recipes?
>
> If needed, above comment needs update.
>
> Perhaps better optimize truncating IV's later as a VPlan2VPlan transformation, as in optimizeInductions().
I updated the comment for now. Doing as a VPlan2VPlan transform should be feasible in general, but it would require TTI to check if the truncate would be free.
================
Comment at: llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h:137
/// A helper function that computes the predicate of the block BB, assuming
/// that the header block of the loop is set to True. It returns the *entry*
/// mask for the block BB.
----------------
Ayal wrote:
> (nit: this assumption will now hold ;-)
Clarified comment that the header mask will either be set to true or the loop mask when tail-folding.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D157037/new/
https://reviews.llvm.org/D157037
More information about the llvm-commits
mailing list