[llvm] [LoopVectorize] Add cost of generating tail-folding mask to the loop (PR #90191)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Mon May 20 05:26:01 PDT 2024
================
@@ -5942,6 +5945,35 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
return Discount;
}
+InstructionCost
+LoopVectorizationCostModel::getTailFoldMaskCost(ElementCount VF) {
+ if (VF.isScalar())
+ return 0;
+
+ InstructionCost MaskCost;
+ Type *IndTy = Legal->getWidestInductionType();
+ TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+ TailFoldingStyle Style = getTailFoldingStyle();
+ LLVMContext &Context = TheLoop->getHeader()->getContext();
+ VectorType *RetTy = VectorType::get(IntegerType::getInt1Ty(Context), VF);
+ if (useActiveLaneMask(Style)) {
+ IntrinsicCostAttributes Attrs(
+ Intrinsic::get_active_lane_mask, RetTy,
+ {PoisonValue::get(IndTy), PoisonValue::get(IndTy)});
+ MaskCost = TTI.getIntrinsicInstrCost(Attrs, CostKind);
+ } else {
+ // This is just a stepvector, added to a splat of the current IV, followed
+ // by a vector comparison with a splat of the trip count. Since the
+ // stepvector is loop invariant it will be hoisted out so we can ignore it.
+ // This just leaves us with an add and an icmp.
+ VectorType *VecTy = VectorType::get(IndTy, VF);
+ MaskCost = TTI.getArithmeticInstrCost(Instruction::Add, VecTy, CostKind);
+ MaskCost += TTI.getCmpSelInstrCost(Instruction::ICmp, VecTy, RetTy,
+ ICmpInst::ICMP_ULE, CostKind, nullptr);
+ }
----------------
fhahn wrote:
> If #67934 is likely to land in the next week or so, I'm happy to hold off and base it off #67934. I can take a look at it, but I may need some help understanding how to do that. :)
Not sure about this week, but hopefully soon, as all pending patches have landed.
> Does your patch automatically solve this problem @fhahn, i.e. because it walks through every VPInstruction generated in the plan and so essentially there is no extra work needed?
It won't solve it automatically. To solve it, `VPInstruction::computeCost` needs to be implemented for `ActiveLaneMask` and `GetVectorLength`, and `VPWidenCanonicalIVRecipe::computeCost` needs to be implemented as well. Doing it that way would ensure we accurately cost what is actually code-gen'd, and it would avoid the issue of not knowing whether a splat + add of the step-vector will be needed, as per my comment above.
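To illustrate the idea of costing exactly what is in the plan, here is a minimal standalone sketch. It is not LLVM code: `Recipe`, `ActiveLaneMaskRecipe`, `WidenCanonicalIVRecipe`, `MaskCompareRecipe`, `planCost` and the two `make...Plan` helpers are hypothetical names, and the unit costs are made-up placeholders rather than values a real TTI query would return. It only shows that once each recipe reports its own cost, the mask cost falls out of summing over whichever recipes the plan actually contains.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for a recipe in a vectorization plan (not LLVM's API).
struct Recipe {
  virtual ~Recipe() = default;
  // Cost of this recipe at vectorization factor VF, in abstract units.
  virtual unsigned computeCost(unsigned VF) const = 0;
};

// Mask produced by one active-lane-mask style operation.
// Placeholder cost: 1 unit for any vector VF, 0 for scalar.
struct ActiveLaneMaskRecipe : Recipe {
  unsigned computeCost(unsigned VF) const override { return VF == 1 ? 0 : 1; }
};

// Widened canonical IV: models the vector add of the step vector to the IV splat.
struct WidenCanonicalIVRecipe : Recipe {
  unsigned computeCost(unsigned VF) const override { return VF == 1 ? 0 : 1; }
};

// Vector compare that turns the widened IV into a mask.
struct MaskCompareRecipe : Recipe {
  unsigned computeCost(unsigned VF) const override { return VF == 1 ? 0 : 1; }
};

using Plan = std::vector<std::unique_ptr<Recipe>>;

// Sum the cost of every recipe that is actually present in the plan.
unsigned planCost(const Plan &P, unsigned VF) {
  unsigned Total = 0;
  for (const auto &R : P)
    Total += R->computeCost(VF);
  return Total;
}

// Plan whose mask comes from a single active-lane-mask operation.
Plan makeLaneMaskPlan() {
  Plan P;
  P.push_back(std::make_unique<ActiveLaneMaskRecipe>());
  return P;
}

// Plan whose mask comes from a widened IV followed by a compare.
Plan makeCompareMaskPlan() {
  Plan P;
  P.push_back(std::make_unique<WidenCanonicalIVRecipe>());
  P.push_back(std::make_unique<MaskCompareRecipe>());
  return P;
}
```

With these placeholder costs, `planCost(makeLaneMaskPlan(), 4)` is 1 and `planCost(makeCompareMaskPlan(), 4)` is 2. The point of the design is that nothing outside the plan has to guess whether the add is needed: if the recipe is not in the plan, it contributes nothing to the total.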
https://github.com/llvm/llvm-project/pull/90191