[llvm] [LoopVectorize] Add cost of generating tail-folding mask to the loop (PR #90191)

Florian Hahn via llvm-commits llvm-commits at lists.llvm.org
Mon May 20 05:26:01 PDT 2024


================
@@ -5942,6 +5945,35 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
   return Discount;
 }
 
+InstructionCost
+LoopVectorizationCostModel::getTailFoldMaskCost(ElementCount VF) {
+  if (VF.isScalar())
+    return 0;
+
+  InstructionCost MaskCost;
+  Type *IndTy = Legal->getWidestInductionType();
+  TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
+  TailFoldingStyle Style = getTailFoldingStyle();
+  LLVMContext &Context = TheLoop->getHeader()->getContext();
+  VectorType *RetTy = VectorType::get(IntegerType::getInt1Ty(Context), VF);
+  if (useActiveLaneMask(Style)) {
+    IntrinsicCostAttributes Attrs(
+        Intrinsic::get_active_lane_mask, RetTy,
+        {PoisonValue::get(IndTy), PoisonValue::get(IndTy)});
+    MaskCost = TTI.getIntrinsicInstrCost(Attrs, CostKind);
+  } else {
+    // This is just a stepvector, added to a splat of the current IV, followed
+    // by a vector comparison with a splat of the trip count. Since the
+    // stepvector is loop invariant it will be hoisted out so we can ignore it.
+    // This just leaves us with an add and an icmp.
+    VectorType *VecTy = VectorType::get(IndTy, VF);
+    MaskCost = TTI.getArithmeticInstrCost(Instruction::Add, VecTy, CostKind);
+    MaskCost += TTI.getCmpSelInstrCost(Instruction::ICmp, VecTy, RetTy,
+                                       ICmpInst::ICMP_ULE, CostKind, nullptr);
+  }
----------------
fhahn wrote:

> If #67934 is likely to land in the next week or so, I'm happy to hold off and base it off #67934. I can take a look at it, but I may need some help understanding how to do that. :)

Not sure about this week, but hopefully soon, as all pending patches have landed.

> Does your patch automatically solve this problem @fhahn, i.e. because it walks through every VPInstruction generated in the plan and so essentially there is no extra work needed?

It won't solve it automatically; to solve it, `VPInstruction::computeCost` needs to be implemented for `ActiveLaneMask` and `GetVectorLength`, and `VPWidenCanonicalIVRecipe::computeCost` needs to be implemented as well. Doing it that way would ensure we accurately cost what is actually code-gen'd (and avoid the issue of not knowing whether a splat + add of step-vector will be needed, as per my comment above).
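
For reference, here is a rough scalar model of the non-active-lane-mask path described in the patch comment (splat of the IV plus a stepvector, compared against a splat of the bound). It is only a sketch and not LLVM code: `tailFoldMask` and its parameters are hypothetical names, and it assumes the bound is the trip count minus one compared with an unsigned `<=`, which may differ from what ends up being code-gen'd.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a tail-folding mask; not part of LLVM.
// Lane i is active when (iv + i) is still a valid iteration index.
std::vector<bool> tailFoldMask(std::uint64_t iv, std::uint64_t tripCount,
                               unsigned vf) {
  assert(vf > 0 && "vectorization factor must be non-zero");
  std::vector<bool> mask(vf, false);
  if (tripCount == 0)
    return mask; // no iterations, so no lane is ever active

  // Assumption: compare against the last valid index (trip count - 1).
  const std::uint64_t lastIndex = tripCount - 1;
  for (unsigned lane = 0; lane < vf; ++lane) {
    // Models splat(iv) + stepvector: the per-lane iteration index.
    const std::uint64_t index = iv + lane;
    // Models the vector icmp ule against a splat of the bound.
    mask[lane] = index <= lastIndex;
  }
  return mask;
}
```

With a trip count of 10 and VF = 4, the final vector iteration starts at `iv = 8`, so only the first two lanes are active. Per vector iteration this leaves one add and one compare, which is what the `else` branch in the hunk above adds up.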

https://github.com/llvm/llvm-project/pull/90191

