[llvm] [VPlan] Materialize vector trip count using VPInstructions. (PR #151925)
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 7 03:09:02 PDT 2025
================
@@ -3278,6 +3278,65 @@ void VPlanTransforms::materializeBackedgeTakenCount(VPlan &Plan,
   BTC->replaceAllUsesWith(TCMO);
 }
+void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
+                                                 VPBasicBlock *VectorPHVPBB,
+                                                 bool TailByMasking,
+                                                 bool RequiresScalarEpilogue) {
+  VPValue &VectorTC = Plan.getVectorTripCount();
+  assert(VectorTC.isLiveIn() && "vector-trip-count must be a live-in");
+  // There's nothing to do if there are no users of the vector trip count or its
+  // IR value has already been set.
+  if (VectorTC.getNumUsers() == 0 || VectorTC.getLiveInIRValue())
+    return;
+  VPValue *TC = Plan.getTripCount();
+  Type *TCTy = VPTypeAnalysis(Plan).inferScalarType(TC);
+  VPBuilder Builder(VectorPHVPBB, VectorPHVPBB->begin());
+
+  VPValue *Step = &Plan.getVFxUF();
+
+  // If the tail is to be folded by masking, round the number of iterations N
+  // up to a multiple of Step instead of rounding down. This is done by first
+  // adding Step-1 and then rounding down. Note that it's ok if this addition
+  // overflows: the vector induction variable will eventually wrap to zero given
+  // that it starts at zero and its Step is a power of two; the loop will then
+  // exit, with the last early-exit vector comparison also producing all-true.
+  // For scalable vectors the VF is not guaranteed to be a power of 2, but this
+  // is accounted for in emitIterationCountCheck that adds an overflow check.
+  if (TailByMasking) {
+    TC = Builder.createNaryOp(
+        Instruction::Add,
+        {TC, Builder.createNaryOp(
+                 Instruction::Sub,
+                 {Step, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 1))})},
+        DebugLoc::getCompilerGenerated(), "n.rnd.up");
+  }
+
+  // Now we need to generate the expression for the part of the loop that the
+  // vectorized body will execute. This is equal to N - (N % Step) if scalar
+  // iterations are not required for correctness, or N - Step, otherwise. Step
+  // is equal to the vectorization factor (number of SIMD elements) times the
+  // unroll factor (number of SIMD instructions).
+  VPValue *R =
+      Builder.createNaryOp(Instruction::URem, {TC, Step},
+                           DebugLoc::getCompilerGenerated(), "n.mod.vf");
+
+  // There are cases where we *must* run at least one iteration in the remainder
+  // loop. See the cost model for when this can happen. If the step evenly
+  // divides the trip count, we set the remainder to be equal to the step. If
+  // the step does not evenly divide the trip count, no adjustment is necessary
+  // since there will already be scalar iterations. Note that the minimum
+  // iterations check ensures that N >= Step.
+  if (RequiresScalarEpilogue) {
+    auto *IsZero = Builder.createICmp(
+        CmpInst::ICMP_EQ, R, Plan.getOrAddLiveIn(ConstantInt::get(TCTy, 0)));
+    R = Builder.createSelect(IsZero, Step, R);
+  }
+
+  auto Res = Builder.createNaryOp(Instruction::Sub, {TC, R},
----------------
fhahn wrote:
Replaced the uses of `auto*` with `VPValue*`, thanks
https://github.com/llvm/llvm-project/pull/151925
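
For readers following the arithmetic in the quoted hunk, here is a rough sketch of what the emitted recipes compute, written on plain 64-bit integers. `vectorTripCount` is a hypothetical standalone helper for illustration only; it is not part of LLVM or of this patch, and it assumes a fixed nonzero `Step` (VF x UF) rather than VPValues.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM API): mirrors the trip count arithmetic from
// the quoted hunk using ordinary unsigned integers.
uint64_t vectorTripCount(uint64_t N, uint64_t Step, bool TailByMasking,
                         bool RequiresScalarEpilogue) {
  assert(Step != 0 && "Step (VF x UF) must be nonzero");

  // Tail folding: round N up to a multiple of Step by adding Step - 1 before
  // rounding down (the "n.rnd.up" value). Unsigned wrap-around is well defined
  // in C++, matching the comment that overflow of this addition is acceptable.
  if (TailByMasking)
    N += Step - 1;

  // Remainder of the (possibly rounded-up) trip count (the "n.mod.vf" value).
  uint64_t R = N % Step;

  // A required scalar epilogue must execute at least one iteration, so a zero
  // remainder is replaced by a full Step.
  if (RequiresScalarEpilogue && R == 0)
    R = Step;

  // Number of iterations covered by the vector loop.
  return N - R;
}
```

With N = 10 and Step = 4 this yields 8 without tail folding and 12 with it; with N = 8, Step = 4 and a required scalar epilogue it yields 4, leaving four iterations for the remainder loop.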