[llvm] [LV] Add support for partial alias masking with tail folding (PR #182457)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 3 03:03:02 PST 2026
================
@@ -7471,6 +7542,9 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
VPlanTransforms::materializeVectorTripCount(
BestVPlan, VectorPH, CM.foldTailByMasking(),
CM.requiresScalarEpilogue(BestVF.isVector()));
+ // Do a late fix-up of the VF to replace any additional users of VF since the
+ // alias mask was materialized.
+ VPlanTransforms::fixupVFUsersForClampedVF(BestVPlan, ClampedVF);
----------------
lukel97 wrote:
Continuing the discussion at https://github.com/llvm/llvm-project/pull/177599#discussion_r2818191264
> > Otherwise we end up with an incorrect VPlan throughout the pipeline.
> I don't see it as an incorrect VPlan. Before VF is materialized it's a symbolic value that represents the runtime VF for the plan. For plans with alias-masking the runtime VF will be the number of lanes in the mask.
I don't think that matches the definition of VF today with tail folding. With tail folding the VF is always just the vector width regardless of the number of active lanes, and we materialize the VF as the vector width. Otherwise wouldn't ClampedVF just be VF then?
>> EVL folding handles the "variable step" or clamped VF as it's called in this PR with VPEVLBasedIVPHIRecipe. arcbbb is working on generalizing it so it can be reused in contexts like this, renaming it to VPCurrentIterationPHIRecipe: https://github.com/llvm/llvm-project/pull/177114
> I'm not sure if this applies here? With alias-masking there is not a "variable step". The step is fixed/loop-invariant, it's just not known until the runtime (in the pre-header). It's not that dissimilar to scalable VFs in that regard.
Yeah, the clamped VF isn't variable, but the thing that needs fixing up is the fact that the canonical IV increment is no longer VFxUF, so recipes need to be updated to reflect that. With scalable VFs the increment is still VFxUF.
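To illustrate the point (a minimal standalone sketch, not LLVM code: the names `clamped_vf` and `trip_count` are hypothetical, and the runtime-overlap model is simplified), with partial alias masking the canonical IV steps by the number of lanes proven safe at runtime rather than by VFxUF:

```python
def clamped_vf(vf, overlap_dist):
    """Lanes before the first conflicting element are safe to process
    together, so the effective (clamped) VF is capped by the runtime
    overlap distance between the two pointers, and never exceeds VF."""
    return min(vf, overlap_dist)

def trip_count(n, vf, overlap_dist):
    """Count loop iterations when the canonical IV increments by the
    clamped VF instead of the usual VF x UF (UF = 1 here)."""
    step = clamped_vf(vf, overlap_dist)
    count = 0
    i = 0
    while i < n:
        i += step  # increment is loop-invariant, but only known at runtime
        count += 1
    return count
```

The clamped VF is fixed for the whole loop (it is computed once in the pre-header), which is why it resembles a scalable VF; the difference is that the increment no longer equals VFxUF, so every recipe that assumed that relationship has to be fixed up.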
My main concern is that we might have transforms that depend on e.g. VPWidenIntOrFpInductionRecipe/VPScalarIVStepsRecipe having the correct value. If another transform introduces a VPWidenIntOrFpInductionRecipe with Plan.getVF() as an operand in between materializeAliasMask and the late call to fixupVFUsersForClampedVF, then it won't see that the VF operand should really be clampedVF.
I think we need to go through and separate out the users of VPlan.getVF() and figure out which ones are the "number of elements processed this iteration" and which ones are the "width of the vector type". The former would become the clamped VF with partial alias masking, EVL with EVL tail folding, and the faulting lane in vp.load.ff. It can be loop variant or invariant. The latter is the VF as we know today.
This is probably a larger chunk of work so I wouldn't block this PR on it. But I'd like to agree on a long-term direction which allows us to keep the VPlan correct throughout, and avoids duplicating work between all the different non-VFxUF incrementing types of vectorization.
https://github.com/llvm/llvm-project/pull/182457