[llvm] [VPlan] Fold safe divisors into VP intrinsics with EVL (PR #148828)

Wed Jul 16 10:03:37 PDT 2025

================
@@ -2176,6 +2176,52 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
       .Default([&](VPRecipeBase *R) { return nullptr; });
 }
 
+/// Try to optimize safe divisors away by converting their users to VP
+/// intrinsics:
+///
+/// udiv x, (vp.merge allones, y, 1, evl) -> vp.udiv x, y, allones, evl
+///
+/// Note the lanes past EVL will be changed from x to poison. This only works
+/// for the EVL-based IV and not any arbitrary EVL, because we know nothing
+/// will read the lanes past the EVL-based IV.
----------------
lukel97 wrote:

The users of the op aren't predicated: they aren't converted to VPWidenIntrinsic recipes for VP intrinsics, nor are they predicated according to `LoopVectorizationCostModel::isPredicatedInst`.

I guess the point this comment is trying to clarify is that tail folding has an invariant: for any recipe, none of the inactive lanes (i.e. the lanes past EVL) are ever used. That invariant is what this transform relies on for correctness.
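To make the invariant concrete, here is a toy lane-by-lane model of the fold. It is a sketch, not LLVM code: `Lane`, `udivWithSafeDivisor`, `vpUdiv` and `agreeOnActiveLanes` are hypothetical names, and poison is modelled as `std::nullopt`. It assumes y is non-zero on the active lanes, as the scalar loop would require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model (not LLVM code): std::nullopt stands in for a poison lane.
using Lane = std::optional<uint32_t>;

// Before the fold: udiv x, (vp.merge allones, y, 1, evl).
// The merge keeps y on lanes < evl and substitutes the safe divisor 1
// elsewhere, so lanes past evl compute x / 1 == x.
std::vector<Lane> udivWithSafeDivisor(const std::vector<uint32_t> &x,
                                      const std::vector<uint32_t> &y,
                                      std::size_t evl) {
  assert(x.size() == y.size() && evl <= x.size());
  std::vector<Lane> r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    uint32_t divisor = i < evl ? y[i] : 1u;
    assert(divisor != 0 && "active lanes must have a non-zero divisor");
    r[i] = x[i] / divisor;
  }
  return r;
}

// After the fold: vp.udiv x, y, allones, evl. Lanes past evl are poison.
std::vector<Lane> vpUdiv(const std::vector<uint32_t> &x,
                         const std::vector<uint32_t> &y, std::size_t evl) {
  assert(x.size() == y.size() && evl <= x.size());
  std::vector<Lane> r(x.size()); // every lane starts out as "poison"
  for (std::size_t i = 0; i < evl; ++i) {
    assert(y[i] != 0 && "active lanes must have a non-zero divisor");
    r[i] = x[i] / y[i];
  }
  return r;
}

// The tail-folding invariant: consumers only read lanes < evl, so the fold
// is correct as long as both forms agree on exactly those lanes.
bool agreeOnActiveLanes(const std::vector<Lane> &a, const std::vector<Lane> &b,
                        std::size_t evl) {
  for (std::size_t i = 0; i < evl; ++i)
    if (a[i] != b[i])
      return false;
  return true;
}
```

With x = {10, 20, 30, 40}, y = {2, 5, 3, 0} and evl = 3, both forms give 5, 4, 10 on the active lanes. Lane 3 is 40 before the fold (thanks to the safe divisor) and poison after it, which is fine because nothing reads it.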

I think this is similar to why we can't use a regular ExtractLastElement with tail folding, and need https://github.com/llvm/llvm-project/pull/149042 to make sure we only access the last active lane.
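A minimal sketch of the ExtractLastElement analogy, under the same toy model (hypothetical helpers, not the VPlan API; `std::nullopt` models a poison lane): in the final tail-folded iteration only the first EVL lanes are defined, so reading lane VF-1 can observe an inactive lane, whereas reading lane EVL-1 cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (not LLVM code): std::nullopt stands in for a poison lane.
using Lane = std::optional<int>;

// Builds a VF-wide register for the final tail-folded iteration:
// values.size() lanes are active (that count plays the role of EVL),
// the remaining lanes are inactive and modelled as poison.
std::vector<Lane> makeTailFolded(const std::vector<int> &values,
                                 std::size_t vf) {
  assert(values.size() <= vf);
  std::vector<Lane> v(vf); // value-initialized to std::nullopt
  for (std::size_t i = 0; i < values.size(); ++i)
    v[i] = values[i];
  return v;
}

// Hypothetical: a plain "extract last element" always reads lane VF-1.
Lane extractLastElement(const std::vector<Lane> &v) {
  assert(!v.empty());
  return v.back();
}

// Hypothetical: extracting the last *active* lane reads lane EVL-1.
Lane extractLastActiveLane(const std::vector<Lane> &v, std::size_t evl) {
  assert(evl > 0 && evl <= v.size());
  return v[evl - 1];
}
```

For a VF of 4 with three active lanes {7, 8, 9}, `extractLastElement` returns the poison lane while `extractLastActiveLane(v, 3)` returns 9.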

The EVL-based IV part stems from the fact that we can't fold, e.g., `udiv x, (vp.merge allones, y, 1, foo) -> vp.udiv x, y, allones, foo` for an arbitrary `foo`, because we don't know that the lanes past `foo` won't be read. We can only guarantee that when `foo` is the EVL-based IV.
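A sketch of why an arbitrary `foo` is unsafe, again in a toy model with hypothetical names (`beforeFold`, `afterFold`, `firstDivergentLane`) and poison modelled as `std::nullopt`. Here `readLen` stands for the number of lanes consumers actually read, i.e. the EVL that drives the EVL-based IV. When `foo == readLen` the forms are indistinguishable to consumers; when `foo < readLen`, lanes in `[foo, readLen)` turn from defined values into poison while still being read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model (not LLVM code): std::nullopt stands in for a poison lane.
using Lane = std::optional<uint32_t>;

// Before: udiv x, (vp.merge allones, y, 1, foo). Lanes >= foo divide by 1.
std::vector<Lane> beforeFold(const std::vector<uint32_t> &x,
                             const std::vector<uint32_t> &y, std::size_t foo) {
  assert(x.size() == y.size() && foo <= x.size());
  std::vector<Lane> r(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    uint32_t divisor = i < foo ? y[i] : 1u;
    assert(divisor != 0 && "lanes < foo must have a non-zero divisor");
    r[i] = x[i] / divisor;
  }
  return r;
}

// After: vp.udiv x, y, allones, foo. Lanes >= foo are poison.
std::vector<Lane> afterFold(const std::vector<uint32_t> &x,
                            const std::vector<uint32_t> &y, std::size_t foo) {
  assert(x.size() == y.size() && foo <= x.size());
  std::vector<Lane> r(x.size());
  for (std::size_t i = 0; i < foo; ++i) {
    assert(y[i] != 0 && "lanes < foo must have a non-zero divisor");
    r[i] = x[i] / y[i];
  }
  return r;
}

// Returns the first lane below readLen where the two forms differ,
// or readLen if consumers reading [0, readLen) cannot tell them apart.
std::size_t firstDivergentLane(const std::vector<Lane> &a,
                               const std::vector<Lane> &b,
                               std::size_t readLen) {
  assert(readLen <= a.size() && readLen <= b.size());
  for (std::size_t i = 0; i < readLen; ++i)
    if (a[i] != b[i])
      return i;
  return readLen;
}
```

With x = {10, 20, 30, 40} and y = {2, 5, 3, 0}: folding with `foo == readLen == 3` shows no divergence, but folding with `foo == 2` while consumers still read 4 lanes makes lane 2 poison where it used to be 30.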


https://github.com/llvm/llvm-project/pull/148828