[llvm] [VPlan] Extract reverse operation for reverse accesses (PR #146525)

Mon Dec 1 05:42:24 PST 2025

================
@@ -2866,28 +2867,42 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
         TypeInfo.inferScalarType(MaxEVL), DebugLoc::getUnknown());
 
     Builder.setInsertPoint(Header, Header->getFirstNonPhi());
-    VPValue *PrevEVL = Builder.createScalarPhi(
-        {MaxEVL, &EVL}, DebugLoc::getUnknown(), "prev.evl");
-
-    for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
-             vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
-      for (VPRecipeBase &R : *VPBB) {
-        VPValue *V1, *V2;
-        if (!match(&R,
-                   m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
-                       m_VPValue(V1), m_VPValue(V2))))
-          continue;
+    PrevEVL = Builder.createScalarPhi({MaxEVL, &EVL}, DebugLoc::getUnknown(),
+                                      "prev.evl");
+  }
+
+  // Transform the recipes must be converted to vector predication intrinsics
+  // even if they do not use header mask.
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
+    for (VPRecipeBase &R : *VPBB) {
+      VPWidenIntrinsicRecipe *NewRecipe = nullptr;
+      VPValue *V1, *V2;
+      if (match(&R, m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
+                        m_VPValue(V1), m_VPValue(V2)))) {
         VPValue *Imm = Plan.getOrAddLiveIn(
             ConstantInt::getSigned(Type::getInt32Ty(Plan.getContext()), -1));
-        VPWidenIntrinsicRecipe *VPSplice = new VPWidenIntrinsicRecipe(
+        NewRecipe = new VPWidenIntrinsicRecipe(
             Intrinsic::experimental_vp_splice,
             {V1, V2, Imm, Plan.getTrue(), PrevEVL, &EVL},
             TypeInfo.inferScalarType(R.getVPSingleValue()), {}, {},
             R.getDebugLoc());
-        VPSplice->insertBefore(&R);
-        R.getVPSingleValue()->replaceAllUsesWith(VPSplice);
-        ToErase.push_back(&R);
       }
+
+      // TODO: Only convert reverse to vp.reverse if it uses the result of
+      // vp.load, or defines the stored value of vp.store.
----------------
lukel97 wrote:

> I think that's kind-of already the case, right? Until all recipes are converted, the intermediate VPlan may be partially incorrect.

Only the FOR (first-order recurrence) recipes need to be converted; everything else in optimizeMaskToEVL is just an optimisation that replaces the header mask with VP intrinsics. We can skip optimizeMaskToEVL and the VPlan will still be correct, because the masked recipes still have the same semantics.
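
To make that concrete, here is a toy model in plain C++ (not LLVM code). It assumes the header mask for one vector iteration is simply "lane index < EVL"; the helpers `headerMask`, `maskedLoad` and `evlLoad` are hypothetical names made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model only: a lane either holds a loaded value or is inactive.
using Lanes = std::vector<std::optional<int>>;

// Assumed shape of the header mask: lane i is active iff i < evl.
std::vector<bool> headerMask(std::size_t vf, std::size_t evl) {
  assert(evl <= vf && "EVL cannot exceed the vectorization factor");
  std::vector<bool> mask(vf);
  for (std::size_t i = 0; i < vf; ++i)
    mask[i] = i < evl;
  return mask;
}

// Masked load: reads mem[i] for every lane whose mask bit is set.
Lanes maskedLoad(const std::vector<int> &mem, const std::vector<bool> &mask) {
  assert(mask.size() <= mem.size() && "not enough memory for all lanes");
  Lanes out(mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      out[i] = mem[i];
  return out;
}

// EVL-style load: reads mem[i] for the first evl lanes, no mask operand.
Lanes evlLoad(const std::vector<int> &mem, std::size_t vf, std::size_t evl) {
  assert(evl <= vf && vf <= mem.size());
  Lanes out(vf);
  for (std::size_t i = 0; i < evl; ++i)
    out[i] = mem[i];
  return out;
}
```

With `vf = 4` and `evl = 3`, `maskedLoad(mem, headerMask(4, 3))` and `evlLoad(mem, 4, 3)` activate the same three lanes. Under that assumption, leaving the masked form in place costs performance but not correctness.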

The fact that we're intertwining the transformation to a variably-stepping IV with optimisations makes the EVL transform harder to reason about. This is what https://github.com/llvm/llvm-project/pull/166164 aims to fix by moving optimizeMaskToEVL out.

> I would like to avoid correctness to depend on the exact position of the reverse.

I'm with you here, but extracting the reverse from VPWidenLoadRecipe doesn't actually require us to "fix up" anything in the EVL plan for correctness, which is why I find this part of the code confusing.

It's just that the existing optimization becomes incorrect because the semantics of VPWidenLoadRecipe have changed, so it's the optimization in optimizeMaskToEVL that we need to adjust. I'll see if I can share a branch that shows what I mean better.
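
In the meantime, a toy illustration in plain C++ (again not LLVM code) of why a rewrite that only turns the load into an EVL load stops being valid once the reverse lives outside the load recipe. It assumes the EVL load fills the first `evl` lanes and that an EVL-aware reverse permutes only those lanes; `fullReverse` and `evlReverse` are hypothetical helpers, and `0` stands in for a lane whose contents are not meaningful.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Full-width reverse: permutes every lane, including inactive ones.
std::vector<int> fullReverse(std::vector<int> v) {
  std::reverse(v.begin(), v.end());
  return v;
}

// EVL-aware reverse (toy model): permutes only the first evl lanes and
// leaves the trailing lanes where they are.
std::vector<int> evlReverse(std::vector<int> v, std::size_t evl) {
  assert(evl <= v.size() && "EVL cannot exceed the vector length");
  std::reverse(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(evl));
  return v;
}
```

For a 4-lane vector `{1, 2, 3, 0}` with `evl = 3`, `evlReverse` gives `{3, 2, 1, 0}` and keeps the active data in the first three lanes, whereas `fullReverse` gives `{0, 3, 2, 1}` and shifts the inactive lane to the front. In this model, converting the load without also converting the now-separate reverse would change which lanes hold valid data.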

https://github.com/llvm/llvm-project/pull/146525