[llvm] [VPlan] Extract reverse operation for reverse accesses (PR #146525)

Mon Dec 1 09:02:30 PST 2025

================
@@ -2866,28 +2867,42 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
         TypeInfo.inferScalarType(MaxEVL), DebugLoc::getUnknown());
 
     Builder.setInsertPoint(Header, Header->getFirstNonPhi());
-    VPValue *PrevEVL = Builder.createScalarPhi(
-        {MaxEVL, &EVL}, DebugLoc::getUnknown(), "prev.evl");
-
-    for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
-             vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
-      for (VPRecipeBase &R : *VPBB) {
-        VPValue *V1, *V2;
-        if (!match(&R,
-                   m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
-                       m_VPValue(V1), m_VPValue(V2))))
-          continue;
+    PrevEVL = Builder.createScalarPhi({MaxEVL, &EVL}, DebugLoc::getUnknown(),
+                                      "prev.evl");
+  }
+
+  // Convert the recipes that must become vector predication intrinsics
+  // even if they do not use the header mask.
+  for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+           vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
+    for (VPRecipeBase &R : *VPBB) {
+      VPWidenIntrinsicRecipe *NewRecipe = nullptr;
+      VPValue *V1, *V2;
+      if (match(&R, m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
+                        m_VPValue(V1), m_VPValue(V2)))) {
         VPValue *Imm = Plan.getOrAddLiveIn(
             ConstantInt::getSigned(Type::getInt32Ty(Plan.getContext()), -1));
-        VPWidenIntrinsicRecipe *VPSplice = new VPWidenIntrinsicRecipe(
+        NewRecipe = new VPWidenIntrinsicRecipe(
             Intrinsic::experimental_vp_splice,
             {V1, V2, Imm, Plan.getTrue(), PrevEVL, &EVL},
             TypeInfo.inferScalarType(R.getVPSingleValue()), {}, {},
             R.getDebugLoc());
-        VPSplice->insertBefore(&R);
-        R.getVPSingleValue()->replaceAllUsesWith(VPSplice);
-        ToErase.push_back(&R);
       }
+
+      // TODO: Only convert reverse to vp.reverse if it uses the result of
+      // vp.load, or defines the stored value of vp.store.
----------------
lukel97 wrote:

> We could convert reverse accesses into Splice(VPWidenLoadEVLRecipe(VecEndPtr(ptr, evl)), poison, -evl) inside optimizeMaskToEVLRecipes, and rely on the regular reverse rather than vp.reverse.

Yup, this is what I had in mind: https://github.com/llvm/llvm-project/commit/32504676f616a98d3282ef2601550e6ed3e25714

This approach is safer and easier to reason about, since the semantics of the VPlan never change.
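A minimal C++ model of why the splice form computes the same lanes as `vp.reverse`. This is a sketch under stated assumptions. It uses a fixed VF of 8 and models poison lanes as `std::nullopt`. It reads the quoted suggestion as splicing the *full-width* reverse of an EVL load taken from the end pointer `last - evl + 1`. All helper names (`loadEVL`, `vpReverse`, `fullReverse`, `vectorSplice`, `reverseViaSpliceMatches`) are hypothetical and are not VPlan or IR APIs. `vectorSplice` also takes its offset at run time, whereas `llvm.vector.splice` currently requires a constant immediate.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical fixed VF for the model; std::nullopt stands for a poison lane.
constexpr std::size_t VF = 8;
using Lane = std::optional<int>;
using Vec = std::array<Lane, VF>;

// EVL load: lanes [0, evl) read mem[start + i]; the remaining lanes are poison.
Vec loadEVL(const int *mem, std::ptrdiff_t start, std::size_t evl) {
  assert(evl <= VF);
  Vec v{};
  for (std::size_t i = 0; i < evl; ++i)
    v[i] = mem[start + static_cast<std::ptrdiff_t>(i)];
  return v;
}

// Model of vp.reverse: reverses only the first evl lanes; lanes >= evl are poison.
Vec vpReverse(const Vec &v, std::size_t evl) {
  assert(evl <= VF);
  Vec r{};
  for (std::size_t i = 0; i < evl; ++i)
    r[i] = v[evl - 1 - i];
  return r;
}

// Model of the regular reverse (llvm.vector.reverse) over all VF lanes.
Vec fullReverse(Vec v) {
  std::reverse(v.begin(), v.end());
  return v;
}

// Model of llvm.vector.splice(a, b, imm) for -VF <= imm < VF: VF lanes of the
// concatenation a ++ b, starting at imm (imm >= 0) or at VF + imm (imm < 0).
Vec vectorSplice(const Vec &a, const Vec &b, std::ptrdiff_t imm) {
  const auto n = static_cast<std::ptrdiff_t>(VF);
  assert(imm >= -n && imm < n);
  const std::ptrdiff_t start = imm >= 0 ? imm : n + imm;
  Vec r{};
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    const std::ptrdiff_t idx = start + i;
    r[static_cast<std::size_t>(i)] =
        idx < n ? a[static_cast<std::size_t>(idx)]
                : b[static_cast<std::size_t>(idx - n)];
  }
  return r;
}

// Reverse access whose highest address is mem[last], with evl active lanes.
// Both lowerings load evl lanes starting at the end pointer last - evl + 1.
bool reverseViaSpliceMatches(const int *mem, std::ptrdiff_t last,
                             std::size_t evl) {
  const auto e = static_cast<std::ptrdiff_t>(evl);
  const Vec loaded = loadEVL(mem, last - e + 1, evl);
  const Vec viaVPReverse = vpReverse(loaded, evl);
  // Splice with poison as the second operand, offset -evl.
  const Vec viaSplice = vectorSplice(fullReverse(loaded), Vec{}, -e);
  return viaVPReverse == viaSplice;
}
```

In this model the two results agree lane for lane for every `evl` in `[0, VF]`. The full reverse moves the `evl` valid lanes to the top of the vector, and the `-evl` splice shifts them back down to lanes `[0, evl)`. The reverse itself never needs to know about EVL.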

> The only concern is that, if we introduce simplification rules that can eliminate the reverse, there will be a temporary performance regression because the reverse access might not be lowered into VPWidenLoadEVLRecipe/VPWidenStoreEVLRecipe. However, correctness should not be affected.

I can work on generalising the transform to be in terms of splices rather than reverses, to avoid the regression when the reverses are eliminated. I don't think that should block this PR; I'm happy to iterate on this in tree.

Btw, I've posted an [RFC to relax the requirements on the splice intrinsic](https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974/3); I'll try to push that through.
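The reason the splice offset needs to be a non-constant value shows up in how `evl` varies across iterations. The sketch below assumes, for illustration only, that each iteration processes `min(VF, remaining)` elements. The EVL a target actually produces need not follow this rule. `spliceOffsetsPerIteration` is a hypothetical helper that lists the `-evl` offset each iteration's splice would need.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the splice offset (-evl) each vector iteration would
// need, assuming for illustration that evl = min(vf, remaining).
std::vector<long> spliceOffsetsPerIteration(std::size_t tripCount,
                                            std::size_t vf) {
  assert(vf > 0 && "vf must be non-zero");
  std::vector<long> offsets;
  std::size_t remaining = tripCount;
  while (remaining > 0) {
    const std::size_t evl = std::min(vf, remaining);
    offsets.push_back(-static_cast<long>(evl)); // offset for Splice(..., -evl)
    remaining -= evl;
  }
  return offsets;
}
```

With a trip count of 19 and VF = 8, the offsets are -8, -8, -3. The last one depends on the trip count, so in general it is only known at run time and cannot be written as a constant immediate.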

https://github.com/llvm/llvm-project/pull/146525