[llvm] [VPlan] Extract reverse operation for reverse accesses (PR #146525)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 1 02:54:38 PST 2025
================
@@ -2866,28 +2867,42 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
TypeInfo.inferScalarType(MaxEVL), DebugLoc::getUnknown());
Builder.setInsertPoint(Header, Header->getFirstNonPhi());
- VPValue *PrevEVL = Builder.createScalarPhi(
- {MaxEVL, &EVL}, DebugLoc::getUnknown(), "prev.evl");
-
- for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
- vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
- for (VPRecipeBase &R : *VPBB) {
- VPValue *V1, *V2;
- if (!match(&R,
- m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
- m_VPValue(V1), m_VPValue(V2))))
- continue;
+ PrevEVL = Builder.createScalarPhi({MaxEVL, &EVL}, DebugLoc::getUnknown(),
+ "prev.evl");
+ }
+
// Transform recipes that must be converted to vector predication
// intrinsics even if they do not use the header mask.
+ for (VPBasicBlock *VPBB : VPBlockUtils::blocksOnly<VPBasicBlock>(
+ vp_depth_first_deep(Plan.getVectorLoopRegion()->getEntry()))) {
+ for (VPRecipeBase &R : *VPBB) {
+ VPWidenIntrinsicRecipe *NewRecipe = nullptr;
+ VPValue *V1, *V2;
+ if (match(&R, m_VPInstruction<VPInstruction::FirstOrderRecurrenceSplice>(
+ m_VPValue(V1), m_VPValue(V2)))) {
VPValue *Imm = Plan.getOrAddLiveIn(
ConstantInt::getSigned(Type::getInt32Ty(Plan.getContext()), -1));
- VPWidenIntrinsicRecipe *VPSplice = new VPWidenIntrinsicRecipe(
+ NewRecipe = new VPWidenIntrinsicRecipe(
Intrinsic::experimental_vp_splice,
{V1, V2, Imm, Plan.getTrue(), PrevEVL, &EVL},
TypeInfo.inferScalarType(R.getVPSingleValue()), {}, {},
R.getDebugLoc());
- VPSplice->insertBefore(&R);
- R.getVPSingleValue()->replaceAllUsesWith(VPSplice);
- ToErase.push_back(&R);
}
+
+ // TODO: Only convert reverse to vp.reverse if it uses the result of
+ // vp.load, or defines the stored value of vp.store.
----------------
lukel97 wrote:
Unconditionally replacing all the reverses with vp.reverses here means that optimizeMaskToEVL is no longer correct:
```c++
if (match(&CurRecipe,
m_MaskedLoad(m_VPValue(EndPtr), m_RemoveMask(HeaderMask, Mask))) &&
match(EndPtr, m_VecEndPtr(m_VPValue(Addr), m_Specific(&Plan->getVF()))) &&
cast<VPWidenLoadRecipe>(CurRecipe).isReverse())
return new VPWidenLoadEVLRecipe(cast<VPWidenLoadRecipe>(CurRecipe),
AdjustEndPtr(EndPtr), EVL, Mask);
```
Previously if we had a masked load like
```
headerMask: 11110000
VPWidenLoadRecipe, reverse=true: xxxxabcd
```
We would now generate
```
VPWidenLoadEVLRecipe, EVL=4, reverse=true: abcdxxxx
```
I.e. the elements aren't shifted into the tail lanes. The test diff in this PR happens to be fine because we then unconditionally replace all reverses with vp.reverse.
I'd really like to avoid unconditionally replacing recipes, as it makes the EVL transformation error-prone and hard to follow, and this would undo what #155394 fixed. We shouldn't need to transform any recipes aside from those that use the IV step.
Can we do the reverse -> vp.reverse transform in optimizeMaskToEVLRecipes alongside the load/store transforms instead?
The issues you mentioned in https://github.com/llvm/llvm-project/pull/146525#discussion_r2502220707 can be fixed when we go to implement the permutation elimination by rewriting the transform in terms of slides instead of reverses.
https://github.com/llvm/llvm-project/pull/146525