[llvm] [LV] Support strided load with a stride of -1 (PR #128718)

Tue May 13 07:15:30 PDT 2025

================
@@ -1115,6 +1115,7 @@ class LoopVectorizationCostModel {
     CM_Widen_Reverse, // For consecutive accesses with stride -1.
     CM_Interleave,
     CM_GatherScatter,
+    CM_Strided,
----------------
Mel-Chen wrote:

Thanks everyone for the feedback and suggestions!

However, I ran into an issue at the very first step of trying to remove CM_Strided :(
After dropping the `CM_Strided` check from `collectLoopUniforms`, reverse accesses with EVL tail folding are no longer vectorized:
```
@@ -3412,9 +3412,9 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF)
     if (IsUniformMemOpUse(I))
       return true;

-    return (
-        WideningDecision == CM_Widen || WideningDecision == CM_Widen_Reverse ||
-        WideningDecision == CM_Strided || WideningDecision == CM_Interleave);
+    return (WideningDecision == CM_Widen ||
+            WideningDecision == CM_Widen_Reverse ||
+            WideningDecision == CM_Interleave);
   };

   // Returns true if Ptr is the pointer operand of a memory access instruction
```
The root cause seems to be that the GEP is no longer treated as uniform and is now widened, which blocks converting the widened induction variable into a derived induction variable.
* The VPlan **before** we remove the update in `collectLoopUniforms`:
```
<x1> vector loop: {
  vector.body:
    EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    ir<%add.phi> = WIDEN-INDUCTION  ir<%startval>, ir<-1>, vp<%0>
    ir<%i> = WIDEN-INDUCTION  ir<0>, ir<1>, vp<%0>
    EMIT vp<%7> = WIDEN-CANONICAL-INDUCTION vp<%6>
    EMIT vp<%8> = icmp ule vp<%7>, vp<%3>
    CLONE ir<%add> = add ir<%add.phi>, ir<-1>
    CLONE ir<%gepl> = getelementptr inbounds ir<%ptr>, ir<%add>
    vp<%9> = vector-pointer ir<%gepl>
    WIDEN ir<%tmp> = load vp<%9>, stride = ir<-4>, runtimeVF = vp<%0>
```
* The VPlan **after** we remove the update in `collectLoopUniforms`:
```
<x1> vector loop: {
  vector.body:
    EMIT vp<%6> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    ir<%add.phi> = WIDEN-INDUCTION  ir<%startval>, ir<-1>, vp<%0>
    ir<%i> = WIDEN-INDUCTION  ir<0>, ir<1>, vp<%0>
    EMIT vp<%7> = WIDEN-CANONICAL-INDUCTION vp<%6>
    EMIT vp<%8> = icmp ule vp<%7>, vp<%3>
    WIDEN ir<%add> = add ir<%add.phi>, ir<-1>
    WIDEN-GEP Inv[Var] ir<%gepl> = getelementptr inbounds ir<%ptr>, ir<%add>
    vp<%9> = vector-pointer ir<%gepl>
    WIDEN ir<%tmp> = load vp<%9>, stride = ir<-4>, runtimeVF = vp<%0>
```
This leads to the EVL VPlan bailing out due to the failure in converting the widened induction variable.
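
To make the difference between the two VPlans easier to see, here is a minimal standalone sketch of the classification, not the actual LLVM code. The enum mirrors the decisions named in the diff. `addressStaysScalar`, `addressRecipe`, and the `TreatStridedAsUniform` flag are hypothetical names for illustration, assuming the predicate's result decides whether the address computation is cloned or widened.

```cpp
#include <cassert>
#include <string>

// Widening decisions named in the diff above (illustrative subset).
enum InstWidening {
  CM_Widen,
  CM_Widen_Reverse,
  CM_Interleave,
  CM_GatherScatter,
  CM_Strided
};

// Hypothetical stand-in for the predicate changed in collectLoopUniforms.
// TreatStridedAsUniform == true models the PR's version of the check;
// false models the version with the CM_Strided check removed.
bool addressStaysScalar(InstWidening Decision, bool TreatStridedAsUniform) {
  if (Decision == CM_Strided)
    return TreatStridedAsUniform;
  return Decision == CM_Widen || Decision == CM_Widen_Reverse ||
         Decision == CM_Interleave;
}

// Assumed mapping from the predicate to the recipe kind printed in the
// VPlan dumps: a scalar address is cloned, otherwise the GEP is widened.
std::string addressRecipe(InstWidening Decision, bool TreatStridedAsUniform) {
  return addressStaysScalar(Decision, TreatStridedAsUniform) ? "CLONE"
                                                             : "WIDEN-GEP";
}
```

Under these assumptions, `addressRecipe(CM_Strided, true)` corresponds to the `CLONE` recipes in the first dump and `addressRecipe(CM_Strided, false)` to the `WIDEN-GEP` recipe in the second.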

@fhahn Do we currently have a VPlan-based scalarization transformation? If not, do we need to add one that runs before `legalizeAndOptimizeInductions`?

https://github.com/llvm/llvm-project/pull/128718