[llvm] [VPlan] Convert EVL loops to variable-length stepping after dissolution (PR #147222)

Mon Jul 21 03:08:00 PDT 2025

https://github.com/lukel97 approved this pull request.

LGTM with the nits I posted earlier + Florian's comments. I tested this on TSVC too: most loops are unchanged, but there are a couple of places where the loop actually improves, e.g. in Reductions-dbl:

```diff
+       sub     s1, s11, a6
+       sh2add  a4, a6, s2
+       slli    a0, a6, 10
 .LBB14_7:                               # %vector.body
                                         #   Parent Loop BB14_3 Depth=1
                                         #     Parent Loop BB14_5 Depth=2
                                         # =>    This Inner Loop Header: Depth=3
-       add     a0, a5, a2
+       sub     a3, s1, a2
        add     a1, s0, a2
-       add     s1, a3, s6
-       flw     fa5, 0(a6)
-       sub     a0, s8, a0
-       sh2add  a1, a1, s3
-       vsetvli a0, a0, e32, m2, ta, ma
-       add     s1, s1, a1
-       vle32.v v8, (s1)
-       vle32.v v10, (a1)
-       sub     a4, a4, s2
-       vfnmsac.vf      v10, fa5, v8
-       vse32.v v10, (a1)
-       add     a2, a2, a0
-       bnez    a4, .LBB14_7
+       add     a5, a0, s6
+       flw     fa5, 0(a4)
+       vsetvli a3, a3, e32, m2, ta, ma
+       sh2add  a1, a1, s2
+       add     a5, a5, a1
+       vle32.v v8, (a1)
+       vle32.v v10, (a5)
+       vfnmsac.vf      v8, fa5, v10
+       add     a2, a2, a3
+       vse32.v v8, (a1)
+       bne     a2, s1, .LBB14_7
        j       .LBB14_4
```
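
For readers skimming the diff: the new loop drops the separate count-down register (`sub a4, a4, s2` / `bnez a4`) and exits directly on the element counter (`bne a2, s1`). A rough C++ model of the two loop shapes is below. This is only a sketch, not the VPlan code: `getVectorLength`, `fixedStep` and `variableStep` are hypothetical names, and the `min(avl, vf)` behaviour is an assumption (a real `vsetvli` is allowed to return other values in some cases).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vsetvli: number of elements handled this
// iteration. Assumes the simple min(AVL, VF) behaviour.
static std::size_t getVectorLength(std::size_t avl, std::size_t vf) {
  return std::min(avl, vf);
}

// Fixed stepping: a canonical IV advances by VF up to a rounded-up trip
// count, while a second EVL-based IV tracks the elements actually processed.
static std::size_t fixedStep(std::vector<float> &a, const std::vector<float> &b,
                             float s, std::size_t vf) {
  assert(vf > 0 && a.size() == b.size());
  const std::size_t n = a.size();
  const std::size_t vecTripCount = (n + vf - 1) / vf * vf; // n rounded up to VF
  std::size_t evlIV = 0, iters = 0;
  for (std::size_t canonIV = 0; canonIV != vecTripCount; canonIV += vf) {
    const std::size_t evl = getVectorLength(n - evlIV, vf);
    for (std::size_t i = 0; i < evl; ++i)
      a[evlIV + i] -= s * b[evlIV + i]; // same arithmetic as vfnmsac.vf
    evlIV += evl;
    ++iters;
  }
  return iters;
}

// Variable-length stepping: a single IV advances by EVL and the loop exits
// when it reaches the original trip count.
static std::size_t variableStep(std::vector<float> &a,
                                const std::vector<float> &b, float s,
                                std::size_t vf) {
  assert(vf > 0 && a.size() == b.size());
  const std::size_t n = a.size();
  std::size_t iters = 0;
  for (std::size_t iv = 0; iv != n;) {
    const std::size_t evl = getVectorLength(n - iv, vf);
    for (std::size_t i = 0; i < evl; ++i)
      a[iv + i] -= s * b[iv + i];
    iv += evl;
    ++iters;
  }
  return iters;
}
```

Under these assumptions both functions do the same work in the same number of iterations. The second just needs one fewer value kept live across the loop, which is roughly the saving the Reductions-dbl diff shows.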

https://github.com/llvm/llvm-project/pull/147222