[llvm] [VPlan] Add transform to fold early-exit branches into loops (PR #148404)

Mon Jul 14 01:16:49 PDT 2025

================
@@ -7242,6 +7246,10 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
   // Regions are dissolved after optimizing for VF and UF, which completely
   // removes unneeded loop regions first.
   VPlanTransforms::dissolveLoopRegions(BestVPlan);
+
+  if (FoldEarlyExitBranchIntoLoop)
----------------
david-arm wrote:

@arcbbb If you look at the code actually generated for AArch64 CPUs when targeting SVE, you'll see that it does end up with this CFG after lowering. For example, take one of the existing tests, `llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll`, and run:

```
./bin/opt -p loop-vectorize,dce,instcombine -mcpu=neoverse-v1 -S < ../llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll | ./bin/llc
```

you'll see the assembly looks like this:

```
.LBB0_5:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        ld1b    { z0.b }, p0/z, [x14, x11]
        ld1b    { z1.b }, p0/z, [x13, x11]
        add     x15, x11, x8
        cmpne   p1.b, p0/z, z0.b, z1.b
        b.ne    .LBB0_7
        cmp     x12, x11
        mov     x11, x15
        b.ne    .LBB0_5
```

so it does end up in the form that you prefer. Have you tried achieving the same result when lowering?
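
For reference, here is a rough scalar sketch of the kind of loop that gives this shape. I'm assuming a byte-wise compare of two buffers with an early exit on the first mismatch, which is what the two `ld1b` loads and the `cmpne` suggest. The name `first_mismatch` and its signature are made up for illustration and are not taken from the test file.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from simple_early_exit.ll): returns the index of
// the first differing byte, or n if the first n bytes are equal.
std::size_t first_mismatch(const std::uint8_t *a, const std::uint8_t *b,
                           std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // Early exit: plays the role of the first b.ne (to .LBB0_7) above.
    if (a[i] != b[i])
      return i;
    // Falling through to the loop condition plays the role of the latch:
    // the cmp and the second b.ne (back to .LBB0_5).
  }
  return n;
}
```

The point is only that the early-exit test sits inside the loop body ahead of the latch compare, which is the same order as the two `b.ne` instructions in the assembly.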

https://github.com/llvm/llvm-project/pull/148404