[llvm] [VPlan] Add transform to fold early-exit branches into loops (PR #148404)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 14 01:16:49 PDT 2025
================
@@ -7242,6 +7246,10 @@ DenseMap<const SCEV *, Value *> LoopVectorizationPlanner::executePlan(
// Regions are dissolved after optimizing for VF and UF, which completely
// removes unneeded loop regions first.
VPlanTransforms::dissolveLoopRegions(BestVPlan);
+
+ if (FoldEarlyExitBranchIntoLoop)
----------------
david-arm wrote:
@arcbbb If you look at the code actually generated for AArch64 CPUs when targeting SVE, you'll see that it does end up with this CFG after lowering. For example, take one of the existing tests, llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll, and run the command:
```
./bin/opt -p loop-vectorize,dce,instcombine -mcpu=neoverse-v1 -S < ../llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll | ./bin/llc
```
you'll see the assembly looks like this:
```
.LBB0_5:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
        ld1b    { z0.b }, p0/z, [x14, x11]
        ld1b    { z1.b }, p0/z, [x13, x11]
        add     x15, x11, x8
        cmpne   p1.b, p0/z, z0.b, z1.b
        b.ne    .LBB0_7
        cmp     x12, x11
        mov     x11, x15
        b.ne    .LBB0_5
```
so it does end up in the form that you prefer. Have you tried achieving the same result when lowering?
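
For readers following along, here is a rough sketch of the kind of scalar loop this assembly appears to correspond to: a byte-wise comparison that leaves the loop at the first mismatch. The function name `firstMismatch` and its signature are made up for illustration and are not taken from simple_early_exit.ll, where the actual test functions may differ (for example in trip count or pointer attributes).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical example, not copied from the LLVM test file.
// Returns the index of the first byte where a and b differ,
// or n if the first n bytes of both buffers are equal.
std::size_t firstMismatch(const std::uint8_t *a, const std::uint8_t *b,
                          std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    // Early exit: in the vector loop above this is presumably the
    // cmpne followed by the first b.ne (to .LBB0_7).
    if (a[i] != b[i])
      return i;
  }
  // Normal (latch) exit: presumably the cmp followed by the
  // second b.ne, which branches back to .LBB0_5 while iterations remain.
  return n;
}
```

In this sketch the early-exit test and the latch test are both branches within a single loop body, which is the CFG shape under discussion.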
https://github.com/llvm/llvm-project/pull/148404