[llvm] [VPlan] Don't use the legacy cost model for loop conditions (PR #156864)
John Brawn via llvm-commits
llvm-commits at lists.llvm.org
Thu Sep 11 05:33:23 PDT 2025
================
@@ -174,16 +173,34 @@ attributes #0 = { "target-cpu"="knl" }
define void @PR40816() #1 {
; CHECK-LABEL: define void @PR40816(
; CHECK-SAME: ) #[[ATTR1:[0-9]+]] {
-; CHECK-NEXT: [[ENTRY:.*]]:
-; CHECK-NEXT: br label %[[FOR_BODY:.*]]
-; CHECK: [[FOR_BODY]]:
-; CHECK-NEXT: [[TMP0:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[INC:%.*]], %[[FOR_BODY]] ]
-; CHECK-NEXT: store i32 [[TMP0]], ptr @b, align 1
-; CHECK-NEXT: [[CMP2:%.*]] = icmp eq i32 [[TMP0]], 2
-; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[TMP0]], 1
-; CHECK-NEXT: br i1 [[CMP2]], label %[[RETURN:.*]], label %[[FOR_BODY]]
-; CHECK: [[RETURN]]:
-; CHECK-NEXT: ret void
+; CHECK-NEXT: [[ENTRY:.*:]]
----------------
john-brawn-arm wrote:
It looks like what's going on here is:
- Currently the load from arrayidx is considered part of computing the loop exit condition, so its cost is calculated in LoopVectorizationPlanner::precomputeCosts. It gets a very high cost due to useEmulatedMaskMemRefHack, so we don't vectorize.
- In the VPlan, a transformation has figured out that the loop has a constant trip count (because the load is from a constant array), so the load has been removed.
- With this patch, that means we no longer count the cost of the load, as it no longer exists, and the resulting cost says that vectorization is profitable.
If I manually transform the function into what it is after the VPlan transformation, it looks like this:
```llvm
define void @PR40816_adj() #1 {
entry:
br label %for.body
for.body:
%0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
store i32 %0, ptr @b, align 1
%inc = add nuw nsw i32 %0, 1
%cmp = icmp uge i32 %inc, 7
br i1 %cmp, label %return, label %for.body
return:
ret void
}
```
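At the C level, the transformed IR above corresponds to something like the following sketch (the function and global names mirror the IR; this is an illustration of the loop shape, not the original test source):

```c
/* Hedged C-level sketch of the IR above: a counted loop that stores the
 * counter to a global each iteration and exits once the incremented
 * counter reaches 7 (i.e. 7 iterations, storing 0..6). */
int b;

void PR40816_adj(void) {
  int i = 0;
  for (;;) {
    b = i;            /* store i32 %0, ptr @b */
    int inc = i + 1;  /* %inc = add nuw nsw i32 %0, 1 */
    if (inc >= 7)     /* %cmp = icmp uge i32 %inc, 7 */
      return;
    i = inc;
  }
}
```

With the load gone, all that remains is a short fixed-trip-count loop of scalar stores, which is exactly the kind of loop the cost model currently treats as profitable to vectorize.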
and this currently gets vectorized. This is in fact very similar to the test low_trip_count_fold_tail_scalarized_store in llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll.
I think this ultimately comes down to this FIXME in LoopVectorizationCostModel::setCostBasedWideningDecision:
```cpp
// Load: Scalar load + broadcast
// Store: Scalar store + isLoopInvariantStoreValue ? 0 : extract
// FIXME: This cost is a significant under-estimate for tail folded
// memory ops.
const InstructionCost ScalarizationCost =
IsLegalToScalarize() ? getUniformMemOpCost(&I, VF)
: InstructionCost::getInvalid();
```
https://github.com/llvm/llvm-project/pull/156864