[llvm] [lsr][term-fold] Restrict expansion budget for profiled loops (PR #74747)
Nikita Popov via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 14 07:22:29 PST 2023
nikic wrote:
> > Just to double check, do you see regressions if you just use the standard isHighCostExpansion() check? And that's why you are trying to use this more conservative profile count based heuristic instead?
>
> Yes, if I use the standard SCEV threshold. I can of course use a larger single threshold if you'd rather, but then this patch is largely pointless as we can't generate anything worse than about 8 instructions from the non-constant * non-constant case no matter what we do.
I don't think this is true. BECounts can get quite complex. To give a very simple variation on one of your tests:
```llvm
define void @expensive_expand_unknown_tc(ptr %a, i32 %offset, i32 %n, i32 %step) mustprogress {
; CHECK-LABEL: @expensive_expand_unknown_tc(
; CHECK-NEXT: entry:
; CHECK-NEXT: [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
; CHECK-NEXT: [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[STEP:%.*]], i32 [[N:%.*]])
; CHECK-NEXT: [[TMP0:%.*]] = sub i32 [[SMAX]], [[STEP]]
; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 1)
; CHECK-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-NEXT: [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[STEP]], i32 1)
; CHECK-NEXT: [[TMP2:%.*]] = udiv i32 [[TMP1]], [[UMAX]]
; CHECK-NEXT: [[TMP3:%.*]] = add i32 [[UMIN]], [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
; CHECK-NEXT: [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1
; CHECK-NEXT: [[TMP6:%.*]] = sext i32 [[OFFSET:%.*]] to i64
; CHECK-NEXT: [[TMP7:%.*]] = mul i64 [[TMP5]], [[TMP6]]
; CHECK-NEXT: [[TMP8:%.*]] = add nsw i64 [[TMP7]], 84
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]]
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
; CHECK-NEXT: [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
; CHECK-NEXT: store i32 1, ptr [[LSR_IV1]], align 4
; CHECK-NEXT: [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET]]
; CHECK-NEXT: [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
; CHECK-NEXT: br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
; CHECK: for.end:
; CHECK-NEXT: ret void
;
entry:
%uglygep = getelementptr i8, ptr %a, i64 84
br label %for.body
for.body: ; preds = %for.body, %entry
%lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
%lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
store i32 1, ptr %lsr.iv1, align 4
%lsr.iv.next = add nsw i32 %lsr.iv, %step
%uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset
%exitcond.not = icmp sge i32 %lsr.iv.next, %n
br i1 %exitcond.not, label %for.end, label %for.body
for.end: ; preds = %for.body
ret void
}
```
This adds 14 instructions including 3 min/max operations which likely have higher cost.
https://github.com/llvm/llvm-project/pull/74747
More information about the llvm-commits
mailing list