[llvm] [lsr][term-fold] Restrict expansion budget for profiled loops (PR #74747)

Thu Dec 14 07:22:29 PST 2023

nikic wrote:

> > Just to double check, do you see regressions if you just use the standard isHighCostExpansion() check? And that's why you are trying to use this more conservative profile count based heuristic instead?
> 
> Yes, if I use the standard SCEV threshold. I can of course use a larger single threshold if you'd rather, but then this patch is largely pointless as we can't generate anything worse than about 8 instructions from the non-constant * non-constant case no matter what we do.

I don't think this is true. BECounts can get quite complex. To give a very simple variation on one of your tests:

```llvm
define void @expensive_expand_unknown_tc(ptr %a, i32 %offset, i32 %n, i32 %step) mustprogress {
; CHECK-LABEL: @expensive_expand_unknown_tc(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    [[UGLYGEP:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 84
; CHECK-NEXT:    [[SMAX:%.*]] = call i32 @llvm.smax.i32(i32 [[STEP:%.*]], i32 [[N:%.*]])
; CHECK-NEXT:    [[TMP0:%.*]] = sub i32 [[SMAX]], [[STEP]]
; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 1)
; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-NEXT:    [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 [[STEP]], i32 1)
; CHECK-NEXT:    [[TMP2:%.*]] = udiv i32 [[TMP1]], [[UMAX]]
; CHECK-NEXT:    [[TMP3:%.*]] = add i32 [[UMIN]], [[TMP2]]
; CHECK-NEXT:    [[TMP4:%.*]] = zext i32 [[TMP3]] to i64
; CHECK-NEXT:    [[TMP5:%.*]] = add nuw nsw i64 [[TMP4]], 1
; CHECK-NEXT:    [[TMP6:%.*]] = sext i32 [[OFFSET:%.*]] to i64
; CHECK-NEXT:    [[TMP7:%.*]] = mul i64 [[TMP5]], [[TMP6]]
; CHECK-NEXT:    [[TMP8:%.*]] = add nsw i64 [[TMP7]], 84
; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr [[A]], i64 [[TMP8]]
; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
; CHECK:       for.body:
; CHECK-NEXT:    [[LSR_IV1:%.*]] = phi ptr [ [[UGLYGEP2:%.*]], [[FOR_BODY]] ], [ [[UGLYGEP]], [[ENTRY:%.*]] ]
; CHECK-NEXT:    store i32 1, ptr [[LSR_IV1]], align 4
; CHECK-NEXT:    [[UGLYGEP2]] = getelementptr i8, ptr [[LSR_IV1]], i32 [[OFFSET]]
; CHECK-NEXT:    [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND:%.*]] = icmp eq ptr [[UGLYGEP2]], [[SCEVGEP]]
; CHECK-NEXT:    br i1 [[LSR_FOLD_TERM_COND_REPLACED_TERM_COND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
; CHECK:       for.end:
; CHECK-NEXT:    ret void
;
entry:
  %uglygep = getelementptr i8, ptr %a, i64 84
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %lsr.iv1 = phi ptr [ %uglygep2, %for.body ], [ %uglygep, %entry ]
  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ 0, %entry ]
  store i32 1, ptr %lsr.iv1, align 4
  %lsr.iv.next = add nsw i32 %lsr.iv, %step
  %uglygep2 = getelementptr i8, ptr %lsr.iv1, i32 %offset
  %exitcond.not = icmp sge i32 %lsr.iv.next, %n
  br i1 %exitcond.not, label %for.end, label %for.body

for.end:                                          ; preds = %for.body
  ret void
}
```

This adds 14 instructions including 3 min/max operations which likely have higher cost.

https://github.com/llvm/llvm-project/pull/74747