[PATCH] D96805: [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Daniil Fukalov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 16 15:38:04 PST 2021
dfukalov added a comment.
It seems to me that this threshold bump is partially compensated by the cbr cost increase in all cases of unrolled loops with ifs, where the cbr cost is multiplied by the trip count.
The threshold was bumped because of test/CodeGen/AMDGPU/unroll.ll, where the following test started to fail:
; CHECK-LABEL: @unroll_for_if
; CHECK: entry:
; CHECK-NEXT: getelementptr
; CHECK-NEXT: store
; CHECK-NEXT: getelementptr
; CHECK-NEXT: store
; CHECK-NOT: br
define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {
entry:
br label %for.body
for.body: ; preds = %entry, %for.inc
%i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
%and = and i32 %i1, 1
%tobool = icmp eq i32 %and, 0
br i1 %tobool, label %for.inc, label %if.then
if.then: ; preds = %for.body
%0 = sext i32 %i1 to i64
%arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0
store i32 0, i32 addrspace(5)* %arrayidx, align 4
br label %for.inc
for.inc: ; preds = %for.body, %if.then
%inc = add nuw nsw i32 %i1, 1
%cmp = icmp ult i32 %inc, 48
br i1 %cmp, label %for.body, label %for.end
for.end: ; preds = %for.cond
ret void
}
since the cbr code size cost increased (which required raising the threshold to 250) and phi became non-free (a further +50, to 300).
Perhaps for now we should set the cbr code size estimate to 3 rather than 4 (2 exec mask manipulations) and collect more statistics.
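To make the trip-count scaling concrete, here is a rough sketch of how a per-cbr cost adds up across the unrolled iterations of @unroll_for_if. It is a naive linear model, not the unroller's actual analysis (which simulates the loop and may fold branches), so the absolute numbers should not be compared with the threshold values above. The function name and the model itself are hypothetical; only the trip count (48), the two conditional branches per iteration, and the candidate costs 4 and 3 come from the test and the comment.

```python
# Hypothetical, simplified size model; NOT LLVM's actual unroll cost analysis.
TRIP_COUNT = 48  # from the test: %inc counts up from 0 while %inc < 48

# Conditional branches per iteration in @unroll_for_if:
# one in for.body (the "if") and one in for.inc (the loop back-edge).
CBR_PER_ITERATION = 2


def cbr_contribution(cbr_cost, trip_count=TRIP_COUNT,
                     cbr_per_iteration=CBR_PER_ITERATION):
    """Naive total size attributed to conditional branches after unrolling."""
    return cbr_cost * cbr_per_iteration * trip_count


# Compare the two candidate cbr code size estimates discussed above.
for cost in (4, 3):
    print(f"cbr cost {cost}: {cbr_contribution(cost)} units "
          f"over {TRIP_COUNT} iterations")

# Each unit of cbr cost is amplified by (branches per iteration * trip count).
saving = cbr_contribution(4) - cbr_contribution(3)
print(f"difference between cost 4 and cost 3: {saving} units")
```

Under this model, one unit of cbr cost changes the estimate by the number of conditional branches per iteration times the trip count, which is why a small per-instruction adjustment can decide whether a loop like this still fits under the threshold.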
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96805/new/
https://reviews.llvm.org/D96805