[PATCH] D96805: [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 16 15:50:11 PST 2021
rampitec added a comment.
In D96805#2566985 <https://reviews.llvm.org/D96805#2566985>, @dfukalov wrote:
> It seems to me this threshold bump is partially offset by the cbr cost increase in every case of unrolling a loop with ifs, because there the cost is multiplied by the trip count.
> This threshold was bumped because of test/CodeGen/AMDGPU/unroll.ll, where the following test started to fail:
>
> ; CHECK-LABEL: @unroll_for_if
> ; CHECK: entry:
> ; CHECK-NEXT: getelementptr
> ; CHECK-NEXT: store
> ; CHECK-NEXT: getelementptr
> ; CHECK-NEXT: store
> ; CHECK-NOT: br
> define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {
> entry:
>   br label %for.body
>
> for.body: ; preds = %entry, %for.inc
>   %i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
>   %and = and i32 %i1, 1
>   %tobool = icmp eq i32 %and, 0
>   br i1 %tobool, label %for.inc, label %if.then
>
> if.then: ; preds = %for.body
>   %0 = sext i32 %i1 to i64
>   %arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0
>   store i32 0, i32 addrspace(5)* %arrayidx, align 4
>   br label %for.inc
>
> for.inc: ; preds = %for.body, %if.then
>   %inc = add nuw nsw i32 %i1, 1
>   %cmp = icmp ult i32 %inc, 48
>   br i1 %cmp, label %for.body, label %for.end
>
> for.end: ; preds = %for.inc
>   ret void
> }
>
> This happened because the cbr code size cost increased (requiring the threshold to be raised to 250), and the phi also became non-free (another +50, to 300).
> Perhaps for now we should set the cbr code size estimate to 3 rather than 4 (2 exec mask manipulations) and collect more statistics.
Can you do some performance testing please?
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D96805