[PATCH] D96805: [AMDGPU][CostModel] Refine cost model for control-flow instructions.

Tue Feb 16 15:50:11 PST 2021

rampitec added a comment.

In D96805#2566985 <https://reviews.llvm.org/D96805#2566985>, @dfukalov wrote:

> It seems to me this threshold bump is partially compensated by the cbr cost increase in all cases of unrolling loops with ifs, where that cost is multiplied by the trip count.
> The threshold was bumped because of test/CodeGen/AMDGPU/unroll.ll, where the following test started to fail:
>
>   ; CHECK-LABEL: @unroll_for_if
>   ; CHECK: entry:
>   ; CHECK-NEXT: getelementptr
>   ; CHECK-NEXT: store
>   ; CHECK-NEXT: getelementptr
>   ; CHECK-NEXT: store
>   ; CHECK-NOT: br
>   define amdgpu_kernel void @unroll_for_if(i32 addrspace(5)* %a) {
>   entry:
>     br label %for.body
>   for.body:                                         ; preds = %entry, %for.inc
>     %i1 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
>     %and = and i32 %i1, 1
>     %tobool = icmp eq i32 %and, 0
>     br i1 %tobool, label %for.inc, label %if.then
>   if.then:                                          ; preds = %for.body
>     %0 = sext i32 %i1 to i64
>     %arrayidx = getelementptr inbounds i32, i32 addrspace(5)* %a, i64 %0
>     store i32 0, i32 addrspace(5)* %arrayidx, align 4
>     br label %for.inc
>   for.inc:                                          ; preds = %for.body, %if.then
>     %inc = add nuw nsw i32 %i1, 1
>     %cmp = icmp ult i32 %inc, 48
>     br i1 %cmp, label %for.body, label %for.end
>   
>   for.end:                                          ; preds = %for.inc
>     ret void
>   }
>
> because the cbr code size cost increased (which required raising the threshold to 250) and phi became non-free (a further +50, to 300).
> Perhaps for now we should set the cbr code size estimate to 3 rather than 4 (2 exec mask manipulations) and collect more statistics.

Can you do some performance testing please?
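
As context for such measurements, here is a rough sketch of why a small change in per-branch cost matters for a loop like @unroll_for_if: the per-iteration cost is replicated by the trip count when the loop is fully unrolled. This is not the unroller's actual code. The function names, the `backedge_insns` default and the example body size of 10 are all assumptions for illustration, and the sketch ignores any simplification the unroller's analysis may do after unrolling.

```python
# Hypothetical helper names; a simplified model, not LLVM's implementation.

def unrolled_size(loop_size, trip_count, backedge_insns=2):
    """Rough size after full unrolling: the body is replicated
    trip_count times, the back-edge instructions are counted once.
    The backedge_insns default is an assumption."""
    return (loop_size - backedge_insns) * trip_count + backedge_insns


def branch_cost_growth(old_cbr_cost, new_cbr_cost, cbrs_per_iter, trip_count):
    """Extra estimated size caused by changing the cost of one
    conditional branch, scaled by branches per iteration and trip count."""
    return (new_cbr_cost - old_cbr_cost) * cbrs_per_iter * trip_count


if __name__ == "__main__":
    # The test loop has two conditional branches per iteration and
    # a trip count of 48; the body size of 10 is made up.
    print(unrolled_size(loop_size=10, trip_count=48))
    # Difference between estimating a cbr at 3 and at 4.
    print(branch_cost_growth(3, 4, cbrs_per_iter=2, trip_count=48))
```

Under this simplified model, each unit of cbr cost adds 96 to the estimate for the test loop, which is why the choice between 3 and 4 interacts so strongly with the threshold.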

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96805/new/

https://reviews.llvm.org/D96805