[llvm] [AMDGPU] Update cost model gfx950 min/max tests. NFC. (PR #139310)
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Fri May 9 12:56:33 PDT 2025
================
@@ -35,6 +36,15 @@ define void @maximum_f16() {
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 176 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
; SLOWF64-NEXT: Cost Model: Found an estimated cost of 10 for instruction: ret void
;
+; GFX950-SIZE-LABEL: 'maximum_f16'
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.maximum.v3f16(<3 x half> undef, <3 x half> undef)
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)
+; GFX950-SIZE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
----------------
rampitec wrote:
That is because fmaximum.f16 is custom and fmaximum.v2f16 is legal. Generic code multiplies the cost by 2 for custom operations. And that is not so wrong... Our tests:
```
define <2 x half> @v_maximum_v2f16(<2 x half> %src0, <2 x half> %src1) {
; GFX950-LABEL: v_maximum_v2f16:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_pk_maximum3_f16 v0, v0, v1, v1
; GFX950-NEXT: s_setpc_b64 s[30:31]
%op = call <2 x half> @llvm.maximum.v2f16(<2 x half> %src0, <2 x half> %src1)
ret <2 x half> %op
}
define half @v_maximum_f16(half %src0, half %src1) {
; GFX950-LABEL: v_maximum_f16:
; GFX950: ; %bb.0:
; GFX950-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT: v_max_f16_e32 v2, v0, v1
; GFX950-NEXT: v_mov_b32_e32 v3, 0x7e00
; GFX950-NEXT: v_cmp_o_f16_e32 vcc, v0, v1
; GFX950-NEXT: s_nop 1
; GFX950-NEXT: v_cndmask_b32_e32 v0, v3, v2, vcc
; GFX950-NEXT: s_setpc_b64 s[30:31]
%op = call half @llvm.maximum.f16(half %src0, half %src1)
ret half %op
}
```
I'd say the cost should be 4 here, not even 2.
The v8 and v16 cases are wrong of course; they should be 4 and 8 respectively, but we do not handle that anywhere ourselves.
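A rough sketch of that arithmetic, for illustration only. This is not LLVM's cost-model code: the function name and the action table are hypothetical, and the only inputs are the facts stated above (f16 is custom, v2f16 is legal, custom operations are costed at 2x, wider vectors split into v2f16 pieces).

```python
import math

# Assumed legalization actions for llvm.maximum on gfx950, as described in
# the comment above. The table itself is illustrative, not taken from LLVM.
ACTION_BY_NUM_ELTS = {1: "custom", 2: "legal"}


def expected_maximum_f16_cost(num_elts: int) -> int:
    """Cost one would expect for llvm.maximum on <num_elts x half>."""
    if num_elts == 1:
        # Scalar f16 is custom: generic code doubles the unit cost.
        unit = 2 if ACTION_BY_NUM_ELTS[1] == "custom" else 1
        return unit
    # Vectors are assumed to split into legal v2f16 pieces, one packed
    # operation each; an odd element count rounds up to a full piece.
    return math.ceil(num_elts / 2)


for n in (1, 2, 3, 4, 8, 16):
    print(n, expected_maximum_f16_cost(n))
```

Under these assumptions the f16 through v4f16 values agree with the GFX950-SIZE check lines (2, 1, 2, 2), while v8f16 and v16f16 come out as 4 and 8 instead of the 2 the test currently reports. The scalar case still uses the generic factor of 2 rather than the 4 suggested by the emitted instruction sequence.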
https://github.com/llvm/llvm-project/pull/139310