[llvm] [AMDGPU] Update cost model gfx950 min/max tests. NFC. (PR #139310)

Fri May 9 12:56:33 PDT 2025

================
@@ -35,6 +36,15 @@ define void @maximum_f16() {
 ; SLOWF64-NEXT:  Cost Model: Found an estimated cost of 176 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
 ; SLOWF64-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: ret void
 ;
+; GFX950-SIZE-LABEL: 'maximum_f16'
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %f16 = call half @llvm.maximum.f16(half undef, half undef)
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %v2f16 = call <2 x half> @llvm.maximum.v2f16(<2 x half> undef, <2 x half> undef)
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v3f16 = call <3 x half> @llvm.maximum.v3f16(<3 x half> undef, <3 x half> undef)
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v4f16 = call <4 x half> @llvm.maximum.v4f16(<4 x half> undef, <4 x half> undef)
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v8f16 = call <8 x half> @llvm.maximum.v8f16(<8 x half> undef, <8 x half> undef)
+; GFX950-SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %v16f16 = call <16 x half> @llvm.maximum.v16f16(<16 x half> undef, <16 x half> undef)
----------------
rampitec wrote:

That is because fmaximum.f16 is custom and fmaximum.v2f16 is legal. Generic code multiplies the cost by 2 for custom operations. And that is not so wrong... Our tests:

```
define <2 x half> @v_maximum_v2f16(<2 x half> %src0, <2 x half> %src1) {
; GFX950-LABEL: v_maximum_v2f16:
; GFX950:       ; %bb.0:
; GFX950-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT:    v_pk_maximum3_f16 v0, v0, v1, v1
; GFX950-NEXT:    s_setpc_b64 s[30:31]

  %op = call <2 x half> @llvm.maximum.v2f16(<2 x half> %src0, <2 x half> %src1)
  ret <2 x half> %op
}

define half @v_maximum_f16(half %src0, half %src1) {
; GFX950-LABEL: v_maximum_f16:
; GFX950:       ; %bb.0:
; GFX950-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX950-NEXT:    v_max_f16_e32 v2, v0, v1
; GFX950-NEXT:    v_mov_b32_e32 v3, 0x7e00
; GFX950-NEXT:    v_cmp_o_f16_e32 vcc, v0, v1
; GFX950-NEXT:    s_nop 1
; GFX950-NEXT:    v_cndmask_b32_e32 v0, v3, v2, vcc
; GFX950-NEXT:    s_setpc_b64 s[30:31]
  %op = call half @llvm.maximum.f16(half %src0, half %src1)
  ret half %op
}
```
I'd say the cost of the scalar case should be 4, not even 2.

The v8 and v16 cases are wrong of course: they should be 4 and 8 respectively, but we do not handle that anywhere ourselves.
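
To make the arithmetic above concrete, here is a rough sketch of the numbers I would expect. It is a toy model, not the actual cost-model code: the function name `toy_maximum_f16_cost` and the constant `PACKED_WIDTH` are made up for illustration, and it assumes only what is stated above (a legal op costs 1 per legalized piece, a custom op is charged double, and v2f16 is the widest legal packed type).

```python
import math

# Hypothetical constant: assumes v2f16 is the widest legal packed f16 type.
PACKED_WIDTH = 2


def toy_maximum_f16_cost(num_elems: int) -> int:
    """Toy estimate for llvm.maximum on <num_elems x half>.

    Not LLVM's implementation. Assumes a legal operation costs 1 per
    legalized piece and a custom-lowered operation is charged 2.
    """
    if num_elems == 1:
        # Scalar f16 is custom-lowered, so the generic factor of 2 applies.
        return 2
    # Wider vectors are assumed to split into legal v2f16 pieces, cost 1 each.
    pieces = math.ceil(num_elems / PACKED_WIDTH)
    return pieces


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8, 16):
        print(f"{n:>2} x half: {toy_maximum_f16_cost(n)}")
```

Under those assumptions this gives 2, 1, 2, 2 for f16 through v4f16, which matches the check lines, and 4 and 8 for v8f16 and v16f16, which is where the current output of 2 diverges.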

https://github.com/llvm/llvm-project/pull/139310