[llvm] [AMDGPU][GlobalISel] Add register bank legalization for G_SMIN/G_SMAX/G_UMIN/G_UMAX (PR #159821)

Tue Sep 30 11:07:17 PDT 2025

================
@@ -2159,7 +2159,7 @@ define i16 @test_vector_reduce_smax_v8i16(<8 x i16> %v) {
 ; GFX10-GISEL-NEXT:    v_pk_max_i16 v0, v0, v2
 ; GFX10-GISEL-NEXT:    v_pk_max_i16 v1, v1, v3
 ; GFX10-GISEL-NEXT:    v_pk_max_i16 v0, v0, v1
-; GFX10-GISEL-NEXT:    v_lshrrev_b32_e32 v1, 16, v0
+; GFX10-GISEL-NEXT:    v_alignbit_b32 v1, s4, v0, 16
----------------
mssefat wrote:

The regression comes from G_BUILD_VECTOR when one of its operands is defined by G_IMPLICIT_DEF.
During legalization, the applyMappingTrivial function converts all source operands to match the destination register bank.

So if we have:
```
  %19:sgpr(s16) = G_IMPLICIT_DEF 
  %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)
```

We get:
```
  %19:sgpr(s16) = G_IMPLICIT_DEF 
  %28:vgpr(s16) = COPY %19:sgpr(s16) 
  %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %28:vgpr(s16)
```

InstructionSelect for G_BUILD_VECTOR:
```
Erasing:   %10:vgpr_32(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %28:vgpr_32(s16) 
Created: 
  %10:vgpr_32(<2 x s16>) = V_ALIGNBIT_B32_opsel_e64 0, %28:vgpr_32(s16), 0, %24:vgpr_32(s32), 0, 16, 0, 0, implicit $exec 
```

If we skip converting sgpr to vgpr, we have:
`  %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)`

InstructionSelect for G_BUILD_VECTOR:
```
Erasing:   %10:vgpr_32(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16) 
Created: 
  %10:vgpr_32(<2 x s16>) = COPY %16:vgpr(s16) 
```


When the new-reg-bank-select flag is disabled, we get a similar instruction selection.
For:
`  %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)`

InstructionSelect for G_BUILD_VECTOR:
  `→ Generates COPY %16:vgpr(s16)`

So I modified applyMappingTrivial to skip the sgpr-to-vgpr conversion when one of the operands is G_IMPLICIT_DEF.
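
To illustrate the idea, here is a minimal standalone sketch of the decision, not the actual patch. It assumes the skip applies only to the operand defined by G_IMPLICIT_DEF. `Bank`, `Operand`, and `operandsNeedingCopy` are hypothetical toy names, not LLVM APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for register banks and operands; these are NOT LLVM classes.
enum class Bank { SGPR, VGPR };

struct Operand {
  std::string Name;          // e.g. "%19"
  Bank RB;                   // register bank currently assigned to the operand
  bool DefinedByImplicitDef; // true if the defining instruction is G_IMPLICIT_DEF
};

// Returns the names of source operands that would get a COPY into DstBank.
// Assumption: an operand defined by G_IMPLICIT_DEF is left in its original
// bank, since its value is undefined and a COPY carries no information.
std::vector<std::string> operandsNeedingCopy(Bank DstBank,
                                             const std::vector<Operand> &Srcs) {
  std::vector<std::string> NeedCopy;
  for (const Operand &Op : Srcs) {
    if (Op.RB == DstBank)
      continue; // already in the destination bank
    if (Op.DefinedByImplicitDef)
      continue; // skip the sgpr-to-vgpr conversion for undef operands
    NeedCopy.push_back(Op.Name);
  }
  return NeedCopy;
}
```

With the operands from the example above (%16 in vgpr, %19 an sgpr G_IMPLICIT_DEF), this model requests no COPY, so G_BUILD_VECTOR keeps %19:sgpr as its second source.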

https://github.com/llvm/llvm-project/pull/159821