[llvm] [AMDGPU][GlobalISel] Add register bank legalization for G_SMIN/G_SMAX/G_UMIN/G_UMAX (PR #159821)
Syadus Sefat via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 30 11:07:17 PDT 2025
================
@@ -2159,7 +2159,7 @@ define i16 @test_vector_reduce_smax_v8i16(<8 x i16> %v) {
; GFX10-GISEL-NEXT: v_pk_max_i16 v0, v0, v2
; GFX10-GISEL-NEXT: v_pk_max_i16 v1, v1, v3
; GFX10-GISEL-NEXT: v_pk_max_i16 v0, v0, v1
-; GFX10-GISEL-NEXT: v_lshrrev_b32_e32 v1, 16, v0
+; GFX10-GISEL-NEXT: v_alignbit_b32 v1, s4, v0, 16
----------------
mssefat wrote:
The regression is coming from G_BUILD_VECTOR, when one of the operands is G_IMPLICIT_DEF.
While legalizing, the applyMappingTrivial function converts all source operands to match the destination register bank.
So if we have:
```
%19:sgpr(s16) = G_IMPLICIT_DEF
%10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)
```
We get:
```
%19:sgpr(s16) = G_IMPLICIT_DEF
%28:vgpr(s16) = COPY %19:sgpr(s16)
%10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %28:vgpr(s16)
```
InstructionSelect for G_BUILD_VECTOR:
```
Erasing: %10:vgpr_32(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %28:vgpr_32(s16)
Created:
%10:vgpr_32(<2 x s16>) = V_ALIGNBIT_B32_opsel_e64 0, %28:vgpr_32(s16), 0, %24:vgpr_32(s32), 0, 16, 0, 0, implicit $exec
```
If we skip converting sgpr to vgpr, we have:
` %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)`
InstructionSelect for G_BUILD_VECTOR:
```
Erasing: %10:vgpr_32(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)
Created:
%10:vgpr_32(<2 x s16>) = COPY %16:vgpr(s16)
```
When the new-reg-bank-select flag is disabled, we get similar instruction selection:
For:
` %10:vgpr(<2 x s16>) = G_BUILD_VECTOR %16:vgpr(s16), %19:sgpr(s16)`
InstructionSelect for G_BUILD_VECTOR:
`→ Generates COPY %16:vgpr(s16)`
So, I modified applyMappingTrivial to skip the conversion from sgpr to vgpr when one of the operands is G_IMPLICIT_DEF.
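To make the effect of the skip concrete, here is a toy model of the operand mapping in plain C++. It is only a sketch and not the code in the PR: `Bank`, `Operand`, `Result`, `mapSourcesToBank`, and the `SkipImplicitDef` switch are hypothetical names made up for illustration, and no LLVM API is used.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
enum class Bank { SGPR, VGPR };

struct Operand {
  std::string Name;          // e.g. "%19"
  Bank RegBank;              // bank currently assigned to the register
  bool DefinedByImplicitDef; // true if the def is G_IMPLICIT_DEF
};

struct Result {
  std::vector<Operand> Sources; // source operands after mapping
  int CopiesInserted;           // number of COPYs the mapping created
};

// Models the behavior described above: every source whose bank differs from
// the destination bank is replaced by a copy in the destination bank.
// With SkipImplicitDef set, an operand defined by G_IMPLICIT_DEF is left
// untouched, so a later pass can still see that it is undefined.
Result mapSourcesToBank(const std::vector<Operand> &Srcs, Bank Dst,
                        bool SkipImplicitDef) {
  Result R{{}, 0};
  for (const Operand &Op : Srcs) {
    if (Op.RegBank == Dst || (SkipImplicitDef && Op.DefinedByImplicitDef)) {
      R.Sources.push_back(Op);
      continue;
    }
    // The copy hides the original def: the user now reads a COPY result,
    // not the G_IMPLICIT_DEF itself.
    R.Sources.push_back({Op.Name + ".copy", Dst, false});
    ++R.CopiesInserted;
  }
  return R;
}
```

In this model, mapping the operands `%16:vgpr` and `%19:sgpr` (implicit def) to the vgpr bank inserts one copy without the skip and none with it, which mirrors the two G_BUILD_VECTOR forms shown above.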
https://github.com/llvm/llvm-project/pull/159821