[llvm] [AMDGPU] Add test for GCNRegPressure tracker bug (PR #73786)
Valery Pykhtin via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 29 04:15:33 PST 2023
================
@@ -531,3 +531,126 @@ body: |
%1:vgpr_32 = V_MOV_B32_e32 %0, implicit $exec
S_NOP 0, implicit %1
...
+---
+name: movrel
+tracksRegLiveness: true
+body: |
+ ; RPU-LABEL: name: movrel
+ ; RPU: bb.0:
+ ; RPU-NEXT: Live-in:
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0 = COPY $sgpr1
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr2_sgpr3 = S_GETPC_B64
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr1 = COPY killed $sgpr3
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM $sgpr0_sgpr1, 0, 0
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0 = S_BUFFER_LOAD_DWORD_IMM $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_CMP_GT_U32 $sgpr0, 15, implicit-def $scc
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_CBRANCH_SCC1 %bb.2, implicit $scc
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_BRANCH %bb.1
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: Live-out:
+ ; RPU-NEXT: bb.1:
+ ; RPU-NEXT: Live-in:
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 1 undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 $m0 = S_MOV_B32 killed $sgpr0
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 %0:vreg_512 = V_INDIRECT_REG_WRITE_MOVREL_B32_V16 %0:vreg_512(tied-def 0), 42, 3, implicit $m0, implicit $exec
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: Live-out: %0:0000000000000C00
+ ; RPU-NEXT: bb.2:
+ ; RPU-NEXT: Live-in: %0:0000000000000C00
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 %1:vgpr_32 = V_CVT_F32_UBYTE0_e64 %0.sub5:vreg_512, 0, 0, implicit $exec
----------------
vpykhtin wrote:
We need to agree on how to count such cases.
The problem is that %0 is fully defined by `V_INDIRECT_REG_WRITE_MOVREL_B32_V16`, but only _sub5_ of it is used. In general, this means that regalloc still needs to allocate the full vreg_512, although the unused lanes could be allocated for other purposes (which is not the case here).
This makes tracking more complicated if we start modeling what regalloc would do. A conservative approach could be to ignore lanes entirely once GCNRewritePartialRegUses is enabled, because after that pass we are guaranteed to have only fully defined or fully used registers.
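To make the two counting policies concrete, here is a minimal sketch. It is not LLVM's GCNRegPressure API, and the function names are hypothetical. It assumes the AMDGPU convention of 2 lane-mask bits per 32-bit subregister, under which the live-out mask `0000000000000C00` in the check lines covers only sub5. A lane-aware tracker charges 1 VGPR for that mask, which matches the `0 1` columns above. A conservative policy charges the whole vreg_512 (16 VGPRs).

```cpp
#include <cassert>
#include <cstdint>

// Assumption: AMDGPU lane masks use 2 bits per 32-bit subregister,
// so sub5 of a vreg_512 maps to bits 10-11 (mask 0xC00).
constexpr unsigned LaneBitsPerDword = 2;
constexpr uint64_t DwordLaneMask = (uint64_t{1} << LaneBitsPerDword) - 1;

// Hypothetical lane-aware count: one VGPR per 32-bit subregister that has
// any live lane bit set in the mask.
unsigned laneBasedVGPRs(uint64_t LiveLaneMask) {
  unsigned Count = 0;
  for (unsigned D = 0; D < 64 / LaneBitsPerDword; ++D)
    if ((LiveLaneMask >> (D * LaneBitsPerDword)) & DwordLaneMask)
      ++Count;
  return Count;
}

// Hypothetical conservative count: if any lane is live, charge the whole
// register class, as regalloc must reserve the full tuple anyway.
unsigned conservativeVGPRs(uint64_t LiveLaneMask, unsigned RegSizeInDwords) {
  return LiveLaneMask != 0 ? RegSizeInDwords : 0;
}
```

For the `movrel` test, `laneBasedVGPRs(0xC00)` yields 1, while `conservativeVGPRs(0xC00, 16)` yields 16 for a vreg_512. That difference is exactly the counting discrepancy that needs to be agreed on.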
https://github.com/llvm/llvm-project/pull/73786