[llvm] [AMDGPU] Add test for GCNRegPressure tracker bug (PR #73786)

Valery Pykhtin via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 29 04:15:33 PST 2023


================
@@ -531,3 +531,126 @@ body:             |
     %1:vgpr_32 = V_MOV_B32_e32 %0, implicit $exec
     S_NOP 0, implicit %1
 ...
+---
+name: movrel
+tracksRegLiveness: true
+body: |
+  ; RPU-LABEL: name: movrel
+  ; RPU: bb.0:
+  ; RPU-NEXT:   Live-in:
+  ; RPU-NEXT:   SGPR  VGPR
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      $sgpr0 = COPY $sgpr1
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      $sgpr2_sgpr3 = S_GETPC_B64
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      $sgpr1 = COPY killed $sgpr3
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM $sgpr0_sgpr1, 0, 0
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      $sgpr0 = S_BUFFER_LOAD_DWORD_IMM $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      S_CMP_GT_U32 $sgpr0, 15, implicit-def $scc
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      S_CBRANCH_SCC1 %bb.2, implicit $scc
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     0      S_BRANCH %bb.1
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   Live-out:
+  ; RPU-NEXT: bb.1:
+  ; RPU-NEXT:   Live-in:
+  ; RPU-NEXT:   SGPR  VGPR
+  ; RPU-NEXT:   0     0
+  ; RPU-NEXT:   0     1      undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+  ; RPU-NEXT:   0     1
+  ; RPU-NEXT:   0     1      $m0 = S_MOV_B32 killed $sgpr0
+  ; RPU-NEXT:   0     1
+  ; RPU-NEXT:   0     1      %0:vreg_512 = V_INDIRECT_REG_WRITE_MOVREL_B32_V16 %0:vreg_512(tied-def 0), 42, 3, implicit $m0, implicit $exec
+  ; RPU-NEXT:   0     1
+  ; RPU-NEXT:   Live-out: %0:0000000000000C00
+  ; RPU-NEXT: bb.2:
+  ; RPU-NEXT:   Live-in:  %0:0000000000000C00
+  ; RPU-NEXT:   SGPR  VGPR
+  ; RPU-NEXT:   0     1
+  ; RPU-NEXT:   0     1      %1:vgpr_32 = V_CVT_F32_UBYTE0_e64 %0.sub5:vreg_512, 0, 0, implicit $exec
----------------
vpykhtin wrote:

We need to agree on how we count such cases.

The problem is that %0 is fully defined by `V_INDIRECT_REG_WRITE_MOVREL_B32_V16`, but only _sub5_ of it is used. In general this means that regalloc needs to allocate the full vreg_512 anyway; the unused lanes could be allocated for other needs, though that is not the case here.

This makes tracking more complicated if we start to model what regalloc would do. A conservative approach would be to ignore lanes entirely once GCNRewritePartialRegUses is enabled, because after that pass we are guaranteed to have only fully defined or fully used registers.
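
To make the two counting options concrete, here is a minimal sketch in Python. It is not the GCNRegPressure implementation. The helper names are hypothetical, and it assumes that each 32-bit subregister occupies two lane-mask bits, which is consistent with _sub5_ appearing as `0000000000000C00` in the dump above.

```python
# Illustrative only: hypothetical helpers, not LLVM APIs.
# Assumption: each 32-bit subregister of a wide VGPR tuple takes two
# lane-mask bits, so sub5 maps to bits 10-11 (0xC00), as in the dump.
BITS_PER_DWORD = 2


def vgprs_by_live_lanes(lane_mask: int) -> int:
    """Count 32-bit VGPRs that have at least one live lane bit."""
    dword_bits = (1 << BITS_PER_DWORD) - 1
    count = 0
    while lane_mask:
        if lane_mask & dword_bits:
            count += 1
        lane_mask >>= BITS_PER_DWORD
    return count


def vgprs_by_full_register(reg_size_bits: int) -> int:
    """Count 32-bit VGPRs if the whole register is treated as live."""
    return reg_size_bits // 32


SUB5_MASK = 0x0000000000000C00  # live-out mask of %0 in bb.1

lane_based = vgprs_by_live_lanes(SUB5_MASK)    # only sub5 is counted
conservative = vgprs_by_full_register(512)     # whole vreg_512 is counted
```

Under these assumptions the lane-based count matches the single VGPR reported for bb.2, while the full-register count reflects what regalloc would have to reserve for the vreg_512.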


https://github.com/llvm/llvm-project/pull/73786

