[llvm] [AMDGPU] Add test for GCNRegPressure tracker bug (PR #73786)
Valery Pykhtin via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 29 06:07:15 PST 2023
================
@@ -531,3 +531,126 @@ body: |
%1:vgpr_32 = V_MOV_B32_e32 %0, implicit $exec
S_NOP 0, implicit %1
...
+---
+name: movrel
+tracksRegLiveness: true
+body: |
+ ; RPU-LABEL: name: movrel
+ ; RPU: bb.0:
+ ; RPU-NEXT: Live-in:
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0 = COPY $sgpr1
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr2_sgpr3 = S_GETPC_B64
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr1 = COPY killed $sgpr3
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0_sgpr1_sgpr2_sgpr3 = S_LOAD_DWORDX4_IMM $sgpr0_sgpr1, 0, 0
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 $sgpr0 = S_BUFFER_LOAD_DWORD_IMM $sgpr0_sgpr1_sgpr2_sgpr3, 0, 0
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_CMP_GT_U32 $sgpr0, 15, implicit-def $scc
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_CBRANCH_SCC1 %bb.2, implicit $scc
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 0 S_BRANCH %bb.1
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: Live-out:
+ ; RPU-NEXT: bb.1:
+ ; RPU-NEXT: Live-in:
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 0
+ ; RPU-NEXT: 0 1 undef %0.sub5:vreg_512 = V_MOV_B32_e32 5, implicit $exec
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 $m0 = S_MOV_B32 killed $sgpr0
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 %0:vreg_512 = V_INDIRECT_REG_WRITE_MOVREL_B32_V16 %0:vreg_512(tied-def 0), 42, 3, implicit $m0, implicit $exec
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: Live-out: %0:0000000000000C00
+ ; RPU-NEXT: bb.2:
+ ; RPU-NEXT: Live-in: %0:0000000000000C00
+ ; RPU-NEXT: SGPR VGPR
+ ; RPU-NEXT: 0 1
+ ; RPU-NEXT: 0 1 %1:vgpr_32 = V_CVT_F32_UBYTE0_e64 %0.sub5:vreg_512, 0, 0, implicit $exec
----------------
vpykhtin wrote:
> Unless I misunderstood the suggestion I do not think this is related to `GCNRewritePartialRegUses`.
GCNRewritePartialRegUses is indeed irrelevant to this case. I was just thinking of a conservative way of accounting for register pressure in which we always count the whole register, but we can do better than that, as RPD does.
> The movrel instruction is kind of special because of the indirection. It can't just operate on %0.sub5.
Sorry, I don't really know how it works, but I believe
`%0:vreg_512 = V_INDIRECT_REG_WRITE_MOVREL_B32_V16 %0:vreg_512(tied-def 0)`
correctly models what it does, that is, it fully defines _%0:vreg_512_ on output, right?
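
To make the difference between whole-register and lane-mask accounting concrete, here is a rough Python sketch. It is not the tracker's code. It assumes two lane bits per 32-bit subregister, which is inferred only from the test output above, where `%0.sub5` shows up as lane mask `0C00`. All function names are hypothetical.

```python
# Assumption: each 32-bit subregister occupies 2 lane-mask bits,
# inferred from %0.sub5 being printed as 0x0C00 in the test output.
LANE_BITS_PER_DWORD = 2


def subreg_lane_mask(index: int) -> int:
    """Lane mask covering the 32-bit subregister sub<index>."""
    dword_bits = (1 << LANE_BITS_PER_DWORD) - 1
    return dword_bits << (index * LANE_BITS_PER_DWORD)


def vgprs_from_lane_mask(mask: int) -> int:
    """Count 32-bit registers that have at least one live lane bit."""
    dword_bits = (1 << LANE_BITS_PER_DWORD) - 1
    count = 0
    while mask:
        if mask & dword_bits:
            count += 1
        mask >>= LANE_BITS_PER_DWORD
    return count


def vgprs_whole_register(num_dwords: int) -> int:
    """Conservative accounting: every dword of the register is counted."""
    return num_dwords


if __name__ == "__main__":
    sub5 = subreg_lane_mask(5)
    print(hex(sub5))                    # lane mask of sub5
    print(vgprs_from_lane_mask(sub5))   # lane-precise count for live sub5
    print(vgprs_whole_register(16))     # conservative count for vreg_512
```

With only `sub5` live, the lane-precise count matches the `VGPR` column value of 1 in the test, whereas counting the whole `vreg_512` would give 16.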
https://github.com/llvm/llvm-project/pull/73786