[llvm] [AMDGPU] Add hazard workarounds to insertIndirectBranch (PR #109127)

Carl Ritson via llvm-commits llvm-commits at lists.llvm.org
Sat Sep 21 21:17:10 PDT 2024


================
@@ -42,6 +44,71 @@ define amdgpu_kernel void @uniform_conditional_max_short_forward_branch(ptr addr
 ; GCN-NEXT:    buffer_store_dword v0, off, s[4:7], 0
 ; GCN-NEXT:    s_waitcnt vmcnt(0)
 ; GCN-NEXT:    s_endpgm
+;
+; GFX11-LABEL: uniform_conditional_max_short_forward_branch:
+; GFX11:       ; %bb.0: ; %bb
+; GFX11-NEXT:    s_load_b32 s0, s[2:3], 0x2c
+; GFX11-NEXT:    s_waitcnt lgkmcnt(0)
+; GFX11-NEXT:    s_cmp_eq_u32 s0, 0
+; GFX11-NEXT:    s_cbranch_scc0 .LBB0_1
+; GFX11-NEXT:  ; %bb.3: ; %bb
+; GFX11-NEXT:    s_getpc_b64 s[4:5]
+; GFX11-NEXT:  .Lpost_getpc0:
+; GFX11-NEXT:    s_waitcnt_depctr 0xfffe
+; GFX11-NEXT:    s_add_u32 s4, s4, (.LBB0_2-.Lpost_getpc0)&4294967295
+; GFX11-NEXT:    s_waitcnt_depctr 0xfffe
----------------
perlfu wrote:

@jayfoad - The second `s_waitcnt_depctr` should be unnecessary on both GFX11 and GFX12.
I added it out of caution, but will remove it to bring this in line with the normal workaround sequence.
The third `s_waitcnt_depctr` is only needed on GFX12, but in my opinion it is simpler to have the same code for GFX11 and GFX12, as the performance impact in this case will be essentially zero.

@arsenm - This is about VALU accesses to the SGPRs before these instructions.
If a VALU has accessed the SGPRs used here, then each SALU write must be flushed before a subsequent SALU read of the same register.
Technically the flush is not required if the SGPR was not used by a VALU, or if some expiry condition was reached, but adding a backward scan to establish this is excessive compared to the low performance cost of always mitigating this sequence.
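
To make the rule above concrete, here is a minimal toy model in C++. It is only a sketch and not the code in this PR: `Kind`, `Inst`, `mitigate`, `countFlushes` and `longBranchLikeSequence` are hypothetical names, the `Flush` pseudo-instruction stands in for `s_waitcnt_depctr 0xfffe`, and the model assumes that one flush clears every pending SALU write. It ignores the expiry condition entirely. Only the first two instructions of the example sequence appear in the quoted diff; the last two are an assumed completion of a long-branch sequence.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy instruction model; this is NOT LLVM's MachineInstr.
enum class Kind { VALU, SALU, Flush };

struct Inst {
    Kind kind;
    std::vector<int> reads;   // SGPR indices read
    std::vector<int> writes;  // SGPR indices written
};

// Insert a Flush before any SALU that reads an SGPR with an unflushed SALU
// write. A write is tracked only if a VALU has accessed that SGPR earlier,
// unless alwaysMitigate is set, in which case every SALU write is tracked
// (the "no backward scan" approach).
std::vector<Inst> mitigate(const std::vector<Inst>& in, bool alwaysMitigate) {
    std::set<int> valuTouched;  // SGPRs some earlier VALU has accessed
    std::set<int> pending;      // SALU-written SGPRs not yet flushed
    std::vector<Inst> out;
    for (const Inst& inst : in) {
        if (inst.kind == Kind::VALU) {
            for (int r : inst.reads) valuTouched.insert(r);
        } else if (inst.kind == Kind::SALU) {
            bool needFlush = false;
            for (int r : inst.reads)
                if (pending.count(r)) needFlush = true;
            if (needFlush) {
                out.push_back(Inst{Kind::Flush, {}, {}});
                pending.clear();  // assumption: one flush covers all writes
            }
            for (int w : inst.writes)
                if (alwaysMitigate || valuTouched.count(w)) pending.insert(w);
        } else {
            pending.clear();  // a Flush already present in the input
        }
        out.push_back(inst);
    }
    return out;
}

int countFlushes(const std::vector<Inst>& seq) {
    int n = 0;
    for (const Inst& inst : seq)
        if (inst.kind == Kind::Flush) ++n;
    return n;
}

// Sequence shaped like the one under review, using SGPRs 4 and 5.
std::vector<Inst> longBranchLikeSequence() {
    return {
        {Kind::SALU, {}, {4, 5}},  // like s_getpc_b64 s[4:5]
        {Kind::SALU, {4}, {4}},    // like s_add_u32 s4, s4, ...
        {Kind::SALU, {5}, {5}},    // assumed: add-with-carry into s5
        {Kind::SALU, {4, 5}, {}},  // assumed: jump through s[4:5]
    };
}
```

For example, `mitigate(longBranchLikeSequence(), true)` models always mitigating, while passing `false` models the backward-scan variant that only flushes registers a VALU is known to have accessed. The trade-off is the one described above: the scan can drop flushes, but it adds complexity for very little gain on a sequence this short.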


https://github.com/llvm/llvm-project/pull/109127

