[llvm] [AMDGPU] Insert waitcnt for non-global fence release in GFX12 (PR #159282)

Sameer Sahasrabuddhe via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 17 22:27:06 PDT 2025


================
@@ -810,14 +827,18 @@ define amdgpu_kernel void @agent_seq_cst_fence() {
 ;
 ; GFX12-WGP-LABEL: agent_seq_cst_fence:
 ; GFX12-WGP:       ; %bb.0: ; %entry
+; GFX12-WGP-NEXT:    s_wait_dscnt 0x0
 ; GFX12-WGP-NEXT:    s_endpgm
 ;
 ; GFX12-CU-LABEL: agent_seq_cst_fence:
 ; GFX12-CU:       ; %bb.0: ; %entry
+; GFX12-CU-NEXT:    s_wait_dscnt 0x0
 ; GFX12-CU-NEXT:    s_endpgm
 ;
 ; GFX1250-LABEL: agent_seq_cst_fence:
 ; GFX1250:       ; %bb.0: ; %entry
+; GFX1250-NEXT:    global_wb scope:SCOPE_DEV
----------------
ssahasra wrote:

More importantly, this change caused a global write-back for a fence that targets local address space only. That is not what we intended. The whole point is to order only local accesses.

https://github.com/llvm/llvm-project/pull/159282
