[llvm] [AMDGPU] Insert waitcnt for non-global fence release in GFX12 (PR #159282)
Sameer Sahasrabuddhe via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 17 22:27:06 PDT 2025
================
@@ -810,14 +827,18 @@ define amdgpu_kernel void @agent_seq_cst_fence() {
;
; GFX12-WGP-LABEL: agent_seq_cst_fence:
; GFX12-WGP: ; %bb.0: ; %entry
+; GFX12-WGP-NEXT: s_wait_dscnt 0x0
; GFX12-WGP-NEXT: s_endpgm
;
; GFX12-CU-LABEL: agent_seq_cst_fence:
; GFX12-CU: ; %bb.0: ; %entry
+; GFX12-CU-NEXT: s_wait_dscnt 0x0
; GFX12-CU-NEXT: s_endpgm
;
; GFX1250-LABEL: agent_seq_cst_fence:
; GFX1250: ; %bb.0: ; %entry
+; GFX1250-NEXT: global_wb scope:SCOPE_DEV
----------------
ssahasra wrote:
More importantly, this change caused a global write-back for a fence that targets local address space only. That is not what we intended. The whole point is to order only local accesses.
https://github.com/llvm/llvm-project/pull/159282