[llvm] [AMDGPU] Lazily emit waitcnts on function entry (PR #73122)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 6 06:55:36 PST 2023
================
@@ -6,10 +6,12 @@ declare hidden ptr addrspace(1) @ext(ptr addrspace(1))
define ptr addrspace(1) @call_assert_align() {
; CHECK-LABEL: call_assert_align:
; CHECK: ; %bb.0: ; %entry
-; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT: s_waitcnt lgkmcnt(0)
; CHECK-NEXT: s_mov_b32 s16, s33
; CHECK-NEXT: s_mov_b32 s33, s32
+; CHECK-NEXT: s_waitcnt expcnt(0)
----------------
jayfoad wrote:
Quoting from `SIInsertWaitcnts`:
```
// Export & GDS instructions do not read the EXEC mask until after the export
// is granted (which can occur well after the instruction is issued).
// The shader program must flush all EXP operations on the export-count
// before overwriting the EXEC mask.
```
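To illustrate how that rule produces the split seen in the hunk above, here is a toy model in Python. It is only a sketch and not how `SIInsertWaitcnts` is implemented: the function `insert_lazy_waits`, the per-instruction "needs" sets, and the placeholder for the EXEC-writing instruction are all assumptions made for the example. Only the counter names and the two `s_mov_b32` lines come from the test.

```python
# Toy model of lazy waitcnt insertion. Hypothetical helper, not LLVM code.
COUNTERS = ("vmcnt", "expcnt", "lgkmcnt")


def insert_lazy_waits(instrs):
    """instrs is a list of (text, needs) pairs, where needs is the set of
    counters assumed to require draining before that instruction runs."""
    # On function entry nothing is known, so treat every counter as pending.
    pending = set(COUNTERS)
    out = []
    for text, needs in instrs:
        # Wait only for counters this instruction needs and that may still
        # be outstanding; counters already drained are skipped.
        to_wait = [c for c in COUNTERS if c in needs and c in pending]
        if to_wait:
            out.append("s_waitcnt " + " ".join(f"{c}(0)" for c in to_wait))
            pending -= set(to_wait)
        out.append(text)
    return out


# The needs sets below are assumptions chosen to mirror the hunk; the last
# entry is a placeholder, since the hunk does not show that instruction.
body = [
    ("s_mov_b32 s16, s33", {"lgkmcnt"}),
    ("s_mov_b32 s33, s32", {"lgkmcnt"}),
    ("<instruction that overwrites EXEC>", {"expcnt"}),
]

result = insert_lazy_waits(body)
for line in result:
    print(line)
```

In this model the `expcnt(0)` wait moves down to just before the EXEC write, and no `vmcnt(0)` wait is emitted because nothing in the sequence asks for one.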
https://github.com/llvm/llvm-project/pull/73122