[llvm] [AMDGPU] Lazily emit waitcnts on function entry (PR #73122)

Jay Foad via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 6 06:55:36 PST 2023


================
@@ -6,10 +6,12 @@ declare hidden ptr addrspace(1) @ext(ptr addrspace(1))
 define ptr addrspace(1) @call_assert_align() {
 ; CHECK-LABEL: call_assert_align:
 ; CHECK:       ; %bb.0: ; %entry
-; CHECK-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT:    s_mov_b32 s16, s33
 ; CHECK-NEXT:    s_mov_b32 s33, s32
+; CHECK-NEXT:    s_waitcnt expcnt(0)
----------------
jayfoad wrote:

Quoting from `SIInsertWaitcnts`:
```
  // Export & GDS instructions do not read the EXEC mask until after the export
  // is granted (which can occur well after the instruction is issued).
  // The shader program must flush all EXP operations on the export-count
  // before overwriting the EXEC mask.
```
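The rule in the quoted comment can be modeled as a small pass over an instruction list. This is a minimal sketch under stated assumptions, not LLVM's `SIInsertWaitcnts` implementation. `Inst`, `insertExpWaits` and the `exportsPendingOnEntry` flag are hypothetical, and a single boolean stands in for the real export counter. A callee cannot know which exports its caller left outstanding, so the entry state is passed in and a conservative caller would set it to `true`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified instruction model for illustration only;
// this is not LLVM's MachineInstr or SIInsertWaitcnts data structures.
struct Inst {
  std::string text;  // assembly text, for display
  bool isExport;     // an EXP or GDS instruction, as in the quoted comment
  bool writesExec;   // overwrites the EXEC mask
};

// Returns the instruction stream with "s_waitcnt expcnt(0)" inserted
// immediately before any EXEC write while exports may still be outstanding.
// Assumption: a boolean "may be pending" flag replaces the real counter.
std::vector<std::string> insertExpWaits(const std::vector<Inst>& prog,
                                        bool exportsPendingOnEntry) {
  std::vector<std::string> out;
  bool pending = exportsPendingOnEntry;
  for (const Inst& inst : prog) {
    if (inst.writesExec && pending) {
      // Flush all exports before EXEC is overwritten.
      out.push_back("s_waitcnt expcnt(0)");
      pending = false;
    }
    out.push_back(inst.text);
    if (inst.isExport) {
      pending = true;  // a new export is now in flight
    }
  }
  return out;
}
```

With exports assumed pending on entry, a hypothetical stream of `s_mov_b32 s16, s33`, `s_mov_b32 s33, s32`, `s_mov_b64 exec, -1` comes back with `s_waitcnt expcnt(0)` placed just before the EXEC write, so the two scalar moves are not delayed by the wait.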

https://github.com/llvm/llvm-project/pull/73122

