[llvm] [AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (PR #161637)

Pierre van Houtryve via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 6 07:27:06 PDT 2025


Pierre-vh wrote:

> > Before it was inserting those SAMPLE/BVHcnt and they were eliminated by InsertWaitCnt
> 
> So a more interesting test case would be when those waits were not eliminated, because they were preceded by some SAMPLE/BVH instructions.

There was no such case, because the extra waits only affected the acq_rel/seq_cst waits inserted _after_ the atomic.
So we'd have this, for example:

```
; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
; GFX12-CU-NEXT:    s_wait_storecnt 0x0
; GFX12-CU-NEXT:    s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT:    s_wait_kmcnt 0x0
; GFX12-CU-NEXT:    global_load_b32 v1, v0, s[2:3] scope:SCOPE_DEV
; GFX12-CU-NEXT:    s_wait_bvhcnt 0x0
; GFX12-CU-NEXT:    s_wait_samplecnt 0x0
; GFX12-CU-NEXT:    s_wait_loadcnt 0x0
; GFX12-CU-NEXT:    global_inv scope:SCOPE_DEV
```

The second `s_wait_bvhcnt`/`s_wait_samplecnt` pair was always eliminated above -O0 because it was redundant. We need those waits for the release sequence, but we were accidentally adding them again for the acquire sequence.
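
To illustrate why the second pair is redundant, here is a rough toy model in Python. It is only a sketch, not how the real wait-count insertion pass works: the `BUMPS` table, the `TRACKED` set and `drop_redundant_waits` are made up for illustration, and instructions are reduced to bare mnemonics. The idea is that a wait for a counter to reach zero can be dropped if nothing has incremented that counter since the previous such wait.

```python
# Toy model only: hypothetical names, not LLVM code.
# Assumed mapping from a mnemonic to the counter it increments.
BUMPS = {
    "global_load_b32": "loadcnt",
    "image_sample": "samplecnt",
    "image_bvh_intersect_ray": "bvhcnt",
}
TRACKED = {"loadcnt", "samplecnt", "bvhcnt"}


def drop_redundant_waits(instrs):
    """Drop 's_wait_<counter>' entries when that counter is known to be zero."""
    # State at entry is unknown, so assume every tracked counter may be pending.
    pending = set(TRACKED)
    kept = []
    for ins in instrs:
        if ins.startswith("s_wait_"):
            counter = ins[len("s_wait_"):]
            if counter in TRACKED:
                if counter not in pending:
                    continue  # nothing outstanding: the wait is redundant
                pending.discard(counter)  # after the wait the counter is zero
        elif ins in BUMPS:
            pending.add(BUMPS[ins])  # this instruction leaves work outstanding
        kept.append(ins)
    return kept


release_acquire = [
    "s_wait_bvhcnt", "s_wait_samplecnt",   # release sequence
    "global_load_b32",
    "s_wait_bvhcnt", "s_wait_samplecnt",   # accidental repeat in the acquire sequence
    "s_wait_loadcnt",
]
print(drop_redundant_waits(release_acquire))
```

In this model the repeated bvhcnt/samplecnt waits after the load are dropped, since the load only touches loadcnt. A wait would only survive if a SAMPLE or BVH instruction sat between the two wait groups, which the atomic sequences never produce.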


https://github.com/llvm/llvm-project/pull/161637

