[llvm] [AMDGPU][SIMemoryLegalizer][GFX12] Correctly insert sample/bvhcnt (PR #161637)
Pierre van Houtryve via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 6 07:27:06 PDT 2025
Pierre-vh wrote:
> > Before it was inserting those SAMPLE/BVHcnt and they were eliminated by InsertWaitCnt
>
> So a more interesting test case would be when those waits were not eliminated, because they were preceded by some SAMPLE/BVH instructions.
There was no such case, because the extra waits only affected the acq_rel/seq_cst waits inserted _after_ the atomic.
So we'd have this for example:
```
; GFX12-CU-NEXT: s_wait_bvhcnt 0x0
; GFX12-CU-NEXT: s_wait_samplecnt 0x0
; GFX12-CU-NEXT: s_wait_storecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt_dscnt 0x0
; GFX12-CU-NEXT: s_wait_kmcnt 0x0
; GFX12-CU-NEXT: global_load_b32 v1, v0, s[2:3] scope:SCOPE_DEV
; GFX12-CU-NEXT: s_wait_bvhcnt 0x0
; GFX12-CU-NEXT: s_wait_samplecnt 0x0
; GFX12-CU-NEXT: s_wait_loadcnt 0x0
; GFX12-CU-NEXT: global_inv scope:SCOPE_DEV
```
The second `s_wait_bvhcnt`/`s_wait_samplecnt` pair was always eliminated above -O0 because it was redundant. We need those waits for the release sequence, but we were accidentally adding them again for the acquire sequence.
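To illustrate why the second pair is redundant, here is a rough toy model in Python. It is only a sketch and not how the real wait-count insertion pass works: it assumes every wait is to `0x0`, that counter state is unknown at entry, and that `global_load_b32` only increments `loadcnt`. The `INCREMENTS` table and `drop_redundant_waits` are hypothetical names made up for this example.

```python
# Toy model only; NOT the real LLVM pass. All names here are hypothetical.
# Assumed table: which counter each memory instruction increments.
INCREMENTS = {"global_load_b32": "loadcnt"}

def drop_redundant_waits(instrs):
    """Drop `s_wait_<cnt> 0x0` when every named counter is already known zero."""
    known_zero = set()  # counters proven zero so far; nothing is known at entry
    kept = []
    for ins in instrs:
        op = ins.split()[0]
        if op.startswith("s_wait_"):
            # "loadcnt_dscnt" names two counters: loadcnt and dscnt.
            counters = op[len("s_wait_"):].split("_")
            if all(c in known_zero for c in counters):
                continue  # nothing outstanding on these counters: redundant
            known_zero.update(counters)
        elif op in INCREMENTS:
            # A new memory op makes its counter potentially non-zero again.
            known_zero.discard(INCREMENTS[op])
        kept.append(ins)
    return kept

SEQ = [
    "s_wait_bvhcnt 0x0",
    "s_wait_samplecnt 0x0",
    "s_wait_storecnt 0x0",
    "s_wait_loadcnt_dscnt 0x0",
    "s_wait_kmcnt 0x0",
    "global_load_b32 v1, v0, s[2:3] scope:SCOPE_DEV",
    "s_wait_bvhcnt 0x0",
    "s_wait_samplecnt 0x0",
    "s_wait_loadcnt 0x0",
    "global_inv scope:SCOPE_DEV",
]

result = drop_redundant_waits(SEQ)
print("\n".join(result))
```

Under those assumptions, only the load sits between the two groups of waits, so the second `s_wait_bvhcnt`/`s_wait_samplecnt` are dropped while the `s_wait_loadcnt` after the load is kept.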
https://github.com/llvm/llvm-project/pull/161637