[llvm] [AMDGPU][SIInsertWaitCnts] Gfx12.5 - Refactor xcnt optimization (PR #164357)
Jay Foad via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 6 07:12:53 PST 2025
jayfoad wrote:
> Is an xcnt necessary here?
>
> ```
> ; GFX1250-LABEL: flat_atomic_fadd_f32_noret_pat:
> ; GFX1250: ; %bb.0:
> ; GFX1250-NEXT: s_load_b64 s[0:1], s[4:5], 0x24
> ; GFX1250-NEXT: v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 4.0
> ; GFX1250-NEXT: global_wb scope:SCOPE_SYS
> ; GFX1250-NEXT: s_wait_storecnt 0x0
> ; GFX1250-NEXT: s_wait_xcnt 0x0
> ; GFX1250-NEXT: s_wait_kmcnt 0x0
> ; GFX1250-NEXT: flat_atomic_add_f32 v0, v1, s[0:1] scope:SCOPE_SYS
> ; GFX1250-NEXT: s_wait_storecnt_dscnt 0x0
> ; GFX1250-NEXT: global_inv scope:SCOPE_SYS
> ; GFX1250-NEXT: s_endpgm
> ```
Dunno! :)
I assume that global_wb does not increment xcnt, since it has no address to translate. That leaves the s_load_b64 as the only pending translation, so I think you would only need an s_wait_xcnt if you were about to clobber s[4:5], which we're not. The s_wait_kmcnt would also make it redundant. But maybe I'm missing something.
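To make that reasoning concrete, here is a toy model of the rule as described above. It is only a sketch of the assumptions in this message, not the logic in SIInsertWaitCnts and not a statement about the hardware. The function name `needs_xcnt_wait` and its parameters are hypothetical. It assumes that only scalar loads are pending, that a wait is needed only when the next instruction overwrites a register still in use as an address, and that a following `s_wait_kmcnt 0x0` already covers those loads.

```python
def needs_xcnt_wait(pending_smem_addr_regs, next_defs, kmcnt_zero_wait_present):
    """Toy rule (assumption, not the real pass): decide whether s_wait_xcnt is needed.

    pending_smem_addr_regs: registers used as addresses by scalar loads whose
        address translation may still be in flight.
    next_defs: registers the upcoming instructions would overwrite.
    kmcnt_zero_wait_present: True if an s_wait_kmcnt 0x0 is already emitted,
        which is assumed here to imply the scalar loads have completed.
    """
    if kmcnt_zero_wait_present:
        # Assumed: waiting for the load result subsumes waiting for translation.
        return False
    # Otherwise a wait is only needed if an address register gets clobbered.
    return bool(set(pending_smem_addr_regs) & set(next_defs))


# The case from the quoted test: s_load_b64 reads s[4:5]; the no-return
# flat atomic that follows writes no registers.
pending = {"s4", "s5"}
print(needs_xcnt_wait(pending, set(), kmcnt_zero_wait_present=False))
print(needs_xcnt_wait(pending, {"s4"}, kmcnt_zero_wait_present=False))
print(needs_xcnt_wait(pending, {"s4"}, kmcnt_zero_wait_present=True))
```

Under these assumptions the quoted sequence needs no s_wait_xcnt on either ground: nothing clobbers s[4:5], and the s_wait_kmcnt 0x0 is present anyway.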
https://github.com/llvm/llvm-project/pull/164357