[PATCH] D157437: AMDGPU: Expand remaining system atomic operations

Wed Aug 9 04:49:19 PDT 2023

JonChesterfield added a subscriber: jhuber6.
JonChesterfield added a comment.

In D157437#4571594 <https://reviews.llvm.org/D157437#4571594>, @yaxunl wrote:

> If users make sure they only use certain memories, then they can specify suitable options for their usecase:
>
> corse-grained memory only: -unsafe-fp-atomics -unsafe-pcie-integer-atomics
> fine-grained memory with PCIE: 
> fine-grained memory with XGMI: -unsafe-pcie-integer-atomics

I'm not totally sure I'm following the setup here. It sounds like someone writes fetch_or to a global in addrspace(1) and gets it expanded to CAS unless they also pass a scary command line flag which will miscompile other atomic operations which do actually need the expansion.

We could use 'system' scope to mean 'flat CAS instruction' but as pointed out above that'll kill performance on non-pcie systems, plus it's really easy to emit system scope instructions unintentionally since that's the default.

Is the root problem that addrspace annotation is inaccurate? It seems that architecture + which memory contains the variable is exactly sufficient to pick the instruction, at which point I don't see what the command line flags would be for.

@jhuber6 I think libc uses system scope everywhere because the only intrinsics that give access to device scope are opencl ones with constraints on the right type of atomic type argument. If it expands fetch_or to CAS we're losing performance.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D157437/new/

https://reviews.llvm.org/D157437