[llvm] [AMDGPU] Allow merging unordered and monotonic atomic loads in SILoadStoreOptimizer (PR #189932)

Wed Apr 1 21:16:26 PDT 2026

harrisonGPU wrote:

> > I think this optimization is not suitable for IR. Combining atomic operations at the IR level changes the number of atomic events and can violate the memory model. This is why the transformation is implemented at the MachineIR level.
> 
> That doesn't change when you perform it in machine IR. This is valid or it's not, it doesn't become legal just by doing it in MIR

Thanks, Matt. You're right that the legality of the transformation doesn't depend on which IR level performs it.

However, the reason for doing this in MachineIR rather than LLVM IR is practical. The LLVM Atomics guide states:

> atomic instructions are guaranteed to be lock-free, and therefore an instruction which is wider than the target natively supports can be impossible to generate.

Merging two 32-bit `load atomic` instructions into a single 64-bit `load atomic` at the IR level would require every backend to support a lock-free 64-bit atomic load. The b128 case would require a 128-bit `load atomic`, which some other backends might fail to lower. At the MachineIR level, we already know that the target supports the wider load natively, so there is no risk of a codegen failure. Do you have any suggestions for a better approach?
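To make the width argument concrete, here is a hypothetical sketch of the preconditions discussed above. `AtomicLoad`, `Ordering`, `try_merge`, and `max_lockfree_bits` are illustrative names, not LLVM or SILoadStoreOptimizer APIs. The key assumption is `max_lockfree_bits`: the widest atomic load the target can perform natively. A MIR pass running for a specific target knows this value, while a target-independent IR pass does not. The sketch only models the width, alignment, and ordering checks; it does not settle the memory-model question of whether merging is legal.

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical model of an atomic load (not LLVM's API).
enum class Ordering { Unordered, Monotonic, Acquire, SeqCst };

struct AtomicLoad {
  std::uint64_t offset_bytes;  // byte offset from a common base pointer
  unsigned width_bits;         // access width in bits
  Ordering ordering;
};

// Returns the merged load if the two loads satisfy the assumed preconditions,
// or std::nullopt otherwise. max_lockfree_bits is the widest lock-free atomic
// load the target supports natively (an assumed, target-provided value).
std::optional<AtomicLoad> try_merge(AtomicLoad lo, AtomicLoad hi,
                                    unsigned max_lockfree_bits) {
  // Put the lower-addressed load first.
  if (hi.offset_bytes < lo.offset_bytes) std::swap(lo, hi);

  // Only unordered/monotonic loads, and both must use the same ordering.
  bool relaxed = lo.ordering == Ordering::Unordered ||
                 lo.ordering == Ordering::Monotonic;
  if (!relaxed || lo.ordering != hi.ordering) return std::nullopt;

  // Require equal, byte-sized widths so the merged width is 2x a native width.
  if (lo.width_bits != hi.width_bits || lo.width_bits % 8 != 0)
    return std::nullopt;

  // The two loads must be exactly adjacent in memory.
  if (lo.offset_bytes + lo.width_bits / 8 != hi.offset_bytes)
    return std::nullopt;

  // The wider load must be something the target can do lock-free.
  unsigned merged_bits = lo.width_bits * 2;
  if (merged_bits > max_lockfree_bits) return std::nullopt;

  // The wider load must be naturally aligned to be a single atomic access.
  if (lo.offset_bytes % (merged_bits / 8) != 0) return std::nullopt;

  return AtomicLoad{lo.offset_bytes, merged_bits, lo.ordering};
}
```

For example, two unordered 32-bit loads at offsets 0 and 4 merge into one 64-bit load when `max_lockfree_bits` is 64 or more, but not when it is 32. An IR-level pass would have to answer that question per target, which is exactly the information MIR already has.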

Reference: https://llvm.org/docs/Atomics.html#atomic-instructions

https://github.com/llvm/llvm-project/pull/189932