[llvm] [AMDGPU] Allow merging unordered and monotonic atomic loads in SILoadStoreOptimizer (PR #189932)

Thu Apr 2 04:28:25 PDT 2026

ssahasra wrote:

What is the motivation for doing this kind of merging now? For all these years, most targets have shied away from optimizing atomics because it can get tricky, and the gains were not worth it. I do think that this should be an LLVM IR optimization, and perhaps it can be parameterized by TTI queries about the largest size supported.

I don't see why we can't merge volatile accesses. Atomics are the ones that care about the "all or nothing" kind of modification. We are free to split or merge volatile accesses, as long as they are guaranteed to always happen. This does mean that the resulting merged operation should itself be marked volatile, though. I don't know if that is a desirable outcome.

Similarly, it should be okay to merge an atomic with a non-atomic, and to merge two atomics with different orderings. In fact, "non-atomic" is just another ordering. The resulting ordering should be the stronger of the two. But of course, this must respect the existing order of instructions. For example, one store-release W1 followed by another store-release W2 can be merged, but the resulting operation must take the place of W2, and not W1. It's as if any memory operations between W1 and W2 were reordered to happen before W1, which is okay unless W1 is a seq_cst operation.
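
To make the rule concrete, here is a rough sketch of what I mean for the store case. The names (StoreOrdering, Store, mergedOrdering, mayTakePlaceOfLater, mergeStores) are hypothetical and are not LLVM's llvm::AtomicOrdering or anything in SILoadStoreOptimizer. I am assuming only the orderings that are valid on a store, so they form a simple weakest-to-strongest chain; mixing acquire and release would need a proper join instead of a max.

```cpp
// Hypothetical stand-in for the ordering of a store, weakest first.
// Only store-valid orderings are listed, so "<" means "weaker than".
enum class StoreOrdering { NotAtomic, Unordered, Monotonic, Release, SeqCst };

// Hypothetical model of a store: its index in program order and its ordering.
struct Store {
  int Position;
  StoreOrdering Ordering;
};

// The merged store gets the stronger of the two orderings.
constexpr StoreOrdering mergedOrdering(StoreOrdering A, StoreOrdering B) {
  return A < B ? B : A;
}

// Placing the merged store at W2 behaves as if the operations between W1 and
// W2 moved before W1. The assumption here is that this is acceptable unless
// W1 is seq_cst.
constexpr bool mayTakePlaceOfLater(const Store &W1) {
  return W1.Ordering != StoreOrdering::SeqCst;
}

// The merged store sits at the later position (W2), never at W1.
constexpr Store mergeStores(const Store &W1, const Store &W2) {
  return Store{W1.Position < W2.Position ? W2.Position : W1.Position,
               mergedOrdering(W1.Ordering, W2.Ordering)};
}
```

A caller would check mayTakePlaceOfLater(W1) first and call mergeStores(W1, W2) only if it returns true. Alias checks against the operations in between are deliberately left out of this sketch.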

Similarly, merging atomics with different scopes should be okay too, as long as the resulting scope is the larger of the two.
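
The scope rule can be sketched the same way. The Scope enum and mergedScope below are hypothetical, for illustration only. I am assuming the scopes form a strict inclusion chain, named after the usual AMDGPU synchronization scopes; scopes that are not comparable (for example the one-address-space variants) would need more care than a simple max.

```cpp
// Hypothetical synchronization scopes, smallest first. The assumption is
// that each scope includes every scope listed before it.
enum class Scope { SingleThread, Wavefront, Workgroup, Agent, System };

// The merged access uses the larger (more inclusive) of the two scopes.
constexpr Scope mergedScope(Scope A, Scope B) {
  return A < B ? B : A;
}
```

So merging a workgroup-scope access with an agent-scope access would give an agent-scope access.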

https://github.com/llvm/llvm-project/pull/189932