[llvm] [LSV][AMDGPU] Vectorize unordered and monotonic atomic loads (PR #190152)

Thu Apr 2 08:56:21 PDT 2026

================
@@ -398,6 +398,17 @@ unsigned GCNTTIImpl::getLoadStoreVecRegBitWidth(unsigned AddrSpace) const {
   return 128;
 }
 
+bool GCNTTIImpl::isLegalToVectorizeLoad(LoadInst *LI) const {
+  // Simple (non-atomic, non-volatile) loads are always legal.
+  if (LI->isSimple())
+    return true;
+  // Allow unordered and monotonic atomic loads to be vectorized. These
+  // orderings impose no cross address synchronization constraints, so merging
+  // adjacent accesses into a wider load is safe. The hardware guarantees
+  // atomicity for naturally aligned loads up to 128 bits.
----------------
efriedma-quic wrote:

The comment says "naturally aligned loads", but the testcase shows we're generating misaligned loads, or at least loads that only have element alignment, which isn't sufficient for atomicity on any hardware I'm aware of.

https://discourse.llvm.org/t/rfc-add-elementwise-modifier-to-atomicrmw/90134/1 might be relevant.
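The alignment concern above can be made concrete with a small predicate. This is a hypothetical sketch, not an LLVM API and not the patch's code. `canMergeAtomicLoads` and its parameters are invented for illustration. It assumes the target guarantees single-copy atomicity only for naturally aligned accesses of at most 128 bits, which is the figure quoted in the patch's comment and is not independently verified here. Atomic ordering is not modeled. The point is that the merged access must be aligned to its full width, not merely to the width of one element. The checks run at compile time via `static_assert`.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): decides whether NumElts adjacent
// atomic loads of EltBytes each may be merged into one wider load, given the
// alignment (in bytes) known for the first element's address.
//
// Assumption: the target guarantees atomicity only for naturally aligned
// accesses of at most MaxAtomicBits (128 here, the figure from the patch's
// code comment; treated as an assumption, not a verified hardware fact).
constexpr bool canMergeAtomicLoads(std::uint64_t EltBytes,
                                   std::uint64_t NumElts,
                                   std::uint64_t KnownAlignBytes,
                                   std::uint64_t MaxAtomicBits = 128) {
  const std::uint64_t TotalBytes = EltBytes * NumElts;
  // Natural alignment is only defined for power-of-two access sizes.
  const bool IsPow2 = TotalBytes != 0 && (TotalBytes & (TotalBytes - 1)) == 0;
  // The merged access must fit within the atomicity width...
  const bool FitsWidth = TotalBytes * 8 <= MaxAtomicBits;
  // ...and its start address must be aligned to the full merged size, not
  // merely to the size of a single element.
  const bool NaturallyAligned = KnownAlignBytes >= TotalBytes;
  return IsPow2 && FitsWidth && NaturallyAligned;
}

// Four i32 loads with only element (4-byte) alignment: the merged 128-bit
// load is not naturally aligned, so merging is rejected.
static_assert(!canMergeAtomicLoads(4, 4, 4), "element alignment is not enough");
// The same four loads from a 16-byte-aligned address: merging is allowed.
static_assert(canMergeAtomicLoads(4, 4, 16), "natural alignment suffices");
// Two i64 loads with 8-byte alignment: a 16-byte access needs 16-byte alignment.
static_assert(!canMergeAtomicLoads(8, 2, 8), "8-byte alignment is too weak");
// Three i32 loads: 96 bits is not a power-of-two size.
static_assert(!canMergeAtomicLoads(4, 3, 16), "non-power-of-two width");
```

Under these assumptions, a legality hook that only checks the ordering, as the quoted `isLegalToVectorizeLoad` does, would also need the vectorizer to prove the merged alignment before combining atomic loads.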

https://github.com/llvm/llvm-project/pull/190152