[llvm] [AMDGPU] Sink uniform buffer address offsets into soffset (PR #169230)

Juan Manuel Martinez Caamaño via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 24 01:32:30 PST 2025


================
@@ -2046,6 +2056,86 @@ bool AMDGPUCodeGenPrepareImpl::visitSqrt(IntrinsicInst &Sqrt) {
   return true;
 }
 
+/// Sink uniform addends in buffer address calculations into soffset.
+///
+/// Transforms buffer loads/stores with voffset = add(uniform, divergent)
+/// into voffset = divergent, soffset = uniform for better address coalescing.
+/// Only applies to raw buffer operations with soffset initially zero.
+bool AMDGPUCodeGenPrepareImpl::visitBufferIntrinsic(IntrinsicInst &I) {
+  Intrinsic::ID IID = I.getIntrinsicID();
+  bool IsLoad = (IID == Intrinsic::amdgcn_raw_buffer_load ||
+                 IID == Intrinsic::amdgcn_raw_buffer_load_format ||
+                 IID == Intrinsic::amdgcn_raw_ptr_buffer_load ||
+                 IID == Intrinsic::amdgcn_raw_ptr_buffer_load_format);
+  bool IsStore = (IID == Intrinsic::amdgcn_raw_buffer_store ||
+                  IID == Intrinsic::amdgcn_raw_buffer_store_format ||
+                  IID == Intrinsic::amdgcn_raw_ptr_buffer_store ||
+                  IID == Intrinsic::amdgcn_raw_ptr_buffer_store_format);
+
+  if (!IsLoad && !IsStore)
+    return false;
+
+  // Buffer intrinsic operand layout (same for vector and pointer descriptor):
+  // Load:  (rsrc, voffset, soffset, cachepolicy)
+  // Store: (vdata, rsrc, voffset, soffset, cachepolicy)
+  const unsigned VOffsetIdx = IsStore ? 2 : 1;
+  const unsigned SOffsetIdx = IsStore ? 3 : 2;
+
+  Value *VOffset = I.getArgOperand(VOffsetIdx);
+  Value *SOffset = I.getArgOperand(SOffsetIdx);
+
+  // Only optimize when soffset is currently zero
+  if (!match(SOffset, m_Zero()))
+    return false;
----------------
jmmartinez wrote:

How would this code evolve to handle soffsets that are not zero, or more complex voffsets such as `(((non_uniform + uniform_a) + uniform_b) + 8)`?

In that example, 8 is kept as is so that it can later be sunk into the constant offset. `uniform_a` and `uniform_b` stay in voffset as well, although it would have been possible to move at least `uniform_b` into soffset, wouldn't it?
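To make the question concrete, here is a minimal sketch of the more general split: flatten the chain of adds feeding voffset (and the existing soffset), then route each addend by uniformity, with divergent terms staying in voffset, uniform terms going to soffset, and constants going to the immediate offset. It uses a hypothetical toy expression type (`Expr`, `leaf`, `cst`, `add`, `SplitOffsets` and `splitBufferOffsets` are illustrative names, not LLVM API); a real version would presumably query the pass's uniformity information and rebuild the adds with an IRBuilder. It models only the arithmetic and deliberately ignores legality: no-wrap requirements on the 32-bit offset adds, the range of the immediate offset field, and whether moving terms between voffset and soffset changes hardware bounds-checking behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy stand-in for IR values (NOT LLVM's API).
// Assumes every add is non-wrapping, so reassociation is value-preserving.
struct Expr {
  enum Kind { Leaf, Const, Add };
  Kind K;
  std::string Name;               // Leaf: value name
  bool IsUniform = false;         // Leaf: stand-in for a uniformity query
  int64_t Imm = 0;                // Const: literal value
  std::shared_ptr<Expr> LHS, RHS; // Add: operands
};
using ExprP = std::shared_ptr<Expr>;

ExprP leaf(std::string Name, bool IsUniform) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Leaf;
  E->Name = std::move(Name);
  E->IsUniform = IsUniform;
  return E;
}

ExprP cst(int64_t V) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Const;
  E->Imm = V;
  return E;
}

ExprP add(ExprP A, ExprP B) {
  assert(A && B && "add needs two operands");
  auto E = std::make_shared<Expr>();
  E->K = Expr::Add;
  E->LHS = std::move(A);
  E->RHS = std::move(B);
  return E;
}

// Where each addend ends up after reassociation.
struct SplitOffsets {
  std::vector<std::string> VOffset; // divergent terms: stay in voffset (VGPR)
  std::vector<std::string> SOffset; // uniform terms: summed into soffset (SGPR)
  int64_t ImmOffset = 0;            // constants: folded into the immediate
};

// Flatten a tree of adds and route each leaf by uniformity.
static void collect(const ExprP &E, SplitOffsets &Out) {
  switch (E->K) {
  case Expr::Add:
    collect(E->LHS, Out);
    collect(E->RHS, Out);
    break;
  case Expr::Const:
    Out.ImmOffset += E->Imm;
    break;
  case Expr::Leaf:
    (E->IsUniform ? Out.SOffset : Out.VOffset).push_back(E->Name);
    break;
  }
}

// The existing soffset is treated as just another source of addends, so a
// non-zero soffset needs no special case in this model.
SplitOffsets splitBufferOffsets(const ExprP &VOffset, const ExprP &SOffset) {
  SplitOffsets Out;
  collect(VOffset, Out);
  collect(SOffset, Out);
  return Out;
}
```

Applied to `(((non_uniform + uniform_a) + uniform_b) + 8)` with a zero soffset, this model yields voffset = `non_uniform`, soffset = `uniform_a + uniform_b`, and an immediate of 8; a non-zero uniform soffset simply joins the uniform group.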

I understand that, as a first step, we can put this transformation here, but I would not be surprised if it eventually got its own pass (or if we ended up with a pass that does "uniform reassociate" more generally).

https://github.com/llvm/llvm-project/pull/169230

