[llvm] [AMDGPU] Support merging 16-bit TBUFFER load/store instruction (PR #145078)

Tue Jun 24 01:30:17 PDT 2025

================
@@ -1040,32 +1040,58 @@ bool SILoadStoreOptimizer::offsetsCanBeCombined(CombineInfo &CI,
   if (CI.Offset == Paired.Offset)
     return false;
 
+  // Use 2-byte element size if both tbuffer formats are 16-bit.
+  unsigned EltSize = CI.EltSize;
+  auto Has16BitComponents = [&](unsigned Format) -> bool {
+    const auto *Info = AMDGPU::getGcnBufferFormatInfo(Format, STI);
+    return Info && Info->BitsPerComp == 16;
+  };
+
+  if ((CI.InstClass == TBUFFER_LOAD || CI.InstClass == TBUFFER_STORE)) {
+    // TODO: Support merging 8-bit tbuffer load/store instructions
+    if (Has16BitComponents(CI.Format) && Has16BitComponents(Paired.Format))
+      EltSize = 2;
+  }
+
   // This won't be valid if the offset isn't aligned.
-  if ((CI.Offset % CI.EltSize != 0) || (Paired.Offset % CI.EltSize != 0))
+  if ((CI.Offset % EltSize != 0) || (Paired.Offset % EltSize != 0))
     return false;
 
   if (CI.InstClass == TBUFFER_LOAD || CI.InstClass == TBUFFER_STORE) {
 
-    const llvm::AMDGPU::GcnBufferFormatInfo *Info0 =
-        llvm::AMDGPU::getGcnBufferFormatInfo(CI.Format, STI);
+    const AMDGPU::GcnBufferFormatInfo *Info0 =
+        AMDGPU::getGcnBufferFormatInfo(CI.Format, STI);
     if (!Info0)
       return false;
-    const llvm::AMDGPU::GcnBufferFormatInfo *Info1 =
-        llvm::AMDGPU::getGcnBufferFormatInfo(Paired.Format, STI);
+    const AMDGPU::GcnBufferFormatInfo *Info1 =
+        AMDGPU::getGcnBufferFormatInfo(Paired.Format, STI);
     if (!Info1)
       return false;
 
     if (Info0->BitsPerComp != Info1->BitsPerComp ||
         Info0->NumFormat != Info1->NumFormat)
       return false;
 
-    // TODO: Should be possible to support more formats, but if format loads
-    // are not dword-aligned, the merged load might not be valid.
----------------
harrisonGPU wrote:

Thanks, I understand your concern. I’ve verified section 9.5 “Alignment” of the RDNA 3 Shader Instruction Set Architecture manual, which states:

Formatted ops such as BUFFER_LOAD_FORMAT_* must be aligned as follows:
• 1-byte formats → 1-byte alignment
• 2-byte formats → 2-byte alignment
• 4-byte and larger formats → 4-byte alignment

I’ve therefore added an explicit alignment check and a new Lit test, `gfx11_tbuffer_load_x_off2_off4_16bit_no_merge`.

Reference:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf


https://github.com/llvm/llvm-project/pull/145078