[llvm] [AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (PR #145078)

Tue Aug 19 06:46:24 PDT 2025

================
@@ -1596,8 +1637,14 @@ MachineBasicBlock::iterator SILoadStoreOptimizer::mergeTBufferLoadPair(
   if (Regs.VAddr)
     MIB.add(*TII->getNamedOperand(*CI.I, AMDGPU::OpName::vaddr));
 
+  // For 8-bit or 16-bit tbuffer formats there is no 3-component encoding.
+  // If the combined count is 3 (e.g. X+X+X or XY+X), promote to 4 components
+  // and use XYZ of XYZW to enable the merge.
+  unsigned NumCombinedComponents = CI.Width + Paired.Width;
+  if (NumCombinedComponents == 3 && (CI.EltSize == 1 || CI.EltSize == 2))
----------------
jayfoad wrote:

```suggestion
  if (NumCombinedComponents == 3 && CI.EltSize <= 2)
```

https://github.com/llvm/llvm-project/pull/145078