[llvm] [AMDGPU] Support merging 16-bit TBUFFER load/store instruction (PR #145078)

Wed Jun 25 01:02:23 PDT 2025

harrisonGPU wrote:

> I still think we should be doing this kind of merging in the IR. SILoadStoreOptimizer was originally intended only for the case of combining the DS read/write from non-consecutive offsets. Everything else could have been done like a normal vectorization

Thanks, Matt. If we try to merge tbuffer loads in the IR, we first have to expose the buffer-format information there. At the moment `SILoadStoreOptimizer` already has easy access to that data (e.g. BitsPerComp and NumFormat), so extending the existing pass feels more pragmatic.
The pass can already merge 32-bit tbuffer loads; to cover 16-bit and 8-bit cases we mainly need to handle the different element sizes, which is a relatively small change compared with plumbing format metadata through the whole IR pipeline.
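To illustrate the element-size point, here is a small, self-contained C++ sketch. It is not the `SILoadStoreOptimizer` code. `BufferFormatInfo`, `TBufferAccess`, `elementBytes` and `mergedComponents` are hypothetical names, and the model assumes both accesses already share the same resource and offset registers. The sketch shows that once the per-component byte size is derived from BitsPerComp rather than fixed at 4, a single adjacency check handles 8-, 16- and 32-bit formats.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified stand-in for the per-format data the pass can
// query (BitsPerComp, NumFormat); this is NOT the actual LLVM type.
struct BufferFormatInfo {
  unsigned BitsPerComp;   // component width in bits: 8, 16 or 32
  unsigned NumComponents; // components per access: 1..4
  unsigned NumFormat;     // opaque id standing in for UNORM/SINT/FLOAT/...
};

// Hypothetical description of one tbuffer access. Assumption: both accesses
// being compared use the same resource descriptor and soffset.
struct TBufferAccess {
  int64_t Offset;       // immediate byte offset
  BufferFormatInfo Fmt; // decoded format of the access
};

// Bytes per component, derived from the format instead of a hardcoded 4.
unsigned elementBytes(const BufferFormatInfo &F) {
  assert(F.BitsPerComp % 8 == 0 && "sub-byte components are not modeled");
  return F.BitsPerComp / 8;
}

// Returns the component count of the merged access, or std::nullopt if the
// two accesses cannot be combined under this simplified model.
std::optional<unsigned> mergedComponents(const TBufferAccess &A,
                                         const TBufferAccess &B) {
  // Only combine accesses that agree on component width and numeric format.
  if (A.Fmt.BitsPerComp != B.Fmt.BitsPerComp ||
      A.Fmt.NumFormat != B.Fmt.NumFormat)
    return std::nullopt;

  // Order the pair by offset so the check below is symmetric.
  const TBufferAccess &Lo = A.Offset <= B.Offset ? A : B;
  const TBufferAccess &Hi = A.Offset <= B.Offset ? B : A;

  // Adjacency check: Hi must start exactly where Lo ends. Scaling by
  // elementBytes() is what lets 8- and 16-bit formats reuse this logic.
  int64_t LoEnd =
      Lo.Offset + int64_t(Lo.Fmt.NumComponents) * elementBytes(Lo.Fmt);
  if (LoEnd != Hi.Offset)
    return std::nullopt;

  unsigned N = Lo.Fmt.NumComponents + Hi.Fmt.NumComponents;
  // A real implementation must also confirm that a buffer format with N
  // components of this width and NumFormat exists; here we only cap at 4.
  if (N > 4)
    return std::nullopt;
  return N;
}
```

For example, two single-component 16-bit loads at byte offsets 0 and 2 would combine into one two-component access. A 16-bit load at offset 0 paired with a second one at offset 4 would not, because the adjacency check fails.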

https://github.com/llvm/llvm-project/pull/145078