[llvm] [AMDGPU] Support merging 16-bit and 8-bit TBUFFER load/store instruction (PR #145078)
Harrison Hao via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 20 02:10:40 PDT 2025
================
@@ -1049,24 +1056,51 @@ bool SILoadStoreOptimizer::offsetsCanBeCombined(CombineInfo &CI,
const llvm::AMDGPU::GcnBufferFormatInfo *Info0 =
llvm::AMDGPU::getGcnBufferFormatInfo(CI.Format, STI);
- if (!Info0)
- return false;
const llvm::AMDGPU::GcnBufferFormatInfo *Info1 =
llvm::AMDGPU::getGcnBufferFormatInfo(Paired.Format, STI);
- if (!Info1)
- return false;
if (Info0->BitsPerComp != Info1->BitsPerComp ||
Info0->NumFormat != Info1->NumFormat)
return false;
- // TODO: Should be possible to support more formats, but if format loads
- // are not dword-aligned, the merged load might not be valid.
- if (Info0->BitsPerComp != 32)
+ // For 8-bit or 16-bit formats there is no 3-component variant.
+ // If NumCombinedComponents is 3, try the 4-component format and use XYZ.
+ // Example:
+ // tbuffer_load_format_x + tbuffer_load_format_x + tbuffer_load_format_x
+ // ==> tbuffer_load_format_xyz with format:[BUF_FMT_16_16_16_16_SNORM]
+ unsigned NumCombinedComponents = CI.Width + Paired.Width;
+ unsigned CombinedBufferFormat =
+ getBufferFormatWithCompCount(CI.Format, NumCombinedComponents, STI);
+ if (CombinedBufferFormat == 0 && NumCombinedComponents == 3 &&
+ CI.EltSize <= 2) {
+ unsigned TryFormat = getBufferFormatWithCompCount(CI.Format, 4, STI);
+ if (!TryFormat)
+ return false;
+ CombinedBufferFormat = TryFormat;
+ NumCombinedComponents = 4;
+ }
----------------
harrisonGPU wrote:
Thanks! I have updated it.
https://github.com/llvm/llvm-project/pull/145078
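
The fallback described in the new comment of the quoted hunk can be sketched in isolation as below. This is only an illustration, not the code from the patch: `formatExists` and `combinedComponentCount` are hypothetical names, and the table of available formats is reduced to the assumption stated in the comment (8-bit and 16-bit formats come in 1, 2 and 4 components; 32-bit formats in 1 to 4). As in the hunk, a result of 0 means no suitable format was found.

```cpp
#include <cassert>

// Hypothetical stand-in for the real buffer-format lookup. Assumption:
// 32-bit formats exist with 1..4 components, while 8-bit and 16-bit
// formats have no 3-component variant.
bool formatExists(unsigned EltSizeBytes, unsigned NumComponents) {
  if (NumComponents < 1 || NumComponents > 4)
    return false;
  if (EltSizeBytes == 4)
    return true;
  return NumComponents != 3;
}

// Returns the component count of the format a merged access would use,
// or 0 if the two accesses cannot be merged under this simplified model.
unsigned combinedComponentCount(unsigned EltSizeBytes, unsigned WidthA,
                                unsigned WidthB) {
  assert(WidthA >= 1 && WidthB >= 1 && "each access has at least one component");
  unsigned NumCombined = WidthA + WidthB;
  if (formatExists(EltSizeBytes, NumCombined))
    return NumCombined;
  // No 3-component format for 8-bit/16-bit elements: fall back to the
  // 4-component format; only the XYZ lanes would actually be used.
  if (NumCombined == 3 && EltSizeBytes <= 2 && formatExists(EltSizeBytes, 4))
    return 4;
  return 0;
}
```

Under these assumptions, merging a 1-component and a 2-component 16-bit access selects the 4-component format, while the same merge with 32-bit elements keeps the native 3-component format.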