[llvm] [AMDGPU] Implement LSR cost model for GFX9+ (PR #184138)

via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 4 01:14:20 PST 2026


================
@@ -1703,3 +1704,51 @@ GCNTTIImpl::getInstructionUniformity(const Value *V) const {
 
   return InstructionUniformity::Default;
 }
+
+InstructionCost GCNTTIImpl::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
+                                                 StackOffset BaseOffset,
+                                                 bool HasBaseReg, int64_t Scale,
+                                                 unsigned AddrSpace) const {
+  if (HasBaseReg && Scale != 0) {
+    // gfx1250+ can fold base+scale*index into the instruction when scale
+    // equals the memory access size (scale_offset bit). This is supported
+    // for global/constant/flat/scratch but NOT for LDS or GDS.
+    // GDS does not exist on gfx1250+, but we exclude REGION_ADDRESS for
+    // correctness since the address space is still representable in IR.
+    if (getST()->hasScaleOffset() && Ty && Ty->isSized() &&
+        AddrSpace != AMDGPUAS::LOCAL_ADDRESS &&
----------------
michaelselehov wrote:

Agreed, switched to an allowlist of flat/global/constant/private.

One thing I considered is whether we should also validate the scale value against supported instruction sizes. VMEM (flat/global/scratch) supports scale_offset for accesses up to 16 bytes (b128), while SMEM (s_load) supports up to 64 bytes (b512). At the TTI level we can't distinguish which path a constant-space load will take, so in theory we could report cost 0 for a scale that only SMEM can handle. In practice this shouldn't be an issue since `isLegalAddressingMode` gates which scales LSR considers, and whoever updates it for scale_offset will need to account for valid sizes there. But worth noting for the future.
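The allowlist plus scale-size check described above can be sketched as a small standalone model. The enum values mirror the AMDGPUAS numbering and the size limits follow the VMEM/SMEM figures quoted in this comment; the function name and structure are illustrative, not the actual patch:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative address-space numbers mirroring AMDGPUAS
// (standalone model for this sketch, not the real LLVM enum).
enum AddressSpace : unsigned {
  FLAT_ADDRESS = 0,
  GLOBAL_ADDRESS = 1,
  REGION_ADDRESS = 2,  // GDS
  LOCAL_ADDRESS = 3,   // LDS
  CONSTANT_ADDRESS = 4,
  PRIVATE_ADDRESS = 5, // scratch
};

// Sketch of the check discussed in the review: scale_offset folds
// base + scale*index only for flat/global/constant/private, and only
// when the scale equals the access size in bytes. The size cap is the
// conservative VMEM limit (16 bytes, b128); SMEM would allow up to 64.
bool isFreeScaledAddrMode(unsigned AddrSpace, int64_t Scale,
                          uint64_t AccessSizeBytes) {
  switch (AddrSpace) {
  case FLAT_ADDRESS:
  case GLOBAL_ADDRESS:
  case CONSTANT_ADDRESS:
  case PRIVATE_ADDRESS:
    break;
  default: // LDS/GDS and any other space: no scale_offset support.
    return false;
  }
  if (Scale <= 0 || static_cast<uint64_t>(Scale) != AccessSizeBytes)
    return false;
  return AccessSizeBytes <= 16; // conservative: VMEM b128 limit
}
```

As noted above, a constant-space load may go through SMEM (up to b512), so capping at the VMEM limit can under-report foldable scales there; that conservatism is the trade-off this sketch picks.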

https://github.com/llvm/llvm-project/pull/184138

