[llvm] [AMDGPU] Implement LSR cost model for GFX9+ (PR #184138)

Sat Mar 7 05:47:36 PST 2026

================
@@ -1703,3 +1704,50 @@ GCNTTIImpl::getInstructionUniformity(const Value *V) const {
 
   return InstructionUniformity::Default;
 }
+
+InstructionCost GCNTTIImpl::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
+                                                 StackOffset BaseOffset,
+                                                 bool HasBaseReg, int64_t Scale,
+                                                 unsigned AddrSpace) const {
+  if (HasBaseReg && Scale != 0) {
+    // gfx1250+ can fold base+scale*index into the instruction when scale
+    // equals the memory access size (scale_offset bit). Supported address
+    // spaces: flat, global, constant, private (scratch).
+    if (getST()->hasScaleOffset() && Ty && Ty->isSized() &&
+        (AddrSpace == AMDGPUAS::FLAT_ADDRESS ||
+         AddrSpace == AMDGPUAS::GLOBAL_ADDRESS ||
+         AddrSpace == AMDGPUAS::CONSTANT_ADDRESS ||
----------------
michaelselehov wrote:

Switched to `isExtendedGlobalAddrSpace(AS) || AS == AMDGPUAS::FLAT_ADDRESS || AS == AMDGPUAS::PRIVATE_ADDRESS`. Note that `isExtendedGlobalAddrSpace` also returns true for `AS > MAX_AMDGPU_ADDRESS` (constant buffers, streamout, etc.), so those address spaces would now take the scale_offset path as well. Are we OK with that?
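
To make the difference concrete, here is a minimal standalone sketch comparing the explicit list from the hunk above with the new predicate. It is not the in-tree code: the `SketchAS` constants are stand-ins whose numeric values I am assuming match the `AMDGPUAS` enum, the `sketch*` helpers are hypothetical names, and `sketchIsExtendedGlobal` models only the cases mentioned in this thread (global, constant, and anything above `MAX_AMDGPU_ADDRESS`); the real helper may accept more.

```cpp
// Stand-in address space numbers. ASSUMPTION: these mirror the AMDGPUAS
// enum; verify against the in-tree header before relying on them.
namespace SketchAS {
constexpr unsigned FLAT_ADDRESS = 0;
constexpr unsigned GLOBAL_ADDRESS = 1;
constexpr unsigned LOCAL_ADDRESS = 3;
constexpr unsigned CONSTANT_ADDRESS = 4;
constexpr unsigned PRIVATE_ADDRESS = 5;
constexpr unsigned MAX_AMDGPU_ADDRESS = 9;
} // namespace SketchAS

// Hypothetical model of isExtendedGlobalAddrSpace, limited to the cases
// discussed in this thread.
constexpr bool sketchIsExtendedGlobal(unsigned AS) {
  return AS == SketchAS::GLOBAL_ADDRESS || AS == SketchAS::CONSTANT_ADDRESS ||
         AS > SketchAS::MAX_AMDGPU_ADDRESS;
}

// Address space check as originally written in the hunk: an explicit list.
constexpr bool sketchAllowsScaleOffsetExplicit(unsigned AS) {
  return AS == SketchAS::FLAT_ADDRESS || AS == SketchAS::GLOBAL_ADDRESS ||
         AS == SketchAS::CONSTANT_ADDRESS || AS == SketchAS::PRIVATE_ADDRESS;
}

// Address space check after the switch described in this comment.
constexpr bool sketchAllowsScaleOffset(unsigned AS) {
  return sketchIsExtendedGlobal(AS) || AS == SketchAS::FLAT_ADDRESS ||
         AS == SketchAS::PRIVATE_ADDRESS;
}

// The two predicates agree on the named address spaces...
static_assert(sketchAllowsScaleOffset(SketchAS::GLOBAL_ADDRESS) ==
                  sketchAllowsScaleOffsetExplicit(SketchAS::GLOBAL_ADDRESS),
              "named address spaces behave the same");
// ...but diverge for anything above MAX_AMDGPU_ADDRESS.
static_assert(sketchAllowsScaleOffset(SketchAS::MAX_AMDGPU_ADDRESS + 1) &&
                  !sketchAllowsScaleOffsetExplicit(
                      SketchAS::MAX_AMDGPU_ADDRESS + 1),
              "out-of-range address spaces only pass the new check");
```

If the out-of-range address spaces should not get the discount, one option would be to add an explicit `AS <= MAX_AMDGPU_ADDRESS` guard in front of the new check.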

https://github.com/llvm/llvm-project/pull/184138