[llvm] [AMDGPU] Implement LSR cost model for GFX9+ (PR #184138)

Wed Mar 4 02:19:56 PST 2026

================
@@ -1703,3 +1704,51 @@ GCNTTIImpl::getInstructionUniformity(const Value *V) const {
 
   return InstructionUniformity::Default;
 }
+
+InstructionCost GCNTTIImpl::getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
+                                                 StackOffset BaseOffset,
+                                                 bool HasBaseReg, int64_t Scale,
+                                                 unsigned AddrSpace) const {
+  if (HasBaseReg && Scale != 0) {
+    // gfx1250+ can fold base+scale*index into the instruction when scale
+    // equals the memory access size (scale_offset bit). This is supported
+    // for global/constant/flat/scratch but NOT for LDS or GDS.
+    // GDS does not exist on gfx1250+, but we exclude REGION_ADDRESS for
+    // correctness since the address space is still representable in IR.
+    if (getST()->hasScaleOffset() && Ty && Ty->isSized() &&
+        AddrSpace != AMDGPUAS::LOCAL_ADDRESS &&
+        AddrSpace != AMDGPUAS::REGION_ADDRESS) {
+      TypeSize StoreSize = getDataLayout().getTypeStoreSize(Ty);
+      if (!StoreSize.isScalable() &&
----------------
arsenm wrote:

It still needs to not crash on a scalable vector.

https://github.com/llvm/llvm-project/pull/184138