[PATCH] D140242: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack

Fri Dec 16 12:17:24 PST 2022

arsenm requested changes to this revision.
arsenm added inline comments.
This revision now requires changes to proceed.

================
Comment at: llvm/lib/Analysis/InlineCost.cpp:164-166
+static cl::opt<bool> DisableInlineSimplification(
+    "inline-disable-simplification", cl::Hidden, cl::init(false),
+    cl::desc("Disables instruction simplification during inlining"));
----------------
Not sure why you're adding this, but it doesn't belong in this patch.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:86-88
+static cl::opt<unsigned> ArgStackInlinePenalty(
+    "amdgpu-inline-arg-stack-cost", cl::Hidden, cl::init(15),
+    cl::desc("Cost per argument for function arguments passed through stack"));
----------------
Should be able to compute this directly from the existing costs for stack stores
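
A rough sketch of the idea, as a standalone model with no LLVM headers: instead of a fixed per-argument constant (the patch defaults it to 15), the penalty scales with how many dwords end up on the stack and with the cost of the store in the caller plus the reload in the callee. All names below (StackAccessCosts, stackArgPenalty) are hypothetical, and the cost values are placeholders. In the real code the store and load costs would presumably come from the existing cost model (for example a TTI memory-op cost query) rather than being passed in by hand.

```cpp
// Standalone illustration only; every name here is hypothetical.
struct StackAccessCosts {
  unsigned StoreCost; // cost of one dword store to the stack (caller side)
  unsigned LoadCost;  // cost of one dword reload from the stack (callee side)
};

// Penalty for arguments that spill to the stack: each stack dword is
// stored once by the caller and loaded once by the callee.
constexpr unsigned stackArgPenalty(unsigned StackDwords, StackAccessCosts C) {
  return StackDwords * (C.StoreCost + C.LoadCost);
}
```

With placeholder costs of 4 for both the store and the load, three stack dwords would give a penalty of 3 * (4 + 4) = 24, and a call whose arguments all fit in registers would get no penalty at all.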

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1189-1191
+  // Outer kernel functions can't be inlined.
+  if (llvm::AMDGPU::isKernelCC(Callee))
+    return 0;
----------------
No reason to specially consider them?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1198
+    if (AMDGPU::isArgPassedInSGPR(&A))
+      SGPRsInUse++;
+    else
----------------
Raw argument counts don't correspond to register counts; you need to get the type-legalized register size.
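
To illustrate why counting arguments undercounts registers, here is a simplified standalone model. It assumes every value occupies a whole number of 32-bit registers, i.e. ceil(bits / 32). The helper name numDwordRegs is hypothetical, and real type legalization has more cases than this (packed 16-bit elements, for instance), so in the patch the count should come from the legalizer's view of the type rather than from a formula like this.

```cpp
// Standalone illustration only; numDwordRegs is a hypothetical helper.
// Assumption: a value of SizeInBits bits needs ceil(SizeInBits / 32)
// 32-bit registers. Actual type legalization may differ.
constexpr unsigned numDwordRegs(unsigned SizeInBits) {
  return (SizeInBits + 31) / 32;
}

// Three arguments (i32, i64, <4 x i32>) take 1 + 2 + 4 = 7 registers
// under this model, not 3.
static_assert(numDwordRegs(32) + numDwordRegs(64) + numDwordRegs(128) == 7,
              "register count differs from argument count");
```

Under this model, incrementing SGPRsInUse by one per argument would report 3 registers for the example above, while the register budget actually consumed is 7.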

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1234
   }
-  if (AllocaSize)
-    return ArgAllocaCost;
-  return 0;
+  adjustThreshold += adjustInlinigThresholdUsingCallee(CB->getCalledFunction());
+  adjustThreshold += AllocaSize ? ArgAllocaCost : AllocaSize;
----------------
Typo: adjustInlinigThresholdUsingCallee should be adjustInliningThresholdUsingCallee.

================
Comment at: llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll:18-21
+  %and = and i32 %shr, %shl
+  %shr1 = lshr i32 %y0, %and
+  %shr2 = lshr i32 %shr1, %t0
+  %shl3 = shl i32 %e1, %w0
----------------
Use pass arguments or flags to set the thresholds so the test doesn't need so many instructions.

================
Comment at: llvm/test/Transforms/Inline/AMDGPU/amdgpu-inline-stack-argument.ll:2027
+  %arrayidx = getelementptr inbounds i32, ptr %in, i64 0
+  %0 = load i32, ptr %arrayidx, align 4
+  %arrayidx1 = getelementptr inbounds i32, ptr %in, i64 1
----------------
Don't use anonymous values in tests

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140242/new/

https://reviews.llvm.org/D140242