[PATCH] D140242: [AMDGPU] Modify adjustInliningThreshold to also consider the cost of passing function arguments through the stack

Thu Jan 12 11:54:13 PST 2023

scchan added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1204
+  adjustThreshold += std::max(0, SGPRsInUse - 26) * ArgStackInlinePenalty;
+  adjustThreshold += std::max(0, VGPRsInUse - 32) * ArgStackInlinePenalty;
+  return adjustThreshold;
----------------
JanekvO wrote:
> scchan wrote:
> > arsenm wrote:
> > > scchan wrote:
> > > > I guess it's subtracting the number of clobbered registers. Instead of a hardcoded value, could that be replaced by something more meaningful, like a const variable or a getter?
> > > > 
> > > > Also shouldn't VGPRs have a higher penalty relative to SGPRs since they'd occupy more stack space?
> > > We only sort of handle SGPR arguments today, and not for compute. We also do not currently implement the optimization of packing SGPRs into a VGPR for the argument spill.
> > I wasn't paying attention to the comments for ArgStackInlinePenalty.  The cost model is only based on the number of instructions and it doesn't take storage into account.
> I couldn't infer what measurement unit the inliner cost/threshold uses, so I used a cost model relative to the cost of a single instruction. Do let me know if the storage cost should be considered (and, if so, with what weight).
I was thinking about how the stack size contributes to the overall scratch size, since that may add a penalty to the launch overhead.  Wouldn't a VGPR store take more stack space than an SGPR store and therefore have a higher relative cost?  I don't know how to model it at the moment; I'm only suggesting it as something to consider.
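
To make the two suggestions concrete, here is a rough standalone sketch of the computation in the hunk above, with the literals 26 and 32 pulled out into named constants and with separate per-register penalties. Everything new here is an assumption for illustration: the names SGPRArgLimit, VGPRArgLimit, SGPRStackPenalty, VGPRStackPenalty and stackArgPenalty are hypothetical, the penalty values are placeholders rather than measurements, and what 26 and 32 actually represent should be confirmed against the patch.

```cpp
#include <algorithm>

// Hypothetical names for the literals 26 and 32 used in the hunk above.
// Their exact meaning is not stated in this thread.
constexpr int SGPRArgLimit = 26;
constexpr int VGPRArgLimit = 32;

// Hypothetical per-register penalties, in the same "cost of one
// instruction" unit as ArgStackInlinePenalty. The 1:2 ratio is only a
// placeholder to show a VGPR being weighted more heavily than an SGPR.
constexpr int SGPRStackPenalty = 1;
constexpr int VGPRStackPenalty = 2;

// Returns the extra amount to add to the inlining threshold for
// registers in use beyond the limits above.
int stackArgPenalty(int SGPRsInUse, int VGPRsInUse) {
  int Penalty = 0;
  Penalty += std::max(0, SGPRsInUse - SGPRArgLimit) * SGPRStackPenalty;
  Penalty += std::max(0, VGPRsInUse - VGPRArgLimit) * VGPRStackPenalty;
  return Penalty;
}
```

With these placeholder values, a call that uses 30 SGPRs and 40 VGPRs would add 4 * 1 + 8 * 2 = 20 to the threshold, while a call at or below both limits adds nothing. Whether a separate VGPR weight is justified depends on how the storage cost should be modeled, which is still open.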

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140242/new/

https://reviews.llvm.org/D140242