[PATCH] D94153: [AMDGPU][Inliner] Remove amdgpu-inline and add new TTI inline hooks

Wed Jan 6 10:59:24 PST 2021

rnk added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp:1152
+
+bool GCNTTIImpl::mustNotInline(const CallBase *CB) const {
+  const Function *Caller = CB->getCaller();
----------------
arsenm wrote:
> aeubanks wrote:
> > rampitec wrote:
> > > arsenm wrote:
> > > > rampitec wrote:
> > > > > aeubanks wrote:
> > > > > > arsenm wrote:
> > > > > > > aeubanks wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > I'm not sure I like having a target hook for a hack like this
> > > > > > > > Would you rather just remove this altogether?
> > > > > > > I don't remember the story here. @rampitec @vpykhtin @dfukalov ?
> > > > > > It was added in https://reviews.llvm.org/D62917.
> > > > > > Apparently it's a hack to help with compile times. That's a bit surprising, is this an AMDGPU specific issue?
> > > > > It is AMDGPU specific only in a sense. We tend to inline a lot, much more than other targets. Therefor we can have drastic compile time issues. In particular there are several pretty big codes which compile hours instead of one or two minutes without this (suboptimal) cutoff.
> > > > Is it possible this was only due to a bug in the subtarget feature compatibility handling? I believe the standard inline analysis specifically looks for wrappers as well
> > > It did not work this way at least back when it was added. I think we should not change the logic with this patch. This patch is infrastructural, if we want to tune heuristics that would be a separate work.
> > That's fair. I'll keep everything for now.
> > 
> > @arsenm Any thoughts on alternatives to target hooks?
> I think we should first try to remove this part and see how it goes
I think it makes sense to avoid building infrastructure (new TTI hooks) if it turns out that it is never needed. It's much harder to remove hooks than it is to add them. However, we want to make sure that Arthur can make progress flipping the default pass manager without doing any AMDGPU tuning work. That's clearly out of scope for him.

We could proceed with a minimal set of inliner hooks, flip the default pass manager, and allow downstream AMD folks to sort out any performance problems later.  AMD or any other vendor can configure CMake to use the old pass manager if they aren't ready to triage the performance impacts of the NPM. Would that be OK?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94153/new/

https://reviews.llvm.org/D94153