[PATCH] D99347: [AMDGPU] Set implicit arg attributes for indirect calls

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 29 15:54:32 PDT 2021


arsenm accepted this revision.
arsenm added a comment.
This revision is now accepted and ready to land.

This is good enough as a bandaid until the pass is replaced with the inverse pass



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:278
+  if (CallingConvSupportsAllImplicits &&
+      F.hasAddressTaken(nullptr, true, true, true)) {
+    for (StringRef AttrName : ImplicitAttrNames) {
----------------
I believe kernels can have their address taken, but not be called (e.g. you could have a global table with a bunch of kernel addresses in it)


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAnnotateKernelFeatures.cpp:275
+
+  // If this function hasAddressTaken() = true
+  // then add all attributes corresponding to the implicit args.
----------------
madhur13490 wrote:
> JonChesterfield wrote:
> > madhur13490 wrote:
> > > JonChesterfield wrote:
> > > > Is the reasoning here that an indirect call site to an unknown callee will have set up all the implicit state, so for consistency a function which is ever called indirectly must always have all the state set up?
> > > > 
> > > > I would have hoped the granularity is such that functions can be annotated with only what they need, so that indirect call sites pass (and require) everything but direct call sites don't need to. With consistent register location for implicit values, I would hope that will work out of the box.
> > > > Is the reasoning here that an indirect call site to an unknown callee will have set up all the implicit state, so for consistency a function which is ever called indirectly must always have all the state set up?
> > > 
> > > We are not setting up implicit attributes per call site. This pass works at function granularity and it does not take into account the call site. Implicit attributes are not a property of call site but  the functions themselves. If function's hasAddressTaken() is true, we set all attributes corresponding to the implicits.
> > > 
> > > 
> > > > I would have hoped the granularity is such that functions can be annotated with only what they need, so that indirect call sites pass (and require) everything but direct call sites don't need to. With consistent register location for implicit values, I would hope that will work out of the box.
> > > 
> > > That won't work out. You can't ask call site to pass all implicits when you setup only a subset of them. The arguments setup must be at superset of what is passed. We use attributes on the function (not call site) to setup implicits and thus pass this pass sets them up before we begin codegen so that codegen works out-of-the-box. So your comment - " that functions can be annotated with only what they need" is invalid.
> > > 
> > > 
> > > 
> > > 
> > Argument passing is per call site, derived from on the attributes of the function being called (or a pessimistic superset when the callee is uncertain). What you're doing here makes every call to a function slower, as opposed to only the indirect calls.
> > 
> > The attributes on a function should be derived from the attributes on the functions it calls, and whether any of those call sites is indirect, not based on whether the function itself is called indirectly.
> > Argument passing is per call site, derived from on the attributes of the function being called (or a pessimistic superset when the callee is uncertain). What you're doing here makes every call to a function slower, as opposed to only the indirect calls.
> > 
> > The attributes on a function should be derived from the attributes on the functions it calls, and whether any of those call sites is indirect, not based on whether the function itself is called indirectly.
> 
> I don't think we're on same page here. Our objective is to enable -fixed-abi only for those apps which makes an indirect call and making optimization per call site is not an immediate goal. While per call site optimization is useful, it is appropriate in the world where we have address taken functions and apps making direct as well indirect calls to them. We should handle this in the longer term solution where we want to use Attributor framework.
> 
> Our immediate goal is to minimize the potential regression problem.
Codegen does not attempt to treat implicit arguments per call site, only per function. We could optimize this, but I don't think it would be particularly worthwhile


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99347/new/

https://reviews.llvm.org/D99347



More information about the llvm-commits mailing list