[PATCH] D55067: [HIP] Fix offset of kernel argument for AMDGPU target

Tue Dec 4 11:57:02 PST 2018

arsenm added a comment.

In D55067#1318959 <https://reviews.llvm.org/D55067#1318959>, @rjmccall wrote:

> In D55067#1318810 <https://reviews.llvm.org/D55067#1318810>, @arsenm wrote:
>
> > In D55067#1313419 <https://reviews.llvm.org/D55067#1313419>, @rjmccall wrote:
> >
> > > I understand that it's copied into a properly-aligned local variable, but if it affects how the function is called, that's also part of the ABI, and it should be taken from the C alignment, not any vagaries of how struct types are translated into IR.
> > >
> > > The other reason to stick with the C alignment value is that it's actually stable across compiler reasons, whereas IRGen does not promise to use the same IR type for a struct across compiler versions.
> > >
> > > Now, you're a GPU compiler, so maybe you don't have any ABI requirements, or maybe they're weaker for kernel functions, in which case this is basically irrelevant.  But if you do care about compatibility here, you should be aiming to communicate the alignment of this parameter correctly.
> > >
> > > This is part of why we largely try to avoid using arbitrary first-class aggregate as parameters in IR.
> >
> >
> > byval isn't suitable for the purposes of the entry point here, so I don't see what the alternative is with the current constraints. We've defined the ABI input buffer as just the ABI allocation size/alignment of the IR type in order. Clang needs to get this to match what it wants.
>
>
> What I have said, repeatedly, is that that is not a good ABI rule because the alignment of the IR type is not guaranteed to be meaningful or even stable.  Maybe there are good reasons why you don't care about having a stable ABI, but I need you to at least acknowledge that that's what you're doing, because otherwise it is hard for me to approve a patch that I know has this problem.

I don't understand how the alignment of the IR type isn't meaningful or stable. The DataLayout gives us the concept of an ABI alignment, but you seem to be saying this is meaningless (in which case why do we have it?)

I think we have different understandings of what an ABI means in this case. The "call" to a kernel isn't made in a vacuum unlike a normal function. For OpenCL at least, the backend produces metadata in the object file telling the explicit offsets of the individual arguments, which we today compute from the IR argument list. The actual caller isn't using the types to decide where to place arguments, the binary is explicitly telling it where. For some reason CUDA seems to be doing something different here which might be part of my confusion?

> 
> 
>> The alternative I would like is to stop using the IR argument list at all for kernels. All arguments would be accessed from offsets from an intrinsic call
> 
> That seems like a very reasonable alternative, or you could have the function take a single pointer argument and do all accesses relative to that.  I'll leave that up to you.

We would need to make up some other way to track the individual arguments, which I haven't come up with a great solution for (other than keeping them there without uses, which seems pretty awkward). As an optimization we currently do lower the kernel argument uses to look like this, but we compute the offsets from the IR arguments.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55067/new/

https://reviews.llvm.org/D55067