[PATCH] D55067: [HIP] Fix offset of kernel argument for AMDGPU target

John McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 4 14:17:16 PST 2018


rjmccall added a comment.

In D55067#1319047 <https://reviews.llvm.org/D55067#1319047>, @arsenm wrote:

> I don't understand how the alignment of the IR type isn't meaningful or stable. The DataLayout gives us the concept of an ABI alignment, but you seem to be saying this is meaningless (in which case why do we have it?)


LLVM has stable rules for computing the alignment of a particular LLVM struct type.  Clang IRGen does not have stable rules for turning a Clang struct type into an LLVM struct type.  Clang always reserves the right to translate a particular source-language struct type into, say, `{ [8 x i8] }` instead of `{ i32, i32 }`.  This is unlikely for sufficiently simple types just because it's nice for debugging and for the compactness of the generated code if the generated struct type is reasonably similar to the original type, but there are complex cases where the translation is not totally obvious, and we don't want to make promises.  Because the translation to LLVM IR type is not stable, the choice of alignment for the LLVM IR type cannot be guaranteed either.

> I think we have different understandings of what an ABI means in this case. The "call" to a kernel isn't made in a vacuum unlike a normal function. For OpenCL at least, the backend produces metadata in the object file telling the explicit offsets of the individual arguments, which we today compute from the IR argument list. The actual caller isn't using the types to decide where to place arguments, the binary is explicitly telling it where. For some reason CUDA seems to be doing something different here which might be part of my confusion?

If you don't guarantee stability across recompilations for the size and layout of this buffer, and neither callers nor callees make assumptions about the alignment of pointers within it, then you might indeed not have any ABI requirements.  But in that case you might as well just use a packed structure so that you can minimize the number of bytes you have to transfer.

>>> The alternative I would like is to stop using the IR argument list at all for kernels. All arguments would be accessed from offsets from an intrinsic call
>> 
>> That seems like a very reasonable alternative, or you could have the function take a single pointer argument and do all accesses relative to that.  I'll leave that up to you.
> 
> We would need to make up some other way to track the individual arguments, which I haven't come up with a great solution for (other than keeping them there without uses, which seems pretty awkward). As an optimization we currently do lower the kernel argument uses to look like this, but we compute the offsets from the IR arguments.

Okay.  It seems to me like there should be exactly one place responsible for computing the layout of this buffer.  Like, if you really don't care about alignment, you could pack it; or if you do care about alignment but don't need stability, you could reorder the fields to minimize alignment padding; or really anything along those lines.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55067/new/

https://reviews.llvm.org/D55067





More information about the cfe-commits mailing list