[PATCH] D81311: [RFC] LangRef: Define inmem parameter attribute

Fri Jun 5 15:43:38 PDT 2020

arsenm created this revision.
arsenm added reviewers: rjmccall, jdoerfert, efriedma, t-tye, yaxunl, scott.linder, rnk, spatel, lebedev.ri, nlopes, fhahn, hfinkel, Anastasia.
Herald added subscribers: tpr, wdng.
Herald added a project: LLVM.

This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the argument,
although nothing does this in reality. There is also talk of removing
and replacing the byval attribute, so a new attribute would need to
take its place anyway.

I went with the name inmem to mirror inreg. Other name ideas I had are
"indirect", "abitype", or "pointee". One question I have is whether
this attribute is necessary, or if the definition of preallocated can
be refined to fit this case. The current description is heavy on what
it means for the call site, but I didn't understand the implications
for the callee. For the amdgpu_kernel use case, calls are illegal so
all of the details about call setup are irrelevant.

This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call such a
function in the IR; it is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.
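
As a rough illustration of the copy described above, here is a minimal
C++ sketch. It is not taken from the patch: the Params struct and the
kernelBody function are hypothetical stand-ins, and a const byte pointer
is assumed to model the read-only kernel argument buffer.

```cpp
#include <cstring>

// Hypothetical aggregate kernel parameter (illustration only).
struct Params {
    int count;
    float scale;
};

// Hypothetical stand-in for a kernel body. `kernarg` models the constant
// argument buffer, which must never be written through.
float kernelBody(const unsigned char* kernarg) {
    // The source-level parameter is a mutable variable, so the aggregate is
    // first copied out of the read-only buffer into local storage.
    Params local;
    std::memcpy(&local, kernarg, sizeof(Params));

    // Mutating the local copy is fine; the argument buffer is untouched.
    local.count += 1;
    return local.scale * static_cast<float>(local.count);
}
```

The point of the sketch is only that every write lands in the local
copy, so a later dispatch reading the same buffer sees the original
values.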

The current clang calling convention lowering emits raw values,
including aggregates, into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's no real advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.
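
To make the layout trade-off concrete, here is a small C++ sketch of how
a frontend might assign argument offsets in the flat buffer. It is an
illustration under assumptions, not code from the patch: KernArg,
alignTo, and layoutKernArgs are hypothetical names, and clamping each
argument's alignment to a [minAlign, maxAlign] range is just one
possible policy.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one kernel argument, in bytes.
struct KernArg {
    std::uint32_t size;
    std::uint32_t align;  // natural alignment; must be a power of two
};

// Round `offset` up to the next multiple of `align` (a power of two).
constexpr std::uint32_t alignTo(std::uint32_t offset, std::uint32_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Assign an offset to each argument, clamping its alignment into
// [minAlign, maxAlign]. Optionally reports the total buffer size.
std::vector<std::uint32_t> layoutKernArgs(const std::vector<KernArg>& args,
                                          std::uint32_t minAlign,
                                          std::uint32_t maxAlign,
                                          std::uint32_t* totalSize = nullptr) {
    std::vector<std::uint32_t> offsets;
    std::uint32_t offset = 0;
    for (const KernArg& arg : args) {
        const std::uint32_t align =
            std::min(std::max(arg.align, minAlign), maxAlign);
        offset = alignTo(offset, align);
        offsets.push_back(offset);
        offset += arg.size;
    }
    if (totalSize != nullptr) {
        *totalSize = offset;
    }
    return offsets;
}
```

For arguments of sizes 4, 8, 1, and 8 bytes with matching natural
alignments, layoutKernArgs(args, 1, 16) yields offsets 0, 8, 16, 24 and
a 32-byte buffer. Clamping every alignment to exactly 4 with
layoutKernArgs(args, 4, 4) yields offsets 0, 4, 12, 16 and a 24-byte
buffer: the 8-byte arguments are packed more tightly, while the 1-byte
argument is padded out to a 4-byte boundary.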

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is that
there are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744 <https://reviews.llvm.org/D79744>.


https://reviews.llvm.org/D81311

Files:
  llvm/docs/LangRef.rst
  llvm/docs/ReleaseNotes.rst


Index: llvm/docs/ReleaseNotes.rst
===================================================================

--- llvm/docs/ReleaseNotes.rst
+++ llvm/docs/ReleaseNotes.rst
@@ -74,6 +74,9 @@
   information. This information is used to represent Fortran modules debug
   info at IR level.
 
+* Added the ``inmem`` attribute to better represent argument passing
+  for the `amdgpu_kernel` calling convention.
+
 Changes to building LLVM
 ------------------------
 
@@ -134,6 +137,9 @@
   retain the old behavior should explicitly request f32 denormal
   flushing.
 
+* The new ``inmem`` attribute is now the preferred method for
+  representing aggregate kernel arguments.
+
 Changes to the AVR Target
 -----------------------------
 
Index: llvm/docs/LangRef.rst
===================================================================
--- llvm/docs/LangRef.rst
+++ llvm/docs/LangRef.rst
@@ -1066,6 +1066,28 @@
     site. If the alignment is not specified, then the code generator
     makes a target-specific assumption.
 
+.. _attr_inmem:
+
+``inmem(<ty>)``
+
+    The ``inmem`` argument attribute allows specifying the pointee
+    memory type of an argument for ABI purposes. This is similar to
+    ``byval``, but does not imply a copy is made anywhere, or that the
+    argument is passed on the stack. This implies the pointer is
+    dereferenceable up to the storage size of the type.
+
+    It is not generally permissible to introduce a write to an
+    ``inmem`` pointer, unless it is known this will not produce an
+    observable change in the caller. The pointer may have any address
+    space and may be read only.
+
+    This is not a valid attribute for return values.
+
+    The alignment for an ``inmem`` parameter can be explicitly
+    specified by combining it with the ``align`` attribute, similar to
+    ``byval``. If the alignment is not specified, then the code generator
+    makes a target-specific assumption.
+
 .. _attr_preallocated:
 
 ``preallocated(<ty>)``

