[all-commits] [llvm/llvm-project] 30eeb7: clang: Use byref for aggregate kernel arguments

Matt Arsenault via All-commits all-commits at lists.llvm.org
Thu Aug 6 12:52:45 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 30eeb742f1d11d7a7036e3b8a3bffc1dfd252082
      https://github.com/llvm/llvm-project/commit/30eeb742f1d11d7a7036e3b8a3bffc1dfd252082
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2020-08-06 (Thu, 06 Aug 2020)

  Changed paths:
    M clang/include/clang/CodeGen/CGFunctionInfo.h
    M clang/lib/CodeGen/CGCall.cpp
    M clang/lib/CodeGen/TargetInfo.cpp
    M clang/test/CodeGenCUDA/kernel-args.cu
    M clang/test/CodeGenOpenCL/amdgpu-abi-struct-coerce.cl

  Log Message:
  -----------
  clang: Use byref for aggregate kernel arguments

Add address space to indirect abi info and use it for kernels.

Previously, indirect arguments assumed a stack-passed object
in the alloca address space using byval. A stack pointer is unsuitable
for kernel arguments, which are passed in a separate, constant buffer
with a different address space.

Start using the new byref for aggregate kernel arguments. Previously
these were emitted as raw struct arguments, and turned into loads in
the backend. These will lower identically, although with byref you now
have the option of applying an explicit alignment. In the future, a
reasonable implementation would use byref for all kernel arguments
(this would be a practical problem at the moment due to losing things
like noalias on pointer arguments).
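
As a rough illustration of the distinction described above, the two
indirect-passing conventions can be modeled as a small record that
carries an address space alongside the passing kind. This is only a
sketch: IndirectArgModel, IndirectKind, and classifyAggregate are
hypothetical names invented for this example and are not the clang
ABIArgInfo API. The address-space numbers follow the AMDGPU numbering
(5 for private, 4 for constant), and the model ignores the HIP and
OpenCL exceptions noted later in this message.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the real clang classes.
enum class IndirectKind { ByVal, ByRef };

struct IndirectArgModel {
  IndirectKind kind;   // how the pointee is passed
  unsigned addrSpace;  // address space of the pointer argument
  unsigned align;      // explicit alignment attached to the argument
};

// AMDGPU address-space numbers, used here purely as example values.
constexpr unsigned kPrivateAS = 5;   // stack / alloca address space
constexpr unsigned kConstantAS = 4;  // constant kernel-argument buffer

// Simplified classification: kernels read aggregates from the constant
// argument buffer (byref); ordinary functions receive a stack copy (byval).
inline IndirectArgModel classifyAggregate(bool isKernel, unsigned align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "alignment must be a power of two");
  if (isKernel)
    return {IndirectKind::ByRef, kConstantAS, align};
  return {IndirectKind::ByVal, kPrivateAS, align};
}
```

Under these assumptions, classifyAggregate(true, 8) describes a byref
pointer into the constant buffer, while classifyAggregate(false, 8)
describes the older byval stack object.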

This is mostly to avoid fighting the optimizer's treatment of
aggregate load/store. SROA and instcombine both turn aggregate loads
and stores into a long sequence of element loads and stores, rather
than the optimizable memcpy I would expect in this situation. Now an
explicit memcpy is introduced up front, which the optimizer understands
better and which helps eliminate the alloca in more situations.
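
The difference between the two copy shapes can be sketched on the host
side in plain C++. This is an analogy only, not the code clang emits:
Params, copyElementwise, and copyWithMemcpy are hypothetical names, and
the "kernarg buffer" is just an ordinary pointer here. The first
function mimics an aggregate load/store expanded into per-element
copies; the second mimics a byref-style argument followed by a single
explicit memcpy into a local.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical aggregate standing in for a kernel argument.
struct Params {
  int n;
  float scale;
  float bias[4];
};

// Element-by-element copy, roughly the shape an aggregate load/store
// takes once it has been expanded into individual loads and stores.
inline Params copyElementwise(const Params &src) {
  Params dst;
  dst.n = src.n;
  dst.scale = src.scale;
  for (std::size_t i = 0; i < 4; ++i)
    dst.bias[i] = src.bias[i];
  return dst;
}

// byref-style model: the callee gets a pointer to the argument buffer
// and performs one explicit memcpy into a local copy.
inline Params copyWithMemcpy(const void *kernargBuffer) {
  static_assert(std::is_trivially_copyable<Params>::value,
                "memcpy requires a trivially copyable type");
  assert(kernargBuffer != nullptr);
  Params dst;
  std::memcpy(&dst, kernargBuffer, sizeof(Params));
  return dst;
}
```

Both functions yield the same values; the point of the sketch is only
that the second form is a single bulk copy, which is the pattern the
message says is easier to optimize.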

This skips using byref in the case where HIP kernel pointer arguments
in structs are promoted to global pointers. At minimum an additional
patch is needed to allow coercion with indirect arguments. This also
skips using it for OpenCL due to the current workaround used to
support kernels calling kernels. Distinct function bodies would need
to be generated up front instead of emitting an illegal call.



