[PATCH] D118084: [CUDA, NVPTX] Pass byval aggregates directly

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 24 15:37:39 PST 2022


tra created this revision.
tra added reviewers: jdoerfert, yaxunl.
Herald added subscribers: asavonic, bixia.
tra requested review of this revision.
Herald added a project: clang.

Changes the NVPTX ABI to pass aggregates directly.  Only clang-generated IR is
affected. The change does not alter function signatures in the generated PTX.

Discussion: https://llvm.discourse.group/t/nvptx-calling-convention-for-aggregate-arguments-passed-by-value

Currently the NVPTX ABI passes aggregate values indirectly, as a byval pointer.
When we need to pass a *value*, LLVM has to store it in an alloca so that it has
a pointer to pass on. This is a double whammy for NVPTX. LLVM often fails to
eliminate that alloca (SROA usually considers such a pointer escaped and gives
up), and that is a noticeable hit on performance. When we lower IR to PTX, the
argument is actually passed by copy, so we end up doing extra work just to
get the value loaded back from the alloca. In short, we do more work for less
performance. Switching to passing aggregates directly allows us to generate
better code.
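
As a rough illustration of the two conventions described above, here is a
minimal host-side C++ sketch. It is not taken from the patch: the Vec3 type and
the function names are hypothetical, and the IR shapes quoted in the comments
are approximations of what clang emits, not exact output. The pointer-taking
function models the indirect (byval) convention, where the caller must
materialize a stack copy; the value-taking function models the direct
convention.

```cpp
// Hypothetical aggregate used only for illustration.
struct Vec3 {
  float x, y, z;
};

// Models the indirect convention: the callee sees a pointer to a copy that
// the caller had to place in a stack slot. In IR this roughly corresponds to
// a parameter such as `%struct.Vec3* byval(%struct.Vec3) align 4 %v`
// (approximate shape, assumed here for illustration).
float sum_indirect(const Vec3 *v) { return v->x + v->y + v->z; }

// Models the direct convention: the aggregate is an ordinary value, roughly
// a `%struct.Vec3 %v` parameter in IR (again an approximate shape).
float sum_direct(Vec3 v) { return v.x + v.y + v.z; }

float caller(float a, float b, float c) {
  // Indirect: the value must live in memory so its address can be passed.
  // Once the address is handed to a call, an optimizer may have to treat the
  // slot as escaped and keep it around.
  Vec3 tmp{a, b, c};
  float r1 = sum_indirect(&tmp);

  // Direct: no address is taken, so the fields can stay in registers.
  float r2 = sum_direct(Vec3{a, b, c});

  return r1 + r2;
}
```

Both callees compute the same result; the difference is only in whether the
caller is forced to create an addressable copy of the argument first.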


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D118084

Files:
  clang/lib/CodeGen/TargetInfo.cpp
  clang/test/CodeGen/nvptx-abi.c
  clang/test/CodeGenCUDA/kernel-args-alignment.cu
  clang/test/CodeGenCUDA/kernel-args.cu
  clang/test/OpenMP/nvptx_unsupported_type_codegen.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D118084.402701.patch
Type: text/x-patch
Size: 4718 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20220124/8ab4dcb1/attachment.bin>

