[PATCH] D49274: [CUDA] Provide integer SIMD functions for CUDA-9.2

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 12 16:14:19 PDT 2018


tra created this revision.
tra added reviewers: jlebar, bkramer.
Herald added subscribers: bixia, sanjoy.

CUDA-9.2 made all integer SIMD functions into compiler builtins,
so clang no longer has access to the implementation of these
functions in either the headers or libdevice and has to provide
its own implementation.

This is mostly a 1:1 mapping to the corresponding PTX instructions,
with the exception of vhadd2/vhadd4, which don't have an equivalent
instruction and had to be implemented with a bit hack.
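
For illustration, a per-lane halving add can be written without
widening the lanes by using the identity a + b = 2*(a & b) + (a ^ b).
The block below is a minimal host-side C++ sketch of that idea for the
unsigned case only. It is not necessarily the code in this patch; the
function names vhaddu2_sketch and vhaddu4_sketch are hypothetical and
do not appear in the CUDA headers.

```cpp
#include <cstdint>

// Hypothetical helper: floor((a + b) / 2) on each of the two 16-bit
// unsigned lanes packed in a 32-bit word.
// (a & b) holds the bits set in both operands, which contribute fully
// to the average. (a ^ b) holds the bits set in only one operand,
// which contribute half. The mask clears the top bit of each lane
// after the shift so that a bit shifted out of the upper lane cannot
// leak into the lower one.
constexpr std::uint32_t vhaddu2_sketch(std::uint32_t a, std::uint32_t b) {
  return (a & b) + (((a ^ b) >> 1) & 0x7FFF7FFFu);
}

// Same idea for four 8-bit unsigned lanes.
constexpr std::uint32_t vhaddu4_sketch(std::uint32_t a, std::uint32_t b) {
  return (a & b) + (((a ^ b) >> 1) & 0x7F7F7F7Fu);
}
```

Each lane result is at most the lane maximum, so the final addition
never carries from one lane into the next. The signed variants would
need different handling of the per-lane sign bit and are not covered
by this sketch.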

Performance of this implementation will be suboptimal for SM_50
and newer GPUs, where PTXAS generates noticeably worse code for
the SIMD instructions than it does for the inline assembly
generated by nvcc (or the inline assembly that used to come with
the CUDA headers).


https://reviews.llvm.org/D49274

Files:
  clang/lib/Headers/__clang_cuda_device_functions.h
  clang/lib/Headers/__clang_cuda_libdevice_declares.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D49274.155302.patch
Type: text/x-patch
Size: 16122 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20180712/c10b5822/attachment.bin>