[PATCH] D19990: [CUDA] Implement __ldg using intrinsics.

Thu May 5 13:02:30 PDT 2016

jlebar created this revision.
jlebar added reviewers: tra, rsmith.
jlebar added subscribers: cfe-commits, jhen.
Herald added a subscriber: jholewinski.

Previously, __ldg was implemented as inline asm in the CUDA headers.

This change allows us to use the [addr+imm] addressing mode when
emitting ld.global.nc instructions.  This translates into a 1.3x
speedup on some benchmarks that call __ldg from within an unrolled
loop.
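
For illustration only, here is a minimal sketch of the kind of code that
could benefit.  The kernel `sum4` and the host helper `count_mismatches`
are hypothetical names, not part of this patch or its benchmarks.  After
unrolling, the four __ldg calls read from one base pointer plus constant
offsets, which is the pattern that can be folded into [addr+imm]; with an
inline-asm __ldg, each address would presumably have to be materialized
in its own register first.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (not from the patch): each thread sums four
// consecutive floats, loading each one through __ldg.
// __ldg needs a GPU with compute capability 3.5 or newer.
__global__ void sum4(const float *in, float *out, int num_out) {
  int t = blockIdx.x * blockDim.x + threadIdx.x;
  if (t >= num_out) return;
  const float *p = in + 4 * t;
  float acc = 0.0f;
#pragma unroll
  for (int i = 0; i < 4; ++i) {
    // Once unrolled, these loads address p+0, p+4, p+8 and p+12 bytes:
    // a single base register plus a constant offset.
    acc += __ldg(p + i);
  }
  out[t] = acc;
}

// Host-side check: runs the kernel once and returns the number of wrong
// outputs, or -1 if a CUDA call fails.
int count_mismatches() {
  const int num_out = 256;
  std::vector<float> h_in(4 * num_out), h_out(num_out);
  for (int i = 0; i < 4 * num_out; ++i) h_in[i] = static_cast<float>(i);

  float *d_in = nullptr, *d_out = nullptr;
  if (cudaMalloc(&d_in, h_in.size() * sizeof(float)) != cudaSuccess)
    return -1;
  if (cudaMalloc(&d_out, h_out.size() * sizeof(float)) != cudaSuccess) {
    cudaFree(d_in);
    return -1;
  }
  cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(float),
             cudaMemcpyHostToDevice);

  sum4<<<(num_out + 127) / 128, 128>>>(d_in, d_out, num_out);

  // This blocking copy also reports any error from the kernel launch.
  cudaError_t err = cudaMemcpy(h_out.data(), d_out,
                               h_out.size() * sizeof(float),
                               cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return -1;

  int bad = 0;
  for (int t = 0; t < num_out; ++t) {
    // in[4t .. 4t+3] holds 4t, 4t+1, 4t+2, 4t+3, so the sum is 16t + 6.
    if (h_out[t] != 16.0f * t + 6.0f) ++bad;
  }
  return bad;
}
```

Calling `count_mismatches()` from a `main` and checking for 0 is enough
to confirm the kernel runs; whether the loads actually use [addr+imm]
has to be checked by inspecting the generated PTX.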

http://reviews.llvm.org/D19990

Files:
  include/clang/Basic/BuiltinsNVPTX.def
  lib/CodeGen/CGBuiltin.cpp
  lib/Headers/CMakeLists.txt
  lib/Headers/__clang_cuda_intrinsics.h
  lib/Headers/__clang_cuda_runtime_wrapper.h
  test/CodeGen/builtins-nvptx.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D19990.56331.patch
Type: text/x-patch
Size: 21799 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20160505/9f72fee8/attachment-0001.bin>