[PATCH] D19990: [CUDA] Implement __ldg using intrinsics.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Thu May 5 13:02:30 PDT 2016
jlebar created this revision.
jlebar added reviewers: tra, rsmith.
jlebar added subscribers: cfe-commits, jhen.
Herald added a subscriber: jholewinski.
Previously, __ldg was implemented as inline asm in the CUDA headers.
Implementing it with intrinsics allows us to use the [addr+imm]
addressing mode for ld.global.nc instructions. This translates into a
1.3x speedup on some benchmarks that call __ldg from within an
unrolled loop.
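To illustrate the kind of code that could benefit, here is a minimal sketch of an unrolled loop that reads through __ldg. The kernel name sum4 and the host helper run_sum4_first are hypothetical and are not taken from the patch or the benchmarks; the sketch assumes a target of sm_35 or newer, where __ldg is available. After unrolling, the four loads are at constant offsets from a single base pointer, which is the pattern where an [addr+imm] operand can replace a separately computed address.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each thread sums four consecutive floats, loading each
// one with __ldg. Once the loop is unrolled, the loads are at constant offsets
// (p+0 .. p+3) from one base pointer.
__global__ void sum4(const float *__restrict__ in, float *out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (4 * i + 3 >= n) return;  // skip threads without a full group of four
  const float *p = in + 4 * i;
  float acc = 0.0f;
#pragma unroll
  for (int k = 0; k < 4; ++k)
    acc += __ldg(p + k);
  out[i] = acc;
}

// Illustrative host helper: copies n floats to the device, runs sum4, and
// returns the sum of the first group of four. Error checking is omitted.
float run_sum4_first(const float *host_in, int n) {
  float *d_in = nullptr, *d_out = nullptr;
  int groups = n / 4;
  cudaMalloc(&d_in, n * sizeof(float));
  cudaMalloc(&d_out, groups * sizeof(float));
  cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
  sum4<<<(groups + 127) / 128, 128>>>(d_in, d_out, n);
  float first = 0.0f;
  cudaMemcpy(&first, d_out, sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  return first;
}
```

Whether the immediate-offset form is actually selected depends on the compiler version and optimization level; inspecting the generated PTX for ld.global.nc is the way to check.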
http://reviews.llvm.org/D19990
Files:
include/clang/Basic/BuiltinsNVPTX.def
lib/CodeGen/CGBuiltin.cpp
lib/Headers/CMakeLists.txt
lib/Headers/__clang_cuda_intrinsics.h
lib/Headers/__clang_cuda_runtime_wrapper.h
test/CodeGen/builtins-nvptx.c
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D19990.56331.patch
Type: text/x-patch
Size: 21799 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20160505/9f72fee8/attachment-0001.bin>