[PATCH] D21337: [CUDA] Give templated device functions internal linkage, templated kernels external linkage.
Justin Lebar via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 14 12:55:41 PDT 2016
jlebar created this revision.
jlebar added a reviewer: rsmith.
jlebar added subscribers: tra, cfe-commits.
This lets LLVM perform IPO over these functions. In particular, it
allows LLVM to emit ld.global.nc for loads to __restrict pointers in
kernels that are never written to.
http://reviews.llvm.org/D21337
Files:
lib/CodeGen/CodeGenModule.cpp
test/CodeGenCUDA/ptx-kernels.cu
Index: test/CodeGenCUDA/ptx-kernels.cu
===================================================================
--- test/CodeGenCUDA/ptx-kernels.cu
+++ test/CodeGenCUDA/ptx-kernels.cu
@@ -19,11 +19,11 @@
// Make sure host-instantiated kernels are preserved on device side.
template <typename T> __global__ void templated_kernel(T param) {}
-// CHECK-DAG: define weak_odr void @_Z16templated_kernelIiEvT_(
+// CHECK-DAG: define void @_Z16templated_kernelIiEvT_(
namespace {
__global__ void anonymous_ns_kernel() {}
-// CHECK-DAG: define weak_odr void @_ZN12_GLOBAL__N_119anonymous_ns_kernelEv(
+// CHECK-DAG: define void @_ZN12_GLOBAL__N_119anonymous_ns_kernelEv(
}
void host_function() {
Index: lib/CodeGen/CodeGenModule.cpp
===================================================================
--- lib/CodeGen/CodeGenModule.cpp
+++ lib/CodeGen/CodeGenModule.cpp
@@ -2679,9 +2679,18 @@
// explicit instantiations can occur in multiple translation units
// and must all be equivalent. However, we are not allowed to
// throw away these explicit instantiations.
- if (Linkage == GVA_StrongODR)
- return !Context.getLangOpts().AppleKext ? llvm::Function::WeakODRLinkage
- : llvm::Function::ExternalLinkage;
+ //
+ // We don't currently support CUDA device code spread out across multiple TUs,
+ // so say that CUDA templates are either external (for kernels) or internal.
+ // This lets llvm perform aggressive inter-procedural optimizations.
+ if (Linkage == GVA_StrongODR) {
+ if (Context.getLangOpts().AppleKext)
+ return llvm::Function::ExternalLinkage;
+ if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice)
+ return D->hasAttr<CUDAGlobalAttr>() ? llvm::Function::ExternalLinkage
+ : llvm::Function::InternalLinkage;
+ return llvm::Function::WeakODRLinkage;
+ }
// C++ doesn't have tentative definitions and thus cannot have common
// linkage.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D21337.60728.patch
Type: text/x-patch
Size: 1998 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20160614/cd447117/attachment.bin>
More information about the cfe-commits
mailing list