[PATCH] D11993: [CUDA] Make sure we emit all templated __global__ functions on device side. Again.

Artem Belevich via cfe-commits cfe-commits at lists.llvm.org
Wed Aug 12 14:30:57 PDT 2015

tra created this revision.
tra added reviewers: echristo, rsmith, eliben.
tra added a subscriber: cfe-commits.

This is a somewhat different way to do it than D11666 which got rolled back.

Codegen postpones emitting instantiated kernel function template until it's used.
If kernel is used only from the host side (which is normally the case) we'll never emit 
it because on device side we don't emit the host code that uses it.

The change allows CUDA kernels to be emitted on device side unconditionally.
It's overly conservative and may emit more functions than we really need, but it 
guarantees that the kernels launched from the host side are do exist on device-side.
In case it ever causes issues, there are other ways to address the issue, 
though they are more invasive and are currently not worth the trouble.



Index: test/CodeGenCUDA/ptx-kernels.cu
--- test/CodeGenCUDA/ptx-kernels.cu
+++ test/CodeGenCUDA/ptx-kernels.cu
@@ -1,3 +1,7 @@
+// Make sure that __global__ functions are emitted along with correct
+// annotations and are added to @llvm.used to prevent their elimination.
+// REQUIRES: nvptx-registered-target
 // RUN: %clang_cc1 %s -triple nvptx-unknown-unknown -fcuda-is-device -emit-llvm -o - | FileCheck %s
 #include "Inputs/cuda.h"
@@ -13,4 +17,10 @@
+// Make sure host-instantiated kernels are preserved on device side.
+template <typename T> __global__ void templated_kernel(T param) {}
+// CHECK-LABEL: define linkonce_odr void @_Z16templated_kernelIiEvT_
+void host_function() { templated_kernel<<<0,0>>>(0); }
 // CHECK: !{{[0-9]+}} = !{void ()* @global_function, !"kernel", i32 1}
+// CHECK: !{{[0-9]+}} = !{void (i32)* @_Z16templated_kernelIiEvT_, !"kernel", i32 1}
Index: lib/AST/ASTContext.cpp
--- lib/AST/ASTContext.cpp
+++ lib/AST/ASTContext.cpp
@@ -8336,7 +8336,19 @@
     // Constructors and destructors are required.
     if (FD->hasAttr<ConstructorAttr>() || FD->hasAttr<DestructorAttr>())
       return true;
+    // Force all CUDA kernels to be emitted on device side.
+    // Otherwise, templated kernels may never be emitted as they are
+    // only used from host-side code which we never emit on device
+    // side and which therefore would never trigger us to emit
+    // device-side kernel it might've instantiated. The trade-off is
+    // that emitting all kernels is over-conservative and we may emit
+    // more of them than necessary. If excess of generated GPU code
+    // becomes a problem we can revisit this.
+    if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
+        FD->hasAttr<CUDAGlobalAttr>())
+      return true;
     // The key function for a class is required.  This rule only comes
     // into play when inline functions can be key functions, though.
     if (getTargetInfo().getCXXABI().canKeyFunctionBeInline()) {

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D11993.31973.patch
Type: text/x-patch
Size: 2159 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150812/4d7aef1d/attachment.bin>

More information about the cfe-commits mailing list