[PATCH] D14370: [doc] Compile CUDA with LLVM

Thu Nov 5 16:58:42 PST 2015

tra added inline comments.

================
Comment at: docs/CompileCudaWithLLVM.rst:207
@@ +206,3 @@
+
+     $ clang++ -I<CUDA install path>/include -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cc -lcudart_static -lcuda -ldl -lrt -pthread
+
----------------
jingyue wrote:
> tra wrote:
> > You can pass the device-side PTX to the host's cc1 with "-fcuda-include-gpubinary axpy.ptx",
> > and clang will embed the PTX into the host object file and generate code to register the kernels, so that they can be launched with <<<...>>> without any additional steps.
> > 
> Can you clarify how to do this? I tried using `-Xclang` to set the `-fcuda-include-gpubinary` flag, but got the following. 
> 
> ```
> $ clang++ -Xclang -fcuda-include-gpubinary -Xclang axpy.ptx axpy.cc -I$CUDA_ROOT/include -I$CUDA_ROOT/samples/common/inc -L$CUDA_ROOT/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
> axpy.cc:39:3: error: use of undeclared identifier 'axpy'
>   axpy<<<1, kDataLen>>>(a, device_x, device_y);
>   ^
> axpy.cc:39:9: error: expected expression
>   axpy<<<1, kDataLen>>>(a, device_x, device_y);
>         ^
> axpy.cc:39:23: error: expected expression
>   axpy<<<1, kDataLen>>>(a, device_x, device_y);
>                       ^
> axpy.cc:39:25: warning: expression result unused [-Wunused-value]
>   axpy<<<1, kDataLen>>>(a, device_x, device_y);
>                         ^
> axpy.cc:39:28: warning: expression result unused [-Wunused-value]
>   axpy<<<1, kDataLen>>>(a, device_x, device_y);
>                            ^~~~~~~~
> 2 warnings and 3 errors generated.
> ```
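The kernel must be defined in axpy.cu during host compilation so that the compiler can generate the host-side stub for the kernel launch. This approach therefore only works if the kernel and the host code that launches it are not split into separate files.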
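For illustration, here is a minimal sketch of what an unsplit axpy.cu might look like. The kernel body (y[i] = a * x[i]) and the input values are assumptions chosen to match the call `axpy<<<1, kDataLen>>>(a, device_x, device_y)` from the error output above, and error checking is omitted for brevity. Because `axpy` is defined in the same translation unit as the `<<<...>>>` launch, host-side compilation can see the kernel and emit the launch stub. That stub was exactly what could not be generated when compiling axpy.cc on its own.

```cuda
// axpy.cu (sketch): kernel and host code live in ONE file, so the host-side
// compilation sees the __global__ definition and can emit the launch stub.
#include <cstdio>
#include <cuda_runtime.h>

// Assumed kernel body: one thread per element, y[i] = a * x[i].
__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int kDataLen = 4;
  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};  // illustrative input
  float host_y[kDataLen];

  // Allocate device buffers and copy the input to the GPU.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // This launch parses only because axpy is declared in this translation unit.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Wait for the kernel, then copy the result back and print it.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);
  for (int i = 0; i < kDataLen; ++i) {
    printf("y[%d] = %f\n", i, host_y[i]);
  }

  cudaFree(device_x);
  cudaFree(device_y);
  cudaDeviceReset();
  return 0;
}
```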

http://reviews.llvm.org/D14370