[PATCH] D14370: [doc] Compile CUDA with LLVM

Thu Nov 5 22:39:03 PST 2015

jingyue added inline comments.

================
Comment at: docs/CompileCudaWithLLVM.rst:207
@@ +206,3 @@
+
+     $ clang++ -I<CUDA install path>/include -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cc -lcudart_static -lcuda -ldl -lrt -pthread
+
----------------
tra wrote:
> jingyue wrote:
> > tra wrote:
> > > You can pass the device-side PTX to the host's cc1 with "-fcuda-include-gpubinary axpy.ptx"
> > > and clang will embed the PTX into the host object file and generate code to register the kernels, so that they can be launched with <<<...>>> without any additional steps.
> > > 
> > > 
> > Can you clarify how to do this? I tried using `-Xclang` to set the `-fcuda-include-gpubinary` flag, but got the following. 
> > 
> > ```
> > $ clang++ -Xclang -fcuda-include-gpubinary -Xclang axpy.ptx axpy.cc -I$CUDA_ROOT/include -I$CUDA_ROOT/samples/common/inc -L$CUDA_ROOT/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
> > axpy.cc:39:3: error: use of undeclared identifier 'axpy'
> >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >   ^
> > axpy.cc:39:9: error: expected expression
> >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >         ^
> > axpy.cc:39:23: error: expected expression
> >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >                       ^
> > axpy.cc:39:25: warning: expression result unused [-Wunused-value]
> >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >                         ^
> > axpy.cc:39:28: warning: expression result unused [-Wunused-value]
> >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >                            ^~~~~~~~
> > 2 warnings and 3 errors generated.
> > ```
> The kernel must be present in axpy.cu during host compilation so that the compiler can generate the host-side stub for the kernel launch; this therefore only works when the host and device code are not split into separate files.
I still have issues with that approach.

However, I managed to apply your three pending patches, and the patched version works great! So I think it makes more sense for this document to ask early adopters to apply the patches and try the more functional patched version. Do you agree?
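
For reference, the non-split layout discussed above, with the kernel definition and the `<<<...>>>` launch in the same file, might look roughly like the sketch below. It is illustrative only: it reuses the names `axpy`, `kDataLen`, `device_x`, and `device_y` from the error output quoted above, while the host arrays, the data values, and the omission of error checking are my assumptions. I have not verified it against any particular clang revision.

```cuda
// axpy.cu: kernel and launch site in one translation unit (illustrative sketch).
#include <cstdio>
#include <cuda_runtime.h>

// Device code: each thread scales one element of x by a and stores it in y.
__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int kDataLen = 4;
  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};  // assumed sample data
  float host_y[kDataLen];

  // Allocate device buffers and copy the input over.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // The kernel is declared above in this same file, so the host compilation
  // sees the name 'axpy' when it reaches the launch expression.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Wait for the kernel, then copy the result back to the host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  for (int i = 0; i < kDataLen; ++i) {
    printf("y[%d] = %f\n", i, host_y[i]);
  }

  cudaFree(device_x);
  cudaFree(device_y);
  return 0;
}
```

The "undeclared identifier 'axpy'" error in the quoted output is what one would expect when the launch site is compiled from a file that no longer contains the kernel declaration.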

http://reviews.llvm.org/D14370