[PATCH] D14370: [doc] Compile CUDA with LLVM

Fri Nov 6 09:59:31 PST 2015

tra added inline comments.

================
Comment at: docs/CompileCudaWithLLVM.rst:207
@@ +206,3 @@
+
+     $ clang++ -I<CUDA install path>/include -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cc -lcudart_static -lcuda -ldl -lrt -pthread
+
----------------
jingyue wrote:
> tra wrote:
> > jingyue wrote:
> > > tra wrote:
> > > > You can pass device-side PTX to the host's cc1 with "-fcuda-include-gpubinary axpy.ptx".
> > > > Clang will then embed the PTX into the host object file and generate code to register the kernels, so that they can be launched with <<<...>>> without any additional steps.
> > > > 
> > > > 
> > > Can you clarify how to do this? I tried using `-Xclang` to set the `-fcuda-include-gpubinary` flag, but got the following. 
> > > 
> > > ```
> > > $ clang++ -Xclang -fcuda-include-gpubinary -Xclang axpy.ptx axpy.cc -I$CUDA_ROOT/include -I$CUDA_ROOT/samples/common/inc -L$CUDA_ROOT/lib64 -lcudart_static -lcuda -ldl -lrt -pthread
> > > axpy.cc:39:3: error: use of undeclared identifier 'axpy'
> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> > >   ^
> > > axpy.cc:39:9: error: expected expression
> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> > >         ^
> > > axpy.cc:39:23: error: expected expression
> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> > >                       ^
> > > axpy.cc:39:25: warning: expression result unused [-Wunused-value]
> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> > >                         ^
> > > axpy.cc:39:28: warning: expression result unused [-Wunused-value]
> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);
> > >                            ^~~~~~~~
> > > 2 warnings and 3 errors generated.
> > > ```
> > The kernel must be present in axpy.cu during host compilation so that the compiler can generate the host-side stub for the kernel launch. This approach therefore only works when the host and device code are not split into separate files.
> I still have issues with that.
> 
> However, I managed to apply your three pending patches, and the patched version works great! So, I think it makes more sense for this document to ask early adopters to apply the patches and try the more functional patched version. Agree? 
Sure. The patches simplify a large portion of this section down to

```
clang++ -o axpy [...] axpy.cu
```
I'll need to add details on the various CUDA-related options I've added to clang.
Do you want to incorporate them into this patch, or should I do that after you've committed the docs?
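
For reference, here is a minimal sketch of what an unsplit axpy.cu could look like, with the kernel and the host code that launches it in the same file. The names `axpy`, `kDataLen`, `a`, `device_x` and `device_y` come from the error output quoted above. Everything else (the `host_x`/`host_y` arrays, the data values, the `cuda_runtime.h` include) is an assumption for illustration and may differ from the sample in the doc.

```cuda
// axpy.cu: illustrative sketch only; kernel and host code live in one file
// so the host compilation can see the kernel declaration.
#include <cuda_runtime.h>
#include <iostream>

// Each thread scales one element: y[i] = a * x[i].
__global__ void axpy(float a, float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int kDataLen = 4;  // assumed size, one thread per element
  float a = 2.0f;
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};
  float host_y[kDataLen];

  // Allocate device buffers and copy the input over.
  float* device_x;
  float* device_y;
  cudaMalloc(&device_x, kDataLen * sizeof(float));
  cudaMalloc(&device_y, kDataLen * sizeof(float));
  cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
             cudaMemcpyHostToDevice);

  // Launch one block of kDataLen threads.
  axpy<<<1, kDataLen>>>(a, device_x, device_y);

  // Wait for the kernel, then copy the result back to the host.
  cudaDeviceSynchronize();
  cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
             cudaMemcpyDeviceToHost);

  for (int i = 0; i < kDataLen; ++i) {
    std::cout << "y[" << i << "] = " << host_y[i] << "\n";
  }

  cudaFree(device_x);
  cudaFree(device_y);
  return 0;
}
```

Error checking on the CUDA runtime calls is left out to keep the sketch short.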

http://reviews.llvm.org/D14370