<div dir="ltr">I'll let you do that after this patch. You know those options much better than I do. </div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Nov 6, 2015 at 9:59 AM, Artem Belevich <span dir="ltr"><<a href="mailto:tra@google.com" target="_blank">tra@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">tra added inline comments.<br>

<span class=""><br>

================<br>

Comment at: docs/CompileCudaWithLLVM.rst:207<br>

@@ +206,3 @@<br>

+<br>

+     $ clang++ -I<CUDA install path>/include -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cc -lcudart_static -lcuda -ldl -lrt -pthread<br>

+<br>

----------------<br>

</span><div><div class="h5">jingyue wrote:<br>

> tra wrote:<br>

> > jingyue wrote:<br>

> > > tra wrote:<br>

> > > > You can pass device-side PTX to the host's cc1 with "-fcuda-include-gpubinary axpy.ptx"<br>

> > > > and clang will embed PTX into host object file and will generate code to register kernels so that kernels can be launched with <<<...>>> without any additional steps.<br>

> > > ><br>

> > > ><br>

> > > Can you clarify how to do this? I tried using `-Xclang` to set the `-fcuda-include-gpubinary` flag, but got the following.<br>

> > ><br>

> > > ```<br>

> > > $ clang++ -Xclang -fcuda-include-gpubinary -Xclang axpy.ptx axpy.cc -I$CUDA_ROOT/include -I$CUDA_ROOT/samples/common/inc -L$CUDA_ROOT/lib64 -lcudart_static -lcuda -ldl -lrt -pthread<br>

> > > axpy.cc:39:3: error: use of undeclared identifier 'axpy'<br>

> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

> > >   ^<br>

> > > axpy.cc:39:9: error: expected expression<br>

> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

> > >         ^<br>

> > > axpy.cc:39:23: error: expected expression<br>

> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

> > >                       ^<br>

> > > axpy.cc:39:25: warning: expression result unused [-Wunused-value]<br>

> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

> > >                         ^<br>

> > > axpy.cc:39:28: warning: expression result unused [-Wunused-value]<br>

> > >   axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

> > >                            ^~~~~~~~<br>

> > > 2 warnings and 3 errors generated.<br>

> > > ```<br>

> > The kernel must be present in <a href="http://axpy.cu" rel="noreferrer" target="_blank">axpy.cu</a> during host compilation so that the compiler can generate the host-side stub for the kernel launch. As a result, this only works without splitting the source.<br>

> Still have issues with that.<br>

><br>

> However, I managed to apply your three pending patches, and the patched version works great! So, I think it makes more sense for this document to ask early adopters to apply the patches and try the more functional patched version. Agree?<br>

</div></div>Sure. The patches simplify a large portion of this section down to<br>

<br>

```<br>

clang++ -o axpy [...] <a href="http://axpy.cu" rel="noreferrer" target="_blank">axpy.cu</a><br>

<br>

```<br>
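<br>
For illustration only, a single-source axpy.cu of the kind discussed above might look like the sketch below. The kernel name and the launch line mirror the error output quoted earlier in this thread; the kernel body, the input values, and the host-side setup are assumptions made for the example, and error checking is omitted for brevity.<br>
<br>
```cuda<br>
#include <cstdio><br>
#include <cuda_runtime.h><br>
<br>
// Assumed kernel body: y[i] = a * x[i] + y[i], one thread per element.<br>
__global__ void axpy(float a, float* x, float* y) {<br>
  y[threadIdx.x] = a * x[threadIdx.x] + y[threadIdx.x];<br>
}<br>
<br>
int main() {<br>
  const int kDataLen = 4;  // assumed size for the example<br>
  float a = 2.0f;<br>
  float host_x[kDataLen] = {1.0f, 2.0f, 3.0f, 4.0f};<br>
  float host_y[kDataLen] = {0.0f, 0.0f, 0.0f, 0.0f};<br>
<br>
  // Allocate device buffers and copy the inputs over.<br>
  float* device_x;<br>
  float* device_y;<br>
  cudaMalloc(&device_x, kDataLen * sizeof(float));<br>
  cudaMalloc(&device_y, kDataLen * sizeof(float));<br>
  cudaMemcpy(device_x, host_x, sizeof(host_x), cudaMemcpyHostToDevice);<br>
  cudaMemcpy(device_y, host_y, sizeof(host_y), cudaMemcpyHostToDevice);<br>
<br>
  // The kernel is defined in this same file, so the host-side stub can be generated.<br>
  axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>
<br>
  // This blocking copy also waits for the kernel to finish.<br>
  cudaMemcpy(host_y, device_y, sizeof(host_y), cudaMemcpyDeviceToHost);<br>
  for (int i = 0; i < kDataLen; ++i) {<br>
    printf("y[%d] = %f\n", i, host_y[i]);<br>
  }<br>
<br>
  cudaFree(device_x);<br>
  cudaFree(device_y);<br>
  return 0;<br>
}<br>
```<br>
<br>
Because the kernel definition and its launch sit in the same translation unit, the 'use of undeclared identifier' error shown above should not occur with a file laid out this way.<br>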

I'll need to add details on various CUDA-related options I've added to clang.<br>

Do you want to incorporate them into this patch, or should I do that after you've committed the docs?<br>

<br>

<br>

<a href="http://reviews.llvm.org/D14370" rel="noreferrer" target="_blank">http://reviews.llvm.org/D14370</a><br>

<br>

<br>

<br>

</blockquote></div><br></div>