<div dir="ltr">I thought the website would pick up the change automatically. How can I push it to the website?</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 2, 2016 at 3:01 PM, Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jingyue,<br>

<br>

Thanks for updating this! FWIW, however, these changes don't yet seem to be reflected on the web site (<a href="http://llvm.org/docs/CompileCudaWithLLVM.html" rel="noreferrer" target="_blank">http://llvm.org/docs/CompileCudaWithLLVM.html</a>).<br>

<br>

 -Hal<br>

<div class="HOEnZb"><div class="h5"><br>

----- Original Message -----<br>

> From: "Jingyue Wu via llvm-commits" <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>><br>

> To: <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

> Sent: Saturday, January 30, 2016 5:48:47 PM<br>

> Subject: [llvm] r259307 - [doc] improve the doc for CUDA<br>

><br>

> Author: jingyue<br>

> Date: Sat Jan 30 17:48:47 2016<br>

> New Revision: 259307<br>

><br>

> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=259307&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=259307&view=rev</a><br>

> Log:<br>

> [doc] improve the doc for CUDA<br>

><br>

> 1. Mentioned that CUDA support works best with trunk.<br>

> 2. Simplified the example by removing its dependency on the CUDA samples.<br>

> 3. Explain the --cuda-gpu-arch flag.<br>

><br>

> Modified:<br>

>     llvm/trunk/docs/CompileCudaWithLLVM.rst<br>

><br>

> Modified: llvm/trunk/docs/CompileCudaWithLLVM.rst<br>

> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CompileCudaWithLLVM.rst?rev=259307&r1=259306&r2=259307&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CompileCudaWithLLVM.rst?rev=259307&r1=259306&r2=259307&view=diff</a><br>

> ==============================================================================<br>

> --- llvm/trunk/docs/CompileCudaWithLLVM.rst (original)<br>

> +++ llvm/trunk/docs/CompileCudaWithLLVM.rst Sat Jan 30 17:48:47 2016<br>

> @@ -18,9 +18,11 @@ familiarity with CUDA. Information about<br>

>  How to Build LLVM with CUDA Support<br>

>  ===================================<br>

><br>

> -Below is a quick summary of downloading and building LLVM. Consult the `Getting<br>

> -Started <<a href="http://llvm.org/docs/GettingStarted.html" rel="noreferrer" target="_blank">http://llvm.org/docs/GettingStarted.html</a>>`_ page for more details on<br>

> -setting up LLVM.<br>

> +CUDA support is still in development and works the best in the trunk version<br>

> +of LLVM. Below is a quick summary of downloading and building the trunk<br>

> +version. Consult the `Getting Started<br>

> +<<a href="http://llvm.org/docs/GettingStarted.html" rel="noreferrer" target="_blank">http://llvm.org/docs/GettingStarted.html</a>>`_ page for more details on setting<br>

> +up LLVM.<br>

><br>

>  #. Checkout LLVM<br>

><br>

> @@ -60,8 +62,6 @@ which multiplies a ``float`` array by a<br>

><br>

>  .. code-block:: c++<br>

><br>

> -  #include <helper_cuda.h> // for checkCudaErrors<br>

> -<br>

>    #include <iostream><br>

><br>

>    __global__ void axpy(float a, float* x, float* y) {<br>

> @@ -78,25 +78,25 @@ which multiplies a ``float`` array by a<br>

>      // Copy input data to device.<br>

>      float* device_x;<br>

>      float* device_y;<br>

> -    checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));<br>

> -    checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));<br>

> -    checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),<br>

> -                               cudaMemcpyHostToDevice));<br>

> +    cudaMalloc(&device_x, kDataLen * sizeof(float));<br>

> +    cudaMalloc(&device_y, kDataLen * sizeof(float));<br>

> +    cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),<br>

> +               cudaMemcpyHostToDevice);<br>

><br>

>      // Launch the kernel.<br>

>      axpy<<<1, kDataLen>>>(a, device_x, device_y);<br>

><br>

>      // Copy output data to host.<br>

> -    checkCudaErrors(cudaDeviceSynchronize());<br>

> -    checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),<br>

> -                               cudaMemcpyDeviceToHost));<br>

> +    cudaDeviceSynchronize();<br>

> +    cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),<br>

> +               cudaMemcpyDeviceToHost);<br>

><br>

>      // Print the results.<br>

>      for (int i = 0; i < kDataLen; ++i) {<br>

>        std::cout << "y[" << i << "] = " << host_y[i] << "\n";<br>

>      }<br>

><br>

> -    checkCudaErrors(cudaDeviceReset());<br>

> +    cudaDeviceReset();<br>

>      return 0;<br>

>    }<br>

><br>

> @@ -104,16 +104,20 @@ The command line for compilation is simi<br>

><br>

>  .. code-block:: console<br>

><br>

> -  $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread<br>

> +  $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch>  \<br>

> +      -L<CUDA install path>/<lib64 or lib>              \<br>

> +      -lcudart_static -ldl -lrt -pthread<br>

>    $ ./axpy<br>

>    y[0] = 2<br>

>    y[1] = 4<br>

>    y[2] = 6<br>

>    y[3] = 8<br>

><br>

> -Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the<br>

> -samples installed for this example. ``<CUDA install path>`` is the root<br>

> -directory where you installed CUDA SDK, typically ``/usr/local/cuda``.<br>

> +``<CUDA install path>`` is the root directory where you installed CUDA SDK,<br>

> +typically ``/usr/local/cuda``. ``<GPU arch>`` is `the compute capability of<br>

> +your GPU <<a href="https://developer.nvidia.com/cuda-gpus" rel="noreferrer" target="_blank">https://developer.nvidia.com/cuda-gpus</a>>`_. For example, if you want<br>

> +to run your program on a GPU with compute capability of 3.5, you should specify<br>

> +``--cuda-gpu-arch=sm_35``.<br>

><br>

>  Optimizations<br>

>  =============<br>

><br>

><br>

> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>

> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>

><br>

<br>

</div></div><span class="HOEnZb"><font color="#888888">--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</font></span></blockquote></div><br></div>