[llvm] r259307 - [doc] improve the doc for CUDA

Tue Feb 2 15:01:08 PST 2016

Hi Jingyue,

Thanks for updating this! FWIW, these changes don't seem to be reflected on the web site yet (http://llvm.org/docs/CompileCudaWithLLVM.html).

 -Hal

----- Original Message -----
> From: "Jingyue Wu via llvm-commits" <llvm-commits at lists.llvm.org>
> To: llvm-commits at lists.llvm.org
> Sent: Saturday, January 30, 2016 5:48:47 PM
> Subject: [llvm] r259307 - [doc] improve the doc for CUDA
> 
> Author: jingyue
> Date: Sat Jan 30 17:48:47 2016
> New Revision: 259307
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=259307&view=rev
> Log:
> [doc] improve the doc for CUDA
> 
> 1. Mentioned that CUDA support works best with trunk.
> 2. Simplified the example by removing its dependency on the CUDA
> samples.
> 3. Explain the --cuda-gpu-arch flag.
> 
> Modified:
>     llvm/trunk/docs/CompileCudaWithLLVM.rst
> 
> Modified: llvm/trunk/docs/CompileCudaWithLLVM.rst
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CompileCudaWithLLVM.rst?rev=259307&r1=259306&r2=259307&view=diff
> ==============================================================================
> --- llvm/trunk/docs/CompileCudaWithLLVM.rst (original)
> +++ llvm/trunk/docs/CompileCudaWithLLVM.rst Sat Jan 30 17:48:47 2016
> @@ -18,9 +18,11 @@ familiarity with CUDA. Information about
>  How to Build LLVM with CUDA Support
>  ===================================
>  
> -Below is a quick summary of downloading and building LLVM. Consult the `Getting
> -Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on
> -setting up LLVM.
> +CUDA support is still in development and works the best in the trunk version
> +of LLVM. Below is a quick summary of downloading and building the trunk
> +version. Consult the `Getting Started
> +<http://llvm.org/docs/GettingStarted.html>`_ page for more details on setting
> +up LLVM.
>  
>  #. Checkout LLVM
>  
> @@ -60,8 +62,6 @@ which multiplies a ``float`` array by a
>  
>  .. code-block:: c++
>  
> -  #include <helper_cuda.h> // for checkCudaErrors
> -
>    #include <iostream>
>  
>    __global__ void axpy(float a, float* x, float* y) {
> @@ -78,25 +78,25 @@ which multiplies a ``float`` array by a
>      // Copy input data to device.
>      float* device_x;
>      float* device_y;
> -    checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));
> -    checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));
> -    checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
> -                               cudaMemcpyHostToDevice));
> +    cudaMalloc(&device_x, kDataLen * sizeof(float));
> +    cudaMalloc(&device_y, kDataLen * sizeof(float));
> +    cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
> +               cudaMemcpyHostToDevice);
>  
>      // Launch the kernel.
>      axpy<<<1, kDataLen>>>(a, device_x, device_y);
>  
>      // Copy output data to host.
> -    checkCudaErrors(cudaDeviceSynchronize());
> -    checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
> -                               cudaMemcpyDeviceToHost));
> +    cudaDeviceSynchronize();
> +    cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
> +               cudaMemcpyDeviceToHost);
>  
>      // Print the results.
>      for (int i = 0; i < kDataLen; ++i) {
>        std::cout << "y[" << i << "] = " << host_y[i] << "\n";
>      }
>  
> -    checkCudaErrors(cudaDeviceReset());
> +    cudaDeviceReset();
>      return 0;
>    }
>  
> @@ -104,16 +104,20 @@ The command line for compilation is simi
>  
>  .. code-block:: console
>  
> -  $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread
> +  $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch>  \
> +      -L<CUDA install path>/<lib64 or lib>              \
> +      -lcudart_static -ldl -lrt -pthread
>    $ ./axpy
>    y[0] = 2
>    y[1] = 4
>    y[2] = 6
>    y[3] = 8
>  
> -Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the
> -samples installed for this example. ``<CUDA install path>`` is the root
> -directory where you installed CUDA SDK, typically ``/usr/local/cuda``.
> +``<CUDA install path>`` is the root directory where you installed CUDA SDK,
> +typically ``/usr/local/cuda``. ``<GPU arch>`` is `the compute capability of
> +your GPU <https://developer.nvidia.com/cuda-gpus>`_. For example, if you want
> +to run your program on a GPU with compute capability of 3.5, you should specify
> +``--cuda-gpu-arch=sm_35``.
>  
>  Optimizations
>  =============

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory