[llvm] r259307 - [doc] improve the doc for CUDA

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Sat Feb 6 04:41:02 PST 2016


----- Original Message -----
> From: "Tanya Lattner" <tanyalattner at llvm.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Jingyue Wu" <jingyue at google.com>, "llvm-commits" <llvm-commits at lists.llvm.org>
> Sent: Saturday, February 6, 2016 1:40:28 AM
> Subject: Re: [llvm] r259307 - [doc] improve the doc for CUDA
> 
> 
> > On Feb 2, 2016, at 3:52 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> > ----- Original Message -----
> >> From: "Jingyue Wu" <jingyue at google.com>
> >> To: "Hal Finkel" <hfinkel at anl.gov>
> >> Cc: "llvm-commits" <llvm-commits at lists.llvm.org>
> >> Sent: Tuesday, February 2, 2016 5:39:36 PM
> >> Subject: Re: [llvm] r259307 - [doc] improve the doc for CUDA
> >> 
> >> 
> >> I thought the website would pick it up automatically. How can I push
> >> that to the website?
> > 
> > It should. Maybe something is broken now? cc'ing Tanya; she might
> > know.
> > 
> 
> This should now be working again. Please confirm you are seeing the
> correct html documents.

Looks good now. Thanks!

 -Hal

> 
> -Tanya
> 
> > -Hal
> > 
> >> 
> >> 
> >> On Tue, Feb 2, 2016 at 3:01 PM, Hal Finkel < hfinkel at anl.gov >
> >> wrote:
> >> 
> >> 
> >> Hi Jingyue,
> >> 
> >> Thanks for updating this! FWIW, however, these changes don't yet seem
> >> to be reflected on the web site
> >> (http://llvm.org/docs/CompileCudaWithLLVM.html).
> >> 
> >> -Hal
> >> 
> >> 
> >> 
> >> ----- Original Message -----
> >>> From: "Jingyue Wu via llvm-commits" <llvm-commits at lists.llvm.org>
> >>> To: llvm-commits at lists.llvm.org
> >>> Sent: Saturday, January 30, 2016 5:48:47 PM
> >>> Subject: [llvm] r259307 - [doc] improve the doc for CUDA
> >>> 
> >>> Author: jingyue
> >>> Date: Sat Jan 30 17:48:47 2016
> >>> New Revision: 259307
> >>> 
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=259307&view=rev
> >>> Log:
> >>> [doc] improve the doc for CUDA
> >>> 
> >>> 1. Mentioned that CUDA support works best with trunk.
> >>> 2. Simplified the example by removing its dependency on the CUDA
> >>> samples.
> >>> 3. Explain the --cuda-gpu-arch flag.
> >>> 
> >>> Modified:
> >>> llvm/trunk/docs/CompileCudaWithLLVM.rst
> >>> 
> >>> Modified: llvm/trunk/docs/CompileCudaWithLLVM.rst
> >>> URL:
> >>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CompileCudaWithLLVM.rst?rev=259307&r1=259306&r2=259307&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/docs/CompileCudaWithLLVM.rst (original)
> >>> +++ llvm/trunk/docs/CompileCudaWithLLVM.rst Sat Jan 30 17:48:47
> >>> 2016
> >>> @@ -18,9 +18,11 @@ familiarity with CUDA. Information about
> >>> How to Build LLVM with CUDA Support
> >>> ===================================
> >>> 
> >>> -Below is a quick summary of downloading and building LLVM. Consult the `Getting
> >>> -Started <http://llvm.org/docs/GettingStarted.html>`_ page for more details on
> >>> -setting up LLVM.
> >>> +CUDA support is still in development and works the best in the trunk version
> >>> +of LLVM. Below is a quick summary of downloading and building the trunk
> >>> +version. Consult the `Getting Started
> >>> +<http://llvm.org/docs/GettingStarted.html>`_ page for more details on setting
> >>> +up LLVM.
> >>> 
> >>> #. Checkout LLVM
> >>> 
> >>> @@ -60,8 +62,6 @@ which multiplies a ``float`` array by a
> >>> 
> >>> .. code-block:: c++
> >>> 
> >>> - #include <helper_cuda.h> // for checkCudaErrors
> >>> -
> >>> #include <iostream>
> >>> 
> >>> __global__ void axpy(float a, float* x, float* y) {
> >>> @@ -78,25 +78,25 @@ which multiplies a ``float`` array by a
> >>> // Copy input data to device.
> >>> float* device_x;
> >>> float* device_y;
> >>> - checkCudaErrors(cudaMalloc(&device_x, kDataLen * sizeof(float)));
> >>> - checkCudaErrors(cudaMalloc(&device_y, kDataLen * sizeof(float)));
> >>> - checkCudaErrors(cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
> >>> - cudaMemcpyHostToDevice));
> >>> + cudaMalloc(&device_x, kDataLen * sizeof(float));
> >>> + cudaMalloc(&device_y, kDataLen * sizeof(float));
> >>> + cudaMemcpy(device_x, host_x, kDataLen * sizeof(float),
> >>> + cudaMemcpyHostToDevice);
> >>> 
> >>> // Launch the kernel.
> >>> axpy<<<1, kDataLen>>>(a, device_x, device_y);
> >>> 
> >>> // Copy output data to host.
> >>> - checkCudaErrors(cudaDeviceSynchronize());
> >>> - checkCudaErrors(cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
> >>> - cudaMemcpyDeviceToHost));
> >>> + cudaDeviceSynchronize();
> >>> + cudaMemcpy(host_y, device_y, kDataLen * sizeof(float),
> >>> + cudaMemcpyDeviceToHost);
> >>> 
> >>> // Print the results.
> >>> for (int i = 0; i < kDataLen; ++i) {
> >>> std::cout << "y[" << i << "] = " << host_y[i] << "\n";
> >>> }
> >>> 
> >>> - checkCudaErrors(cudaDeviceReset());
> >>> + cudaDeviceReset();
> >>> return 0;
> >>> }
> >>> 
> >>> @@ -104,16 +104,20 @@ The command line for compilation is simi
> >>> 
> >>> .. code-block:: console
> >>> 
> >>> - $ clang++ -o axpy -I<CUDA install path>/samples/common/inc -L<CUDA install path>/<lib64 or lib> axpy.cu -lcudart_static -lcuda -ldl -lrt -pthread
> >>> + $ clang++ axpy.cu -o axpy --cuda-gpu-arch=<GPU arch> \
> >>> + -L<CUDA install path>/<lib64 or lib> \
> >>> + -lcudart_static -ldl -lrt -pthread
> >>> $ ./axpy
> >>> y[0] = 2
> >>> y[1] = 4
> >>> y[2] = 6
> >>> y[3] = 8
> >>> 
> >>> -Note that ``helper_cuda.h`` comes from the CUDA samples, so you need the
> >>> -samples installed for this example. ``<CUDA install path>`` is the root
> >>> -directory where you installed CUDA SDK, typically ``/usr/local/cuda``.
> >>> +``<CUDA install path>`` is the root directory where you installed CUDA SDK,
> >>> +typically ``/usr/local/cuda``. ``<GPU arch>`` is `the compute capability of
> >>> +your GPU <https://developer.nvidia.com/cuda-gpus>`_. For example, if you want
> >>> +to run your program on a GPU with compute capability of 3.5, you should specify
> >>> +``--cuda-gpu-arch=sm_35``.
> >>> 
> >>> Optimizations
> >>> =============
> >>> 
> >>> 
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >>> 
> >> 
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >> 
> >> 
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

