[cfe-dev] compile a simple CUDA program using clang-3.4
Eli Bendersky
eliben at google.com
Wed Aug 13 07:55:37 PDT 2014
On Tue, Aug 12, 2014 at 6:38 PM, Xing Su <suxing1989 at gmail.com> wrote:
> Thanks Eli,
>
> It sounds like CUDA support in Clang is, at the moment, far from ready
> for production use ...
>
> I'd like to know the status of CUDA support in clang, but I am not
> able to find anything documenting it. Are you a developer of this
> part, or could you give me some guidance?
>
>
There's no documentation of these parts of Clang, as far as I know, besides
the source code. To get a feel for what's supported, take a look at the
existing tests (specifically the test/SemaCUDA and test/CodeGenCUDA
directories).
Eli
>
> suxing
>
>
> On 2014/8/13 1:18, Eli Bendersky wrote:
>
>
>
>
> On Tue, Aug 12, 2014 at 10:07 AM, Xing Su <suxing1989 at gmail.com> wrote:
>
>> hi everyone!
>>
>> I tried to compile a CUDA program using clang-3.4. The program is taken
>> from the NVIDIA_CUDA-5.0_samples collection and is a very simple program
>> that adds two vectors.
>>
>> A few modifications were made to the original code:
>> 1. I substituted the __global__ CUDA C keyword with
>> __attribute__((global)) in order to use clang as the compiler
>> (a macro-based variant is sketched below).
>> 2. <stdlib.h> and <math.h> were added.
>> 3. Declarations of blockDim, blockIdx, threadIdx were added.
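>>
>> For illustration only, a sketch of that macro-based variant: map the
>> CUDA declaration keywords onto GNU-style attributes once, so the rest
>> of the sample source can stay unmodified. The attribute spellings below
>> are the ones I believe clang's CUDA attributes use; whether your clang
>> build accepts all of them is an assumption here.
>>
>> // Sketch (assumed attribute spellings): translate the CUDA keywords
>> // into the GNU attributes clang recognizes.
>> #define __global__ __attribute__((global))
>> #define __device__ __attribute__((device))
>> #define __host__   __attribute__((host))
>> #define __shared__ __attribute__((shared))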
>>
>> // ==================== code begin ========================
>> /**
>>
>> * Vector addition: C = A + B.
>> *
>> * This sample is a very basic sample that implements element by element
>> * vector addition. It is the same as the sample illustrating Chapter 2
>> * of the programming guide with some additions like error checking.
>> */
>>
>> #include <stdio.h>
>>
>> // For the CUDA runtime routines (prefixed with "cuda_")
>> #include <cuda_runtime.h>
>>
>> #include <malloc.h>
>> #include <stdlib.h>
>> #include <math.h>
>> extern dim3 blockDim, blockIdx, threadIdx;
>>
>> /**
>> * CUDA Kernel Device code
>> *
>> * Computes the vector addition of A and B into C. The 3 vectors have
>> * the same number of elements numElements.
>> */
>> __attribute__((global)) void
>> vectorAdd(const float *A, const float *B, float *C, int numElements)
>> {
>> int i = blockDim.x * blockIdx.x + threadIdx.x;
>>
>> if (i < numElements)
>> {
>> C[i] = A[i] + B[i];
>> }
>> }
>>
>> int main(void)
>> {
>> ... ...
>> return 0;
>> }
>>
>> //==================== code end ========================
>>
>> $ clang -std=cuda -I/usr/local/cuda-5.0/include -o vectorAdd1
>> vectorAdd1.cu -L/usr/local/cuda-5.0/lib64 -lcudart
>>
>> Compilation was successful, but running the program gives:
>>
>> [Vector addition of 50000 elements]
>> Copy input data from the host memory to the CUDA device
>> CUDA kernel launch with 196 blocks of 256 threads
>> Failed to launch vectorAdd kernel (error code invalid device function)!
>>
>> while the expected output is
>>
>> [Vector addition of 50000 elements]
>> Copy input data from the host memory to the CUDA device
>> CUDA kernel launch with 196 blocks of 256 threads
>> Copy output data from the CUDA device to the host memory
>> Done
>>
>> The error says that vectorAdd was not a valid kernel. To see what
>> happened, I compiled the program to LLVM IR. Only the IR of the
>> vectorAdd function is shown below.
>>
>> $ clang -S -emit-llvm -std=cuda -I/usr/local/cuda-5.0/include -o
>> vectorAdd1.ll vectorAdd1.cu
>>
>> //==================== code begin ==========================
>> define void @_Z9vectorAddPKfS0_Pfi(float* %A, float* %B, float* %C, i32
>> %numElements) #0 {
>> %1 = alloca float*, align 8
>> %2 = alloca float*, align 8
>> %3 = alloca float*, align 8
>> %4 = alloca i32, align 4
>> store float* %A, float** %1, align 8
>> store float* %B, float** %2, align 8
>> store float* %C, float** %3, align 8
>> store i32 %numElements, i32* %4, align 4
>> %5 = bitcast float** %1 to i8*
>> %6 = call i32 @cudaSetupArgument(i8* %5, i64 ptrtoint (i1**
>> getelementptr (i1** null, i32 1) to i64), i64 0)
>> %7 = icmp eq i32 %6, 0
>> br i1 %7, label %8, label %22
>>
>> ; <label>:8 ; preds = %0
>> %9 = bitcast float** %2 to i8*
>> %10 = call i32 @cudaSetupArgument(i8* %9, i64 ptrtoint (i1**
>> getelementptr (i1** null, i32 1) to i64), i64 ptrtoint (float**
>> getelementptr ({ float*, float*, float*, i32 }* null, i64 0, i32 1) to i64))
>> %11 = icmp eq i32 %10, 0
>> br i1 %11, label %12, label %22
>>
>> ; <label>:12 ; preds = %8
>> %13 = bitcast float** %3 to i8*
>> %14 = call i32 @cudaSetupArgument(i8* %13, i64 ptrtoint (i1**
>> getelementptr (i1** null, i32 1) to i64), i64 ptrtoint (float**
>> getelementptr ({ float*, float*, float*, i32 }* null, i64 0, i32 2) to i64))
>> %15 = icmp eq i32 %14, 0
>> br i1 %15, label %16, label %22
>>
>> ; <label>:16 ; preds = %12
>> %17 = bitcast i32* %4 to i8*
>> %18 = call i32 @cudaSetupArgument(i8* %17, i64 ptrtoint (i32*
>> getelementptr (i32* null, i32 1) to i64), i64 ptrtoint (i32* getelementptr
>> ({ float*, float*, float*, i32 }* null, i64 0, i32 3) to i64))
>> %19 = icmp eq i32 %18, 0
>> br i1 %19, label %20, label %22
>>
>> ; <label>:20 ; preds = %16
>> %21 = call i32 @cudaLaunch(i8* bitcast (void (float*, float*, float*,
>> i32)* @_Z9vectorAddPKfS0_Pfi to i8*))
>> br label %22
>>
>> ; <label>:22 ; preds = %20, %16,
>> %12, %8, %0
>> ret void
>> }
>>
>> //======================= code end =====================
>>
>> From the emitted IR, I can see NO addrspace qualifiers, which are
>> specific to NVVM IR. So this may be the problem.
>>
>> Could anyone tell me how to fix this?
>>
>> Any help is appreciated!
>>
>> suxing
>>
> Hi Suxing,
>
> Clang cannot yet compile CUDA out-of-the-box like that. Definitely not
> mixed CUDA code (where host and device code are in the same file). Clang
> can be made, with some effort, to compile stand-alone device code, but some
> critical steps are missing. For example, you have to map threadIdx and
> other similar special globals to appropriate intrinsic calls
> (@llvm.nvvm.read.ptx.sreg.tid.*), and not just declare them.
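>
> For illustration, a rough, untested sketch of that mapping, assuming the
> clang build exposes the __nvvm_read_ptx_sreg_* builtins (availability
> and exact names may vary between versions):
>
> // Sketch: read the special registers through the NVVM builtins instead
> // of declaring `extern dim3 blockDim, blockIdx, threadIdx;`.
> __attribute__((device)) static unsigned tid_x(void)   { return __nvvm_read_ptx_sreg_tid_x(); }
> __attribute__((device)) static unsigned ctaid_x(void) { return __nvvm_read_ptx_sreg_ctaid_x(); }
> __attribute__((device)) static unsigned ntid_x(void)  { return __nvvm_read_ptx_sreg_ntid_x(); }
>
> __attribute__((global)) void
> vectorAdd(const float *A, const float *B, float *C, int numElements)
> {
>     int i = ntid_x() * ctaid_x() + tid_x();
>     if (i < numElements)
>         C[i] = A[i] + B[i];
> }
>
> Even so, this only covers the special registers; as said above, other
> critical steps are still missing.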
>
> Eli