Hi Madhur, I want to do some research on heterogeneous programming models.
What about you? Perhaps we can find something in common :-)

Xing

On 08/14/2014 01:00 AM, Madhur Amilkanthwar wrote:
> I too struggled a lot with compiling CUDA C with Clang 3.2.
> By the way, Xing, why do you want to compile CUDA C with Clang?
>
> On Wed, Aug 13, 2014 at 8:25 PM, Eli Bendersky <eliben@google.com> wrote:
>> On Tue, Aug 12, 2014 at 6:38 PM, Xing Su <suxing1989@gmail.com> wrote:
>>> Thanks Eli,
>>>
>>> It sounds like CUDA support in Clang is, at the moment, far from ready
>>> for production use ...
>>>
>>> I'd like to know what the status of CUDA support in Clang is, but I
>>> haven't been able to find anything that reports it.
>>> Are you a developer of this part, or could you give me some guidance?
>>
>> There's no documentation of these parts of Clang, as far as I know,
>> besides the source code. To get a feel for what's supported, take a look
>> at the existing tests (specifically the test/SemaCUDA and test/CodeGenCUDA
>> directories).
>>
>> Eli
>>
>>> suxing
>>>
>>> On 2014/8/13 1:18, Eli Bendersky wrote:
>>>> On Tue, Aug 12, 2014 at 10:07 AM, Xing Su <suxing1989@gmail.com> wrote:
>>>>> Hi everyone!
>>>>>
>>>>> I tried to compile a CUDA program using clang-3.4. The program is taken
>>>>> from the NVIDIA_CUDA-5.0_samples collection, and it is a very simple
>>>>> program that adds two vectors.
>>>>>
>>>>> I made a few modifications to the original code:
>>>>> 1. I substituted the __global__ CUDA C keyword with
>>>>>    __attribute__((global)) so that clang can be used as the compiler.
>>>>> 2. <stdlib.h> and <math.h> were added.
>>>>> 3. Declarations of blockDim, blockIdx and threadIdx were added.
<p dir="ltr">//
==================== code
begin ========================<br>
/**</p>
<p dir="ltr"> * Vector addition:
C = A + B.<br>
*<br>
* This sample is a very basic
sample that implements element
by element<br>
* vector addition. It is the
same as the sample
illustrating Chapter 2<br>
* of the programming guide
with some additions like error
checking.<br>
*/</p>
<p dir="ltr">#include
<stdio.h></p>
<p dir="ltr">// For the CUDA
runtime routines (prefixed
with "cuda_")<br>
#include
<cuda_runtime.h></p>
<p dir="ltr">#include
<malloc.h><br>
#include <stdlib.h><br>
#include <math.h><br>
extern dim3 blockDim,
blockIdx, threadIdx;</p>
<p dir="ltr">/**<br>
* CUDA Kernel Device code<br>
*<br>
* Computes the vector
addition of A and B into C.
The 3 vectors have the same<br>
* number of elements
numElements.<br>
*/<br>
__attribute__((global)) void<br>
vectorAdd(const float *A,
const float *B, float *C, int
numElements)<br>
{<br>
int i = blockDim.x *
blockIdx.x + threadIdx.x;</p>
<p dir="ltr"> if (i <
numElements)<br>
{<br>
C[i] = A[i] + B[i];<br>
}<br>
}</p>
<p dir="ltr">int main(void)<br>
{<br>
... ...<br>
return 0;<br>
}</p>
<p dir="ltr">//====================
code end
========================</p>
<p dir="ltr">$ clang -std=cuda
-I/usr/local/cuda-5.0/include
-o vectorAdd1 vectorAdd1.cu
-L/usr/local/cuda-5.0/lib64
-lcudart</p>
<p dir="ltr">The compiling was
successful, but running the
program gives:</p>
<p dir="ltr"> [Vector
addition of 50000 elements]<br>
Copy input data from the
host memory to the CUDA device<br>
CUDA kernel launch with
196 blocks of 256 threads<br>
Failed to launch vectorAdd
kernel (error code invalid
device function )!</p>
<p dir="ltr">while the expected
output is</p>
<p dir="ltr"> [Vector
addition of 50000 elements]<br>
Copy input data from the host
memory to the CUDA device<br>
CUDA kernel launch with 196
blocks of 256 threads<br>
Copy output data from the CUDA
device to the host memory<br>
Done</p>
<p dir="ltr">The result said
that the vectorAdd function
was not a valid kernel. To see
what happend,<br>
I compiled the program to LLVM
IR. Only the IR of function
vectorAdd is shown below.</p>
<p dir="ltr">$ clang -S
-emit-llvm -std=cuda
-I/usr/local/cuda-5.0/include
-o vectorAdd1.ll vectorAdd1.cu</p>
<p dir="ltr">//====================
code begin
==========================<br>
define void
@_Z9vectorAddPKfS0_Pfi(float*
%A, float* %B, float* %C, i32
%numElements) #0 {<br>
%1 = alloca float*, align 8<br>
%2 = alloca float*, align 8<br>
%3 = alloca float*, align 8<br>
%4 = alloca i32, align 4<br>
store float* %A, float** %1,
align 8<br>
store float* %B, float** %2,
align 8<br>
store float* %C, float** %3,
align 8<br>
store i32 %numElements, i32*
%4, align 4<br>
%5 = bitcast float** %1 to
i8*<br>
%6 = call i32
@cudaSetupArgument(i8* %5, i64
ptrtoint (i1** getelementptr
(i1** null, i32 1) to i64),
i64 0)<br>
%7 = icmp eq i32 %6, 0<br>
br i1 %7, label %8, label
%22</p>
<p dir="ltr">; <label>:8
; preds = %0<br>
%9 = bitcast float** %2 to
i8*<br>
%10 = call i32
@cudaSetupArgument(i8* %9, i64
ptrtoint (i1** getelementptr
(i1** null, i32 1) to i64),
i64 ptrtoint (float**
getelementptr ({ float*,
float*, float*, i32 }* null,
i64 0, i32 1) to i64))<br>
%11 = icmp eq i32 %10, 0<br>
br i1 %11, label %12, label
%22</p>
<p dir="ltr">; <label>:12
; preds = %8<br>
%13 = bitcast float** %3 to
i8*<br>
%14 = call i32
@cudaSetupArgument(i8* %13,
i64 ptrtoint (i1**
getelementptr (i1** null, i32
1) to i64), i64 ptrtoint
(float** getelementptr ({
float*, float*, float*, i32 }*
null, i64 0, i32 2) to i64))<br>
%15 = icmp eq i32 %14, 0<br>
br i1 %15, label %16, label
%22</p>
<p dir="ltr">; <label>:16
; preds = %12<br>
%17 = bitcast i32* %4 to i8*<br>
%18 = call i32
@cudaSetupArgument(i8* %17,
i64 ptrtoint (i32*
getelementptr (i32* null, i32
1) to i64), i64 ptrtoint (i32*
getelementptr ({ float*,
float*, float*, i32 }* null,
i64 0, i32 3) to i64))<br>
%19 = icmp eq i32 %18, 0<br>
br i1 %19, label %20, label
%22</p>
<p dir="ltr">; <label>:20
; preds = %16<br>
%21 = call i32
@cudaLaunch(i8* bitcast (void
(float*, float*, float*, i32)*
@_Z9vectorAddPKfS0_Pfi to
i8*))<br>
br label %22</p>
<p dir="ltr">; <label>:22
; preds = %20, %16,
%12, %8, %0<br>
ret void<br>
}</p>
<p dir="ltr">//=======================
code end =====================<br>
</p>
<p dir="ltr">From the emitted IR
code, I can see NO addrspace
declarations, which should be
specific to<br>
NVVM IR. So this may be the
problem.</p>
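For comparison, here is a rough, hand-written sketch of what NVVM-flavoured device IR for a vectorAdd-like kernel could look like. It was not produced by the clang invocation above; the function name, parameter address spaces, instruction sequence, and metadata are illustrative assumptions only.

; Illustrative sketch of NVVM-style device IR (assumed, not compiler output).
target triple = "nvptx64-nvidia-cuda"

define void @vectorAdd(float addrspace(1)* %A, float addrspace(1)* %B,
                       float addrspace(1)* %C, i32 %numElements) {
entry:
  ; threadIdx.x / blockIdx.x / blockDim.x come from NVVM intrinsics,
  ; not from extern global variables.
  %tid  = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  %bid  = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
  %bdim = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
  %tmp  = mul i32 %bid, %bdim
  %i    = add i32 %tmp, %tid
  %cmp  = icmp slt i32 %i, %numElements
  br i1 %cmp, label %body, label %exit

body:
  %idx = sext i32 %i to i64
  %pA  = getelementptr float addrspace(1)* %A, i64 %idx
  %pB  = getelementptr float addrspace(1)* %B, i64 %idx
  %pC  = getelementptr float addrspace(1)* %C, i64 %idx
  %a   = load float addrspace(1)* %pA
  %b   = load float addrspace(1)* %pB
  %sum = fadd float %a, %b
  store float %sum, float addrspace(1)* %pC
  br label %exit

exit:
  ret void
}

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
declare i32 @llvm.nvvm.read.ptx.sreg.ntid.x()

; nvvm.annotations is what marks @vectorAdd as a kernel entry point.
!nvvm.annotations = !{!0}
!0 = !{void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*, i32)* @vectorAdd, !"kernel", i32 1}

The IR quoted in the thread, by contrast, is only the host-side launch stub (cudaSetupArgument/cudaLaunch); if no device code for vectorAdd is ever generated and registered, the runtime has nothing to launch, which would fit the "invalid device function" error.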
<p dir="ltr">Could anyone tell
me how to fix this?</p>
<p dir="ltr">Any help is
appreciated!</p>
<p dir="ltr">
suxing<br>
</p>
</blockquote>
</div>
>>>>
>>>> Hi Suxing,
>>>>
>>>> Clang cannot yet compile CUDA out of the box like that, and definitely
>>>> not mixed CUDA code (where host and device code live in the same file).
>>>> With some effort, Clang can be made to compile stand-alone device code,
>>>> but some critical steps are missing. For example, you have to map
>>>> threadIdx and other similar special globals to the appropriate
>>>> intrinsic calls (@llvm.nvvm.read.ptx.sreg.tid.*), and not just declare
>>>> them.
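To make that difference concrete, the following is a hand-written sketch contrasting what an "extern dim3 threadIdx;" declaration roughly lowers to with the special-register intrinsic the NVPTX backend expects. The function names and the dim3 layout are assumptions for illustration, not output of any compiler in this thread.

%struct.dim3 = type { i32, i32, i32 }

; With "extern dim3 threadIdx;" the kernel reads an ordinary global that
; nothing on the device ever defines (illustrative sketch only).
@threadIdx = external global %struct.dim3

define i32 @tid_from_extern_global() {
  %p = getelementptr inbounds %struct.dim3* @threadIdx, i32 0, i32 0
  %x = load i32* %p
  ret i32 %x
}

; What the NVPTX backend expects instead: a read of the tid.x special
; register through the NVVM intrinsic.
define i32 @tid_from_intrinsic() {
  %x = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
  ret i32 %x
}

declare i32 @llvm.nvvm.read.ptx.sreg.tid.x()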
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">Eli</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra"><br>
</div>
</div>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev@cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
> --
> Thank You.
> Madhur D. Amilkanthwar
> RISE lab,
> IIT Madras.