<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Madhur, I want to do some research on heterogeneous programming

    models.<br>

    What about you? Perhaps we can find something in common :-)<br>

    <br>

    <br>

    Xing<br>

    <br>

    <br>

    <div class="moz-cite-prefix">On 08/14/2014 01:00 AM, Madhur

      Amilkanthwar wrote:<br>

    </div>

    <blockquote

cite="mid:CAMTh1gUJETT++Z2eKi5gtN1EMDTjbW9FB90b8d=OaPUwDHUMNg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>I too struggled a lot when compiling CUDA C with Clang 3.2. <br>

        </div>

        By the way, Xing, why do you want to compile CUDA C with Clang?<br>

      </div>

      <div class="gmail_extra"><br>

        <br>

        <div class="gmail_quote">On Wed, Aug 13, 2014 at 8:25 PM, Eli

          Bendersky <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:eliben@google.com" target="_blank">eliben@google.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div dir="ltr"><br>

              <div class="gmail_extra"><br>

                <br>

                <div class="gmail_quote">

                  <div class="">On Tue, Aug 12, 2014 at 6:38 PM, Xing Su

                    <span dir="ltr">&lt;<a moz-do-not-send="true"

                        href="mailto:suxing1989@gmail.com"

                        target="_blank">suxing1989@gmail.com</a>&gt;</span>

                    wrote:<br>

                    <blockquote class="gmail_quote" style="margin:0 0 0

                      .8ex;border-left:1px #ccc solid;padding-left:1ex">

                      <div bgcolor="#FFFFFF" text="#000000"> Thanks Eli,<br>

                        <br>

                        It sounds like CUDA support in Clang is currently

                        far from<br>

                        production use ...<br>

                        <br>

                        I'd like to know what the status of CUDA support

                        is in clang,<br>

                        but I am not able to find anything reporting

                        this. <br>

                        Are you a developer of this part, or could you

                        give me some<br>

                        guidance?<br>

                        <br>

                      </div>

                    </blockquote>

                    <div><br>

                    </div>

                  </div>

                  <div>There's no documentation of these parts of Clang,

                    as far as I know, besides the source code. To get a

                    feel for what's supported take a look at the

                    existing tests (specifically test/SemaCUDA and

                    test/CodeGenCUDA dirs).</div>

                  <span class="HOEnZb"><font color="#888888">

                      <div><br>

                      </div>

                      <div>Eli</div>

                    </font></span>

                  <div>

                    <div class="h5">

                      <div><br>

                      </div>

                      <div><br>

                      </div>

                      <div> </div>

                      <blockquote class="gmail_quote" style="margin:0 0

                        0 .8ex;border-left:1px #ccc

                        solid;padding-left:1ex">

                        <div bgcolor="#FFFFFF" text="#000000"> <br>

                                                          suxing

                          <div>

                            <div><br>

                              <br>

                              <div>On 2014/8/13 1:18, Eli Bendersky

                                wrote:<br>

                              </div>

                              <blockquote type="cite">

                                <div dir="ltr"><br>

                                  <div class="gmail_extra"><br>

                                    <br>

                                    <div class="gmail_quote">On Tue, Aug

                                      12, 2014 at 10:07 AM, Xing Su <span

                                        dir="ltr">&lt;<a

                                          moz-do-not-send="true"

                                          href="mailto:suxing1989@gmail.com"

                                          target="_blank">suxing1989@gmail.com</a>&gt;</span>

                                      wrote:<br>

                                      <blockquote class="gmail_quote"

                                        style="margin:0px 0px 0px

0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

                                        <p dir="ltr">Hi everyone!</p>

                                        <p dir="ltr">I tried to compile

                                          a CUDA program using

                                          clang-3.4. This program is

                                          taken<br>

                                          from NVIDIA_CUDA-5.0_samples

                                          collection and it's a very

                                          simple program adding<br>

                                          two vectors.</p>

                                        <p dir="ltr">I made a few

                                          modifications to the original

                                          code:<br>

                                          1. I substituted the

                                          __global__ CUDA C keyword with

                                          __attribute__((global))<br>

                                              in order to use clang as

                                          the compiler.<br>

                                          2. &lt;stdlib.h&gt; and

                                          &lt;math.h&gt; were added.<br>

                                          3. declarations of blockDim,

                                          blockIdx, threadIdx were

                                          added.</p>

                                        <p dir="ltr">//

                                          ==================== code

                                          begin ========================<br>

                                          /**</p>

                                        <p dir="ltr"> * Vector addition:

                                          C = A + B.<br>

                                           *<br>

                                           * This sample is a very basic

                                          sample that implements element

                                          by element<br>

                                           * vector addition. It is the

                                          same as the sample

                                          illustrating Chapter 2<br>

                                           * of the programming guide

                                          with some additions like error

                                          checking.<br>

                                           */</p>

                                        <p dir="ltr">#include

                                          &lt;stdio.h&gt;</p>

                                        <p dir="ltr">// For the CUDA

                                          runtime routines (prefixed

                                          with "cuda_")<br>

                                          #include

                                          &lt;cuda_runtime.h&gt;</p>

                                        <p dir="ltr">#include

                                          &lt;malloc.h&gt;<br>

                                          #include &lt;stdlib.h&gt;<br>

                                          #include &lt;math.h&gt;<br>

                                          extern dim3 blockDim,

                                          blockIdx, threadIdx;</p>

                                        <p dir="ltr">/**<br>

                                           * CUDA Kernel Device code<br>

                                           *<br>

                                           * Computes the vector

                                          addition of A and B into C.

                                          The 3 vectors have the same<br>

                                           * number of elements

                                          numElements.<br>

                                           */<br>

                                          __attribute__((global)) void<br>

                                          vectorAdd(const float *A,

                                          const float *B, float *C, int

                                          numElements)<br>

                                          {<br>

                                              int i = blockDim.x *

                                          blockIdx.x + threadIdx.x;</p>

                                        <p dir="ltr">    if (i <

                                          numElements)<br>

                                              {<br>

                                                  C[i] = A[i] + B[i];<br>

                                              }<br>

                                          }</p>

                                        <p dir="ltr">int main(void)<br>

                                          {<br>

                                              ... ...<br>

                                              return 0;<br>

                                          }</p>

                                        <p dir="ltr">//====================

                                          code end

                                          ========================</p>

                                        <p dir="ltr">$ clang -std=cuda

                                          -I/usr/local/cuda-5.0/include

                                          -o vectorAdd1 vectorAdd1.cu

                                          -L/usr/local/cuda-5.0/lib64

                                          -lcudart</p>

                                        <p dir="ltr">Compilation

                                          succeeded, but running the

                                          program gives:</p>

                                        <p dir="ltr">    [Vector

                                          addition of 50000 elements]<br>

                                              Copy input data from the

                                          host memory to the CUDA device<br>

                                              CUDA kernel launch with

                                          196 blocks of 256 threads<br>

                                              Failed to launch vectorAdd

                                          kernel (error code invalid

                                          device function )!</p>

                                        <p dir="ltr">while the expected

                                          output is</p>

                                        <p dir="ltr">    [Vector

                                          addition of 50000 elements]<br>

                                          Copy input data from the host

                                          memory to the CUDA device<br>

                                          CUDA kernel launch with 196

                                          blocks of 256 threads<br>

                                          Copy output data from the CUDA

                                          device to the host memory<br>

                                              Done</p>

                                        <p dir="ltr">The error indicates

                                          that the vectorAdd function

                                          was not a valid kernel. To see

                                          what happened,<br>

                                          I compiled the program to LLVM

                                          IR. Only the IR of function

                                          vectorAdd is shown below.</p>

                                        <p dir="ltr">$ clang -S

                                          -emit-llvm -std=cuda

                                           -I/usr/local/cuda-5.0/include

                                          -o vectorAdd1.ll vectorAdd1.cu</p>

                                        <p dir="ltr">//====================

                                          code begin

                                          ==========================<br>

                                          define void

                                          @_Z9vectorAddPKfS0_Pfi(float*

                                          %A, float* %B, float* %C, i32

                                          %numElements) #0 {<br>

                                            %1 = alloca float*, align 8<br>

                                            %2 = alloca float*, align 8<br>

                                            %3 = alloca float*, align 8<br>

                                            %4 = alloca i32, align 4<br>

                                            store float* %A, float** %1,

                                          align 8<br>

                                            store float* %B, float** %2,

                                          align 8<br>

                                            store float* %C, float** %3,

                                          align 8<br>

                                            store i32 %numElements, i32*

                                          %4, align 4<br>

                                            %5 = bitcast float** %1 to

                                          i8*<br>

                                            %6 = call i32

                                          @cudaSetupArgument(i8* %5, i64

                                          ptrtoint (i1** getelementptr

                                          (i1** null, i32 1) to i64),

                                          i64 0)<br>

                                            %7 = icmp eq i32 %6, 0<br>

                                            br i1 %7, label %8, label

                                          %22</p>

                                        <p dir="ltr">; &lt;label&gt;:8

                                                ; preds = %0<br>

                                            %9 = bitcast float** %2 to

                                          i8*<br>

                                            %10 = call i32

                                          @cudaSetupArgument(i8* %9, i64

                                          ptrtoint (i1** getelementptr

                                          (i1** null, i32 1) to i64),

                                          i64 ptrtoint (float**

                                          getelementptr ({ float*,

                                          float*, float*, i32 }* null,

                                          i64 0, i32 1) to i64))<br>

                                            %11 = icmp eq i32 %10, 0<br>

                                            br i1 %11, label %12, label

                                          %22</p>

                                        <p dir="ltr">; &lt;label&gt;:12

                                                 ; preds = %8<br>

                                            %13 = bitcast float** %3 to

                                          i8*<br>

                                            %14 = call i32

                                          @cudaSetupArgument(i8* %13,

                                          i64 ptrtoint (i1**

                                          getelementptr (i1** null, i32

                                          1) to i64), i64 ptrtoint

                                          (float** getelementptr ({

                                          float*, float*, float*, i32 }*

                                          null, i64 0, i32 2) to i64))<br>

                                            %15 = icmp eq i32 %14, 0<br>

                                            br i1 %15, label %16, label

                                          %22</p>

                                        <p dir="ltr">; &lt;label&gt;:16

                                                 ; preds = %12<br>

                                            %17 = bitcast i32* %4 to i8*<br>

                                            %18 = call i32

                                          @cudaSetupArgument(i8* %17,

                                          i64 ptrtoint (i32*

                                          getelementptr (i32* null, i32

                                          1) to i64), i64 ptrtoint (i32*

                                          getelementptr ({ float*,

                                          float*, float*, i32 }* null,

                                          i64 0, i32 3) to i64))<br>

                                            %19 = icmp eq i32 %18, 0<br>

                                            br i1 %19, label %20, label

                                          %22</p>

                                        <p dir="ltr">; &lt;label&gt;:20

                                                 ; preds = %16<br>

                                            %21 = call i32

                                          @cudaLaunch(i8* bitcast (void

                                          (float*, float*, float*, i32)*

                                          @_Z9vectorAddPKfS0_Pfi to

                                          i8*))<br>

                                            br label %22</p>

                                        <p dir="ltr">; &lt;label&gt;:22

                                                 ; preds = %20, %16,

                                          %12, %8, %0<br>

                                            ret void<br>

                                          }</p>

                                        <p dir="ltr">//=======================

                                          code end =====================<br>

                                        </p>

                                        <p dir="ltr">The emitted IR

                                          contains no addrspace

                                          qualifiers, which I would

                                          expect to see in<br>

                                          NVVM IR, so this may be the

                                          problem.</p>

                                        <p dir="ltr">Could anyone tell

                                          me how to fix this?</p>

                                        <p dir="ltr">Any help is

                                          appreciated!</p>

                                        <p dir="ltr">                   

                                                         suxing<br>

                                        </p>

                                      </blockquote>

                                    </div>

                                    Hi Suxing,</div>

                                  <div class="gmail_extra"><br>

                                  </div>

                                  <div class="gmail_extra">Clang cannot

                                    yet compile CUDA out-of-the-box like

                                    that. Definitely not mixed CUDA code

                                    (where host and device code are in

                                    the same file). Clang can be made,

                                    with some effort, to compile

                                    stand-alone device code, but some

                                    critical steps are missing. For

                                    example, you have to map threadIdx

                                    and other similar special globals to

                                    appropriate intrinsic calls

                                    (@llvm.nvvm.read.ptx.sreg.tid.*),

                                    and not just declare them. </div>
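                                  <div class="gmail_extra"><br>
                                  </div>
                                  <div class="gmail_extra">As a rough
                                    illustration of that mapping (a
                                    sketch under stated assumptions, not
                                    something Clang provides out of the
                                    box): the device-only code below
                                    computes the element index from
                                    NVPTX special-register builtins
                                    instead of extern declarations. The
                                    builtin spelling
                                    __nvvm_read_ptx_sreg_* is an
                                    assumption about the Clang release
                                    in use (older releases spell these
                                    builtins differently), and
                                    global_thread_id_x is a hypothetical
                                    helper name introduced only for this
                                    example.
<pre>
// Device-only sketch. Assumes a Clang whose NVPTX target provides the
// __nvvm_read_ptx_sreg_* builtins; the spelling varies by release.

// Hypothetical helper: stands in for blockDim.x * blockIdx.x + threadIdx.x.
__attribute__((device)) static inline int global_thread_id_x(void)
{
    // ntid = block size, ctaid = block index, tid = thread index in block.
    return __nvvm_read_ptx_sreg_ntid_x() * __nvvm_read_ptx_sreg_ctaid_x()
         + __nvvm_read_ptx_sreg_tid_x();
}

__attribute__((global)) void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = global_thread_id_x();

    // Guard against threads past the end of the vectors.
    if (i &lt; numElements)
    {
        C[i] = A[i] + B[i];
    }
}
</pre>
                                    When this is compiled as device code
                                    for an NVPTX target, each builtin
                                    should appear in the emitted IR as a
                                    call to the matching
                                    @llvm.nvvm.read.ptx.sreg.* intrinsic.
                                    The RUN lines in test/CodeGenCUDA
                                    show the kind of invocation used for
                                    that.</div>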

                                  <div class="gmail_extra"><br>

                                  </div>

                                  <div class="gmail_extra">Eli</div>

                                  <div class="gmail_extra"><br>

                                  </div>

                                  <div class="gmail_extra"><br>

                                  </div>

                                  <div class="gmail_extra"><br>

                                  </div>

                                </div>

                              </blockquote>

                              <br>

                            </div>

                          </div>

                        </div>

                      </blockquote>

                    </div>

                  </div>

                </div>

                <br>

              </div>

            </div>

            <br>

            _______________________________________________<br>

            cfe-dev mailing list<br>

            <a moz-do-not-send="true" href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>

            <a moz-do-not-send="true"

              href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev"

              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

            <br>

          </blockquote>

        </div>

        <br>

        <br clear="all">

        <br>

        -- <br>

        Thank You.<br>

        Madhur D. Amilkanthwar<br>

        RISE lab,<br>

        IIT Madras.<br>

      </div>

    </blockquote>

    <br>

  </body>

</html>