<div class="gmail_quote">On Thu, Jun 14, 2012 at 12:06 AM, Peter Colberg <span dir="ltr"><<a href="mailto:peter@colberg.org" target="_blank">peter@colberg.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="im">On Wed, Jun 13, 2012 at 11:28:48PM -0400, Peter Colberg wrote:<br>

> The parser interprets the compressed C++11 template parameter syntax<br>

> as a call to a CUDA kernel function. Is there a way to disable parsing<br>

> of the CUDA call syntax <<< >>>? I would be using a C++ wrapper around<br>

> cudaConfigureCall, cudaSetupArgument and cudaLaunch anyway.<br>

<br>

</div>Hmmm, the old cudaLaunch trick does not seem to work:<br>

<br>

    #include <cuda_runtime.h><br>

<br>

    #include <cstdio><br>

    #include <cstdlib><br>

<br>

    #define CUDA_REQUIRE( x ) \<br>

        { \<br>

            cudaError_t err = (x); \<br>

            if (err != cudaSuccess) { \<br>

                fprintf( \<br>

                    stderr \<br>

                  , "%s (%d): error: CUDA: %s\n" \<br>

                  , __FILE__ , __LINE__ \<br>

                  , cudaGetErrorString(err) \<br>

                ); \<br>

                exit(1); \<br>

            } \<br>

        }<br>

<br>

    __attribute__((global)) void g1(int x, int* g_array)<br>

    {<br>

        g_array[0] = x;<br>

    }<br>

<br>

    int main()<br>

    {<br>

        int* g_array = 0;<br>

        CUDA_REQUIRE( cudaMalloc(&g_array, sizeof(*g_array)) );<br>

        CUDA_REQUIRE( cudaMemset(g_array, 0, sizeof(*g_array)) );<br>

<br>

        int dev = -1;<br>

        CUDA_REQUIRE( cudaGetDevice(&dev) );<br>

        printf("Using CUDA device #%d\n", dev);<br>

<br>

        struct arguments<br>

        {<br>

            int x;<br>

            int* g_array;<br>

        };<br>

<br>

        int x = 42;<br>

    #ifdef USE_CUDA_CALL_SYNTAX<br>

        g1<<<1, 1>>>(x, g_array);<br>

    #else<br>

        CUDA_REQUIRE( cudaConfigureCall(1, 1) );<br>

        CUDA_REQUIRE( cudaSetupArgument(&x, sizeof(x), offsetof(arguments, x)) );<br>

        CUDA_REQUIRE( cudaSetupArgument(&g_array, sizeof(g_array), offsetof(arguments, g_array)) );<br>

        CUDA_REQUIRE( cudaLaunch(reinterpret_cast<char const*>(&g1)) );<br>

    #endif<br>

        CUDA_REQUIRE( cudaDeviceSynchronize() );<br>

<br>

        int result = 0;<br>

        CUDA_REQUIRE( cudaMemcpy(&result, g_array, sizeof(*g_array), cudaMemcpyDeviceToHost) );<br>

        printf("42 == %d\n", result);<br>

    }<br>

<br>

<br>

Compile with Clang using <<< >>> syntax:<br>

<br>

    clang++ -DUSE_CUDA_CALL_SYNTAX -I/usr/local/cuda-4.2/cuda/include -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call <a href="http://kernel-call.cu" target="_blank">kernel-call.cu</a><br>

<br>

    ./kernel-call<br>

    Using CUDA device #0<br>

    42 == 0<br>

<br>

Compile with Clang using manual cudaLaunch:<br>

<div class="im"><br>

    clang++ -I/usr/local/cuda-4.2/cuda/include -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call <a href="http://kernel-call.cu" target="_blank">kernel-call.cu</a><br>

<br>

</div>    ./kernel-call<br>

    Using CUDA device #0<br>

    <a href="http://kernel-call.cu" target="_blank">kernel-call.cu</a> (48): error: CUDA: invalid device function<br>

<br>

Compile with nvcc using manual cudaLaunch:<br>

<br>

    nvcc -I/usr/local/cuda-4.2/cuda/include -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call <a href="http://kernel-call.cu" target="_blank">kernel-call.cu</a><br>

<br>

    ./kernel-call<br>

    Using CUDA device #0<br>

    42 == 42<br>

<br>

<br>

How does the glue between host and GPU kernel work?<br>

<br>

Could I somehow obtain a cudaLaunch-callable pointer to the GPU kernel?<br>

<br>

<br>

The other issue is the result of kernel execution. The kernel appears<br>

not to write to the global device memory array. I guess this also<br>

involves the NVPTX backend. Should I forward this to the llvm-dev<br>

mailing list, or are the NVPTX developers reading cfe-dev as well?<br></blockquote><div><br></div><div>Something is probably getting messed up in the IR generation for the kernel functions (we recently got rid of the old PTX back-end, on which the CUDA integration is based, in favor of the NVPTX back-end).  If you post the IR, I can take a look at it.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Thanks,<br>

<div class="HOEnZb"><div class="h5">Peter<br>

_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><br><div>Thanks,</div><div><br></div><div>Justin Holewinski</div><br>