[cfe-dev] Clang and CUDA with C++11 features
Justin Holewinski
justin.holewinski at gmail.com
Thu Jun 14 06:57:27 PDT 2012
On Thu, Jun 14, 2012 at 12:06 AM, Peter Colberg <peter at colberg.org> wrote:
> On Wed, Jun 13, 2012 at 11:28:48PM -0400, Peter Colberg wrote:
> > The parser interprets the compressed C++11 template parameter syntax
> > as a call to a CUDA kernel function. Is there a way to disable parsing
> > of the CUDA call syntax <<< >>>? I would be using a C++ wrapper around
> > cudaConfigureCall, cudaSetupArgument and cudaLaunch anyway.
>
> Hmmm, the old cudaLaunch trick does not seem to work:
>
> #include <cuda_runtime.h>
>
> #include <cstddef>
> #include <cstdio>
> #include <cstdlib>
>
> #define CUDA_REQUIRE( x ) \
> { \
> cudaError_t err = (x); \
> if (err != cudaSuccess) { \
> fprintf( \
> stderr \
> , "%s (%d): error: CUDA: %s\n" \
> , __FILE__ , __LINE__ \
> , cudaGetErrorString(err) \
> ); \
> exit(1); \
> } \
> }
>
> __attribute__((global)) void g1(int x, int* g_array)
> {
> g_array[0] = x;
> }
>
> int main()
> {
> int* g_array = 0;
> CUDA_REQUIRE( cudaMalloc(&g_array, sizeof(*g_array)) );
> CUDA_REQUIRE( cudaMemset(g_array, 0, sizeof(*g_array)) );
>
> int dev = -1;
> CUDA_REQUIRE( cudaGetDevice(&dev) );
> printf("Using CUDA device #%d\n", dev);
>
> struct arguments
> {
> int x;
> int* g_array;
> };
>
> int x = 42;
> #ifdef USE_CUDA_CALL_SYNTAX
> g1<<<1, 1>>>(x, g_array);
> #else
> CUDA_REQUIRE( cudaConfigureCall(1, 1) );
> CUDA_REQUIRE( cudaSetupArgument(&x, sizeof(x), offsetof(arguments, x)) );
> CUDA_REQUIRE( cudaSetupArgument(&g_array, sizeof(g_array), offsetof(arguments, g_array)) );
> CUDA_REQUIRE( cudaLaunch(reinterpret_cast<char const*>(&g1)) );
> #endif
> CUDA_REQUIRE( cudaDeviceSynchronize() );
>
> int result = 0;
> CUDA_REQUIRE( cudaMemcpy(&result, g_array, sizeof(*g_array), cudaMemcpyDeviceToHost) );
> printf("42 == %d\n", result);
> }
>
>
> Compile with Clang using <<< >>> syntax:
>
> clang++ -DUSE_CUDA_CALL_SYNTAX -I/usr/local/cuda-4.2/cuda/include
> -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call kernel-call.cu
>
> ./kernel-call
> Using CUDA device #0
> 42 == 0
>
> Compile with Clang using manual cudaLaunch:
>
> clang++ -I/usr/local/cuda-4.2/cuda/include
> -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call kernel-call.cu
>
> ./kernel-call
> Using CUDA device #0
> kernel-call.cu (48): error: CUDA: invalid device function
>
> Compile with nvcc using manual cudaLaunch:
>
> nvcc -I/usr/local/cuda-4.2/cuda/include
> -L/usr/local/cuda-4.2/cuda/lib64 -lcudart -o kernel-call kernel-call.cu
>
> ./kernel-call
> Using CUDA device #0
> 42 == 42
>
>
> How does the glue between host and GPU kernel work?
>
> Could I somehow obtain a cudaLaunch-callable pointer to the GPU kernel?
>
>
> The other issue is the result of kernel execution. The kernel appears
> not to write to the global device memory array. I guess this also
> involves the NVPTX backend. Should I forward this to the llvm-dev
> mailing list, or are the NVPTX developers reading cfe-dev as well?
>
Something is probably getting messed up in the IR generation for the kernel
functions (we recently got rid of the old PTX back-end, on which the CUDA
integration was based, in favor of the NVPTX back-end). If you post the IR
(e.g. from clang++ -S -emit-llvm), I can take a look at it.
>
> Thanks,
> Peter
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
--
Thanks,
Justin Holewinski