[cfe-dev] Clang and CUDA with C++11 features

Fri Jun 15 10:41:16 PDT 2012

On Fri, Jun 15, 2012 at 12:48 PM, Peter Colberg <peter at colberg.org> wrote:

> On Fri, Jun 15, 2012 at 11:59:59AM -0400, Peter Colberg wrote:
> > On Fri, Jun 15, 2012 at 08:31:50AM -0700, Manjunath Kudlur wrote:
> > > As Justin mentioned, I don't think the right plumbing exists that uses the
> > > frontend support for parsing and lowering the CUDA syntax and connects it
> > > to the NVPTX backend. For instance, from the IR, it looks like the kernel
> > > function pointer is just cast to i8* and passed to cudaLaunch. cudaLaunch
> > > requires the name of the kernel to be passed as a char string.
> >
> > Definitely not; otherwise my cudaLaunch wrapper would have been failing
> > since 2007 ;-).
> >
> > The advantage of the CUDA runtime library, as opposed to the CUDA
> > driver library, lies in the convenient execution of kernels.
> >
> > When the nvcc frontend converts the <<< >>> syntax to proper C++ code,
> > it does not pass a char string to cudaLaunch, but an actual pointer.
> > The only question is how to get that pointer in Clang…
>
> I have to admit that I never looked at the memory pointed to by a
> kernel function pointer, so I did (please tell me if too naïvely…).
>
> // function.cu
>
> #include <cstdio>  // for printf
>
> __global__ void f(int* array)
> {
>    array[0] = 42;
> }
>
> int main()
> {
>    void (*p)(int*) = &f;
>    printf("%s\n", reinterpret_cast<char const*>(p));
> }
>
>
> nvcc -Xcompiler -Wall -Xptxas -v -o function function.cu
> ptxas info    : Compiling entry function '_Z1fPi' for 'sm_10'
> ptxas info    : Used 2 registers, 8+16 bytes smem
>
>
> ./function
> UH��H��H�}�H�E�H�������UH��SH��(�P@
>
> So the function pointer to the GPU kernel is indeed not an ASCII string.
>
> What is it then?
>

Regardless of how nvcc works under the hood, the point is that Clang is not
currently set up to fully support CUDA.  You can invoke it once with
-fcuda-is-device and once without to get the IR for both the device and the
host, but there is not yet a good way to link the two together.  The path
of least resistance right now is to compile the device code to PTX and then
load and launch it with the Driver API.  I don't know what the maintainer of
the CUDA front-end in Clang is planning; you would have to ask them when
full support is expected.

>
> Peter
>

-- 

Thanks,

Justin Holewinski