[PATCH] D44747: Set calling convention for CUDA kernel
John McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Mar 27 13:22:09 PDT 2018
rjmccall added inline comments.
================
Comment at: lib/Sema/SemaOverload.cpp:1492
+ Changed = true;
+ }
+
----------------
yaxunl wrote:
> rjmccall wrote:
> > It's cheaper not to check the CUDA language mode here; pulling the CC out of the FPT is easy.
> >
> > Why is this necessary, anyway? From the spec, it doesn't look to me like kernel function pointers can be converted to ordinary function pointers. A kernel function pointer is supposed to be declared with something like `__global__ void (*fn)(void)`. You'll need to change your patch to SemaType to apply the CC even when compiling for the host, of course.
> >
> > I was going to say that you should use this CC in your validation that calls with execution configurations go to kernel functions, but... I can't actually find where you do that validation.
> >
> > Do you need these function pointers to be a different size from the host function pointer?
> In CUDA, `__global__` can only be used with function declaration or definition. Using it in function pointer declaration will result in a warning: 'global' attribute only applies to functions.
>
> Also, there is this lit test in SemaCUDA:
>
> ```
> __global__ void kernel() {}
>
> typedef void (*fn_ptr_t)();
>
> __host__ fn_ptr_t get_ptr_h() {
> return kernel;
> }
>
> ```
> It allows implicit conversion of `__global__ void()` to void(*)(), therefore I need the above change to drop the CUDA kernel calling convention in such implicit conversion.
I see. I must have mis-read the specification, but I see that the code samples I can find online agree with that test case. So `__global__` function pointers are just treated as function pointers, and it's simply undefined behavior if you try to call a function pointer that happens to be a kernel without an execution configuration, or contrariwise if you use an execution configuration to call a function pointer that isn't a kernel.
In that case, I think the best solution is to just immediately strip `__global__` from the type of a DRE to a kernel function, since `__global__` isn't supposed to be part of the user-facing type system.
https://reviews.llvm.org/D44747
More information about the cfe-commits
mailing list