[PATCH] D120566: [OpenCL][AMDGPU]: Do not allow a call to kernel

Fri Feb 25 15:16:49 PST 2022

rjmccall added a comment.

In D120566#3346533 <https://reviews.llvm.org/D120566#3346533>, @arsenm wrote:

> In D120566#3346506 <https://reviews.llvm.org/D120566#3346506>, @rjmccall wrote:
>
>> Is there something which stops you from taking the address of a kernel and then calling it?  If not, are there actually any uses of kernels in the module that shouldn't be rewritten as uses of the clone?
>
> The actual amdgpu_kernel is uncallable and has a totally different ABI, and is invoked by external driver code. From the user's device code perspective, only the callable function version is meaningful.

I think you're misunderstanding what I'm asking.  I believe that in OpenCL, you can do `&someKernelFunction` in source code and then call that.  The rewrite in this patch does not handle non-call uses of the kernel function and so will continue to miscompile them.

>> I feel like this would be a lot easier to just fix in your LLVM pass so that you rewrite any uses of a kernel to use a clone instead before you rewrite the kernel.
>
> Then we can't ban calls to kernels (and would be pushing some kind of symbol naming conflict decision into the backend) and in principle would have to handle this actual call.

Okay, this is not an accurate description of what you're trying to do, and this is important to be precise about.  You are not "banning calls to kernels", which would be a novel language restriction and make you non-conformant to OpenCL.  You still have a language requirement to allow code to directly use kernel functions.  That is why this patch is modifying IR generation instead of emitting new errors in Sema.

What's happening here is that your target (very reasonably) requires kernels to have a special kernel entrypoint in order to be called from outside.  That entrypoint uses a very different ABI from ordinary functions, one which simplifies being dynamically called by the runtime, and so it is important that ordinary uses of the function don't accidentally resolve against that special entrypoint.  You therefore need two different functions for the kernel, one to satisfy standard uses and one to act as the kernel entrypoint.

Your current architecture is to generate code normally, which will produce what's roughly the standard entrypoint, and then have a backend pass break that down and produce a kernel entrypoint.  I can understand why you find that frustratingly limited, and I agree that it doesn't seem to handle standard uses correctly.  Something needs to change here.

Now, Clang supports many different kernel languages, all of which face very similar language/implementation problems.  It is therefore always informative to go check to see how other language implementors have tried to solve the problem you're facing.  So if you go and look at how CUDA is implemented in Clang, you will see that they have introduced a "kernel reference kind" to `GlobalDecl`, which allows them to distinguish between the kernel entrypoint and the standard entrypoint in IRGen.  You could very easily build on this in your OpenCL implementation so that Clang emits the standard entrypoint and then either your pass or IRGen itself fills in the kernel entrypoint by marshaling arguments and then calling (presumably in a way that forces inlining) the standard entrypoint.  That would also give you total control of how arguments are marshaled in the kernel entrypoint.

Alternatively, I think cloning the standard entrypoint so that uses of it are rewritten to a clone is reasonable enough.  I don't really see why doing the cloning in IRGen is necessary when you already have a module pass that does similar kinds of rewriting.  Doing the clone in IRGen also does not seem to move you closer to your goal of actually marshaling arguments differently.  Most importantly, though, I believe you do need to rewrite all the uses and not just the direct calls.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120566/new/

https://reviews.llvm.org/D120566