[clang] [llvm] [Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (PR #94549)

Fri Jun 7 17:23:03 PDT 2024

jdoerfert wrote:

There seems to be some trouble with NVIDIA offload (I tested mainly on AMDGPU) and f128; I'll make sure that works too.
The nits are easy to address; I just copied the existing style from the surrounding code.
I'll also add an IR test to match the new runtime calls and kernel argument passing.

> Will kernels in TUs compiled with `-foffload-via-llvm` be interoperable with code that wants to launch them from another TU compiled w/o `-foffload-via-llvm` ?
> 
> E.g.:
> 
> * a.cu: `__global__ void kernel() { ... }`
> * b.cu: `extern __global__ void kernel(); void func() { kernel<<<1,1>>>();}`
> 
> This could use a test in the testsuite to actually check whether it works.

I'll look into this. Intuitively, the TU containing the kernel launch needs -foffload-via-llvm (which implies -foffload-new-driver), and the TU containing the kernel definition needs -foffload-new-driver. Similarly, with the new driver flag, device code should link fine. Right now, this defaults to gpu-rdc, as OpenMP does, but we can change that. On that note, non-rdc should actually internalize everything except the kernels, which would help the middle end as well.

https://github.com/llvm/llvm-project/pull/94549