[PATCH] D100552: [HIP] Diagnose compiling kernel without offload arch

Thu Apr 15 14:37:08 PDT 2021

tra added a comment.

Enforcing explicit GPU target makes sense.

However, I think that singling out a `__global__` as the trigger is not sufficient for the intended purpose.

If we can't generate a usable GPU-side binary, then we should produce an error if we need to generate *anything* during GPU-side compilation.
Using `__global__` as a proxy would not catch some use cases and, possibly, will produce false positives in  others.

E.g. what if I have a TU which only has a `__device__ int var = 42;` combined with a host-side code to memcpy to/from it? It would still be a valid, if not very useful code, but it would still suffer from runtime being unable to load it on a GPU unless that variable is in a GPU binary compiled with a valid target.

`__device__` functions in TUs compiled with `-fgpu-rdc` would have a similar problem. They would eventually be linked into a GPU binary which will be useless if it's not compiled for correct GPU. Granted, `__device__` functions will eventually need to be called from a kernel, so we will error out on a `__global__`  *somewhere*, but it will miss the problem when such TU does not get to the linking stage (e.g. maybe the user wants to link them at runtime).

================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8260
+def err_hip_kern_without_gpu : Error<
+  "compiling a HIP kernel without specifying an offload arch is not allowed">,
+  DefaultFatal;
----------------
How about compiling a file with `__device__` functions with `-fgpu-rdc`? If a kernel with no-arch is an error, then this should be an error, too.

================
Comment at: clang/lib/Sema/SemaDeclAttr.cpp:4431
   }
+  if (S.getASTContext().getTargetInfo().getTargetOpts().CPU.empty() &&
+      S.getLangOpts().HIP && S.getLangOpts().CUDAIsDevice) {
----------------
Will this fire if we have an uninstantiated kernel template? 

================
Comment at: clang/test/SemaCUDA/kernel-no-gpu.cu:7
+
+__global__ void kern1() {}
+// hip-error at -1 {{compiling a HIP kernel without specifying an offload arch is not allowed}}
----------------
We'll need few more test cases.

E.g. these should be fine.

```
template <typename T> __global__ void kernel(T arg ) {};
__global__ void kernel(T arg );

```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100552/new/

https://reviews.llvm.org/D100552