[PATCH] D100552: [HIP] Diagnose compiling kernel without offload arch
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Apr 15 14:37:08 PDT 2021
tra added a comment.
Enforcing explicit GPU target makes sense.
However, I think that singling out a `__global__` as the trigger is not sufficient for the intended purpose.
If we can't generate a usable GPU-side binary, then we should produce an error if we need to generate *anything* during GPU-side compilation.
Using `__global__` as a proxy would not catch some use cases and, possibly, will produce false positives in others.
E.g. what if I have a TU which only has a `__device__ int var = 42;` combined with a host-side code to memcpy to/from it? It would still be a valid, if not very useful code, but it would still suffer from runtime being unable to load it on a GPU unless that variable is in a GPU binary compiled with a valid target.
`__device__` functions in TUs compiled with `-fgpu-rdc` would have a similar problem. They would eventually be linked into a GPU binary which will be useless if it's not compiled for correct GPU. Granted, `__device__` functions will eventually need to be called from a kernel, so we will error out on a `__global__` *somewhere*, but it will miss the problem when such TU does not get to the linking stage (e.g. maybe the user wants to link them at runtime).
================
Comment at: clang/include/clang/Basic/DiagnosticSemaKinds.td:8260
+def err_hip_kern_without_gpu : Error<
+ "compiling a HIP kernel without specifying an offload arch is not allowed">,
+ DefaultFatal;
----------------
How about compiling a file with `__device__` functions with `-fgpu-rdc`? If a kernel with no-arch is an error, then this should be an error, too.
================
Comment at: clang/lib/Sema/SemaDeclAttr.cpp:4431
}
+ if (S.getASTContext().getTargetInfo().getTargetOpts().CPU.empty() &&
+ S.getLangOpts().HIP && S.getLangOpts().CUDAIsDevice) {
----------------
Will this fire if we have an uninstantiated kernel template?
================
Comment at: clang/test/SemaCUDA/kernel-no-gpu.cu:7
+
+__global__ void kern1() {}
+// hip-error at -1 {{compiling a HIP kernel without specifying an offload arch is not allowed}}
----------------
We'll need few more test cases.
E.g. these should be fine.
```
template <typename T> __global__ void kernel(T arg ) {};
__global__ void kernel(T arg );
```
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100552/new/
https://reviews.llvm.org/D100552
More information about the cfe-commits
mailing list