[clang] [OpenMP] Allow GPUs to be targeted directly via `-fopenmp`. (PR #122149)

Wed Jan 8 14:17:53 PST 2025

shiltian wrote:

> To me this looks like compilation for a host, except the GPU is the host. The only functions that could be called from such a CU would be the top-level ones, not any of the auto-generated one.
> 
> Additionally, the host wouldn't support offload, so we'd need to do something about TARGET (or any other construct that cannot be inside of TARGET). We should probably ignore those with a diagnostic.

I get it, but that doesn't look like the case. If you look at the test case, the `target` region in `bar` is simply ignored. To me this looks like treating the entire TU being wrapped into a giant target region instead of compiling for host.

> This is not the case. This patch does not make omp parallel the same as omp target parallel. This patch does however implement omp parallel with _parallel_51, if the target is a GPU, and otherwise with fork_threads. From the standard perspective, what we provide is the semantics of omp parallel, the implementation just happens to be different on the targets. target parallel means "try to offload, then parallel", which for the host still does that. If your triple now is a GPU, we would not implement it as a parallel, not with this patch. We should either emit an error, or emit a task containing a parallel, effectively emitting the fallback only.

The patch does make `omp parallel` same as `omp target parallel` from code generation's perspective. That is exactly the part that I'm not 100% sure if it is right because even the same construct might have different behavior when it is nested into a target region. I can't give a concrete example here, and I also might be wrong here. If this is not an issue (from the standard perspective), then I'm fine with the change, as long as we have a defined behavior of dealing with target regions, instead of simply ignoring it, which does seem like a bug to me.

https://github.com/llvm/llvm-project/pull/122149