[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

Wed Aug 1 08:32:06 PDT 2018

hfinkel added a comment.

In https://reviews.llvm.org/D47849#1183996, @Hahnfeld wrote:

> In https://reviews.llvm.org/D47849#1183150, @hfinkel wrote:
>
> > Hrmm. Doesn't that make it so that whatever functions are implemented using that inline assembly will not be callable from target code (or, perhaps worse, will crash the backend if called)?
>
>
> You are right :-(
>
> However, I'm getting worried about a more general case: not all inline assembly is guarded by `#ifdef`s that we could hope to get right. For example, take `sys/io.h`, which currently throws 18 errors when compiling with offloading to GPUs, even with `-O0`. The inline assembly is only guarded by `#if defined __GNUC__ && __GNUC__ >= 2`, which should be defined by any modern compiler claiming compatibility with GCC. I'm not sure this particular header will ever end up in an OpenMP application, but others with inline assembly will. From a quick grep, it looks like some headers dealing with atomic operations have inline assembly, and even `eigen3/Eigen/src/Core/util/Memory.h` has it for finding the cpuid.
>
> Coming back to the original problem: Maybe we need to undefine optimization macros as in your patch to get as many correct inline functions as possible AND ignore errors from inline assembly as in my patch to not break when including weird headers?

The problem is that the inline assembly might actually be for the target, instead of the host, because we also have target preprocessor macros defined, and it's going to be hard to tell. I'm not sure that there's a great solution here, and I agree that having something more general than undefining some specific things that happen to matter for math.h would be better. As you point out, this is not just a system-header problem. We might indeed want to undefine all of the target-feature-related macros (although that won't always be sufficient, because we need basic arch macros for the system headers to work at all, and those are generally enough to guard some inline asm).
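
To make the arch-macro point concrete, here is a minimal sketch of the kind of header code I have in mind. The function name `example_bswap32` is made up for illustration and is not taken from any real header. The inline asm is guarded only by compiler and basic arch macros, so if the host's arch macros are visible during the device compilation, the asm branch is selected even though the device backend cannot handle it:

```c
#include <stdint.h>

/* Hypothetical header fragment (illustration only, not from a real header).
 * The asm is guarded by a compiler check plus basic arch macros, which is
 * exactly the kind of guard that stays active if host arch macros are
 * defined while compiling for the device. */
static inline uint32_t example_bswap32(uint32_t x) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  /* x86-only instruction: fine on the host, meaningless for NVPTX. */
  __asm__("bswap %0" : "+r"(x));
  return x;
#else
  /* Portable fallback, chosen when the x86 arch macros are absent. */
  return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
         ((x << 8) & 0x00ff0000u) | (x << 24);
#endif
}
```

Both branches compute the same result, so which one gets compiled depends entirely on which macros are defined, not on which backend will actually see the code.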

Maybe the following makes sense: Only define the host macros, minus target-feature ones, when compiling for the target in the context of the system headers. That makes the system headers work while providing a "clean" preprocessor environment for the rest of the code (and, thus, retains our ability to complain about bad inline asm).
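
As a rough user-level approximation of the "minus target-feature ones" part (the real change would live in the compiler's predefined macros, not in user code), one can imagine a system header being seen as if it were wrapped like this. The choice of `__SSE2__` and `__AVX__` is only an assumption for illustration, and `example_is_huge` is a made-up helper:

```c
/* Sketch only: emulate "system header sees no target-feature macros".
 * push_macro/pop_macro are supported by GCC and Clang; the two macros
 * below are illustrative picks, not a complete list. */
#pragma push_macro("__SSE2__")
#pragma push_macro("__AVX__")
#undef __SSE2__
#undef __AVX__

#include <math.h> /* parsed without the feature macros above */

/* Restore the previous state (defined or undefined) for the rest of the
 * translation unit. */
#pragma pop_macro("__AVX__")
#pragma pop_macro("__SSE2__")

/* Hypothetical helper that uses only a macro provided by math.h. */
static inline int example_is_huge(double x) {
  return x == HUGE_VAL;
}
```

This does not cover the other half of the idea, which is hiding the host macros outside of system headers. That part cannot be emulated from user code and would need support in the driver or preprocessor.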

Repository:
  rC Clang

https://reviews.llvm.org/D47849