[PATCH] D50845: [CUDA/OpenMP] Define only some host macros during device compilation

Jonas Hahnfeld via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 16 11:53:17 PDT 2018


Hahnfeld added a comment.

In https://reviews.llvm.org/D50845#1202838, @tra wrote:

> In https://reviews.llvm.org/D50845#1202551, @ABataev wrote:
>
> > In https://reviews.llvm.org/D50845#1202550, @Hahnfeld wrote:
> >
> > > In https://reviews.llvm.org/D50845#1202540, @ABataev wrote:
> > >
> > > > Maybe for device compilation we also should define `__NO_MATH_INLINES` and `__NO_STRING_INLINES` macros to disable inline assembly in glibc?
> > >
> > >
> > > The problem is that `__NO_MATH_INLINES` doesn't even avoid all inline assembly from `bits/mathinline.h` :-( Incidentally, Clang already defines `__NO_MATH_INLINES` for x86 (due to an old bug which was fixed long ago), and on CentOS we still hit the problems described in PR38464.
> > >
> > > As a second thought: This might be valid for NVPTX, but I don't think it's a good idea for x86-like offloading targets - they might well profit from inline assembly code.
> >
> >
> > I'm not saying that we should define those macros for all targets, only for NVPTX. But still, it may disable some inline assembly for other architectures.
>
>
> IMO, trying to avoid inline assembly by defining (or not) some macros and hoping for the best is rather fragile, as we'll have to chase *all* patches that the host's math.h may have on any given system.


Completely agree here: This patch tries to pick the low-hanging fruit that happens to fix `#include <math.h>` on most systems (and addresses a long-standing `FIXME` in the code). I know there are more headers that define inline assembly unconditionally and need more advanced fixes (see below).

> If I understand it correctly, the root cause of this exercise is that we want to compile for GPU using plain C. CUDA avoids this issue by separating device and host code via target attributes and clang has few special cases to ignore inline assembly errors in the host code if we're compiling for device. For OpenMP there's no such separation, not in the system headers, at least.

Yes, that's one of the nice properties of CUDA (for the compiler). There used to be the same restriction for OpenMP, where all functions used in `target` regions needed to be put in `declare target`. However, that was relaxed in favor of implicitly marking all **called** functions in that TU as `declare target`.
So ideally I think Clang should determine which functions are really `declare target` (either explicitly or implicitly) and only run semantic analysis on them. If a function is then found to be "broken", it's perfectly desirable to report an error back to the user.


Repository:
  rC Clang

https://reviews.llvm.org/D50845
