[PATCH] D61399: [OpenMP][Clang] Support for target math functions

Fri May 3 07:17:22 PDT 2019

ABataev added a comment.

In D61399#1488897 <https://reviews.llvm.org/D61399#1488897>, @hfinkel wrote:

> In D61399#1488366 <https://reviews.llvm.org/D61399#1488366>, @ABataev wrote:
>
> > In D61399#1488329 <https://reviews.llvm.org/D61399#1488329>, @hfinkel wrote:
> >
> > > In D61399#1488309 <https://reviews.llvm.org/D61399#1488309>, @ABataev wrote:
> > >
> > > > In D61399#1488299 <https://reviews.llvm.org/D61399#1488299>, @hfinkel wrote:
> > > >
> > > > > In D61399#1488262 <https://reviews.llvm.org/D61399#1488262>, @ABataev wrote:
> > > > >
> > > > > > I don't like this implementation. Seems to me, it breaks one of the OpenMP standard requirements: the program can be compiled without openmp support. I assume, that with this includes the program won't be able to be compiled without OpenMP support anymore because it may use some device-specific math functions explicitly.
> > > > > >  Instead, I would like to see some additional, device-scpecific math header file, that must be included explicitly to support some device-specific math functions. And we need to provide default implementations for those extra math functions for all the platforms we're going to support, including default host implementations.
> > > > >
> > > > >
> > > > > Can you provide an example of a conforming program that can't be compiled without OpenMP support? Regardless of the use of any device-specific functions (which isn't covered by the standard, of course, but might be needed in practice), the code still needs to be compilable by the host in order to generate the host-fallback version. This doesn't change that. Thus, any program that uses anything from this math.h, etc. needs to compile for the host, and thus, likely compiles without OpenMP support. Maybe I'm missing your point, however.
> > > >
> > > >
> > > > Assume we have something like this:
> > > >
> > > >   #pragma omp target if(cond)
> > > >   a = __nv_xxxx(....);
> > > >
> > > >
> > > > Instead of `__nv_xxx` you can try to use any Cuda-specific function, which is not the part of the standard `math.h`/`cmath` files. Will it be compilable even with OpenMP?
> > >
> > >
> > > I don't think that this changes that one way or the other. Your example won't work, AFAIK, unless you do something like:
> > >
> > >   #pragma omp target if(cond)
> > >   #ifdef __NVPTX__
> > >   a = __nv_xxxx(....);
> > >   #else
> > >   a = something_on_the_host;
> > >   #endif
> > >   
> > >
> > > and anything from these headers that doesn't also have a host version will suffer the same fate: if it won't also compile for the host (one way or another), then it won't work.
> >
> >
> > The problem with this header file is that it allows to use those Cuda-specific functions unconditionally in some cases:
> >
> >   #pragma omp target
> >   a = __nv_xxxx(....);
> >
> >
> > It won't require any target-specific guards to compile this code (if we compile it only for Cuda-specific devices) and we're loosing the consistency here: in some cases target regions will require special device guards, in others, with the same function calls, it is not. And the worst thing, is that we implicitly allow to introduce this kind of incostistency into users code. That's why I would prefer to see a special kind of the include file, NVPTX specific, that must be included explicitly, so the user explictly commanded to use some target-specific math functions, if he really wants it. Plus, maybe, in this files we need force check of the platform and warn users that the functions from this header file must be used using device-specific checks. Or provide some kind of the default implementations for all the platforms, that do not support those math function natively.
>
>
> I believe that I understand your point, but two things:
>
> 1. I think that you're mistaken on the underlying premise. That code will not meaningfully compile without ifdefs, even if only CUDA-specific devices are the only ones selected. We *always* compile code for the host as well, not for offloading proper, but for the fallback (for execution when the offloading fails). If I emulate this situation by writing this:
>
>
>
>   #ifdef __NVPTX__
>   int __nv_floor();
>   #endif
>  
>   int main() {
>   #pragma omp target
>   __nv_floor();
>   }
>   
>
> and try to compile using Clang with -fopenmp -fopenmp-targets=nvptx64, the compilation fails:
>
>   int1.cpp:8:1: error: use of undeclared identifier '__nv_floor'
>   
>
> and this is because, when we invoke the compilation for the host, there is no declaration for that function. This is true even though nvptx64 is the only target for which the code is being compiled (because we always also compile the host fallback).
>
> 2. I believe that the future state -- what we get by following this patch, and then when declare variant is available using that -- gives us all what we want. When we have declare variant, then all of the definitions in these headers will be declared as variants only available on the nvptx device, and so, for a user to use such a function they would need to explicit add a variant that is only available on the host. This would be explicit.
>
>   I think that you're getting at what I mentioned earlier, where if you have code that uses some function called, for example, rnorm3d -- this is a non-standard function, and so a user is free to define it, and then they compile with OpenMP target and include math.h, then the version of rnorm3d we bring in from the CUDA header will silently override their version (which would otherwise be implicitly declare target). But I don't think that this will happen either with this patch, instead, they'll get a warning about conflicting definitions (`error: redefinition of 'rnorm3d'`), and if it's an external-linkage function, then something should give them an error (although we should check this). The more interesting case, I think, actually comes when we switch to using `declare variant`, because then I think this silent override does occur, so we'd want to add a warning for when a variant delcared in a system header file would silently override a plain (non-variant) function declare/defined elsewhere. I believe that will give us all of the benefits of this while also addressing the concern you highlight.

Ahh, yes, I forgot that we still generate the host version of the target region. Still, I think we need to prvide the default implementation of those non-standard functions (they can be very simple, maybe reporting error is going to be enough), which can be overriden by user.
Also, if I do recall correctly, this solution works only for C++ (because we use our own declaration for the standard math functions, they are automatically marked as non-builtins). For regular C in many cases, instead of the library function calls, the llvm intrinsic is generated. Did you check that it works for С?

Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61399/new/

https://reviews.llvm.org/D61399