[PATCH] D85879: [OpenMP] Overload `std::isnan` and friends multiple times for the GPU

Thu Aug 13 18:03:11 PDT 2020

jdoerfert added inline comments.

================
Comment at: clang/lib/Headers/__clang_cuda_cmath.h:85
+//        (note that we do not create implicit base functions here). To avoid
+//        this clash we add a new trait to some of them that is always true
+//        (this is LLVM after all ;)). It will only influence the mangled name
----------------
tra wrote:
> jdoerfert wrote:
> > tra wrote:
> > > jdoerfert wrote:
> > > > tra wrote:
> > > > > If you just want to disable some existing declarations that get in the way, one way to do it would be to redeclare them with an `__arrtibute__((enable_if(false)))`
> > > > > 
> > > > > Having overloads with different return types will be observable.
> > > > We need both overloads as we don't know what return type the system uses. I modeled the test below this way, that is we don't know if `isnan` has a `bool` or `int` return type.
> > > > 
> > > > > Having overloads with different return types will be observable.
> > > > 
> > > > Unsure what observable effect you expect, the variants are there, yes, but they have different names (wrt the base function and the other variant function). The variant without a base function is simply an unused internal function. Could you elaborate what problem you expect?
> > > What will be the result of `sizeof(isinf(1.0f))` ? I would expect it to be the same on host and on the device.
> > > I'm not quite sure what the pragma would do, so it's possible I'm barking at the wrong tree here.
> > > 
> > So, I actually had to run this to verify what I suspected would happen:
> > 
> > `sizeof(isinf(1.0f))` is in the AST usually:
> > ```
> >   |         `-UnaryExprOrTypeTraitExpr 0x5586a27bd590 <col:14, col:37> 'unsigned long' sizeof                                                                                                                                                                  
> >   |           `-ParenExpr 0x5586a27bd570 <col:20, col:37> 'bool'
> >   |             `-CallExpr 0x5586a27bd548 <col:21, col:36> 'bool'
> >   |               |-ImplicitCastExpr 0x5586a27bd530 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
> >   |               | `-DeclRefExpr 0x5586a27bd500 <col:21, col:26> 'bool (float)' lvalue Function 0x5586a276f9f0 'isinf' 'bool (float)' non_odr_use_unevaluated
> >   |               `-FloatingLiteral 0x5586a27bd290 <col:32> 'float' 1.000000e+00
> > ```
> > If `isinf` has an applicable variant, it will be picked up:
> > ```
> >   |         `-UnaryExprOrTypeTraitExpr 0x55f9ac949a20 <col:14, col:37> 'unsigned long' sizeof                                                                                                                                                                  
> >   |           `-ParenExpr 0x55f9ac949a00 <col:20, col:37> 'bool'
> >   |             `-PseudoObjectExpr 0x55f9ac9499e0 <col:21, col:36> 'bool'
> >   |               |-CallExpr 0x55f9ac949978 <col:21, col:36> 'bool'
> >   |               | |-ImplicitCastExpr 0x55f9ac949960 <col:21, col:26> 'bool (*)(float)' <FunctionToPointerDecay>
> >   |               | | `-DeclRefExpr 0x55f9ac949930 <col:21, col:26> 'bool (float)' lvalue Function 0x55f9ac7cd1d0 'isinf' 'bool (float)' non_odr_use_unevaluated
> >   |               | `-FloatingLiteral 0x55f9ac9496c0 <col:32> 'float' 1.000000e+00
> >   |               `-CallExpr 0x55f9ac9499b8 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20, //data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:36> 'bool'
> >   |                 |-ImplicitCastExpr 0x55f9ac9499a0 </data/build/llvm-project/lib/clang/12.0.0/include/__clang_cuda_cmath.h:36:20> 'bool (*)(float) __attribute__((nothrow))' <FunctionToPointerDecay>
> >   |                 | `-DeclRefExpr 0x55f9ac939790 <col:20> 'bool (float) __attribute__((nothrow))' Function 0x55f9ac939690 'isinf[implementation={extension(disable_implicit_base, match_any, allow_templates)}, device={arch(nvptx, nvptx64)}]' 'bool (float@
> >   |                 `-FloatingLiteral 0x55f9ac9496c0 <//data/src/llvm-project/clang/test/Headers/openmp_device_math_isnan.cpp:21:32> 'float' 1.000000e+00
> > ```
> > That is the behavior I expected, as it happens for any base function call with an applicable variant.
> > 
> > This patch doesn't change any of this. We have two specialization that do only differ in their return type but each will only be a variant of a base function with that return type. In any context, when we have a call to the original base function, then we try to specialize. Since only the `bool` return *or* the `int` return specializations are variants of the base function, we might replace the base call with a call, but consistent on host and device. I hope this makes some sense, I don't think I did a good job explaining.
> It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all). If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either.
> 
> Is that in the ballpark of what's happening? If I'm still off, could you point me to more info about how "pragma omp declare variant" works?
> 
> 
> It sounds like openmp's 'variant' is more of an 'overlay' rather than a CUDA-style target overload that I was thinking of (and overloads don't allow different return types at all).
> [...]
> Is that in the ballpark of what's happening?  

Yep. Basically, you can provide N specialization for a function and calls to that function are replaced by calls to a matching specialization. We also only do this for direct calls, that is `&foo` will always give you the address of the base version, which may or may not be desirable but is certainly different from overloading. I also had to completely give up on my overloading based implementation of declare variant :(, but the new one works really well ;)

>  If I understand you correctly, the code below allows (literally?) matching host-side function signatures. Because the functions returning bool and functions returning int can't coexist on the host, there will be no conflicts on device side either.

Exactly, with the caveat mentioned here in the TODO: We mangle the variants to avoid conflicts with the base function. Since this mangling is only based on the context selector and the function name, two variants that only differ in their return type would clash. To avoid this I added a "no-op" context selector trait here that will ensure the names are different in the "overlay/variant" space.

>  how "pragma omp declare variant" works?

So, this is an extension to the context selector as allowed by the standard. The latest public version is https://www.openmp.org/wp-content/uploads/openmp-TR8.pdf, `declare variant` is on page 56, Section 2.3.5. OpenMP 5.1 (Nov 2020) will have various clarifications but the principles are the same. Note that there is `declare variant` and the `begin/end` version which behave slightly different. I implemented all of math and complex support with the begin/end version and I believe it to be far superior anyway ;)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85879/new/

https://reviews.llvm.org/D85879