[PATCH] D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls

Thu Aug 23 08:16:43 PDT 2018

gtbercea added inline comments.

================
Comment at: lib/Basic/Targets/NVPTX.cpp:232
+  // getting inlined on the device.
+  Builder.defineMacro("__NO_MATH_INLINES");
 }
----------------
tra wrote:
> This relies on implementation detail of particular variant of the header file you're assuming all compilations will include. This is a workaround of the real problem (attempting to use headers from machine X while targeting Y) at best.
> 
> D50845 is dealing with the issue of headers for target code. Hopefully, they'll find a way to provide device-specific headers, so you don't rely on host headers being parseable during device-side compilation.
I agree. The proper fix would be what the other patch is attempting to do.

================
Comment at: lib/Driver/ToolChains/Clang.cpp:4758
+    // toolchain.
+    CmdArgs.push_back("-fno-math-builtin");
   }
----------------
tra wrote:
> Could you elaborate on why you don't want the builtins?
> Builtins are enabled and are useful for CUDA. What makes their use different for OpenMP?
> Are you doing it to guarantee that math functions remain unresolved in IR so you could link them in from external bitcode?
> 
That's right. I don't particularly like this approach as this leads to OpenMP-NVPTX toolchain missing out on optimizations such as replacing math function call with basic operations ( pow(a,2) -> a*a for example).
I am trying to fix this in a future patch by allowing intrinsics/builtins to propagate.

Repository:
  rC Clang

https://reviews.llvm.org/D47849