[Openmp-commits] [PATCH] D80897: [OpenMP] Initial support for std::complex in target regions

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jun 1 11:53:07 PDT 2020


jdoerfert marked 2 inline comments as done.
jdoerfert added a comment.

In D80897#2066723 <https://reviews.llvm.org/D80897#2066723>, @tra wrote:

> Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.


Which functions and missing from where? In CUDA-mode we did provide `__XXXXc3` already.

> Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to `__mul*` or `__div*`, at least for optimized builds, but they do popup in unoptimized ones. Curiously enough, unoptimized code compiled with `-stdlib=libc++ --std=c++11` does not need the soft-float functions. That would explain why we don't see the build breaks.

Its not that simple, and tbh, I don't have the full picture yet. Plain (clang) CUDA uses these functions (https://godbolt.org/z/dp_FY2), they just disappear after inlining because of the linkage. If you however enable `-fast-math` they are not used (https://godbolt.org/z/_N-STh). I couldn't run with stdlib=libc++ locally and godbold cuts of the output so I'm not sure if they are used and inlined or not used.

> These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
>  If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

The way I understand this is that we can always provide correct weak versions of `__XXXXc3` without any correctness issues. They will be stripped if they are not needed anyway. That said, this patch should not modify the CUDA behavior (except minor float vs double corrections in the `__XXXXc3` methods). Could you elaborate what interference you expect?



================
Comment at: clang/lib/Headers/__clang_cuda_complex_builtins.h:29
+#define _ISNANd std::isnan
+#define _ISNANf _ISNANd
+#define _ISINFd std::isinf
----------------
tra wrote:
> Nit: this creates impression that we fall back on `double` variant of the function, while in reality we'll end up using `std::isnan<float>`.
> Perhaps it would be better to use fully specialized function template name in all these macros. It would also avoid potential issues if someone/somewhere adds other overloads. E.g. we may end up facing `std::complex<half>` which may overload resolution ambiguous in some cases. 
No problem. I'll just use std::NAME for all of them.


================
Comment at: clang/lib/Headers/__clang_cuda_complex_builtins.h:63
+
+__DEVICE__ double _Complex __muldc3(double __a, double __b, double __c,
+                                    double __d) {
----------------
tra wrote:
> Soft-float library has bunch of other functions. https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html
> 
> I wonder why only the complex variants of the soft-float support functions are missing. 
> Does it mean that x86 code also does rely on the library to do complex multiplication?
> If x86 can do complex ops, why can't nvptx?
> If x86 can't, would make sense to teach it?
> I wonder why only the complex variants of the soft-float support functions are missing.

I would guess others are conceptually missing too, the question is if we need them. I did grep the clang source for 7 non-complex soft-float support functions from the different categories listed in the gcc docs, none was found.

> Does it mean that x86 code also does rely on the library to do complex multiplication?

I think so, yes. Some system library will provide the implementation of `__muldc3` for the slow path of a complex multiplication.

> If x86 can do complex ops, why can't nvptx?
> If x86 can't, would make sense to teach it?

I think I don't understand this (and maybe the question above). What we do in CUDA right now, and with this patch in OpenMP, is to provide the `__XXXXc3` functions on the device. Usually they are in some system library that we just not have on the device so we have to add them somehow. 


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80897/new/

https://reviews.llvm.org/D80897





More information about the Openmp-commits mailing list