[Openmp-commits] [PATCH] D80897: [OpenMP] Initial support for std::complex in target regions

Artem Belevich via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Jun 1 10:46:06 PDT 2020

tra added a comment.

Hmm. I'm pretty sure tensorflow is using std::complex for various types. I'm surprised that we haven't seen these functions missing.
Plain CUDA (e.g. https://godbolt.org/z/Us6oXC) code appears to have no references to `__mul*` or `__div*`, at least for optimized builds, but they do popup in unoptimized ones. Curiously enough, unoptimized code compiled with `-stdlib=libc++ --std=c++11` does not need the soft-float functions. That would explain why we don't see the build breaks.

These differences suggest that these changes may need to be more nuanced with regard to the standard c++ library version and, possibly, the C++ standard used.
If possible, I would prefer to limit interference with the standard libraries only to the cases where it's necessary.

Comment at: clang/lib/Headers/__clang_cuda_complex_builtins.h:29
+#define _ISNANd std::isnan
+#define _ISNANf _ISNANd
+#define _ISINFd std::isinf
Nit: this creates impression that we fall back on `double` variant of the function, while in reality we'll end up using `std::isnan<float>`.
Perhaps it would be better to use fully specialized function template name in all these macros. It would also avoid potential issues if someone/somewhere adds other overloads. E.g. we may end up facing `std::complex<half>` which may overload resolution ambiguous in some cases. 

Comment at: clang/lib/Headers/__clang_cuda_complex_builtins.h:63
+__DEVICE__ double _Complex __muldc3(double __a, double __b, double __c,
+                                    double __d) {
Soft-float library has bunch of other functions. https://gcc.gnu.org/onlinedocs/gccint/Soft-float-library-routines.html

I wonder why only the complex variants of the soft-float support functions are missing. 
Does it mean that x86 code also does rely on the library to do complex multiplication?
If x86 can do complex ops, why can't nvptx?
If x86 can't, would make sense to teach it?

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list