r283680 - [CUDA] Support <complex> and std::min/max on the device.

Justin Lebar via cfe-commits cfe-commits at lists.llvm.org
Sat Oct 8 20:56:37 PDT 2016


> > The fix is not as simple as simply changing our implementation of
> > e.g. std::isnan to call __builtin_isnanf, because we also would
> > want to fix ::isnanf,
>
> No, if I understand what you're saying, you specifically wouldn't.

I understand how this is feasible on the CPU, because ::isnan is a
library function that can never be inlined.  But on the GPU, these
library functions are (at the moment) always declared inline.  That
seems to complicate this idea.

Right now ::isnan(x) is going to call __nv_isnan(x), which computes
abs(x).  If we pass -ffast-math, the compiler will be able to assume
that abs(x) is not NaN.  I guess you're saying that we would need to
special-case __nv_isnan so that -ffast-math is always off
(essentially).  But what if it gets inlined?
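
To make the concern concrete, here is a rough sketch of the shape of
the problem (this is not the actual libdevice body of __nv_isnanf; it
is just illustrative):

  // Illustrative only -- the real libdevice implementation differs,
  // but it has this shape: take fabs(x) and compare it with infinity.
  __device__ inline int __nv_isnanf(float __x) {
    // For any non-NaN x, fabs(x) <= +inf holds; only NaN fails it.
    return !(__builtin_fabsf(__x) <= __builtin_huge_valf());
  }

  // ::isnan is declared inline and forwards to the libdevice function.
  __device__ inline bool isnan(float __x) { return __nv_isnanf(__x); }

Once both of these are inlined into the caller -- which is exactly
what happens on the GPU -- -ffinite-math-only lets the optimizer
assume the comparison never sees a NaN, and the whole check can fold
to false.  There is no out-of-line library call to hide the check
behind.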

It looks like libstdc++'s std::isnan calls __builtin_isnan (same for
its std::isinf), and its ::isnan is an alias for std::isnan.  So
libstdc++'s isnan is going to return false with -ffast-math (or, at
any rate, it will do the same thing as the builtin functions, which,
as I understand it, is what you're proposing libc++'s isnan *not* do).

> This is important for use cases where, for example, even though the user might want fast math, they still need to check their inputs for NaNs.

Since this isn't going to work with libstdc++, and it relies on not
doing anything that the compiler might construe as "arithmetic" on the
value, this seems pretty dicey to me.  Users could instead compile a
separate TU without -ffast-math and do all of their validation there.
I'd have a lot more confidence in that working today, continuing to
work tomorrow, and being portable across compilers and standard
libraries.
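
Roughly what I have in mind (the file and function names here are
made up for illustration, and it assumes no cross-TU inlining or LTO):

  // check_inputs.cpp -- compiled *without* -ffast-math, so std::isnan
  // keeps its IEEE meaning here.
  #include <cmath>
  #include <cstddef>

  bool has_nan(const float *p, std::size_t n) {
    for (std::size_t i = 0; i != n; ++i)
      if (std::isnan(p[i]))
        return true;
    return false;
  }

  // The rest of the program is compiled with -ffast-math and calls
  // has_nan() to validate its inputs before doing the fast-math
  // arithmetic.  Because has_nan() is defined in a TU built without
  // -ffast-math and isn't visible for inlining, the check survives.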

I don't mean to relitigate https://reviews.llvm.org/D18639, but I am
not convinced that libc++'s isnan should have a path that returns true
with -ffast-math, given that

 * libstdc++'s isnan will always return false with -ffast-math,
 * it's at best complicated for us to make this work if the body of
isnan can be inlined (as it can on the GPU),
 * it's at best complicated for users to write "correct" C++ that
calls isnan with -ffast-math (see the sketch after this list),
especially if they want their code to continue to work in the future
in the face of changing compilers (-ffast-math is not specified
anywhere, so who knows what it means), and
 * there's a relatively simple workaround (use a separate TU) that
sidesteps all these problems.
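
For example, with -ffast-math it is hard to say whether the NaN check
in something like the following survives; it depends on how much of
isnan the compiler can see through (the code is illustrative, not from
any real project):

  #include <cmath>
  #include <cstdio>

  // Compiled with -ffast-math.
  void process(float x) {
    if (std::isnan(x)) {     // may be folded to false and dropped
      std::fprintf(stderr, "bad input\n");
      return;
    }
    // ... fast-math arithmetic on x ...
  }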

I'm not saying we should go in and change libc++'s CPU implementation
of isnan to call the builtin.  I'll leave that up to people who care
about CPU code.  But at least on the GPU, it still makes sense to me
to fix the problem you originally identified by making std:: and
::isnan/isinf constant-fold (to false) with -ffast-math, which I
think we should be able to do with the intrinsic upgrade I originally
suggested.

On a separate note: Can we make __libcpp_isnan and __libcpp_isinf
constexpr?  This will make them implicitly host+device functions,
solving the problem on the GPU.  Otherwise I may have to reimplement
these functions in a header, and that's lame.  Although I am clearly
not above that.  :)
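
Concretely, something along these lines is what I have in mind (a
sketch, not libc++'s actual definitions; it assumes clang's treatment
of constexpr functions as implicitly __host__ __device__ when
compiling CUDA):

  // Sketch only (not libc++'s real code): the point is just that
  // adding constexpr makes these helpers implicitly
  // __host__ __device__ under clang's CUDA compilation, with no
  // CUDA-specific annotations needed.
  inline constexpr bool __libcpp_isnan(double __x) {
    return __builtin_isnan(__x);   // stand-in for the real body
  }
  inline constexpr bool __libcpp_isinf(double __x) {
    return __builtin_isinf(__x);   // stand-in for the real body
  }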

On Sat, Oct 8, 2016 at 6:50 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Justin Lebar" <jlebar at google.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Clang Commits" <cfe-commits at lists.llvm.org>
>> Sent: Saturday, October 8, 2016 6:16:12 PM
>> Subject: Re: r283680 - [CUDA] Support <complex> and std::min/max on the device.
>>
>> Hal,
>>
>> On NVPTX, these functions eventually get resolved to function calls
>> in
>> libdevice, e.g. __nv_isinff and __nv_isnanf.
>>
>> llvm does not do a good job understanding the body of e.g.
>> __nvvm_isnanf, because it uses nvptx-specific intrinsic functions,
>> notably @llvm.nvvm.fabs.f.  These are opaque to the LLVM optimizer.
>>
>> The fix is not as simple as simply changing our implementation of
>> e.g.
>> std::isnan to call __builtin_isnanf, because we also would want to
>> fix
>> ::isnanf,
>
> No, if I understand what you're saying, you specifically wouldn't. We had a discussion about this on the review thread(s) that led to r283051, and while we want to elide the checks inside the mathematical functions, we don't want to replace isnan itself with something that will get optimized away. We want to keep the ability for the user to explicitly check for NaNs, etc. even if we don't want those checks to appear inside of mathematical operations. This is important for use cases where, for example, even though the user might want fast math, they still need to check their inputs for NaNs.
>
>  -Hal
>
>> but we can't override that implementation without some
>> major
>> surgery on the nvptx headers.
>>
>> David Majnemer and I talked about one way to fix this, namely by
>> using
>> IR intrinsic upgrades to replace the opaque nvptx intrinsics with
>> LLVM
>> intrinsics.  LLVM would then be able to understand these intrinsics
>> and optimize them.  We would reap benefits not just for std::isnan,
>> but also e.g. constant-folding calls like std::abs that also
>> eventually end up in libnvvm.
>>
>> I did the first half of this work, by adding lowerings for the
>> various
>> LLVM intrinsics to the NVPTX backend [1].  But David is now busy with
>> other things and hasn't been able to help with the second half,
>> namely
>> using IR upgrades to replace the nvptx target-specific intrinsics
>> with
>> generalized LLVM intrinsics.  Perhaps this is something you'd be able
>> to help with?
>>
>> In any case, using builtins here without fixing std::isnan and
>> ::isnan
>> feels to me to be the wrong solution.  It seems to me that we should
>> be able to rely on std::isnan and friends being fast, and if they're
>> not, we should fix that.  Using builtins here would be "cheating" to
>> make our implementation faster than user code.
>>
>> I'll note, separately, that on x86, clang does not seem to
>> constant-fold std::isinf or __builtin_isinff to false with
>> -ffast-math
>> -ffinite-math-only.  GCC can do it.  Clang gets std::isnan.
>> https://godbolt.org/g/vZB55a
>>
>> By the way, the changes you made to libc++ unfortunately break this
>> patch with libc++, because e.g. __libcpp_isnan is not a device
>> function.  I'll have to think about how to fix that -- I may send you
>> a patch.
>>
>> Regards,
>> -Justin
>>
>> [1] https://reviews.llvm.org/D24300
>>
>> On Sat, Oct 8, 2016 at 3:36 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Hi Justin,
>> >
>> > This is neat!
>> >
>> > I see a bunch of uses of std::isinf, etc. here. It tends to be
>> > important that, when using -ffast-math (or -ffinite-math-only)
>> > these checks get optimized away. Can you please check that they
>> > do? If not, you might mirror what I've done in r283051 for libc++,
>> > which is similar to what libstdc++ ends up doing, so that we use
>> > __builtin_isnan/isinf/isfinite.
>> >
>> > Thanks again,
>> > Hal
>> >
>> > ----- Original Message -----
>> >> From: "Justin Lebar via cfe-commits" <cfe-commits at lists.llvm.org>
>> >> To: cfe-commits at lists.llvm.org
>> >> Sent: Saturday, October 8, 2016 5:16:13 PM
>> >> Subject: r283680 - [CUDA] Support <complex> and std::min/max on
>> >> the device.
>> >>
>> >> Author: jlebar
>> >> Date: Sat Oct  8 17:16:12 2016
>> >> New Revision: 283680
>> >>
>> >> URL: http://llvm.org/viewvc/llvm-project?rev=283680&view=rev
>> >> Log:
>> >> [CUDA] Support <complex> and std::min/max on the device.
>> >>
>> >> Summary:
>> >> We do this by wrapping <complex> and <algorithm>.
>> >>
>> >> Tests are in the test-suite.
>> >>
>> >> Reviewers: tra
>> >>
>> >> Subscribers: jhen, beanz, cfe-commits, mgorny
>> >>
>> >> Differential Revision: https://reviews.llvm.org/D24979
>> >>
>> >> Added:
>> >>     cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h
>> >>     cfe/trunk/lib/Headers/cuda_wrappers/
>> >>     cfe/trunk/lib/Headers/cuda_wrappers/algorithm
>> >>     cfe/trunk/lib/Headers/cuda_wrappers/complex
>> >> Modified:
>> >>     cfe/trunk/lib/Driver/ToolChains.cpp
>> >>     cfe/trunk/lib/Headers/CMakeLists.txt
>> >>     cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
>> >>
>> >> Modified: cfe/trunk/lib/Driver/ToolChains.cpp
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Driver/ToolChains.cpp?rev=283680&r1=283679&r2=283680&view=diff
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Driver/ToolChains.cpp (original)
>> >> +++ cfe/trunk/lib/Driver/ToolChains.cpp Sat Oct  8 17:16:12 2016
>> >> @@ -4694,6 +4694,15 @@ void Linux::AddClangCXXStdlibIncludeArgs
>> >>
>> >>  void Linux::AddCudaIncludeArgs(const ArgList &DriverArgs,
>> >>                                 ArgStringList &CC1Args) const {
>> >> +  if (!DriverArgs.hasArg(options::OPT_nobuiltininc)) {
>> >> +    // Add cuda_wrappers/* to our system include path.  This lets
>> >> us
>> >> wrap
>> >> +    // standard library headers.
>> >> +    SmallString<128> P(getDriver().ResourceDir);
>> >> +    llvm::sys::path::append(P, "include");
>> >> +    llvm::sys::path::append(P, "cuda_wrappers");
>> >> +    addSystemInclude(DriverArgs, CC1Args, P);
>> >> +  }
>> >> +
>> >>    if (DriverArgs.hasArg(options::OPT_nocudainc))
>> >>      return;
>> >>
>> >>
>> >> Modified: cfe/trunk/lib/Headers/CMakeLists.txt
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/CMakeLists.txt?rev=283680&r1=283679&r2=283680&view=diff
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Headers/CMakeLists.txt (original)
>> >> +++ cfe/trunk/lib/Headers/CMakeLists.txt Sat Oct  8 17:16:12 2016
>> >> @@ -24,10 +24,13 @@ set(files
>> >>    bmiintrin.h
>> >>    __clang_cuda_builtin_vars.h
>> >>    __clang_cuda_cmath.h
>> >> +  __clang_cuda_complex_builtins.h
>> >>    __clang_cuda_intrinsics.h
>> >>    __clang_cuda_math_forward_declares.h
>> >>    __clang_cuda_runtime_wrapper.h
>> >>    cpuid.h
>> >> +  cuda_wrappers/algorithm
>> >> +  cuda_wrappers/complex
>> >>    clflushoptintrin.h
>> >>    emmintrin.h
>> >>    f16cintrin.h
>> >>
>> >> Added: cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h?rev=283680&view=auto
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h (added)
>> >> +++ cfe/trunk/lib/Headers/__clang_cuda_complex_builtins.h Sat Oct
>> >>  8
>> >> 17:16:12 2016
>> >> @@ -0,0 +1,203 @@
>> >> +/*===-- __clang_cuda_complex_builtins - CUDA impls of runtime
>> >> complex fns ---===
>> >> + *
>> >> + * Permission is hereby granted, free of charge, to any person
>> >> obtaining a copy
>> >> + * of this software and associated documentation files (the
>> >> "Software"), to deal
>> >> + * in the Software without restriction, including without
>> >> limitation
>> >> the rights
>> >> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> >> and/or sell
>> >> + * copies of the Software, and to permit persons to whom the
>> >> Software is
>> >> + * furnished to do so, subject to the following conditions:
>> >> + *
>> >> + * The above copyright notice and this permission notice shall be
>> >> included in
>> >> + * all copies or substantial portions of the Software.
>> >> + *
>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>> >> KIND,
>> >> EXPRESS OR
>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >> MERCHANTABILITY,
>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>> >> EVENT
>> >> SHALL THE
>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> >> OR
>> >> OTHER
>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> >> OTHERWISE,
>> >> ARISING FROM,
>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> >> DEALINGS IN
>> >> + * THE SOFTWARE.
>> >> + *
>> >> +
>> >> *===-----------------------------------------------------------------------===
>> >> + */
>> >> +
>> >> +#ifndef __CLANG_CUDA_COMPLEX_BUILTINS
>> >> +#define __CLANG_CUDA_COMPLEX_BUILTINS
>> >> +
>> >> +// This header defines __muldc3, __mulsc3, __divdc3, and
>> >> __divsc3.
>> >>  These are
>> >> +// libgcc functions that clang assumes are available when
>> >> compiling
>> >> c99 complex
>> >> +// operations.  (These implementations come from libc++, and have
>> >> been modified
>> >> +// to work with CUDA.)
>> >> +
>> >> +extern "C" inline __device__ double _Complex __muldc3(double __a,
>> >> double __b,
>> >> +                                                      double __c,
>> >> double __d) {
>> >> +  double __ac = __a * __c;
>> >> +  double __bd = __b * __d;
>> >> +  double __ad = __a * __d;
>> >> +  double __bc = __b * __c;
>> >> +  double _Complex z;
>> >> +  __real__(z) = __ac - __bd;
>> >> +  __imag__(z) = __ad + __bc;
>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>> >> +    int __recalc = 0;
>> >> +    if (std::isinf(__a) || std::isinf(__b)) {
>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>> >> +      if (std::isnan(__c))
>> >> +        __c = std::copysign(0, __c);
>> >> +      if (std::isnan(__d))
>> >> +        __d = std::copysign(0, __d);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (std::isinf(__c) || std::isinf(__d)) {
>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>> >> +      if (std::isnan(__a))
>> >> +        __a = std::copysign(0, __a);
>> >> +      if (std::isnan(__b))
>> >> +        __b = std::copysign(0, __b);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) ||
>> >> +                      std::isinf(__ad) || std::isinf(__bc))) {
>> >> +      if (std::isnan(__a))
>> >> +        __a = std::copysign(0, __a);
>> >> +      if (std::isnan(__b))
>> >> +        __b = std::copysign(0, __b);
>> >> +      if (std::isnan(__c))
>> >> +        __c = std::copysign(0, __c);
>> >> +      if (std::isnan(__d))
>> >> +        __d = std::copysign(0, __d);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (__recalc) {
>> >> +      // Can't use std::numeric_limits<double>::infinity() --
>> >> that
>> >> doesn't have
>> >> +      // a device overload (and isn't constexpr before C++11,
>> >> naturally).
>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c - __b *
>> >> __d);
>> >> +      __imag__(z) = __builtin_huge_valf() * (__a * __d + __b *
>> >> __c);
>> >> +    }
>> >> +  }
>> >> +  return z;
>> >> +}
>> >> +
>> >> +extern "C" inline __device__ float _Complex __mulsc3(float __a,
>> >> float __b,
>> >> +                                                     float __c,
>> >> float __d) {
>> >> +  float __ac = __a * __c;
>> >> +  float __bd = __b * __d;
>> >> +  float __ad = __a * __d;
>> >> +  float __bc = __b * __c;
>> >> +  float _Complex z;
>> >> +  __real__(z) = __ac - __bd;
>> >> +  __imag__(z) = __ad + __bc;
>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>> >> +    int __recalc = 0;
>> >> +    if (std::isinf(__a) || std::isinf(__b)) {
>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>> >> +      if (std::isnan(__c))
>> >> +        __c = std::copysign(0, __c);
>> >> +      if (std::isnan(__d))
>> >> +        __d = std::copysign(0, __d);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (std::isinf(__c) || std::isinf(__d)) {
>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>> >> +      if (std::isnan(__a))
>> >> +        __a = std::copysign(0, __a);
>> >> +      if (std::isnan(__b))
>> >> +        __b = std::copysign(0, __b);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (!__recalc && (std::isinf(__ac) || std::isinf(__bd) ||
>> >> +                      std::isinf(__ad) || std::isinf(__bc))) {
>> >> +      if (std::isnan(__a))
>> >> +        __a = std::copysign(0, __a);
>> >> +      if (std::isnan(__b))
>> >> +        __b = std::copysign(0, __b);
>> >> +      if (std::isnan(__c))
>> >> +        __c = std::copysign(0, __c);
>> >> +      if (std::isnan(__d))
>> >> +        __d = std::copysign(0, __d);
>> >> +      __recalc = 1;
>> >> +    }
>> >> +    if (__recalc) {
>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c - __b *
>> >> __d);
>> >> +      __imag__(z) = __builtin_huge_valf() * (__a * __d + __b *
>> >> __c);
>> >> +    }
>> >> +  }
>> >> +  return z;
>> >> +}
>> >> +
>> >> +extern "C" inline __device__ double _Complex __divdc3(double __a,
>> >> double __b,
>> >> +                                                      double __c,
>> >> double __d) {
>> >> +  int __ilogbw = 0;
>> >> +  // Can't use std::max, because that's defined in <algorithm>,
>> >> and
>> >> we don't
>> >> +  // want to pull that in for every compile.  The CUDA headers
>> >> define
>> >> +  // ::max(float, float) and ::max(double, double), which is
>> >> sufficient for us.
>> >> +  double __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
>> >> +  if (std::isfinite(__logbw)) {
>> >> +    __ilogbw = (int)__logbw;
>> >> +    __c = std::scalbn(__c, -__ilogbw);
>> >> +    __d = std::scalbn(__d, -__ilogbw);
>> >> +  }
>> >> +  double __denom = __c * __c + __d * __d;
>> >> +  double _Complex z;
>> >> +  __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom,
>> >> -__ilogbw);
>> >> +  __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom,
>> >> -__ilogbw);
>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>> >> +    if ((__denom == 0.0) && (!std::isnan(__a) ||
>> >> !std::isnan(__b)))
>> >> {
>> >> +      __real__(z) = std::copysign(__builtin_huge_valf(), __c) *
>> >> __a;
>> >> +      __imag__(z) = std::copysign(__builtin_huge_valf(), __c) *
>> >> __b;
>> >> +    } else if ((std::isinf(__a) || std::isinf(__b)) &&
>> >> std::isfinite(__c) &&
>> >> +               std::isfinite(__d)) {
>> >> +      __a = std::copysign(std::isinf(__a) ? 1.0 : 0.0, __a);
>> >> +      __b = std::copysign(std::isinf(__b) ? 1.0 : 0.0, __b);
>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c + __b *
>> >> __d);
>> >> +      __imag__(z) = __builtin_huge_valf() * (__b * __c - __a *
>> >> __d);
>> >> +    } else if (std::isinf(__logbw) && __logbw > 0.0 &&
>> >> std::isfinite(__a) &&
>> >> +               std::isfinite(__b)) {
>> >> +      __c = std::copysign(std::isinf(__c) ? 1.0 : 0.0, __c);
>> >> +      __d = std::copysign(std::isinf(__d) ? 1.0 : 0.0, __d);
>> >> +      __real__(z) = 0.0 * (__a * __c + __b * __d);
>> >> +      __imag__(z) = 0.0 * (__b * __c - __a * __d);
>> >> +    }
>> >> +  }
>> >> +  return z;
>> >> +}
>> >> +
>> >> +extern "C" inline __device__ float _Complex __divsc3(float __a,
>> >> float __b,
>> >> +                                                     float __c,
>> >> float __d) {
>> >> +  int __ilogbw = 0;
>> >> +  float __logbw = std::logb(max(std::abs(__c), std::abs(__d)));
>> >> +  if (std::isfinite(__logbw)) {
>> >> +    __ilogbw = (int)__logbw;
>> >> +    __c = std::scalbn(__c, -__ilogbw);
>> >> +    __d = std::scalbn(__d, -__ilogbw);
>> >> +  }
>> >> +  float __denom = __c * __c + __d * __d;
>> >> +  float _Complex z;
>> >> +  __real__(z) = std::scalbn((__a * __c + __b * __d) / __denom,
>> >> -__ilogbw);
>> >> +  __imag__(z) = std::scalbn((__b * __c - __a * __d) / __denom,
>> >> -__ilogbw);
>> >> +  if (std::isnan(__real__(z)) && std::isnan(__imag__(z))) {
>> >> +    if ((__denom == 0) && (!std::isnan(__a) || !std::isnan(__b)))
>> >> {
>> >> +      __real__(z) = std::copysign(__builtin_huge_valf(), __c) *
>> >> __a;
>> >> +      __imag__(z) = std::copysign(__builtin_huge_valf(), __c) *
>> >> __b;
>> >> +    } else if ((std::isinf(__a) || std::isinf(__b)) &&
>> >> std::isfinite(__c) &&
>> >> +               std::isfinite(__d)) {
>> >> +      __a = std::copysign(std::isinf(__a) ? 1 : 0, __a);
>> >> +      __b = std::copysign(std::isinf(__b) ? 1 : 0, __b);
>> >> +      __real__(z) = __builtin_huge_valf() * (__a * __c + __b *
>> >> __d);
>> >> +      __imag__(z) = __builtin_huge_valf() * (__b * __c - __a *
>> >> __d);
>> >> +    } else if (std::isinf(__logbw) && __logbw > 0 &&
>> >> std::isfinite(__a) &&
>> >> +               std::isfinite(__b)) {
>> >> +      __c = std::copysign(std::isinf(__c) ? 1 : 0, __c);
>> >> +      __d = std::copysign(std::isinf(__d) ? 1 : 0, __d);
>> >> +      __real__(z) = 0 * (__a * __c + __b * __d);
>> >> +      __imag__(z) = 0 * (__b * __c - __a * __d);
>> >> +    }
>> >> +  }
>> >> +  return z;
>> >> +}
>> >> +
>> >> +#endif // __CLANG_CUDA_COMPLEX_BUILTINS
>> >>
>> >> Modified: cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h?rev=283680&r1=283679&r2=283680&view=diff
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h
>> >> (original)
>> >> +++ cfe/trunk/lib/Headers/__clang_cuda_runtime_wrapper.h Sat Oct
>> >>  8
>> >> 17:16:12 2016
>> >> @@ -312,6 +312,7 @@ __device__ inline __cuda_builtin_gridDim
>> >>
>> >>  #include <__clang_cuda_cmath.h>
>> >>  #include <__clang_cuda_intrinsics.h>
>> >> +#include <__clang_cuda_complex_builtins.h>
>> >>
>> >>  // curand_mtgp32_kernel helpfully redeclares blockDim and
>> >>  threadIdx
>> >>  in host
>> >>  // mode, giving them their "proper" types of dim3 and uint3.
>> >>   This
>> >>  is
>> >>
>> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/algorithm
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/algorithm?rev=283680&view=auto
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Headers/cuda_wrappers/algorithm (added)
>> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/algorithm Sat Oct  8
>> >> 17:16:12
>> >> 2016
>> >> @@ -0,0 +1,96 @@
>> >> +/*===---- complex - CUDA wrapper for <algorithm>
>> >> ----------------------------===
>> >> + *
>> >> + * Permission is hereby granted, free of charge, to any person
>> >> obtaining a copy
>> >> + * of this software and associated documentation files (the
>> >> "Software"), to deal
>> >> + * in the Software without restriction, including without
>> >> limitation
>> >> the rights
>> >> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> >> and/or sell
>> >> + * copies of the Software, and to permit persons to whom the
>> >> Software is
>> >> + * furnished to do so, subject to the following conditions:
>> >> + *
>> >> + * The above copyright notice and this permission notice shall be
>> >> included in
>> >> + * all copies or substantial portions of the Software.
>> >> + *
>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>> >> KIND,
>> >> EXPRESS OR
>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >> MERCHANTABILITY,
>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>> >> EVENT
>> >> SHALL THE
>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> >> OR
>> >> OTHER
>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> >> OTHERWISE,
>> >> ARISING FROM,
>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> >> DEALINGS IN
>> >> + * THE SOFTWARE.
>> >> + *
>> >> +
>> >> *===-----------------------------------------------------------------------===
>> >> + */
>> >> +
>> >> +#ifndef __CLANG_CUDA_WRAPPERS_ALGORITHM
>> >> +#define __CLANG_CUDA_WRAPPERS_ALGORITHM
>> >> +
>> >> +// This header defines __device__ overloads of std::min/max, but
>> >> only if we're
>> >> +// <= C++11.  In C++14, these functions are constexpr, and so are
>> >> implicitly
>> >> +// __host__ __device__.
>> >> +//
>> >> +// We don't support the initializer_list overloads because
>> >> +// initializer_list::begin() and end() are not __host__
>> >> __device__
>> >> functions.
>> >> +//
>> >> +// When compiling in C++14 mode, we could force std::min/max to
>> >> have
>> >> different
>> >> +// implementations for host and device, by declaring the device
>> >> overloads
>> >> +// before the constexpr overloads appear.  We choose not to do
>> >> this
>> >> because
>> >> +
>> >> +//  a) why write our own implementation when we can use one from
>> >> the
>> >> standard
>> >> +//     library? and
>> >> +//  b) libstdc++ is evil and declares min/max inside a header
>> >> that
>> >> is included
>> >> +//     *before* we include <algorithm>.  So we'd have to
>> >> unconditionally
>> >> +//     declare our __device__ overloads of min/max, but that
>> >> would
>> >> pollute
>> >> +//     things for people who choose not to include <algorithm>.
>> >> +
>> >> +#include_next <algorithm>
>> >> +
>> >> +#if __cplusplus <= 201103L
>> >> +
>> >> +// We need to define these overloads in exactly the namespace our
>> >> standard
>> >> +// library uses (including the right inline namespace), otherwise
>> >> they won't be
>> >> +// picked up by other functions in the standard library (e.g.
>> >> functions in
>> >> +// <complex>).  Thus the ugliness below.
>> >> +#ifdef _LIBCPP_BEGIN_NAMESPACE_STD
>> >> +_LIBCPP_BEGIN_NAMESPACE_STD
>> >> +#else
>> >> +namespace std {
>> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >> +_GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >> +#endif
>> >> +#endif
>> >> +
>> >> +template <class __T, class __Cmp>
>> >> +inline __device__ const __T &
>> >> +max(const __T &__a, const __T &__b, __Cmp __cmp) {
>> >> +  return __cmp(__a, __b) ? __b : __a;
>> >> +}
>> >> +
>> >> +template <class __T>
>> >> +inline __device__ const __T &
>> >> +max(const __T &__a, const __T &__b) {
>> >> +  return __a < __b ? __b : __a;
>> >> +}
>> >> +
>> >> +template <class __T, class __Cmp>
>> >> +inline __device__ const __T &
>> >> +min(const __T &__a, const __T &__b, __Cmp __cmp) {
>> >> +  return __cmp(__b, __a) ? __b : __a;
>> >> +}
>> >> +
>> >> +template <class __T>
>> >> +inline __device__ const __T &
>> >> +min(const __T &__a, const __T &__b) {
>> >> +  return __a < __b ? __b : __a;
>> >> +}
>> >> +
>> >> +#ifdef _LIBCPP_END_NAMESPACE_STD
>> >> +_LIBCPP_END_NAMESPACE_STD
>> >> +#else
>> >> +#ifdef _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> >> +_GLIBCXX_END_NAMESPACE_VERSION
>> >> +#endif
>> >> +} // namespace std
>> >> +#endif
>> >> +
>> >> +#endif // __cplusplus <= 201103L
>> >> +#endif // __CLANG_CUDA_WRAPPERS_ALGORITHM
>> >>
>> >> Added: cfe/trunk/lib/Headers/cuda_wrappers/complex
>> >> URL:
>> >> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/cuda_wrappers/complex?rev=283680&view=auto
>> >> ==============================================================================
>> >> --- cfe/trunk/lib/Headers/cuda_wrappers/complex (added)
>> >> +++ cfe/trunk/lib/Headers/cuda_wrappers/complex Sat Oct  8
>> >> 17:16:12
>> >> 2016
>> >> @@ -0,0 +1,79 @@
>> >> +/*===---- complex - CUDA wrapper for <complex>
>> >> ------------------------------===
>> >> + *
>> >> + * Permission is hereby granted, free of charge, to any person
>> >> obtaining a copy
>> >> + * of this software and associated documentation files (the
>> >> "Software"), to deal
>> >> + * in the Software without restriction, including without
>> >> limitation
>> >> the rights
>> >> + * to use, copy, modify, merge, publish, distribute, sublicense,
>> >> and/or sell
>> >> + * copies of the Software, and to permit persons to whom the
>> >> Software is
>> >> + * furnished to do so, subject to the following conditions:
>> >> + *
>> >> + * The above copyright notice and this permission notice shall be
>> >> included in
>> >> + * all copies or substantial portions of the Software.
>> >> + *
>> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>> >> KIND,
>> >> EXPRESS OR
>> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >> MERCHANTABILITY,
>> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>> >> EVENT
>> >> SHALL THE
>> >> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES
>> >> OR
>> >> OTHER
>> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> >> OTHERWISE,
>> >> ARISING FROM,
>> >> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> >> DEALINGS IN
>> >> + * THE SOFTWARE.
>> >> + *
>> >> +
>> >> *===-----------------------------------------------------------------------===
>> >> + */
>> >> +
>> >> +#pragma once
>> >> +
>> >> +// Wrapper around <complex> that forces its functions to be
>> >> __host__
>> >> +// __device__.
>> >> +
>> >> +// First, include host-only headers we think are likely to be
>> >> included by
>> >> +// <complex>, so that the pragma below only applies to <complex>
>> >> itself.
>> >> +#if __cplusplus >= 201103L
>> >> +#include <type_traits>
>> >> +#endif
>> >> +#include <stdexcept>
>> >> +#include <cmath>
>> >> +#include <sstream>
>> >> +
>> >> +// Next, include our <algorithm> wrapper, to ensure that device
>> >> overloads of
>> >> +// std::min/max are available.
>> >> +#include <algorithm>
>> >> +
>> >> +#pragma clang force_cuda_host_device begin
>> >> +
>> >> +// When compiling for device, ask libstdc++ to use its own
>> >> implements of
>> >> +// complex functions, rather than calling builtins (which resolve
>> >> to
>> >> library
>> >> +// functions that don't exist when compiling CUDA device code).
>> >> +//
>> >> +// This is a little dicey, because it causes libstdc++ to define
>> >> a
>> >> different
>> >> +// set of overloads on host and device.
>> >> +//
>> >> +//   // Present only when compiling for host.
>> >> +//   __host__ __device__ void complex<float> sin(const
>> >> complex<float>& x) {
>> >> +//     return __builtin_csinf(x);
>> >> +//   }
>> >> +//
>> >> +//   // Present when compiling for host and for device.
>> >> +//   template <typename T>
>> >> +//   void __host__ __device__ complex<T> sin(const complex<T>& x)
>> >> {
>> >> +//     return complex<T>(sin(x.real()) * cosh(x.imag()),
>> >> +//                       cos(x.real()), sinh(x.imag()));
>> >> +//   }
>> >> +//
>> >> +// This is safe because when compiling for device, all function
>> >> calls in
>> >> +// __host__ code to sin() will still resolve to *something*, even
>> >> if
>> >> they don't
>> >> +// resolve to the same function as they resolve to when compiling
>> >> for host.  We
>> >> +// don't care that they don't resolve to the right function
>> >> because
>> >> we won't
>> >> +// codegen this host code when compiling for device.
>> >> +
>> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX")
>> >> +#pragma push_macro("_GLIBCXX_USE_C99_COMPLEX_TR1")
>> >> +#define _GLIBCXX_USE_C99_COMPLEX 0
>> >> +#define _GLIBCXX_USE_C99_COMPLEX_TR1 0
>> >> +
>> >> +#include_next <complex>
>> >> +
>> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX_TR1")
>> >> +#pragma pop_macro("_GLIBCXX_USE_C99_COMPLEX")
>> >> +
>> >> +#pragma clang force_cuda_host_device end
>> >>
>> >>
>> >> _______________________________________________
>> >> cfe-commits mailing list
>> >> cfe-commits at lists.llvm.org
>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
>> >>
>> >
>> > --
>> > Hal Finkel
>> > Lead, Compiler Technology and Programming Languages
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory

