[clang] 7d057ef - [CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA.

Wed Aug 5 13:14:34 PDT 2020

Author: Artem Belevich
Date: 2020-08-05T13:13:48-07:00
New Revision: 7d057efddc00ba7d03e6e684f23dd9b09fbd0527

URL: https://github.com/llvm/llvm-project/commit/7d057efddc00ba7d03e6e684f23dd9b09fbd0527
DIFF: https://github.com/llvm/llvm-project/commit/7d057efddc00ba7d03e6e684f23dd9b09fbd0527.diff

LOG: [CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA.

Normally math functions are forwarded to __nv_* counterparts provided by CUDA's
libdevice bitcode. However, __nv_rint*()/__nv_nearbyint*() functions there have
a bug -- they use round() which rounds *up* instead of rounding towards the
nearest integer, so we end up with rint(2.5f) producing 3.0 instead of expected
2.0. The broken bitcode is not actually used by NVCC itself, which has both a
work-around in CUDA headers and, in recent versions, uses correct
implementations in NVCC's built-ins.

This patch implements equivalent workaround and directs rint*/nearbyint* to
__builtin_* variants that produce correct results.

Differential Revision: https://reviews.llvm.org/D85236

Added: 
    

Modified: 
    clang/lib/Headers/__clang_cuda_math.h

Removed: 
    


################################################################################
diff  --git a/clang/lib/Headers/__clang_cuda_math.h b/clang/lib/Headers/__clang_cuda_math.h
index 332e616702ac..acb26ad345d5 100644

--- a/clang/lib/Headers/__clang_cuda_math.h
+++ b/clang/lib/Headers/__clang_cuda_math.h
@@ -195,8 +195,8 @@ __DEVICE__ int max(int __a, int __b) { return __nv_max(__a, __b); }
 __DEVICE__ int min(int __a, int __b) { return __nv_min(__a, __b); }
 __DEVICE__ double modf(double __a, double *__b) { return __nv_modf(__a, __b); }
 __DEVICE__ float modff(float __a, float *__b) { return __nv_modff(__a, __b); }
-__DEVICE__ double nearbyint(double __a) { return __nv_nearbyint(__a); }
-__DEVICE__ float nearbyintf(float __a) { return __nv_nearbyintf(__a); }
+__DEVICE__ double nearbyint(double __a) { return __builtin_nearbyint(__a); }
+__DEVICE__ float nearbyintf(float __a) { return __builtin_nearbyintf(__a); }
 __DEVICE__ double nextafter(double __a, double __b) {
   return __nv_nextafter(__a, __b);
 }
@@ -249,8 +249,9 @@ __DEVICE__ double rhypot(double __a, double __b) {
 __DEVICE__ float rhypotf(float __a, float __b) {
   return __nv_rhypotf(__a, __b);
 }
-__DEVICE__ double rint(double __a) { return __nv_rint(__a); }
-__DEVICE__ float rintf(float __a) { return __nv_rintf(__a); }
+// __nv_rint* in libdevice is buggy and produces incorrect results.
+__DEVICE__ double rint(double __a) { return __builtin_rint(__a); }
+__DEVICE__ float rintf(float __a) { return __builtin_rintf(__a); }
 __DEVICE__ double rnorm(int __a, const double *__b) {
   return __nv_rnorm(__a, __b);
 }