author Artem Belevich <tra@google.com>
Tue, 4 Aug 2020 18:52:54 +0000 (11:52 -0700)
committer Artem Belevich <tra@google.com>
Wed, 5 Aug 2020 20:13:48 +0000 (13:13 -0700)
commit 7d057efddc00ba7d03e6e684f23dd9b09fbd0527
tree f80dda99ecf76a1ce1aeefeacc7c0fc1237cd83f
parent ec8c172d01eb14eba890f36205da0613dda7f742
[CUDA] Work around a bug in rint/nearbyint caused by a broken implementation provided by CUDA.

Normally math functions are forwarded to the __nv_* counterparts provided by
CUDA's libdevice bitcode. However, the __nv_rint*()/__nv_nearbyint*() functions
there have a bug: they use round(), which rounds halfway cases away from zero
instead of to the nearest even integer, so rint(2.5f) produces 3.0 instead of
the expected 2.0. NVCC itself does not use the broken bitcode: it has a
workaround in the CUDA headers and, in recent versions, uses correct
implementations in its built-ins.
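
The difference between the two rounding behaviors can be reproduced on the
host with the standard C++ math library. This is only an illustrative sketch:
the helper names (round_based_rint, reference_rint, check_halfway_cases) are
hypothetical and do not appear in libdevice or in the patch, and the code
assumes the default floating-point rounding mode (round to nearest, ties to
even).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the behavior described above: an rint
// built on round(), which sends halfway cases away from zero.
float round_based_rint(float x) { return std::round(x); }

// Reference behavior: rint() honors the current rounding mode, which
// by default is round-to-nearest with ties going to the even integer.
float reference_rint(float x) { return std::rint(x); }

// Hypothetical self-check over a few halfway cases.
void check_halfway_cases() {
  assert(round_based_rint(2.5f) == 3.0f);   // the result the commit calls wrong
  assert(reference_rint(2.5f) == 2.0f);     // the expected result
  assert(round_based_rint(-2.5f) == -3.0f);
  assert(reference_rint(-2.5f) == -2.0f);
  // Inputs that are not exactly halfway do not expose the difference.
  assert(round_based_rint(2.4f) == reference_rint(2.4f));
}
```

The two functions disagree only on exact halfway inputs whose nearest even
neighbor is the one closer to zero, such as 2.5f or -2.5f; for 3.5f both
return 4.0.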

This patch implements an equivalent workaround and directs rint*/nearbyint* to
the __builtin_* variants, which produce correct results.
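
The forwarding idea can be sketched as host-side C++ under some assumptions:
the wrapper names prefixed with sketch_ are hypothetical, the device-side
attributes that the real header would need are omitted, and the code relies on
the __builtin_rint*/__builtin_nearbyint* builtins as provided by Clang and GCC.
It is not the text of the patch.

```cpp
#include <cassert>

// Hypothetical wrappers: each math entry point forwards to a compiler
// builtin instead of to a library routine with the round() bug.
inline float sketch_rintf(float a) { return __builtin_rintf(a); }
inline double sketch_rint(double a) { return __builtin_rint(a); }
inline float sketch_nearbyintf(float a) { return __builtin_nearbyintf(a); }
inline double sketch_nearbyint(double a) { return __builtin_nearbyint(a); }

// Hypothetical self-check, assuming the default rounding mode
// (round to nearest, ties to even).
inline void check_forwarding() {
  assert(sketch_rintf(2.5f) == 2.0f);
  assert(sketch_rint(2.5) == 2.0);
  assert(sketch_nearbyintf(2.5f) == 2.0f);
  assert(sketch_nearbyint(-2.5) == -2.0);
}
```

Routing through builtins leaves the choice of implementation to the compiler,
so the result does not depend on the library routine being correct.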

Differential Revision: https://reviews.llvm.org/D85236
clang/lib/Headers/__clang_cuda_math.h