[libc][math] Improve tanhf performance.
author Tue Ly <lntue@google.com>
Fri, 16 Sep 2022 00:48:50 +0000 (20:48 -0400)
committer Tue Ly <lntue@google.com>
Mon, 19 Sep 2022 12:43:03 +0000 (08:43 -0400)
commit 4973eee1228674c80f9441a36019c8a83ee3458a
tree c574e06e16e2a07a55633f778d5e04537f14d4c0
parent 5665d0941a3d090589843df214d78ce1dd9fce19

Optimize the core of the `tanhf` implementation, the computation of `e^x`,
in a way similar to https://reviews.llvm.org/D133870.  Factor out the
constants and the polynomial approximation so that they can be reused for
`exp10f`.
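
For readers unfamiliar with why `e^x` is the core of `tanhf`, the following
is a minimal sketch of the general idea, not the code from this patch. It
relies on the identity `tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)` and computes
`e^(2x)` in double precision with a range reduction `y = k*ln(2) + r`. The
names `exp_sketch` and `tanh_sketch`, the cutoffs, and the plain Taylor
coefficients are all assumptions made for illustration; the actual
implementation uses its own constants and polynomial.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from the commit): e^y in double precision.
// Range reduction: y = k*ln(2) + r with |r| about <= ln(2)/2,
// so that e^y = 2^k * e^r.
inline double exp_sketch(double y) {
  // The sketch only covers the input range that tanh_sketch needs.
  assert(std::fabs(y) <= 64.0);
  constexpr double LN2 = 0.6931471805599453;
  constexpr double INV_LN2 = 1.4426950408889634;
  const double kd = std::nearbyint(y * INV_LN2);
  const double r = y - kd * LN2;
  // Degree-8 Taylor polynomial for e^r, evaluated with Horner's scheme.
  // Assumption: Taylor coefficients stand in for a tuned polynomial.
  double p = 1.0 / 40320.0;
  p = p * r + 1.0 / 5040.0;
  p = p * r + 1.0 / 720.0;
  p = p * r + 1.0 / 120.0;
  p = p * r + 1.0 / 24.0;
  p = p * r + 1.0 / 6.0;
  p = p * r + 0.5;
  p = p * r + 1.0;
  p = p * r + 1.0;
  // Scale by 2^k.
  return std::ldexp(p, static_cast<int>(kd));
}

// Hypothetical sketch: tanh(x) = (e^(2x) - 1) / (e^(2x) + 1).
inline float tanh_sketch(float x) {
  if (std::isnan(x))
    return x;
  const float ax = std::fabs(x);
  // For |x| < 2^-13, tanh(x) is x to within float precision; returning x
  // also avoids cancellation in e^(2x) - 1.
  if (ax < 0.0001220703125f)
    return x;
  // For large |x| the result is +/-1 at float precision (cutoff is an
  // illustrative choice, not the one used by the patch).
  if (ax >= 15.0f)
    return std::copysign(1.0f, x);
  const double e2x = exp_sketch(2.0 * static_cast<double>(x));
  return static_cast<float>((e2x - 1.0) / (e2x + 1.0));
}
```

Calling `tanh_sketch(0.5f)` should agree with `std::tanh(0.5f)` to within
rounding. Because nearly all of the work happens in `exp_sketch`, speeding
up the exponential speeds up the whole function, and the same constants and
polynomial can be shared with other exponential-based functions.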

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 13.377
System LIBC reciprocal throughput : 55.046

BEFORE:
LIBC reciprocal throughput        : 75.674
LIBC reciprocal throughput        : 33.242    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 25.927    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 26.359
LIBC reciprocal throughput        : 18.888    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 14.243    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 43.365
System LIBC latency : 123.499

BEFORE:
LIBC latency        : 112.968
LIBC latency        : 104.908   (with `-msse4.2` flag)
LIBC latency        : 92.310    (with `-mfma` flag)

AFTER:
LIBC latency        : 69.828
LIBC latency        : 63.874    (with `-msse4.2` flag)
LIBC latency        : 57.427    (with `-mfma` flag)
```
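
To make the numbers above easier to compare, the sketch below computes the
before/after ratio for each build configuration. The figures are copied from
the benchmark output; the names `Measurement`, `speedup`, `kThroughput`,
`kLatency`, and `print_speedups` are hypothetical helpers introduced here,
not part of the patch or of the CORE-MATH tooling.

```cpp
#include <cstdio>

// One row of the benchmark output: build flags plus before/after values.
struct Measurement {
  const char *config;
  double before;
  double after;
};

// Ratio of the old measurement to the new one; values above 1 mean faster.
constexpr double speedup(double before, double after) {
  return before / after;
}

// Reciprocal throughput figures from the log above.
constexpr Measurement kThroughput[] = {
    {"default", 75.674, 26.359},
    {"-msse4.2", 33.242, 18.888},
    {"-mfma", 25.927, 14.243},
};

// Latency figures from the log above.
constexpr Measurement kLatency[] = {
    {"default", 112.968, 69.828},
    {"-msse4.2", 104.908, 63.874},
    {"-mfma", 92.310, 57.427},
};

// Print one "config: ratio" line per measurement.
inline void print_speedups(const char *title, const Measurement *rows, int n) {
  std::printf("%s\n", title);
  for (int i = 0; i < n; ++i)
    std::printf("  %-10s %.2fx\n", rows[i].config,
                speedup(rows[i].before, rows[i].after));
}
```

Calling `print_speedups("throughput", kThroughput, 3)` and
`print_speedups("latency", kLatency, 3)` gives roughly 2.87x, 1.76x, and
1.82x for reciprocal throughput, and roughly 1.62x, 1.64x, and 1.61x for
latency.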

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D134002
libc/docs/math.rst
libc/src/math/generic/exp2f.cpp
libc/src/math/generic/explogxf.cpp
libc/src/math/generic/explogxf.h
libc/src/math/generic/tanhf.cpp
libc/test/src/math/explogxf_test.cpp