[libc][math] Simplify tanf implementation and improve its performance.
authorTue Ly <lntue@google.com>
Fri, 23 Sep 2022 23:12:38 +0000 (19:12 -0400)
committerTue Ly <lntue@google.com>
Tue, 27 Sep 2022 01:36:12 +0000 (21:36 -0400)
commite15b2da42f5f065fcaada9e9c151c213ff8ee060
tree91b400d218308cc5bdcb7a32a64dfcacafa0838f
parentf3e02989e6e5d6fa90d5ac1ea21c592197a0ae7d
[libc][math] Simplify tanf implementation and improve its performance.

Simplify `tanf` implementation and improve its performance.

Completely reuse the implementation of `sinf`, `cosf`, `sincosf` and use
the definition `tan(x) = sin(x)/cos(x)`.

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 18.558
System LIBC reciprocal throughput : 49.919

BEFORE:
LIBC reciprocal throughput        : 36.480
LIBC reciprocal throughput        : 27.217    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 20.205    (with `-mfma` flag)

AFTER:
LIBC reciprocal throughput        : 30.337
LIBC reciprocal throughput        : 21.072    (with `-msse4.2` flag)
LIBC reciprocal throughput        : 15.804    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 56.702
System LIBC latency : 107.206

BEFORE
LIBC latency        : 97.598
LIBC latency        : 91.119   (with `-msse4.2` flag)
LIBC latency        : 82.655    (with `-mfma` flag)

AFTER
LIBC latency        : 74.560
LIBC latency        : 66.575    (with `-msse4.2` flag)
LIBC latency        : 61.636    (with `-mfma` flag)
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D134575
libc/docs/math.rst
libc/src/math/generic/CMakeLists.txt
libc/src/math/generic/tanf.cpp
libc/test/src/math/tanf_test.cpp