[libc][math] Implement acosf function correctly rounded for all rounding modes.
Implement acosf function correctly rounded for all rounding modes.
We perform range reduction as follows:
- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
acos(x) = pi - acos(-x)
```
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput : 24.271
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 55.554
System LIBC latency : 76.879
LIBC latency : 62.118
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/
D133550