[libc][math] Implement asinf function correctly rounded for all rounding modes.
Implement asinf function correctly rounded for all rounding modes.
For `|x| <= 0.5`, we approximate `asin(x)` by
```
asin(x) = x * P(x^2)
```
where `P(X^2) = Q(X)` is a degree-20 minimax even polynomial approximating
`asin(x)/x` on `[0, 0.5]` generated by Sollya with:
```
> Q = fpminimax(asin(x)/x, [|0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20|],
[|1, D...|], [0, 0.5]);
```
When `|x| > 0.5`, we perform range reduction as follow:
Assume further that `0.5 < x <= 1`, and let:
```
y = asin(x)
```
We will use the double angle formula:
```
cos(2X) = 1 - 2 sin^2(X)
```
and the complement angle identity:
```
x = sin(y) = cos(pi/2 - y)
= 1 - 2 sin^2 (pi/4 - y/2)
```
So:
```
sin(pi/4 - y/2) = sqrt( (1 - x)/2 )
```
And hence:
```
pi/4 - y/2 = asin( sqrt( (1 - x)/2 ) )
```
Equivalently:
```
asin(x) = y = pi/2 - 2 * asin( sqrt( (1 - x)/2 ) )
```
Let `u = (1 - x)/2`, then
```
asin(x) = pi/2 - 2 * asin(u)
```
Moreover, since `0.5 < x <= 1`,
```
0 <= u < 1/4, and 0 <= sqrt(u) < 0.5.
```
And hence we can reuse the same polynomial approximation of `asin(x)` when
`|x| <= 0.5`:
```
asin(x) = pi/2 - 2 * u * P(u^2).
```
Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf
CORE-MATH reciprocal throughput : 23.418
System LIBC reciprocal throughput : 27.310
LIBC reciprocal throughput : 22.741
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 58.884
System LIBC latency : 62.055
LIBC latency : 62.037
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133400
16 files changed: