x86: Optimize svml_s_tanhf4_core_sse4.S
author     Noah Goldstein <goldstein.w.n@gmail.com>
           Thu, 9 Jun 2022 16:58:35 +0000 (09:58 -0700)
committer  Noah Goldstein <goldstein.w.n@gmail.com>
           Thu, 9 Jun 2022 19:51:25 +0000 (12:51 -0700)
commit     cffb9414c55b2e169ed8af1cefd1e3f2ea97e750
tree       18f8ef507ae394488b620d49cfcbf8d2c5aa93c7
parent     bcc41f66a48bf764ee85fea56b8e32719e230a0a

Optimizations are:
    1. Reduce code size (-112 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Prefer registers that get shorter instruction encodings.
    5. Reduce rodata size (4k+ bytes saved; rodata is shared with the
       AVX2 version).
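As an illustration of item 4 (not code taken from this commit): in 64-bit mode, an SSE instruction whose operands are all in xmm0-xmm7 needs no REX prefix, while any use of xmm8-xmm15 adds a one-byte REX prefix. The label `encoding_demo` below is hypothetical; the byte sequences in the comments are the encodings GNU as emits for these instructions.

```asm
	.text
	.globl	encoding_demo		/* hypothetical symbol, illustration only */
	.type	encoding_demo, @function
encoding_demo:
	/* Both operands in xmm0-xmm7: no REX prefix.  */
	movaps	%xmm1, %xmm0		/* 0f 28 c1     (3 bytes) */
	/* Destination in xmm8-xmm15: REX.R is set.  */
	movaps	%xmm1, %xmm8		/* 44 0f 28 c1  (4 bytes) */
	/* Both operands in xmm8-xmm15: REX.R and REX.B are set.  */
	movaps	%xmm9, %xmm8		/* 45 0f 28 c1  (4 bytes) */
	ret
	.size	encoding_demo, .-encoding_demo
```

Assembling this with `as --64` and disassembling with `objdump -d` shows the one-byte difference per instruction, which is the kind of saving that register choice contributes to the overall code-size reduction.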

The result is roughly a 15-16% speedup:

       Function, New Time, Old Time, New / Old
 _ZGVbN4v_tanhf,    3.158,    3.749,     0.842
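The quoted 15-16% follows from the timings in the table when speedup is read as the reduction in time per call; the same numbers expressed as a throughput ratio come out to about 1.19x:

```latex
\frac{\text{New}}{\text{Old}} = \frac{3.158}{3.749} \approx 0.842,
\qquad 1 - 0.842 = 0.158 \;(\approx 15.8\%\ \text{less time}),
\qquad \frac{3.749}{3.158} \approx 1.187
```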
sysdeps/x86_64/fpu/multiarch/svml_s_tanhf4_core_sse4.S