x86: Small improvements for wcscpy-ssse3
author Noah Goldstein <goldstein.w.n@gmail.com>
Fri, 25 Mar 2022 22:13:32 +0000 (17:13 -0500)
committer Noah Goldstein <goldstein.w.n@gmail.com>
Mon, 28 Mar 2022 20:00:03 +0000 (15:00 -0500)
commit f5bff979d02cf115be94c0c0c6f1a1a505964772
tree f69fa7deff5bd426f0ef6a839770795ee5ddb315
parent 811c635dbae42a0ced67d2bffa8ad68b58d6e44e

Just a few small quality-of-life changes:
    1. Prefer `add` over `lea`, as `add` can run on more execution
       units.
    2. Don't break macro-fusion between `test` and `jcc`; keep the
       two instructions adjacent.

geometric_mean(N=20) of all benchmarks New / Original: 0.973
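
One way to read the summary figure above, assuming it is the geometric mean of per-benchmark New / Original time ratios (so a value below 1.0 favors the new code). The sketch below is illustrative only: the helper name and the timings are hypothetical and are not the benchmark data behind this commit.

```python
import math


def geometric_mean(ratios):
    """Return the geometric mean of a sequence of positive ratios.

    Computed as exp(mean(log(r))) so that long products do not
    overflow or underflow.
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical per-benchmark timings (arbitrary units), NOT real results.
new_times = [9.0, 20.0, 30.0]
orig_times = [10.0, 20.0, 30.0]

# One New / Original ratio per benchmark.
ratios = [new / orig for new, orig in zip(new_times, orig_times)]

summary = geometric_mean(ratios)
print(f"geometric mean New / Original: {summary:.3f}")
```

The geometric mean is the usual choice for averaging ratios because it treats a 2x speedup and a 2x slowdown symmetrically, which an arithmetic mean does not.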

All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
sysdeps/x86_64/multiarch/wcscpy-ssse3.S