x86-64: Optimize strrchr/wcsrchr with AVX2
author    H.J. Lu <hjl.tools@gmail.com>
          Fri, 9 Jun 2017 12:45:43 +0000 (05:45 -0700)
committer H.J. Lu <hjl.tools@gmail.com>
          Fri, 9 Jun 2017 12:45:52 +0000 (05:45 -0700)
commit    d2538b91568af2a63c9d8649ce11756d4dfbdac3
tree      22cc8602e6ab159f296651224be8a6c3460f2581
parent    5ac7aa1d7cce8580f8225c33c819991abca102b9

Optimize strrchr/wcsrchr with AVX2 to check 32 bytes per iteration with
vector instructions.  It is as fast as the SSE2 version for small data
sizes and up to twice as fast for large data sizes on Haswell.  Select
the AVX2 version on AVX2 machines where vzeroupper is preferred and AVX
unaligned load is fast.

* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strrchr-sse2, strrchr-avx2, wcsrchr-sse2 and wcsrchr-avx2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add tests for __strrchr_avx2,
__strrchr_sse2, __wcsrchr_avx2 and __wcsrchr_sse2.
* sysdeps/x86_64/multiarch/strrchr-avx2.S: New file.
* sysdeps/x86_64/multiarch/strrchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strrchr.c: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/wcsrchr.c: Likewise.
ChangeLog
sysdeps/x86_64/multiarch/Makefile
sysdeps/x86_64/multiarch/ifunc-impl-list.c
sysdeps/x86_64/multiarch/strrchr-avx2.S [new file with mode: 0644]
sysdeps/x86_64/multiarch/strrchr-sse2.S [new file with mode: 0644]
sysdeps/x86_64/multiarch/strrchr.c [new file with mode: 0644]
sysdeps/x86_64/multiarch/wcsrchr-avx2.S [new file with mode: 0644]
sysdeps/x86_64/multiarch/wcsrchr-sse2.S [new file with mode: 0644]
sysdeps/x86_64/multiarch/wcsrchr.c [new file with mode: 0644]