optimize the following memcpy: sysdeps/i386/i686/multiarch/memcpy-ssse3.S
author Liubov Dmitrieva <liubov.dmitrieva@gmail.com>
Fri, 30 Mar 2012 20:45:27 +0000 (16:45 -0400)
committer Ulrich Drepper <drepper@gmail.com>
Fri, 30 Mar 2012 20:45:27 +0000 (16:45 -0400)
commit 4b43400f6a710fa3d931a57eaae4cb332fb60edc
tree b6c7b892ce5c42a2ba042c8a3369476bac077260
parent 48c41d04ee06efc6ec97325ed6697c121b40865f

I've improved the following implementation of memcpy:
"sysdeps/i386/i686/multiarch/memcpy-ssse3.S".

The patch includes some minor style fixes, but the important change is
the use of prefetch loops for the case where:

DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and
the src and dst pointers have unequal 16-byte alignments.
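
The condition above can be sketched in C as a simple predicate. This is
only an illustration, not the code in memcpy-ssse3.S: the two EXAMPLE_
threshold values are hypothetical placeholders (the real
DATA_CACHE_SIZE_HALF and SHARED_CACHE_SIZE_HALF come from glibc, not from
this sketch), the function names are made up for the example, and
"unequal 16 byte alignments" is assumed to mean that src and dst differ
in their low four address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder thresholds, chosen only so the sketch
   compiles; they are not the values glibc uses.  */
#define EXAMPLE_DATA_CACHE_SIZE_HALF   ((size_t) 16 * 1024)
#define EXAMPLE_SHARED_CACHE_SIZE_HALF ((size_t) 512 * 1024)

static_assert (EXAMPLE_DATA_CACHE_SIZE_HALF < EXAMPLE_SHARED_CACHE_SIZE_HALF,
               "the length window must be non-empty");

/* True when both pointers have the same offset within a 16-byte block,
   i.e. their low four address bits are equal.  */
static bool
same_16_byte_alignment (const void *dst, const void *src)
{
  return (((uintptr_t) dst ^ (uintptr_t) src) & 15) == 0;
}

/* True for the case described in the commit message: a length in
   [DATA_CACHE_SIZE_HALF, SHARED_CACHE_SIZE_HALF) combined with
   src and dst pointers whose 16-byte alignments differ.  */
static bool
uses_prefetch_loop (const void *dst, const void *src, size_t len)
{
  return len >= EXAMPLE_DATA_CACHE_SIZE_HALF
         && len < EXAMPLE_SHARED_CACHE_SIZE_HALF
         && !same_16_byte_alignment (dst, src);
}
```

With these placeholder thresholds, a 64 KiB copy whose dst and src
offsets within a 16-byte block differ would satisfy the predicate, while
the same copy with matching offsets, or a 1 KiB copy, would not.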

This gives a performance boost of 6% to 50% on the Atom machine, about
24.73% in geometric mean.
ChangeLog
sysdeps/i386/i686/multiarch/memcpy-ssse3.S