review.tizen.org Git - external/glibc.git/commit

aarch64: thunderx2 memmove performance improvements

The performance improvement is about 20%-30% for
larger cases and about 1%-5% for smaller cases.

Used SIMD load/store instead of GPR for large
overlapping forward moves.

Reused existing memcpy implementation for smaller
or overlapping backward moves.

Fixed the existing memcpy implementation to allow it
to deal with the overlapping case.
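
As a rough illustration of the overlap handling described above, here is a
minimal C sketch of choosing a copy direction. It is not the commit's code,
which is AArch64 assembly; the names needs_backward_copy and sketch_memmove
are hypothetical, and the byte loops stand in for the real SIMD and memcpy
paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from the commit: returns 1 when a
   forward (low-to-high) copy would overwrite source bytes before they
   are read, i.e. when src <= dst < src + n.  The unsigned subtraction
   wraps to a huge value when dst is below src, so one compare suffices. */
static int needs_backward_copy(const void *dst, const void *src, size_t n)
{
    return (uintptr_t)dst - (uintptr_t)src < n;
}

/* Illustrative memmove: picks the copy direction from the overlap test. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (needs_backward_copy(dst, src, n)) {
        /* Destination overlaps the source from above: copy high to low. */
        while (n--)
            d[n] = s[n];
    } else {
        /* Disjoint buffers, or destination below source: copy low to high. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

The single unsigned comparison is a common way to fold both range checks
into one branch; whether the thunderx2 code uses exactly this test is an
assumption.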

Simplified loop tails in the memcpy implementation:
use a branchless, overlapping sequence of fixed-length
loads/stores instead of branching on the size.
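
The branchless tail idea can be sketched in C as follows. This is only an
illustration under assumed parameters: the function name copy16_32 and the
16..32 byte range are hypothetical, and the commit's assembly uses its own
register-based sequences.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: copies n bytes for 16 <= n <= 32 without branching
   on n.  Two fixed 16-byte chunks are used, one anchored at the start
   and one anchored at the end; for n < 32 they overlap in the middle,
   which is harmless because both carry the same source data. */
static void copy16_32(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    /* Load both chunks before storing anything, so the sequence also
       stays correct when dst and src overlap. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    memcpy(dst, head, 16);
    memcpy(dst + n - 16, tail, 16);
}
```

Compilers typically lower the fixed-size memcpy calls to plain loads and
stores, so the function has no size-dependent branch; the cost is that some
middle bytes may be written twice.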

Cleanup/optimization: converted str instructions to stp.

Added __memmove_thunderx2 to the list of the
available implementations.

author	Anton Youdkevitch <anton.youdkevitch@bell-sw.com>
	Fri, 3 May 2019 18:01:34 +0000 (11:01 -0700)
committer	Steve Ellcey <sellcey@caviumnetworks.com>
	Fri, 3 May 2019 18:01:34 +0000 (11:01 -0700)
commit	32e902a94e24fc5a00168d0df3301098704c61fb
tree	88bd3588b0e08141855220a21467f230366b42db
parent	ac3da35de5cf113edfd514c2fc8ccbaed4536aaf

ChangeLog
sysdeps/aarch64/multiarch/ifunc-impl-list.c
sysdeps/aarch64/multiarch/memcpy_thunderx2.S
sysdeps/aarch64/multiarch/memmove.c