AArch64: Update A64FX memset not to degrade at 16KB
author Naohiro Tamura via Libc-alpha <libc-alpha@sourceware.org>
Fri, 27 Aug 2021 05:03:04 +0000 (05:03 +0000)
committer Szabolcs Nagy <szabolcs.nagy@arm.com>
Fri, 3 Sep 2021 14:59:46 +0000 (15:59 +0100)
commit 23777232c23f80809613bdfa329f63aadf992922
tree 02c8d663d92010187a3176126b7dfb6247540f81
parent 69623c0db0a540f26ee537bae09446d3dcdf1f80

This patch updates the unroll8 code so that memset performance does not
degrade at 16KB, the size at which performance peaks, on both FX1000 and
FX700.

The two instructions inserted at the beginning of the unroll8 loop, a cmp
and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
sysdeps/aarch64/multiarch/memset_a64fx.S