/***************************************************************************
 * Copyright (c) 2009,2010, Code Aurora Forum. All rights reserved.
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 ***************************************************************************/
/***************************************************************************
  Neon memset: Attempts to do a memset with Neon registers if possible,
        s: The buffer to write to
        c: The integer data to write to the buffer
 ***************************************************************************/
        /* Keep in mind that r2 -- the count argument -- is the
         * number of 16-bit items to write, not the number of bytes.
         */
        /* If we have < 8 bytes, just do a quick loop to handle that */
memset_smallcopy_loop:
        bne             memset_smallcopy_loop
memset_smallcopy_done:
        /*
         * Duplicate the lowest 16 bits of r1 into its upper 16 bits, so
         * that r1 holds two copies of the 16-bit value and we can write
         * two items at a time. This relies on the upper 16 bits of r1
         * being zero beforehand, since the halves are combined with ORR.
         */
        orr             r1, r1, r1, lsl #16
        /*
         * If we're writing > 64 bytes, first get onto a 16-byte
         * boundary to improve speed further.
         */
        /*
         * Determine the number of bytes to move forward to reach the
         * 16-byte boundary. Note that this will be a multiple of 4, since
         * we are already word-aligned.
         */
        /*
         * Decide where to branch for the largest store size we can use.
         * q0 and q1 are built only if they will be needed, so that setup
         * is interleaved with the routing here.
         */
        vst1.64         {q0, q1}, [r0]!
        vst1.64         {q0, q1}, [r0]!
        vst1.64         {q0, q1}, [r0]!
        vst1.64         {q0, q1}, [r0]!
        vst1.64         {q0, q1}, [r0]!
         * memset_8 isn't a loop, since we try to do our loops at 16
         * bytes and above. We should loop there, then drop down here
         * to finish the <16-byte versions. Same for memset_4 and