/*
   Copyright 2003 Richard Curnow, SuperH (UK) Ltd.

   This file is subject to the terms and conditions of the GNU General Public
   License.  See the file "COPYING" in the main directory of this archive
   for more details.

   Tight version of memcpy for the case of just copying a page.
   Prefetch strategy empirically optimised against RTL simulations
   of SH5-101 cut2 eval chip with Cayman board DDR memory.

   r2 : destination effective address (start of page)
   r3 : source effective address (start of page)

   Always copies 4096 bytes.

   * Currently the prefetch is 4 lines ahead and the alloco is 2 lines ahead.
     It seems like the prefetch needs to be at least 4 lines ahead to get
     the data into the cache in time, and the allocos contend with outstanding
     prefetches for the same cache set, so it's better to have the numbers
     different.
*/
	.section .text..SHmedia32,"ax"
	/* Copy 4096 bytes worth of data from r3 to r2.
	   Do prefetches 4 lines ahead.
	   Do alloco 2 lines ahead */
/* Minimal code size.  The extra branches inside the loop don't cost much
   because they overlap with the time spent waiting for prefetches to
   complete. */
	bge/u r2, r6, tr2  ! skip prefetch for last 4 lines
	ldx.q r2, r22, r63 ! prefetch 4 lines hence
	bge/u r2, r7, tr3  ! skip alloco for last 2 lines
	alloco r2, 0x40    ! alloc destination line 2 lines ahead
	blink tr0, r63	   ! return
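
/*
   Reference model in C, added for illustration only; it is not part of the
   routine above and is not the author's code.  It is a minimal sketch of the
   lookahead schedule the comments describe: per destination line, prefetch
   the source 4 lines ahead, alloco the destination 2 lines ahead, and skip
   each hint once it would run past the end of the page.
   Assumptions: the names model_copy_page, page_copy_stats and the constants
   are hypothetical; the 32-byte line size is inferred from "alloco r2, 0x40"
   being described as 2 lines ahead; prefetch and alloco are only counted,
   because portable C has no direct equivalent of either.

```c
#include <stddef.h>
#include <string.h>

enum {
    PAGE_BYTES     = 4096,                    // "Always copies 4096 bytes"
    LINE_BYTES     = 32,                      // assumption: 0x40 bytes == 2 lines
    LINES          = PAGE_BYTES / LINE_BYTES, // 128 lines per page
    PREFETCH_AHEAD = 4,                       // source prefetch distance, in lines
    ALLOCO_AHEAD   = 2                        // destination alloco distance, in lines
};

// Hypothetical bookkeeping: how many cache hints the loop would issue.
struct page_copy_stats {
    unsigned prefetches;
    unsigned allocos;
};

// Copies one page line by line and counts the hints issued inside the loop.
static void model_copy_page(unsigned char *dst, const unsigned char *src,
                            struct page_copy_stats *stats)
{
    stats->prefetches = 0;
    stats->allocos = 0;

    for (size_t line = 0; line < LINES; line++) {
        // Mirrors "skip prefetch for last 4 lines".
        if (line + PREFETCH_AHEAD < LINES)
            stats->prefetches++;

        // Mirrors "skip alloco for last 2 lines".
        if (line + ALLOCO_AHEAD < LINES)
            stats->allocos++;

        // The actual data movement for the current line.
        memcpy(dst + line * LINE_BYTES, src + line * LINE_BYTES, LINE_BYTES);
    }
}
```

   Under these assumptions the loop issues a prefetch on all but the last
   4 lines and an alloco on all but the last 2.  Any hints issued before the
   loop starts are outside the scope of this model.
*/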