ARM: use prefetch in nearest scaled 'src_0565_0565'
Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz:
Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
before: op=1, src=10020565, dst=10020565, speed=75.02 MPix/s
after: op=1, src=10020565, dst=10020565, speed=73.63 MPix/s
Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
before: op=1, src=10020565, dst=10020565, speed=176.12 MPix/s
after: op=1, src=10020565, dst=10020565, speed=267.50 MPix/s
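
For readers unfamiliar with the technique, here is a rough C sketch of a nearest-neighbour scanline scaler for r5g6b5 pixels that issues a prefetch hint for source data a fixed distance ahead of the pixel being fetched. It is only an illustration, not the code in this patch. The function name, the 16.16 fixed-point stepping, the PREFETCH_AHEAD distance, and the use of the GCC/Clang builtin __builtin_prefetch are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in destination pixels. A real value
 * would have to be tuned per core and memory configuration. */
#define PREFETCH_AHEAD 64

/* Sketch only: copy `width` r5g6b5 pixels from `src` to `dst` using
 * nearest-neighbour sampling. `vx` is the starting source x coordinate
 * and `unit_x` the per-pixel step, both in 16.16 fixed point (assumed). */
void
sketch_scaled_nearest_scanline_0565_0565 (uint16_t       *dst,
                                          const uint16_t *src,
                                          int             width,
                                          int32_t         vx,
                                          int32_t         unit_x)
{
    assert (width >= 0 && vx >= 0 && unit_x >= 0);

    for (int i = 0; i < width; i++)
    {
        /* Source index that will be needed PREFETCH_AHEAD pixels from now.
         * The address is formed through uintptr_t so that no out-of-bounds
         * pointer arithmetic is performed; a prefetch is only a hint and
         * never dereferences the address. */
        int64_t   ahead = ((int64_t) vx + (int64_t) PREFETCH_AHEAD * unit_x) >> 16;
        uintptr_t addr  = (uintptr_t) src + (uintptr_t) ahead * sizeof (*src);

        __builtin_prefetch ((const void *) addr, 0, 0);

        /* Nearest-neighbour fetch: drop the fractional part of vx. */
        dst[i] = src[vx >> 16];
        vx += unit_x;
    }
}
```

With a scale factor close to 1x the source is read almost sequentially, so a hint like this mainly hides memory latency. The numbers above suggest the benefit depends on the core revision.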