ARM: optimization for scaled src_0565_0565 with nearest filter
author Siarhei Siamashka <siarhei.siamashka@nokia.com>
Sun, 3 Oct 2010 22:56:59 +0000 (01:56 +0300)
committer Siarhei Siamashka <siarhei.siamashka@nokia.com>
Wed, 10 Nov 2010 15:26:49 +0000 (17:26 +0200)
commit d8fe87a6262ee661af8fb0d46bab223e4ab3d88e
tree e760a5c8aa02f580c1bf6b4069154b922cddb15e
parent b8007d042354fd9bd15711d9921e6f1ebb1c3c22

The performance improvement is only in the ballpark of 5% when
compared against C code built with a reasonably good compiler
(gcc 4.5.1). But gcc 4.4 produces approximately 30% slower code
here, so an assembly implementation makes sense to avoid depending
on compiler quality and/or optimization options.
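
For reference, the kind of C loop that the assembly competes with can be
sketched roughly as follows. This is only an illustration, not the code
touched by this commit: the function name is hypothetical, and it assumes
the source x coordinate is tracked as an unsigned 16.16 fixed-point value
(vx) that advances by a constant step (unit_x) per destination pixel and
always stays inside the source scanline.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of a nearest-neighbour scaled scanline copy for
 * r5g6b5 -> r5g6b5 with the SRC operator. Not the actual pixman code.
 *
 * Assumptions: vx and unit_x are unsigned 16.16 fixed-point values, and
 * (vx >> 16) stays within the bounds of src for every pixel written.
 */
static void
scaled_nearest_scanline_0565_0565_sketch (uint16_t       *dst,
                                          const uint16_t *src,
                                          int             w,
                                          uint32_t        vx,
                                          uint32_t        unit_x)
{
    while (w-- > 0)
    {
        /* The integer part of vx selects the nearest source pixel. */
        *dst++ = src[vx >> 16];

        /* Step to the source position of the next destination pixel. */
        vx += unit_x;
    }
}
```

With unit_x = 0x8000 (0.5 in 16.16 fixed point) every source pixel is
written twice, which gives a 2x horizontal upscale. The loop is only a
load, a store and an add per pixel, so the quality of the compiler's
scheduling and addressing-mode selection dominates the result.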

Benchmark from ARM11:
    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=34.86 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=36.62 MPix/s

Benchmark from ARM Cortex-A8:
    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=94.91 MPix/s
pixman/pixman-arm-simd-asm.S
pixman/pixman-arm-simd.c