ARM: Add 'neon_composite_over_n_8888_0565_ca' fast path
author    Søren Sandmann Pedersen <ssp@redhat.com>
          Sun, 3 Apr 2011 03:24:48 +0000 (23:24 -0400)
committer Søren Sandmann Pedersen <ssp@redhat.com>
          Mon, 18 Apr 2011 20:25:36 +0000 (16:25 -0400)
commit    e75e6a4ef5c5a8ac8b0e8464f08f83fd2b6e86ed
tree      351f4bb3ea92a4f1237ad2f6aca25edc01bfca77
parent    1670b952143284f480c39ff087b5694a64eb7db3

This improves the performance of the firefox-talos-gfx benchmark with
the image16 backend: the median run time drops from 122.218 s to
85.563 s, a reduction of about 30%. Measured on an 800 MHz ARM
Cortex-A8:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]  image16            firefox-talos-gfx  121.773  122.218   0.15%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]  image16            firefox-talos-gfx   85.247   85.563   0.22%    6/6

V2: Slightly better instruction scheduling based on comments from Taekyun Kim.
V3: Eliminate all stalls from the inner loop. Also based on comments from Taekyun Kim.
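
For readers unfamiliar with the naming, 'over_n_8888_0565_ca' denotes the
OVER operator with a solid source (n), an a8r8g8b8 mask, an r5g6b5
destination, and component alpha. The scalar C below is only a sketch of the
per-pixel arithmetic such a fast path computes. It is not the NEON code from
this commit. The function name over_n_8888_0565_ca_ref and all helpers are
hypothetical, and the rounding shown is an assumption that may differ from
the assembly in the low bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t add_sat_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a + b;
    return (uint8_t)(t > 255 ? 255 : t);
}

/* Expand an r5g6b5 pixel to 8 bits per channel by replicating the top bits. */
static void expand_0565(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint32_t r5 = (p >> 11) & 0x1f, g6 = (p >> 5) & 0x3f, b5 = p & 0x1f;
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}

/* Pack 8-bit channels back into r5g6b5 by truncation. */
static uint16_t pack_0565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* One channel of component-alpha OVER: s*m + d*(1 - m*sa). */
static uint8_t over_ca_channel(uint8_t s, uint8_t sa, uint8_t m, uint8_t d)
{
    uint8_t inv = (uint8_t)(255 - mul_un8(m, sa));
    return add_sat_un8(mul_un8(s, m), mul_un8(d, inv));
}

/* Hypothetical scalar reference for one row: solid a8r8g8b8 'src',
 * per-pixel a8r8g8b8 'mask', r5g6b5 'dst'. The destination has no alpha
 * channel, so the mask's alpha byte is not used. */
void over_n_8888_0565_ca_ref(uint16_t *dst, const uint32_t *mask,
                             uint32_t src, size_t width)
{
    uint8_t sa = (uint8_t)(src >> 24), sr = (uint8_t)(src >> 16);
    uint8_t sg = (uint8_t)(src >> 8),  sb = (uint8_t)src;

    for (size_t i = 0; i < width; i++) {
        uint32_t m = mask[i];
        uint8_t dr, dg, db;

        if (m == 0)
            continue; /* fully transparent mask: destination unchanged */

        expand_0565(dst[i], &dr, &dg, &db);
        dr = over_ca_channel(sr, sa, (uint8_t)(m >> 16), dr);
        dg = over_ca_channel(sg, sa, (uint8_t)(m >> 8), dg);
        db = over_ca_channel(sb, sa, (uint8_t)m, db);
        dst[i] = pack_0565(dr, dg, db);
    }
}
```

A scalar loop like this does a 565 unpack, three multiplies per channel and
a repack for every pixel, which is why a hand-scheduled NEON version that
processes several pixels per iteration pays off for this operation.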
pixman/pixman-arm-neon-asm.S
pixman/pixman-arm-neon.c