SSE2 optimization for nearest scaled over_8888_n_8888
This operation shows up a little bit in some of the html5 based
games from http://www.kesiev.com/akihabara/
=== Cairo trace of the game intro animation for 'Legend of Sadness' ===
before:
[ 0] image firefox-legend-of-sadness 46.286 46.298 0.01% 5/6
after:
[ 0] image firefox-legend-of-sadness 45.088 45.102 0.04% 6/6
=== Microbenchmark (scaling ~2000x~2000 -> ~2000x~2000) ===
before:
translucent: op=3, src=8888, mask=s dst=8888, speed=131.30 MPix/s
transparent: op=3, src=8888, mask=s dst=8888, speed=132.38 MPix/s
opaque: op=3, src=8888, mask=s dst=8888, speed=167.90 MPix/s
after:
translucent: op=3, src=8888, mask=s dst=8888, speed=301.93 MPix/s
transparent: op=3, src=8888, mask=s dst=8888, speed=770.70 MPix/s
opaque: op=3, src=8888, mask=s dst=8888, speed=301.80 MPix/s