Improve performance of combine_over_u
The generic C over_u combiner can be a lot faster with the
addition of special shortcuts for 0xFF and 0x00 alpha/mask
values. This is already implemented in C and SSE2 fast paths.
Profiling the run of cairo-perf-trace benchmarks with PIXMAN_DISABLE
environment variable set to "fast mmx sse2" on Intel Core i7:
=== before ===
37.32% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_u
21.37% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
13.51% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
2.96% cairo-perf-trac libpixman-1.so.0.29.1 [.] radial_compute_color
2.74% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_a8
2.71% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8
2.17% cairo-perf-trac libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
1.86% cairo-perf-trac libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
1.57% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
0.97% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_in_reverse_u
0.96% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_ca
=== after ===
28.79% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
18.44% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
15.54% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_u
3.94% cairo-perf-trac libpixman-1.so.0.29.1 [.] radial_compute_color
3.69% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_a8
3.69% cairo-perf-trac libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8
2.94% cairo-perf-trac libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
2.52% cairo-perf-trac libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
2.08% cairo-perf-trac libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
1.31% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_in_reverse_u
1.29% cairo-perf-trac libpixman-1.so.0.29.1 [.] combine_over_ca