ARMv6: Add fast path for over_n_8888_8888_ca
Benchmark results, "before" is
* upstream/master
4b76bbfda670f9ede67d0449f3640605e1fc4df0
"after" contains the additional patches on top:
+ ARMv6: Support for very variable-hungry composite operations
+ ARMv6: Add fast path for over_n_8888_8888_ca (this patch)
lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:
Before After
Mean StdDev Mean StdDev Confidence Change
L1 2.7 0.00 16.1 0.06 100.00% +500.7%
L2 2.4 0.01 14.1 0.15 100.00% +489.9%
M 2.3 0.00 14.3 0.01 100.00% +510.2%
HT 2.2 0.00 9.7 0.03 100.00% +345.0%
VT 2.2 0.00 9.4 0.02 100.00% +333.4%
R 2.2 0.01 9.5 0.03 100.00% +331.6%
RT 1.9 0.01 5.5 0.07 100.00% +192.7%
At most 1 outliers rejected per test per set.
cairo-perf-trace with trimmed traces, 30 iterations:
Before After
Mean StdDev Mean StdDev Confidence Change
t-firefox-talos-gfx.trace 33.1 0.42 25.8 0.44 100.00% +28.6%
t-firefox-scrolling.trace 31.4 0.11 24.8 0.12 100.00% +26.3%
t-gnome-terminal-vim.trace 22.4 0.10 19.9 0.14 100.00% +12.5%
t-evolution.trace 13.9 0.07 13.0 0.05 100.00% +6.5%
t-firefox-planet-gnome.trace 11.6 0.02 10.9 0.02 100.00% +6.5%
t-gvim.trace 34.0 0.21 33.2 0.21 100.00% +2.4%
t-chromium-tabs.trace 4.9 0.02 4.9 0.02 100.00% +1.0%
t-poppler.trace 9.8 0.05 9.8 0.06 100.00% +0.7%
t-firefox-canvas-swscroll.trace 32.3 0.10 32.2 0.09 100.00% +0.4%
t-firefox-paintball.trace 18.1 0.01 18.0 0.01 100.00% +0.3%
t-poppler-reseau.trace 22.5 0.09 22.4 0.11 99.29% +0.3%
t-firefox-canvas.trace 18.1 0.06 18.0 0.05 99.29% +0.2%
t-xfce4-terminal-a1.trace 4.8 0.01 4.8 0.01 99.77% +0.2%
t-firefox-fishbowl.trace 21.2 0.03 21.2 0.04 100.00% +0.2%
t-gnome-system-monitor.trace 17.3 0.03 17.3 0.03 99.54% +0.1%
t-firefox-asteroids.trace 11.1 0.01 11.1 0.01 100.00% +0.1%
t-midori-zoomed.trace 8.0 0.01 8.0 0.01 99.98% +0.1%
t-grads-heat-map.trace 4.4 0.04 4.4 0.04 34.08% +0.1% (insignificant)
t-firefox-talos-svg.trace 20.6 0.03 20.6 0.04 54.06% +0.0% (insignificant)
t-firefox-fishtank.trace 13.2 0.01 13.2 0.01 52.81% -0.0% (insignificant)
t-swfdec-giant-steps.trace 14.9 0.02 14.9 0.03 85.50% -0.1% (insignificant)
t-firefox-chalkboard.trace 36.6 0.02 36.7 0.03 100.00% -0.2%
t-firefox-canvas-alpha.trace 20.7 0.32 20.7 0.22 55.76% -0.3% (insignificant)
t-swfdec-youtube.trace 7.8 0.02 7.8 0.03 100.00% -0.5%
t-firefox-particles.trace 27.4 0.16 27.5 0.18 99.94% -0.6%
At most 4 outliers rejected per test per set.
Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).
Confidence is based on Welch's t-test. Absolute changes less than 1%
can be accounted as measurement errors, even if statistically
significant.
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Use pixman_asm_function instead of startfunc.
Rebased. Re-benchmarked on Raspberry Pi.
Commit message.
v5, Ben Avison <bavison@riscosopen.org> :
Fixed the bug exposed in blitters-test 4928372.
15 hours of testing, compared to the 45 minutes to hit
the bug originally.
Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Squash the fix, re-benchmark on Raspberry Pi.