vmx: implement fast path iterator vmx_fetch_a8
No performance changes were observed when running the cairo trimmed benchmarks.
Running "lowlevel-blt-bench src_8_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.1 ppc64le gave the following results:
reference memcpy speed = 25197.2MB/s (6299.3MP/s for 32bpp fills)
            Before     After      Change
--------------------------------------------
L1          965.34     3936       +307.73%
L2          942.99     3436.29    +264.40%
M           902.24     2757.77    +205.66%
HT          448.46     784.99     +75.04%
VT          430.05     819.78     +90.62%
R           412.9      717.04     +73.66%
RT          168.93     220.63     +30.60%
Kops/s      1025       1303       +27.12%
It was benchmarked against commit e2d211a from pixman/master.

Siarhei Siamashka reported the following results on a PlayStation 3:
== before ==
src_8_8888 = L1: 194.37 L2: 198.46 M:155.90 (148.35%)
HT: 59.18 VT: 36.71 R: 38.93 RT: 12.79 ( 106Kops/s)
== after ==
src_8_8888 = L1: 373.96 L2: 391.10 M:245.81 (233.88%)
HT: 80.81 VT: 44.33 R: 48.10 RT: 14.79 ( 122Kops/s)
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>