vmx: implement fast path composite_over_8888_8888
Copied the implementation from the sse2 file and edited it to use vmx functions.
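For reference, the per-pixel operation that an over_8888_8888 fast path vectorizes can be sketched in scalar C as below. This is not the VMX code from this patch, only a minimal illustration. It assumes premultiplied a8r8g8b8 pixels, and the helper names (mul_un8, add_un8_sat, over_8888, over_8888_8888_scanline) are hypothetical, not pixman functions.

```c
#include <stdint.h>

/* Hypothetical helper: x * a / 255 with rounding, for 8-bit values. */
static uint8_t mul_un8(uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t)x * a + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Hypothetical helper: 8-bit add that saturates at 255. */
static uint8_t add_un8_sat(uint8_t x, uint8_t y)
{
    uint32_t t = (uint32_t)x + y;
    return (uint8_t)(t > 255 ? 255 : t);
}

/* OVER for one premultiplied a8r8g8b8 pixel:
 * result = src + dst * (255 - src_alpha) / 255, per channel. */
static uint32_t over_8888(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = (uint8_t)(src >> shift);
        uint8_t d = (uint8_t)(dst >> shift);
        out |= (uint32_t)add_un8_sat(s, mul_un8(d, inv_alpha)) << shift;
    }
    return out;
}

/* Apply OVER to a scanline one pixel at a time; a SIMD fast path
 * would process several pixels per iteration instead. */
static void over_8888_8888_scanline(uint32_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++)
        dst[i] = over_8888(src[i], dst[i]);
}
```

An opaque source pixel replaces the destination, and a fully transparent one leaves it unchanged.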
The new fast path was benchmarked against commit id 2be523b from pixman/master.
POWER8, 16 cores, 3.4GHz, ppc64le :
reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)
            Before       After      Change
---------------------------------------------
L1          129.47     1054.62    +714.57%
L2          138.31     1011.02    +630.98%
M           139.99     1008.65    +620.52%
HT          122.11      468.45    +283.63%
VT          121.06      532.21    +339.62%
R           108.48      240.5     +121.70%
RT           77.87      116.7      +49.87%
Kops/s      758         981        +29.42%
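
The Change column is consistent with the relative difference between the After and Before figures. A minimal sketch of that arithmetic follows; the function name percent_change is hypothetical and not part of any benchmark tool.

```c
/* Relative change from 'before' to 'after', in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the L1 row, percent_change(129.47, 1054.62) gives roughly 714.57.
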
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>