vmx: implement fast path composite_add_8_8
Copied the implementation from the sse2 file and edited it to use vmx functions.
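For reference, the ADD operator on 8-bit data is a per-byte saturating add. The scalar sketch below only illustrates that operation; it is not the code in this patch, and the name add_8_8_row is hypothetical. A vector version would process 16 bytes per iteration with a saturating-add instruction such as AltiVec's vec_adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not the pixman code:
 * dst[i] = min(dst[i] + src[i], 255) for each 8-bit pixel in a row. */
static void add_8_8_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    for (size_t i = 0; i < width; i++) {
        /* Widen before adding so the sum cannot wrap around. */
        unsigned int sum = (unsigned int)dst[i] + (unsigned int)src[i];

        /* Clamp to the maximum 8-bit value instead of overflowing. */
        dst[i] = (sum > 255u) ? 255u : (uint8_t)sum;
    }
}
```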
It was benchmarked against commit id 2be523b from pixman/master.
POWER8, 16 cores, 3.4GHz, ppc64le :
reference memcpy speed = 27036.4MB/s (6759.1MP/s for 32bpp fills)
          Before      After       Change
---------------------------------------------
L1        687.63      9140.84     +1229.33%
L2        715         7495.78     +948.36%
M         717.39      8460.14     +1079.29%
HT        569.56      1020.12     +79.11%
VT        520.3       1215.56     +133.63%
R         514.81      874.35      +69.84%
RT        341.28      305.42      -10.51%
Kops/s    1621        1579        -2.59%
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>