sse2: faster bilinear scaling (use _mm_loadl_epi64)
author Siarhei Siamashka <siarhei.siamashka@gmail.com>
Mon, 25 Jun 2012 04:24:27 +0000 (07:24 +0300)
committer Siarhei Siamashka <siarhei.siamashka@gmail.com>
Fri, 29 Jun 2012 00:29:32 +0000 (03:29 +0300)
commit ff5d041b88c667141b891909acd3085c3ed54994
tree 079436b3d0ef6819ea444cd0ebda8e53abb5307a
parent fc162bad561a516f648daf07e9d22d427fe60e74

Using _mm_loadl_epi64() to load two pixels at once (pairs of top
and bottom pixels) is faster than loading each pixel separately
and combining them with _mm_set_epi32().
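
A minimal sketch of the two loading strategies described above, not the
actual pixman code. The function names, the `row`/`x` parameters and the
lane order passed to _mm_set_epi32() are assumptions made for
illustration; only the two intrinsics come from the commit message.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical "before" variant: read two adjacent 32-bit pixels
 * separately and combine them with _mm_set_epi32(). The lane order
 * is chosen here so the result matches the memory layout. */
static __m128i load_pair_set(const uint32_t *row, int x)
{
    uint32_t left  = row[x];
    uint32_t right = row[x + 1];
    return _mm_set_epi32(0, 0, (int)right, (int)left);
}

/* Hypothetical "after" variant: one 64-bit load fetches both pixels.
 * _mm_loadl_epi64() fills the low 64 bits and zeroes the high 64 bits;
 * it does not require 16-byte alignment. */
static __m128i load_pair_loadl(const uint32_t *row, int x)
{
    return _mm_loadl_epi64((const __m128i *)&row[x]);
}

/* Illustrative check: returns 1 when both variants produce the same
 * 128-bit register contents for the pixel pair starting at row[x]. */
static int pair_loads_match(const uint32_t *row, int x)
{
    __m128i eq = _mm_cmpeq_epi8(load_pair_set(row, x),
                                load_pair_loadl(row, x));
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

Both variants yield the same register contents in this sketch; the
second one replaces two scalar loads plus a combine step with a single
load instruction. In a bilinear scaler this would be done once for the
top row and once for the bottom row.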

=== cairo-perf-trace ===

before: image             firefox-fishtank   66.912   66.931   0.13%    3/3
after:  image             firefox-fishtank   57.584   58.349   0.74%    3/3

=== lowlevel-blt-bench ===

before: src_8888_8888 =  L1: 181.10  L2: 179.14  M:178.08 ( 11.02%)  HT:153.22  VT:133.45  R:142.24  RT: 95.32
after:  src_8888_8888 =  L1: 228.68  L2: 225.75  M:223.98 ( 14.23%)  HT:185.32  VT:155.06  R:162.73  RT:102.52

This improvement was suggested by Matt Turner on IRC.
pixman/pixman-sse2.c