sse2: _mm_madd_epi16 for faster bilinear scaling with 7-bit precision
author Siarhei Siamashka <siarhei.siamashka@gmail.com>
Mon, 25 Jun 2012 22:47:18 +0000 (01:47 +0300)
committer Siarhei Siamashka <siarhei.siamashka@gmail.com>
Sun, 1 Jul 2012 19:40:23 +0000 (22:40 +0300)
commit c430b1dba7bfea0031227dd4b976da3dd7c4ac02
tree eae0dca3003a1602b75db369e6e1e1301a4c0385
parent ccd31896bc2f1f323b3be9e8b1447cab892ee62d

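Reducing the interpolation precision from 8 to 7 bits allows the use of
the PMADDWD instruction (_mm_madd_epi16). This makes bilinear scaling
considerably faster on an Intel Core i7: about 11% less time on the
firefox-fishtank trace and about 40% higher src_8888_8888 throughput in
the L1 column: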

8-bit: image             firefox-fishtank   57.584   58.349   0.74%    3/3
7-bit: image             firefox-fishtank   51.139   51.229   0.30%    3/3

8-bit: src_8888_8888 =  L1: 228.71  L2: 226.52  M:224.82 ( 14.95%)  HT:183.22  VT:154.02  R:171.72  RT:109.36
7-bit: src_8888_8888 =  L1: 320.45  L2: 317.43  M:314.38 ( 20.77%)  HT:215.13  VT:177.35  R:204.46  RT:121.93
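
To illustrate why 7-bit weights matter, here is a minimal sketch, not the
code from pixman-sse2.c. The likely reason is that PMADDWD multiplies
signed 16-bit lanes. With 7-bit weights the vertically interpolated
channel values stay at or below 255 * 128 = 32640 and fit in a signed
16-bit lane, whereas with 8-bit weights they can reach 255 * 256 = 65280
and would not. The function names, the 0xAARRGGBB-style pixel layout, and
the weight convention (wx and wy in 0..127) are assumptions made for this
example.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: bilinear blend of four 8888 pixels with 7-bit weights. */
static uint32_t
bilinear_7bit_scalar (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                      int wx, int wy)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        /* vertical pass: each value is at most 255 * 128 = 32640 */
        uint32_t l = ((tl >> shift) & 0xff) * (128 - wy) + ((bl >> shift) & 0xff) * wy;
        uint32_t r = ((tr >> shift) & 0xff) * (128 - wy) + ((br >> shift) & 0xff) * wy;
        /* horizontal pass, then drop the 2 * 7 fractional bits */
        out |= ((l * (128 - wx) + r * wx) >> 14) << shift;
    }
    return out;
}

/* Same computation with SSE2; the horizontal pass is one PMADDWD. */
static uint32_t
bilinear_7bit_sse2 (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                    int wx, int wy)
{
    assert (wx >= 0 && wx < 128 && wy >= 0 && wy < 128);

    const __m128i zero = _mm_setzero_si128 ();
    /* widen to 16 bits: lanes 0..3 = left pixel, lanes 4..7 = right pixel */
    __m128i top = _mm_unpacklo_epi8 (_mm_set_epi32 (0, 0, (int) tr, (int) tl), zero);
    __m128i bot = _mm_unpacklo_epi8 (_mm_set_epi32 (0, 0, (int) br, (int) bl), zero);

    /* vertical pass: results fit in signed 16-bit lanes */
    __m128i v = _mm_add_epi16 (
        _mm_mullo_epi16 (top, _mm_set1_epi16 ((short) (128 - wy))),
        _mm_mullo_epi16 (bot, _mm_set1_epi16 ((short) wy)));

    /* interleave channels as (left, right) pairs for PMADDWD */
    __m128i pairs = _mm_unpacklo_epi16 (v, _mm_srli_si128 (v, 8));
    /* per 32-bit lane: low half = left weight, high half = right weight */
    __m128i w = _mm_set1_epi32 ((wx << 16) | (128 - wx));

    /* each 32-bit lane = left * (128 - wx) + right * wx */
    __m128i h = _mm_srli_epi32 (_mm_madd_epi16 (pairs, w), 14);

    /* narrow 32 -> 16 -> 8 bits and extract the packed pixel */
    h = _mm_packus_epi16 (_mm_packs_epi32 (h, h), zero);
    return (uint32_t) _mm_cvtsi128_si32 (h);
}
```

One _mm_madd_epi16 performs both multiplications and the addition of the
horizontal pass for all four channels of a pixel, which is where a
speedup of this kind would come from. The cost is one bit of weight
precision per axis.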
pixman/pixman-sse2.c