MIPS: DSPr2: Added more bilinear fast paths (without mask)
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench -b
Referent (before):
src_8888_8888 = L1: 8.18 L2: 7.79 M: 6.32 ( 33.51%) HT: 5.78 VT: 5.70 R: 5.61 RT: 3.79 ( 29Kops/s)
src_8888_0565 = L1: 6.90 L2: 7.14 M: 6.47 ( 25.75%) HT: 5.54 VT: 5.51 R: 5.46 RT: 3.53 ( 28Kops/s)
src_0565_x888 = L1: 3.76 L2: 3.71 M: 3.37 ( 13.41%) HT: 3.26 VT: 3.22 R: 3.20 RT: 2.58 ( 23Kops/s)
src_0565_0565 = L1: 3.59 L2: 3.56 M: 3.47 ( 9.19%) HT: 3.19 VT: 3.18 R: 3.16 RT: 2.46 ( 22Kops/s)
over_8888_8888 = L1: 5.99 L2: 5.66 M: 4.95 ( 26.28%) HT: 4.40 VT: 4.38 R: 4.31 RT: 3.02 ( 26Kops/s)
add_8888_8888 = L1: 6.84 L2: 6.39 M: 5.48 ( 29.09%) HT: 4.80 VT: 4.79 R: 4.70 RT: 3.20 ( 27Kops/s)
Optimized:
src_8888_8888 = L1: 18.27 L2: 16.69 M: 12.87 ( 68.25%) HT: 11.80 VT: 11.61 R: 10.60 RT: 7.05 ( 41Kops/s)
src_8888_0565 = L1: 15.18 L2: 14.10 M: 11.75 ( 46.71%) HT: 10.64 VT: 10.50 R: 10.03 RT: 7.15 ( 41Kops/s)
src_0565_x888 = L1: 10.45 L2: 9.96 M: 9.23 ( 36.72%) HT: 8.39 VT: 8.29 R: 8.02 RT: 5.75 ( 37Kops/s)
src_0565_0565 = L1: 9.37 L2: 8.98 M: 8.50 ( 22.53%) HT: 7.71 VT: 7.66 R: 7.52 RT: 5.59 ( 37Kops/s)
over_8888_8888 = L1: 12.21 L2: 11.01 M: 8.56 ( 45.36%) HT: 7.71 VT: 7.64 R: 7.43 RT: 5.51 ( 36Kops/s)
add_8888_8888 = L1: 17.72 L2: 15.16 M: 10.78 ( 57.13%) HT: 9.46 VT: 9.30 R: 9.00 RT: 6.03 ( 38Kops/s)