MIPS: DSPr2: Added several nearest neighbor fast paths with a8 mask
Performance numbers before/after on MIPS-74kc @ 1GHz:
lowlevel-blt-bench -n
Reference (before):
over_8888_8_0565 = L1: 9.62 L2: 8.85 M: 7.40 ( 39.27%) HT: 5.67 VT: 5.61 R: 5.45 RT: 2.98 ( 22Kops/s)
over_0565_8_0565 = L1: 7.90 L2: 7.49 M: 6.72 ( 26.75%) HT: 5.24 VT: 5.20 R: 5.06 RT: 2.90 ( 22Kops/s)
Optimized (after):
over_8888_8_0565 = L1: 18.51 L2: 16.82 M: 12.13 ( 64.43%) HT: 10.06 VT: 9.88 R: 9.54 RT: 5.63 ( 31Kops/s)
over_0565_8_0565 = L1: 14.82 L2: 13.94 M: 11.34 ( 45.20%) HT: 9.45 VT: 9.35 R: 9.03 RT: 5.50 ( 31Kops/s)
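
For readers unfamiliar with the operation names, the following is a minimal scalar sketch in C of what a nearest neighbor scaled over_8888_8_0565 scanline computes. It is not the DSPr2 assembly from this patch. All function names are hypothetical. It assumes a premultiplied a8r8g8b8 source, a 16.16 fixed-point source coordinate, and an a8 mask indexed at destination coordinates.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Expand r5g6b5 to x8r8g8b8 by replicating the high bits into the low bits. */
static uint32_t expand_0565(uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Pack the colour channels of x8r8g8b8 back to r5g6b5 by truncation. */
static uint16_t pack_0565(uint32_t p)
{
    return (uint16_t)(((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) |
                      ((p >> 3) & 0x001f));
}

/*
 * Hypothetical scalar reference for one scanline:
 *   dst = (src IN mask) OVER dst
 * vx is the starting source x coordinate in 16.16 fixed point (assumed
 * non-negative) and unit_x is the per-destination-pixel step.
 */
static void sketch_nearest_over_8888_8_0565(uint16_t *dst, const uint32_t *src,
                                            const uint8_t *mask, int width,
                                            int32_t vx, int32_t unit_x)
{
    for (int i = 0; i < width; i++, vx += unit_x) {
        uint32_t s = src[vx >> 16];          /* nearest source pixel */
        uint8_t m = mask[i];                 /* mask is not scaled here */
        uint32_t d = expand_0565(dst[i]);
        uint8_t sa = mul_un8((uint8_t)(s >> 24), m);
        uint8_t ia = (uint8_t)(255 - sa);    /* 1 - masked source alpha */
        uint32_t out = 0;

        for (int shift = 0; shift < 24; shift += 8) {
            uint32_t sc = mul_un8((uint8_t)(s >> shift), m);
            uint32_t dc = mul_un8((uint8_t)(d >> shift), ia);
            uint32_t sum = sc + dc;
            if (sum > 255)
                sum = 255;                   /* saturate each channel */
            out |= sum << shift;
        }
        dst[i] = pack_0565(out);
    }
}
```

With unit_x = 0x8000 (a step of 0.5), each source pixel is used for two consecutive destination pixels, which is the 2x upscaling case. The over_0565_8_0565 variant would differ only in expanding the source from r5g6b5 first, under the same assumptions.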