sbc: ARM NEON optimized joint stereo processing in SBC encoder
Improves SBC encoding performance when joint stereo is used, which
is a typical A2DP configuration.
Benchmarked on ARM Cortex-A8:
== Before: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m5.239s
user 0m4.805s
sys 0m0.430s
samples % image name symbol name
26083 25.0856 sbcenc sbc_pack_frame
21548 20.7240 sbcenc sbc_calc_scalefactors_j
19910 19.1486 sbcenc sbc_analyze_4b_8s_neon
14377 13.8272 sbcenc sbc_calculate_bits
9990 9.6080 sbcenc sbc_enc_process_input_8s_be
8667 8.3356 no-vmlinux /no-vmlinux
2263 2.1765 sbcenc sbc_encode
696 0.6694 libc-2.10.1.so memcpy
== After: ==
$ time ./sbcenc -b53 -s8 -j test.au > /dev/null
real 0m4.389s
user 0m3.969s
sys 0m0.422s
samples % image name symbol name
26234 29.9625 sbcenc sbc_pack_frame
20057 22.9076 sbcenc sbc_analyze_4b_8s_neon
14306 16.3393 sbcenc sbc_calculate_bits
9866 11.2682 sbcenc sbc_enc_process_input_8s_be
8506 9.7149 no-vmlinux /no-vmlinux
5219 5.9608 sbcenc sbc_calc_scalefactors_j_neon
2280 2.6040 sbcenc sbc_encode
661 0.7549 libc-2.10.1.so memcpy