sbc: MMX optimization for scale factors calculation
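Improves SBC encoding performance when joint stereo is not used. In the
profiles below, scale factor calculation drops from 4435 to 1801 samples
(about 2.5x faster) and total encoding time falls by about 8%.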
Benchmarked on Pentium-M:
== Before: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m1.439s
user 0m1.336s
sys 0m0.104s
samples  %        image name  symbol name
8642     33.7473  sbcenc      sbc_pack_frame
5873     22.9342  sbcenc      sbc_analyze_4b_8s_mmx
4435     17.3188  sbcenc      sbc_calc_scalefactors
4285     16.7331  sbcenc      sbc_calculate_bits
1942      7.5836  sbcenc      sbc_enc_process_input_8s_be
322       1.2574  sbcenc      sbc_encode
== After: ==
$ time ./sbcenc -b53 -s8 test.au > /dev/null
real 0m1.319s
user 0m1.220s
sys 0m0.084s
samples  %        image name  symbol name
8706     37.9959  sbcenc      sbc_pack_frame
5740     25.0513  sbcenc      sbc_analyze_4b_8s_mmx
4307     18.7972  sbcenc      sbc_calculate_bits
1937      8.4537  sbcenc      sbc_enc_process_input_8s_be
1801      7.8602  sbcenc      sbc_calc_scalefactors_mmx
307       1.3399  sbcenc      sbc_encode
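
For context, the kind of scalar loop that a SIMD scale factor routine
speeds up can be sketched in portable C as follows. This is a minimal
sketch, not the code from this commit: the name calc_scalefactors_sketch,
the array layout, and the FRAC_BITS fixed-point format are assumptions
made for illustration. It picks, for each channel and subband, the
smallest 4-bit scale factor sf such that every sample magnitude is below
2^(FRAC_BITS + sf + 1).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS       16
#define MAX_CHANNELS     2
#define MAX_SUBBANDS     8
#define FRAC_BITS        15 /* assumed fixed-point format of the samples */
#define MAX_SCALE_FACTOR 15 /* scale factors are stored in 4 bits */

/* Hypothetical helper: one scale factor per (channel, subband) pair. */
static void calc_scalefactors_sketch(
	int32_t samples[MAX_BLOCKS][MAX_CHANNELS][MAX_SUBBANDS],
	uint32_t scale_factor[MAX_CHANNELS][MAX_SUBBANDS],
	int blocks, int channels, int subbands)
{
	int ch, sb, blk;

	assert(blocks >= 0 && blocks <= MAX_BLOCKS);
	assert(channels >= 0 && channels <= MAX_CHANNELS);
	assert(subbands >= 0 && subbands <= MAX_SUBBANDS);

	for (ch = 0; ch < channels; ch++) {
		for (sb = 0; sb < subbands; sb++) {
			uint32_t max_abs = 0;
			uint32_t sf = 0;

			/* Largest magnitude in this subband over all blocks. */
			for (blk = 0; blk < blocks; blk++) {
				int32_t v = samples[blk][ch][sb];
				/* Unsigned negation avoids overflow on INT32_MIN. */
				uint32_t a = v < 0 ? 0u - (uint32_t)v : (uint32_t)v;

				if (a > max_abs)
					max_abs = a;
			}

			/* Smallest sf with max_abs < 2^(FRAC_BITS + sf + 1). */
			while (sf < MAX_SCALE_FACTOR &&
					(max_abs >> (FRAC_BITS + sf + 1)) != 0)
				sf++;

			scale_factor[ch][sb] = sf;
		}
	}
}
```

The inner loop is a maximum-magnitude reduction over independent
subbands, which is why this step lends itself to processing several
subbands per instruction.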