audiofxbasefirfilter: FFT convolution implementation
This provides a great speedup, especially the relationship between kernel
length and processing size is now logarithmic instead of linear. Below a
kernel size of 32 it's a bit slower, afterwards it's much faster:
17 0.788000 -> 0.950000
33 1.208000 -> 1.146000
65 2.166000 -> 1.146000
...
4097 107.444000 -> 1.508000
For sizes smaller 32 the normal time-domain convolution is chosen,
for larger sizes the FFT convolution is automatically used.
Fixes bug #594381.