Optimize float complex FFT
1. The FFT algorithm is changed to improve performance: the bit-reversal step is removed and a radix-8 butterfly is added.
2. In testing, the optimized FFT showed the best performance, so the old implementations are removed.
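For illustration only, here is a minimal sketch of how an FFT can avoid a separate bit-reversal pass. It assumes a radix-2 Stockham autosort scheme with two ping-pong buffers. This message does not state which scheme the optimized code uses, and the radix-8 stage is not shown. The name `fft_stockham` is hypothetical and is not part of Ne10.

```python
import cmath


def fft_stockham(x):
    """Forward FFT of a power-of-two-length sequence, output in natural order.

    Illustrative radix-2 Stockham autosort: each stage reads from one buffer
    and writes to the other, so no bit-reversal permutation is needed.
    """
    total = len(x)
    if total == 0 or total & (total - 1):
        raise ValueError("length must be a power of two")

    src = [complex(v) for v in x]
    dst = [0j] * total

    n, stride = total, 1  # n: current sub-transform length, stride: interleave
    while n > 1:
        half = n // 2
        for p in range(half):
            # Twiddle factor for this butterfly position.
            w = cmath.exp(-2j * cmath.pi * p / n)
            for q in range(stride):
                a = src[q + stride * p]
                b = src[q + stride * (p + half)]
                dst[q + stride * (2 * p)] = a + b
                dst[q + stride * (2 * p + 1)] = (a - b) * w
        # Swap the ping-pong buffers for the next stage.
        src, dst = dst, src
        n, stride = half, 2 * stride
    return src


if __name__ == "__main__":
    print(fft_stockham([1, 0, 0, 0]))
```

The reordering is folded into the write indices of every stage, which is why the final buffer is already in natural order.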
The performance results are as follows:
Toolchain: gcc 4.8 at -O2
The execution time of the omx FFT is the baseline (100%). A lower ratio means better performance.
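As a sketch of how each percentage is read: a ratio is the implementation's execution time divided by the baseline's, times 100. The helper `ratio_percent` and the timings below are hypothetical and are not measurements from this change.

```python
def ratio_percent(t_impl, t_base):
    """Return execution time of an implementation as a percentage of the baseline."""
    if t_base <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * t_impl / t_base


if __name__ == "__main__":
    # Hypothetical timings in microseconds, for illustration only.
    t_base_us = 10.0
    t_impl_us = 8.4
    print(f"{ratio_percent(t_impl_us, t_base_us):.2f}%")
```

A value below 100% means the implementation ran faster than the baseline, and a value above 100% means it ran slower.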
PandaBoard (Cortex-A9):
| FFT size | 16     | 32     | 64     | 128    | 256    | 512    | 1024   | 2048   | 4096   |
| Ne10     | 84.27% | 89.57% | 85.63% | 85.79% | 87.89% | 87.91% | 83.51% | 97.08% | 92.68% |
| omx      | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   |
Nexus 10 (Cortex-A15):
| FFT size | 16     | 32     | 64     | 128    | 256    | 512    | 1024   | 2048   | 4096   |
| Ne10     | 84.88% | 98.43% | 89.46% | 101.0% | 99.24% | 103.2% | 93.80% | 105.1% | 97.44% |
| omx      | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   | 100%   |
Change-Id: I363ee1602f08532e566d3a5a4f3d7a99972a1283
20 files changed: