optimize float complex FFT v1.0.1
author Yang Zhang <yang.zhang@arm.com>
Fri, 30 May 2014 11:36:23 +0000 (19:36 +0800)
committer Yang Zhang <yang.zhang@arm.com>
Wed, 4 Jun 2014 06:09:13 +0000 (14:09 +0800)
commit c3bbc6148cbbc7cb9c0f5e94456bfaac4c77976b
tree 61d02e6fe8f17cda4f1153c2b057f65c311e8173
parent 30a6c3f7c710c394617158dda85925d755999185
optimize float complex FFT

1. The FFT algorithm is changed to improve performance: the bit-reversal step is removed and radix-8 support is added.
2. In testing, the optimized FFT showed the best performance, so the old implementations are removed.

The performance results are as follows:

toolchain: gcc 4.8 at -O2
The execution time of the omx FFT is the baseline (100%). A lower ratio means better performance.
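
As a minimal sketch of how such a ratio can be computed, assuming both timings are measured in the same unit (the helper name `relative_time_percent` is hypothetical and does not come from the Ne10 test code):

```c
#include <assert.h>

/* Hypothetical helper: execution time of a candidate relative to a
 * baseline, in percent. Values below 100 mean the candidate is faster. */
static double relative_time_percent(double candidate_time, double baseline_time)
{
    assert(baseline_time > 0.0);
    return 100.0 * candidate_time / baseline_time;
}
```

With this convention the baseline always scores 100, which matches the omx row in the tables.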

PandaBoard (Cortex-A9), by FFT size:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.27%|89.57%|85.63%|85.79%|87.89%|87.91%|83.51%|97.08%|92.68%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Nexus 10 (Cortex-A15), by FFT size:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.88%|98.43%|89.46%|101.0%|99.24%|103.2%|93.80%|105.1%|97.44%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Change-Id: I363ee1602f08532e566d3a5a4f3d7a99972a1283
20 files changed:
inc/NE10_dsp.h
modules/CMakeLists.txt
modules/dsp/NE10_cfft.c [deleted file]
modules/dsp/NE10_cfft.neon.s [deleted file]
modules/dsp/NE10_cfft_init.c [deleted file]
modules/dsp/NE10_fft.h
modules/dsp/NE10_fft_float32.c
modules/dsp/NE10_fft_float32.neon.c
modules/dsp/NE10_fft_float32.neon.s
modules/dsp/NE10_init_dsp.c
modules/dsp/NE10_rfft.c [deleted file]
modules/dsp/NE10_rfft.neon.c [deleted file]
modules/dsp/NE10_rfft_init.c [deleted file]
modules/dsp/test/test_main.c
modules/dsp/test/test_suite_cfft.c [deleted file]
modules/dsp/test/test_suite_fft_float32.c
modules/dsp/test/test_suite_fft_int16.c
modules/dsp/test/test_suite_fft_int32.c
modules/dsp/test/test_suite_rfft.c [deleted file]
test/CMakeLists.txt