Phil.Wang [Tue, 21 Jul 2015 01:14:28 +0000 (09:14 +0800)]
Enable NEON optimized image rotate for ARM v7-A
NEON optimized image rotate was disabled by accident. This patch
enables it again. Fix issue #115.
Change-Id: I4aa977de8534557d98a707cc9504aac94805d571
Phil Wang [Wed, 1 Jul 2015 12:27:15 +0000 (13:27 +0100)]
Improve data layout in RFFT float32
Transpose part of twiddles in RFFT float32 to avoid memory access by
a large stride.
Change-Id: I5e05c5baed523183ed3948371e6b1fbffc916e9b
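A minimal sketch of why transposing part of a twiddle table helps, assuming a row-major table that the kernel walks column by column. The helper demo_transpose_f32 and its layout are hypothetical and are not Ne10's RFFT code; the sketch only shows that after the transpose the formerly strided reads become contiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Ne10.
 * src is rows x cols in row-major order; dst receives the cols x rows
 * transpose. Reading column c of src advances by cols floats per element,
 * while reading row c of dst touches consecutive floats. */
void demo_transpose_f32(const float *src, float *dst, size_t rows, size_t cols)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t r = 0; r < rows; ++r) {
        for (size_t c = 0; c < cols; ++c) {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}
```

The transpose is paid once at setup time, so the per-transform loop can then load twiddles with unit stride.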
Phil Wang [Fri, 10 Jul 2015 11:43:27 +0000 (12:43 +0100)]
Enable neon code on ARM32 for physics and dsp
Change-Id: Id7159b952cdd317c25ad97cc7388244407b75da9
Phil Wang [Thu, 9 Jul 2015 15:20:37 +0000 (16:20 +0100)]
main function in tests returns 0 if all tests pass
Change-Id: I2bb30f4107174f043809828981484d61e903c30d
Phil Wang [Tue, 23 Jun 2015 16:56:23 +0000 (17:56 +0100)]
Fix test_suite_fft_int32.c failures under armv7
For 4-point and 8-point FFTs, we fall back to VFP because
they are too short for NEON optimizations.
Change-Id: I72cc37e73d9a62459a0ee9d85b36144aad43606f
Yang Zhang [Tue, 23 Jun 2015 07:47:39 +0000 (15:47 +0800)]
Limit the range of input to avoid the failed tests in math module
Change-Id: If2001c421a8d879a375e41304b607be75790278b
Phil Wang [Fri, 12 Jun 2015 09:26:56 +0000 (10:26 +0100)]
fix NE10_INLINE_ASM_OPT for RFFT
Change-Id: I43a2480a42a6166bff6a72afab022b232d085fea
Phil Wang [Tue, 26 May 2015 17:45:17 +0000 (18:45 +0100)]
Bugfix for #111 and #112
- Link stdc++ for static build
- Switch to intrinsic on iOS
Change-Id: I42e741058241f2ea133d98ba67c7a747d3712bec
Phil.Wang [Tue, 12 May 2015 08:22:32 +0000 (16:22 +0800)]
Bug fix: prevent overflow by moving scaling in front of butterfly
Change-Id: I02353d982f0af0920751ced276232f9552b5a675
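A minimal sketch of the idea of scaling before the butterfly, assuming a radix-2 butterfly on int32 values and a scale factor of 1/2 per stage. The function demo_butterfly_prescaled is hypothetical and is not the Ne10 kernel. Halving the inputs first keeps both the sum and the difference inside the int32 range, whereas adding first and scaling afterwards can overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical radix-2 butterfly, not Ne10 code.
 * Each input is halved before the add/subtract, so
 *   |a/2| <= 2^30 and |b/2| <= 2^30,
 * which bounds both results by the int32 range. Division is used
 * instead of >> so that negative inputs are handled portably. */
void demo_butterfly_prescaled(int32_t a, int32_t b, int32_t *sum, int32_t *diff)
{
    assert(sum != NULL && diff != NULL);
    const int32_t half_a = a / 2;
    const int32_t half_b = b / 2;
    *sum = half_a + half_b;
    *diff = half_a - half_b;
}
```

The price of scaling first is one bit of precision per stage, which is the usual trade-off in scaled fixed-point FFTs.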
Phil.Wang [Tue, 12 May 2015 08:20:04 +0000 (16:20 +0800)]
Bug fix: Call appropriate NEON butterfly functions in
NE10_fft_int32.neon.c
Change-Id: I25040b18cc73bb3171e1be5ee6429f7dd6fc073d
Phil Wang [Tue, 7 Apr 2015 08:18:39 +0000 (16:18 +0800)]
Merge pull request #108 from viswanath-puttagunta/rfcv1_rc3_armv8
aarch64: Enable build for aarch64 GNU Linux
Phil.Wang [Fri, 3 Apr 2015 02:28:58 +0000 (10:28 +0800)]
DSP: Fix bug in scaled non-power-of-2 fixed-point FFT
Without this patch, the non-power-of-2 fixed-point FFT does scale
the output even when the scaled_flag is set to 1.
Change-Id: I0e373f1d2ff110b64905902acf7529eafc2cba19
Viswanath Puttagunta [Wed, 18 Mar 2015 15:29:17 +0000 (10:29 -0500)]
aarch64: Enable build for aarch64 GNU Linux
Enables build configuration to be able to build Ne10
for aarch64 GNU Linux.
Eg: build instructions:
mkdir build && cd build
export NE10_LINUX_TARGET_ARCH=aarch64
cmake -DCMAKE_TOOLCHAIN_FILE=../GNUlinux_config.cmake ..
make -j6
Note: By default, NE10_LINUX_TARGET_ARCH will be set to
"armv7"
Also adds -funsafe-math-optimizations for ARMv7 GNU targets
Signed-off-by: Viswanath Puttagunta <viswanath.puttagunta@linaro.org>
Phil.Wang [Wed, 25 Mar 2015 09:35:05 +0000 (17:35 +0800)]
Provide NE10_UNROLL_LEVEL to control FFT algorithm
* Macro NE10_UNROLL_LEVEL should be one of the following:
0: use fewer registers, default value on AArch32
1: use more registers, default value on AArch64
* Fix typos in doc/BuildingNe10.txt and CMakeLists.txt
* Update doc/BuildingNe10.txt for environment variable
NE10_(LINUX/ANDROID/IOS)_TARGET_ARCH
Change-Id: I991d11eedf3553d542406a25f89bd1157fe5c5ff
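A minimal sketch of how a compile-time switch such as NE10_UNROLL_LEVEL could select between a low-register and a high-register code path. Only the macro name and its per-architecture defaults come from the commit message above; the way the default is derived from __aarch64__ and the function demo_sum_f32 are assumptions for illustration, not Ne10's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed defaulting scheme: 1 on AArch64, 0 elsewhere. */
#ifndef NE10_UNROLL_LEVEL
#  if defined(__aarch64__)
#    define NE10_UNROLL_LEVEL 1
#  else
#    define NE10_UNROLL_LEVEL 0
#  endif
#endif

#if NE10_UNROLL_LEVEL != 0 && NE10_UNROLL_LEVEL != 1
#  error "NE10_UNROLL_LEVEL must be 0 or 1"
#endif

/* Hypothetical kernel: sums n floats. */
float demo_sum_f32(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
#if NE10_UNROLL_LEVEL == 0
    /* One accumulator: fewer live registers. */
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
#else
    /* Four accumulators: more live registers, more parallelism. */
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += x[i];
        acc1 += x[i + 1];
        acc2 += x[i + 2];
        acc3 += x[i + 3];
    }
    for (; i < n; ++i) {
        acc0 += x[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
#endif
}
```

Passing -DNE10_UNROLL_LEVEL=0 or -DNE10_UNROLL_LEVEL=1 on the compiler command line would override the assumed default in this sketch.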
Phil.Wang [Thu, 19 Mar 2015 08:26:57 +0000 (16:26 +0800)]
DSP: Fine tune fixed-point non-power-of-2 CFFT for GCC 4.9.0
Now Ne10 provides inline assembly for fixed-point non-power-of-2
Complex FFT, on AArch64. For GCC 4.9.0, user can define
NE10_INLINE_ASM_OPT to enable this optimization.
Below is performance data with or without this
optimization.
Cortex-A53 AArch64 mode (1.69GHz)
GCC 4.9.0, with -O2
Android-21, AArch64
C2C FFT time cost in ms, Ne10 without/with NE10_INLINE_ASM_OPT:
|size|Without| With|
|  60|   4.67| 2.91|
| 120|   5.79| 3.37|
| 240|   5.57| 3.30|
| 480|   6.76| 3.87|
| 970|   6.89| 4.00|
Change-Id: I4f74217b026d8ef6ab6af4e1fb178ce4f1398b50
Phil.Wang [Thu, 19 Mar 2015 06:37:53 +0000 (14:37 +0800)]
DSP: Fine tune floating-point non-power-of-2 CFFT for GCC 4.9.0
Now Ne10 provides inline assembly for floating-point non-power-of-2
Complex FFT, on AArch64. For GCC 4.9.0, user can define
NE10_INLINE_ASM_OPT to enable this optimization.
Below is performance data with or without this optimization.
Cortex-A53 AArch64 mode (1.69GHz)
GCC 4.9.0, with -O2
Android-21, AArch64
C2C FFT time cost in ms; the last column is pffft time / Ne10 time with the optimization:
|size|Ne10 without|Ne10 with|pffft|pffft/Ne10|
|  60|        4.74|     3.56|   NA|        NA|
| 120|        6.09|     3.57|   NA|        NA|
| 240|        6.00|     3.52|   NA|        NA|
| 480|        7.02|     3.96| 4.48|      113%|
| 960|        7.05|     4.03| 4.50|      112%|
Change-Id: Ib241141c4ef962b665ef47f70abfcf4515fe80ba
Phil.Wang [Fri, 13 Mar 2015 10:10:26 +0000 (18:10 +0800)]
Tuning floating point RFFT for GCC 4.9.0
Cortex-A53 (1.69GHz)
GCC 4.9.0, with -O2
Android-L, AArch64
| R2C FFT Time Cost in ms|
|size|Ne10|pffft|pffft/Ne10|
| 32| 118| 254| 215%|
| 64| 126| 198| 157%|
| 128| 109| 177| 162%|
| 256| 126| 154| 122%|
| 512| 122| 165| 135%|
|1024| 143| 162| 113%|
|2048| 153| 188| 123%|
The larger the last column is, the faster Ne10 is.
Change-Id: I8921fc83afb8c7307ffd0fcb2a4bb1a88b349339
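A small sketch of how the last column of the table above can be reproduced, assuming it is the pffft time divided by the Ne10 time, rounded to the nearest percent. The helper demo_ratio_percent is hypothetical; the input values in the usage note are taken from the table.

```c
#include <assert.h>

/* Hypothetical helper: pffft time as a percentage of Ne10 time,
 * rounded to the nearest integer. Values above 100 mean Ne10 is faster. */
int demo_ratio_percent(int pffft_ms, int ne10_ms)
{
    assert(pffft_ms >= 0 && ne10_ms > 0);
    return (pffft_ms * 100 + ne10_ms / 2) / ne10_ms;
}
```

For the size-32 row, demo_ratio_percent(254, 118) gives 215, and for the size-2048 row, demo_ratio_percent(188, 153) gives 123, matching the table.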
Phil.Wang [Thu, 26 Feb 2015 06:30:12 +0000 (14:30 +0800)]
Update doc/BuildingNe10.txt for Linux
Users now need to set NE10_LINUX_TARGET_ARCH to compile Ne10 natively
on Linux.
Change-Id: I56ce3187f39b07ea5408051d606be7d8dd62f9af
Phil.Wang [Wed, 25 Feb 2015 09:38:31 +0000 (17:38 +0800)]
Remove dependency on libstdc++
Set linker language to C.
libstdc++ is no longer needed when a 3rd party library is linked to Ne10.
Change-Id: Iaa7cfcd069044c896360a35f04e9d2abb5b6b6eb
Phil.Wang [Wed, 25 Feb 2015 08:01:34 +0000 (16:01 +0800)]
Fix function declaration in NE10_init.h
Specify the parameter type(s) of ne10_init and ne10_HasNEON in their
declarations so that the declarations are prototypes as well.
Change-Id: Ie445d5bd7b04f8bf56718dd87435b16b7f554aac
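A minimal sketch of the difference this makes in C before C23, where a declaration with empty parentheses does not specify the parameters and so is not a prototype. The names demo_init and demo_has_neon are hypothetical stand-ins; the real signatures and return values of ne10_init and ne10_HasNEON are not shown in this log.

```c
#include <assert.h>

/* Old style, shown only as a comment:
 *     int demo_init();
 * Before C23 this says nothing about the parameters, so a call such as
 * demo_init(1, 2) would not be checked against a parameter list. */

/* Prototype style: (void) states that the function takes no arguments. */
int demo_has_neon(void);
int demo_init(void);

/* Placeholder: always reports "no NEON" in this sketch. */
int demo_has_neon(void)
{
    return 0;
}

/* Placeholder: returns 0 for success in this sketch. */
int demo_init(void)
{
    const int has_neon = demo_has_neon();
    assert(has_neon == 0 || has_neon == 1); /* flag is expected to be boolean */
    return 0;
}
```

With the prototype form, a mistaken call with arguments is rejected at compile time instead of silently compiling.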
Phil.Wang [Thu, 18 Dec 2014 11:16:22 +0000 (19:16 +0800)]
Fix error in building for iOS #99
Change-Id: Iec2b8dda82c2a8c3b189988ddef15bcc3df36070
Zhongwei Yao [Tue, 17 Feb 2015 06:57:23 +0000 (14:57 +0800)]
Enable building for ARMv7 under GNU Linux. Fix issue #97.
Change-Id: Ib8f2bc4b1d5d3b7c6b06648a4699f76e9d243332
Phil.Wang [Mon, 2 Feb 2015 04:25:02 +0000 (12:25 +0800)]
Add flags is_forward/backward_scaled
Only the complex non-power-of-2 floating-point FFT is affected by these
flags. Ne10 used to scale the output of the backward FFT but not the output
of the forward FFT. Now Ne10 scales the output of the forward FFT if
is_forward_scaled is set to anything but zero. It is possible to disable
scaling the output of the backward FFT by setting is_backward_scaled to zero.
Change-Id: I947b1896af46dff16ff725136868028f54d5dad8
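A minimal sketch of the two flags, using a hand-written 4-point DFT so that no trigonometry is needed. The struct demo_scale_cfg, the function demo_dft4, and the 1/nfft scale factor are assumptions for illustration; only the flag names and their zero/non-zero meaning come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float re, im; } demo_cpx;

/* Hypothetical configuration carrying the two flags named above. */
typedef struct {
    int is_forward_scaled;
    int is_backward_scaled;
} demo_scale_cfg;

/* Hypothetical 4-point DFT. backward == 0 uses exp(-2*pi*i/4) = -i,
 * backward != 0 uses +i. The output is multiplied by 1/4 (assumed
 * 1/nfft scaling) only if the flag for that direction is non-zero. */
void demo_dft4(const demo_scale_cfg *cfg, const demo_cpx in[4],
               demo_cpx out[4], int backward)
{
    assert(cfg != NULL && in != NULL && out != NULL && in != out);
    const float s = backward ? 1.0f : -1.0f;
    const demo_cpx a = { in[0].re + in[2].re, in[0].im + in[2].im };
    const demo_cpx b = { in[0].re - in[2].re, in[0].im - in[2].im };
    const demo_cpx c = { in[1].re + in[3].re, in[1].im + in[3].im };
    const demo_cpx d = { in[1].re - in[3].re, in[1].im - in[3].im };
    const demo_cpx rd = { -s * d.im, s * d.re }; /* (s * i) * d */

    out[0].re = a.re + c.re;  out[0].im = a.im + c.im;
    out[1].re = b.re + rd.re; out[1].im = b.im + rd.im;
    out[2].re = a.re - c.re;  out[2].im = a.im - c.im;
    out[3].re = b.re - rd.re; out[3].im = b.im - rd.im;

    const int scaled = backward ? cfg->is_backward_scaled
                                : cfg->is_forward_scaled;
    if (scaled != 0) {
        for (size_t k = 0; k < 4; ++k) {
            out[k].re *= 0.25f;
            out[k].im *= 0.25f;
        }
    }
}
```

With is_forward_scaled = 0 and is_backward_scaled = 1, which mirrors the previous behaviour described above, a forward transform followed by a backward transform returns the original input.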
Phil.Wang [Mon, 2 Feb 2015 04:46:32 +0000 (12:46 +0800)]
Add destroy functions for FFT.
Use the following functions instead of NE10_FREE to make sure memory
allocated by Ne10 is also freed by Ne10:
- Float/Fixed point Complex FFT Destroy functions:
- ne10_fft_destroy_c2c_float32
- ne10_fft_destroy_c2c_int32
- ne10_fft_destroy_c2c_int16
- Float/Fixed point Real2Complex FFT Destroy functions:
- ne10_fft_destroy_r2c_float32
- ne10_fft_destroy_r2c_int32
- ne10_fft_destroy_r2c_int16
Change-Id: Ia2eacb5faa8501cf3a8d7705ef732db400fc7013
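A minimal sketch of the alloc/destroy pairing, assuming a configuration object that owns internal buffers. The type demo_fft_state and the functions demo_fft_alloc and demo_fft_destroy are hypothetical and do not reflect Ne10's internal layout; they only show why the library, which knows how the memory was obtained, should also release it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical configuration object with an internal buffer. */
typedef struct {
    int nfft;
    float *twiddles;
} demo_fft_state;

/* Allocates the state and its buffer; returns NULL on failure. */
demo_fft_state *demo_fft_alloc(int nfft)
{
    if (nfft <= 0) {
        return NULL;
    }
    demo_fft_state *st = malloc(sizeof *st);
    if (st == NULL) {
        return NULL;
    }
    st->twiddles = calloc((size_t)nfft * 2u, sizeof *st->twiddles);
    if (st->twiddles == NULL) {
        free(st);
        return NULL;
    }
    st->nfft = nfft;
    return st;
}

/* Releases everything demo_fft_alloc obtained; NULL is accepted. */
void demo_fft_destroy(demo_fft_state *st)
{
    if (st == NULL) {
        return;
    }
    assert(st->nfft > 0);
    free(st->twiddles);
    free(st);
}
```

A caller that frees only the outer pointer would leak the internal buffer in this sketch, which is the kind of mismatch a dedicated destroy function avoids.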
Phil.Wang [Sun, 1 Feb 2015 09:45:58 +0000 (17:45 +0800)]
Update API for fixed-point non-power-of-2 FFT
original:
ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32 (ne10_int32_t nfft);
now:
ne10_fft_cfg_int32_t (*ne10_fft_alloc_c2c_int32) (ne10_int32_t nfft);
ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_c (ne10_int32_t nfft);
ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_neon (ne10_int32_t nfft);
Use _c version for ne10_fft_c2c_1d_int32_c, and use _neon version for
ne10_fft_c2c_1d_int32_neon.
ne10_fft_alloc_c2c_int32 is now a function pointer. Function
ne10_init_dsp points it at the right function according to
runtime conditions.
Test suite is updated accordingly.
Change-Id: I15cbfe75a29995696335c9f6939e03cd2d5fe57a
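A minimal sketch of the function-pointer dispatch pattern described above, assuming an init routine that receives a flag saying whether NEON is available. All names here (demo_plan, demo_plan_alloc and its _c and _neon variants, demo_init_dsp) are hypothetical stand-ins and not the Ne10 API.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_BACKEND_C, DEMO_BACKEND_NEON } demo_backend;

typedef struct {
    int nfft;
    demo_backend backend;
} demo_plan;

/* Plain C variant. */
demo_plan demo_plan_alloc_c(int nfft)
{
    demo_plan p = { nfft, DEMO_BACKEND_C };
    return p;
}

/* Stand-in for a NEON variant; it only tags the plan in this sketch. */
demo_plan demo_plan_alloc_neon(int nfft)
{
    demo_plan p = { nfft, DEMO_BACKEND_NEON };
    return p;
}

/* Public entry point is a pointer that the init routine fills in. */
demo_plan (*demo_plan_alloc)(int nfft) = NULL;

/* Selects the implementation once, based on a runtime condition. */
void demo_init_dsp(int has_neon)
{
    demo_plan_alloc = has_neon ? demo_plan_alloc_neon : demo_plan_alloc_c;
    assert(demo_plan_alloc != NULL);
}
```

Callers keep writing demo_plan_alloc(nfft) as before; the only new requirement in this sketch is that demo_init_dsp runs first, otherwise the pointer is still NULL.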
Phil.Wang [Fri, 23 Jan 2015 07:19:44 +0000 (15:19 +0800)]
Enable Fixed-Point Non-power-of-2 FFT.
For Cortex-A53 (AArch64)
LLVM 3.5, -O2
Time: in ms
SNR : in dB
| |Forward |Backward |
| |Size| Time|SNR| Time|SNR|
| C| 8| 26| 92| 26| 93|
| C| 16| 51| 90| 51| 90|
| C| 32| 130| 89| 132| 91|
| C| 60| 452| 81| 469| 83|
| C| 64| 304| 88| 305| 88|
| C| 120| 1070| 82| 1149| 82|
| C| 128| 727| 88| 735| 89|
| C| 240| 2197| 81| 2312| 82|
| C| 256| 1659| 88| 1659| 88|
| C| 480| 5127| 82| 5520| 82|
| C| 512| 3819| 88| 3855| 88|
| C| 900|11621| 80|12190| 81|
| C| 960|10640| 82|11246| 82|
|NEON| 8| 10| 93| 10| 95|
|NEON| 16| 18| 97| 18| 97|
|NEON| 32| 54| 88| 55| 89|
|NEON| 60| 163| 88| 169| 88|
|NEON| 64| 133| 85| 133| 86|
|NEON| 120| 346| 88| 358| 90|
|NEON| 128| 263| 87| 264| 87|
|NEON| 240| 668| 89| 704| 88|
|NEON| 256| 635| 85| 635| 85|
|NEON| 480| 1526| 89| 1595| 89|
|NEON| 512| 1300| 86| 1299| 87|
|NEON| 900| 3207| 88| 3372| 89|
|NEON| 960| 3107| 89| 3394| 89|
Change-Id: I256d1e4eff40ff20e19fe941f3222a3f7d2944f6
Phil.Wang [Wed, 21 Jan 2015 06:20:48 +0000 (14:20 +0800)]
DSP: Generic Fixed-Point complex FFT is enabled.
Currently, only the C version is available.
Passes the conformance test. Since there is no NEON optimized Fixed-Point FFT,
the performance test is run but not compared.
For A53, with GCC
Complex Fixed-Point FFT
Time in ms, SNR in dB
|Direction| |nfft|time|SNR|
| forward| C| 16| 66| 82|
| forward| C| 32| 108| 82|
| forward| C| 60| 366| 80|
| forward| C| 64| 360| 82|
| forward| C| 120| 876| 80|
| forward| C| 128| 641| 82|
| forward| C| 240|1715| 80|
| forward| C| 256|1876| 82|
| forward| C| 480|4075| 79|
| forward| C| 512|3484| 82|
| forward| C| 900|9266| 79|
| forward| C| 960|8224| 79|
| backward| C| 16| 65| 82|
| backward| C| 32| 106| 82|
| backward| C| 60| 361| 80|
| backward| C| 64| 360| 82|
| backward| C| 120| 874| 80|
| backward| C| 128| 639| 83|
| backward| C| 240|1731| 79|
| backward| C| 256|1892| 82|
| backward| C| 480|4113| 79|
| backward| C| 512|3509| 82|
| backward| C| 900|9285| 79|
| backward| C| 960|8384| 79|
Change-Id: Icb4575a22c51e2f684a1ebbe0464a782e912f769
Phil.Wang [Wed, 21 Jan 2015 10:35:04 +0000 (18:35 +0800)]
Disable full path in doxygen.cfg
Change-Id: I7c4f19c8055f6b16d2680cc8d8bbdc149b330860
Zhongwei Yao [Tue, 20 Jan 2015 10:41:25 +0000 (18:41 +0800)]
Extend copyright year to 2015.
Change-Id: I77b17e0c07713d03e2c74300b9427e0c15d0f963
Zhongwei Yao [Tue, 13 Jan 2015 11:50:58 +0000 (19:50 +0800)]
Disable following functions which have not been optimized for aarch64.
Following functions have been disabled for aarch64 target:
-. 6 dsp functions:
ne10_fir_float_neon
ne10_fir_decimate_float_neon
ne10_fir_interpolate_float_neon
ne10_fir_lattice_float_neon
ne10_fir_sparse_float_neon
ne10_iir_lattice_float_neon
-. 1 imgproc function:
ne10_img_rotate_rgba_neon
-. 3 physics functions:
ne10_physics_compute_aabb_vec2f_neon
ne10_physics_relative_v_vec2f_neon
ne10_physics_apply_impulse_vec2f_neon
-. all functions in math module
Smoke tests pass under both aarch64 and armv7.
Change-Id: Ied135a4ac83260564081180f4bfef6eda5042aa0
Zhongwei Yao [Tue, 13 Jan 2015 09:16:35 +0000 (17:16 +0800)]
Enable aarch64 in cmake build script for android platform.
Change-Id: Iae0fa7b89f8ed8ee6cc932055c0602e3191e85b6
Haruki Hasegawa [Sun, 9 Nov 2014 07:57:45 +0000 (16:57 +0900)]
Fix wrong definition of DIV_TW81 and DIV_TW81N values
Phil.Wang [Wed, 24 Dec 2014 08:55:57 +0000 (16:55 +0800)]
DSP: remove unit tests in RFFT
Change-Id: If535d81f1dbef710d6f7ac807e0313408345ac4a
Phil.Wang [Wed, 24 Dec 2014 08:17:30 +0000 (16:17 +0800)]
Change include Syntax
Change-Id: I2872d8bba91beaad26d2b2b3aa56e63b10186083
Phil.Wang [Wed, 24 Dec 2014 09:21:47 +0000 (17:21 +0800)]
FFT: Fix false assert at the last loop
Change-Id: I9954b32f4dafd4eed63db98a24cd47cabe9c1b3d
Phil.Wang [Tue, 23 Dec 2014 09:20:57 +0000 (17:20 +0800)]
DSP/FFT: declare ne10_radix8_r2c(c2r)_c
Change-Id: Ia01574388bed413954d66a6927c8d3d969f3d329
Zhongwei Yao [Thu, 18 Dec 2014 08:19:55 +0000 (16:19 +0800)]
Update the copyright and license.
Change-Id: I224478247aa7201fb8bf150dfd14e7b92b2e5141
Phil.Wang [Wed, 17 Dec 2014 03:52:11 +0000 (11:52 +0800)]
NE10/FFT/complex-non-power-of-2 NEON
ARM 64-bit (Cortex-A57)
complex forward float LLVM 3.5
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON|NEON|
60| 129| 113| NA| 44|
120| 148| 127| NA| 49|
240| 151| 128| 55| 47|
480| 169| 142| 60| 55|
960| 183| 149| 65| 58|
1920| 193| 167| 71| 66|
3840| 217| 175| 76| 71|
SNR > 100dB
ARM 64-bit (Cortex-A53)
complex forward float LLVM 3.5
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON|NEON|
60| 295| 311| NA| 72|
120| 368| 375| NA| 79|
240| 345| 342| 104| 77|
480| 415| 407| 115| 87|
960| 406| 378| 121| 95|
1920| 476| 441| 138| 113|
3840| 497| 424| 161| 126|
SNR > 100dB
ARM 32-bit (Cortex-A9)
complex forward float LLVM 3.5
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON|NEON|
60| 224| 211| NA| 98|
120| 265| 245| NA| 104|
240| 262| 240| 130| 106|
480| 302| 274| 150| 122|
960| 305| 271| 162| 153|
1920| 369| 356| 230| 206|
3840| 415| 440| 282| 239|
SNR > 100dB
Change-Id: If9418041b01eed49dbdc8d6a18dd03f2c5684da8
Phil.Wang [Wed, 17 Dec 2014 03:42:40 +0000 (11:42 +0800)]
NE10/FFT/backward-complex-non-power-of-2 C
ARM 32-bit (Cortex-A9)
complex backward unscaled float GCC 4.9
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON| C|
60| 195| 172| NA| 144|
120| 234| 200| NA| 173|
240| 231| 203| 148| 175|
480| 267| 231| 176| 215|
ARM 64-bit (Cortex-A57)
complex backward unscaled float GCC 4.9
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON| C|
60| 125| 89| NA| 87|
120| 141| 104| NA| 103|
240| 146| 106| 52| 109|
480| 163| 120| 58| 127|
SNR > 100dB for all 2^M*3^N*5^K
Change-Id: Ie4bb27d053213bfbf2dbdd0020f9fda5db4312f9
Phil.Wang [Mon, 17 Nov 2014 08:57:44 +0000 (16:57 +0800)]
NE10/FFT/forward-complex-non-power-of-2 C
ARM 32-bit (Cortex-A9)
complex forward float GCC 4.9
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON| C|
60| 213| 200| NA| 131|
120| 252| 225| NA| 164|
240| 245| 245| 163| 171|
480| 268| 262| 187| 195|
SNR > 100dB, for all following cases
|size|time|
NE10| 16| 69|
NE10| 32| 80|
NE10| 60| 131|
NE10| 64| 106|
NE10| 120| 164|
NE10| 128| 110|
NE10| 240| 171|
NE10| 256| 148|
NE10| 480| 195|
NE10| 512| 144|
NE10|1024| 214|
ARM 64-bit (Cortex-A57)
complex forward float LLVM 3.5
Time in ms |
|kiss|opus|pffft|NE10|
| C| C| NEON| C|
60| 131| 111| NA| 95|
120| 154| 121| NA| 121|
240| 153| 129| 55| 121|
480| 168| 144| 61| 146|
SNR > 100dB, for all following cases
|size|time|
NE10| 16| 28|
NE10| 32| 31|
NE10| 60| 95|
NE10| 64| 43|
NE10| 120| 121|
NE10| 128| 46|
NE10| 240| 121|
NE10| 256| 56|
NE10| 480| 146|
NE10| 512| 61|
NE10|1024| 73|
Change-Id: I9fab61e9e47279d848815699e928ab4abead5635
Phil.Wang [Thu, 13 Nov 2014 06:33:09 +0000 (14:33 +0800)]
NE10/DSP/FFT formatting
Change-Id: Ie96cc6945844d0eb23b04027cce7fb03c7163f1d
Matthew DuPuy [Sun, 9 Nov 2014 17:03:15 +0000 (09:03 -0800)]
Contributing
Also see: https://github.com/projectNe10/Ne10/wiki/Contribution-of-code-F.A.Q.
Matthew DuPuy [Sun, 9 Nov 2014 16:49:15 +0000 (08:49 -0800)]
3 clause BSD only
Matthew DuPuy [Sun, 9 Nov 2014 16:43:01 +0000 (08:43 -0800)]
Created project contribution instructions
Anyone wishing to submit a pull request must sign a contribution license agreement (CLA) with ARM first.
Matthew DuPuy [Sun, 9 Nov 2014 16:40:01 +0000 (08:40 -0800)]
Update email contacts to match landing page
Kévin PETIT [Mon, 29 Sep 2014 12:34:36 +0000 (13:34 +0100)]
Add line wrapping to the documentation files
Change-Id: I81ca1c1980d250b2df020c7b584f91cc0595fdc2
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
Phil.Wang [Mon, 22 Sep 2014 04:26:46 +0000 (12:26 +0800)]
NE10/DSP/RFFT: optimise RFFT for armv8
intrinsic, LLVM 3.5, -O2
on a57, juno, android
size| time in ms | boost |
| NE10 | pffft | pffft/NE10|
|R2C|C2R|R2C|C2R*| R2C| C2R|
32|145|185|279| 319|1.92x|1.72x|
64|175|200|239| 279|1.36x|1.39x|
128|166|185|237| 262|1.42x|1.41x|
256|197|208|232| 256|1.17x|1.23x|
512|208|216|254| 270|1.22x|1.25x|
1024|241|244|260| 278|1.07x|1.14x|
2048|258|263|332| 322|1.28x|1.22x|
4096|303|304|388| 353|1.28x|1.16x|
8192|339|334|424| 426|1.25x|1.27x|
intrinsic, GCC 4.9, -O2
on a57, juno, android
size| time in ms | boost |
| NE10 | pffft | pffft/NE10|
|R2C|C2R|R2C|C2R*| R2C| C2R|
32|174|181|328| 410|1.88x|2.26x|
64|214|216|270| 338|1.26x|1.56x|
128|210|197|259| 310|1.23x|1.57x|
256|232|223|243| 283|1.04x|1.26x|
512|250|222|263| 307|1.04x|1.38x|
1024|274|251|272| 304|1.00x|1.20x|
2048|288|277|314| 353|1.08x|1.27x|
4096|333|303|349| 379|1.04x|1.25x|
8192|370|342|424| 452|1.14x|1.31x|
* Ne10 supports scaling the output of the backward RFFT,
while pffft doesn't. To normalize the benchmark,
a scale operation was added to the end of each
call to pffft.
* pffft C2R FFT costs 410ms when size==32 and 338ms when
size==64. This is because the former loops more times
than the latter, so it does not mean pffft costs
more time for short input.
intrinsic, GCC 4.9, -O2
on a53, juno, android
size| time in ms | boost |
| NE10 | pffft | pffft/NE10|
| R2C| R2C| R2C|
32| 347| 607| 1.74x|
64| 389| 489| 1.25x|
128| 334| 484| 1.44x|
256| 401| 456| 1.13x|
512| 380| 502| 1.32x|
1024| 460| 512| 1.11x|
2048| 481| 593| 1.23x|
4096| 605| 709| 1.17x|
8192| 704| 891| 1.26x|
Change-Id: Ide0b974620ae8d06cfa862769004b2110abaaeff
Yang Zhang [Tue, 22 Jul 2014 09:34:15 +0000 (17:34 +0800)]
add v8 assembly for float fft.
This assembly file is suitable for gcc only. If you want to use llvm, please use the files with suffix .neonintrinsic.c
Change-Id: I27528b2ebbf079db0e4f1c6ecf7828baa0fbaf0a
Yang Zhang [Thu, 3 Jul 2014 03:29:03 +0000 (11:29 +0800)]
fix the issues
- update NE10_FREE macro define
when a pointer is freed, it should be set to NULL.
- remove needless backslash
- modify the data type of address for ARM v8
- modify IFFT scaling for int32/int16
Change-Id: I62dc5803ba106e13fb9c91dba6ac3099f3fb5737
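A minimal sketch of a free macro that also clears the pointer, as the first item above describes. DEMO_FREE and demo_release are hypothetical; the actual definition of NE10_FREE is not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical macro: frees p and sets it to NULL so that a later
 * use or a second free of the same variable is harmless. The
 * do/while (0) wrapper makes it behave like a single statement. */
#define DEMO_FREE(p) \
    do {             \
        free(p);     \
        (p) = NULL;  \
    } while (0)

/* Releases a buffer through a pointer-to-pointer and clears it. */
void demo_release(float **buf)
{
    assert(buf != NULL);
    DEMO_FREE(*buf);
}
```

Because free(NULL) is a no-op in standard C, calling demo_release twice on the same variable is safe in this sketch.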
Zhou (Joe) Yu [Wed, 2 Jul 2014 10:22:40 +0000 (11:22 +0100)]
Merge "make sure the address of buffer 64-bit alignment"
Yang Zhang [Wed, 2 Jul 2014 06:41:57 +0000 (14:41 +0800)]
make sure the address of buffer 64-bit alignment
For the FFT NEON implementation, 64-bit-aligned input/output/twiddle addresses improve the
speed of data loads/stores. If an address isn't 64-bit aligned, a bus error occurs.
Change-Id: I201307de980eef544025bcb498b0093a272e2936
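A minimal sketch of rounding an address up to a 64-bit (8-byte) boundary, assuming the caller has over-allocated by at least 7 bytes. The helpers demo_align_up and demo_is_aligned are hypothetical and are not how Ne10 allocates its buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ALIGNMENT ((uintptr_t)8) /* 64 bits */

/* Returns non-zero if p sits on an 8-byte boundary. */
int demo_is_aligned(const void *p)
{
    return ((uintptr_t)p % DEMO_ALIGNMENT) == 0;
}

/* Rounds p up to the next 8-byte boundary (p itself if already aligned).
 * The caller must own at least DEMO_ALIGNMENT - 1 spare bytes after p. */
void *demo_align_up(void *p)
{
    assert(p != NULL);
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + (DEMO_ALIGNMENT - 1)) & ~(DEMO_ALIGNMENT - 1);
    return (void *)addr;
}
```

If the aligned pointer is handed out to users, the original pointer has to be kept somewhere so that it, and not the adjusted one, is passed to free.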
Yang Zhang [Tue, 1 Jul 2014 04:39:23 +0000 (12:39 +0800)]
fix bug in fir
When the input length isn't a multiple of 4, the filter output is wrong. This patch fixes the issue.
Change-Id: I212d86fef3beb9aaeb3292d98719665ba521daee
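A minimal sketch of the usual remedy for this class of bug, assuming a kernel that produces four outputs per iteration and therefore needs a scalar tail loop for the remaining 1 to 3 outputs. The function demo_fir_f32 is hypothetical plain C and is not the Ne10 FIR kernel or its actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical FIR: out[i] = sum over k < taps of coeffs[k] * in[i + k].
 * in must hold n_out + taps - 1 samples. */
void demo_fir_f32(const float *in, const float *coeffs, float *out,
                  size_t n_out, size_t taps)
{
    assert(in != NULL && coeffs != NULL && out != NULL);
    size_t i = 0;

    /* Main loop: four outputs per iteration. */
    for (; i + 4 <= n_out; i += 4) {
        float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            const float c = coeffs[k];
            acc0 += c * in[i + k];
            acc1 += c * in[i + 1 + k];
            acc2 += c * in[i + 2 + k];
            acc3 += c * in[i + 3 + k];
        }
        out[i] = acc0;
        out[i + 1] = acc1;
        out[i + 2] = acc2;
        out[i + 3] = acc3;
    }

    /* Tail loop: the outputs left over when n_out is not a multiple of 4. */
    for (; i < n_out; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            acc += coeffs[k] * in[i + k];
        }
        out[i] = acc;
    }
}
```

Testing with lengths such as 5, 6 and 7 exercises the tail loop, which lengths that are multiples of 4 never reach.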
Zhou (Joe) Yu [Tue, 1 Jul 2014 06:04:39 +0000 (07:04 +0100)]
Merge "fix bug in fir"
Yang Zhang [Mon, 30 Jun 2014 07:45:47 +0000 (15:45 +0800)]
add temp buffer allocation and scaling by 2 for rfft
- add temp buffer allocation in init function
- add scaling by 2 for C, NEON assembly and intrinsic version
Change-Id: I7e46f327f43664e06700089f4d38f0d868d44f3e
Yang Zhang [Thu, 19 Jun 2014 09:21:01 +0000 (17:21 +0800)]
update the FFT implementation
- add scaling by nfft in IFFT
- add temp buffer to protect the source data
- change the interface for passing temp buffer
- add intrinsic version of FFT
- indent the code
Change-Id: I35f46e60bb88070127eb59281ddbd3a72f6b8e7d
Matthew DuPuy [Wed, 18 Jun 2014 05:03:46 +0000 (22:03 -0700)]
ignore *.so and *.prefs
Matthew DuPuy [Wed, 18 Jun 2014 04:58:26 +0000 (21:58 -0700)]
Minor semantic update to demo
Matthew DuPuy [Wed, 18 Jun 2014 03:31:30 +0000 (20:31 -0700)]
cfft and rfft test modules removed
NE10_TEST_DSP could no longer build with cfft and rfft test modules
removed.
Yang Zhang [Fri, 13 Jun 2014 06:59:51 +0000 (14:59 +0800)]
optimize int32/int16 complex FFT
The performance result is as follows:
toolchain: gcc 4.8 at -O2
The omx FFT's execution time is the baseline. The lower the ratio, the better the performance.
int32 FFT
A9:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |73.24%|99.95%|95.78%|96.04%|97.97%|97.57%|99.51%|97.87%|98.12%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
A15:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |84.89%|98.62%|89.33%|100.7%|99.28%|103.9%|101.7%|105.1%|96.67%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
int16 FFT
A9:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |109.2%|97.81%|100.3%|97.20%|101.3%|99.01%|103.4%|103.5%|94.67%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
A15:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |112.6%|95.78%|104.3%|101.7%|112.3%|111.5%|102.3%|105.1%|99.78%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
Change-Id: I7290ae5f9abfd3d04f8ca501f5ecbff452973d4b
Yang Zhang [Fri, 30 May 2014 11:36:23 +0000 (19:36 +0800)]
optimize float complex FFT
1. To optimize FFT, the algorithm is changed. Bit reversal is removed and radix 8 is added.
2. After testing, the optimized FFT shows the best performance, so the old implementations are removed.
The performance result is as follows:
toolchain: gcc 4.8 at -O2
The omx FFT's execution time is the baseline. The lower the ratio, the better the performance.
panda board A9:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |84.27%|89.57%|85.63%|85.79%|87.89%|87.91%|83.51%|97.08%|92.68%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
nexus10 A15:
| |16 |32 |64 |128 |256 |512 |1024 |2048 |4096 |
|Ne10 |84.88%|98.43%|89.46%|101.0%|99.24%|103.2%|93.80%|105.1%|97.44%|
|omx |100% |100% |100% |100% |100% |100% |100% |100% |100% |
Change-Id: I363ee1602f08532e566d3a5a4f3d7a99972a1283
Zhongwei Yao [Thu, 15 May 2014 06:20:15 +0000 (14:20 +0800)]
extend copyright year and add the extend script.
Change-Id: Ice948d88f2dc6122b562bf479aea53c060181345
Zhongwei Yao [Mon, 2 Dec 2013 05:27:20 +0000 (13:27 +0800)]
add box filter to image processing module.
Matthew DuPuy [Tue, 22 Apr 2014 21:11:17 +0000 (14:11 -0700)]
Create Acknowledgements.md
Matthew DuPuy [Wed, 12 Mar 2014 21:31:46 +0000 (14:31 -0700)]
Create LICENSE
Requested for clarification of license in code file headers.
Yang Zhang [Wed, 19 Feb 2014 09:59:14 +0000 (17:59 +0800)]
make changes as follows:
-optimize float/int32 fft for 4-4096
-add unscaled/scaled implementation for int32 fft
-add neon intrinsic version for float/int32 fft
Matthew DuPuy [Fri, 14 Feb 2014 20:49:48 +0000 (12:49 -0800)]
Call for use cases
Help us track Ne10 usage since downloads are not a great metric and didn't even exist in GitHub till 2014.
Yang Zhang [Fri, 24 Jan 2014 09:48:51 +0000 (17:48 +0800)]
make the following changes
-add 3 functions for collision detection
-add test cases and doc
-update the ReleaseNote
Zhongwei Yao [Mon, 16 Dec 2013 06:04:06 +0000 (14:04 +0800)]
add following changes:
- add MIN_IOS_VER configuration for iOS platform building
- add iOS support for the newly added FFT functions
- remove resize function's assembly version, only keep the intrinsics version
- refine the smoke test case for resize function
Yang Zhang [Mon, 9 Dec 2013 04:11:46 +0000 (12:11 +0800)]
add hard float support for Linux/Android
Yang Zhang [Wed, 20 Nov 2013 08:15:11 +0000 (16:15 +0800)]
add the new FFT features
- c2c FFT/IFFT(float/int32/int16) with 2^N size
- r2c FFT(float/int32/int16) with 2^N size
- c2r IFFT(float/int32/int16) with 2^N size
- test cases and doc
Zhongwei Yao [Thu, 24 Oct 2013 10:55:12 +0000 (18:55 +0800)]
Make following changes:
- update cmake config script and doc due to Xcode upgrade
- add compiler switch(-mthumb) for android and ubuntu to make sure generated code is thumb code.
- change the log output buffer size to get around the bug in sfft test.
Yang Zhang [Mon, 2 Sep 2013 10:06:45 +0000 (18:06 +0800)]
Make the following changes
- Add C implementations, doc and test cases for image resize/rotate
- fix the bug in NEON version of image resize
- add a header file for external macro definitions
Zhongwei Yao [Thu, 22 Aug 2013 06:21:58 +0000 (14:21 +0800)]
update build script to enable building under Mac OS for Android development.
Zhongwei Yao [Thu, 22 Aug 2013 06:20:18 +0000 (14:20 +0800)]
add benchmark result to Android and iOS demo.
Fang Bao [Wed, 26 Jun 2013 07:39:30 +0000 (15:39 +0800)]
Add NEON intrinsic implementation of resize.
NOTE:
gcc 4.7 is the minimum version recommended for compiling NEON intrinsics.
The intrinsic version will not be compiled because there is a NEON assembly version already.
To enable it, you should:
* Uncomment the line including NE10_resize.neon.c in modules/CMakeLists.txt
* Comment out the line including NE10_resize.neon.s in modules/CMakeLists.txt
Zhongwei Yao [Tue, 25 Jun 2013 10:21:36 +0000 (18:21 +0800)]
- fix a bug when running command line tests
- add a reasonable check when adding the platform demo macro in the CMake script
Zhongwei Yao [Mon, 17 Jun 2013 04:19:49 +0000 (12:19 +0800)]
add android demo.
Zhongwei Yao [Sat, 8 Jun 2013 03:04:29 +0000 (11:04 +0800)]
add iOS demo.
Zhongwei Yao [Mon, 3 Jun 2013 04:16:25 +0000 (12:16 +0800)]
add iOS support.
Zhongwei Yao [Fri, 24 May 2013 02:32:41 +0000 (19:32 -0700)]
Merge pull request #53 from projectNe10/dev/zhongwei/android_support_review
update building system to add android support.
Zhongwei Yao [Sun, 7 Apr 2013 03:31:48 +0000 (11:31 +0800)]
update building system to add android support.
yangzhang [Fri, 26 Apr 2013 11:59:28 +0000 (04:59 -0700)]
Merge pull request #52 from projectNe10/dev/yangzhang/imageRotate
add the NEON functions for image rotate
yang01 [Mon, 1 Apr 2013 02:42:37 +0000 (10:42 +0800)]
use ne10 style data types to replace common style
yang01 [Fri, 29 Mar 2013 08:51:05 +0000 (16:51 +0800)]
add image rotate function(NEON)
yangzhang [Mon, 18 Mar 2013 03:20:28 +0000 (20:20 -0700)]
Merge pull request #48 from projectNe10/dev/yangzhang/imageResizeZoomIn
fix the bug for image zoom in
yang01 [Mon, 18 Mar 2013 03:17:51 +0000 (11:17 +0800)]
fix the bug for image zoom in
yangzhang [Tue, 26 Feb 2013 03:26:28 +0000 (19:26 -0800)]
Merge pull request #47 from projectNe10/dev/yangzhang/imageResize
add image resize functions(NEON version)
yang [Tue, 26 Feb 2013 03:18:07 +0000 (11:18 +0800)]
add image resize functions(NEON version)
yangzhang [Wed, 9 Jan 2013 03:58:24 +0000 (19:58 -0800)]
Merge pull request #42 from projectNe10/dev/yangzhang/documents
build documentation with doxygen
yang [Tue, 8 Jan 2013 05:59:21 +0000 (13:59 +0800)]
change the URL for New BSD License
yang [Tue, 18 Dec 2012 10:47:14 +0000 (18:47 +0800)]
move information of USAGE.txt to documentations of doxygen
yang [Tue, 18 Dec 2012 08:33:59 +0000 (16:33 +0800)]
add notes and image for doxygen
yang [Wed, 12 Dec 2012 08:35:53 +0000 (16:35 +0800)]
build the framework of documents with doxygen
yang [Wed, 12 Dec 2012 02:49:09 +0000 (10:49 +0800)]
Merge branch 'master' of git://github.com/projectNe10/Ne10 into documents
yang [Wed, 12 Dec 2012 02:46:39 +0000 (10:46 +0800)]
add doxygen files
yangzhang [Tue, 11 Dec 2012 10:26:33 +0000 (02:26 -0800)]
Merge pull request #41 from projectNe10/dev/yangzhang/seatest
build test environment with seatest
yang [Tue, 11 Dec 2012 10:23:35 +0000 (18:23 +0800)]
Merge remote-tracking branch 'origin/master' into seatest
yang [Tue, 11 Dec 2012 10:18:18 +0000 (18:18 +0800)]
remove extra spaces
yangzhang [Tue, 11 Dec 2012 03:52:43 +0000 (19:52 -0800)]
Merge pull request #40 from projectNe10/dev/yangzhang/documents
add functions list to doc