platform/upstream/ne10.git
9 years agoMerge pull request #108 from viswanath-puttagunta/rfcv1_rc3_armv8
Phil Wang [Tue, 7 Apr 2015 08:18:39 +0000 (16:18 +0800)]
Merge pull request #108 from viswanath-puttagunta/rfcv1_rc3_armv8

aarch64: Enable build for aarch64 GNU Linux

9 years agoDSP: Fix bug in scaled non-power-of-2 fixed-point FFT
Phil.Wang [Fri, 3 Apr 2015 02:28:58 +0000 (10:28 +0800)]
DSP: Fix bug in scaled non-power-of-2 fixed-point FFT

    Without this patch, the non-power-of-2 fixed-point FFT does not scale
    the output even when the scaled_flag is set to 1.

Change-Id: I0e373f1d2ff110b64905902acf7529eafc2cba19

9 years agoaarch64: Enable build for aarch64 GNU Linux
Viswanath Puttagunta [Wed, 18 Mar 2015 15:29:17 +0000 (10:29 -0500)]
aarch64: Enable build for aarch64 GNU Linux

Enables build configuration to be able to build Ne10
for aarch64 GNU Linux.

Example build instructions:
mkdir build && cd build
export NE10_LINUX_TARGET_ARCH=aarch64
cmake -DCMAKE_TOOLCHAIN_FILE=../GNUlinux_config.cmake ..
make -j6

Note: By default, NE10_LINUX_TARGET_ARCH will be set to
"armv7"

Also adds -funsafe-math-optimizations for ARMv7 GNU targets

Signed-off-by: Viswanath Puttagunta <viswanath.puttagunta@linaro.org>
9 years agoProvide NE10_UNROLL_LEVEL to control FFT algorithm
Phil.Wang [Wed, 25 Mar 2015 09:35:05 +0000 (17:35 +0800)]
Provide NE10_UNROLL_LEVEL to control FFT algorithm

    * Macro NE10_UNROLL_LEVEL should be one of following:
        0: use less registers, default value on AArch32
        1: use more registers, default value on AArch64

    * Fix typos in doc/BuildingNe10.txt and CMakeLists.txt
    * Update doc/BuildingNe10.txt for environment variable
        NE10_(LINUX/ANDROID/IOS)_TARGET_ARCH
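
A minimal sketch of how a compile-time default like this could be selected. It assumes the compiler's predefined __aarch64__ macro is used to detect AArch64; the actual logic in Ne10's headers is not shown in this commit message, and ne10_sketch_unroll_level is a hypothetical helper added only for illustration.

```c
/* Sketch only: pick a default NE10_UNROLL_LEVEL when the user has not
 * defined one. Detecting AArch64 via __aarch64__ is an assumption. */
#ifndef NE10_UNROLL_LEVEL
#  if defined(__aarch64__)
#    define NE10_UNROLL_LEVEL 1   /* more registers: default on AArch64 */
#  else
#    define NE10_UNROLL_LEVEL 0   /* fewer registers: default on AArch32 */
#  endif
#endif

/* Reject any value other than the two documented levels. */
#if (NE10_UNROLL_LEVEL != 0) && (NE10_UNROLL_LEVEL != 1)
#  error "NE10_UNROLL_LEVEL must be 0 or 1"
#endif

/* Hypothetical helper so the chosen level can be inspected at run time. */
static int ne10_sketch_unroll_level(void)
{
    return NE10_UNROLL_LEVEL;
}
```

With a guard of this shape, defining the macro on the compiler command line (for example -DNE10_UNROLL_LEVEL=0) would override the per-architecture default.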

Change-Id: I991d11eedf3553d542406a25f89bd1157fe5c5ff

9 years agoDSP: Fine tune fixed-point non-power-of-2 CFFT for GCC 4.9.0
Phil.Wang [Thu, 19 Mar 2015 08:26:57 +0000 (16:26 +0800)]
DSP: Fine tune fixed-point non-power-of-2 CFFT for GCC 4.9.0

    Ne10 now provides inline assembly for the fixed-point non-power-of-2
    complex FFT on AArch64. With GCC 4.9.0, users can define
    NE10_INLINE_ASM_OPT to enable this optimization.
    The table below shows performance with and without this
    optimization.

    Cortex-A53 AArch64 mode (1.69GHz)
    GCC 4.9.0, with -O2
    Android-21, AArch64

    | C2C FFT Time Cost|
    |             in ms|
    |size|         Ne10|
    |    |Without| With|
    |  60|   4.67| 2.91|
    | 120|   5.79| 3.37|
    | 240|   5.57| 3.30|
    | 480|   6.76| 3.87|
    | 970|   6.89| 4.00|

Change-Id: I4f74217b026d8ef6ab6af4e1fb178ce4f1398b50

9 years agoDSP: Fine tune floating-point non-power-of-2 CFFT for GCC 4.9.0
Phil.Wang [Thu, 19 Mar 2015 06:37:53 +0000 (14:37 +0800)]
DSP: Fine tune floating-point non-power-of-2 CFFT for GCC 4.9.0

    Ne10 now provides inline assembly for the floating-point non-power-of-2
    complex FFT on AArch64. With GCC 4.9.0, users can define
    NE10_INLINE_ASM_OPT to enable this optimization.
    The table below shows performance with and without this optimization.

    Cortex-A53 AArch64 mode (1.69GHz)
    GCC 4.9.0, with -O2
    Android-21, AArch64

    |            C2C FFT Time Cost in ms|
    |size|         Ne10|pffft|pffft/Ne10|
    |    |Without| With|     |      With|
    |  60|   4.74| 3.56|   NA|        NA|
    | 120|   6.09| 3.57|   NA|        NA|
    | 240|   6.00| 3.52|   NA|        NA|
    | 480|   7.02| 3.96| 4.48|      113%|
    | 960|   7.05| 4.03| 4.50|      112%|

Change-Id: Ib241141c4ef962b665ef47f70abfcf4515fe80ba

9 years agoTuning floating point RFFT for GCC 4.9.0
Phil.Wang [Fri, 13 Mar 2015 10:10:26 +0000 (18:10 +0800)]
Tuning floating point RFFT for GCC 4.9.0

    Cortex-A53 (1.69GHz)
    GCC 4.9.0, with -O2
    Android-L, AArch64

    |   R2C FFT Time Cost in ms|
    |size|Ne10|pffft|pffft/Ne10|
    |  32| 118|  254|      215%|
    |  64| 126|  198|      157%|
    | 128| 109|  177|      162%|
    | 256| 126|  154|      122%|
    | 512| 122|  165|      135%|
    |1024| 143|  162|      113%|
    |2048| 153|  188|      123%|

    The larger the last column is, the faster Ne10 is.
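
As a small worked example, the last column can be reproduced from the two timing columns. The sketch below assumes the percentage is 100 * pffft / Ne10 rounded to the nearest integer, which matches the rows in the table; ratio_percent is a hypothetical helper name.

```c
/* Sketch: reproduce the pffft/Ne10 column as a rounded percentage.
 * Both arguments are times in ms and must be positive. */
static int ratio_percent(int pffft_ms, int ne10_ms)
{
    /* Integer round-to-nearest of 100 * pffft_ms / ne10_ms. */
    return (100 * pffft_ms + ne10_ms / 2) / ne10_ms;
}
```

For the size-32 row, ratio_percent(254, 118) gives 215. A value above 100 means pffft took longer than Ne10 for that size.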

Change-Id: I8921fc83afb8c7307ffd0fcb2a4bb1a88b349339

9 years agoUpdate doc/BuildingNe10.txt for Linux
Phil.Wang [Thu, 26 Feb 2015 06:30:12 +0000 (14:30 +0800)]
Update doc/BuildingNe10.txt for Linux

Users now need to set NE10_LINUX_TARGET_ARCH when compiling Ne10 natively
on Linux.

Change-Id: I56ce3187f39b07ea5408051d606be7d8dd62f9af

9 years agoRemove dependency on libstdc++
Phil.Wang [Wed, 25 Feb 2015 09:38:31 +0000 (17:38 +0800)]
Remove dependency on libstdc++

Set linker language to C.
libstdc++ is no longer needed when a 3rd party library is linked to Ne10.

Change-Id: Iaa7cfcd069044c896360a35f04e9d2abb5b6b6eb

9 years agoFix function declaration in NE10_init.h
Phil.Wang [Wed, 25 Feb 2015 08:01:34 +0000 (16:01 +0800)]
Fix function declaration in NE10_init.h

Specify type(s) of function ne10_init and function ne10_HasNEON in
declarations so that they are prototypes as well.
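
The difference between a plain declaration and a prototype can be sketched as follows. The function names are hypothetical and this is not the actual content of NE10_init.h; it only illustrates why writing (void) matters in C before C23.

```c
/* Sketch with hypothetical names; not the actual contents of NE10_init.h. */

/* Old style: empty parentheses declare a function without a prototype
 * (before C23), so the compiler cannot check the arguments of a call. */
int sketch_has_feature_old();

/* Prototype: (void) states that the function takes no arguments, so a
 * call such as sketch_has_feature(1) is rejected at compile time. */
int sketch_has_feature(void);

/* Definitions, so that the sketch links and can be called. */
int sketch_has_feature_old() { return 1; }
int sketch_has_feature(void) { return 1; }
```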

Change-Id: Ie445d5bd7b04f8bf56718dd87435b16b7f554aac

9 years agoFix error in building for iOS #99
Phil.Wang [Thu, 18 Dec 2014 11:16:22 +0000 (19:16 +0800)]
Fix error in building for iOS #99

Change-Id: Iec2b8dda82c2a8c3b189988ddef15bcc3df36070

9 years agoEnable building for ARMv7 under GNU Linux. Fix issue #97.
Zhongwei Yao [Tue, 17 Feb 2015 06:57:23 +0000 (14:57 +0800)]
Enable building for ARMv7 under GNU Linux. Fix issue #97.

Change-Id: Ib8f2bc4b1d5d3b7c6b06648a4699f76e9d243332

9 years agoAdd flags is_forward/backward_scaled
Phil.Wang [Mon, 2 Feb 2015 04:25:02 +0000 (12:25 +0800)]
Add flags is_forward/backward_scaled

Only the complex non-power-of-2 floating-point FFT is affected by these
flags. Ne10 used to scale the output of the backward FFT, but not the output
of the forward FFT. Now Ne10 scales the output of the forward FFT if
is_forward_scaled is set to anything but zero. It is possible to disable
scaling of the backward FFT output by setting is_backward_scaled to zero.
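
A minimal sketch of how two such flags could gate an output scaling step. Only the flag names come from this commit message; the struct, the helper, and the choice of 1/nfft as the scale factor are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical configuration; only the two flag names come from the
 * commit message. */
typedef struct {
    int nfft;
    int is_forward_scaled;   /* non-zero: scale forward output  */
    int is_backward_scaled;  /* zero: skip scaling of backward output */
} sketch_fft_cfg;

/* Apply the optional scaling to an interleaved complex buffer
 * (re, im, re, im, ...) after the transform itself has run.
 * The 1/nfft factor is an assumption of this sketch. */
static void sketch_apply_scaling(float *out, const sketch_fft_cfg *cfg,
                                 int is_inverse)
{
    int scaled = is_inverse ? cfg->is_backward_scaled
                            : cfg->is_forward_scaled;
    if (!scaled)
        return;
    for (size_t i = 0; i < 2u * (size_t) cfg->nfft; i++)
        out[i] /= (float) cfg->nfft;
}
```

With is_forward_scaled set to 0 and is_backward_scaled set to 1, this sketch reproduces the earlier behaviour described above: forward output untouched, backward output scaled.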

Change-Id: I947b1896af46dff16ff725136868028f54d5dad8

9 years agoAdd destroy functions for FFT.
Phil.Wang [Mon, 2 Feb 2015 04:46:32 +0000 (12:46 +0800)]
Add destroy functions for FFT.

Use the following functions instead of NE10_FREE so that memory
allocated by Ne10 is also freed by Ne10:
- Float/Fixed point Complex FFT Destroy functions:
    - ne10_fft_destroy_c2c_float32
    - ne10_fft_destroy_c2c_int32
    - ne10_fft_destroy_c2c_int16
- Float/Fixed point Real2Complex FFT Destroy functions:
    - ne10_fft_destroy_r2c_float32
    - ne10_fft_destroy_r2c_int32
    - ne10_fft_destroy_r2c_int16
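
The idea of pairing each allocation function with a matching destroy function can be sketched as below. All names and the struct layout are hypothetical; the real Ne10 configuration types and destroy functions are not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical configuration object owned by the library. */
typedef struct {
    int    nfft;
    float *twiddles;   /* owned by the configuration */
} sketch_cfg;

/* Library-side allocation: the caller never sees how memory is obtained. */
static sketch_cfg *sketch_cfg_alloc(int nfft)
{
    sketch_cfg *cfg = malloc(sizeof *cfg);
    if (cfg == NULL)
        return NULL;
    cfg->nfft = nfft;
    cfg->twiddles = calloc((size_t) nfft * 2, sizeof *cfg->twiddles);
    if (cfg->twiddles == NULL) {
        free(cfg);
        return NULL;
    }
    return cfg;
}

/* Library-side release: frees exactly what sketch_cfg_alloc allocated,
 * using the same allocator. Passing NULL is a no-op. */
static void sketch_cfg_destroy(sketch_cfg *cfg)
{
    if (cfg == NULL)
        return;
    free(cfg->twiddles);
    free(cfg);
}
```

Keeping both halves inside the library means the caller does not need to know which allocator was used or how many blocks make up a configuration.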

Change-Id: Ia2eacb5faa8501cf3a8d7705ef732db400fc7013

9 years agoUpdate API for fixed-point non-power-of-2 FFT
Phil.Wang [Sun, 1 Feb 2015 09:45:58 +0000 (17:45 +0800)]
Update API for fixed-point non-power-of-2 FFT

original:
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32 (ne10_int32_t nfft);
now:
    ne10_fft_cfg_int32_t (*ne10_fft_alloc_c2c_int32) (ne10_int32_t nfft);
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_c (ne10_int32_t nfft);
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_neon (ne10_int32_t nfft);

Use the _c version with ne10_fft_c2c_1d_int32_c, and the _neon version with
ne10_fft_c2c_1d_int32_neon.
ne10_fft_alloc_c2c_int32 is now a function pointer. Function
ne10_init_dsp sets it to point to the right function according to
the runtime condition.
The test suite is updated accordingly.
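
The dispatch pattern described here can be sketched as follows. Every name in the block is hypothetical, and the capability flag is passed in by hand; how Ne10 actually detects the runtime condition is not shown in this commit message.

```c
#include <stddef.h>

/* Hypothetical names throughout; this only mirrors the pattern. */
typedef enum { SKETCH_IMPL_C, SKETCH_IMPL_NEON } sketch_impl;
typedef sketch_impl (*sketch_alloc_fn)(int nfft);

/* Two implementations behind one public name. */
static sketch_impl sketch_fft_alloc_c(int nfft)
{
    (void) nfft;
    return SKETCH_IMPL_C;
}

static sketch_impl sketch_fft_alloc_neon(int nfft)
{
    (void) nfft;
    return SKETCH_IMPL_NEON;
}

/* Public entry point: a function pointer, unset until init runs. */
static sketch_alloc_fn sketch_fft_alloc = NULL;

/* Select the implementation once, based on a run-time capability flag. */
static void sketch_init_dsp(int has_neon)
{
    sketch_fft_alloc = has_neon ? sketch_fft_alloc_neon : sketch_fft_alloc_c;
}
```

Callers keep writing sketch_fft_alloc(nfft) as if it were an ordinary function, but the pointer must be initialised before the first call.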

Change-Id: I15cbfe75a29995696335c9f6939e03cd2d5fe57a

9 years agoEnable Fixed-Point Non-power-of-2 FFT.
Phil.Wang [Fri, 23 Jan 2015 07:19:44 +0000 (15:19 +0800)]
Enable Fixed-Point Non-power-of-2 FFT.

For Cortex-A53 (AArch64)
LLVM 3.5, -O2
Time: in ms
SNR : in dB

|         |Forward  |Backward |
|    |Size| Time|SNR| Time|SNR|
|   C|   8|   26| 92|   26| 93|
|   C|  16|   51| 90|   51| 90|
|   C|  32|  130| 89|  132| 91|
|   C|  60|  452| 81|  469| 83|
|   C|  64|  304| 88|  305| 88|
|   C| 120| 1070| 82| 1149| 82|
|   C| 128|  727| 88|  735| 89|
|   C| 240| 2197| 81| 2312| 82|
|   C| 256| 1659| 88| 1659| 88|
|   C| 480| 5127| 82| 5520| 82|
|   C| 512| 3819| 88| 3855| 88|
|   C| 900|11621| 80|12190| 81|
|   C| 960|10640| 82|11246| 82|
|NEON|   8|   10| 93|   10| 95|
|NEON|  16|   18| 97|   18| 97|
|NEON|  32|   54| 88|   55| 89|
|NEON|  60|  163| 88|  169| 88|
|NEON|  64|  133| 85|  133| 86|
|NEON| 120|  346| 88|  358| 90|
|NEON| 128|  263| 87|  264| 87|
|NEON| 240|  668| 89|  704| 88|
|NEON| 256|  635| 85|  635| 85|
|NEON| 480| 1526| 89| 1595| 89|
|NEON| 512| 1300| 86| 1299| 87|
|NEON| 900| 3207| 88| 3372| 89|
|NEON| 960| 3107| 89| 3394| 89|

Change-Id: I256d1e4eff40ff20e19fe941f3222a3f7d2944f6

9 years agoDSP: Generic Fixed-Point complex FFT is enabled.
Phil.Wang [Wed, 21 Jan 2015 06:20:48 +0000 (14:20 +0800)]
DSP: Generic Fixed-Point complex FFT is enabled.

Currently, only the C version is available.
It passes the conformance test. Since there is no NEON-optimized fixed-point FFT,
the performance test is run but not compared.

For A53, with GCC
Complex Fixed-Point FFT
Time in ms, SNR in dB

|Direction|    |nfft|time|SNR|
|  forward|   C|  16|  66| 82|
|  forward|   C|  32| 108| 82|
|  forward|   C|  60| 366| 80|
|  forward|   C|  64| 360| 82|
|  forward|   C| 120| 876| 80|
|  forward|   C| 128| 641| 82|
|  forward|   C| 240|1715| 80|
|  forward|   C| 256|1876| 82|
|  forward|   C| 480|4075| 79|
|  forward|   C| 512|3484| 82|
|  forward|   C| 900|9266| 79|
|  forward|   C| 960|8224| 79|
| backward|   C|  16|  65| 82|
| backward|   C|  32| 106| 82|
| backward|   C|  60| 361| 80|
| backward|   C|  64| 360| 82|
| backward|   C| 120| 874| 80|
| backward|   C| 128| 639| 83|
| backward|   C| 240|1731| 79|
| backward|   C| 256|1892| 82|
| backward|   C| 480|4113| 79|
| backward|   C| 512|3509| 82|
| backward|   C| 900|9285| 79|
| backward|   C| 960|8384| 79|

Change-Id: Icb4575a22c51e2f684a1ebbe0464a782e912f769

9 years agoDisable full path in doxygen.cfg
Phil.Wang [Wed, 21 Jan 2015 10:35:04 +0000 (18:35 +0800)]
Disable full path in doxygen.cfg

Change-Id: I7c4f19c8055f6b16d2680cc8d8bbdc149b330860

9 years agoExtend copyright year to 2015.
Zhongwei Yao [Tue, 20 Jan 2015 10:41:25 +0000 (18:41 +0800)]
Extend copyright year to 2015.

Change-Id: I77b17e0c07713d03e2c74300b9427e0c15d0f963

9 years agoDisable following functions which have not been optimized for aarch64.
Zhongwei Yao [Tue, 13 Jan 2015 11:50:58 +0000 (19:50 +0800)]
Disable following functions which have not been optimized for aarch64.

Following functions have been disabled for aarch64 target:
  -. 6 dsp functions:
     ne10_fir_float_neon
     ne10_fir_decimate_float_neon
     ne10_fir_interpolate_float_neon
     ne10_fir_lattice_float_neon
     ne10_fir_sparse_float_neon
     ne10_iir_lattice_float_neon
  -. 1 imgproc function:
     ne10_img_rotate_rgba_neon
  -. 3 physics functions:
     ne10_physics_compute_aabb_vec2f_neon
     ne10_physics_relative_v_vec2f_neon
     ne10_physics_apply_impulse_vec2f_neon
  -. all functions in math module

Smoke tests pass on both aarch64 and armv7.

Change-Id: Ied135a4ac83260564081180f4bfef6eda5042aa0

9 years agoEnable aarch64 in cmake build script for android platform.
Zhongwei Yao [Tue, 13 Jan 2015 09:16:35 +0000 (17:16 +0800)]
Enable aarch64 in cmake build script for android platform.

Change-Id: Iae0fa7b89f8ed8ee6cc932055c0602e3191e85b6

9 years agoFix wrong definition of DIV_TW81 and DIV_TW81N values
Haruki Hasegawa [Sun, 9 Nov 2014 07:57:45 +0000 (16:57 +0900)]
Fix wrong definition of DIV_TW81 and DIV_TW81N values

9 years agoDSP: remove unit tests in RFFT
Phil.Wang [Wed, 24 Dec 2014 08:55:57 +0000 (16:55 +0800)]
DSP: remove unit tests in RFFT

Change-Id: If535d81f1dbef710d6f7ac807e0313408345ac4a

9 years agoChange include Syntax
Phil.Wang [Wed, 24 Dec 2014 08:17:30 +0000 (16:17 +0800)]
Change include Syntax

Change-Id: I2872d8bba91beaad26d2b2b3aa56e63b10186083

9 years agoFFT: Fix false assert at the last loop
Phil.Wang [Wed, 24 Dec 2014 09:21:47 +0000 (17:21 +0800)]
FFT: Fix false assert at the last loop

Change-Id: I9954b32f4dafd4eed63db98a24cd47cabe9c1b3d

9 years agoDSP/FFT: declare ne10_radix8_r2c(c2r)_c
Phil.Wang [Tue, 23 Dec 2014 09:20:57 +0000 (17:20 +0800)]
DSP/FFT: declare ne10_radix8_r2c(c2r)_c

Change-Id: Ia01574388bed413954d66a6927c8d3d969f3d329

9 years agoUpdate the copyright and license.
Zhongwei Yao [Thu, 18 Dec 2014 08:19:55 +0000 (16:19 +0800)]
Update the copyright and license.

Change-Id: I224478247aa7201fb8bf150dfd14e7b92b2e5141

9 years agoNE10/FFT/complex-non-power-of-2 NEON v1.2.0
Phil.Wang [Wed, 17 Dec 2014 03:52:11 +0000 (11:52 +0800)]
NE10/FFT/complex-non-power-of-2 NEON

    ARM 64-bit (Cortex-A57)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 129| 113|   NA|  44|
     120| 148| 127|   NA|  49|
     240| 151| 128|   55|  47|
     480| 169| 142|   60|  55|
     960| 183| 149|   65|  58|
    1920| 193| 167|   71|  66|
    3840| 217| 175|   76|  71|
    SNR > 100dB

    ARM 64-bit (Cortex-A53)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 295| 311|   NA|  72|
     120| 368| 375|   NA|  79|
     240| 345| 342|  104|  77|
     480| 415| 407|  115|  87|
     960| 406| 378|  121|  95|
    1920| 476| 441|  138| 113|
    3840| 497| 424|  161| 126|
    SNR > 100dB

    ARM 32-bit (Cortex-A9)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 224| 211|   NA|  98|
     120| 265| 245|   NA| 104|
     240| 262| 240|  130| 106|
     480| 302| 274|  150| 122|
     960| 305| 271|  162| 153|
    1920| 369| 356|  230| 206|
    3840| 415| 440|  282| 239|
    SNR > 100dB

Change-Id: If9418041b01eed49dbdc8d6a18dd03f2c5684da8

9 years agoNE10/FFT/backward-complex-non-power-of-2 C
Phil.Wang [Wed, 17 Dec 2014 03:42:40 +0000 (11:42 +0800)]
NE10/FFT/backward-complex-non-power-of-2 C

ARM 32-bit (Cortex-A9)
complex backward unscaled float GCC 4.9
        Time in ms      |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 195| 172|   NA| 144|
120| 234| 200|   NA| 173|
240| 231| 203|  148| 175|
480| 267| 231|  176| 215|

ARM 64-bit (Cortex-A57)
complex backward unscaled float GCC 4.9
        Time in ms      |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 125|  89|   NA|  87|
120| 141| 104|   NA| 103|
240| 146| 106|   52| 109|
480| 163| 120|   58| 127|

SNR > 100dB for all 2^M*3^N*5^K
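
Sizes of the form 2^M*3^N*5^K can be recognised by dividing out the three prime factors. The helper below is a hypothetical sketch of that check, not a function from Ne10.

```c
/* Returns 1 if n == 2^M * 3^N * 5^K for non-negative integers M, N, K,
 * and 0 otherwise (including n == 0). */
static int sketch_is_235_size(unsigned n)
{
    static const unsigned factors[] = {2u, 3u, 5u};
    if (n == 0)
        return 0;
    /* Strip every factor of 2, then 3, then 5. */
    for (int i = 0; i < 3; i++)
        while (n % factors[i] == 0)
            n /= factors[i];
    /* Anything left over is a prime factor other than 2, 3 or 5. */
    return n == 1;
}
```

All sizes in the tables above (60, 120, 240, 480) pass this check, while a size such as 7 or 14 does not.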

Change-Id: Ie4bb27d053213bfbf2dbdd0020f9fda5db4312f9

9 years agoNE10/FFT/forward-complex-non-power-of-2 C
Phil.Wang [Mon, 17 Nov 2014 08:57:44 +0000 (16:57 +0800)]
NE10/FFT/forward-complex-non-power-of-2 C

ARM 32-bit (Cortex-A9)
complex forward float GCC 4.9
       Time in ms       |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 213| 200|   NA| 131|
120| 252| 225|   NA| 164|
240| 245| 245|  163| 171|
480| 268| 262|  187| 195|

SNR > 100dB, for all following cases
    |size|time|
NE10|  16|  69|
NE10|  32|  80|
NE10|  60| 131|
NE10|  64| 106|
NE10| 120| 164|
NE10| 128| 110|
NE10| 240| 171|
NE10| 256| 148|
NE10| 480| 195|
NE10| 512| 144|
NE10|1024| 214|

ARM 64-bit (Cortex-A57)
complex forward float LLVM 3.5
       Time in ms       |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 131| 111|   NA|  95|
120| 154| 121|   NA| 121|
240| 153| 129|   55| 121|
480| 168| 144|   61| 146|

SNR > 100dB, for all following cases
    |size|time|
NE10|  16|  28|
NE10|  32|  31|
NE10|  60|  95|
NE10|  64|  43|
NE10| 120| 121|
NE10| 128|  46|
NE10| 240| 121|
NE10| 256|  56|
NE10| 480| 146|
NE10| 512|  61|
NE10|1024|  73|

Change-Id: I9fab61e9e47279d848815699e928ab4abead5635

9 years agoNE10/DSP/FFT formatting
Phil.Wang [Thu, 13 Nov 2014 06:33:09 +0000 (14:33 +0800)]
NE10/DSP/FFT formatting

Change-Id: Ie96cc6945844d0eb23b04027cce7fb03c7163f1d

9 years agoContributing
Matthew DuPuy [Sun, 9 Nov 2014 17:03:15 +0000 (09:03 -0800)]
Contributing

Also see: https://github.com/projectNe10/Ne10/wiki/Contribution-of-code-F.A.Q.

9 years ago3 clause BSD only
Matthew DuPuy [Sun, 9 Nov 2014 16:49:15 +0000 (08:49 -0800)]
3 clause BSD only

9 years agoCreated project contribution instructions
Matthew DuPuy [Sun, 9 Nov 2014 16:43:01 +0000 (08:43 -0800)]
Created project contribution instructions

Anyone wishing to submit a pull request must sign a contribution license agreement (CLA) with ARM first.

9 years agoUpdate email contacts to match landing page
Matthew DuPuy [Sun, 9 Nov 2014 16:40:01 +0000 (08:40 -0800)]
Update email contacts to match landing page

9 years agoAdd line wrapping to the documentation files
Kévin PETIT [Mon, 29 Sep 2014 12:34:36 +0000 (13:34 +0100)]
Add line wrapping to the documentation files

Change-Id: I81ca1c1980d250b2df020c7b584f91cc0595fdc2
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
9 years agoNE10/DSP/RFFT: optimise RFFT for armv8
Phil.Wang [Mon, 22 Sep 2014 04:26:46 +0000 (12:26 +0800)]
NE10/DSP/RFFT: optimise RFFT for armv8

intrinsic, LLVM 3.5, -O2
on a57, juno, android
   size|   time in ms   |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |R2C|C2R|R2C|C2R*|  R2C|  C2R|
     32|145|185|279| 319|1.92x|1.72x|
     64|175|200|239| 279|1.36x|1.39x|
    128|166|185|237| 262|1.42x|1.41x|
    256|197|208|232| 256|1.17x|1.23x|
    512|208|216|254| 270|1.22x|1.25x|
   1024|241|244|260| 278|1.07x|1.14x|
   2048|258|263|332| 322|1.28x|1.22x|
   4096|303|304|388| 353|1.28x|1.16x|
   8192|339|334|424| 426|1.25x|1.27x|

intrinsic, GCC 4.9, -O2
on a57, juno, android
   size|    time in ms  |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |R2C|C2R|R2C|C2R*|  R2C|  C2R|
     32|174|181|328| 410|1.88x|2.26x|
     64|214|216|270| 338|1.26x|1.56x|
    128|210|197|259| 310|1.23x|1.57x|
    256|232|223|243| 283|1.04x|1.26x|
    512|250|222|263| 307|1.04x|1.38x|
   1024|274|251|272| 304|1.00x|1.20x|
   2048|288|277|314| 353|1.08x|1.27x|
   4096|333|303|349| 379|1.04x|1.25x|
   8192|370|342|424| 452|1.14x|1.31x|

* Ne10 supports scaling the output of the backward RFFT,
  while pffft does not. To normalize the benchmark,
  a scale operation was added to the end of each
  call to pffft.
* pffft C2R FFT takes 410ms when size==32 and 338ms when
  size==64. This is because the size-32 case loops more
  times than the size-64 case; it does not mean pffft is
  slower on short inputs.

intrinsic, GCC 4.9, -O2
on a53, juno, android
   size|    time in ms  |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |    R2C|     R2C|        R2C|
     32|    347|     607|      1.74x|
     64|    389|     489|      1.25x|
    128|    334|     484|      1.44x|
    256|    401|     456|      1.13x|
    512|    380|     502|      1.32x|
   1024|    460|     512|      1.11x|
   2048|    481|     593|      1.23x|
   4096|    605|     709|      1.17x|
   8192|    704|     891|      1.26x|

Change-Id: Ide0b974620ae8d06cfa862769004b2110abaaeff

10 years agoadd v8 assembly for float fft.
Yang Zhang [Tue, 22 Jul 2014 09:34:15 +0000 (17:34 +0800)]
add v8 assembly for float fft.

This assembly file works with GCC only. To build with LLVM, use the files with the .neonintrinsic.c suffix instead.

Change-Id: I27528b2ebbf079db0e4f1c6ecf7828baa0fbaf0a

10 years agofix the issues
Yang Zhang [Thu, 3 Jul 2014 03:29:03 +0000 (11:29 +0800)]
fix the issues

 - update the NE10_FREE macro definition:
   after a pointer is freed, it is set to NULL
 - remove needless backslashes
 - modify the data type of addresses for ARMv8
 - modify IFFT scaling for int32/int16
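
A free-and-reset macro of the kind described in the first item can be sketched as below. SKETCH_FREE is a hypothetical name; the real NE10_FREE definition is not shown in this commit message.

```c
#include <stdlib.h>

/* Hypothetical macro: free the block, then clear the pointer so that a
 * stale address cannot be reused or freed twice by accident.
 * The do/while (0) wrapper makes it behave like a single statement. */
#define SKETCH_FREE(p) \
    do {               \
        free(p);       \
        (p) = NULL;    \
    } while (0)
```

Because free(NULL) is a no-op in standard C, applying the macro twice to the same pointer variable is harmless.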

Change-Id: I62dc5803ba106e13fb9c91dba6ac3099f3fb5737

10 years agoMerge "make sure the address of buffer 64-bit alignment"
Zhou (Joe) Yu [Wed, 2 Jul 2014 10:22:40 +0000 (11:22 +0100)]
Merge "make sure the address of buffer 64-bit alignment"

10 years agomake sure the address of buffer 64-bit alignment v1.1.2
Yang Zhang [Wed, 2 Jul 2014 06:41:57 +0000 (14:41 +0800)]
make sure the address of buffer 64-bit alignment

For the FFT NEON implementation, 64-bit-aligned input, output, and twiddle addresses improve the
speed of data loads and stores. If an address is not 64-bit aligned, a bus error occurs.
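
Checking and enforcing 64-bit (8-byte) alignment of a buffer address can be sketched as follows. The helper names are hypothetical and this is not how Ne10 itself allocates its buffers; it only shows the address arithmetic involved.

```c
#include <stdint.h>

#define SKETCH_ALIGNMENT 8u   /* 64 bits */

/* Round an address up to the next multiple of SKETCH_ALIGNMENT.
 * The caller must over-allocate by at least SKETCH_ALIGNMENT - 1 bytes. */
static void *sketch_align_up(void *p)
{
    uintptr_t addr = (uintptr_t) p;
    addr = (addr + (SKETCH_ALIGNMENT - 1u))
           & ~(uintptr_t) (SKETCH_ALIGNMENT - 1u);
    return (void *) addr;
}

/* Returns 1 if the address is a multiple of SKETCH_ALIGNMENT. */
static int sketch_is_aligned(const void *p)
{
    return ((uintptr_t) p % SKETCH_ALIGNMENT) == 0;
}
```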

Change-Id: I201307de980eef544025bcb498b0093a272e2936

10 years agofix bug in fir v1.1.1
Yang Zhang [Tue, 1 Jul 2014 04:39:23 +0000 (12:39 +0800)]
fix bug in fir

When the input length isn't a multiple of 4, the filter output is wrong. This patch fixes the issue.
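
The commit message does not say what caused the wrong output. One common source of this class of bug in code that processes four samples per iteration is a missing or incorrect remainder loop, and the sketch below illustrates that pattern under that assumption. All names are hypothetical and the code is plain C, not the NEON implementation.

```c
#include <stddef.h>

/* Reference FIR: y[n] = sum_k h[k] * x[n - k], with x[m] = 0 for m < 0. */
static float sketch_fir_sample(const float *x, size_t n,
                               const float *h, size_t taps)
{
    float acc = 0.0f;
    for (size_t k = 0; k < taps && k <= n; k++)
        acc += h[k] * x[n - k];
    return acc;
}

/* Blocked FIR: four outputs per iteration, then a tail loop for the
 * remaining len % 4 samples. Omitting the tail loop would leave those
 * outputs unwritten when len is not a multiple of 4. */
static void sketch_fir_blocked(const float *x, float *y, size_t len,
                               const float *h, size_t taps)
{
    size_t n = 0;
    for (; n + 4 <= len; n += 4) {
        y[n]     = sketch_fir_sample(x, n,     h, taps);
        y[n + 1] = sketch_fir_sample(x, n + 1, h, taps);
        y[n + 2] = sketch_fir_sample(x, n + 2, h, taps);
        y[n + 3] = sketch_fir_sample(x, n + 3, h, taps);
    }
    for (; n < len; n++)            /* tail: 0 to 3 leftover samples */
        y[n] = sketch_fir_sample(x, n, h, taps);
}
```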

Change-Id: I212d86fef3beb9aaeb3292d98719665ba521daee

10 years agoMerge "fix bug in fir"
Zhou (Joe) Yu [Tue, 1 Jul 2014 06:04:39 +0000 (07:04 +0100)]
Merge "fix bug in fir"

10 years agofix bug in fir
Yang Zhang [Tue, 1 Jul 2014 04:39:23 +0000 (12:39 +0800)]
fix bug in fir

When the input length isn't a multiple of 4, the filter output is wrong. This patch fixes the issue.

Change-Id: I212d86fef3beb9aaeb3292d98719665ba521daee

10 years agoadd temp buffer allocation and scaling by 2 for rfft
Yang Zhang [Mon, 30 Jun 2014 07:45:47 +0000 (15:45 +0800)]
add temp buffer allocation and scaling by 2 for rfft
 - add temp buffer allocation in init function
 - add scaling by 2 for C, NEON assembly and intrinsic version

Change-Id: I7e46f327f43664e06700089f4d38f0d868d44f3e

10 years ago update the FFT implementation v1.1.0
Yang Zhang [Thu, 19 Jun 2014 09:21:01 +0000 (17:21 +0800)]
 update the FFT implementation

 - add scaling by nfft in IFFT
 - add temp buffer to protect the source data
 - change the interface for passing temp buffer
 - add intrinsic version of FFT
 - indent the code

Change-Id: I35f46e60bb88070127eb59281ddbd3a72f6b8e7d

10 years agoignore *.so and *.prefs
Matthew DuPuy [Wed, 18 Jun 2014 05:03:46 +0000 (22:03 -0700)]
ignore *.so and *.prefs

10 years agoMinor semantic update to demo
Matthew DuPuy [Wed, 18 Jun 2014 04:58:26 +0000 (21:58 -0700)]
Minor semantic update to demo

10 years agocfft and rfft test modules removed
Matthew DuPuy [Wed, 18 Jun 2014 03:31:30 +0000 (20:31 -0700)]
cfft and rfft test modules removed

NE10_TEST_DSP could no longer build with cfft and rfft test modules
removed.

10 years ago optimize int32/int16 complex FFT v1.0.2
Yang Zhang [Fri, 13 Jun 2014 06:59:51 +0000 (14:59 +0800)]
optimize int32/int16 complex FFT

    The performance result is as follows:

    toolchain: gcc 4.8 at -O2
    The execution time of the omx FFT is the baseline (100%). The lower the ratio, the better the performance.

    int32 FFT
    A9:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |73.24%|99.95%|95.78%|96.04%|97.97%|97.57%|99.51%|97.87%|98.12%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    A15:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |84.89%|98.62%|89.33%|100.7%|99.28%|103.9%|101.7%|105.1%|96.67%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    int16 FFT
    A9:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |109.2%|97.81%|100.3%|97.20%|101.3%|99.01%|103.4%|103.5%|94.67%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    A15:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |112.6%|95.78%|104.3%|101.7%|112.3%|111.5%|102.3%|105.1%|99.78%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Change-Id: I7290ae5f9abfd3d04f8ca501f5ecbff452973d4b

10 years agooptimize float complex FFT v1.0.1
Yang Zhang [Fri, 30 May 2014 11:36:23 +0000 (19:36 +0800)]
optimize float complex FFT

1. To optimize the FFT, the algorithm was changed: bit reversal is removed and radix-8 is added.
2. In testing, the optimized FFT showed the best performance, so the old implementations were removed.

The performance result is as follows:

toolchain: gcc 4.8 at -O2
The execution time of the omx FFT is the baseline (100%). The lower the ratio, the better the performance.

panda board A9:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.27%|89.57%|85.63%|85.79%|87.89%|87.91%|83.51%|97.08%|92.68%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

nexus10 A15:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.88%|98.43%|89.46%|101.0%|99.24%|103.2%|93.80%|105.1%|97.44%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Change-Id: I363ee1602f08532e566d3a5a4f3d7a99972a1283

10 years agoextend copyright year and add the extend script.
Zhongwei Yao [Thu, 15 May 2014 06:20:15 +0000 (14:20 +0800)]
extend copyright year and add the extend script.

Change-Id: Ice948d88f2dc6122b562bf479aea53c060181345

10 years agoadd box filter to image processing module.
Zhongwei Yao [Mon, 2 Dec 2013 05:27:20 +0000 (13:27 +0800)]
add box filter to image processing module.

10 years agoCreate Acknowledgements.md
Matthew DuPuy [Tue, 22 Apr 2014 21:11:17 +0000 (14:11 -0700)]
Create Acknowledgements.md

10 years agoCreate LICENSE
Matthew DuPuy [Wed, 12 Mar 2014 21:31:46 +0000 (14:31 -0700)]
Create LICENSE

Requested for clarification of license in code file headers.

10 years agomake changes as follows:
Yang Zhang [Wed, 19 Feb 2014 09:59:14 +0000 (17:59 +0800)]
make changes as follows:
-optimize float/int32 fft for 4-4096
-add unscaled/scaled implementation for int32 fft
-add neon intrinsic version for float/int32 fft

10 years agoCall for use cases
Matthew DuPuy [Fri, 14 Feb 2014 20:49:48 +0000 (12:49 -0800)]
Call for use cases

Help us track Ne10 usage since downloads are not a great metric and didn't even exist in GitHub till 2014.

10 years agomake the following changes
Yang Zhang [Fri, 24 Jan 2014 09:48:51 +0000 (17:48 +0800)]
make the following changes
  -add 3 functions for collision detection
  -add test cases and doc
  -update the ReleaseNote

10 years agoadd following changes: v1.0.0
Zhongwei Yao [Mon, 16 Dec 2013 06:04:06 +0000 (14:04 +0800)]
add following changes:
    - add MIN_IOS_VER configuration for iOS platform building
    - add new added FFT functions' iOS support
    - remove resize function's assembly version, only keep the intrinsics version
    - refine the smoke test case for resize function

10 years agoadd hard float support for Linux/Android
Yang Zhang [Mon, 9 Dec 2013 04:11:46 +0000 (12:11 +0800)]
add hard float support for Linux/Android

10 years agoadd the new FFT features
Yang Zhang [Wed, 20 Nov 2013 08:15:11 +0000 (16:15 +0800)]
add the new FFT features
 - c2c FFT/IFFT(float/int32/int16) with 2^N size
 - r2c FFT(float/int32/int16) with 2^N size
 - c2r IFFT(float/int32/int16) with 2^N size
 - test cases and doc

10 years agoMake following changes:
Zhongwei Yao [Thu, 24 Oct 2013 10:55:12 +0000 (18:55 +0800)]
Make following changes:
     - update cmake config script and doc due to Xcode upgrade
     - add compiler switch(-mthumb) for android and ubuntu to make sure generated code is thumb code.
     - change the log output buffer size to get around the bug in sfft test.

10 years agoMake the following changes
Yang Zhang [Mon, 2 Sep 2013 10:06:45 +0000 (18:06 +0800)]
Make the following changes
 - Add C implementations, doc and test cases for image resize/rotate
 - fix the bug in NEON version of image resize
 - add a header file for external macro definitions

10 years agoupdate build script to enable building under Mac OS for Android development.
Zhongwei Yao [Thu, 22 Aug 2013 06:21:58 +0000 (14:21 +0800)]
update build script to enable building under Mac OS for Android development.

10 years agoadd benchmark result to Android and iOS demo.
Zhongwei Yao [Thu, 22 Aug 2013 06:20:18 +0000 (14:20 +0800)]
add benchmark result to Android and iOS demo.

11 years agoAdd NEON intrinsic implementation of resize.
Fang Bao [Wed, 26 Jun 2013 07:39:30 +0000 (15:39 +0800)]
Add NEON intrinsic implementation of resize.

NOTE:
GCC 4.7 is the minimum version recommended for compiling NEON intrinsics.
The intrinsic version is not compiled by default because a NEON assembly version already exists.
To enable it:
  * Uncomment the line including NE10_resize.neon.c in modules/CMakeLists.txt
  * Comment out the line including NE10_resize.neon.s in modules/CMakeLists.txt

11 years ago- fix a bug when run command line tests
Zhongwei Yao [Tue, 25 Jun 2013 10:21:36 +0000 (18:21 +0800)]
- fix a bug when run command line tests
- add a reasonable check when add platform demo macro in Cmake script

11 years agoadd android demo.
Zhongwei Yao [Mon, 17 Jun 2013 04:19:49 +0000 (12:19 +0800)]
add android demo.

11 years agoadd iOS demo.
Zhongwei Yao [Sat, 8 Jun 2013 03:04:29 +0000 (11:04 +0800)]
add iOS demo.

11 years agoadd iOS support.
Zhongwei Yao [Mon, 3 Jun 2013 04:16:25 +0000 (12:16 +0800)]
add iOS support.

11 years agoMerge pull request #53 from projectNe10/dev/zhongwei/android_support_review
Zhongwei Yao [Fri, 24 May 2013 02:32:41 +0000 (19:32 -0700)]
Merge pull request #53 from projectNe10/dev/zhongwei/android_support_review

update building system to add android support.

11 years agoupdate building system to add android support.
Zhongwei Yao [Sun, 7 Apr 2013 03:31:48 +0000 (11:31 +0800)]
update building system to add android support.

11 years agoMerge pull request #52 from projectNe10/dev/yangzhang/imageRotate
yangzhang [Fri, 26 Apr 2013 11:59:28 +0000 (04:59 -0700)]
Merge pull request #52 from projectNe10/dev/yangzhang/imageRotate

add the NEON functions for image rotate

11 years agouse ne10 style data types to replace common style
yang01 [Mon, 1 Apr 2013 02:42:37 +0000 (10:42 +0800)]
use ne10 style data types to replace common style

11 years agoadd image rotate function(NEON)
yang01 [Fri, 29 Mar 2013 08:51:05 +0000 (16:51 +0800)]
add image rotate function(NEON)

11 years agoMerge pull request #48 from projectNe10/dev/yangzhang/imageResizeZoomIn
yangzhang [Mon, 18 Mar 2013 03:20:28 +0000 (20:20 -0700)]
Merge pull request #48 from projectNe10/dev/yangzhang/imageResizeZoomIn

fix the bug for image zoom in

11 years agofix the bug for image zoom in
yang01 [Mon, 18 Mar 2013 03:17:51 +0000 (11:17 +0800)]
fix the bug for image zoom in

11 years agoMerge pull request #47 from projectNe10/dev/yangzhang/imageResize
yangzhang [Tue, 26 Feb 2013 03:26:28 +0000 (19:26 -0800)]
Merge pull request #47 from projectNe10/dev/yangzhang/imageResize

add image resize functions(NEON version)

11 years agoadd image resize functions(NEON version)
yang [Tue, 26 Feb 2013 03:18:07 +0000 (11:18 +0800)]
add image resize functions(NEON version)

11 years agoMerge pull request #42 from projectNe10/dev/yangzhang/documents
yangzhang [Wed, 9 Jan 2013 03:58:24 +0000 (19:58 -0800)]
Merge pull request #42 from projectNe10/dev/yangzhang/documents

build documentation with doxygen

11 years agochange the URL for New BSD License
yang [Tue, 8 Jan 2013 05:59:21 +0000 (13:59 +0800)]
change the URL for New BSD License

11 years agomove information of USAGE.txt to documentations of doxygen
yang [Tue, 18 Dec 2012 10:47:14 +0000 (18:47 +0800)]
move information of USAGE.txt to documentations of doxygen

11 years agoadd notes and image for doxygen
yang [Tue, 18 Dec 2012 08:33:59 +0000 (16:33 +0800)]
add notes and image for doxygen

11 years agobuild the frame work of documents with doxygen
yang [Wed, 12 Dec 2012 08:35:53 +0000 (16:35 +0800)]
build the frame work of documents with doxygen

11 years agoMerge branch 'master' of git://github.com/projectNe10/Ne10 into documents
yang [Wed, 12 Dec 2012 02:49:09 +0000 (10:49 +0800)]
Merge branch 'master' of git://github.com/projectNe10/Ne10 into documents

11 years agoadd doxygen files
yang [Wed, 12 Dec 2012 02:46:39 +0000 (10:46 +0800)]
add doxygen files

11 years agoMerge pull request #41 from projectNe10/dev/yangzhang/seatest
yangzhang [Tue, 11 Dec 2012 10:26:33 +0000 (02:26 -0800)]
Merge pull request #41 from projectNe10/dev/yangzhang/seatest

build test environment with seatest

11 years agoMerge remote-tracking branch 'origin/master' into seatest
yang [Tue, 11 Dec 2012 10:23:35 +0000 (18:23 +0800)]
Merge remote-tracking branch 'origin/master' into seatest

11 years agoremove extra spaces
yang [Tue, 11 Dec 2012 10:18:18 +0000 (18:18 +0800)]
remove extra spaces

11 years agoMerge pull request #40 from projectNe10/dev/yangzhang/documents
yangzhang [Tue, 11 Dec 2012 03:52:43 +0000 (19:52 -0800)]
Merge pull request #40 from projectNe10/dev/yangzhang/documents

add functions list to doc

11 years agoadd license for seatest files
yang [Tue, 11 Dec 2012 03:20:36 +0000 (11:20 +0800)]
add license for seatest files

11 years agoadd functions list to doc
yang [Mon, 10 Dec 2012 03:44:31 +0000 (11:44 +0800)]
add functions list to doc

11 years agoindent the source code
yang [Fri, 7 Dec 2012 05:53:33 +0000 (13:53 +0800)]
indent the source code

11 years agobuild test environment with seatest
yang [Fri, 30 Nov 2012 09:05:45 +0000 (17:05 +0800)]
build test environment with seatest

11 years agoMerge pull request #39 from projectNe10/dev/yangzhang/finetune_dsp
yangzhang [Fri, 23 Nov 2012 05:51:59 +0000 (21:51 -0800)]
Merge pull request #39 from projectNe10/dev/yangzhang/finetune_dsp

Dev/yangzhang/finetune dsp

11 years agomodified the interface of CIFFT for precision
yang [Fri, 23 Nov 2012 04:02:46 +0000 (12:02 +0800)]
modified the interface of CIFFT for precision

11 years agomodify push operations for stack aligned
yang [Thu, 22 Nov 2012 02:41:45 +0000 (10:41 +0800)]
modify push operations for stack aligned

11 years agofine tune dsp functions
yang [Tue, 20 Nov 2012 10:11:07 +0000 (18:11 +0800)]
fine tune dsp functions
1. fine tune FIR function
2. keep stack 8 bytes aligned
3. save D8-15 register

11 years agoMerge pull request #38 from projectNe10/dev/yangzhang/filter
yangzhang [Mon, 29 Oct 2012 08:55:25 +0000 (01:55 -0700)]
Merge pull request #38 from projectNe10/dev/yangzhang/filter

add FIR/IIR functions
1. FIR
2. FIR decimate
3. FIR interpolate
4. FIR lattice
5. FIR sparse
6. IIR lattice

11 years agoadd notes " these functions aren't for hard vfpv3 ABI yet"
yang [Mon, 29 Oct 2012 08:44:23 +0000 (16:44 +0800)]
add notes " these functions aren't for hard vfpv3 ABI yet"