review.tizen.org Git - platform/upstream/ne10.git/log

Phil.Wang [Tue, 21 Jul 2015 01:14:28 +0000 (09:14 +0800)]

Enable NEON optimized image rotate for ARM v7-A

NEON optimized image rotate was disabled by accident. This patch
enables it again. Fix issue #115.

Change-Id: I4aa977de8534557d98a707cc9504aac94805d571

Phil Wang [Wed, 1 Jul 2015 12:27:15 +0000 (13:27 +0100)]

Improve data layout in RFFT float32

Transpose part of twiddles in RFFT float32 to avoid memory access by
a large stride.

Change-Id: I5e05c5baed523183ed3948371e6b1fbffc916e9b

Phil Wang [Fri, 10 Jul 2015 11:43:27 +0000 (12:43 +0100)]

Enable neon code on ARM32 for physics and dsp

Change-Id: Id7159b952cdd317c25ad97cc7388244407b75da9

Phil Wang [Thu, 9 Jul 2015 15:20:37 +0000 (16:20 +0100)]

main function in tests returns 0 if all tests pass

Change-Id: I2bb30f4107174f043809828981484d61e903c30d

Phil Wang [Tue, 23 Jun 2015 16:56:23 +0000 (17:56 +0100)]

Fix test_suite_fft_int32.c failures under armv7

For 4-point and 8-point FFTs, we fall back to VFP because
they are too short for NEON optimizations.

Change-Id: I72cc37e73d9a62459a0ee9d85b36144aad43606f

Yang Zhang [Tue, 23 Jun 2015 07:47:39 +0000 (15:47 +0800)]

Limit the range of input to avoid the failed tests in math module

Change-Id: If2001c421a8d879a375e41304b607be75790278b

Phil Wang [Fri, 12 Jun 2015 09:26:56 +0000 (10:26 +0100)]

fix NE10_INLINE_ASM_OPT for RFFT

Change-Id: I43a2480a42a6166bff6a72afab022b232d085fea

Phil Wang [Tue, 26 May 2015 17:45:17 +0000 (18:45 +0100)]

Bugfix for #111 and #112

- Link stdc++ for static build
- Switch to intrinsic on iOS

Change-Id: I42e741058241f2ea133d98ba67c7a747d3712bec

Phil.Wang [Tue, 12 May 2015 08:22:32 +0000 (16:22 +0800)]

Bug fix: prevent overflow by moving scaling in front of butterfly

Change-Id: I02353d982f0af0920751ced276232f9552b5a675
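
The overflow fix described in this commit can be illustrated with a minimal C sketch. It assumes a radix-2 butterfly on 32-bit fixed-point samples with a scale factor of 1/2; the function name demo_butterfly_scaled is hypothetical and this is not Ne10's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical radix-2 butterfly on fixed-point samples (not Ne10's code).
 * Each input is halved BEFORE the add/subtract, so neither result can
 * leave the int32_t range, whatever the inputs are. Adding first and
 * halving afterwards could overflow on the intermediate sum. */
void demo_butterfly_scaled(int32_t a, int32_t b, int32_t *sum, int32_t *diff)
{
    assert(sum != NULL && diff != NULL);
    int32_t ha = a / 2;   /* scale first ... */
    int32_t hb = b / 2;
    *sum  = ha + hb;      /* ... then combine */
    *diff = ha - hb;
}
```

Halving first loses one low-order bit per input, which is the usual price of overflow safety in scaled fixed-point FFTs.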

Phil.Wang [Tue, 12 May 2015 08:20:04 +0000 (16:20 +0800)]

Bug fix: Call appropriate NEON butterfly functions in
NE10_fft_int32.neon.c

Change-Id: I25040b18cc73bb3171e1be5ee6429f7dd6fc073d

Phil Wang [Tue, 7 Apr 2015 08:18:39 +0000 (16:18 +0800)]

Merge pull request #108 from viswanath-puttagunta/rfcv1_rc3_armv8

aarch64: Enable build for aarch64 GNU Linux

Phil.Wang [Fri, 3 Apr 2015 02:28:58 +0000 (10:28 +0800)]

DSP: Fix bug in scaled non-power-of-2 fixed-point FFT

Without this patch, the non-power-of-2 fixed-point FFT does scale
the output even when the scaled_flag is set to 1.

Change-Id: I0e373f1d2ff110b64905902acf7529eafc2cba19

Viswanath Puttagunta [Wed, 18 Mar 2015 15:29:17 +0000 (10:29 -0500)]

aarch64: Enable build for aarch64 GNU Linux

Enables build configuration to be able to build Ne10
for aarch64 GNU Linux.

Example build instructions:
mkdir build && cd build
export NE10_LINUX_TARGET_ARCH=aarch64
cmake -DCMAKE_TOOLCHAIN_FILE=../GNUlinux_config.cmake ..
make -j6

Note: By default, NE10_LINUX_TARGET_ARCH will be set to
"armv7"

Also adds -funsafe-math-optimizations for ARMv7 GNU targets

Signed-off-by: Viswanath Puttagunta <viswanath.puttagunta@linaro.org>

Phil.Wang [Wed, 25 Mar 2015 09:35:05 +0000 (17:35 +0800)]

Provide NE10_UNROLL_LEVEL to control FFT algorithm

    * Macro NE10_UNROLL_LEVEL should be one of the following:
        0: use fewer registers, default value on AArch32
        1: use more registers, default value on AArch64

    * Fix typos in doc/BuildingNe10.txt and CMakeLists.txt
    * Update doc/BuildingNe10.txt for environment variable
        NE10_(LINUX/ANDROID/IOS)_TARGET_ARCH

Change-Id: I991d11eedf3553d542406a25f89bd1157fe5c5ff
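
As a rough sketch of how such a default could be chosen at compile time, the preprocessor logic below selects 1 on AArch64 and 0 elsewhere unless the build already defines the macro. How Ne10 itself sets the default is an assumption here, and demo_unroll_level is a hypothetical helper.

```c
#include <assert.h>

/* Sketch only: choose a default when the build did not pass
 * -DNE10_UNROLL_LEVEL=<n>. Ne10's real selection logic may differ. */
#ifndef NE10_UNROLL_LEVEL
#  if defined(__aarch64__)
#    define NE10_UNROLL_LEVEL 1   /* more registers: AArch64 default */
#  else
#    define NE10_UNROLL_LEVEL 0   /* fewer registers: AArch32 and anything else */
#  endif
#endif

#if NE10_UNROLL_LEVEL != 0 && NE10_UNROLL_LEVEL != 1
#  error "NE10_UNROLL_LEVEL should be 0 or 1"
#endif

/* Hypothetical helper reporting the level that was compiled in. */
int demo_unroll_level(void)
{
    return NE10_UNROLL_LEVEL;
}
```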

Phil.Wang [Thu, 19 Mar 2015 08:26:57 +0000 (16:26 +0800)]

DSP: Fine tune fixed-point non-power-of-2 CFFT for GCC 4.9.0

    Ne10 now provides inline assembly for the fixed-point non-power-of-2
    complex FFT on AArch64. With GCC 4.9.0, users can define
    NE10_INLINE_ASM_OPT to enable this optimization.
    Below is performance data with and without this
    optimization.

    Cortex-A53 AArch64 mode (1.69GHz)
    GCC 4.9.0, with -O2
    Android-21, AArch64

    | C2C FFT Time Cost|
    |             in ms|
    |size|         Ne10|
    |    |Without| With|
    |  60|   4.67| 2.91|
    | 120|   5.79| 3.37|
    | 240|   5.57| 3.30|
    | 480|   6.76| 3.87|
    | 970|   6.89| 4.00|

Change-Id: I4f74217b026d8ef6ab6af4e1fb178ce4f1398b50

Phil.Wang [Thu, 19 Mar 2015 06:37:53 +0000 (14:37 +0800)]

DSP: Fine tune floating-point non-power-of-2 CFFT for GCC 4.9.0

    Ne10 now provides inline assembly for the floating-point non-power-of-2
    complex FFT on AArch64. With GCC 4.9.0, users can define
    NE10_INLINE_ASM_OPT to enable this optimization.
    Below is performance data with and without this optimization.

    Cortex-A53 AArch64 mode (1.69GHz)
    GCC 4.9.0, with -O2
    Android-21, AArch64

    |            C2C FFT Time Cost in ms|
    |size|         Ne10|pffft|pffft/Ne10|
    |    |Without| With|     |      With|
    |  60|   4.74| 3.56|   NA|        NA|
    | 120|   6.09| 3.57|   NA|        NA|
    | 240|   6.00| 3.52|   NA|        NA|
    | 480|   7.02| 3.96| 4.48|      113%|
    | 960|   7.05| 4.03| 4.50|      112%|

Change-Id: Ib241141c4ef962b665ef47f70abfcf4515fe80ba

Phil.Wang [Fri, 13 Mar 2015 10:10:26 +0000 (18:10 +0800)]

Tuning floating point RFFT for GCC 4.9.0

    Cortex-A53 (1.69GHz)
    GCC 4.9.0, with -O2
    Android-L, AArch64

    |   R2C FFT Time Cost in ms|
    |size|Ne10|pffft|pffft/Ne10|
    |  32| 118|  254|      215%|
    |  64| 126|  198|      157%|
    | 128| 109|  177|      162%|
    | 256| 126|  154|      122%|
    | 512| 122|  165|      135%|
    |1024| 143|  162|      113%|
    |2048| 153|  188|      123%|

    The larger the last column is, the faster Ne10 is.

Change-Id: I8921fc83afb8c7307ffd0fcb2a4bb1a88b349339
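
The last column of the table is the pffft time expressed as a percentage of the Ne10 time. A small sketch of that arithmetic follows; the helper name demo_ratio_percent and the round-to-nearest rule are assumptions, chosen because they reproduce the values quoted in the table.

```c
#include <assert.h>

/* Returns the pffft time as a percentage of the Ne10 time, rounded to
 * the nearest integer. Values above 100 mean Ne10 took less time. */
int demo_ratio_percent(int pffft_ms, int ne10_ms)
{
    assert(ne10_ms > 0 && pffft_ms >= 0);
    return (100 * pffft_ms + ne10_ms / 2) / ne10_ms;
}
```

For the size-32 row, 254 ms against 118 ms gives 215, matching the table.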

Phil.Wang [Thu, 26 Feb 2015 06:30:12 +0000 (14:30 +0800)]

Update doc/BuildingNe10.txt for Linux

Users now need to set NE10_LINUX_TARGET_ARCH when compiling Ne10 natively
on Linux.

Change-Id: I56ce3187f39b07ea5408051d606be7d8dd62f9af

Phil.Wang [Wed, 25 Feb 2015 09:38:31 +0000 (17:38 +0800)]

Remove dependency on libstdc++

Set linker language to C.
libstdc++ is no longer needed when a 3rd party library is linked to Ne10.

Change-Id: Iaa7cfcd069044c896360a35f04e9d2abb5b6b6eb

Phil.Wang [Wed, 25 Feb 2015 08:01:34 +0000 (16:01 +0800)]

Fix function declaration in NE10_init.h

Specify type(s) of function ne10_init and function ne10_HasNEON in
declarations so that they are prototypes as well.

Change-Id: Ie445d5bd7b04f8bf56718dd87435b16b7f554aac
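
The difference between a plain declaration and a prototype can be sketched as follows. The names demo_init and demo_has_neon are hypothetical stand-ins, and the int return type is an assumption rather than the type used in NE10_init.h.

```c
#include <assert.h>

/* Old-style declaration (pre-C23 semantics): the parameter list is
 * unspecified, so the compiler cannot check call sites. Shown only as
 * a comment:
 *     int demo_init();
 */

/* Prototypes: (void) states that the functions take no arguments, so a
 * call such as demo_init(42) is rejected at compile time. */
int demo_init(void);
int demo_has_neon(void);

int demo_init(void)     { return 0; }  /* 0 stands in for "success" */
int demo_has_neon(void) { return 0; }  /* placeholder, no real CPU probing */
```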

Phil.Wang [Thu, 18 Dec 2014 11:16:22 +0000 (19:16 +0800)]

Fix error in building for iOS #99

Change-Id: Iec2b8dda82c2a8c3b189988ddef15bcc3df36070

Zhongwei Yao [Tue, 17 Feb 2015 06:57:23 +0000 (14:57 +0800)]

Enable building for ARMv7 under GNU Linux. Fix issue #97.

Change-Id: Ib8f2bc4b1d5d3b7c6b06648a4699f76e9d243332

Phil.Wang [Mon, 2 Feb 2015 04:25:02 +0000 (12:25 +0800)]

Add flags is_forward/backward_scaled

Only the complex non-power-of-2 floating-point FFT is affected by these
flags. Ne10 used to scale the output of the backward FFT, but not the output of
the forward FFT. Now Ne10 scales the output of the forward FFT if is_forward_scaled
is set to anything but zero. It is possible to disable scaling of the
backward FFT output by setting is_backward_scaled to zero.

Change-Id: I947b1896af46dff16ff725136868028f54d5dad8
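
A minimal sketch of how the two flags could gate output scaling is given below. The struct demo_fft_cfg, the function demo_scale_output, and the 1/nfft scale factor are assumptions made for illustration; only the flag names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config holding the two flags named in the commit message. */
typedef struct {
    int nfft;
    int is_forward_scaled;   /* non-zero: scale forward FFT output */
    int is_backward_scaled;  /* zero: leave backward FFT output unscaled */
} demo_fft_cfg;

/* Applies the optional scaling to n float values after a transform.
 * The 1/nfft factor is an assumption of this sketch. */
void demo_scale_output(float *out, int n, const demo_fft_cfg *cfg, int inverse)
{
    assert(out != NULL && cfg != NULL && cfg->nfft > 0);
    int scaled = inverse ? cfg->is_backward_scaled : cfg->is_forward_scaled;
    if (!scaled)
        return;
    float s = 1.0f / (float)cfg->nfft;
    for (int i = 0; i < n; ++i)
        out[i] *= s;
}
```

With is_forward_scaled set to 0 and is_backward_scaled set to 1, this reproduces the older behaviour described above: only the backward transform is scaled.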

Phil.Wang [Mon, 2 Feb 2015 04:46:32 +0000 (12:46 +0800)]

Add destroy functions for FFT.

Use the following functions instead of NE10_FREE to make sure memory
allocated by Ne10 is also freed by Ne10:
- Float/Fixed point Complex FFT Destroy functions:
    - ne10_fft_destroy_c2c_float32
    - ne10_fft_destroy_c2c_int32
    - ne10_fft_destroy_c2c_int16
- Float/Fixed point Real2Complex FFT Destroy functions:
    - ne10_fft_destroy_r2c_float32
    - ne10_fft_destroy_r2c_int32
    - ne10_fft_destroy_r2c_int16

Change-Id: Ia2eacb5faa8501cf3a8d7705ef732db400fc7013

Phil.Wang [Sun, 1 Feb 2015 09:45:58 +0000 (17:45 +0800)]

Update API for fixed-point non-power-of-2 FFT

original:
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32 (ne10_int32_t nfft);
now:
    ne10_fft_cfg_int32_t (*ne10_fft_alloc_c2c_int32) (ne10_int32_t nfft);
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_c (ne10_int32_t nfft);
    ne10_fft_cfg_int32_t ne10_fft_alloc_c2c_int32_neon (ne10_int32_t nfft);

Use the _c version for ne10_fft_c2c_1d_int32_c, and the _neon version for
ne10_fft_c2c_1d_int32_neon.
ne10_fft_alloc_c2c_int32 is now a function pointer. Function
ne10_init_dsp points it at the right function according to the
runtime condition.
The test suite is updated accordingly.

Change-Id: I15cbfe75a29995696335c9f6939e03cd2d5fe57a
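
The function-pointer dispatch described in this commit might look like the sketch below. All demo_ names, the demo_cfg struct, and the has_neon argument are hypothetical; the real ne10_init_dsp and allocators have their own signatures and detection logic.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical configuration object returned by the allocators. */
typedef struct {
    int nfft;
    int uses_neon;
} demo_cfg;

static demo_cfg *demo_make(int nfft, int uses_neon)
{
    demo_cfg *cfg = malloc(sizeof *cfg);
    if (cfg != NULL) {
        cfg->nfft = nfft;
        cfg->uses_neon = uses_neon;
    }
    return cfg;
}

/* Two concrete allocators, one per implementation. */
demo_cfg *demo_alloc_c(int nfft)    { return demo_make(nfft, 0); }
demo_cfg *demo_alloc_neon(int nfft) { return demo_make(nfft, 1); }

/* The public name is a function pointer, unset until initialisation. */
demo_cfg *(*demo_alloc)(int nfft) = NULL;

/* Stand-in for the init routine: bind the pointer once, at run time. */
void demo_init_dsp(int has_neon)
{
    demo_alloc = has_neon ? demo_alloc_neon : demo_alloc_c;
}
```

Callers keep using the single name demo_alloc, while code that needs a specific implementation calls the suffixed variant directly.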

Phil.Wang [Fri, 23 Jan 2015 07:19:44 +0000 (15:19 +0800)]

Enable Fixed-Point Non-power-of-2 FFT.

For Cortex-A53 (AArch64)
LLVM 3.5, -O2
Time: in ms
SNR : in dB

|         |Forward  |Backward |
|    |Size| Time|SNR| Time|SNR|
|   C|   8|   26| 92|   26| 93|
|   C|  16|   51| 90|   51| 90|
|   C|  32|  130| 89|  132| 91|
|   C|  60|  452| 81|  469| 83|
|   C|  64|  304| 88|  305| 88|
|   C| 120| 1070| 82| 1149| 82|
|   C| 128|  727| 88|  735| 89|
|   C| 240| 2197| 81| 2312| 82|
|   C| 256| 1659| 88| 1659| 88|
|   C| 480| 5127| 82| 5520| 82|
|   C| 512| 3819| 88| 3855| 88|
|   C| 900|11621| 80|12190| 81|
|   C| 960|10640| 82|11246| 82|
|NEON|   8|   10| 93|   10| 95|
|NEON|  16|   18| 97|   18| 97|
|NEON|  32|   54| 88|   55| 89|
|NEON|  60|  163| 88|  169| 88|
|NEON|  64|  133| 85|  133| 86|
|NEON| 120|  346| 88|  358| 90|
|NEON| 128|  263| 87|  264| 87|
|NEON| 240|  668| 89|  704| 88|
|NEON| 256|  635| 85|  635| 85|
|NEON| 480| 1526| 89| 1595| 89|
|NEON| 512| 1300| 86| 1299| 87|
|NEON| 900| 3207| 88| 3372| 89|
|NEON| 960| 3107| 89| 3394| 89|

Change-Id: I256d1e4eff40ff20e19fe941f3222a3f7d2944f6

Phil.Wang [Wed, 21 Jan 2015 06:20:48 +0000 (14:20 +0800)]

DSP: Generic Fixed-Point complex FFT is enabled.

Currently, only the C version is available.
It passes the conformance test. Since there is no NEON optimized Fixed-Point FFT,
the performance test is run but not compared.

For A53, with GCC
Complex Fixed-Point FFT
Time in ms, SNR in dB

|Direction|    |nfft|time|SNR|
|  forward|   C|  16|  66| 82|
|  forward|   C|  32| 108| 82|
|  forward|   C|  60| 366| 80|
|  forward|   C|  64| 360| 82|
|  forward|   C| 120| 876| 80|
|  forward|   C| 128| 641| 82|
|  forward|   C| 240|1715| 80|
|  forward|   C| 256|1876| 82|
|  forward|   C| 480|4075| 79|
|  forward|   C| 512|3484| 82|
|  forward|   C| 900|9266| 79|
|  forward|   C| 960|8224| 79|
| backward|   C|  16|  65| 82|
| backward|   C|  32| 106| 82|
| backward|   C|  60| 361| 80|
| backward|   C|  64| 360| 82|
| backward|   C| 120| 874| 80|
| backward|   C| 128| 639| 83|
| backward|   C| 240|1731| 79|
| backward|   C| 256|1892| 82|
| backward|   C| 480|4113| 79|
| backward|   C| 512|3509| 82|
| backward|   C| 900|9285| 79|
| backward|   C| 960|8384| 79|

Change-Id: Icb4575a22c51e2f684a1ebbe0464a782e912f769

Phil.Wang [Wed, 21 Jan 2015 10:35:04 +0000 (18:35 +0800)]

Disable full path in doxygen.cfg

Change-Id: I7c4f19c8055f6b16d2680cc8d8bbdc149b330860

Zhongwei Yao [Tue, 20 Jan 2015 10:41:25 +0000 (18:41 +0800)]

Extend copyright year to 2015.

Change-Id: I77b17e0c07713d03e2c74300b9427e0c15d0f963

Zhongwei Yao [Tue, 13 Jan 2015 11:50:58 +0000 (19:50 +0800)]

Disable following functions which have not been optimized for aarch64.

Following functions have been disabled for aarch64 target:
  -. 6 dsp functions:
     ne10_fir_float_neon
     ne10_fir_decimate_float_neon
     ne10_fir_interpolate_float_neon
     ne10_fir_lattice_float_neon
     ne10_fir_sparse_float_neon
     ne10_iir_lattice_float_neon
  -. 1 imgproc function:
     ne10_img_rotate_rgba_neon
  -. 3 physics functions:
     ne10_physics_compute_aabb_vec2f_neon
     ne10_physics_relative_v_vec2f_neon
     ne10_physics_apply_impulse_vec2f_neon
  -. all functions in math module

Smoke test has been passed under both aarch64 and armv7.

Change-Id: Ied135a4ac83260564081180f4bfef6eda5042aa0

Zhongwei Yao [Tue, 13 Jan 2015 09:16:35 +0000 (17:16 +0800)]

Enable aarch64 in cmake build script for android platform.

Change-Id: Iae0fa7b89f8ed8ee6cc932055c0602e3191e85b6

Haruki Hasegawa [Sun, 9 Nov 2014 07:57:45 +0000 (16:57 +0900)]

Fix wrong definition of DIV_TW81 and DIV_TW81N values

Phil.Wang [Wed, 24 Dec 2014 08:55:57 +0000 (16:55 +0800)]

DSP: remove unit tests in RFFT

Change-Id: If535d81f1dbef710d6f7ac807e0313408345ac4a

Phil.Wang [Wed, 24 Dec 2014 08:17:30 +0000 (16:17 +0800)]

Change include Syntax

Change-Id: I2872d8bba91beaad26d2b2b3aa56e63b10186083

Phil.Wang [Wed, 24 Dec 2014 09:21:47 +0000 (17:21 +0800)]

FFT: Fix false assert at the last loop

Change-Id: I9954b32f4dafd4eed63db98a24cd47cabe9c1b3d

Phil.Wang [Tue, 23 Dec 2014 09:20:57 +0000 (17:20 +0800)]

DSP/FFT: declare ne10_radix8_r2c(c2r)_c

Change-Id: Ia01574388bed413954d66a6927c8d3d969f3d329

Zhongwei Yao [Thu, 18 Dec 2014 08:19:55 +0000 (16:19 +0800)]

Update the copyright and license.

Change-Id: I224478247aa7201fb8bf150dfd14e7b92b2e5141

Phil.Wang [Wed, 17 Dec 2014 03:52:11 +0000 (11:52 +0800)]

NE10/FFT/complex-non-power-of-2 NEON

    ARM 64-bit (Cortex-A57)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 129| 113|   NA|  44|
     120| 148| 127|   NA|  49|
     240| 151| 128|   55|  47|
     480| 169| 142|   60|  55|
     960| 183| 149|   65|  58|
    1920| 193| 167|   71|  66|
    3840| 217| 175|   76|  71|
    SNR > 100dB

    ARM 64-bit (Cortex-A53)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 295| 311|   NA|  72|
     120| 368| 375|   NA|  79|
     240| 345| 342|  104|  77|
     480| 415| 407|  115|  87|
     960| 406| 378|  121|  95|
    1920| 476| 441|  138| 113|
    3840| 497| 424|  161| 126|
    SNR > 100dB

    ARM 32-bit (Cortex-A9)
    complex forward float LLVM 3.5
             Time in ms      |
        |kiss|opus|pffft|NE10|
        |   C|   C| NEON|NEON|
      60| 224| 211|   NA|  98|
     120| 265| 245|   NA| 104|
     240| 262| 240|  130| 106|
     480| 302| 274|  150| 122|
     960| 305| 271|  162| 153|
    1920| 369| 356|  230| 206|
    3840| 415| 440|  282| 239|
    SNR > 100dB

Change-Id: If9418041b01eed49dbdc8d6a18dd03f2c5684da8

Phil.Wang [Wed, 17 Dec 2014 03:42:40 +0000 (11:42 +0800)]

NE10/FFT/backward-complex-non-power-of-2 C

ARM 32-bit (Cortex-A9)
complex backward unscaled float GCC 4.9
        Time in ms      |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 195| 172|   NA| 144|
120| 234| 200|   NA| 173|
240| 231| 203|  148| 175|
480| 267| 231|  176| 215|

ARM 64-bit (Cortex-A57)
complex backward unscaled float GCC 4.9
        Time in ms      |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 125|  89|   NA|  87|
120| 141| 104|   NA| 103|
240| 146| 106|   52| 109|
480| 163| 120|   58| 127|

SNR > 100dB for all 2^M*3^N*5^K

Change-Id: Ie4bb27d053213bfbf2dbdd0020f9fda5db4312f9

Phil.Wang [Mon, 17 Nov 2014 08:57:44 +0000 (16:57 +0800)]

NE10/FFT/forward-complex-non-power-of-2 C

ARM 32-bit (Cortex-A9)
complex forward float GCC 4.9
       Time in ms       |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 213| 200|   NA| 131|
120| 252| 225|   NA| 164|
240| 245| 245|  163| 171|
480| 268| 262|  187| 195|

SNR > 100dB, for all following cases
    |size|time|
NE10|  16|  69|
NE10|  32|  80|
NE10|  60| 131|
NE10|  64| 106|
NE10| 120| 164|
NE10| 128| 110|
NE10| 240| 171|
NE10| 256| 148|
NE10| 480| 195|
NE10| 512| 144|
NE10|1024| 214|

ARM 64-bit (Cortex-A57)
complex forward float LLVM 3.5
       Time in ms       |
   |kiss|opus|pffft|NE10|
   |   C|   C| NEON|   C|
 60| 131| 111|   NA|  95|
120| 154| 121|   NA| 121|
240| 153| 129|   55| 121|
480| 168| 144|   61| 146|

SNR > 100dB, for all following cases
    |size|time|
NE10|  16|  28|
NE10|  32|  31|
NE10|  60|  95|
NE10|  64|  43|
NE10| 120| 121|
NE10| 128|  46|
NE10| 240| 121|
NE10| 256|  56|
NE10| 480| 146|
NE10| 512|  61|
NE10|1024|  73|

Change-Id: I9fab61e9e47279d848815699e928ab4abead5635

Phil.Wang [Thu, 13 Nov 2014 06:33:09 +0000 (14:33 +0800)]

NE10/DSP/FFT formatting

Change-Id: Ie96cc6945844d0eb23b04027cce7fb03c7163f1d

Matthew DuPuy [Sun, 9 Nov 2014 17:03:15 +0000 (09:03 -0800)]

Contributing

Also see: https://github.com/projectNe10/Ne10/wiki/Contribution-of-code-F.A.Q.

Matthew DuPuy [Sun, 9 Nov 2014 16:49:15 +0000 (08:49 -0800)]

3 clause BSD only

Matthew DuPuy [Sun, 9 Nov 2014 16:43:01 +0000 (08:43 -0800)]

Created project contribution instructions

Anyone wishing to submit a pull request must sign a contribution license agreement (CLA) with ARM first.

Matthew DuPuy [Sun, 9 Nov 2014 16:40:01 +0000 (08:40 -0800)]

Update email contacts to match landing page

Kévin PETIT [Mon, 29 Sep 2014 12:34:36 +0000 (13:34 +0100)]

Add line wrapping to the documentation files

Change-Id: I81ca1c1980d250b2df020c7b584f91cc0595fdc2
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>

Phil.Wang [Mon, 22 Sep 2014 04:26:46 +0000 (12:26 +0800)]

NE10/DSP/RFFT: optimise RFFT for armv8

intrinsic, LLVM 3.5, -O2
on a57, juno, android
   size|   time in ms   |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |R2C|C2R|R2C|C2R*|  R2C|  C2R|
     32|145|185|279| 319|1.92x|1.72x|
     64|175|200|239| 279|1.36x|1.39x|
    128|166|185|237| 262|1.42x|1.41x|
    256|197|208|232| 256|1.17x|1.23x|
    512|208|216|254| 270|1.22x|1.25x|
   1024|241|244|260| 278|1.07x|1.14x|
   2048|258|263|332| 322|1.28x|1.22x|
   4096|303|304|388| 353|1.28x|1.16x|
   8192|339|334|424| 426|1.25x|1.27x|

intrinsic, GCC 4.9, -O2
on a57, juno, android
   size|    time in ms  |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |R2C|C2R|R2C|C2R*|  R2C|  C2R|
     32|174|181|328| 410|1.88x|2.26x|
     64|214|216|270| 338|1.26x|1.56x|
    128|210|197|259| 310|1.23x|1.57x|
    256|232|223|243| 283|1.04x|1.26x|
    512|250|222|263| 307|1.04x|1.38x|
   1024|274|251|272| 304|1.00x|1.20x|
   2048|288|277|314| 353|1.08x|1.27x|
   4096|333|303|349| 379|1.04x|1.25x|
   8192|370|342|424| 452|1.14x|1.31x|

* Ne10 supports scaling the output of the backward RFFT,
  while pffft doesn't. To normalize the benchmark,
  a scale operation was added to the end of each
  call to pffft.
* pffft C2R FFT costs 410ms when size==32 but 338ms when
  size==64. This is because the former loops more times
  than the latter does, so it does not mean that pffft costs
  more time for short input.

intrinsic, GCC 4.9, -O2
on a53, juno, android
   size|    time in ms  |   boost   |
       |  NE10 |  pffft | pffft/NE10|
       |    R2C|     R2C|        R2C|
     32|    347|     607|      1.74x|
     64|    389|     489|      1.25x|
    128|    334|     484|      1.44x|
    256|    401|     456|      1.13x|
    512|    380|     502|      1.32x|
   1024|    460|     512|      1.11x|
   2048|    481|     593|      1.23x|
   4096|    605|     709|      1.17x|
   8192|    704|     891|      1.26x|

Change-Id: Ide0b974620ae8d06cfa862769004b2110abaaeff

Yang Zhang [Tue, 22 Jul 2014 09:34:15 +0000 (17:34 +0800)]

add v8 assembly for float fft.

This assembly file is suitable for gcc only. If you want to use llvm, please use the files with suffix .neonintrinsic.c

Change-Id: I27528b2ebbf079db0e4f1c6ecf7828baa0fbaf0a

Yang Zhang [Thu, 3 Jul 2014 03:29:03 +0000 (11:29 +0800)]

fix the issues

- update the NE10_FREE macro definition:
when a pointer is freed, NULL should be assigned to it.
- remove needless backslash
- modify the data type of address for ARM v8
- modify IFFT scaling for int32/int16

Change-Id: I62dc5803ba106e13fb9c91dba6ac3099f3fb5737
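
The free-then-clear pattern mentioned in the first item can be sketched like this. DEMO_FREE and demo_free_twice_is_safe are hypothetical names; the actual NE10_FREE definition may differ in detail.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for NE10_FREE: release the block, then clear the
 * pointer so a stale address cannot be reused or freed a second time.
 * The do/while(0) wrapper makes the macro behave as a single statement. */
#define DEMO_FREE(p) \
    do {             \
        free(p);     \
        (p) = NULL;  \
    } while (0)

/* Returns 1 if the pointer ends up NULL after two consecutive frees. */
int demo_free_twice_is_safe(void)
{
    int *buf = malloc(16 * sizeof *buf);
    if (buf == NULL)
        return 0;
    DEMO_FREE(buf);
    DEMO_FREE(buf);  /* harmless: free(NULL) is a no-op */
    return buf == NULL;
}
```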

Zhou (Joe) Yu [Wed, 2 Jul 2014 10:22:40 +0000 (11:22 +0100)]

Merge "make sure the address of buffer 64-bit alignment"

Yang Zhang [Wed, 2 Jul 2014 06:41:57 +0000 (14:41 +0800)]

make sure the address of buffer 64-bit alignment

For the FFT NEON implementation, 64-bit aligned input/output/twiddle addresses improve the
speed of data loads and stores. If an address isn't 64-bit aligned, a bus error occurs.

Change-Id: I201307de980eef544025bcb498b0093a272e2936
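
One common way to obtain a 64-bit aligned address is to round a pointer up to the next multiple of 8, as in the sketch below. The helper names are hypothetical and this is not necessarily how Ne10 aligns its buffers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ALIGNMENT 8u  /* 64 bits expressed in bytes */

/* Rounds an address up to the next multiple of DEMO_ALIGNMENT.
 * An already aligned address is returned unchanged. */
void *demo_align_up(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    addr = (addr + (DEMO_ALIGNMENT - 1u)) & ~(uintptr_t)(DEMO_ALIGNMENT - 1u);
    return (void *)addr;
}

/* Returns non-zero when p sits on a 64-bit boundary. */
int demo_is_aligned(const void *p)
{
    return ((uintptr_t)p % DEMO_ALIGNMENT) == 0;
}
```

When this is used on heap memory, the buffer has to be over-allocated by DEMO_ALIGNMENT - 1 bytes and the original pointer kept for the later free.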

Yang Zhang [Tue, 1 Jul 2014 04:39:23 +0000 (12:39 +0800)]

fix bug in fir

When the input length isn't a multiple of 4, the filter output is wrong. This patch fixes the issue.

Change-Id: I212d86fef3beb9aaeb3292d98719665ba521daee
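
This class of bug usually comes from a kernel that produces four outputs per iteration and forgets the remainder. The sketch below shows the general main-loop-plus-tail structure in plain C; it is not the Ne10 FIR code, the demo_ names are hypothetical, and the input is assumed to hold block_size + num_taps - 1 samples.

```c
#include <assert.h>

/* One output sample: dot product of the taps with a window of the input. */
static float demo_fir_one(const float *x, const float *h, int num_taps)
{
    float acc = 0.0f;
    for (int k = 0; k < num_taps; ++k)
        acc += h[k] * x[k];
    return acc;
}

/* x must hold block_size + num_taps - 1 samples; y receives block_size. */
void demo_fir_blocked(const float *x, const float *h, int num_taps,
                      float *y, int block_size)
{
    assert(num_taps > 0 && block_size >= 0);
    int n = 0;
    /* Main loop: four outputs per iteration, as a 4-lane kernel would. */
    for (; n + 4 <= block_size; n += 4) {
        y[n + 0] = demo_fir_one(x + n + 0, h, num_taps);
        y[n + 1] = demo_fir_one(x + n + 1, h, num_taps);
        y[n + 2] = demo_fir_one(x + n + 2, h, num_taps);
        y[n + 3] = demo_fir_one(x + n + 3, h, num_taps);
    }
    /* Tail loop: the 1 to 3 outputs left when block_size % 4 != 0. */
    for (; n < block_size; ++n)
        y[n] = demo_fir_one(x + n, h, num_taps);
}
```

Without the tail loop, a block of 7 samples would leave the last three outputs unwritten.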

Zhou (Joe) Yu [Tue, 1 Jul 2014 06:04:39 +0000 (07:04 +0100)]

Merge "fix bug in fir"

Yang Zhang [Tue, 1 Jul 2014 04:39:23 +0000 (12:39 +0800)]

fix bug in fir

When the input length isn't a multiple of 4, the filter output is wrong. This patch fixes the issue.

Change-Id: I212d86fef3beb9aaeb3292d98719665ba521daee

Yang Zhang [Mon, 30 Jun 2014 07:45:47 +0000 (15:45 +0800)]

add temp buffer allocation and scaling by 2 for rfft
- add temp buffer allocation in init function
- add scaling by 2 for C, NEON assembly and intrinsic version

Change-Id: I7e46f327f43664e06700089f4d38f0d868d44f3e

Yang Zhang [Thu, 19 Jun 2014 09:21:01 +0000 (17:21 +0800)]

update the FFT implementation

- add scaling by nfft in IFFT
- add temp buffer to protect the source data
- change the interface for passing temp buffer
- add intrinsic version of FFT
- indent the code

Change-Id: I35f46e60bb88070127eb59281ddbd3a72f6b8e7d

Matthew DuPuy [Wed, 18 Jun 2014 05:03:46 +0000 (22:03 -0700)]

ignore *.so and *.prefs

Matthew DuPuy [Wed, 18 Jun 2014 04:58:26 +0000 (21:58 -0700)]

Minor semantic update to demo

Matthew DuPuy [Wed, 18 Jun 2014 03:31:30 +0000 (20:31 -0700)]

cfft and rfft test modules removed

NE10_TEST_DSP could no longer build with cfft and rfft test modules
removed.

Yang Zhang [Fri, 13 Jun 2014 06:59:51 +0000 (14:59 +0800)]

optimize int32/int16 complex FFT

    The performance result is as follows:

    toolchain: gcc 4.8 at -O2
    The omx FFT execution time is the baseline. The lower the ratio, the better the performance.

    int32 FFT
    A9:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |73.24%|99.95%|95.78%|96.04%|97.97%|97.57%|99.51%|97.87%|98.12%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    A15:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |84.89%|98.62%|89.33%|100.7%|99.28%|103.9%|101.7%|105.1%|96.67%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    int16 FFT
    A9:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |109.2%|97.81%|100.3%|97.20%|101.3%|99.01%|103.4%|103.5%|94.67%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

    A15:
    |     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
    |Ne10 |112.6%|95.78%|104.3%|101.7%|112.3%|111.5%|102.3%|105.1%|99.78%|
    |omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Change-Id: I7290ae5f9abfd3d04f8ca501f5ecbff452973d4b

Yang Zhang [Fri, 30 May 2014 11:36:23 +0000 (19:36 +0800)]

optimize float complex FFT

1. To optimize the FFT, the algorithm is changed: bit reversal is removed and radix 8 is added.
2. In testing, the optimized FFT shows the best performance, so the old implementations are removed.

The performance result is as follows:

toolchain: gcc 4.8 at -O2
The omx FFT execution time is the baseline. The lower the ratio, the better the performance.

panda board A9:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.27%|89.57%|85.63%|85.79%|87.89%|87.91%|83.51%|97.08%|92.68%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

nexus10 A15:
|     |16    |32    |64    |128   |256   |512   |1024  |2048  |4096  |
|Ne10 |84.88%|98.43%|89.46%|101.0%|99.24%|103.2%|93.80%|105.1%|97.44%|
|omx  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |100%  |

Change-Id: I363ee1602f08532e566d3a5a4f3d7a99972a1283

Zhongwei Yao [Thu, 15 May 2014 06:20:15 +0000 (14:20 +0800)]

extend copyright year and add the extend script.

Change-Id: Ice948d88f2dc6122b562bf479aea53c060181345

Zhongwei Yao [Mon, 2 Dec 2013 05:27:20 +0000 (13:27 +0800)]

add box filter to image processing module.

Matthew DuPuy [Tue, 22 Apr 2014 21:11:17 +0000 (14:11 -0700)]

Create Acknowledgements.md

Matthew DuPuy [Wed, 12 Mar 2014 21:31:46 +0000 (14:31 -0700)]

Create LICENSE

Requested for clarification of license in code file headers.

Yang Zhang [Wed, 19 Feb 2014 09:59:14 +0000 (17:59 +0800)]

make changes as follows:
-optimize float/int32 fft for 4-4096
-add unscaled/scaled implementation for int32 fft
-add neon intrinsic version for float/int32 fft

Matthew DuPuy [Fri, 14 Feb 2014 20:49:48 +0000 (12:49 -0800)]

Call for use cases

Help us track Ne10 usage since downloads are not a great metric and didn't even exist in GitHub till 2014.

Yang Zhang [Fri, 24 Jan 2014 09:48:51 +0000 (17:48 +0800)]

make the following changes
  -add 3 functions for collision detection
  -add test cases and doc
  -update the ReleaseNote

Zhongwei Yao [Mon, 16 Dec 2013 06:04:06 +0000 (14:04 +0800)]

add following changes:
    - add MIN_IOS_VER configuration for iOS platform building
    - add iOS support for the newly added FFT functions
    - remove resize function's assembly version, only keep the intrinsics version
    - refine the smoke test case for resize function

Yang Zhang [Mon, 9 Dec 2013 04:11:46 +0000 (12:11 +0800)]

add hard float support for Linux/Android

Yang Zhang [Wed, 20 Nov 2013 08:15:11 +0000 (16:15 +0800)]

add the new FFT features
- c2c FFT/IFFT(float/int32/int16) with 2^N size
- r2c FFT(float/int32/int16) with 2^N size
- c2r IFFT(float/int32/int16) with 2^N size
- test cases and doc

Zhongwei Yao [Thu, 24 Oct 2013 10:55:12 +0000 (18:55 +0800)]

Make following changes:
     - update cmake config script and doc due to Xcode upgrade
     - add compiler switch(-mthumb) for android and ubuntu to make sure generated code is thumb code.
     - change the log output buffer size to get around the bug in sfft test.

Yang Zhang [Mon, 2 Sep 2013 10:06:45 +0000 (18:06 +0800)]

Make the following changes
- Add C implementations, doc and test cases for image resize/rotate
- fix the bug in NEON version of image resize
- add a header file for external macro definitions

Zhongwei Yao [Thu, 22 Aug 2013 06:21:58 +0000 (14:21 +0800)]

update build script to enable building under Mac OS for Android development.

Zhongwei Yao [Thu, 22 Aug 2013 06:20:18 +0000 (14:20 +0800)]

add benchmark result to Android and iOS demo.

Fang Bao [Wed, 26 Jun 2013 07:39:30 +0000 (15:39 +0800)]

Add NEON intrinsic implementation of resize.

NOTE:
GCC 4.7 is the minimum version recommended for compiling NEON intrinsics.
The intrinsic version is not compiled by default because a NEON assembly version already exists.
To enable it:
* Uncomment the line including NE10_resize.neon.c in modules/CMakeLists.txt
* Comment out the line including NE10_resize.neon.s in modules/CMakeLists.txt

Zhongwei Yao [Tue, 25 Jun 2013 10:21:36 +0000 (18:21 +0800)]

- fix a bug when running command line tests
- add a sanity check when adding the platform demo macro in the CMake script

Zhongwei Yao [Mon, 17 Jun 2013 04:19:49 +0000 (12:19 +0800)]

add android demo.

Zhongwei Yao [Sat, 8 Jun 2013 03:04:29 +0000 (11:04 +0800)]

add iOS demo.

Zhongwei Yao [Mon, 3 Jun 2013 04:16:25 +0000 (12:16 +0800)]

add iOS support.