platform/core/ml/nntrainer.git
8 months ago[ hgemm ] Generalize redundant micro hgemm kernel implementation
skykongkong8 [Wed, 7 Aug 2024 11:41:39 +0000 (20:41 +0900)]
[ hgemm ] Generalize redundant micro hgemm kernel implementation

- The previous implementation naively used fixed-size ukernels for the K-direction accumulation.
- Such kernels were excessively long, but performed better than looping over a single K iteration.
- However, recent test results show that just stacking 4 K iterations and looping over that ukernel preserves the performance while improving code readability.
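
The idea of looping a short K-stacked kernel can be sketched as follows. This is only an illustration in scalar `float` code; the names `ukernel_k4` and `gemm_looped_k4` are hypothetical, and the actual kernels operate on fp16 data with SIMD intrinsics, which this log does not show.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical micro-kernel (not nntrainer's actual code): accumulates exactly
// four consecutive K iterations of A (M x K) * B (K x N) into C (M x N).
// All matrices are row-major; lda/ldb/ldc are the row strides.
static void ukernel_k4(const float *a, const float *b, float *c, std::size_t M,
                       std::size_t N, std::size_t lda, std::size_t ldb,
                       std::size_t ldc) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      c[m * ldc + n] += a[m * lda + 0] * b[0 * ldb + n] +
                        a[m * lda + 1] * b[1 * ldb + n] +
                        a[m * lda + 2] * b[2 * ldb + n] +
                        a[m * lda + 3] * b[3 * ldb + n];
    }
  }
}

// Loops the 4-iteration kernel over K instead of providing one long kernel
// per fixed K. Leftover iterations (K % 4) are handled one at a time.
void gemm_looped_k4(const std::vector<float> &A, const std::vector<float> &B,
                    std::vector<float> &C, std::size_t M, std::size_t N,
                    std::size_t K) {
  std::size_t k = 0;
  for (; k + 4 <= K; k += 4)
    ukernel_k4(A.data() + k, B.data() + k * N, C.data(), M, N, K, N, N);
  for (; k < K; ++k)
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * B[k * N + n];
}
```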

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[Layer] add Weight Layer
Seungbaek Hong [Tue, 30 Jul 2024 06:17:17 +0000 (15:17 +0900)]
[Layer] add Weight Layer

- This layer contains only weights for building tensor-level graph

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
8 months ago[ hgemm ] Apply hgemm util funcs at frequently used functions
skykongkong8 [Wed, 7 Aug 2024 01:26:45 +0000 (10:26 +0900)]
[ hgemm ] Apply hgemm util funcs at frequently used functions

- get_prev_mltpl_of_2p_n is frequently used in many hgemm kernels.
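
Judging only by its name, get_prev_mltpl_of_2p_n presumably rounds a value down to the previous multiple of 2^n. The helper's real definition is not shown in this log, so the function below is a hypothetical stand-in that illustrates the usual bit-shift approach.

```cpp
#include <cstddef>

// Hypothetical stand-in: largest multiple of 2^n that is <= x.
// Shifting right then left clears the low n bits, which rounds x down.
constexpr std::size_t prev_multiple_of_pow2(std::size_t x, unsigned n) {
  return (x >> n) << n;
}
```

For example, a loop bound of 37 with n = 3 becomes 32, so a kernel that consumes 8 elements per step can run on the first 32 and leave 5 for a tail loop.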

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ trivial ] Add missing docs and error message
skykongkong8 [Wed, 7 Aug 2024 01:21:08 +0000 (10:21 +0900)]
[ trivial ] Add missing docs and error message

- Add missing doxygen tags: transpose boolean params
- Error message: emit an error when trying to use the full-fp16 kernel with the experimental kernel build

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Add hgemm experimental kernel
skykongkong8 [Thu, 1 Aug 2024 11:58:31 +0000 (20:58 +0900)]
[ hgemm ] Add hgemm experimental kernel

- According to a recent paper, accumulating up to 64 ~ 128 w.r.t. K-direction is fine.
- Since both the conventional error metric and the newly introduced metric (max componentwise relative error) are fine as well, introduce an experimental kernel.
- The build option -Dhgemm-experimental-kernel=true enables this kernel for Android builds.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Implement hgemm_small
skykongkong8 [Thu, 1 Aug 2024 11:29:39 +0000 (20:29 +0900)]
[ hgemm ] Implement hgemm_small

- Forcibly adding zero-padding made small-dimension indexing quite clumsy and redundant.
- Implement an explicit hgemm_small function to cover the M<8, N<16, K<16 case.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[refactor] Restructure getStringDataType function
Donghyeon Jeong [Fri, 2 Aug 2024 04:02:23 +0000 (13:02 +0900)]
[refactor] Restructure getStringDataType function

This patch updates the getStringDataType function structure to utilize method overriding.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
8 months ago[Tensor] Update tensorbase for efficient creation of new tensor class.
Donghyeon Jeong [Tue, 30 Jul 2024 02:09:39 +0000 (11:09 +0900)]
[Tensor] Update tensorbase for efficient creation of new tensor class.

This PR updates the TensorBase class so that implementing mathematical operations is no longer required to create a new tensor class.
This change allows developers to easily create new classes without implementing math operations.
Note that these functions should be implemented to utilize tensor operations fully.

**Changes proposed in this PR:**
- Change math operation function from pure virtual function to virtual function
- Add a private function to get the data type as a string
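
The effect of switching from pure virtual to virtual can be sketched like this. The class and method names are hypothetical and much smaller than the real TensorBase interface; the default body that throws is also an assumption about how an unsupported operation might be reported.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical base class, not the actual TensorBase interface.
class TensorBaseSketch {
public:
  virtual ~TensorBaseSketch() = default;

  // Previously this would have been declared "= 0". With a default body,
  // derived classes are no longer forced to implement it.
  virtual float sum() const {
    throw std::logic_error("sum() is not implemented for " + dataTypeName());
  }

private:
  // Private helper that returns the data type as a string.
  virtual std::string dataTypeName() const = 0;
};

// A new tensor type compiles without overriding any math operation.
class MinimalTensor : public TensorBaseSketch {
private:
  std::string dataTypeName() const override { return "MINIMAL"; }
};
```

Calling `sum()` on `MinimalTensor` fails at run time rather than at compile time, which is why the note above still recommends implementing these functions.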

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
8 months agoBUG FIX : Concat GPU Layer and CPU layer unittest cases name overlapping.
Niket Agarwal [Thu, 1 Aug 2024 05:56:52 +0000 (11:26 +0530)]
BUG FIX : Concat GPU Layer and CPU layer unittest cases name overlapping.

Renamed the concat GPU test cases in unittest_layers_concat_cl to differentiate them from the concat CPU test cases.

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <niket.a@samsung.com>
8 months ago[Doc] NNTrainer Tool Utilization Guide
Donghyeon Jeong [Fri, 26 Jul 2024 07:53:32 +0000 (16:53 +0900)]
[Doc] NNTrainer Tool Utilization Guide

This PR adds a guide for executing unit tests on the Android device.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
8 months ago[Android] Verify Android NDK Installation and Configuration
Donghyeon Jeong [Fri, 26 Jul 2024 08:17:44 +0000 (17:17 +0900)]
[Android] Verify Android NDK Installation and Configuration

This patch checks if Android NDK is installed and configured before building using NDK in the Android test script.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
8 months ago[doc] Extend code documentation
Donghyeon Jeong [Wed, 31 Jul 2024 03:15:33 +0000 (12:15 +0900)]
[doc] Extend code documentation

This PR adds summary content to help users quickly understand the role and scope of the Tensor API.

**Self-evaluation:**
1. Build test: [ ]Passed [ ]Failed [X]Skipped
2. Run test:   [ ]Passed [ ]Failed [X]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
8 months ago[GPU/OpenCL] Initial version of Concat Layer with OpenCL ops
Niket Agarwal [Wed, 3 Jul 2024 10:42:38 +0000 (16:12 +0530)]
[GPU/OpenCL] Initial version of Concat Layer with OpenCL ops

Added naive version of OpenCL implementation for Concat Layer.
Incorporated kernel for ops used.
Added unit test for Concat_cl.

Signed-off-by: Niket Agarwal <niket.a@samsung.com>
8 months ago[ unittest ] Implement max_componentwise_relative_error
skykongkong8 [Mon, 15 Jul 2024 10:45:24 +0000 (19:45 +0900)]
[ unittest ] Implement max_componentwise_relative_error

- When comparing outputs computed with different precisions, the max componentwise relative error is needed.
- (trivial) Use a more precise comparison in the zero-division check of the cosine similarity function
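
A minimal sketch of the metric, max_i |approx_i - ref_i| / |ref_i|, is shown below. The function name and the handling of zero reference values (they are skipped) are assumptions made for illustration, not the unittest's actual code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest per-element relative deviation of approx
// from ref. Assumption: entries whose reference value is exactly zero are
// skipped, since the relative error is undefined there.
double max_componentwise_relative_error_sketch(
  const std::vector<double> &ref, const std::vector<double> &approx) {
  double max_err = 0.0;
  const std::size_t n = std::min(ref.size(), approx.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (ref[i] == 0.0)
      continue;
    const double err = std::fabs(approx[i] - ref[i]) / std::fabs(ref[i]);
    max_err = std::max(max_err, err);
  }
  return max_err;
}
```

Unlike an averaged metric, a single badly rounded element dominates the result, which is useful when an fp16 output is checked against an fp32 reference.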

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ unittest ] Use bounded value generator in hgemm unittests
skykongkong8 [Mon, 15 Jul 2024 10:20:01 +0000 (19:20 +0900)]
[ unittest ] Use bounded value generator in hgemm unittests

- According to recent papers, values drawn from [0, 1) or [-1, 1) are widely used when comparing fp16 and fp32 precision.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ unittest ] Add TCs for checking padding-using GEMM
skykongkong8 [Mon, 15 Jul 2024 09:44:13 +0000 (18:44 +0900)]
[ unittest ] Add TCs for checking padding-using GEMM

- Add TCs checking for padding w.r.t. M, K, N, MK, KN, MKN

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Implement NYI functions from matrix A/B hgemm_padding
skykongkong8 [Mon, 15 Jul 2024 09:41:43 +0000 (18:41 +0900)]
[ hgemm ] Implement NYI functions from matrix A/B hgemm_padding

- Missing implementations might trigger unittest failures on Android.
- This patch adds padding support for all combinations of the following conditions: matrix A / B, trans / noTrans, M / K / N direction

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Implement matrix noTrans A w.r.t. MK padding
skykongkong8 [Fri, 12 Jul 2024 05:15:03 +0000 (14:15 +0900)]
[ hgemm ] Implement matrix noTrans A w.r.t. MK padding

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ trivial ] Fix typo and add missing doxygen tags
skykongkong8 [Wed, 10 Jul 2024 10:07:44 +0000 (19:07 +0900)]
[ trivial ] Fix typo and add missing doxygen tags

- Fix typo and add missing doxygen tags
- Add more exact explanation for doxygen tag briefs

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Move hgemm_padding related files to explicit directory
skykongkong8 [Wed, 10 Jul 2024 09:10:38 +0000 (18:10 +0900)]
[ hgemm ] Move hgemm_padding related files to explicit directory

- Padding matrices is not the optimal approach, but it can serve as a sub-optimal option for now.
- The final goal is to make this directory unnecessary and delete it.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Remove unnecessary K1 GEMM functions
skykongkong8 [Wed, 10 Jul 2024 08:43:39 +0000 (17:43 +0900)]
[ hgemm ] Remove unnecessary K1 GEMM functions

- From a memory-layout perspective, when K = 1 the matrix transpose condition has no effect on the GEMM algorithm.
- Remove all K1 noTrans / transA / transB / transAB functions and unify them into a single function.
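
Why a single function suffices can be seen in a small sketch. It assumes contiguous storage, so an M x 1 matrix and its 1 x M transpose are the same M values in memory; the name `gemm_k1_sketch` is hypothetical and the code uses plain `float` instead of fp16.

```cpp
#include <cstddef>
#include <vector>

// With K = 1, A (M x 1) and A^T (1 x M) occupy the same M contiguous values,
// and likewise B (1 x N) and B^T (N x 1). The product is an outer product,
// so one routine serves noTrans / transA / transB / transAB alike.
void gemm_k1_sketch(const std::vector<float> &a, const std::vector<float> &b,
                    std::vector<float> &c, std::size_t M, std::size_t N) {
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n)
      c[m * N + n] = a[m] * b[n];
}
```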

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm/refactor ] Refactor hgemm file structure
skykongkong8 [Wed, 10 Jul 2024 08:34:43 +0000 (17:34 +0900)]
[ hgemm/refactor ] Refactor hgemm file structure

- Kernel functions are used regardless of matrix transpose, so they need to be included from a separate file.
- To allow further optimal implementations of the matrix A / B / AB transpose blocking-kernel sequences, split them into separate files for convenience.
- The 'hgemm' function itself is better placed in the hgemm directory.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ unittest ] Add TC for K=1 hgemm case
skykongkong8 [Wed, 10 Jul 2024 04:38:53 +0000 (13:38 +0900)]
[ unittest ] Add TC for K=1 hgemm case

- Missing optimizations for the K=1 GEMM case were recently detected.
- Add a TC for that case accordingly.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ trivial/hgemm ] Move hgemm_K1 to hgemm directory
skykongkong8 [Wed, 10 Jul 2024 04:36:15 +0000 (13:36 +0900)]
[ trivial/hgemm ] Move hgemm_K1 to hgemm directory

- For consistency, hgemm_K1 function should reside under hgemm directory

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ trivial ] Add doxygen tags for hgemm padding functions
skykongkong8 [Wed, 10 Jul 2024 04:27:26 +0000 (13:27 +0900)]
[ trivial ] Add doxygen tags for hgemm padding functions

- Add doxygen tags for hgemm padding functions

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Implement packing-blocking-kernel sequence for hgemm transB
skykongkong8 [Wed, 10 Jul 2024 03:27:39 +0000 (12:27 +0900)]
[ hgemm ] Implement packing-blocking-kernel sequence for hgemm transB

- Previously, hgemm transB computation was relying on transposing the entire matrix and using non-transpose sequence.
- For optimal performance, matrix packing-blocking-kernel sequence for transB case is explicitly implemented.
- Note that the current implementation only supports the 8x16 GEMM kernel.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Separate source / header files for hgemm packing function
skykongkong8 [Wed, 10 Jul 2024 01:48:53 +0000 (10:48 +0900)]
[ hgemm ] Separate source / header files for hgemm packing function

- For easier implementation and maintenance of hgemm packing functions, separate them.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ Trivial/bugfix ] Add missing library to include
skykongkong8 [Wed, 10 Jul 2024 01:46:46 +0000 (10:46 +0900)]
[ Trivial/bugfix ] Add missing library to include

- add stdlib.h to hgemm_util.h

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
8 months ago[ hgemm ] Implement matrix padding function
skykongkong8 [Wed, 10 Jul 2024 01:42:10 +0000 (10:42 +0900)]
[ hgemm ] Implement matrix padding function

- Since the current kernel / blocking functions support fixed shapes only, implement a padding function as a temporary solution.
- Note that a flexible kernel / blocking implementation should be added for optimal performance.
- The current implementation has separate padding functions for matrix A and B, but they will eventually be handled by a single function.
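
Zero padding to a kernel-friendly shape can be sketched as below. Both function names are hypothetical, and the sketch covers only a non-transposed row-major matrix in `float`; padded zeros contribute nothing to the GEMM accumulation, so the valid part of the result is unchanged.

```cpp
#include <cstddef>
#include <vector>

// Rounds x up to the next multiple of block (block must be > 0).
std::size_t round_up_to_multiple(std::size_t x, std::size_t block) {
  return (x + block - 1) / block * block;
}

// Copies a rows x cols row-major matrix into a zero-filled
// padded_rows x padded_cols buffer (padded dims must not be smaller).
std::vector<float> pad_matrix_sketch(const std::vector<float> &src,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t padded_rows,
                                     std::size_t padded_cols) {
  std::vector<float> dst(padded_rows * padded_cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      dst[r * padded_cols + c] = src[r * cols + c];
  return dst;
}
```

The extra allocation and copy are the cost of this approach, which is consistent with the note that a flexible kernel would perform better.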

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months agofix: incorrect C/C++ preprocessor macro
MyungJoo Ham [Fri, 26 Jul 2024 05:52:27 +0000 (14:52 +0900)]
fix: incorrect C/C++ preprocessor macro

When -DENABLE_ENCODER is given, you do
 #ifdef ENABLE_ENCODER
not
 #ifdef DENABLE_ENCODER
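
A small self-contained illustration of the difference follows. The `#define` stands in for passing -DENABLE_ENCODER on the compiler command line, and the two constant names are hypothetical.

```cpp
// Stands in for passing -DENABLE_ENCODER on the compiler command line.
#define ENABLE_ENCODER

// The -D prefix belongs to the compiler flag, not to the macro name,
// so this check sees the macro.
#ifdef ENABLE_ENCODER
constexpr bool encoder_enabled = true;
#else
constexpr bool encoder_enabled = false;
#endif

// DENABLE_ENCODER is a different macro that is never defined,
// so code guarded this way is silently compiled out.
#ifdef DENABLE_ENCODER
constexpr bool misspelled_check_taken = true;
#else
constexpr bool misspelled_check_taken = false;
#endif
```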

CC: @baek2sm
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
9 months ago[bugfix] Resolves Android build warnings
Donghyeon Jeong [Mon, 22 Jul 2024 07:36:37 +0000 (16:36 +0900)]
[bugfix] Resolves Android build warnings

This PR resolves warnings that occur during the Android build. The list is as follows.

**Changes proposed in this PR:**
- Fix function that overrides virtual functions but is not marked override.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[bugfix] Android build error when fp16 is enabled
Donghyeon Jeong [Mon, 22 Jul 2024 07:34:52 +0000 (16:34 +0900)]
[bugfix] Android build error when fp16 is enabled

This PR fixes undefined-symbol errors for one of the tensor constructors.
The function implementation is moved to the header file to resolve this issue.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Tensor] Operational Improvements and Functionality Simplification
Donghyeon Jeong [Fri, 12 Jul 2024 07:33:11 +0000 (16:33 +0900)]
[Tensor] Operational Improvements and Functionality Simplification

This commit moves several operations implementations to each Tensor class for easier management.
This allows users to create a new data type Tensor without unnecessary modification to the Tensor class.

**Changes proposed in this PR:**
- static function Tensor::cat() uses each tensor's member function concat().
- Tensor::copy() logic is simplified by not differentiating by its data type.
- Tensor::copy_with_stride() uses an internal function to operate.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Tensor] Update newly added features
Donghyeon Jeong [Tue, 9 Jul 2024 11:57:46 +0000 (20:57 +0900)]
[Tensor] Update newly added features

This commit updates recently added features in tensor, including add_i_partial() and ele_mul().
The newly added functions have been implemented according to the revised tensor structure.

**Changes proposed in this PR:**
- Update Float/HalfTensor class with newly added function, add_i_partial().
- Apply BLAS operations in basic arithmetic operations in Tensor.
- height-width transpose in half-precision can be SIMD accelerated.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[bugfix] Fix issues occurred in Tensor class refactoring
Donghyeon Jeong [Fri, 8 Mar 2024 02:37:22 +0000 (11:37 +0900)]
[bugfix] Fix issues occurred in Tensor class refactoring

This commit aims to fix several issues that arose due to the refactoring of the Tensor class.

**Changes proposed in this PR:**
- In this commit, the copy constructor is implemented explicitly to prevent incorrect behavior of the default copy constructor.
- Tensor add_i() has been newly implemented to fix previous incorrect implementations.
- Add chain() function that returns LazyTensor

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Refactor] Deprecate TensorV2 and replace Tensor class with TensorV2
Donghyeon Jeong [Wed, 6 Mar 2024 05:31:48 +0000 (14:31 +0900)]
[Refactor] Deprecate TensorV2 and replace Tensor class with TensorV2

This commit deprecates the existing TensorV2 class and replaces Tensor class with the new TensorV2 class.
The previous Tensor class has been removed and all its usages have been updated to use the TensorV2 class.
Additionally, all instances of TensorV2 usage within the NNTrainer have been removed.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Application] Bug fix in RL example
Eunju Yang [Thu, 25 Jul 2024 01:52:14 +0000 (10:52 +0900)]
[Application] Bug fix in RL example

**Changes proposed in this PR:**
- This commit updates the DQN example.
- In the previous code, there was a bug: copying the main Net to the Target Net was not written as intended.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
9 months ago[Android] Add android test script
Donghyeon Jeong [Tue, 23 Jul 2024 10:16:35 +0000 (19:16 +0900)]
[Android] Add android test script

This patch adds a script to run unit tests on Android devices.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[trivial] remove unnecessary code
Donghyeon Jeong [Wed, 17 Jul 2024 03:24:37 +0000 (12:24 +0900)]
[trivial] remove unnecessary code

This PR removes the print statement that was previously added for debugging purposes.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Layer] Add missing activation types
SeoHyungjun [Wed, 3 Jul 2024 02:47:41 +0000 (11:47 +0900)]
[Layer] Add missing activation types

Some activation types were missing from EnumList.
Added missing types to EnumList.

Changed the order of ActivationType and EnumList to be the same.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: SeoHyungjun <hyungjun.seo@samsung.com>
9 months ago[ util ] Change name swish -> swiglu
skykongkong8 [Tue, 16 Jul 2024 03:06:29 +0000 (12:06 +0900)]
[ util ] Change name swish -> swiglu

- There was a typo in the swiglu function name. With the element-wise multiplication by Z, this function is swiglu, not swish.
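
The distinction can be written out in scalar form. This is a sketch using the standard definitions, with hypothetical function names; the utility in the repository operates on whole arrays.

```cpp
#include <cmath>

// swish(x) = x * sigmoid(x)
double swish(double x) { return x / (1.0 + std::exp(-x)); }

// swiglu gates swish(x) with a second input z via element-wise
// multiplication, which is what distinguishes it from plain swish.
double swiglu(double x, double z) { return swish(x) * z; }
```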

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[bugfix] Resolves Android build warnings
Donghyeon Jeong [Tue, 16 Jul 2024 00:34:06 +0000 (09:34 +0900)]
[bugfix] Resolves Android build warnings

This PR resolves warnings that occur during the Android build. The list is as follows.

**Changes proposed in this PR:**
- Resolve the warning that an explicitly defaulted function is implicitly deleted.
- Fix function that overrides virtual functions but is not marked override.
- Resolves clang warning on expression side effects.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[meson] fix typo error and add encoder option
Seungbaek Hong [Tue, 16 Jul 2024 12:14:33 +0000 (21:14 +0900)]
[meson] fix typo error and add encoder option

- fix 'ENABLE_ENCODER' option typo errors in llama application
- add 'enable_encoder' to meson option

After applying these modifications, I've checked that llama runs well.
(It works if you build with the enable_encoder option set to true.)

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
9 months agoRemove dangerous dummy meson dep
MyungJoo Ham [Thu, 11 Jul 2024 11:02:17 +0000 (20:02 +0900)]
Remove dangerous dummy meson dep

When a dependency library is installed with hardcoded scripts,
declare dependency with as much information as possible from
the installed package to detect dependency errors at build-time.

Don't add a dummy dependency for actual library dependencies.

Fixes #2673

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
9 months ago[Layer] add tanh-based approximate gelu activation function
Seungbaek Hong [Mon, 1 Jul 2024 11:41:52 +0000 (20:41 +0900)]
[Layer] add tanh-based approximate gelu activation function

- add tanh-based approximate gelu (tanh gelu) for vision transformer.
- rename quick gelu to sigmoid gelu (it's a sigmoid-based approximate gelu)
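
For reference, the two approximations are commonly written as below. The formulas and constants are the widely used textbook ones; whether the layer uses exactly these constants is an assumption, and the function names are hypothetical.

```cpp
#include <cmath>

// tanh-based approximation:
//   0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
double tanh_gelu(double x) {
  const double k = 0.7978845608028654; // sqrt(2 / pi)
  return 0.5 * x * (1.0 + std::tanh(k * (x + 0.044715 * x * x * x)));
}

// sigmoid-based approximation ("quick gelu"): x * sigmoid(1.702 * x)
double sigmoid_gelu(double x) { return x / (1.0 + std::exp(-1.702 * x)); }
```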

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
9 months ago[build] Added third party to include directories
Debadri Samaddar [Tue, 9 Jul 2024 09:59:16 +0000 (15:29 +0530)]
[build] Added third party to include directories

Added opencl/third_party folder to include directory

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
9 months ago[GPU/OpenCL] Initial version of RMSNorm Layer
ThummalaPallavi [Tue, 11 Jun 2024 09:24:49 +0000 (14:54 +0530)]
[GPU/OpenCL] Initial version of RMSNorm Layer

Added naive version of OpenCL implementation for RMSNorm Layer.
Incorporated kernel for ops used.
Added unit test for rmsnorm_layer_cl.

Signed-off-by: ThummalaPallavi <t.pallavi@samsung.com>
9 months agoREADME: add openssf best practice badge.
MyungJoo Ham [Fri, 5 Jul 2024 08:12:16 +0000 (17:12 +0900)]
README: add openssf best practice badge.

To prepare LF AI & Data project proposal, openssf best practice
should be registered.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
9 months ago[ CI ] modify android build test in action
jijoong.moon [Thu, 4 Jul 2024 05:23:30 +0000 (14:23 +0900)]
[ CI ] modify android build test in action

This PR fixes the duplicated build in the Android build action.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
9 months ago[blas/opencl] SGEMM OpenCL kernels added
Debadri Samaddar [Thu, 20 Jun 2024 10:28:02 +0000 (15:58 +0530)]
[blas/opencl] SGEMM OpenCL kernels added

Added all possible OpenCL kernels for SGEMM
Added unit tests

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
9 months ago[GPU/OpenCL] Moving Addition kernel to Tensor Directory
Yash Singh [Wed, 3 Jul 2024 08:42:46 +0000 (14:12 +0530)]
[GPU/OpenCL] Moving Addition kernel to Tensor Directory

Moved addition_cl kernel to Tensor directory.
Refactored addition_cl for generalization.

Signed-off-by: Yash Singh <yash.singh@samsung.com>
9 months ago[BUG FIX] Swiglu fp16 GPU Layer test filename mismatch
Niket Agarwal [Tue, 2 Jul 2024 11:50:07 +0000 (17:20 +0530)]
[BUG FIX] Swiglu fp16 GPU Layer test filename mismatch
Modified the swiglufp16 filename in gen_layer_tests to match the name in unittest_layers_swiglu_cl.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <niket.a@samsung.com>
9 months ago[GPU/OpenCL] Initial version of Reshape Layer with OpenCL ops
Niket Agarwal [Wed, 26 Jun 2024 06:21:20 +0000 (11:51 +0530)]
[GPU/OpenCL] Initial version of Reshape Layer with OpenCL ops

Added naive version of OpenCL implementation for Reshape Layer.
Incorporated kernel for ops used.
Added unit test for Reshape_layer_cl.

Signed-off-by: Niket Agarwal <niket.a@samsung.com>
9 months ago[layer] added start/end dimension in flatten layer
hyeonseok [Thu, 30 May 2024 11:32:28 +0000 (20:32 +0900)]
[layer] added start/end dimension in flatten layer

 - Until now, the flatten layer flattened all dimensions except batch.
   With this commit, it can flatten only a sub-range of dimensions.
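
The resulting shape computation can be sketched on a generic shape vector. The helper name and the inclusive start/end convention are assumptions for illustration; the layer's real property names and its use of nntrainer's own dimension type are not shown in this log.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: merges dims[start..end] (inclusive) into one dimension
// and keeps the others. Assumes start <= end < dims.size().
std::vector<std::size_t> flatten_shape(const std::vector<std::size_t> &dims,
                                       std::size_t start, std::size_t end) {
  std::vector<std::size_t> out(dims.begin(), dims.begin() + start);
  std::size_t merged = 1;
  for (std::size_t i = start; i <= end; ++i)
    merged *= dims[i];
  out.push_back(merged);
  out.insert(out.end(), dims.begin() + end + 1, dims.end());
  return out;
}
```

With a shape of (2, 3, 4, 5), flattening dimensions 1 to 3 reproduces the old behavior (everything except batch), while flattening 2 to 3 keeps the channel dimension intact.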

Signed-off-by: hyeonseok <hs89.lee@samsung.com>
9 months ago[FP16][Tensor] Remove unnecessary copy on save
Donghak PARK [Mon, 1 Jul 2024 05:51:37 +0000 (14:51 +0900)]
[FP16][Tensor] Remove unnecessary copy on save

There is an unnecessary copy of the tensor in the fp16 case.

It seems that during earlier development the tensor structure was not yet firmly established, so the save path forcibly converted to FP16.

Now, when performing getData<_FP16>(), it is automatically converted, so the process of putting every tensor one by one in the temp array is unnecessary and only slows things down.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
9 months ago[Tensor] Remove NaN check for integer
Donghyeon Jeong [Wed, 3 Jul 2024 02:28:28 +0000 (11:28 +0900)]
[Tensor] Remove NaN check for integer

Fixed-sized integer formats do not have a way of explicitly indicating invalid data.
Every possible value of an int is a number. Therefore, this commit removes the NaN check for int values.
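
One way to express this at the type level is sketched below. The function name `has_nan` and the overload-based dispatch are assumptions for illustration and are not taken from the Tensor class.

```cpp
#include <cmath>
#include <type_traits>
#include <vector>

// Floating-point element types can hold NaN, so every value is inspected.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, bool>::type
has_nan(const std::vector<T> &data) {
  for (const T &v : data)
    if (std::isnan(v))
      return true;
  return false;
}

// Integer element types have no NaN encoding, so no scan is needed.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, bool>::type
has_nan(const std::vector<T> &) {
  return false;
}
```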

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[CI] Add PR review from clang-format
Donghyeon Jeong [Tue, 2 Jul 2024 01:32:24 +0000 (10:32 +0900)]
[CI] Add PR review from clang-format

This PR enables the GitHub Actions bot to suggest changes directly based on clang-format.

**Changes proposed in this PR:**
- Add format-review options to enable Pull Request reviews from clang-format.
- Upgrade the cpp-linter version to 2.9.0 to meet minimum version requirements.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[Layer] Introduce `upsample2d` layer
heka1024 [Sun, 9 Jun 2024 10:55:33 +0000 (19:55 +0900)]
[Layer] Introduce `upsample2d` layer

Add `upsample2d` layer in nntrainer. This could be used in YOLO or other layers.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Co-authored-by: Boseong Seo <suzy13549@snu.ac.kr>
Co-authored-by: kimhan0515 <kimhan0515@gmail.com>
Signed-off-by: heka1024 <heka1024@gmail.com>
9 months ago[blas/OpenCL] Updated doxygen docs
Debadri Samaddar [Fri, 21 Jun 2024 08:51:28 +0000 (14:21 +0530)]
[blas/OpenCL] Updated doxygen docs

Modified doxygen docs as required

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
9 months ago[blas/OpenCL] Added multiply OpenCL kernel and unit test
Debadri Samaddar [Tue, 11 Jun 2024 07:25:46 +0000 (12:55 +0530)]
[blas/OpenCL] Added multiply OpenCL kernel and unit test

Added sscal equivalent kernel and multiply function.
Added unit test setup to test standalone kernels.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
9 months ago[Trivial] Fix Typo
Donghak PARK [Mon, 1 Jul 2024 05:11:49 +0000 (14:11 +0900)]
[Trivial] Fix Typo

Fix Typo at
    modified:   nntrainer/dataset/data_iteration.h
    modified:   nntrainer/dataset/data_producer.h
    modified:   nntrainer/dataset/databuffer.h
    modified:   nntrainer/dataset/dir_data_producers.cpp
    modified:   nntrainer/dataset/random_data_producers.cpp
    modified:   nntrainer/layers/preprocess_l2norm_layer.h
    modified:   nntrainer/layers/split_layer.cpp

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
9 months ago[ hgemm/trivial ] Use aligned memory allocation in K1 transpose non_M8_case
skykongkong8 [Fri, 28 Jun 2024 02:12:09 +0000 (11:12 +0900)]
[ hgemm/trivial ] Use aligned memory allocation in K1 transpose non_M8_case

- Since K1 GEMM does not use data packing, I did not use aligned memory allocation.
- However, aligned allocation is preferable when SIMD is used.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[ BLAS ] Implement transpose case functions for K=1 GEMM
skykongkong8 [Fri, 28 Jun 2024 02:03:59 +0000 (11:03 +0900)]
[ BLAS ] Implement transpose case functions for K=1 GEMM

- To cover transpose cases such as (1,M).T * (1,N) and all other transpose combinations, transpose with SIMD and then apply the original kernel.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[ hgemm ] Consider K=1 changes
skykongkong8 [Thu, 27 Jun 2024 07:25:54 +0000 (16:25 +0900)]
[ hgemm ] Consider K=1 changes

- The current implementation is rooted in general cases and thus optimizes only w.r.t. K accumulation.
- However, for an (M,1) x (1,N) computation, optimizations such as packing and transposing are of no use.
- Implementing an explicit kernel function for this case resolved the latency issue.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[Layer] Fix logic: SwiGLU Layer Training Incompatibility
Donghyeon Jeong [Thu, 27 Jun 2024 04:24:48 +0000 (13:24 +0900)]
[Layer] Fix logic: SwiGLU Layer Training Incompatibility

Currently, the SwiGLU layer using OpenCL operations does not support training/backpropagation.
Consequently, this commit updates the logic to report that training is not supported.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
9 months ago[ hgemm ] Use aligned memory allocation in transpose / padding gemm
skykongkong8 [Thu, 20 Jun 2024 11:17:47 +0000 (20:17 +0900)]
[ hgemm ] Use aligned memory allocation in transpose / padding gemm

- Using unaligned memory may trigger a SIGSEGV.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[ hgemm ] Use zero padding in Non-8-divisible GEMM case
skykongkong8 [Thu, 20 Jun 2024 10:56:08 +0000 (19:56 +0900)]
[ hgemm ] Use zero padding in Non-8-divisible GEMM case

- As a temporary solution, apply zero padding when K is not divisible by 8.
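
The reason zero padding is safe can be sketched as follows: the extra K columns of A and K rows of B are all zero, so they add nothing to any dot product. The helper names (`round_up_to_8`, `zero_pad`, `naive_gemm`) are hypothetical and `float` stands in for fp16; this is not the commit's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest multiple of 8 that is >= k.
inline std::size_t round_up_to_8(std::size_t k) { return (k + 7) / 8 * 8; }

// Copy a row-major rows x cols matrix into a zero-filled
// new_rows x new_cols matrix (hypothetical helper).
inline std::vector<float> zero_pad(const std::vector<float> &src,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t new_rows,
                                   std::size_t new_cols) {
  assert(new_rows >= rows && new_cols >= cols);
  assert(src.size() == rows * cols);
  std::vector<float> dst(new_rows * new_cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      dst[r * new_cols + c] = src[r * cols + c];
  return dst;
}

// Plain triple-loop GEMM, C = A (M x K) * B (K x N), all row-major.
inline std::vector<float> naive_gemm(const std::vector<float> &A,
                                     const std::vector<float> &B,
                                     std::size_t M, std::size_t N,
                                     std::size_t K) {
  std::vector<float> C(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t n = 0; n < N; ++n)
        C[m * N + n] += A[m * K + k] * B[k * N + n];
  return C;
}
```

Padding A from (M, K) to (M, K8) and B from (K, N) to (K8, N), with K8 = `round_up_to_8(K)`, lets a kernel that assumes an 8-divisible K run unchanged, at the cost of an extra copy.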

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[ hgemm/trivial ] Wrap multi-line expressions
skykongkong8 [Thu, 20 Jun 2024 06:08:58 +0000 (15:08 +0900)]
[ hgemm/trivial ] Wrap multi-line expressions

- Wrapping multi-line expressions can prevent unwanted function calls.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
9 months ago[ trivial ] Fix typo
skykongkong8 [Thu, 27 Jun 2024 01:17:52 +0000 (10:17 +0900)]
[ trivial ] Fix typo

- Found duplicated TC name while creating GPU unittest cases

Resolves:
[arm64-v8a] Executable     : unittest_layers
ld: error: duplicate symbol: addition_w16a16
>>> defined at unittest_layers_addition.cpp:33 (../unittest/layers/unittest_layers_addition.cpp:33)
>>>            /home/sungsik/nntrainer/test/obj/local/arm64-v8a/objs-debug/unittest_layers/__/unittest/layers/unittest_layers_addition.o:(addition_w16a16)
>>> defined at unittest_layers_addition_cl.cpp:47 (../unittest/layers/unittest_layers_addition_cl.cpp:47)
>>>            /home/sungsik/nntrainer/test/obj/local/arm64-v8a/objs-debug/unittest_layers/__/unittest/layers/unittest_layers_addition_cl.o:(.bss.addition_w16a16+0x0)
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[GPU/OpenCL] Initial version of SwiGLU Layer with OpenCL ops
Niket Agarwal [Thu, 6 Jun 2024 11:06:23 +0000 (16:36 +0530)]
[GPU/OpenCL] Initial version of SwiGLU Layer with OpenCL ops

Added naive version of OpenCL implementation for SwiGLU Layer.
Incorporated kernel for ops used.
Added unit test for SwiGLU_layer_cl.

Signed-off-by: Niket Agarwal <niket.a@samsung.com>
10 months ago[GPU/OpenCL] Added fp16 support for Addition Layer on GPU
yash.singh [Thu, 6 Jun 2024 13:39:26 +0000 (19:09 +0530)]
[GPU/OpenCL] Added fp16 support for Addition Layer on GPU

Added fp16 support for Addition layer
Added unit tests for fp16 support
Updated the Layer Semantics for GPU

Signed-off-by: yash.singh <yash.singh@samsung.com>
10 months ago[GPU/OpenCL] Addition Kernel added in reusable blas OpenCL kernels
yash.singh [Tue, 28 May 2024 07:01:53 +0000 (12:31 +0530)]
[GPU/OpenCL] Addition Kernel added in reusable blas OpenCL kernels

Added addition kernel to enhance reusability of the common blas kernels.
Used AdditionLayer interface for both CPU and GPU calls.

Signed-off-by: yash.singh <yash.singh@samsung.com>
10 months ago[GPU/OpenCL] Initial version of Addition Layer with OpenCL ops
yash.singh [Thu, 23 May 2024 10:42:12 +0000 (16:12 +0530)]
[GPU/OpenCL] Initial version of Addition Layer with OpenCL ops

Added naive version of OpenCL implementation for Addition Layer.
Incorporated kernel for ops used.
Added unit test for addition_layer_cl.

Signed-off-by: yash.singh <yash.singh@samsung.com>
10 months ago[DOCS] Update README.md
Jubilee.Yang [Thu, 20 Jun 2024 00:31:14 +0000 (09:31 +0900)]
[DOCS] Update README.md

- Replace the outdated link with the current one.

Co-authored-by: Donghyeon Jeong <54725479+djeong20@users.noreply.github.com>
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
10 months ago[Docs/trivial] fix typo in main `README.md`
Eunju Yang [Tue, 18 Jun 2024 04:06:05 +0000 (13:06 +0900)]
[Docs/trivial] fix typo in main `README.md`

- This commit fixes typo in README.md

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
10 months ago[Docs] add recent proceeding to main README.md
Eunju Yang [Tue, 18 Jun 2024 04:03:43 +0000 (13:03 +0900)]
[Docs] add recent proceeding to main README.md

- This commit updates `README.md` to include a recent publication and its
citation.

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
10 months agoaction/ubuntu: fp16 on/off handled by matrix
MyungJoo Ham [Thu, 20 Jun 2024 05:45:15 +0000 (14:45 +0900)]
action/ubuntu: fp16 on/off handled by matrix

To de-dup scripts handling fp16-enabled and fp16-disabled
cases for meson-clean build, add another matrix entry.

Reference: https://github.com/nnstreamer/nntrainer/pull/2641/files#r1646986341

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ layer ] Bugfix for enabling unittest_models on Android
skykongkong8 [Thu, 30 May 2024 04:09:06 +0000 (13:09 +0900)]
[ layer ] Bugfix for enabling unittest_models on Android

- This commit fixes an invalid memory access in the cross-compiled unittest executable on Android.

Resolves:
> SIGSEGV : signal segmentation violation
- lldb | signal SIGSEGV: invalid address
- SIGILL : illegal instruction

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[Application] Bug fix about meson setting
Seungbaek Hong [Fri, 7 Jun 2024 08:44:10 +0000 (17:44 +0900)]
[Application] Bug fix about meson setting

Currently, PICO GPT and LLAMA add the extra_defines meson option on the application side.

However, even though this code is executed during the build, the definition is not reflected when the app actually runs.

This is because the application area is built after extra_defines has already been passed to add_project_arguments, so adding to extra_defines during the application build has no effect.

In addition, it is impossible to call add_project_arguments after the build, so adding extra_defines during the build process is structurally wrong.

PICO GPT and LLAMA added extra_defines because the current encoder-related script does not run on Tizen. Therefore, an encoder-related option was added to the root meson file and the options on the application side were removed.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
10 months ago[refactor] Moved blas_kernels to tensor directory
Debadri Samaddar [Wed, 5 Jun 2024 10:20:35 +0000 (15:50 +0530)]
[refactor] Moved blas_kernels to tensor directory

Moved common OpenCL blas kernels to tensor directory.
Added common pre-processing functions that can be reused.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[refactor] Removed experimental OpenCL kernel files
Debadri Samaddar [Wed, 5 Jun 2024 10:18:39 +0000 (15:48 +0530)]
[refactor] Removed experimental OpenCL kernel files

Removed OpenCL kernel files used for experiments.
There are no dependencies on these files.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months agoactions: gbs build test
MyungJoo Ham [Tue, 4 Jun 2024 09:56:09 +0000 (18:56 +0900)]
actions: gbs build test

Run gbs build for x64/x86/aarch64/armv7l Tizen.
This is imported from nnstreamer.git.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: Yocto devtool test
MyungJoo Ham [Thu, 13 Jun 2024 10:31:06 +0000 (19:31 +0900)]
action: Yocto devtool test

Test if a pull request breaks Yocto build or not.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[CI] Add fp-16 build in github action
heka1024 [Mon, 17 Jun 2024 19:22:10 +0000 (04:22 +0900)]
[CI] Add fp-16 build in github action

Closes #2560. The fp16 build results are now visible in a GitHub Action.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: heka1024 <heka1024@gmail.com>
10 months agoaction: add Ubuntu pdebuild
MyungJoo Ham [Thu, 13 Jun 2024 10:28:48 +0000 (19:28 +0900)]
action: add Ubuntu pdebuild

Run pdebuild to check that a change does not break PPA builds.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ layer ] Optimize LSTM fp16 computation
skykongkong8 [Tue, 18 Jun 2024 00:54:32 +0000 (09:54 +0900)]
[ layer ] Optimize LSTM fp16 computation

Using the add_i_partial function in the LSTM layer reduces the number of if/def code blocks and also lowers the function's latency.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ Tensor ] Implement add_i_partial
skykongkong8 [Tue, 18 Jun 2024 00:50:34 +0000 (09:50 +0900)]
[ Tensor ] Implement add_i_partial

- Occasionally, the add_i computation is needed only for a section of interest.
- Moreover, this function can move if/def code blocks down from the layer level.
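
As a rough illustration of a partial in-place add, the sketch below accumulates a slice of one buffer into a slice of another. `add_in_place_partial` is a hypothetical free function with an assumed parameter list. It is not the signature of nntrainer's `add_i_partial`, which this log does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a partial in-place add:
// dst[dst_offset + i] += alpha * src[src_offset + i] for i in [0, len).
inline void add_in_place_partial(std::vector<float> &dst,
                                 std::size_t dst_offset,
                                 const std::vector<float> &src,
                                 std::size_t src_offset, std::size_t len,
                                 float alpha = 1.0f) {
  // Both slices must lie fully inside their buffers.
  assert(dst_offset <= dst.size() && len <= dst.size() - dst_offset);
  assert(src_offset <= src.size() && len <= src.size() - src_offset);
  for (std::size_t i = 0; i < len; ++i)
    dst[dst_offset + i] += alpha * src[src_offset + i];
}
```

Because only the requested range is touched, the caller does not need to slice or copy the tensors first.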

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[Doc] Update activation function in `README.md`
heka1024 [Mon, 17 Jun 2024 19:05:51 +0000 (04:05 +0900)]
[Doc] Update activation function in `README.md`

Sync the list of supported activation functions with `README.md`.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: heka1024 <heka1024@gmail.com>
10 months agoandroid: consistent ML_API_COMMON macro
MyungJoo Ham [Thu, 13 Jun 2024 07:56:10 +0000 (16:56 +0900)]
android: consistent ML_API_COMMON macro

ML_API_COMMON macro has been inconsistent for Android build,
where it is force-defined 1 in Android.mk while it may
become 0 or 1 depending on build system in meson.

Because android build uses meson and Android.mk simultaneously,
this must become consistent.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: add Android build test
MyungJoo Ham [Wed, 12 Jun 2024 06:36:56 +0000 (15:36 +0900)]
action: add Android build test

Android is the major release target of nntrainer.
Build and run test cases for Android.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: add check if rebuild required module
MyungJoo Ham [Thu, 13 Jun 2024 04:23:33 +0000 (13:23 +0900)]
action: add check if rebuild required module

Import check-if-rebuild-requires module from nnstreamer.git

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ hgemm ] Use hgemm kernel in transpose cases
skykongkong8 [Mon, 17 Jun 2024 23:49:23 +0000 (08:49 +0900)]
[ hgemm ] Use hgemm kernel in transpose cases

- With the SIMD version of the fp16 transpose code, using the hgemm kernel in transpose cases becomes worthwhile.
- Note that a data packing routine for this case should be developed for further optimization.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[trivial] fix typo error
Donghyeon Jeong [Tue, 18 Jun 2024 02:31:10 +0000 (11:31 +0900)]
[trivial] fix typo error

Fix typo error
- README.md

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
10 months agoFixed the build error for gcc-14
wchang kim [Mon, 10 Jun 2024 04:49:16 +0000 (13:49 +0900)]
Fixed the build error for gcc-14

This is imported from review.tizen.org

Change-Id: I80e2332711ae405488b39eaf060384e7490a7c45
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ layer ] Enable mha gtest and match version
skykongkong8 [Mon, 3 Jun 2024 09:31:10 +0000 (18:31 +0900)]
[ layer ] Enable mha gtest and match version

- The current mha layer in nntrainer/layer is not for general use; it was implemented solely for LLaMA support.
- To run the unittest for the mha layer, revert to the previous version of the mha layer and move the current implementation under Application/LLaMA.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[bugfix/unittest] Using LayerSemanticsGpu for FC Layer test
Debadri Samaddar [Wed, 5 Jun 2024 05:34:12 +0000 (11:04 +0530)]
[bugfix/unittest] Using LayerSemanticsGpu for FC Layer test

Using newly added LayerSemanticsGpu for FC Layer GPU unittests.
Renaming fp16 unit test variable to avoid duplicate declaration when all tests are run.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[ docs ] Add lldb-server debugger guide file
skykongkong8 [Wed, 5 Jun 2024 01:23:06 +0000 (10:23 +0900)]
[ docs ] Add lldb-server debugger guide file

- lldb is quite useful for attaching a debugger to the Android unittest.
- Add some guidelines for attaching it.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ neon/trivial ] Compare float scaling factors more precisely
skykongkong8 [Mon, 3 Jun 2024 10:55:31 +0000 (19:55 +0900)]
[ neon/trivial ] Compare float scaling factors more precisely

- for zero-comparison, use std::fpclassify
- for 1.0-comparison, use std::numeric_limits and epsilon
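
The two comparisons can be sketched as below, together with one plausible use: skipping or simplifying a scaling step. The helper names (`is_zero`, `is_one`, `scale_in_place`) are hypothetical, and the exact tolerance used in the commit is not shown in this log, so a single `epsilon` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Zero check via std::fpclassify; matches both +0.0f and -0.0f.
inline bool is_zero(float x) { return std::fpclassify(x) == FP_ZERO; }

// 1.0 check with a tolerance of one machine epsilon (assumed tolerance).
inline bool is_one(float x) {
  return std::fabs(x - 1.0f) <= std::numeric_limits<float>::epsilon();
}

// Hypothetical example of why the checks matter: skip or simplify the
// scaling work when the factor is 1 or 0.
inline void scale_in_place(float *x, std::size_t n, float alpha) {
  assert(x != nullptr || n == 0);
  if (is_one(alpha))
    return; // multiplying by 1 changes nothing
  if (is_zero(alpha)) {
    std::fill(x, x + n, 0.0f); // plain store instead of a multiply
    return;
  }
  for (std::size_t i = 0; i < n; ++i)
    x[i] *= alpha;
}
```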

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ Trivial ] Fix typo and use better iterating index
skykongkong8 [Thu, 23 May 2024 00:28:48 +0000 (09:28 +0900)]
[ Trivial ] Fix typo and use better iterating index

- Fix typo for hgemm kernels docs
- Use fixed size4, size8 instead of getting value every time

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ hgemm ] Support scaling factor beta in kernel-based hgemm
skykongkong8 [Tue, 14 May 2024 07:29:53 +0000 (16:29 +0900)]
[ hgemm ] Support scaling factor beta in kernel-based hgemm

- This commit allows hgemm to take a beta scaling factor as well.
- Here, beta is defined as follows:
C = alpha * A * B + beta * C
- In addition, add zero-initialization code for the beta = 0.F case. According to recent model profiling results, minimizing instructions even during initialization helps reduce overall model latency.
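
A scalar reference for the formula above, including the beta = 0 shortcut, might look like this. It is a sketch, not the hgemm kernel: `gemm_ref` is a hypothetical name, `float` stands in for fp16, and all matrices are assumed to be row-major.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical reference: C = alpha * A (M x K) * B (K x N) + beta * C.
inline void gemm_ref(const float *A, const float *B, float *C, std::size_t M,
                     std::size_t N, std::size_t K, float alpha, float beta) {
  assert(A != nullptr && B != nullptr && C != nullptr);
  if (std::fpclassify(beta) == FP_ZERO) {
    // beta == 0: overwrite C with zeros instead of reading and scaling it.
    std::fill(C, C + M * N, 0.0f);
  } else {
    for (std::size_t i = 0; i < M * N; ++i)
      C[i] *= beta;
  }
  // Accumulate alpha * A * B on top of the pre-scaled C.
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t n = 0; n < N; ++n)
        C[m * N + n] += alpha * A[m * K + k] * B[k * N + n];
}
```

Zero-filling in the beta = 0 branch avoids a load and a multiply per element, and it also means stale values in C cannot leak into the result.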

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>