review.tizen.org Git - platform/core/ml/nntrainer.git/log

[svace] fix some svace issues

fixed some svace issues.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[tizen] fix coverity issues in application

fix some coverity issues (exception handling issues)

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[tizen] fix coverity issues

fix some coverity issues.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[tizen] fix coverity issues in application

update exception handling for applications.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[tizen] set nnstreamer_trainer flag

set nnstreamer_trainer value to 1 for tizen build

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[trivial] add header files

added missing header files for tizen branch

**Self evaluation:**
Build test: [x]Passed [ ]Failed [ ]Skipped
Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[ coverity ] Fix coverity issue

Resolves:
- double_unlock: ~unique_lock unlocks lk while it is unlocked.
By applying:
- letting unique_lock go out of scope naturally by limiting the scope of the lock.
- In this case, the approach of #2913 (using std::lock_guard) cannot be applied, since the wait() function of std::condition_variable is called and requires a std::unique_lock.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Change-Id: I6e6ac92883931448407ceeb167c85b9ed7007859
Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[coverity] resolve exception handle issue

- Resolve exception handle issue by adding try catch statement
- Remove noexcept

Signed-off-by: hyeonseok <hs89.lee@samsung.com>

[Bug] fix memory leakage problem in inference mode

This patch fixes a memory leak in inference mode.

The leak occurred because, even after deallocation was performed,
vectors inside the memory pool still remained.

**Self evaluation:**

Build test: [x]Passed [ ]Failed [ ]Skipped
Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[Coverity] #1840581 catch std::bad_function_call

Catch a possible `std::bad_function_call` exception that can be thrown
by `Tensor::getData()`.

Signed-off-by: Daekyoung Jung <dk11.jung@samsung.com>

[Coverity] #1840519 Uncaught exception

Add catch block to cope with an std::system_error that could be thrown
by `nntrainer::AppContext::Global()`.

Signed-off-by: Daekyoung Jung <dk11.jung@samsung.com>

[Coverity] #1839950 Catch uncaught exception

Catch a `std::bad_array_new_length` that can be raised during array
indexing.

Signed-off-by: Daekyoung Jung <dk11.jung@samsung.com>

[Coverity] Fix Coverity issue at KNN Application

Uncaught exception (UNCAUGHT_EXCEPT)
- root_function: In function main(int, char **) an exception of type std::bad_cast is thrown and never caught
- to fix the issue, add a try-catch statement at writefile

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>

[Coverity] resolve coverity issues

This pull request aims to resolve various Coverity issues.
The following warning group IDs are addressed:

Warning Group ID
- #1832819
- #1833528
- #1834338
- #1837520
- #1839030
- #1841393
- #1837925
- #1838882
- #1839129
- #1839950
- #1840581
- #1840832

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Change-Id: I74a843793789b60c476d831bd53ce085e69cd163
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Coverity] Fix Coverity issue on ini_interpreter (path nullptr exception)

Resolves the issue below:
- Return value of a function 'iniparser_getstring' is dereferenced at ini_interpreter.cpp:350 without checking for null, but it is usually checked for this function

Add NNTR_THROW_IF() to check whether backbone_path is null.
When backbone_path is nullptr, an error message is printed.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
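
The null check described in the ini_interpreter fix above can be sketched as follows. This is a minimal illustration, not the actual nntrainer code: `lookup()` is a hypothetical stand-in for a C-style getter that may return a null pointer, and a plain `throw` is used in place of the project's `NNTR_THROW_IF()` macro.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style lookup that may return nullptr
// when the requested key is not present.
const char *lookup(const char *key) {
  return std::string(key) == "backbone" ? "model.ini" : nullptr;
}

// Validate the raw pointer before constructing a std::string from it.
std::string getPath(const char *key) {
  const char *path = lookup(key);
  if (path == nullptr) {
    throw std::invalid_argument(std::string("no path set for key: ") + key);
  }
  return std::string(path);
}
```

Checking the pointer at the point where it is obtained keeps the error message close to its cause.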

[Coverity] Fix DIVIDE_BY_ZERO issue

- This commit resolves coverity issue of DIVIDE_BY_ZERO.
- This commit updates preprocess_l2norm_layer.cpp/.h

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
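
One common way to avoid a DIVIDE_BY_ZERO report in an L2 normalization step is to clamp the norm before dividing. The sketch below is an assumption about the general technique only; the function name `l2Normalize` and the `eps` parameter are hypothetical, and the actual change in preprocess_l2norm_layer.cpp may differ.

```cpp
#include <cmath>
#include <vector>

// Normalize v in place to unit L2 norm. A zero (or near-zero) norm is
// clamped to eps so the division is always well defined.
void l2Normalize(std::vector<float> &v, float eps = 1e-12f) {
  float sum_sq = 0.0f;
  for (float x : v) {
    sum_sq += x * x;
  }
  float norm = std::sqrt(sum_sq);
  if (norm < eps) {
    norm = eps;
  }
  for (float &x : v) {
    x /= norm;
  }
}
```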

[ coverity ] Fix coverity issue

- Fixes:
leaked_storage: Variable handle going out of scope leaks the storage it points to.
- Apply:
Use std::move(factory_func) instead of factory_func.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[Coverity] Fix Coverity issue at neuralnet destructor

Uncaught exception (UNCAUGHT_EXCEPT)

exn_spec_violation: An exception of type std::length_error is thrown but the exception specification /*implicit*/noexcept doesn't allow it to be thrown. This will result in a call to terminate().
- in general, throwing an exception from a destructor is not allowed
- so, add exception handling when deallocating the Tensor

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
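
The destructor issue above comes down to keeping exceptions from escaping an implicitly `noexcept` destructor. A minimal sketch of that pattern follows; `Resource`, `Owner`, and `g_swallowed` are hypothetical names for illustration and do not correspond to nntrainer classes.

```cpp
#include <stdexcept>

// Counts cleanup errors that were caught inside a destructor (illustrative).
int g_swallowed = 0;

// Hypothetical resource whose cleanup step may throw.
struct Resource {
  bool fail_on_release = false;
  void release() {
    if (fail_on_release) {
      throw std::length_error("release failed");
    }
  }
};

class Owner {
public:
  explicit Owner(bool fail) { res.fail_on_release = fail; }

  // Destructors are implicitly noexcept; an escaping exception would call
  // std::terminate(), so cleanup errors are caught here instead.
  ~Owner() {
    try {
      res.release();
    } catch (const std::exception &) {
      ++g_swallowed; // record or log the failure; never rethrow
    }
  }

private:
  Resource res;
};
```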

[Coverity] Fix double_unlock issue in markFilled()

- The `unique_lock` was manually unlocked before going out of scope,
causing a double unlock issue.
- This has been fixed by removing the explicit `unlock()` call.
- Additionally, `notify_emptied_cv.notify_all()` is now called without
holding the lock to avoid unnecessary contention.

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
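
The locking pattern from the markFilled() fix above can be sketched as below: the lock is released by scope rather than by an explicit `unlock()`, and the notification is sent after the lock is gone. The `Slot` class and its members are hypothetical; only the standard library calls are real.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical single-slot buffer; names are illustrative, not nntrainer's.
class Slot {
public:
  // Producer side: mark the slot as filled and wake any waiting thread.
  void markFilled() {
    {
      std::lock_guard<std::mutex> lk(mtx); // released at the closing brace
      filled = true;
    }
    // Notify after the lock is released so woken threads do not block on it.
    cv.notify_all();
  }

  // Consumer side: wait() needs std::unique_lock because it unlocks and
  // relocks the mutex internally. No manual unlock() is required.
  void waitFilled() {
    std::unique_lock<std::mutex> lk(mtx);
    cv.wait(lk, [this] { return filled; });
  }

  bool isFilled() {
    std::lock_guard<std::mutex> lk(mtx);
    return filled;
  }

private:
  std::mutex mtx;
  std::condition_variable cv;
  bool filled = false;
};
```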

Memory Leak Fixes in NNTrainer

This PR addresses memory leak issues in the NNTrainer.

**Changes proposed in this PR:**
- Release MemoryData pointer in the Tensor constructor using the vector.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Change-Id: Ia59d891439f1507a16b3fc252731c25993cf7b26
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[UTIL] fix the bug in nntr parallel run

This PR fixes the nntr parallel run. Previously, it was not set properly
according to the NNTR-NUM-THREADS option in meson_options.txt.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ Tizen7.0 ] Include some headers in -dev header for neuralnet.h

- In the previous PR (77e56f1), neuralnet.h was included in dev package.
- However, some headers used in neuralnet.h were missing.
- This PR adds the headers that neuralnet.h depends on.
- This PR was tested to confirm that it supports the ReinforcementLearning app on
Tizen7.0.

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Change-Id: Ie7e1ee5361ceabed14c2714c5091964f28e0f5cb
Signed-off-by: Eunju Yang <ej.yang@samsung.com>

[ Tizen7.0 ] Include neuralnet.h in -dev header

- Update the code to include `neuralnet.h` in -dev header.
- Some applications, e.g., ReinforcementLearning, use `forwarding` and
`backwarding` directly. To support this, this commit adds the header to the
dev package.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <ej.yang@samsung.com>

[bugfix] fix coverity issues

This PR resolves coverity issues in the ShortTensor class.
Replace the max_abs() implementation with maxValue(), since the maximum absolute value of unsigned integers equals the maximum value.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
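
The reasoning in the ShortTensor fix above (for unsigned data, |x| equals x) can be illustrated with a short sketch. The free function below is hypothetical and works on a `std::vector`; the real `maxValue()` is a tensor member whose signature is not shown in the log.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// For unsigned data |x| == x, so the maximum absolute value is simply the
// maximum element. Returns 0 for an empty input (an assumption of this sketch).
uint16_t maxValue(const std::vector<uint16_t> &data) {
  if (data.empty()) {
    return 0;
  }
  return *std::max_element(data.begin(), data.end());
}
```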

[Tizen7.0] Tizen7.0 Backporting

- This commit adds some updates for Tizen7.0 backporting
- Type mismatch bug is fixed.
- Unused variable is removed.
- Missing header files are added in spec file.
- spec file is updated

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <ej.yang@samsung.com>

[ NNStreamer ] disable nnstreamer trainer

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ SPEC ] change fp16

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

temporary code for layer initialization

- Temporary code for layer initialization

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>

[BUILD] Remove Flag and Add FLEXIBLE PAGE Option

Remove Flag on Android.mk
Add APP_SUPPORT_FLEXIBLE_PAGE_SIZES to Application.mk

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>

[BUILD] add APP_SUPPORT_FLEXIBLE_PAGE_SIZES

To support the 16k page size, set APP_SUPPORT_FLEXIBLE_PAGE_SIZES
to true, according to the Android guide.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>

[BUILD] Add more 16k shared lib package option on Android.mk

After #2699: add the option to all Android.mk files

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>

[ BUILD ] Add 16K shared lib package option for Android

Android encourages the use of a 16KB package for the shared library. This PR
adds the 16KB package option and also recommends using an NDK version
of r27 or higher.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[enhance] Using 64 bit for LayerKernel enum

Enhanced LayerKernel enum and mask for 64-bit values

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
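
As a rough illustration of why a 64-bit underlying type matters for an enum used as a bit mask, consider the sketch below. The `Kernel` enumerators and helper functions are hypothetical; the log does not show the real LayerKernel values or how the mask is applied.

```cpp
#include <cstdint>

// Hypothetical kernel IDs; each occupies one bit of a 64-bit mask.
enum class Kernel : uint64_t {
  A = 1ULL << 0,
  B = 1ULL << 1,
  Z = 1ULL << 40, // would not fit in a 32-bit underlying type
};

constexpr uint64_t toMask(Kernel k) { return static_cast<uint64_t>(k); }

// True when the bit for kernel k is set in mask.
constexpr bool hasKernel(uint64_t mask, Kernel k) {
  return (mask & toMask(k)) != 0;
}
```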

[Tensor] ShortTensor class with unsigned 16-bit integer

In this PR, a new type of tensor, the ShortTensor class, is designed explicitly for handling unsigned 16-bit integer data types.
This new tensor class aims to provide users with more options when working with various data types.
Note that the ShortTensor class does not support mathematical operations like multiplication or addition.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[ blas/bugfix ] Fix irrelevant function call

- Since the current function implementations do not use CBLAS params, they should directly call functions from cblas.h

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[bugfix] Resolve fp16 enabled build error

This PR resolves the build error after #2704 when enable_fp16 is true.

This fixes:
blas_interface.cpp:141:9: error: ‘order’ was not declared in this scope
  141 |   sgemv(order, TransA, M, N, alpha, A_, lda, X_, incX, beta, Y_, incY);
      |         ^~~~~

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [ ]Passed [X]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[ Tensor ] Remove CBLAS params from Tensor related files.

- Remove CBLAS params from tensor-related files, since nntrainer is no longer fully dependent on CBLAS.
- Making tensors aware of CBLAS-related parameters made no sense in the first place.
- CBLAS params will be declared only where functions from cblas are called.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ CAPI ] fix the Native API Ref Doc

Add MODULE in submodule name in doc file.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[coverity] fix coverity issue

This PR resolves the coverity issues of resource leak, unreachable code, and missing break.

**Changes proposed in this PR:**
- use static arrays instead of dynamic allocation to avoid resource leaks.
- remove unreachable code and add missing break statement.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[GPU/OPENCL] RMSNorm Accuracy Fix

The alpha values were not picked correctly.

Signed-off-by: Thummala Pallavi <t.pallavi@samsung.com>

[Layer] enhance ConcatLayer algorithms for efficient concatenation and split

This PR renovates reshape/concatenation algorithms to facilitate efficient concatenation and split in ConcatLayer.

Previously, dimension 2 (height) was set as a standard axis to operate concatenation.
However, this causes an overhead by copying a tensor size of 1 when the concat dimension is 3 (width).

The new algorithm consolidates all dimensions to the first and last axes based on the concat dimension, sets the standard axis to be 3, and performs concat and split.

**Changes proposed in this PR:**
- Revise creating helper dimension logic in finalize().
- Update forwarding() and calcDeriv() workflow to be efficient.
- Add descriptions for the new concat algorithm.

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
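
The consolidation idea in the ConcatLayer commit above can be sketched on flat buffers: every input is viewed as a 2-D `[outer x inner]` array, where `outer` is the product of the dimensions before the concat axis and `inner` is the concat dimension times everything after it. This is a simplified sketch under those assumptions; `View2D` and `concatLastAxis` are hypothetical names, not the layer's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// A flat row-major buffer viewed as [outer x inner]. `outer` is shared by all
// inputs; `inner` differs per input when the concat dimension differs.
struct View2D {
  const float *data;
  std::size_t inner;
};

// Concatenate all inputs along the last (consolidated) axis.
std::vector<float> concatLastAxis(const std::vector<View2D> &inputs,
                                  std::size_t outer) {
  std::size_t total_inner = 0;
  for (const View2D &in : inputs) {
    total_inner += in.inner;
  }

  std::vector<float> out(outer * total_inner);
  for (std::size_t row = 0; row < outer; ++row) {
    float *dst = out.data() + row * total_inner;
    for (const View2D &in : inputs) {
      // One contiguous copy per input per row.
      std::copy(in.data + row * in.inner, in.data + (row + 1) * in.inner, dst);
      dst += in.inner;
    }
  }
  return out;
}
```

Because each copy covers a whole `inner` run, concatenating along the width axis no longer degenerates into element-by-element copies.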

[Tensor] Add optional output tensor for tensor concatenation

This PR adds an optional feature in Tensor::cat to pass the output tensor to the function.
This change allows the user-given tensor to store the result of the concatenation without creating a new tensor.

**Changes proposed in this PR:**
- Add optional argument output (the output tensor) to the cat function.
- Add negative test cases for tensor concatenation.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Android] Support Android NDK r27 and higher

This PR enables NNTrainer to use Android NDK r27 to support compiling 16 KB-aligned shared libraries.

While -mfpu is ignored and the -mfloat-abi option is not valid with AArch64 targets, removing these options has no effect on using current NEON instructions for armv8.2.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Layer] Improve forwarding logic of ConcatLayer

This PR updates current ConcatLayer forwarding for faster computation.

**Changes proposed in this PR:**
- Utilize the Tensor::concat() operation to perform forwarding and replace manual mapping and copying.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Tensor] CharTensor class with signed 8-bit integer

In this PR, a new type of tensor, the CharTensor class, is designed explicitly for handling signed 8-bit integer data types that have already undergone quantization.
This new tensor class aims to provide users with more options when working with tensors and their respective data types.
Currently, the CharTensor class does not support mathematical operations like multiplication or addition. However, these features will be added in future updates.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[ matrix_transpose/bugfix ] Prevent reading/saving data from/to unallocated memory

- The previous transpose kernel occasionally loaded/saved unallocated memory and then masked it.
- Now, it does not read such memory in the first place, but loads with a for-loop.
- This slows down the fp16 matrix transpose, but it will not be dominant in total model latency.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Generalize redundant micro hgemm kernel implementation

- Previous implementation naively used fixed-sized ukernels for the K-direction accumulation.
- Such kernels were excessively long, but had better performance than looping through single K-iteration.
- However, recent test results have shown that just stacking 4 K iterations and looping through such a ukernel preserves the performance with better code readability.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[Layer] add Weight Layer

- This layer contains only weights for building tensor-level graph

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[ hgemm ] Apply hgemm util funcs at frequently used functions

- get_prev_mltpl_of_2p_n is frequently used in many hgemm kernels.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ trivial ] Add missing docs and error message

- Add missing doxygen tags: transpose boolean params
- error message: emit an error when trying to use the full-fp16 kernel with the experimental kernel build

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Add hgemm experimental kernel

- According to current paper, accumulating up to 64 ~ 128 w.r.t. K-direction is fine.
- Since the conventional error metric and the newly introduced metric (max component relative error) are both fine as well, introduce an experimental kernel.
- The build option -Dhgemm-experimental-kernel=true enables this kernel for Android builds.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Implement hgemm_small

- Forcibly adding zero-padding made small-dimension indexing quite clumsy and redundant.
- Implement an explicit hgemm_small function to cover the M<8, N<16, K<16 case

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[refactor] Restructure getStringDataType function

This patch updates the getStringDataType function structure to utilize method overriding.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Tensor] Update tensorbase for efficient creation of new tensor class.

This PR updates the TensorBase class so that mathematical operations are not required to create a new tensor class.
This change allows developers to easily create new classes without implementing math operations.
Note that these functions should be implemented to utilize tensor operations fully.

**Changes proposed in this PR:**
- Change math operation function from pure virtual function to virtual function
- Add a private function to get the data type as a string

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

BUG FIX : Concat GPU Layer and CPU layer unittest cases name overlapping.

Modified the concat GPU test case names in unittest_layers_concat_cl to differentiate them from the concat CPU test case names.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <niket.a@samsung.com>

[Doc] NNTrainer Tool Utilization Guide

This PR adds a guide for executing unit tests on the Android device.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Android] Verify Android NDK Installation and Configuration

This patch checks if Android NDK is installed and configured before building using NDK in the Android test script.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[doc] Extend code documentation

This PR adds summary content to help users quickly understand the role and scope of the Tensor API.

**Self-evaluation:**
1. Build test: [ ]Passed [ ]Failed [X]Skipped
2. Run test: [ ]Passed [ ]Failed [X]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[GPU/OpenCL] Initial version of Concat Layer with OpenCL ops

Added naive version of OpenCL implementation for Concat Layer.
Incorporated kernel for ops used.
Added unit test for Concat_cl.

Signed-off-by: Niket Agarwal <niket.a@samsung.com>

[ unittest ] Implement max_componentwise_relative_error

- When comparing outputs computed with different precisions, the max componentwise relative error is needed.
- (trivial) Use a more precise comparison in the zero-division check of the cosine similarity function

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
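
One common definition of the max componentwise relative error is max over i of |ref[i] - out[i]| / |ref[i]|. The sketch below follows that definition and skips entries whose reference value is zero; both choices are assumptions, and the unittest helper in the repository may define the metric or handle zeros differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// max_i |ref[i] - out[i]| / |ref[i]|, skipping entries where ref[i] == 0.
double maxComponentwiseRelativeError(const std::vector<float> &ref,
                                     const std::vector<float> &out) {
  assert(ref.size() == out.size());
  double worst = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    if (ref[i] == 0.0f) {
      continue; // relative error is undefined for a zero reference
    }
    const double r = static_cast<double>(ref[i]);
    const double o = static_cast<double>(out[i]);
    worst = std::max(worst, std::fabs(r - o) / std::fabs(r));
  }
  return worst;
}
```

Unlike an aggregate metric such as cosine similarity, this value is dominated by the single worst element, which makes it sensitive to isolated precision losses.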

[ unittest ] Use bounded value generator in hgemm unittests

- According to recent papers, values drawn from [0, 1) or [-1, 1) are widely used when comparing fp16 and fp32 precision.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ unittest ] Add TCs for checking padding-using GEMM

- Add TCs checking for padding w.r.t. M, K, N, MK, KN, MKN

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Implement NYI functions from matrix A/B hgemm_padding

- Missing implementations might trigger unittest failures on Android.
- This patch now supports the padding function for all combinations of the following conditions: matrix A / B, trans/noTrans, M/K/N direction

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Implement matrix noTrans A w.r.t. MK padding

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ trivial ] Fix typo and add missing doxygen tags

- Fix typo and add missing doxygen tags
- Add more exact explanation for doxygen tag briefs

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Move hgemm_padding related files to explicit directory

- Adding padding to matrices is not an optimal solution, but it can serve as a sub-optimal option.
- The final goal for this directory is to delete the directory itself.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Remove unnecessary K1 GEMM functions

- From a memory perspective, when K = 1, the matrix transpose condition has no effect on the GEMM algorithm.
- Remove all K1 noTrans / transA / transB / transAB functions and unify them into a single function.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
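
The K = 1 observation above can be made concrete: with a single inner dimension, A holds M contiguous values and B holds N contiguous values whether or not they are marked as transposed, so GEMM reduces to a scaled outer product. The sketch below uses `float` rather than fp16 and assumes densely packed inputs; `gemmK1` is a hypothetical name, not the function in the hgemm directory.

```cpp
#include <cstddef>

// C (M x N, row-major) = alpha * a * b^T + beta * C, where a has M values and
// b has N values. With K = 1, A and A^T share the same memory layout (and
// likewise B and B^T), so one routine covers all transpose combinations.
void gemmK1(std::size_t M, std::size_t N, float alpha, const float *a,
            const float *b, float beta, float *C) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      C[m * N + n] = alpha * a[m] * b[n] + beta * C[m * N + n];
    }
  }
}
```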

[ hgemm/refactor ] Refactor hgemm file structure

- Kernel functions are used regardless of matrix transpose, so they need to be included from a separate file.
- For further optimal implementation of matrix A / B / AB transpose blocking-kernel sequences, divide their files for convenience.
- The function 'hgemm' itself is better placed in the hgemm directory.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ unittest ] Add TC for K=1 hgemm case

- Missing optimizations for the K=1 GEMM case were recently detected.
- Add such TC accordingly.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ trivial/hgemm ] Move hgemm_K1 to hgemm directory

- For consistency, hgemm_K1 function should reside under hgemm directory

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ trivial ] Add doxygen tags for hgemm padding functions

- Add doxygen tags for hgemm padding functions

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Implement packing-blocking-kernel sequence for hgemm transB

- Previously, hgemm transB computation was relying on transposing the entire matrix and using non-transpose sequence.
- For optimal performance, matrix packing-blocking-kernel sequence for transB case is explicitly implemented.
- Note that the current implementation only supports the 8x16 gemm kernel.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Separate source / header files for hgemm packing function

- For easier implementation and maintenance of hgemm packing functions, separate them.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ Trivial/bugfix ] Add missing library to include

- add stdlib.h to hgemm_util.h

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[ hgemm ] Implement matrix padding function

- Since the current kernel / blocking function supports fixed shapes only, implement a padding function as a temporary solution.
- Note that a flexible kernel / blocking implementation should be added for optimal performance.
- The current implementation separates the padding functions for matrices A and B, but they will eventually be handled by a single function.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
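
Zero-padding a matrix up to the block sizes that a fixed-shape kernel expects can be sketched as below. This is an illustration of the general idea only, written for `float` and row-major storage; `roundUp` and `padMatrix` are hypothetical helpers, not the hgemm padding functions from this commit.

```cpp
#include <cstddef>
#include <vector>

// Round x up to the next multiple of m (m > 0).
std::size_t roundUp(std::size_t x, std::size_t m) {
  return (x + m - 1) / m * m;
}

// Copy a row-major rows x cols matrix into a zero-filled buffer whose
// dimensions are multiples of row_mult and col_mult.
std::vector<float> padMatrix(const std::vector<float> &src, std::size_t rows,
                             std::size_t cols, std::size_t row_mult,
                             std::size_t col_mult) {
  const std::size_t prows = roundUp(rows, row_mult);
  const std::size_t pcols = roundUp(cols, col_mult);
  std::vector<float> dst(prows * pcols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      dst[r * pcols + c] = src[r * cols + c];
    }
  }
  return dst;
}
```

The padded zeros contribute nothing to the products, so the top-left block of the padded result matches the unpadded GEMM at the cost of extra memory and copies.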

fix: incorrect C/C++ preprocessor macro

When -DENABLE_ENCODER is given, you do
#ifdef ENABLE_ENCODER
not
#ifdef DENABLE_ENCODER

CC: @baek2sm
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
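
The macro naming point above can be checked at compile time. In the sketch below, the `#define` line stands in for passing `-DENABLE_ENCODER` on the compiler command line; the constant names are hypothetical.

```cpp
// Simulates passing -DENABLE_ENCODER to the compiler: the macro that ends up
// defined is ENABLE_ENCODER (the leading -D is the flag, not part of the name).
#define ENABLE_ENCODER 1

#ifdef ENABLE_ENCODER
constexpr bool kCorrectCheck = true;
#else
constexpr bool kCorrectCheck = false;
#endif

// The mistaken spelling is never defined, so this branch is silently skipped.
#ifdef DENABLE_ENCODER
constexpr bool kTypoCheck = true;
#else
constexpr bool kTypoCheck = false;
#endif

static_assert(kCorrectCheck, "ENABLE_ENCODER is visible to #ifdef");
static_assert(!kTypoCheck, "DENABLE_ENCODER is not defined by -DENABLE_ENCODER");
```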

[bugfix] Resolves Android build warnings

This PR resolves warnings that occur during the Android build. The list is as follows.

**Changes proposed in this PR:**
- Fix function that overrides virtual functions but is not marked override.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[bugfix] Android build error when fp16 is enabled

This PR fixes issues of undefined symbols of one of the tensor constructors.
The function implementation is moved to the header file to resolve this issue.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Tensor] Operational Improvements and Functionality Simplification

This commit moves several operations implementations to each Tensor class for easier management.
This allows users to create a new data type Tensor without unnecessary modification to the Tensor class.

**Changes proposed in this PR:**
- static function Tensor::cat() uses each tensor's member function concat().
- Tensor::copy() logic is simplified by not differentiating by its data type.
- Tensor::copy_with_stride() uses an internal function to operate.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Tensor] Update newly added features

This commit updates recently added features in tensor, including add_i_partial() and ele_mul().
The newly added functions have been implemented according to the revised tensor structure.

**Changes proposed in this PR:**
- Update Float/HalfTensor class with newly added function, add_i_partial().
- Apply BLAS operations in basic arithmetic operations in Tensor.
- height-width transpose in half-precision can be SIMD accelerated.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[bugfix] Fix issues occurred in Tensor class refactoring

This commit aims to fix several issues that arose due to the refactoring of the Tensor class.

**Changes proposed in this PR:**
- The copy constructor has been implemented to prevent incorrect behavior of the default copy constructor in this commit
- Tensor add_i() has been newly implemented to fix previous incorrect implementations.
- Add chain() function that returns LazyTensor

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Refactor] Deprecate TensorV2 and replace Tensor class with TensorV2

This commit deprecates the existing TensorV2 class and replaces Tensor class with the new TensorV2 class.
The previous Tensor class has been removed and all its usages have been updated to use the TensorV2 class.
Additionally, all instances of TensorV2 usage within the NNTrainer have been removed.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Application] Bug fix in RL example

**Changes proposed in this PR:**
- This commit updates the DQN example.
- In the previous code, there was a bug: copying the main Net to the Target Net was not written as
intended.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <ej.yang@samsung.com>

[Android] Add android test script

This patch adds a script to run unit tests on Android devices.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[trivial] remove unnecessary code

This PR removes the print statement that was previously added for debugging purposes.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[Layer] Add missing activation types

Some activation types were missing from EnumList.
Added missing types to EnumList.

Changed the order of ActivationType and EnumList to be the same.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: SeoHyungjun <hyungjun.seo@samsung.com>

[ util ] Change name swish -> swiglu

- The swiglu function was misnamed swish. Because it includes the element-wise multiplication by Z, the function is swiglu, not swish.
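
The distinction behind the rename can be sketched as follows. This is a minimal scalar illustration and not the NNTrainer code; the function names and signatures are hypothetical, and swish is assumed to use beta = 1.

```cpp
#include <cmath>

// Swish (also called SiLU) with beta = 1: x * sigmoid(x).
// Hypothetical helper for illustration only.
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// SwiGLU as the commit message describes it: swish(x) multiplied
// element-wise by a second input z. Shown here for a single element.
inline float swiglu(float x, float z) { return swish(x) * z; }
```

With z fixed to 1 the two functions agree, which is presumably why the earlier name went unnoticed.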

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>

[bugfix] Resolves Android build warnings

This PR resolves warnings that occur during the Android build. The list is as follows.

**Changes proposed in this PR:**
- Resolve the warning that an explicitly defaulted function is implicitly deleted.
- Mark functions that override virtual functions with override.
- Resolve the clang warning on expression side effects.
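
The missing-override warning can be illustrated with a small sketch. The class and function names below are hypothetical and are not taken from the NNTrainer sources; the point is only that adding `override` lets the compiler check that the function really overrides a virtual function of the base class.

```cpp
#include <string>

// Hypothetical base class with a virtual function.
struct BaseLayer {
  virtual ~BaseLayer() = default;
  virtual std::string getType() const { return "base"; }
};

// Without `override`, clang can warn (-Winconsistent-missing-override)
// when other overrides in the same class are marked. With it, a
// signature mismatch becomes a compile error instead of a silent new
// function.
struct DerivedLayer : BaseLayer {
  std::string getType() const override { return "derived"; }
};

// Calls through the base interface to show dynamic dispatch.
inline std::string type_of(const BaseLayer &layer) { return layer.getType(); }
```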

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>

[meson] fix typo error and add encoder option

- fix 'ENABLE_ENCODER' option typo errors in llama application
- add 'enable_encoder' to meson option

After applying these modifications, I checked that llama runs correctly.
(It works when built with the enable_encoder option set to true.)

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

Remove dangerous dummy meson dep

When a dependency library is installed with hardcoded scripts,
declare dependency with as much information as possible from
the installed package to detect dependency errors at build-time.

Don't add a dummy dependency for actual library dependencies.

Fixes #2673

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>

[Layer] add tanh-based approximate gelu activation function

- add tanh-based approximate gelu (tanh gelu) for vision transformer.
- rename quick gelu to sigmoid gelu (it is a sigmoid-based approximate gelu)
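
For reference, the two approximations are commonly written as below. This is a sketch using the widely published constants (0.044715 and 1.702); whether NNTrainer uses exactly these constants is an assumption, and the function names are hypothetical.

```cpp
#include <cmath>

// Tanh-based approximate GELU:
// 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
inline float tanh_gelu(float x) {
  const float k = 0.7978845608f; // sqrt(2 / pi)
  return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Sigmoid-based approximate GELU (often called "quick gelu"):
// x * sigmoid(1.702 * x)
inline float sigmoid_gelu(float x) {
  return x / (1.0f + std::exp(-1.702f * x));
}
```

Both are zero at x = 0 and approach x for large positive inputs; they differ slightly in between.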

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>

[build] Added third party to include directories

Added opencl/third_party folder to include directory

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>

[GPU/OpenCL] Initial version of RMSNorm Layer

Added naive version of OpenCL implementation for RMSNorm Layer.
Incorporated kernel for ops used.
Added unit test for rmsnorm_layer_cl.
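
As a point of comparison for the OpenCL kernel, a plain CPU reference for RMS normalization over a single vector might look like this. It is a sketch under assumptions: the function name, the epsilon default, and the per-element scale `gamma` are illustrative and do not come from rmsnorm_layer_cl.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: y[i] = x[i] / sqrt(mean(x^2) + eps) * gamma[i]
std::vector<float> rms_norm(const std::vector<float> &x,
                            const std::vector<float> &gamma,
                            float eps = 1e-6f) {
  assert(!x.empty() && x.size() == gamma.size());

  // Mean of squares over the normalized axis.
  float sum_sq = 0.0f;
  for (float v : x)
    sum_sq += v * v;
  const float inv_rms =
    1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + eps);

  // Scale each element by the inverse RMS and the learned weight.
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] = x[i] * inv_rms * gamma[i];
  return y;
}
```

A reference like this is the kind of function a unit test could compare a GPU result against within a small tolerance.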

Signed-off-by: ThummalaPallavi <t.pallavi@samsung.com>

README: add openssf best practice badge.

To prepare the LF AI & Data project proposal, the openssf best practice
badge should be registered.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>

[ CI ] modify android build test in action

This PR fixes the duplicated build in the Android build action.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[blas/opencl] SGEMM OpenCL kernels added

Added all possible OpenCL kernels for SGEMM
Added unit tests

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>

[GPU/OpenCL] Moving Addition kernel to Tensor Directory

Moved addition_cl kernel to Tensor directory.
Refactored addition_cl for generalization.

Signed-off-by: Yash Singh <yash.singh@samsung.com>

[BUG FIX] Swiglu fp16 GPU Layer test filename mismatch

Modified the swiglufp16 filename in gen_layer_tests to match the name in unittest_layers_swiglu_cl

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Niket Agarwal <niket.a@samsung.com>

[GPU/OpenCL] Initial version of Reshape Layer with OpenCL ops

Added naive version of OpenCL implementation for Reshape Layer.
Incorporated kernel for ops used.
Added unit test for Reshape_layer_cl.

Signed-off-by: Niket Agarwal <niket.a@samsung.com>

[layer] added start/end dimension in flatten layer

- Until now, the flatten layer flattened all dimensions except batch.
This commit makes it possible to flatten only a sub-range of dimensions.
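
The effect of a start/end dimension on the output shape can be sketched as follows. The helper name, the inclusive `[start, end]` convention, and the zero-based indexing are assumptions for illustration; the actual property names and indexing in the flatten layer may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: merge dimensions start..end (inclusive, zero-based)
// into one dimension and keep the others unchanged.
std::vector<std::size_t> flatten_shape(const std::vector<std::size_t> &shape,
                                       std::size_t start, std::size_t end) {
  assert(start <= end && end < shape.size());

  // Dimensions before the flattened range are copied as-is.
  std::vector<std::size_t> out(shape.begin(), shape.begin() + start);

  // The flattened range collapses into the product of its sizes.
  std::size_t merged = 1;
  for (std::size_t i = start; i <= end; ++i)
    merged *= shape[i];
  out.push_back(merged);

  // Dimensions after the range are copied as-is.
  out.insert(out.end(), shape.begin() + end + 1, shape.end());
  return out;
}
```

For a shape of {2, 3, 4, 5}, flattening dimensions 1 to 3 reproduces the earlier behavior (everything except batch becomes one dimension of 60), while flattening 1 to 2 yields {2, 12, 5}.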

Signed-off-by: hyeonseok <hs89.lee@samsung.com>