MyungJoo Ham [Thu, 1 Feb 2024 06:29:01 +0000 (15:29 +0900)]
meson script condition fix
Whether to include fp16 code should depend on whether
fp16 is enabled, not on the platform name directly.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
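The condition fix above can be sketched as a meson fragment. This is illustrative only: the option name `enable-fp16` and the file name are assumptions, not the actual nntrainer build script.

```meson
# Illustrative sketch: gate fp16 sources on the fp16 feature flag
# rather than on the platform name. Option and file names are assumed.
if get_option('enable-fp16')
  nntrainer_sources += files('blas_neon_fp16.cpp')
endif
```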
MyungJoo Ham [Thu, 1 Feb 2024 06:28:41 +0000 (15:28 +0900)]
blas_neon.cpp: unsigned int type mismatch
Please do not ignore compiler warnings.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
MyungJoo Ham [Thu, 1 Feb 2024 10:17:00 +0000 (19:17 +0900)]
dist/Tizen: disable fp16 in Tizen
The NNTrainer FP16 implementation relies on NEON, which requires
the armv8.2-a ISA.
Tizen aarch64 targets armv8.0-a and therefore cannot support fp16-neon,
so fp16 is disabled for armv7l and aarch64.
Tizen x86/x64 does not support fp16 either.
This re-enables the Tizen build of nntrainer.
Please do not break the build in the main branch!
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
skykongkong8 [Mon, 22 Jan 2024 08:40:13 +0000 (17:40 +0900)]
[ Tensor ] Support non-contiguous case in sin, cos, inv_sqrt_i
- Outside the BLAS paths, the sin, cos, and inv_sqrt_i functions can also support the non-contiguous case.
- Fix the related functions and add unit tests accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 16 Jan 2024 01:19:38 +0000 (10:19 +0900)]
[ Trivial ] Add exception in inv_sqrt_i function
- In case of a non-contiguous Tensor, it is impossible to apply SIMD instructions. Add an exception accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 15 Jan 2024 07:47:10 +0000 (16:47 +0900)]
[ Trivial ] Refactor trigonometric functions
- In case of a non-contiguous Tensor, it is impossible to apply SIMD instructions. Add an exception accordingly.
- Rename the functions for intuitiveness: sin_transform -> sin, cos_transform -> cos
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 31 Jan 2024 06:25:29 +0000 (15:25 +0900)]
[ Bug ] Fix coverity issues
- Make non-const variables const, since their values are never changed in practice
- Use const auto & to avoid object copies
Resolves:
```
non-const type variable, but its value is never changed.
auto_causes_copy
```
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 31 Jan 2024 06:28:38 +0000 (15:28 +0900)]
[coverity] Fix coverity issues
This PR resolves the coverity issues that were identified.
**Changes proposed in this PR:**
- Specify the return type of the lambda function
- Use a reference to avoid copying the object.
This fixes:
- Use of auto that causes a copy (AUTO_CAUSES_COPY)
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Jiho Chu [Wed, 31 Jan 2024 01:21:08 +0000 (10:21 +0900)]
[FIX] Fix coverity issues
Issue:
1740106
1742375
1747001
Signed-off-by: Jiho Chu <jiho.chu@samsung.com>
hyeonseok lee [Tue, 30 Jan 2024 07:58:24 +0000 (16:58 +0900)]
[bug] fix coverity issues
- Specify the lambda return type to avoid object copy
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Donghak PARK [Fri, 26 Jan 2024 00:17:08 +0000 (09:17 +0900)]
[CI] Rename label & upgrade Node version & add workflow failure handling
To improve the convenience and robustness of the GitHub Actions setup, make the following modifications:
1. Upgrade from Node 16 to Node 20, in accordance with the guidelines
- change the gitaction-script version to v7
- change gitaction-upload-artifact to v4
- ref : https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/
2. If the check_count check fails, no additional actions are executed and the workflow terminates immediately.
3. Adopt more descriptive and clearer names for better understanding.
**Changes proposed in this PR:**
renamed: .github/workflows/Upload.yml -> .github/workflows/check_count.yml
modified: .github/workflows/labeler.yml
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Mon, 22 Jan 2024 01:36:49 +0000 (10:36 +0900)]
[TensorV2] Multiplication support
This PR adds support for performing the multiplication operation on two tensors.
**Changes proposed in this PR:**
- TensorV2 includes member functions to perform tensor multiplication.
- FloatTensor and HalfTensor take TensorV2 as input/output to perform multiplication.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
MyungJoo Ham [Fri, 26 Jan 2024 04:46:42 +0000 (13:46 +0900)]
blas_neon: fix compiler errors in aarch64/Linux
With stricter compilers, the fp16 code does not compile.
Fix the type mismatches to enable testing outside Android.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Donghak PARK [Wed, 24 Jan 2024 04:38:09 +0000 (13:38 +0900)]
[CI] Add Pylint gitaction for gitaction ci
Add a pylint yml file for Python linting
- we are moving from TAOS CI to GitHub Actions
- the pylint workflow file is taken from the TensorFlow repository
- ref : https://github.com/tensorflow/tensorflow/blob/master/.github/workflows/pylint-presubmit.yml
- to test it, the formatting of the Python files is fixed
**Changes proposed in this PR:**
- pylint.yml
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Fri, 26 Jan 2024 02:19:51 +0000 (11:19 +0900)]
[coverity] Remove no effect code
This PR fixes a Coverity issue indicating code with no effect.
**Changes proposed in this PR:**
- Remove negative check (unsigned int is always non-negative).
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Thu, 25 Jan 2024 08:15:08 +0000 (17:15 +0900)]
[Trivial] Add new member & update CODEOWNERS
Add new member & update CODEOWNERS
**Changes proposed in this PR:**
modified: .github/CODEOWNERS
modified: CONTRIBUTING.md
modified: README.md
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Tue, 23 Jan 2024 02:22:37 +0000 (11:22 +0900)]
[TensorV2] multiply_strided() skeleton
This pull request introduces a basic structure of tensor multiplication operations that support different strided inputs and outputs.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Mon, 22 Jan 2024 04:22:41 +0000 (13:22 +0900)]
[Test] Generate TensorV2 in unit test
This PR includes the implementation of test util functions to create a tensor filled with values to utilize in unit testing.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Wed, 24 Jan 2024 07:22:11 +0000 (16:22 +0900)]
[CI] Add cpp file format checker
This patch adds a Github Action workflow to check cpp file format
- the workflow file is imported from deviceMLOps.MLAgent
- it uses the cpp_linter marketplace action
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Wed, 24 Jan 2024 01:20:50 +0000 (10:20 +0900)]
[CI] Add Clean meson build for gitaction ci
Add a clean meson build .yml file for CI
- The file was taken from nnstreamer's GitHub Actions setup and modified to fit nntrainer.
- ref : https://github.com/nnstreamer/nntrainer/blob/main/docs/getting-started.md
**Changes proposed in this PR:**
- .github/workflows/ubuntu_clean_meson_build.yml
Resolves:
- Add gitaction ci
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Tue, 23 Jan 2024 04:30:47 +0000 (13:30 +0900)]
[bugfix] Resolve segfault in tensor apply
This PR fixes a bug where a segmentation fault occurs when the output tensor is empty.
The fix initializes the output tensor when empty to avoid this error.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 19 Jan 2024 10:40:02 +0000 (19:40 +0900)]
[TensorV2] Enable copying data from Tensor
This PR enables deep copies of a contiguous tensor via the following functions: copy(), copyData(), and copy_with_strides().
The copy function copies the target tensor completely, regardless of the dimensions of the input tensor. All elements and properties of the original tensor are copied, so using copy creates a new tensor with the same size and shape as the original.
The copyData function, on the other hand, requires the sizes of the input and target tensors to match. It only copies the data of the original tensor, so if the size or shape differs, the copy may not be done properly.
Note that the copy and copyData functions support copying data across tensor data types, while copy_with_strides only supports copying between tensors of the same data type.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 19 Jan 2024 04:49:37 +0000 (13:49 +0900)]
[TensorV2] Reshape functionality
This commit implements reshaping a tensor to the given TensorDim under the following conditions:
1. The tensor to reshape is contiguous.
2. The length of the data matches the new TensorDim.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 19 Jan 2024 02:23:37 +0000 (11:23 +0900)]
[TensorV2] Add support for applying operators with broadcasting
This PR enables functionality to apply the given operator, such as multiply and divide, with broadcasting to the tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Eunju Yang [Thu, 18 Jan 2024 02:45:18 +0000 (11:45 +0900)]
[DOCS] add instructions to create meson.build in how-to-create-model.md
* This commit adds a missing part to docs/how-to-create-model.md.
* It explains how to write the meson.build file of a new application (under Applications/MyApp/jni/) so that it can be built.
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Donghyeon Jeong [Tue, 16 Jan 2024 04:29:16 +0000 (13:29 +0900)]
[Tensor] Add broadcast support for operations
This PR adds broadcasting support so that future operations can broadcast their operands.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 12 Jan 2024 04:49:18 +0000 (13:49 +0900)]
[TensorV2] Multiplication operation skeleton
This pull request adds a basic implementation of tensor multiplication operations to our codebase.
The new functionality allows users to perform multiplication of tensors by simply calling a function.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
MyungJoo Ham [Thu, 11 Jan 2024 07:50:48 +0000 (16:50 +0900)]
meson: do not force-enable ml-api when it is not explicitly enabled.
The previous meson logic force-enabled ml-api whenever it was not
disabled and the common headers were found.
The new logic disables ml-api, even if the common headers are found,
when ml-inference is not found.
This allows building nntrainer on a system where only the common ML
headers are available, without any meson options.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Donghyeon Jeong [Thu, 11 Jan 2024 04:22:36 +0000 (13:22 +0900)]
[Test] Enabled unit testing for TensorV2 class
This PR enables unit testing for the TensorV2 class by adding a suite of tests that cover public methods.
More tests will be added in a future PR to further validate the TensorV2 class.
**Changes proposed in this PR:**
- Edit meson build file to include tensor v2 unit tests
- Fix public methods usage due to changed function use.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 9 Jan 2024 23:34:23 +0000 (08:34 +0900)]
[ unittest ] Add unittest for inv_sqrt_i with fp16
- There was a request in PR#2396 to add a unit test for inv_sqrt_i
- The test compares fp16 and fp32 Tensors with eps = 1e-3
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
MyungJoo Ham [Wed, 17 Jan 2024 02:23:09 +0000 (11:23 +0900)]
util_simd: make typename consistent (__fp16 --> _FP16)
_FP16 is the macro to unify different fp16 typenames
across different architectures or libraries.
Note that util_simd.cpp has the correct name (_FP16) while
the header has the incorrect naming (__fp16).
Although this does not break the build or execution,
this is not good for readability and dependency clean-ups.
CC: @skykongkong8
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
MyungJoo Ham [Tue, 16 Jan 2024 12:56:57 +0000 (21:56 +0900)]
Use getDataType() instead of getTensorType().data_type
Suggested by @djeong20 at https://github.com/nnstreamer/nntrainer/pull/2409#pullrequestreview-1822817983
Co-authored-by: Donghyeon Jeong <54725479+djeong20@users.noreply.github.com>
MyungJoo Ham [Fri, 12 Jan 2024 06:06:29 +0000 (15:06 +0900)]
fix: multi-head-attention incorrect macro usage.
1. Fix re-definitions of macros
2. Determine the mask num at runtime, not at compile time.
Whether users want fp16 cannot be known at compile time.
Fixes #2407
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
skykongkong8 [Mon, 15 Jan 2024 05:46:17 +0000 (14:46 +0900)]
[ util ] Implement swish function in util
- To accelerate the swish activation function, implement a swish calculation function for:
- neon fp32 / fp16
- raw fp32 / fp16
- The SIMD calculation of the exponential function is based on neon_mathfun
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 15 Jan 2024 05:35:21 +0000 (14:35 +0900)]
[ BLAS ] Refactor neon_mathfun
- For easier use of neon_mathfun, refactor it to avoid duplicated-symbol errors
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 10 Jan 2024 02:18:12 +0000 (11:18 +0900)]
[TensorV2] apply() function to apply a given function
This pull request implements a new function called apply(), which applies a given function element-by-element to a tensor.
The resulting tensor has the same shape as the input tensor, but each element has been transformed by the given function.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Mon, 8 Jan 2024 05:26:13 +0000 (14:26 +0900)]
[Tensor] Added getters and setters for private members of TensorBase class.
This PR adds accessors (getters) and mutators (setters) for the private data members of the TensorBase class.
This change allows TensorV2 to interact with these variables through the provided methods, improving encapsulation and making the code more maintainable.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 9 Jan 2024 10:28:39 +0000 (19:28 +0900)]
[ layer ] Apply neon simd acceleration in rotary embedding computation
- The previous rotary embedding computation was naively implemented with a for-loop
- With SIMD code, I expect this to be considerably faster without precision loss
- The current implementation only supports NEON SIMD (ARMv8)
- Trivial typo fixes included
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 9 Jan 2024 10:22:35 +0000 (19:22 +0900)]
[ util ] Add util_simd file
- This introduces util_simd. The code is kept separate from the BLAS files for the following reasons:
1. It is not 'Basic Linear Algebra Subprograms' functionality.
2. It is only used in very specific situations.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Mon, 8 Jan 2024 00:23:35 +0000 (09:23 +0900)]
[Tensor] Fix comparison operator
A tensor containing a NaN value cannot be equal to any other tensor, since NaNs are never equal. The comparison operator logic is changed to handle NaN values accordingly.
**Changes proposed in this PR:**
- Comparison operator returns false if tensor has a NaN value.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Mon, 8 Jan 2024 05:21:11 +0000 (14:21 +0900)]
[ BLAS ] Add inv sqrt inplace function
- Implement inv sqrt inplace function with neon / raw
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 8 Jan 2024 07:39:47 +0000 (16:39 +0900)]
[ Tensor ] Add trigonometric transformation functions in Tensor
- Add sin / cos transform functions to Tensor, for both the BLAS and raw paths
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 8 Jan 2024 01:59:11 +0000 (10:59 +0900)]
[ BLAS ] Add trigonometric transformation functions
- To accelerate trigonometric calculations, add transformation functions to the BLAS module
- Add zlib license file
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 3 Jan 2024 05:05:38 +0000 (14:05 +0900)]
[Tensor] Support additional weight initialization
This PR enables additional weight initializers with a probability distribution.
**Changes proposed in this PR:**
- Functions to set tensor with random distribution are implemented.
- Tensor now supports various initializers besides zero and one.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Tue, 26 Dec 2023 12:10:23 +0000 (21:10 +0900)]
[Tensor] Sum by axis in column-major order
This PR enables a summation of tensor elements by axis in column-major order.
**Changes proposed in this PR:**
- Use sgemv in sum() with CblasColMajor when the tensor is column-major.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Tue, 2 Jan 2024 08:05:27 +0000 (17:05 +0900)]
[Tensor] Comparison operator overloading
This PR includes the implementation of comparison operators for TensorV2-related classes.
**Changes proposed in this PR:**
- TensorBase comparison operator compares Tensor information such as TensorDim.
- Float/HalfTensor comparison operator checks Tensor data.
- Destructor implementation is removed and set to default due to the rule of five.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Wed, 27 Dec 2023 04:35:43 +0000 (13:35 +0900)]
[TensorDim] Add column-major storage order
This PR defines the Tensor storage order in the TensorDim class to support Row-major and Column-major order.
**Changes proposed in this PR:**
- Add enum class StorageOrder to define storage order
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 28 Dec 2023 01:26:22 +0000 (10:26 +0900)]
[FP16] Include HalfTensor when enable_fp16
In this PR, HalfTensor is included only when FP16 is enabled.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Wed, 20 Dec 2023 02:04:09 +0000 (11:04 +0900)]
[Tensor] Enable additional constructors
In this PR, multiple constructors are supported in the original Tensor class.
- TensorV2 constructors decide which Tensor to create.
- TensorBase constructors handle initialization that is shared.
- Float/HalfTensor constructors manage their own unique initialization.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Tue, 26 Dec 2023 01:18:11 +0000 (10:18 +0900)]
[Tensor] Support source tensor allocation
In this PR, the float/half tensor can be allocated based on the source tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Fri, 15 Dec 2023 00:26:04 +0000 (09:26 +0900)]
[ Ahub ] Fix Ahub issues
- Fixes : TensorV2 may not initialize itensor
- Fixes : itensor is dynamically allocated but never freed (no destructor)
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Thu, 7 Dec 2023 09:48:42 +0000 (18:48 +0900)]
[Tensor] Resolve SrcSharedTensorV2 cyclic dependency
This PR fixes the issue of SrcSharedTensorV2 containing TensorV2 which creates cyclic dependency.
(SrcSharedTensorV2 -> TensorV2 -> TensorBase -> SrcSharedTensorV2)
**Changes proposed in this PR:**
- SrcSharedTensorV2 owns TensorBase instead of TensorV2
- Rename SrcSharedTensorV2 as SrcSharedTensorBase accordingly
- Add functions to create and get shared data tensor
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 8 Dec 2023 00:23:43 +0000 (09:23 +0900)]
[Tensor] Add Float/HalfTensor Implementation
In this PR, FloatTensor and HalfTensor's override methods are implemented.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
jijoong.moon [Mon, 11 Dec 2023 07:08:37 +0000 (16:08 +0900)]
[ API ] Add Tensor CPP API for Auto Grad
In this PR,
. Add a skeleton Tensor class to the C++ API which inherits from nntrainer::var_grad.
. Include a setter and getter for the source layer that creates this tensor.
. Add a unit test case
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Donghyeon Jeong [Thu, 7 Dec 2023 12:23:45 +0000 (21:23 +0900)]
[Util] Fix error in using fp16.h functions
This PR fixes multiple definition error when using FP32/16 conversion functions in fp16.h
**Changes proposed in this PR:**
- Function definition is moved to fp16.cpp.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Tue, 5 Dec 2023 02:19:26 +0000 (11:19 +0900)]
[Tensor] Add Tensor member functions
This PR extends the current Tensor member functions.
**Changes proposed in this PR:**
- Add member functions to get tensor information.
- Implement skeleton code.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Mon, 4 Dec 2023 02:08:34 +0000 (11:08 +0900)]
[Unit Test] Fix unittest_interpreter
The unit test interpreter had previously been disabled due to build errors.
The TFLite-related unit tests have since been separated and their dependencies removed, reflecting changes made in nntrainer.
**Changes proposed in this PR:**
- modified: ../test/unittest/compiler/meson.build
- modified: ../test/unittest/compiler/unittest_interpreter.cpp
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Mon, 4 Dec 2023 02:21:45 +0000 (11:21 +0900)]
[Doc] Add TensorV2 class diagram
- Add class diagram of TensorV2
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 1 Dec 2023 06:24:46 +0000 (15:24 +0900)]
[Tensor] Refactored TensorV2 class
In this PR, the TensorV2 structure is refactored to use inheritance instead of a type erasure pattern.
**Changes proposed in this PR:**
- TensorV2 is a target, expected for a user to use, that contains a TensorBase pointer.
- TensorBase is an abstract class that provides default infrastructure code.
- FloatTensor class inherits the TensorBase class and overrides the pure virtual methods with 32-bit floating point calculation.
- HalfTensor class inherits the TensorBase class and overrides the pure virtual methods with 16-bit floating point calculation.
**Note**
This is a skeleton to build the structure with no implementation.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Mon, 27 Nov 2023 10:55:27 +0000 (19:55 +0900)]
[Unit Test] Add unittest_export for tflite export
I have created a file named "unittest_export" to test the tflite export functionality.
- So far, I have only tested this feature on a network graph basis.
- In order to conduct more accurate testing, I have generated a model for testing purposes.
- To facilitate future tests, I have developed a function called "run_tflite," which can retrieve outputs using the model name and input data.
- I have included the MNIST FULL model to assess the application's compatibility with existing models.
**Changes proposed in this PR:**
- Added unittest_export.cpp
- modify meson.build
- update unittest_interpreter.cpp
Related : #2371
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Mon, 27 Nov 2023 10:53:56 +0000 (19:53 +0900)]
[Unit Test] Remove tflite export related part in unittest_interpreter
The TFLite-related unit tests are removed from the interpreter test and moved to the unittest_export file.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Mon, 27 Nov 2023 10:25:21 +0000 (19:25 +0900)]
[Unit Test] Update meson.build file to add export test
Update the meson.build file to add the export test
- add unittest_export.cpp for the tflite export test
- add some dependencies for the unit test
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Fri, 1 Dec 2023 05:09:22 +0000 (14:09 +0900)]
[Tensor] Remove current TensorV2
- Remove TensorV2 class and related classes for designing a new pattern
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 1 Dec 2023 00:18:36 +0000 (09:18 +0900)]
[Trivial] Edit author list
- Edit authors list for tensor files
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 23 Nov 2023 03:49:44 +0000 (12:49 +0900)]
[Tensor] Apply function for TensorV2
- Add TensorV2 member function apply, which applies given function element by element.
- Add unit test for newly added TensorV2 function verification.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 23 Nov 2023 01:45:11 +0000 (10:45 +0900)]
[Tensor] TensorV2 class operator overloading
- Add the implementation of the operator overloading that is required.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 23 Nov 2023 01:41:17 +0000 (10:41 +0900)]
[Tensor] Addition setter/getter for TensorV2
This PR extends the current setter and getter for TensorV2 data.
**Self-evaluation:**
1. Build test: [ ]Passed [ ]Failed [X]Skipped
2. Run test: [ ]Passed [ ]Failed [X]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 23 Nov 2023 00:37:06 +0000 (09:37 +0900)]
[Tensor] Remove data type as input for Float/Half Tensor
As discussed in #2367, the Float/HalfTensor data type does not change. There is no need to take the data type as input, so it is removed.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Thu, 23 Nov 2023 13:50:41 +0000 (22:50 +0900)]
[Trivial] Update gitignore file
**Update gitignore file**
The current build process involves downloading the necessary encoder,
ctre-unicode, and json files for running LLM from an external repository
which is not tracked within the nntrainer repo.
In order to prevent developers from accidentally uploading
these files upstream and to make the process more convenient,
we will be updating the gitignore file.
**Changes proposed in this PR:**
- Update gitignore file
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Thu, 23 Nov 2023 13:17:07 +0000 (22:17 +0900)]
[Exporter] Update node_exporter
Update node_exporter.cpp and node_exporter.h
The issue was that new properties added to the layer node caused the existing
TFLite export code to fail to recognize the layer_node.
This problem was resolved by adding these properties to the node_exporter.
Added props:
- props::Packed
- props::Print
**Changes proposed in this PR:**
- modified: nntrainer/utils/node_exporter.cpp
- modified: nntrainer/utils/node_exporter.h
Resolves:
- tflite export error #2371
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Tue, 21 Nov 2023 23:13:07 +0000 (08:13 +0900)]
[Tensor] Initial Draft of Tensor Version 2
In this PR, the initial working version of Tensor V2 is included.
**Changes proposed in this PR:**
- Create a TensorV2 class that follows a type erasure design pattern.
- Create a new SrcSharedTensorV2, which is a class template.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Fri, 24 Nov 2023 06:56:11 +0000 (15:56 +0900)]
[neon/bugfix] Fix ewva function
- The ewva function was implemented incorrectly: operands should be added, not multiplied.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
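The fixed behavior can be sketched as a scalar reference (the routine in blas_neon.cpp is NEON-vectorized; this signature is an illustrative assumption, not nntrainer's actual one):

```cpp
// Scalar sketch of element-wise vector add (ewva). The bug was using
// multiplication where addition belongs; the correct behavior is Y += X.
// Signature is illustrative, not the real blas_neon.cpp interface.
void ewva(unsigned int N, const float *X, float *Y) {
  for (unsigned int i = 0; i < N; ++i)
    Y[i] = X[i] + Y[i]; // add, not multiply
}
```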
Donghyeon Jeong [Wed, 22 Nov 2023 00:26:27 +0000 (09:26 +0900)]
[Ahub] Fix AnalysisHub defects
**Changes proposed in this PR:**
- Fix uninitialized class members in the constructor
- Fix potential uninitialized data
- Add try-block to catch exceptions
- Check if malloc returns null
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 17 Nov 2023 08:15:19 +0000 (17:15 +0900)]
[Tensor] Include Half Tensor when FP16 is enabled
**Changes proposed in this PR:**
- Edit meson.build file to add half_tensor.cpp when enable_fp16 is true
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 10 Nov 2023 04:26:39 +0000 (13:26 +0900)]
[Tensor] HalfTensor class for 16-bit floating point
This PR includes creating the HalfTensor class which separates 16-bit floating point calculation from nntrainer::Tensor.
**Changes proposed in this PR:**
- Create a HalfTensor class that only handles 16-bit floating point calculation.
- Remove operations for Quantized Tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
hyunil park [Thu, 9 Nov 2023 06:26:00 +0000 (15:26 +0900)]
[Sub-plugin] Refactoring sub-plugin class
- Change NNTrainerTrain class name to NNTrainerImpl
- Change InputTensorsInfo class name to TensorsQueue
- Add push method to TensorsQueue
- Change member variables of NNTrainerImpl and TensorsQueue to private and rename some variables and methods
Signed-off-by: hyunil park <hyunil46.park@samsung.com>
Seungbaek Hong [Wed, 5 Jul 2023 07:05:31 +0000 (16:05 +0900)]
[Application] Add multi_input dataloader example
Added multi-input dataloader example application.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Donghyeon Jeong [Fri, 10 Nov 2023 03:19:48 +0000 (12:19 +0900)]
[Tensor] FloatTensor class for 32-bit floating point
This PR includes creating the FloatTensor class which separates 32-bit floating point calculation from nntrainer::Tensor.
**Changes proposed in this PR:**
- Create a FloatTensor class that only handles 32-bit floating point calculation.
- Remove operations for Quantized Tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 7 Nov 2023 00:41:44 +0000 (09:41 +0900)]
[trivial/bugfix] Add `inference_only_option` in multi_head_attention unittest
- We were using the `inference_only` option for the multi_head_attention fp16 unittest, since we do not have a loss scaling implementation yet.
- However, I discovered that the declaration of this option was missing, which could cause a malfunction at build time; it has been added accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 08:29:11 +0000 (17:29 +0900)]
[gtest] Add test suites for multiHeadAttention with w16a16
- We already had a proper implementation of the multi-head attention layer with half-precision, but did not have any unittest cases.
- Add unittests accordingly
- Fix typo : last line indent
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 07:41:03 +0000 (16:41 +0900)]
[layer] Support fp16 in embedding layer
- Add _FP16 code block in calcGrad in the embedding layer (forwarding does not need it)
- Add unittest accordingly
- Explicit code in gtest for the embedding layer:
- In the layer gtest, each test suite runs without knowing any context of its adjacent layers
- Thus, for atypical layers like the embedding layer (whose input and output have different data types), we should either refactor the code or process it explicitly
- Room for memory optimization
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 04:48:12 +0000 (13:48 +0900)]
[gtest] Add gtest data for embedding layer
- Generating gtest data for the embedding layer must be differentiated from the other data for the following reasons:
1. The embedding layer takes 32-bit input, even when working with fp16 models
2. The embedding layer has a particular object called 'IndexSlices', and additional processing is needed to make it behave the way it does in NNTrainer
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 01:41:02 +0000 (10:41 +0900)]
[gtest/trivial] Change notation in gtest: fp16fp16 to w16a16
- The previous notation 'fp16fp16' does not convey its true meaning: data files for layers with half-precision weights and activations
- Thus, I would like to propose the new notation 'w16a16', both for better readability and to avoid confusion once mixed-precision support lands in the near future.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 00:29:34 +0000 (09:29 +0900)]
[layer] Support fp16 in dropout layer
- Confirm that the dropout layer needs no code changes to support multiple data types
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 26 Oct 2023 02:33:11 +0000 (11:33 +0900)]
[layer] Support fp16 in lstm layer
- Add _FP16 code block to enable float16 functionality
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 26 Oct 2023 01:58:30 +0000 (10:58 +0900)]
[layer] Support fp16 in concat layer
- Add _FP16 code block to enable float16 functionality
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Fri, 3 Nov 2023 05:59:26 +0000 (14:59 +0900)]
[Utils] Conversion to/from half-precision floating point
This PR adds utility functions for converting between 16-bit floating point numbers (in bit representation) and 32-bit IEEE-format floating point numbers.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
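The core of such a conversion can be sketched at the bit level. This is a deliberately simplified illustration (truncating rounding, subnormals flushed to zero, NaN collapsed to infinity), not the actual nntrainer utility:

```cpp
#include <cstdint>
#include <cstring>

// Simplified fp32 -> fp16 bit conversion sketch (not the nntrainer
// implementation): truncates the mantissa, flushes subnormal results
// to zero, and clamps overflow to infinity.
uint16_t fp32_to_fp16_bits(float f) {
  uint32_t x;
  std::memcpy(&x, &f, sizeof(x));           // reinterpret IEEE-754 bits
  uint16_t sign = (x >> 16) & 0x8000;       // move sign to bit 15
  int32_t exp = ((x >> 23) & 0xFF) - 127;   // unbias the fp32 exponent
  uint32_t mant = x & 0x7FFFFF;             // 23-bit mantissa
  if (exp > 15)
    return sign | 0x7C00;                   // overflow -> +/-inf
  if (exp < -14)
    return sign;                            // subnormal -> +/-0 (flushed)
  return sign | static_cast<uint16_t>((exp + 15) << 10) |
         static_cast<uint16_t>(mant >> 13); // rebias, truncate mantissa
}
```

A production version would additionally implement round-to-nearest-even and preserve NaN payloads.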
Donghyeon Jeong [Thu, 2 Nov 2023 00:50:45 +0000 (09:50 +0900)]
[Tensor] Support multiple data types in copyData
Previously, copyData only supported copying data of the same data type. Copying between different data types is needed, given the increased demand for mixed-precision flexibility.
**Changes proposed in this PR:**
- copyData supports copying data of different type with the use of NEON
- Remove the flate function
- utilize copyData in dequantize
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Seungbaek Hong [Thu, 19 Oct 2023 09:44:31 +0000 (18:44 +0900)]
[docs] add how-to-create-model document
I have added a tutorial document on how users can build their own models using NNTrainer.
The current tutorial is only a first draft and needs updating.
It seems the API needs to be updated for ease of use,
as setting up data is very inconvenient for users right now.
(For that reason, this example uses a random data generator,
so users cannot yet learn how to train with real data.)
I also added this tutorial link to the README on the first page,
and since the list of maintainers and contributors currently occupies too much space,
I moved this part to the bottom of the README.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Seungbaek Hong [Wed, 18 Oct 2023 06:06:24 +0000 (15:06 +0900)]
[trivial] fix typo errors and delete duplicated script
Fix typo errors in the llama implementation and delete the duplicated
script (multi_head_attention copy.h).
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
skykongkong8 [Tue, 29 Aug 2023 02:19:56 +0000 (11:19 +0900)]
[TensorDim] Fix TensorDim constructor
- Previously, there was no default constructor covering 1-batch, 1-channel, 1-height, 1-width together with the newly added TensorType option (regardless of format).
- Since a lot of code in the FP32 implementation uses this case, I had no choice but to explicitly feed the tensor_type (which contains the fp16 info) to the previously defined tensor.
- For cleaner code, I would like to propose a new default constructor for better construction of TensorDim instances.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 1 Nov 2023 00:00:57 +0000 (09:00 +0900)]
[bugfix] memory overwrite error fix in unittest_tizen_capi
This PR resolves the failing getWeight_01 test case in the unittest_tizen_capi.
In the ML API common data structure, the maximum rank in Tizen APIs has changed from 4 to 16 since Tizen 8.0.
However, the NNTrainer getWeight test uses a dimension with MAXDIM of 4.
This causes ml_tensors_info_get_tensor_dimension to overwrite unrelated memory, since it expects an array of length 16 while the test passes an array of length 4.
**Changes proposed in this PR:**
- Switch the order of defining variables to avoid memory overwrites.
This fixes:
[ RUN ] nntrainer_capi_nnmodel.getWeight_01
[ FAILED ] nntrainer_capi_nnmodel.getWeight_01 (10 ms)
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
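The hazard and the safe calling pattern can be sketched as follows. The constants follow the commit message, but the mock function and helper names are illustrative stand-ins, not the actual Tizen ML API implementation:

```cpp
#include <cstring>

// Hedged sketch of the getWeight_01 hazard: since Tizen 8.0 the ML API
// fills up to 16 dimension entries, while NNTrainer's MAXDIM is 4, so a
// 4-element caller buffer would be overrun.
constexpr unsigned int ML_TENSOR_RANK_LIMIT = 16; // Tizen 8.0+ max rank
constexpr unsigned int MAXDIM = 4;                // NNTrainer max rank

// Stand-in for ml_tensors_info_get_tensor_dimension: unconditionally
// writes ML_TENSOR_RANK_LIMIT entries into the caller's buffer.
void mock_get_tensor_dimension(unsigned int *dim) {
  for (unsigned int i = 0; i < ML_TENSOR_RANK_LIMIT; ++i)
    dim[i] = 1;
}

// Safe pattern: hand the API a full-rank buffer, then keep only the
// MAXDIM entries NNTrainer actually uses.
void get_weight_dims(unsigned int out[MAXDIM]) {
  unsigned int full[ML_TENSOR_RANK_LIMIT] = {0};
  mock_get_tensor_dimension(full);        // all 16 writes land in bounds
  std::memcpy(out, full, sizeof(unsigned int) * MAXDIM);
}
```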
skykongkong8 [Tue, 31 Oct 2023 05:45:08 +0000 (14:45 +0900)]
[blas/neon] Add copy function for fp32 and fp16
- The user interface for the fp32<->fp16 copy NEON function was missing.
- Add a BLAS interface to use these NEON functions
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Seungbaek Hong [Wed, 18 Oct 2023 05:55:00 +0000 (14:55 +0900)]
[Application] LLaMA weights converter for mha model
I already added a weights converter for the mha model in PR #2287.
There were two converters, supporting the legacy and mha models.
The converter for the legacy model is now useless, so I deleted it.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
hyeonseok lee [Tue, 31 Oct 2023 02:23:10 +0000 (11:23 +0900)]
[Application] fix deepq to make it run
- Make res directory and move DeepQ.ini file to res dir
- If the input batch size does not match the batch size property of the model graph,
  set the model graph batch size property to the input batch size in the forwarding function
- Allocate weight/tensor memory in train mode for the mainNet and targetNet networks
- Comment out the "return 1" statement so that it trains from scratch
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hs0207.kim [Tue, 24 Oct 2023 06:53:30 +0000 (15:53 +0900)]
Implementation of nndetector
Implementation of an application that runs object detection and learns personal objects on mobile
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: hs0207.kim <hs0207.kim@samsung.com>
Donghyeon Jeong [Mon, 30 Oct 2023 23:46:20 +0000 (08:46 +0900)]
[bugfix] Android ndk-build error fix
This PR resolves ndk-build issue in the tensor fp16 unit test.
**Changes proposed in this PR:**
- Move implementation to header file to avoid linker error
- Change ambiguous variable and function names
This fixes:
[arm64-v8a] Executable : unittest_nntrainer_tensor_fp16
ld: error: undefined symbol: nntrainer::Tensor::setScaleFactors16(std::__ndk1::vector<_Float16, std::__ndk1::allocator<_Float16> >)
>>> referenced by unittest_nntrainer_tensor_fp16.cpp:5805 (../unittest/unittest_nntrainer_tensor_fp16.cpp:5805)
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 24 Oct 2023 01:33:21 +0000 (10:33 +0900)]
[gtest] Fix gtest error assessing logic
- Float16 models tend to show unavoidably higher accuracy loss with: 1. huge tensors 2. tensors with huge values
- Reassess the value-by-value comparison to use relative error when the absolute error is large
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
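A comparison in the spirit of this fix can be sketched like so. The helper name and tolerance values are illustrative assumptions, not nntrainer's actual gtest helpers:

```cpp
#include <cmath>

// Hedged sketch of a mixed tolerance check: accept small absolute error
// outright, and fall back to relative error for large magnitudes, where
// fp16 rounding error grows with the exponent. Tolerances are
// illustrative, not the values used by nntrainer's tests.
bool values_match(float expected, float actual, float abs_tol = 1e-3f,
                  float rel_tol = 1e-2f) {
  float err = std::fabs(expected - actual);
  if (err <= abs_tol)
    return true; // small absolute error: pass
  float scale = std::fmax(std::fabs(expected), std::fabs(actual));
  return err <= rel_tol * scale; // large values: judge relatively
}
```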
skykongkong8 [Wed, 25 Oct 2023 04:36:33 +0000 (13:36 +0900)]
[bugfix] Fix Tensor save function when float16
- Previously, the wrong getData function was called when saving an fp16 Tensor
- Apply the proper template parameter: _FP16
Resolves:
```
...
23/41 unittest_nntrainer_tensor_fp16 FAIL 2.26 s (exit status 1)
...
[ FAILED ] nntrainer_Tensor.save_read_01_p
...
```
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 20 Oct 2023 01:49:59 +0000 (10:49 +0900)]
[neon] Support scopy for multiple dataTypes in neon
- scopy_int4_to_fp32
- scopy_int8_to_fp32
- scopy_int8_to_fp16
- scopy_int8_or_int4 : since we use uint8_t for int4 tensors, code can be shared here
- vcvt_fp32_u32_bitwise : faster optimization is possible with bitwise operations rather than element-wise operations
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
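Why the int4 and int8 paths can share code is easiest to see in scalar form: int4 tensors are stored as uint8_t with two 4-bit values per byte, so an int4 copy is an unpack step followed by the same widening conversion as int8. This is an illustrative sketch (the nibble order and unsigned interpretation are assumptions; the real routines are NEON-vectorized):

```cpp
#include <cstdint>
#include <vector>

// Scalar illustration of scopy_int4_to_fp32: unpack two 4-bit values
// from each byte, then widen to float. High-nibble-first ordering is an
// assumption for this sketch, not necessarily nntrainer's layout.
std::vector<float> copy_int4_to_fp32(const std::vector<uint8_t> &packed) {
  std::vector<float> out;
  out.reserve(packed.size() * 2);
  for (uint8_t byte : packed) {
    out.push_back(static_cast<float>(byte >> 4));   // high nibble
    out.push_back(static_cast<float>(byte & 0x0F)); // low nibble
  }
  return out;
}
```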