Jaeyun Jung [Fri, 26 Apr 2024 05:49:05 +0000 (14:49 +0900)]
[Build] dependency to api
Code cleanup: fix the cyclic dependency between nntrainer and ml-api.
The build dependency of nntrainer on ml-api is unnecessary.
Signed-off-by: Jaeyun Jung <jy1210.jung@samsung.com>
Eunju Yang [Tue, 23 Apr 2024 04:23:19 +0000 (13:23 +0900)]
[LLaMA] Bugfix in LLaMA application
- This commit fixes a bug in `applyTKP` function.
- It seems that applying Top-K and Top-P to the logits did not work as intended.
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Debadri Samaddar [Tue, 23 Apr 2024 06:30:16 +0000 (12:00 +0530)]
[hgemm] hgemm noTrans with 1x4 kernel
Added hgemm_kernel_1x4
Added hgemm_noTrans_1x4 calls
Added unittest dot_gemm_50_768_516
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Donghyeon Jeong [Thu, 25 Apr 2024 04:34:17 +0000 (13:34 +0900)]
[bugfix] Fix build issues when fp16 is enabled
This PR resolves build issues that occur in acti_func.h when fp16 is enabled.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Eunju Yang [Fri, 5 Apr 2024 00:15:01 +0000 (09:15 +0900)]
[LoRA] add alpha parameter to LoRA
- This commit adds an `alpha` parameter to LoRA (fc)
- In the original paper, `alpha (int)` is adopted as a parameter to
derive the scaling factor internally, i.e., scaling = alpha / rank
- This commit takes `alpha` as a hyper-parameter and applies the scaling
factor to the LoRA layer (see the sketch after this list).
- This commit's updates are summarized as follows:
  - `common_properties.h`: add LoraAlpha as a parameter.
  - `fc_layer.cpp`: update the forwarding / calcGradient /
calcDerivative functions to apply the scaling factor in the LoRA computation
  - `fc_layer.h`: update to take LoraAlpha as fc_props
  - `node_exporter.cpp/h`: add LoraAlpha as a parameter in the
tf.export format of the fc layer (to pass the test code)
- fix the code lines which may cause coverity issues.
- LoRA initialization is updated:
  - LoRA A: ZEROS
  - LoRA B: Normal
- [TODO] update tf exporter of fc layer
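A minimal sketch of where the scaling factor enters the LoRA forward path, written with plain vectors rather than nntrainer's Tensor API; all names and shapes here are illustrative:
```
#include <cstddef>
#include <vector>

// Hedged illustration: y = W x + (alpha / rank) * B (A x).
// W: out x in, A: rank x in, B: out x rank (names are hypothetical).
std::vector<float> lora_forward(const std::vector<std::vector<float>> &W,
                                const std::vector<std::vector<float>> &A,
                                const std::vector<std::vector<float>> &B,
                                const std::vector<float> &x, float alpha) {
  const std::size_t out = W.size(), rank = A.size();
  const float scaling = alpha / static_cast<float>(rank); // scaling = alpha / rank

  std::vector<float> ax(rank, 0.0f); // A @ x
  for (std::size_t r = 0; r < rank; ++r)
    for (std::size_t i = 0; i < x.size(); ++i)
      ax[r] += A[r][i] * x[i];

  std::vector<float> y(out, 0.0f); // W @ x + scaling * (B @ (A @ x))
  for (std::size_t o = 0; o < out; ++o) {
    for (std::size_t i = 0; i < x.size(); ++i)
      y[o] += W[o][i] * x[i];
    for (std::size_t r = 0; r < rank; ++r)
      y[o] += scaling * B[o][r] * ax[r];
  }
  return y;
}
```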
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Boseong Seo [Sat, 20 Apr 2024 03:55:09 +0000 (12:55 +0900)]
Add Mish activation function
- Now, users can use the Mish activation function, as in PyTorch or TensorFlow (see the sketch below).
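For reference, a standalone sketch of the Mish formula, mish(x) = x * tanh(softplus(x)); the actual nntrainer integration goes through its activation-function registry:
```
#include <cmath>

// Mish: x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x).
float mish(float x) { return x * std::tanh(std::log1p(std::exp(x))); }
```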
**Self evaluation**:
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
Debadri Samaddar [Mon, 22 Apr 2024 04:01:44 +0000 (09:31 +0530)]
[hgemm] Removed unused header
Deleted unused header inclusion
Removed #include <iostream>
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Thu, 18 Apr 2024 09:02:29 +0000 (14:32 +0530)]
[hgemm] hgemm noTrans with kernel 1x8
Added 1x8 hgemm kernel, packing_A1, packing_B1 functions.
Incorporated hgemm_noTrans_1x8.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
MyungJoo Ham [Sat, 20 Apr 2024 04:22:44 +0000 (13:22 +0900)]
ci / remove cpp-linter's false positive reports.
It reports an error that "iostream" is not found.
Install libstdc++-dev for the possible compilers.
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Seungbaek Hong [Fri, 15 Mar 2024 07:11:11 +0000 (16:11 +0900)]
[API] Add tensor&operations API structure for supporting autograd
I added a tensor & function (operation) API structure to support autograd.
Users can build a model graph using this API.
The operators will be supported with a one-to-one correspondence to ONNX operators.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
heka1024 [Wed, 10 Apr 2024 05:54:26 +0000 (14:54 +0900)]
Add Softplus activation function
- Now, users can use the Softplus activation function, as in PyTorch or TensorFlow (see the sketch below).
- Furthermore, we can use this function to build Mish or other activation functions
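A minimal sketch of the formula, softplus(x) = ln(1 + e^x); the overflow guard shown here is a common trick and not necessarily the exact nntrainer code:
```
#include <cmath>

// Softplus: ln(1 + e^x). For large x the result approaches x itself,
// so returning x directly avoids overflow in exp().
float softplus(float x) { return x > 20.0f ? x : std::log1p(std::exp(x)); }
```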
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: heka1024 <heka1024@gmail.com>
hyeonseok lee [Wed, 17 Apr 2024 01:48:49 +0000 (10:48 +0900)]
[neuralnet] bugfix multi batch incremental inference
- This commit handles the case where the model activation data type is fp32.
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Seungbaek Hong [Mon, 15 Apr 2024 12:39:17 +0000 (21:39 +0900)]
[Docs] add yolov3 readme file
added yolov3 readme file
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Debadri Samaddar [Mon, 15 Apr 2024 07:40:44 +0000 (13:10 +0530)]
[OpenCL/GPU] Modified ifstream condition
Updated ifstream object valid condition
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Thu, 4 Apr 2024 09:21:10 +0000 (14:51 +0530)]
[GPU/OpenCL] Create kernel utility with binaries
Added feature for reading kernel binaries.
Managing already created kernels.
Added static flag and bitmask to check existing kernels.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
heka1024 [Tue, 9 Apr 2024 13:50:02 +0000 (22:50 +0900)]
Add ELU activation function
- Now, users can use the ELU activation function, as in PyTorch or TensorFlow (see the sketch below).
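A sketch of the ELU formula; the default alpha = 1.0 follows common framework conventions and is not necessarily nntrainer's exact parameterization:
```
#include <cmath>

// ELU: x for x > 0, alpha * (e^x - 1) otherwise.
float elu(float x, float alpha = 1.0f) {
  return x > 0.0f ? x : alpha * std::expm1(x);
}
```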
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Co-authored-by: Hanbyeol Kim <kimhan0515@snu.ac.kr>
Co-authored-by: Boseong Seo <suzy13549@snu.ac.kr>
Signed-off-by: heka1024 <heka1024@gmail.com>
Seungbaek Hong [Thu, 4 Apr 2024 11:24:54 +0000 (20:24 +0900)]
[application] add repetition_penalty to generate func
add some options to the 'generate' function of the llm application (see the sketch below)
- add a naive repetition_penalty option
- add a bad_words option
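A hedged sketch of what a naive repetition penalty can look like: dampen the logits of tokens already generated and mask out bad words entirely. The names and exact policy here are illustrative, not necessarily the application's code:
```
#include <cstddef>
#include <unordered_set>
#include <vector>

// penalty > 1.0 discourages repeating already-generated tokens.
void apply_repetition_penalty(std::vector<float> &logits,
                              const std::unordered_set<std::size_t> &generated,
                              const std::unordered_set<std::size_t> &bad_words,
                              float penalty) {
  for (std::size_t id : generated)
    logits[id] =
      logits[id] > 0.0f ? logits[id] / penalty : logits[id] * penalty;
  for (std::size_t id : bad_words)
    logits[id] = -1e9f; // effectively zero probability after softmax
}
```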
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Boseong Seo [Tue, 9 Apr 2024 13:23:31 +0000 (22:23 +0900)]
Remove LSTM example in Applications/README.md
- the existing link to the LSTM example does not work
- users can find the LSTM example in the Layers dir
  + the LSTM dir was merged into the Layers dir (in PR nnstreamer#2107)
- delete the LSTM example from the Applications/README.md file in order to reduce confusion
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
skykongkong8 [Thu, 4 Apr 2024 06:44:15 +0000 (15:44 +0900)]
[ hgemm ] Use macro kernel in 8x8 kernel
- Using the macro kernel, we can choose a point on the accuracy-latency tradeoff. Furthermore, it is easier to maintain this way.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 4 Apr 2024 06:42:11 +0000 (15:42 +0900)]
[ hgemm ] Apply software prefetching in 4x8 kernel
- We can expect to minimize cache misses by using software prefetching
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 11:10:57 +0000 (20:10 +0900)]
[ hgemm ] Implement 8x16 hgemm kernel
- This commit introduces 2 types of 8x16 hgemm kernel
1. full-fp16
2. fp16-fp32 partial accumulation
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
hyeonseok lee [Fri, 5 Apr 2024 13:49:45 +0000 (22:49 +0900)]
[neuralnet] enable multi batch incremental inference
- The output did not account for multi-batch input in incremental inference.
Now it returns multi-batch output.
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Seungbaek Hong [Wed, 3 Apr 2024 11:10:13 +0000 (20:10 +0900)]
[application] update llm generate function
- fix "temperature" operation
- add "top-k, top-p" option
- support batch mode
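A hedged sketch of the temperature / top-k / top-p pipeline over a single logit vector; the real generate function works per batch and then samples from the surviving indices:
```
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the token indices that survive temperature + top-k + top-p
// filtering; a sampler would draw from their renormalized probabilities.
std::vector<int> filter_top_k_top_p(std::vector<float> logits,
                                    float temperature, std::size_t top_k,
                                    float top_p) {
  for (float &l : logits)
    l /= temperature; // temperature reshapes the distribution's sharpness

  std::vector<int> idx(logits.size());
  std::iota(idx.begin(), idx.end(), 0);
  std::sort(idx.begin(), idx.end(),
            [&](int a, int b) { return logits[a] > logits[b]; });
  idx.resize(std::min(top_k, idx.size())); // top-k: keep the k largest logits

  // softmax over the kept logits (max subtracted for stability)
  float max_l = logits[idx[0]], sum = 0.0f;
  std::vector<float> probs(idx.size());
  for (std::size_t i = 0; i < idx.size(); ++i)
    sum += probs[i] = std::exp(logits[idx[i]] - max_l);

  // top-p: keep the smallest prefix whose cumulative probability >= top_p
  float cum = 0.0f;
  std::size_t keep = 0;
  while (keep < idx.size() && cum < top_p)
    cum += probs[keep++] / sum;
  idx.resize(keep);
  return idx;
}
```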
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Boseong Seo [Thu, 4 Apr 2024 07:34:29 +0000 (16:34 +0900)]
Reformat code with .clang_format
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
Boseong Seo [Thu, 4 Apr 2024 04:19:55 +0000 (13:19 +0900)]
[ BugFix ] Modify the wrong input in `EXPECT_EQ`
- The `registerFactory` function returns the unsigned value of int_key when int_key is given as -1 (the default), but this was not considered in the code.
- So, the second argument (expected value) of `EXPECT_EQ` was modified accordingly (illustrated below).
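The pitfall in isolation, as a hypothetical gtest case (the real test and `registerFactory` signature live in nntrainer's unit tests):
```
#include <gtest/gtest.h>

// An int of -1 converted to unsigned wraps to the maximum value, so the
// expected value must be expressed in unsigned terms, not as plain -1.
TEST(FactoryKey, DefaultIntKeyWrapsToUnsigned) {
  int int_key = -1;
  unsigned int returned_key = static_cast<unsigned int>(int_key);
  EXPECT_EQ(returned_key, static_cast<unsigned int>(-1));
}
```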
Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
Boseong Seo [Tue, 2 Apr 2024 19:28:35 +0000 (04:28 +0900)]
Use parameterized test in unittest
Use parameterized test according to existing TODO comment.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
kimhan0515 [Thu, 4 Apr 2024 16:09:57 +0000 (01:09 +0900)]
Fix typo in docs
Fix typos in some docs
- README.md and docs/configuration-ini.md: simple typo
- Applications/MNIST/README.md: typo and duplicate image
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Hanbyeol Kim <kimhan0515@snu.ac.kr>
Signed-off-by: kimhan0515 <kimhan0515@gmail.com>
hyeonseok lee [Thu, 4 Apr 2024 12:23:37 +0000 (21:23 +0900)]
[layer] multi batch incremental forwarding
- Enable multi batch incremental forwarding by looping batchwise
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Debadri Samaddar [Tue, 2 Apr 2024 12:51:18 +0000 (18:21 +0530)]
[OpenCL] Added stringification macro and kernel path
Add DEFAULT_KERNEL_PATH as static member of Program class
Modified macros for stringification
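For context, the usual two-level stringification idiom such a change relies on; the macro names and the default path below are illustrative, not necessarily those in opencl_program:
```
// Two levels are needed so the macro argument is expanded before it
// is stringified; STRINGIFY alone would yield "OPENCL_KERNEL_PATH".
#define STRINGIFY(x) #x
#define EXPAND(x) STRINGIFY(x)

#ifndef OPENCL_KERNEL_PATH
#define OPENCL_KERNEL_PATH /usr/share/nntrainer/kernels
#endif

static const char *kDefaultKernelPath = EXPAND(OPENCL_KERNEL_PATH);
```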
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Fri, 15 Mar 2024 12:16:12 +0000 (17:46 +0530)]
[OpenCL] Added opencl kernel path as option
Added the opencl-kernel-path preprocessor directive and handled it inside opencl_program.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Thu, 14 Mar 2024 07:55:40 +0000 (13:25 +0530)]
[OpenCL] Proper cleanup and readability
Used better C++ paradigm to enhance readability.
Added proper cleanup stub.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Mon, 11 Mar 2024 11:14:19 +0000 (16:44 +0530)]
[OpenCL/GPU] Kernel binary caching
Added utilities for saving kernel as binary files.
Added wrapper for clCreateProgramWithBinary.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Eunju Yang [Fri, 15 Mar 2024 06:35:28 +0000 (15:35 +0900)]
[LoRA] Apply Inception-LoRA
- updates the LoRA computation (applying Inception-LoRA)
- compute with LoRA vectors without matrix construction
- revise `forwarding()`
- revise `calcGradient()`
- revise `calcDerivative()`
Self evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Tue, 12 Mar 2024 00:14:06 +0000 (09:14 +0900)]
[ Trivial ] apply clang-format to fc_layer.cpp
- clang-format re-apply to pass static checker
- `fc_layer.cpp`
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Fri, 8 Mar 2024 07:08:12 +0000 (16:08 +0900)]
[ trivial ] fix doxygen tag check error
- remove a redundant and incorrect block comment in
`nntrainer/layers/fc_layer.cpp`
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Fri, 8 Mar 2024 06:36:39 +0000 (15:36 +0900)]
[ trivial ] apply clang-format
- apply clang format to
- nntrainer/tensor/tensor_v2.cpp
- nntrainer/utils/node_exporter.cpp
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Wed, 6 Mar 2024 02:45:08 +0000 (11:45 +0900)]
[LoRA/Trivial] fix typo and edit comments
- Fix typo in the code
- edit comments to add some explanations
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Wed, 6 Mar 2024 02:13:28 +0000 (11:13 +0900)]
[LoRA] Revise LoRA implementation for fc_layer
- remove `forwarding_lora()` function
- update forwarding path with LoRA option
- First, compute the forwarding logits of the base weight (W) and the LoRA weight (A @ B) respectively,
- then merge the logits to produce the output
- [update] (W + A @ B)x -> Wx + (A @ B)x
- update `calcDerivative` to reflect the changes in the forwarding operation
- the corresponding implicit update of calcDerivative is included.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Wed, 31 Jan 2024 07:07:31 +0000 (16:07 +0900)]
[LoRA] revise type of LoraRank property & fix error in fc_layer
- update type of LoraRank property : Property<int> -> PositiveIntegerProperty
- fix typo dot_batched_deriv_wrt_1 -> dot_deriv_wrt_1
- update code with add -> add_i
- apply clang-format
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Wed, 31 Jan 2024 07:05:50 +0000 (16:05 +0900)]
[LoRA] update node_exporter of fully connected layer
This commit updates the TfLite node exporter of the fully connected layer. It adds a new property (LoraRank) as an additional input property of the fully connected layer.
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Eunju Yang [Mon, 29 Jan 2024 06:57:15 +0000 (15:57 +0900)]
[LoRA] add a new feat(lora) to fc layer
This commit implements LoRA only for the FC layer, which means it is not the generalized version. It should be written as a separate class in order to remove code duplication for other layers.
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Donghak PARK [Wed, 27 Mar 2024 05:27:19 +0000 (14:27 +0900)]
[Layer] Create Depthwise 2D Convolution
This pull request defines a header file for depthwise convolution.
It is a draft for a new layer, and we welcome any feedback or assistance you may have.
This layer is necessary to support various applications such as SV.
- Depthwise convolution is a type of convolution in which each input channel is convolved with a different kernel (called a depthwise kernel).
- Unlike a regular 2D convolution, depthwise convolution does not mix information across different input channels.
**Changes proposed in this PR:**
- Add Depthwise Convolution 2D Layer
Resolves:
- #2520
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 07:02:16 +0000 (16:02 +0900)]
[ neon ] Apply kernel based hgemm
- Now hgemm subdirectory is included when neon fp16 is in use
- WIP : hgemm 8x16 kernel
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 04:29:06 +0000 (13:29 +0900)]
[ Trivial ] Fix typo
- The GEMM unittest for the square 1024 case was generating improper dimensions. Fixed accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 04:23:42 +0000 (13:23 +0900)]
[ hgemm ] Use optimized hgemm if possible
- We can use the optimized version of hgemm under the following conditions (see the dispatch sketch below):
1. noTrans hgemm
2. M, N, K are divisible by 4 or 8
3. row-major GEMM
4. alpha = 1.0, beta = 0.0 (will be patched soon)
- Otherwise, use the previous version as a fallback.
- Note that a few optimization strategies are still left for optimal hgemm.
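A hedged sketch of the dispatch predicate implied by these conditions; the actual check lives in the hgemm entry point and may differ in detail:
```
#include <cstdint>

// Use the optimized kernels only for noTrans, row-major GEMM with
// alpha == 1 / beta == 0 and 4- or 8-divisible dimensions.
static bool can_use_optimized_hgemm(bool trans_a, bool trans_b, uint32_t M,
                                    uint32_t N, uint32_t K, float alpha,
                                    float beta) {
  const bool divisible = (M % 8 == 0 && N % 8 == 0 && K % 8 == 0) ||
                         (M % 4 == 0 && N % 4 == 0 && K % 4 == 0);
  return !trans_a && !trans_b && divisible && alpha == 1.0f && beta == 0.0f;
}
```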
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 04:17:30 +0000 (13:17 +0900)]
[ hgemm ] Implement 8x8 hgemm kernel
- This commit introduces 2 types of 8x8 hgemm kernel
1. full-fp16
2. fp16-fp32 partial accumulation
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 3 Apr 2024 04:16:06 +0000 (13:16 +0900)]
[ hgemm ] Implement 4x8 hgemm kernel
- This commit introduces 2 types of 4x8 hgemm kernel
1. full-fp16
2. fp16-fp32 partial accumulation
- Additionally, the 4x8 kernel has a macro kernel that can regulate the accuracy-latency tradeoff. By default it uses partial sums of up to 256 elements. Other kernels will be refactored in this way ASAP.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
heka1024 [Tue, 2 Apr 2024 14:53:07 +0000 (23:53 +0900)]
Fix typo in test
Fix some typos in the test cases: `duing` -> `during`, `TSETS` -> `TESTS`. Also add doxygen for `nntrainer_LazyTensorOpsTest`.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: heka1024 <heka1024@gmail.com>
skykongkong8 [Fri, 29 Mar 2024 01:31:00 +0000 (10:31 +0900)]
[ HGEMM/draft ] Draft of kernel-based hgemm
- Previously, hgemm was implemented without taking packing / kernel into consideration.
- Here I would like to introduce kernel-based hgemm. It consists of:
1. packing A / B matrix for 4 / 8 divisible case
2. 4x4, 8x8 hgemm kernel for full-fp16 case
- More features like fine-grained packing strategies and kernels will be updated in the near future.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Fri, 29 Mar 2024 06:22:09 +0000 (15:22 +0900)]
[Coverity] Fix coverity issues
This PR resolves coverity issues of overflow, use of auto that causes a copy, missing locks, and thread locks.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Seungbaek Hong [Wed, 27 Mar 2024 09:44:06 +0000 (18:44 +0900)]
[svace] fix svace issues
fixed all svace issues on main branch
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Donghyeon Jeong [Thu, 28 Mar 2024 04:20:52 +0000 (13:20 +0900)]
[Coverity] Fix coverity issues
This PR resolves coverity issues of use of auto that causes a copy and missing lock.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Wed, 27 Mar 2024 07:23:39 +0000 (16:23 +0900)]
[Trivial] Disable cpp-linter action's clang-format
We currently perform a Clang format check during our static checks.
The CPP-Linter we are using is from the Action Market and occasionally produces different results even when the same version is specified.
This reduces efficiency for developers, so only the static check with more detailed logs will be left and the CPP-Linter function will be disabled.
However, the existing Linter function will remain.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
hyeonseok lee [Mon, 25 Mar 2024 10:40:50 +0000 (19:40 +0900)]
[coverity] fix coverity issues
- Added const auto & to avoid copy of an object
- Added missing lock
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Donghak PARK [Fri, 22 Mar 2024 06:04:46 +0000 (15:04 +0900)]
[ coverity ] Fix Coverity issue
Fix Coverity issue on
- /test/unittest/layers/layers_golden_tests.cpp
- /test/unittest/models/unittest_models_recurrent.cpp
- /test/unittest/unittest_nntrainer_models.cpp
Resolves:
```
Use of auto that causes a copy (AUTO_CAUSES_COPY)
auto_causes_copy: This lambda has an unspecified return type
copy: This return statement creates a copy.
```
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Eunju Yang [Thu, 21 Mar 2024 07:51:00 +0000 (16:51 +0900)]
[coverity] fix coverity issue
- This commit fixes the coverity issues
- AUTO_CAUSES_COPY
- MISSING_LOCK
Self-evaluation:
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Seungbaek Hong [Mon, 25 Mar 2024 02:27:29 +0000 (11:27 +0900)]
[coverity] fix coverity issue
Fix coverity issue on
- /test/unittest/layers/layers_golden_recurrent.cpp
The other issues assigned have already been fixed.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Donghyeon Jeong [Thu, 21 Mar 2024 07:42:43 +0000 (16:42 +0900)]
[Coverity] Fix the coverity issue
This PR resolves the coverity issues of use of auto that causes a copy.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
heka1024 [Sun, 17 Mar 2024 06:59:19 +0000 (15:59 +0900)]
Fix minor errors in github action
- `actions/setup-python@v1` is deprecated. So, bump version to v5.
- The step name says it uses Python 3.9, but it actually installs 3.10. Match the name to what it actually does.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: heka1024 <heka1024@gmail.com>
skykongkong8 [Thu, 21 Mar 2024 10:34:32 +0000 (19:34 +0900)]
[ coverity ] Fix coverity issue
- Fix coverity issue
1774230,
1774235,
1774238,
1774239,
1774243
Resolves:
```
Use of auto that causes a copy (AUTO_CAUSES_COPY)
auto_causes_copy: This lambda has an unspecified return type
copy: This return statement creates a copy.
```
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
heka1024 [Mon, 18 Mar 2024 16:35:10 +0000 (01:35 +0900)]
Use parameterized test in `NamePropertyTest`
To make code more readable, use parameterized test according to existing TODO comment.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: heka1024 <heka1024@gmail.com>
GyeongHoe Koo [Sun, 17 Mar 2024 06:44:18 +0000 (15:44 +0900)]
Bump actions/checkout in Ubuntu Meson build & test
Node 16 has reached end of life, so GitHub recommends transitioning to actions that use Node 20+. [Ref](https://github.blog/changelog/2023-09-22-github-actions-transitioning-from-node-16-to-node-20/)
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: heka1024 <heka1024@gmail.com>
Eunju Yang [Mon, 18 Mar 2024 08:25:53 +0000 (17:25 +0900)]
[coverity] fix coverity issues
This commit fixes coverity issues of auto_causes_copy
- 1739360
- 1740106
Self-evaluation:
Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Eunju Yang <ej.yang@samsung.com>
Donghyeon Jeong [Mon, 18 Mar 2024 07:42:58 +0000 (16:42 +0900)]
[Coverity] Fix the coverity issue
This PR resolves the coverity issues of missing lock and use of auto that causes a copy.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 7 Mar 2024 05:43:48 +0000 (14:43 +0900)]
[Layer] Remove Tensor setDataType() usage
In several layers, there are attempts to change the data type of a Tensor object after initializing it.
This is currently possible but can cause issues down the line (e.g., treat FloatTensor object as HalfTensor).
As such, the setDataType() method will be removed and considered not to be used in future updates.
Instead, users will need to provide the desired data type when creating a new tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Mon, 11 Mar 2024 05:05:00 +0000 (14:05 +0900)]
[ neon/trivial ] Use N8 for hgemm, and for starting index for the remaining Tensor area
- Like hgemv_transpose, use N8 for hgemm_noTrans as well
- we can re-use this value for the starting index for the remaining area
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 11 Mar 2024 03:05:09 +0000 (12:05 +0900)]
[ neon ] Use bigger kernel in hgemv
- Using kernels of up to 16x8 size shows the lowest latency. Applied accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 11 Mar 2024 02:22:56 +0000 (11:22 +0900)]
[ neon ] Use N8 for shorter code
- Use 8-divisible N8 for desired vector length for multithreading
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 6 Mar 2024 05:22:53 +0000 (14:22 +0900)]
[TensorV2] Completed integration of remaining functions from Tensor
This commit integrated all remaining functions from Tensor class into TensorV2.
This includes fill(), setData(), setValueInt(), sin(), and cos().
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Mon, 4 Mar 2024 08:30:24 +0000 (17:30 +0900)]
[TensorV2] Feature Scaling Functions
This pull request adds two new feature scaling functions - standardization and normalization - to the Tensor class. These functions help users preprocess input data before feeding it into models, improving model performance and accuracy.
**Changes proposed in this PR:**
* Added normalization() function to rescale values to a range between 0 and 1
* Added standardization() function to center data around the mean and scale to a standard deviation of 1 (see the sketch below)
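The underlying formulas, sketched over a raw float buffer (the PR's functions operate on Tensor objects; this only shows the math, and assumes non-constant input to avoid division by zero):
```
#include <algorithm>
#include <cmath>
#include <vector>

// Min-max normalization: (x - min) / (max - min) rescales to [0, 1].
void normalize(std::vector<float> &v) {
  auto [mn, mx] = std::minmax_element(v.begin(), v.end());
  const float min_v = *mn, range = *mx - *mn;
  for (float &x : v)
    x = (x - min_v) / range;
}

// Standardization: (x - mean) / stddev centers to mean 0, stddev 1.
void standardize(std::vector<float> &v) {
  float mean = 0.0f, var = 0.0f;
  for (float x : v)
    mean += x;
  mean /= v.size();
  for (float x : v)
    var += (x - mean) * (x - mean);
  const float stddev = std::sqrt(var / v.size());
  for (float &x : v)
    x = (x - mean) / stddev;
}
```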
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Mon, 4 Mar 2024 01:13:06 +0000 (10:13 +0900)]
[ Tensor ] Apply fallback blas operations in Tensor : div, sub
- The previous Tensor::subtract relied on Tensor::add with a scalar multiplier of -1 as input. However, I found that this is quite a bit slower than an explicit implementation of the subtract operation. Implemented and added accordingly.
- Ditto with previous commit
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Mon, 4 Mar 2024 01:01:20 +0000 (10:01 +0900)]
[ Tensor ] Apply fallback blas operations in Tensor : mul, add
- for a stride size of 1, when an additional SIMD implementation is available, we can enjoy the accelerated function
- in the non-1-stride case, or when there is no SIMD implementation, use the fallback function
- note that the fallback functions also take the same parameters, for future refactoring
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 27 Feb 2024 08:24:10 +0000 (17:24 +0900)]
[ Tensor ] Use ele_mul instead of std::transform with std::multiplies
- As far as I can tell, this change will not worsen function latency, but it makes the code clearer
- Implement ele_mul_fallback for default option
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 27 Feb 2024 07:26:39 +0000 (16:26 +0900)]
[ BLAS ] Remove beta condition for fallback function in BLAS
- The `apply_broadcast` function would throw an error when the input Tensor is not allocated.
- Thus, checking for NaN at the BLAS level would be redundant. Removed accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Tue, 27 Feb 2024 07:09:25 +0000 (16:09 +0900)]
[TensorV2] Add utility member functions to TensorV2 class
This pull request adds several new utility member functions to the TensorV2 class, enabling users to perform various tasks with their tensors more easily and efficiently.
These include saving and loading tensors, updating batches, getting argmax and max absolute values, and more.
The implementation is based on the current Tensor class and aims to improve the overall usability and flexibility of the TensorV2 class.
**Changes proposed in this PR:**
- Added save() and read() methods to allow saving and loading tensor data.
- Added Map() method to create a new Tensor object from a buffer.
- Added argmax() and max_abs() methods to retrieve the indices of max value by batch and the value of the maximum absolute element in a tensor.
- Added updateBatch() to update tensor batch size.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Debadri Samaddar [Mon, 11 Mar 2024 12:00:29 +0000 (17:30 +0530)]
[build] Updated Android build script for OpenCl inclusion
Added enable-opencl flag for Android build.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Donghyeon Jeong [Wed, 6 Mar 2024 02:37:15 +0000 (11:37 +0900)]
[TensorV2] Add functions to split and concatenate tensors
This pull request adds two new methods to the Tensor class: split() and cat().
The split() method allows users to divide a tensor into multiple smaller tensors along the specified axis.
The cat() method enables them to combine multiple tensors into a single larger tensor along a given axis.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Mon, 11 Mar 2024 02:14:40 +0000 (11:14 +0900)]
[ bugfix ] Fix wrong input when hgemv_transpose
- The wrong input was being fed when running hgemv
- Previously it was quite unclear how to evaluate the similar A-B case; we need to implement a proper way/convention to generate random input data.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghak PARK [Fri, 8 Mar 2024 04:57:55 +0000 (13:57 +0900)]
[Trivial/CI] sync cpp-linter version with clang-format
sync cpp-linter version with clang-format
- change cpp-linter version to 14
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Mon, 26 Feb 2024 05:23:24 +0000 (14:23 +0900)]
[TensorV2] Average Tensor element by axes
This pull request adds new functions to the TensorV2 that allow users to average tensor elements along specified axes.
The functions take in an axis or list of axes as input and return a new tensor with elements replaced by their corresponding means. If no axis is provided, it returns a tensor value by averaging the elements over all axes.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Debadri Samaddar [Thu, 7 Mar 2024 07:16:46 +0000 (12:46 +0530)]
[OpenCL] Reduced ifdef checks
Handled conditions by reducing ifdef checks
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Thu, 7 Mar 2024 07:13:26 +0000 (12:43 +0530)]
[OpenCL] CI issues fixed for clang and doxygen
Fixed CI issues for the following:
Third party files: clang
Non third party files: clang, doxygen
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Tue, 5 Mar 2024 07:54:31 +0000 (13:24 +0530)]
[OpenCL] Addressed review comments
Modified doc for OpenCL buffer move constructor
Addressed review comments
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Fri, 1 Mar 2024 08:17:41 +0000 (13:47 +0530)]
[OpenCL] Added doxygen comments
Added comments.
Modified command queue manager to have dynamic local_work_size.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Tue, 27 Feb 2024 07:03:38 +0000 (12:33 +0530)]
[GPU] Added unit test for cl_context
Modified layers_dependent_common_tests to add cl_context instance creation
Updated build setup for test
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Fri, 23 Feb 2024 14:08:09 +0000 (19:38 +0530)]
[GPU] Added cl_context and compute engine information
Created enum LayerComputeEngine at layer.h API level
Modified layer.h, factory.cpp, layer_node.h/cpp to propagate compute engine information
Added cl_context to handle global configuration of OpenCL environment
Modified RunLayerContext to enable getter/setter for compute engine
Added kernel creation utility in RunLayerContext
Used FullyConnected layer as example for propagating compute info
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Fri, 23 Feb 2024 14:02:04 +0000 (19:32 +0530)]
[OpenCL] Refactored example tensor GPU ops
Refactored GPU SGEMV ops at tensor level.
Modified tensor.h and tensor.cpp to remove GPU info.
Renamed headers of OpenCL wrappers and added opencl namespace.
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Wed, 7 Feb 2024 10:39:37 +0000 (16:09 +0530)]
[OpenCL] enable-opencl flag added to build option
enable-opencl flag added which defines ENABLE_OPENCL macro internally.
Fixed clang issues.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Tue, 6 Feb 2024 08:39:27 +0000 (14:09 +0530)]
[GPU] Sum tensor signature updated for GPU operation
Modified sum and sum_by_batch function signatures to incorporate a boolean flag.
This flag will help to decide which compute unit will be used.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Tue, 6 Feb 2024 08:32:27 +0000 (14:02 +0530)]
[GPU] Added OpenCL interface for OpenCL tensor operations
Create cl_interface.h to handle tensor operation calls on GPU
Added experimental SGEMV GPU kernel
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Debadri Samaddar [Tue, 6 Feb 2024 08:23:06 +0000 (13:53 +0530)]
[OpenCL] OpenCL GPU wrappers added
Created OpenCL wrappers for the following:
- Context
- Command queue
- Kernel
- Program
- Buffer
Added operation interface to handle the flow of operation
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
skykongkong8 [Sun, 25 Feb 2024 23:52:43 +0000 (08:52 +0900)]
[ util ] Use proper STL in max element comparison
- For cleaner code use std::max_element instead of for-loop
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 13 Feb 2024 06:47:09 +0000 (15:47 +0900)]
[ util ] Implement exp_i function
- Add exponential inplace function
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 13 Feb 2024 03:16:07 +0000 (12:16 +0900)]
[ util ] Use max value in softmax function
- For numerical stability, using non-positive values as the input of the exponential function is recommended (since the output then ranges from 0 to 1); see the sketch below.
- Subtract the maximum value before calculating exponential vectors
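The max-subtraction trick in scalar form (the commit's version is vectorized with NEON and has an fp16-in / fp32-accumulate variant):
```
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stable softmax: exp(x - max(x)) keeps every exponent <= 0, so exp()
// stays within (0, 1] and cannot overflow, while the result is
// mathematically identical to softmax(x).
std::vector<float> softmax(const std::vector<float> &x) {
  const float max_val = *std::max_element(x.begin(), x.end());
  std::vector<float> y(x.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i)
    sum += y[i] = std::exp(x[i] - max_val);
  for (float &v : y)
    v /= sum;
  return y;
}
```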
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 13 Feb 2024 02:26:46 +0000 (11:26 +0900)]
[ util ] Implement max function in util_simd
- Unlike the isamax function of BLAS, this function returns the maximum 'value', not the index
- Note that this function is applicable only when the input data is contiguous
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 13 Feb 2024 01:17:14 +0000 (10:17 +0900)]
[ util ] Implement softmax function in util_simd
- The current softmax implementation does not consider fp32 use in half-precision softmax
- Implement raw and neon-simd versions of the softmax function with fp32, and with fp16 using fp32 accumulation
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Thu, 22 Feb 2024 10:55:33 +0000 (19:55 +0900)]
[TensorV2] Tensor element summation by axes feature
This pull request introduces a new feature that enables the summation of tensor elements by axes.
This feature allows users to sum up tensor values along specified axes, making it easier to perform complex mathematical operations involving tensors.
**Changes proposed in this PR:**
- Added new function sum() that takes a tensor and an axis as input and returns the sum of all elements along that axis.
- Added a mergeAxis feature which merges two tensor axes into one.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Wed, 28 Feb 2024 03:19:19 +0000 (12:19 +0900)]
[util] Add numpy file reader
In deep learning data feeding, there are often tasks that involve reading and processing numpy (.npy) files.
Therefore, it is much more convenient to pass the file name and get the data back in a vector, instead of manually writing code to read the file every time. A usage sketch follows the signature below.
Function Signature
```
void read_npy_file(const char *file_path, std::vector<int> &dims,
std::vector<float> &values);
```
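A hedged usage sketch of the helper (the declaration matches the commit's signature; the file name is illustrative, and the program must link against the helper's definition):
```
#include <iostream>
#include <vector>

void read_npy_file(const char *file_path, std::vector<int> &dims,
                   std::vector<float> &values); // provided by this commit

int main() {
  std::vector<int> dims;
  std::vector<float> values;
  read_npy_file("input.npy", dims, values); // fills dims and values

  std::cout << "rank: " << dims.size() << ", elements: " << values.size()
            << '\n';
  return 0;
}
```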
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghyeon Jeong [Mon, 19 Feb 2024 11:52:58 +0000 (20:52 +0900)]
[TensorV2] Create a mask of dropout, filter, and zoneout
This PR adds new functionalities for getting masks of the following techniques: dropout, filter, and zoneout.
These functions enable working with masks, making it easier to perform such techniques in regularization.
**Changes proposed in this PR:**
- Added dropout_mask(), which calculates the dropout mask by multiplying tensor elements by 1.0 / (1.0 - dropout rate), in place (see the sketch after this list).
- Added filter_mask(), which takes an input tensor and applies a filter mask based on the given mask length and invert option.
- Added zoneout_mask(), which generates two zoneout masks, one for in-place operation and another for opposite operation, based on the specified zoneout rate.
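For instance, an inverted-dropout mask in scalar form, using the 1.0 / (1.0 - rate) scale from the first bullet; the RNG choice and layout here are illustrative:
```
#include <cstddef>
#include <random>
#include <vector>

// Inverted dropout: zero an element with probability `rate`, otherwise
// scale by 1 / (1 - rate) so the expected output value is unchanged.
std::vector<float> dropout_mask(std::size_t n, float rate,
                                unsigned int seed = 42) {
  std::mt19937 rng(seed);
  std::bernoulli_distribution drop(rate);
  std::vector<float> mask(n);
  for (float &m : mask)
    m = drop(rng) ? 0.0f : 1.0f / (1.0f - rate);
  return mask;
}
```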
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Wed, 28 Feb 2024 05:47:49 +0000 (14:47 +0900)]
[Coverity] Fix the coverity issue
This PR resolves the coverity issue of resource leak.
**Changes proposed in this PR:**
- Switch the order of assert to check if the output is a nullptr to ensure releasing acquired resources (e.g., outdata_beta)
**This fixes:**
- leaked_storage: Variables going out of scope leak the storage they point to
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>