platform/core/ml/nntrainer.git
10 months ago[ layer ] Bugfix for enabling unittest_models on Android
skykongkong8 [Thu, 30 May 2024 04:09:06 +0000 (13:09 +0900)]
[ layer ] Bugfix for enabling unittest_models on Android

- This commit fixes abnormal memory accesses in the cross-compiled unittest executable on Android

Resolves:
> SIGSEGV : signal segmentation violation
- lldb | signal SIGSEGV: invalid address
- SIGILL : illegal instruction

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[Application] Bug fix about meson setting
Seungbaek Hong [Fri, 7 Jun 2024 08:44:10 +0000 (17:44 +0900)]
[Application] Bug fix about meson setting

Currently, PICO GPT and LLaMA add the extra_defines meson option on the application side.

However, even though this code is executed during the build, the definition is not reflected when the app actually runs.

This is because the application area is built after extra_defines has already been propagated to add_project_arguments, so adding to extra_defines during the application build has no effect.

In addition, add_project_arguments cannot be called after the build is configured, so a structure that adds extra_defines during the build process is wrong.

PICO GPT and LLaMA added extra_defines because the encoder-related script does not currently run on Tizen, so the encoder-related option has been added to the root meson file and the options on the application side have been removed.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
10 months ago[refactor] Moved blas_kernels to tensor directory
Debadri Samaddar [Wed, 5 Jun 2024 10:20:35 +0000 (15:50 +0530)]
[refactor] Moved blas_kernels to tensor directory

Moved common OpenCL blas kernels to tensor directory.
Added common pre-processing functions that can be reused.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[refactor] Removed experimental OpenCL kernel files
Debadri Samaddar [Wed, 5 Jun 2024 10:18:39 +0000 (15:48 +0530)]
[refactor] Removed experimental OpenCL kernel files

Removed OpenCL kernel files used for experiments.
There are no dependencies on these files.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months agoactions: gbs build test
MyungJoo Ham [Tue, 4 Jun 2024 09:56:09 +0000 (18:56 +0900)]
actions: gbs build test

Run gbs build for x64/x86/aarch64/armv7l Tizen.
This is imported from nnstreamer.git.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: Yocto devtool test
MyungJoo Ham [Thu, 13 Jun 2024 10:31:06 +0000 (19:31 +0900)]
action: Yocto devtool test

Test if a pull request breaks Yocto build or not.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[CI] Add fp-16 build in github action
heka1024 [Mon, 17 Jun 2024 19:22:10 +0000 (04:22 +0900)]
[CI] Add fp-16 build in github action

Closes #2560. You can now see the fp16 build results in a GitHub Action.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: heka1024 <heka1024@gmail.com>
10 months agoaction: add Ubuntu pdebuild
MyungJoo Ham [Thu, 13 Jun 2024 10:28:48 +0000 (19:28 +0900)]
action: add Ubuntu pdebuild

Run pdebuild to test if it is not breaking PPA builds.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ layer ] Optimize LSTM fp16 computation
skykongkong8 [Tue, 18 Jun 2024 00:54:32 +0000 (09:54 +0900)]
[ layer ] Optimize LSTM fp16 computation

Using the add_i_partial function in the LSTM layer reduces the #ifdef code blocks and even improves the function latency.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ Tensor ] Implement add_i_partial
skykongkong8 [Tue, 18 Jun 2024 00:50:34 +0000 (09:50 +0900)]
[ Tensor ] Implement add_i_partial

- Occasionally, an add_i computation over only the section of interest is desired.
- Moreover, this function can reduce the #ifdef code blocks at the layer level (a minimal sketch is shown below).
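
A minimal scalar sketch of the idea, with hypothetical names and parameters (the actual add_i_partial signature in nntrainer may differ):

```cpp
#include <cstddef>

// Partial in-place add: only the elements in [offset, offset + len) of `dst`
// are updated, so callers can touch just the section they are interested in.
void add_i_partial(float *dst, const float *src, std::size_t offset,
                   std::size_t len, float alpha = 1.0f) {
  for (std::size_t i = 0; i < len; ++i)
    dst[offset + i] += alpha * src[i];
}
```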

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[Doc] Update activation function in `README.md`
heka1024 [Mon, 17 Jun 2024 19:05:51 +0000 (04:05 +0900)]
[Doc] Update activation function in `README.md`

Sync the supported activation functions with `README.md`

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: heka1024 <heka1024@gmail.com>
10 months agoandroid: consistent ML_API_COMMON macro
MyungJoo Ham [Thu, 13 Jun 2024 07:56:10 +0000 (16:56 +0900)]
android: consistent ML_API_COMMON macro

The ML_API_COMMON macro has been inconsistent for the Android build:
it is force-defined to 1 in Android.mk, while it may
become 0 or 1 depending on the build system in meson.

Because the Android build uses meson and Android.mk simultaneously,
this must be consistent.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: add Android build test
MyungJoo Ham [Wed, 12 Jun 2024 06:36:56 +0000 (15:36 +0900)]
action: add Android build test

Android is the major release target of nntrainer.
Build and run test cases for Android.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months agoaction: add check if rebuild required module
MyungJoo Ham [Thu, 13 Jun 2024 04:23:33 +0000 (13:23 +0900)]
action: add check if rebuild required module

Import check-if-rebuild-requires module from nnstreamer.git

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ hgemm ] Use hgemm kernel in transpose cases
skykongkong8 [Mon, 17 Jun 2024 23:49:23 +0000 (08:49 +0900)]
[ hgemm ] Use hgemm kernel in transpose cases

- With the SIMD version of the fp16 transpose code, using the hgemm kernels in the transpose case becomes more useful.
- Note that we should develop data-packing code for this case for further optimization.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[trivial] fix typo error
Donghyeon Jeong [Tue, 18 Jun 2024 02:31:10 +0000 (11:31 +0900)]
[trivial] fix typo error

Fix typo error
- README.md

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
10 months agoFixed the build error for gcc-14
wchang kim [Mon, 10 Jun 2024 04:49:16 +0000 (13:49 +0900)]
Fixed the build error for gcc-14

This is imported from review.tizen.org

Change-Id: I80e2332711ae405488b39eaf060384e7490a7c45
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
10 months ago[ layer ] Enable mha gtest and match version
skykongkong8 [Mon, 3 Jun 2024 09:31:10 +0000 (18:31 +0900)]
[ layer ] Enable mha gtest and match version

- The current mha layer at nntrainer/layer is not for general use; it was implemented solely for LLaMA support.
- In order to run the unittest for the mha layer, revert to the previous version of the mha layer and move the current implementation under Application/LLaMA

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[bugfix/unittest] Using LayerSemanticsGpu for FC Layer test
Debadri Samaddar [Wed, 5 Jun 2024 05:34:12 +0000 (11:04 +0530)]
[bugfix/unittest] Using LayerSemanticsGpu for FC Layer test

Using newly added LayerSemanticsGpu for FC Layer GPU unittests.
Renaming fp16 unit test variable to avoid duplicate declaration when all tests are run.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[ docs ] Add lldb-server debugger guide file
skykongkong8 [Wed, 5 Jun 2024 01:23:06 +0000 (10:23 +0900)]
[ docs ] Add lldb-server debugger guide file

- lldb is quite useful for attaching a debugger to the Android unittests.
- Add some guidelines on how to attach it

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ neon/trivial ] Compare float scaling factors more precisely
skykongkong8 [Mon, 3 Jun 2024 10:55:31 +0000 (19:55 +0900)]
[ neon/trivial ] Compare float scaling factors more precisely

- for zero comparison, use std::fpclassify
- for 1.0 comparison, use std::numeric_limits and epsilon (sketched below)
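
A small sketch of the comparisons described above (illustrative helper names, not the exact nntrainer code):

```cpp
#include <cmath>
#include <limits>

// Zero comparison via classification: FP_ZERO covers both +0.0f and -0.0f.
inline bool is_zero(float alpha) { return std::fpclassify(alpha) == FP_ZERO; }

// 1.0 comparison within one machine epsilon.
inline bool is_one(float alpha) {
  return std::fabs(alpha - 1.0f) <= std::numeric_limits<float>::epsilon();
}
```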

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ Trivial ] Fix typo and use better iterating index
skykongkong8 [Thu, 23 May 2024 00:28:48 +0000 (09:28 +0900)]
[ Trivial ] Fix typo and use better iterating index

- Fix typos in the hgemm kernel docs
- Use fixed size4 and size8 values instead of fetching the value every time

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ hgemm ] Support scaling factor beta in kernel-based hgemm
skykongkong8 [Tue, 14 May 2024 07:29:53 +0000 (16:29 +0900)]
[ hgemm ] Support scaling factor beta in kernel-based hgemm

- This commit allows hgemm to accept a beta condition as well.
- Note that beta here is defined as follows:
C = alpha * A * B + beta * C
- In addition, add zero-init code for the beta = 0.F case. According to recent model profiling results, minimizing instructions, even for initialization, is quite helpful for overall model latency reduction (see the sketch below).
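
A hedged sketch of the beta handling (illustrative names, assumes an ARM toolchain with __fp16; not nntrainer's actual kernel code): before the packed kernel accumulates alpha * A * B into C, C is either scaled by beta or, for beta = 0.F, zero-initialized with as few instructions as possible.

```cpp
#include <cstring>

// Apply beta to C (M x N, leading dimension ldc) prior to accumulation.
void apply_beta(__fp16 *C, unsigned int M, unsigned int N, unsigned int ldc,
                float beta) {
  if (beta == 0.F) {
    // Cheapest possible init: the all-zero bit pattern is +0.0 in fp16.
    for (unsigned int m = 0; m < M; ++m)
      std::memset(C + m * ldc, 0, N * sizeof(__fp16));
  } else if (beta != 1.F) {
    for (unsigned int m = 0; m < M; ++m)
      for (unsigned int n = 0; n < N; ++n)
        C[m * ldc + n] =
          static_cast<__fp16>(beta * static_cast<float>(C[m * ldc + n]));
  }
}
```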

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ hgemm ] Consider tiny gemm case
skykongkong8 [Thu, 9 May 2024 10:50:59 +0000 (19:50 +0900)]
[ hgemm ] Consider tiny gemm case

- With the current hard-coded-length hgemm kernels, a matrix whose length is smaller than the unit vector length might fail.
- Toss such cases to the fallback. Such a small hgemm takes only a few nanoseconds of latency and is therefore negligible.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[acti_func] implement quick gelu
hyeonseok [Fri, 7 Jun 2024 04:14:46 +0000 (13:14 +0900)]
[acti_func] implement quick gelu

 - Implemented the quick GELU function (the common formulation is sketched below).
   Please note that quickGeluPrime, which calculates the derivative of the quickGelu function, is not yet implemented.
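
For reference, quick GELU is commonly defined as x * sigmoid(1.702 * x); whether nntrainer uses exactly this constant is an assumption here, not stated by the commit.

```cpp
#include <cmath>

// Quick GELU approximation: x * sigmoid(1.702 * x).
inline float quick_gelu(float x) {
  return x * (1.0f / (1.0f + std::exp(-1.702f * x)));
}
```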

Signed-off-by: hyeonseok <hs89.lee@samsung.com>
10 months ago[blas/neon] isamax edge cases unit tests
Debadri Samaddar [Thu, 30 May 2024 07:30:54 +0000 (13:00 +0530)]
[blas/neon] isamax edge cases unit tests

Added tests for UINT16_MAX boundary cases of isamax

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[blas/neon] isamax improvement for larger input length
Debadri Samaddar [Wed, 22 May 2024 07:22:02 +0000 (12:52 +0530)]
[blas/neon] isamax improvement for larger input length

Used uint32_t operations to process indices larger than 65535.
Added a unittest of shape (1,1,768,768) for max_abs, which calls isamax.
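
A plain scalar sketch of the isamax contract (index of the element with the largest absolute value); the NEON version referenced above keeps the running index in 32-bit lanes so inputs longer than 65535 elements index correctly.

```cpp
#include <cmath>
#include <cstdint>

// Return the index of the element of X with the maximum absolute value.
// Assumes N >= 1.
uint32_t isamax_ref(const uint32_t N, const float *X) {
  uint32_t max_idx = 0;
  float max_val = std::fabs(X[0]);
  for (uint32_t i = 1; i < N; ++i) {
    const float v = std::fabs(X[i]);
    if (v > max_val) {
      max_val = v;
      max_idx = i;
    }
  }
  return max_idx;
}
```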

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[ trivial ] Add doxygen tags in matrix transpose functions
skykongkong8 [Thu, 23 May 2024 04:34:23 +0000 (13:34 +0900)]
[ trivial ] Add doxygen tags in matrix transpose functions

- add doxygen tags to avoid CI failures
- trivial formatting

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ BLAS ] Support non-4-divisible case in matrix transpose
skykongkong8 [Mon, 13 May 2024 07:53:18 +0000 (16:53 +0900)]
[ BLAS ] Support non-4-divisible case in matrix transpose

- Previously, there was a code defect when transposing a matrix with a column length not divisible by 4.
- Fix the bug and refactor its calling interface: the transpose fallback is now handled within the NEON-supported path.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ Tensor ] Use SIMD accelerated transpose if possible
skykongkong8 [Thu, 9 May 2024 23:52:35 +0000 (08:52 +0900)]
[ Tensor ] Use SIMD accelerated transpose if possible

- For a height-width transpose, we can take advantage of SIMD-accelerated code.
- Use the SIMD version if possible; otherwise fall back.
- Through this commit, the following are expected to be accelerated, or can be accelerated with ease in the near future:
  - "0:2:1" transpose
  - BiQHGEMM
  - HGEMM

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ blas ] Add transpose_matrix function
skykongkong8 [Thu, 9 May 2024 23:52:06 +0000 (08:52 +0900)]
[ blas ] Add transpose_matrix function

- Add a new function "transpose_matrix" to use the newly implemented matrix transpose code

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[ matrix_transpose_neon ] Implement NEON-accelerated half-precision matrix transpose
skykongkong8 [Thu, 9 May 2024 23:51:13 +0000 (08:51 +0900)]
[ matrix_transpose_neon ] Implement NEON-accelerated half-precision matrix transpose

- Previously, matrix transpose relied on a naive for-loop implementation.
- Using SIMD instructions, there is room for latency optimization.
- Note that the current implementation only supports half-precision matrices.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
10 months ago[GPU/OpenCL] Added fp16 support for FC layer on GPU
Debadri Samaddar [Wed, 29 May 2024 09:00:10 +0000 (14:30 +0530)]
[GPU/OpenCL] Added fp16 support for FC layer on GPU

Added blas_kernels_fp16.cpp for fp16 kernels.
fp16 unit tests added.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[unittest/gpu] Added LayerSemanticsGpu test suite
Debadri Samaddar [Tue, 21 May 2024 11:29:36 +0000 (16:59 +0530)]
[unittest/gpu] Added LayerSemanticsGpu test suite

Added semantics test for layers on GPU.
Removed redundant run_context compute engine setter call.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
10 months ago[Application] yolo v2 bug fix
Seungbaek Hong [Fri, 31 May 2024 04:27:18 +0000 (13:27 +0900)]
[Application] yolo v2 bug fix

Register a custom layer that had been omitted in yolo v2.

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
10 months ago[GPU/OpenCL] fp16(half) support
Debadri Samaddar [Tue, 28 May 2024 12:19:39 +0000 (17:49 +0530)]
[GPU/OpenCL] fp16(half) support

Added a check for fp16 support on OpenCL.
A proper message will be shown if it is not found.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[goldendata] Added script to generate Swiglu data
Debadri Samaddar [Thu, 23 May 2024 08:10:35 +0000 (13:40 +0530)]
[goldendata] Added script to generate Swiglu data

Added code stub to generate Swiglu layer's golden test data.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[Application] update yolo v2 modeling
Seungbaek Hong [Wed, 22 May 2024 05:39:01 +0000 (14:39 +0900)]
[Application] update yolo v2 modeling

Update the modeling part of yolo v2.
(update some hyperparameter values)

- update yolo v2 pytorch(python) script
- update yolo v2 nntrainer(c++) script

* issue
- the activation function (in this case, leaky relu) of nntrainer needs to support setting the negative slope via a
parameter...

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[bugfix/refactor] OpenCL buffer creation fix and optimization
Debadri Samaddar [Fri, 17 May 2024 07:34:51 +0000 (13:04 +0530)]
[bugfix/refactor] OpenCL buffer creation fix and optimization

Used proper size while creating OpenCL buffers.
Optimized SGEMM kernel with 2D global work size.
Modified function docs.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[bugfix] Used global memory for result in dot_cl kernel
Debadri Samaddar [Thu, 16 May 2024 07:42:39 +0000 (13:12 +0530)]
[bugfix] Used global memory for result in dot_cl kernel

Fixed kernel argument bug for dot_cl kernel

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[bugfix] Renamed variables in unittest of FC Layer
Debadri Samaddar [Wed, 15 May 2024 10:38:04 +0000 (16:08 +0530)]
[bugfix] Renamed variables in unittest of FC Layer

Renamed global variables in unittest_layers_fully_connected_cl.cpp to fix duplicate declaration error

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[GPU/OpenCL] Reusable blas OpenCL kernels
Debadri Samaddar [Tue, 14 May 2024 08:26:20 +0000 (13:56 +0530)]
[GPU/OpenCL] Reusable blas OpenCL kernels

Added blas_kernels to enhance reusability of the common blas kernels.
Used FullyConnected interface for both CPU and GPU calls.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[unittest] Added test for incremental forwarding for layers
Debadri Samaddar [Thu, 9 May 2024 11:16:47 +0000 (16:46 +0530)]
[unittest] Added test for incremental forwarding for layers

Added incremental forwarding as an option for unit testing layers

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[GPU/OpenCL] Initial version of FC Layer with OpenCL ops
Debadri Samaddar [Tue, 7 May 2024 09:08:36 +0000 (14:38 +0530)]
[GPU/OpenCL] Initial version of FC Layer with OpenCL ops

Added a naive OpenCL implementation for the FC Layer.
Incorporated separate kernels for ops used.
Added unit test for fc_layer_cl.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[ Trivial ] Remove redundant comments and format
skykongkong8 [Mon, 15 Apr 2024 04:11:24 +0000 (13:11 +0900)]
[ Trivial ] Remove redundant comments and format

- Due to adaptive macro kernel usage, previous comment is no longer needed.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Refactor kernel init process
skykongkong8 [Mon, 15 Apr 2024 04:01:04 +0000 (13:01 +0900)]
[ hgemm ] Refactor kernel init process

- I found repeated matrix initialization before the fused multiply-add operations.
- With separate initialization code, we get:
1. Cleaner code that is reusable for both the f16 and f16-f32 kernels
2. A minimized redundant init process for the f16 kernel: better latency with the SAME accuracy.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm/bugfix ] Adaptive macro kernel usage in 4x4 4x8 kernels
skykongkong8 [Mon, 15 Apr 2024 02:10:01 +0000 (11:10 +0900)]
[ hgemm/bugfix ] Adaptive macro kernel usage in 4x4 4x8 kernels

- To avoid the constraint of divisibility by 4 or 8 w.r.t. K, loop adaptively along the K direction.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Apply acc16 partial sum strategy and adaptive macro use in 8x8 kernel
skykongkong8 [Mon, 15 Apr 2024 01:34:29 +0000 (10:34 +0900)]
[ hgemm ] Apply acc16 partial sum strategy and adaptive macro use in 8x8 kernel

- Apply a change similar to the one made in commit#52a3c734, but in the 8x8 kernel

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Apply ACC16 partial sum strategy & adaptive macro use in 8x16 kernel
skykongkong8 [Mon, 15 Apr 2024 01:19:24 +0000 (10:19 +0900)]
[ hgemm ] Apply ACC16 partial sum strategy & adaptive macro use in 8x16 kernel

- With more digits computed in fp16 (in this case 1024 -> 2048), I observed a latency improvement at the cost of some accuracy loss. However, according to the current accuracy measurement criteria, it is still acceptable. Note that it is highly desirable to validate this against model output once more.
- With a variety of partial-sum kernels, we can adaptively apply the internal macro kernels without being constrained by divisibility of K w.r.t. 4, 8, or 16 (see the sketch below).
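
A scalar sketch of the fp16 partial-sum idea (illustrative, assumes an ARM toolchain with __fp16; the real 8x16 kernel works on packed NEON vectors): products are accumulated in fp16 for a bounded run, then folded into a fp32 total so the error cannot grow without bound.

```cpp
// Dot product with a windowed fp16 partial accumulator folded into fp32.
float dot_acc16(const __fp16 *a, const __fp16 *b, unsigned int K) {
  constexpr unsigned int RUN = 2048; // fp16 accumulation window (per the text above)
  float total = 0.0f;                // fp32 accumulator
  for (unsigned int k = 0; k < K; k += RUN) {
    __fp16 partial = static_cast<__fp16>(0.0f);
    const unsigned int end = (k + RUN < K) ? k + RUN : K;
    for (unsigned int i = k; i < end; ++i)
      partial = static_cast<__fp16>(partial + a[i] * b[i]); // fp16 partial sum
    total += static_cast<float>(partial);
  }
  return total;
}
```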

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Apply macro kernel in 4x4 noTrans
skykongkong8 [Fri, 12 Apr 2024 05:13:25 +0000 (14:13 +0900)]
[ hgemm ] Apply macro kernel in 4x4 noTrans

- With macro-defined code, the function latency is expected to be optimized by compiler more easily

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Add 4x4 kernel-using f16-f32 hgemm_noTrans
skykongkong8 [Fri, 12 Apr 2024 03:48:02 +0000 (12:48 +0900)]
[ hgemm ] Add 4x4 kernel-using f16-f32 hgemm_noTrans

- hgemm now supports a 4x4 f16-f32 partial-accumulation strategy

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[ hgemm ] Implement 4x4 f16-f32 kernel
skykongkong8 [Fri, 12 Apr 2024 03:46:57 +0000 (12:46 +0900)]
[ hgemm ] Implement 4x4 f16-f32 kernel

- Implement a 4x4 GEMM kernel that performs f16-f32 partial accumulation

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months agoEdited build instructions for Resnet18 test
Udit Jain [Wed, 22 May 2024 07:39:01 +0000 (16:39 +0900)]
Edited build instructions for Resnet18 test

Edited build instructions for Resnet18 test
**Fixing the meson build option**

Resolves: an error when building the test example, where it says
`-c is an un-recognized option`; the meson documentation uses -C, so it seems to be a typo.

**Self evaluation:**
1. Build test:     []Passed [ ]Failed [ X]Skipped
2. Run test:     []Passed [ ]Failed [ X]Skipped

Signed-off-by: Udit Jain <udit.jain@samsung.com>
11 months ago[Trivial] Update gitignore file
Seungbaek Hong [Wed, 22 May 2024 04:29:18 +0000 (13:29 +0900)]
[Trivial] Update gitignore file

add ".idea/" in gitignore file
- For ignore jetbrain's IDE

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
11 months ago[coverity] fix coverity issue
Donghyeon Jeong [Tue, 21 May 2024 00:38:00 +0000 (09:38 +0900)]
[coverity] fix coverity issue

This PR resolves the Coverity issue that the constructor may not initialize class members.

**Changes proposed in this PR:**
- initialize lora_idx and lora_scaling in class constructor.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
11 months ago[bugfix] Fix LoRA indices array size in the FC layer
Donghyeon Jeong [Mon, 20 May 2024 02:12:43 +0000 (11:12 +0900)]
[bugfix] Fix LoRA indices array size in the FC layer

This PR resolves an issue related to the incorrect array size for lora_idx in the fully connected layer.
Specifically, the fix has made the array size four elements long, corresponding to loraA, loraB, loraTmp, and loraOut.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
11 months ago[Application] update yolo v2 python for building pre-training model
Seungbaek Hong [Fri, 17 May 2024 08:38:52 +0000 (17:38 +0900)]
[Application] update yolo v2 python for building pre-training model

In order to train on a large dataset, instead of loading the dataset into memory in advance, the loader was changed to load data on the fly during training, and visualization code was added to check whether the training proceeded well.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[Nnstreamer-subplugin] Add save_path to setProperty
hyunil park [Fri, 17 May 2024 05:11:08 +0000 (14:11 +0900)]
[Nnstreamer-subplugin] Add save_path to setProperty

- Add save_path to setProperty to save the model for each epoch.
- Remove model->save() call to avoid saving the current epoch result
  to the model when current epoch is interrupted

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: hyunil park <hyunil46.park@samsung.com>
11 months ago[Application] cuda support for example of pytorch yolo v2
Seungbaek Hong [Wed, 8 May 2024 12:21:40 +0000 (21:21 +0900)]
[Application] cuda support for example of pytorch yolo v2

- add cuda option to train yolo v2 model backbone
- preprocessing for input dataset
  * unmatched paired dataset
  * no annotation value

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[Application] Rename yolo -> yolo v2
Seungbaek Hong [Wed, 8 May 2024 04:05:32 +0000 (13:05 +0900)]
[Application] Rename yolo -> yolo v2

To prevent confusion, the name of YOLOv2 implementation was changed from
YOLO to YOLOv2.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[hgemm] Optimizing dimension checks using bitmask
Debadri Samaddar [Thu, 9 May 2024 08:15:22 +0000 (13:45 +0530)]
[hgemm] Optimizing dimension checks using bitmask

Used bitmasks for dimension checks,
e.g., N % 8 is the same as N & 0x7 for unsigned N.
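
The equivalence holds because 8 is a power of two, so the mask form avoids a modulo instruction; a tiny illustration:

```cpp
// For unsigned N, N % 8 == (N & 0x7); checked at compile time for one value.
static_assert((37u % 8u) == (37u & 0x7u), "modulo 8 equals mask 0x7");

inline bool divisible_by_8(unsigned int N) { return (N & 0x7) == 0; }
inline bool divisible_by_4(unsigned int N) { return (N & 0x3) == 0; }
```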

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[hgemm] Added K divisible condition for 1x8 and 1x4 kernels
Debadri Samaddar [Wed, 8 May 2024 09:09:55 +0000 (14:39 +0530)]
[hgemm] Added K divisible condition for 1x8 and 1x4 kernels

Added condition for better accuracy while calling 1x4 and 1x8 kernels

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[hgemm] Interchanged hgemm_noTrans_1x8 and hgemm_noTrans_4x4 calls
Debadri Samaddar [Wed, 8 May 2024 05:48:54 +0000 (11:18 +0530)]
[hgemm] Interchanged hgemm_noTrans_1x8 and hgemm_noTrans_4x4 calls

Moved the 1x8 kernel call after the 4x4 kernel call.
Added a couple of test cases.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
11 months ago[ hdot ] Use precision-enhanced hdot
skykongkong8 [Wed, 24 Apr 2024 01:39:41 +0000 (10:39 +0900)]
[ hdot ] Use precision-enhanced hdot

- The previous hdot used full fp16.
- Since this is also a dimension-shrinking computation, it should use intermediate fp32 values to enhance precision (sketched below).
- This had not been detected because the unittests used small-dimension Tensors. Add a higher-dimension test case accordingly.
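
A plain-loop sketch of the precision-enhanced hdot (assumes an ARM toolchain with __fp16; the actual implementation is NEON): inputs stay fp16 while the running sum is kept in fp32, so long reductions do not lose precision.

```cpp
// Dot product over fp16 inputs with an intermediate fp32 accumulator.
float hdot_ref(const unsigned int N, const __fp16 *X, const __fp16 *Y) {
  float acc = 0.0f; // fp32 intermediate
  for (unsigned int i = 0; i < N; ++i)
    acc += static_cast<float>(X[i]) * static_cast<float>(Y[i]);
  return acc;
}
```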

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[Trivial] Removing unnecessary files from the repo and adding an ignore file.
Donghak PARK [Wed, 8 May 2024 04:11:05 +0000 (13:11 +0900)]
[Trivial] Removing unnecessary files from the repo and adding an ignore file.

In an Android project, the ".gradle" and ".idea" directories are created locally and have nothing to do with the repository.
Therefore, it is common to delete them and add a ".gitignore" file so that they will not be uploaded again as development progresses.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
11 months ago[CI] Remove Pylinter in CI
Donghak PARK [Thu, 9 May 2024 00:38:13 +0000 (09:38 +0900)]
[CI] Remove Pylinter in CI

Previously, since Pylinter did not exist as a reusable GitHub Action, it was run directly in our workflow.
There is no need to do the same task twice, because pylint is included in static_check.scripts when importing the CI from nnstreamer.
So delete the pylinter.yml file, because it keeps creating unnecessary CI errors.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <donghak.park@samsung.com>
11 months ago[Application] fix LLaMA application example error
Seungbaek Hong [Tue, 7 May 2024 06:38:22 +0000 (15:38 +0900)]
[Application] fix LLaMA application example error

In the case of running without the encoder, a problem has been fixed where invalid values were set during operation due to incorrect assignment of the input data, causing word-index-related errors.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[Application] Update weights_converter
Seungbaek Hong [Fri, 3 May 2024 04:18:26 +0000 (13:18 +0900)]
[Application] Update weights_converter

The num_layer parameter is now set automatically through auto config
when converting weights from the PyTorch format to the nntrainer format.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months ago[ NEURALNET ] change the loss scale property to Rigid Property
jijoong.moon [Fri, 26 Apr 2024 11:07:28 +0000 (20:07 +0900)]
[ NEURALNET ] change the loss scale property to Rigid Property

Loss scale is more of a rigid property of the model than a flexible
property.

Resolves:

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
11 months ago[ Weight ] split variable dim and grad dim to set separately
jijoong.moon [Fri, 26 Apr 2024 10:13:05 +0000 (19:13 +0900)]
[ Weight ] split variable dim and grad dim to set separately

This PR splits the Variable and Gradient dimensions in Var_Grad and Weight.
This way we can set different Variable and Gradient types in Weight.
. add dim_g for the gradient in WeightSpec.
. the manager needs updates to support the new WeightSpec.
. create Tensors according to dim_v and dim_g
. Weight creation changed in Weight.h

Resolves:

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
11 months ago[ Weight ] Add Loss Scale factor in Weight
jijoong.moon [Fri, 26 Apr 2024 05:48:26 +0000 (14:48 +0900)]
[ Weight ] Add Loss Scale factor in Weight

This PR enables the loss scale factor in Weight.
. Change the WeightSpec to include the loss factor
. Add the LossScaleForMixed property as a layer common property, so that
  it can set the scale factor in initContext.
. Add Loss Scale in initContext
. Set the LossScaleForMixed property when the LossScale model
  property is present

Resolves:

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
11 months ago[Property] Add loss scale property
Jiho Chu [Wed, 6 Mar 2024 00:58:18 +0000 (09:58 +0900)]
[Property] Add loss scale property

It adds the loss scale property as a model common property.

Signed-off-by: Jiho Chu <jiho.chu@samsung.com>
11 months agomeson: fix fp16 support conditions for arm/aarch64
MyungJoo Ham [Fri, 26 Jan 2024 04:02:05 +0000 (13:02 +0900)]
meson: fix fp16 support conditions for arm/aarch64

According to the GCC documentation,
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Half-Precision.html
even if -mfp16-format=ieee is not given, aarch64 supports
IEEE fp16. Thus, for aarch64, even if the option is not available,
try to build with the __fp16 type.

Then, add a condition for arm: the final "else" is written for x64/x86
machines.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
11 months ago[Wait for #2536][application] add generate_multiple_tokens for llm
Seungbaek Hong [Fri, 5 Apr 2024 05:08:50 +0000 (14:08 +0900)]
[Wait for #2536][application] add generate_multiple_tokens for llm

Added generate_multiple_tokens function for first generation on llm.

This function takes one logits tensor and generates multiple output tokens.
To meet the purpose of the target application,
even if the input contains multiple logits,
only the first logits is used to generate the multiple output tokens.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
11 months agoAdd SELU activation function
kimhan0515 [Sat, 27 Apr 2024 16:13:16 +0000 (01:13 +0900)]
Add SELU activation function

- Now, users can use the SELU activation function as in torch or TensorFlow (see the sketch below).
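
For reference, the standard SELU definition (Klambauer et al.); the exact constants used in nntrainer are assumed, not taken from this commit:

```cpp
#include <cmath>

// SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise.
inline float selu(float x) {
  constexpr float scale = 1.05070098f; // lambda
  constexpr float alpha = 1.67326324f;
  return x > 0.0f ? scale * x : scale * alpha * (std::exp(x) - 1.0f);
}
```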

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: kimhan0515 <kimhan0515@gmail.com>
11 months ago[ hnrm2 ] Use precision-enhanced hscal
skykongkong8 [Thu, 25 Apr 2024 03:32:24 +0000 (12:32 +0900)]
[ hnrm2 ] Use precision-enhanced hscal

- The previous hnrm2 used full fp16.
- Since this is also a dimension-shrinking computation, it should use intermediate fp32 values to enhance precision.
- This had not been detected because the unittests used small-dimension Tensors. Add a higher-dimension test case accordingly.
- Note that this function is responsible for Tensor::l2norm(), which is frequently used for mse loss computation.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
11 months ago[Build] dependency to api
Jaeyun Jung [Fri, 26 Apr 2024 05:49:05 +0000 (14:49 +0900)]
[Build] dependency to api

Code cleanup; fix the cyclic dependency between nntrainer and ml-api.
The build dependency of nntrainer on ml-api is unnecessary.

Signed-off-by: Jaeyun Jung <jy1210.jung@samsung.com>
11 months ago[LLaMA] Bugfix in LLaMA application
Eunju Yang [Tue, 23 Apr 2024 04:23:19 +0000 (13:23 +0900)]
[LLaMA] Bugfix in LLaMA application

- This commit fixes a bug in `applyTKP` function.
- It seems applying Top-K and Top-P to logits didn't work as intended

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
12 months ago[hgemm] hgemm noTrans with 1x4 kernel
Debadri Samaddar [Tue, 23 Apr 2024 06:30:16 +0000 (12:00 +0530)]
[hgemm] hgemm noTrans with 1x4 kernel

Added hgemm_kernel_1x4
Added hgemm_noTrans_1x4 calls
Added unittest dot_gemm_50_768_516

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
12 months ago[bugfix] Fix build issues when fp16 is enabled
Donghyeon Jeong [Thu, 25 Apr 2024 04:34:17 +0000 (13:34 +0900)]
[bugfix] Fix build issues when fp16 is enabled

This PR resolves build issues occur in acti_func.h when fp16 is enabled.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
12 months ago[LoRA] add alpha parameter to LoRA
Eunju Yang [Fri, 5 Apr 2024 00:15:01 +0000 (09:15 +0900)]
[LoRA] add alpha parameter to LoRA

- This commit adds `alpha` parameter to LoRA (fc)
- In the original paper, they adopted `alpha (int)` as a parameter to
derive the scaling factor internally, i.e., scaling = alpha / rank
- This commit takes `alpha` as a hyper-parameter and applies the scaling
factor to the LoRA layer (see the sketch after this list).
- This commit's updates are summarized as follows:
- `common_properties.h` : add LoraAlpha as a parameter.
- `fc_layer.cpp`: update forwarding / calcGradient /
calcDerivative func to apply scaling factor in LoRA computation
- `fc_layer.h`: update to take LoraAlpha as fc_props
- `node_exporter.cpp/h`: add LoraAlpha as a parameter in
tf.export format of fc layer (to pass the test code)
- fix the code lines which may cause coverity issue.
- LoRA initialization is updated:
- LoRA A : ZEROS
- LoRA B : Normal
- [TODO] update tf exporter of fc layer
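
A minimal sketch of how the `alpha` hyper-parameter enters the computation under the standard LoRA formulation (illustrative names, not nntrainer's fc_layer code):

```cpp
// The low-rank update B(Ax) is scaled by alpha / rank before being added to
// the frozen path Wx.
inline float lora_scaling(float alpha, unsigned int rank) {
  return alpha / static_cast<float>(rank);
}
// forwarding:     y  = W * x   + lora_scaling(alpha, rank) * (B * (A * x))
// calcDerivative: dx = W^T * dy + lora_scaling(alpha, rank) * (A^T * (B^T * dy))
```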

**Self evaluation:**
1. Build test:   [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Eunju Yang <ej.yang@samsung.com>
12 months agoAdd Mish activation function
Boseong Seo [Sat, 20 Apr 2024 03:55:09 +0000 (12:55 +0900)]
Add Mish activation function

- Now, users can use the Mish activation function as in torch or TensorFlow (see the sketch below).
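
For reference, Mish is commonly defined as x * tanh(softplus(x)); this follows the usual formulation rather than the exact nntrainer code:

```cpp
#include <cmath>

// Mish: x * tanh(ln(1 + e^x)). A production version would guard exp()
// against overflow for large x.
inline float mish(float x) { return x * std::tanh(std::log1p(std::exp(x))); }
```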

**Self evaluation**:
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
12 months ago[hgemm] Removed unused header
Debadri Samaddar [Mon, 22 Apr 2024 04:01:44 +0000 (09:31 +0530)]
[hgemm] Removed unused header

Deleted unused header inclusion
Removed #include <iostream>

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
12 months ago[hgemm] hgemm noTrans with kernel 1x8
Debadri Samaddar [Thu, 18 Apr 2024 09:02:29 +0000 (14:32 +0530)]
[hgemm] hgemm noTrans with kernel 1x8

Added 1x8 hgemm kernel, packing_A1, packing_B1 functions.
Incorporated hgemm_noTrans_1x8.

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
12 months agoci / remove cpp-linter's false positive reports.
MyungJoo Ham [Sat, 20 Apr 2024 04:22:44 +0000 (13:22 +0900)]
ci / remove cpp-linter's false positive reports.

It gives an error of "iostream" not found.

Install libstdc++-dev for possible compilers.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
12 months ago[API] Add tensor&operations API structure for supporting autograd
Seungbaek Hong [Fri, 15 Mar 2024 07:11:11 +0000 (16:11 +0900)]
[API] Add tensor&operations API structure for supporting autograd

I added a tensor & function (operation) API structure to support autograd.

Users can build a model graph using this API.

The operators will be supported one-to-one with ONNX.

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
12 months agoAdd Softplus activation function
heka1024 [Wed, 10 Apr 2024 05:54:26 +0000 (14:54 +0900)]
Add Softplus activation function

- Now, users can use the Softplus activation function as in torch or TensorFlow.
- Furthermore, this function can be used to build Mish and other activation functions (see the sketch below)
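
For reference, Softplus is commonly defined as ln(1 + e^x); the sketch below follows the usual formulation rather than the exact nntrainer code:

```cpp
#include <cmath>

// Softplus: ln(1 + e^x). Mish can then be built as x * tanh(softplus(x)).
// std::log1p keeps precision when exp(x) is small; very large x would still
// need an overflow guard in production code.
inline float softplus(float x) { return std::log1p(std::exp(x)); }
```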

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: heka1024 <heka1024@gmail.com>
12 months ago[neuralnet] bugfix multi batch incremental inference
hyeonseok lee [Wed, 17 Apr 2024 01:48:49 +0000 (10:48 +0900)]
[neuralnet] bugfix multi batch incremental inference

 - This commit handles the case where the model activation data type is fp32

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
12 months ago[Docs] add yolov3 readme file
Seungbaek Hong [Mon, 15 Apr 2024 12:39:17 +0000 (21:39 +0900)]
[Docs] add yolov3 readme file

added yolov3 readme file

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
12 months ago[OpenCL/GPU] Modified ifstream condition
Debadri Samaddar [Mon, 15 Apr 2024 07:40:44 +0000 (13:10 +0530)]
[OpenCL/GPU] Modified ifstream condition

Updated ifstream object valid condition

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
12 months ago[GPU/OpenCL] Create kernel utility with binaries
Debadri Samaddar [Thu, 4 Apr 2024 09:21:10 +0000 (14:51 +0530)]
[GPU/OpenCL] Create kernel utility with binaries

Added feature for reading kernel binaries.
Managing already created kernels.
Added static flag and bitmask to check existing kernels.

Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
12 months agoAdd ELU activation function
heka1024 [Tue, 9 Apr 2024 13:50:02 +0000 (22:50 +0900)]
Add ELU activation function

- Now, users can use the ELU activation function as in torch or TensorFlow (see the sketch below).
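
For reference, ELU is commonly defined as below, with the conventional default alpha = 1.0; the default used in nntrainer is an assumption here:

```cpp
#include <cmath>

// ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.
inline float elu(float x, float alpha = 1.0f) {
  return x > 0.0f ? x : alpha * (std::exp(x) - 1.0f);
}
```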

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Co-authored-by: Hanbyeol Kim <kimhan0515@snu.ac.kr>
Co-authored-by: Boseong Seo <suzy13549@snu.ac.kr>
Signed-off-by: heka1024 <heka1024@gmail.com>
12 months ago[application] add repetition_penalty to generate func
Seungbaek Hong [Thu, 4 Apr 2024 11:24:54 +0000 (20:24 +0900)]
[application] add repetition_penalty to generate func

add some options to 'generate' function of llm

- add naive repetition_penalty option
- add bad_words option

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
12 months agoRemove LSTM example in Applications/README.md
Boseong Seo [Tue, 9 Apr 2024 13:23:31 +0000 (22:23 +0900)]
Remove LSTM example in Applications/README.md

- existing link to LSTM example does not work
- user can find LSTM example in Layers dir
  + LSTM dir merged to Layers dir (in PR nnstreamer#2107)
- delete LSTM example in Applications/README.md file in order to reduce confusion

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>
12 months ago[ hgemm ] Use macro kernel in 8x8 kernel
skykongkong8 [Thu, 4 Apr 2024 06:44:15 +0000 (15:44 +0900)]
[ hgemm ] Use macro kernel in 8x8 kernel

- Using the macro kernel, we can choose a point on the accuracy-latency tradeoff. Furthermore, it is easier to maintain this way.

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
12 months ago[ hgemm ] Apply software prefetching in 4x8 kernel
skykongkong8 [Thu, 4 Apr 2024 06:42:11 +0000 (15:42 +0900)]
[ hgemm ] Apply software prefetching in 4x8 kernel

- We can expect to minimize cache misses by using software prefetching (illustrated below)
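
An illustrative use of software prefetching with the GCC/Clang __builtin_prefetch builtin (the stride and placement here are placeholders, not the actual 4x8 kernel; assumes an ARM toolchain with __fp16):

```cpp
// Scale-copy loop that hints the next block of the source into cache while
// the current block is processed.
void scale_copy(const __fp16 *src, __fp16 *dst, unsigned int n, float alpha) {
  for (unsigned int i = 0; i < n; ++i) {
    if (i + 16 < n)
      __builtin_prefetch(src + i + 16, /*rw=*/0, /*locality=*/3);
    dst[i] = static_cast<__fp16>(alpha * static_cast<float>(src[i]));
  }
}
```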

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
12 months ago[ hgemm ] Implement 8x16 hgemm kernel
skykongkong8 [Wed, 3 Apr 2024 11:10:57 +0000 (20:10 +0900)]
[ hgemm ] Implement 8x16 hgemm kernel

- This commit introduces 2 types of 8x16 hgemm kernel
1. full-fp16
2. fp16-fp32 partial accumulation

**Self evaluation:**
1. Build test:     [X]Passed [ ]Failed [ ]Skipped
2. Run test:     [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <ss.kong@samsung.com>
12 months ago[neuralnet] enable multi batch incremental inference
hyeonseok lee [Fri, 5 Apr 2024 13:49:45 +0000 (22:49 +0900)]
[neuralnet] enable multi batch incremental inference

 - The output did not account for multi-batch input in incremental inference.
   It now returns multi-batch output.

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
12 months ago[application] update llm generate function
Seungbaek Hong [Wed, 3 Apr 2024 11:10:13 +0000 (20:10 +0900)]
[application] update llm generate function

- fix "temperature" operation
- add "top-k, top-p" option
- support batch mode

**Self evaluation:**
1. Build test:  [X]Passed [ ]Failed [ ]Skipped
2. Run test:  [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
12 months agoReformat code with .clang_format
Boseong Seo [Thu, 4 Apr 2024 07:34:29 +0000 (16:34 +0900)]
Reformat code with .clang_format

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Boseong Seo <suzy13549@snu.ac.kr>