skykongkong8 [Fri, 24 Nov 2023 06:56:11 +0000 (15:56 +0900)]
[neon/bugfix] Fix ewva function
- The ewva function was implemented incorrectly: values should be added, not multiplied.
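A minimal sketch of the corrected kernel shape, assuming a blas_neon-style signature (the real ewva interface may differ):
```cpp
#include <arm_neon.h>

// Hypothetical ewva shape: Y = X + Y, element-wise, fp16.
// The bug class fixed here: a multiply (vmulq_f16) used where an
// add (vaddq_f16) was intended.
void ewva_fp16(unsigned int N, const __fp16 *X, __fp16 *Y) {
  unsigned int i = 0;
  for (; i + 8 <= N; i += 8) {
    float16x8_t x = vld1q_f16(X + i);
    float16x8_t y = vld1q_f16(Y + i);
    vst1q_f16(Y + i, vaddq_f16(x, y)); // add, not multiply
  }
  for (; i < N; ++i) // scalar tail for lengths not divisible by 8
    Y[i] = X[i] + Y[i];
}
```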
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 22 Nov 2023 00:26:27 +0000 (09:26 +0900)]
[Ahub] Fix AnalysisHub defects
**Changes proposed in this PR:**
- Fix uninitialized class members in the constructor
- Fix potential uninitialized data
- Add try-block to catch exceptions
- Check if malloc returns null
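A minimal sketch of the malloc check pattern (names illustrative, not the actual flagged call site):
```cpp
#include <cstdlib>
#include <stdexcept>

// Fail loudly at the allocation site instead of dereferencing NULL later.
float *alloc_buffer(std::size_t n) {
  auto *buf = static_cast<float *>(std::malloc(n * sizeof(float)));
  if (buf == nullptr)
    throw std::runtime_error("failed to allocate buffer");
  return buf;
}
```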
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 17 Nov 2023 08:15:19 +0000 (17:15 +0900)]
[Tensor] Include Half Tensor when FP16 is enabled
**Changes proposed in this PR:**
- Edit meson.build file to add half_tensor.cpp when enable_fp16 is true
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Fri, 10 Nov 2023 04:26:39 +0000 (13:26 +0900)]
[Tensor] HalfTensor class for 16-bit floating point
This PR includes creating the HalfTensor class which separates 16-bit floating point calculation from nntrainer::Tensor.
**Changes proposed in this PR:**
- Create a HalfTensor class that only handles 16-bit floating point calculation.
- Remove operations for Quantized Tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
hyunil park [Thu, 9 Nov 2023 06:26:00 +0000 (15:26 +0900)]
[Sub-plugin] Refactoring sub-plugin class
- Change NNTrainerTrain class name to NNTrainerImpl
- Change InputTensorsInfo class name to TensorsQueue
- Add push method to TensorsQueue
- Change member variables of NNTrainerImpl and TensorsQueue to private and rename some variables and methods
Signed-off-by: hyunil park <hyunil46.park@samsung.com>
Seungbaek Hong [Wed, 5 Jul 2023 07:05:31 +0000 (16:05 +0900)]
[Application] Add multi_input dataloader example
Added multi-input dataloader example application.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Donghyeon Jeong [Fri, 10 Nov 2023 03:19:48 +0000 (12:19 +0900)]
[Tensor] FloatTensor class for 32-bit floating point
This PR includes creating the FloatTensor class which separates 32-bit floating point calculation from nntrainer::Tensor.
**Changes proposed in this PR:**
- Create a FloatTensor class that only handles 32-bit floating point calculation.
- Remove operations for Quantized Tensor.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 7 Nov 2023 00:41:44 +0000 (09:41 +0900)]
[trivial/bugfix] Add `inference_only_option` in multi_head_attention unittest
- We were using the `inference_only` option for the multi_head_attention fp16 unittest, since we do not have a loss scaling implementation yet.
- However, the declaration of this option was missing, which could cause a malfunction at build time, so I added it accordingly.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 08:29:11 +0000 (17:29 +0900)]
[gtest] Add test suites for multiHeadAttention with w16a16
- We already had a proper half-precision implementation of the multi-head attention layer, but did not have any unittest cases for it.
- Add unittest accordingly
- Fix typo : last line indent
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 07:41:03 +0000 (16:41 +0900)]
[layer] Support fp16 in embedding layer
- Add an _FP16 code block to calcGrad in the embedding layer (forwarding does not need one); see the sketch after this list
- Add unittest accordingly
- Explicit code in gtest for the embedding layer:
- In the layer gtest, each test suite runs without knowing any context of its adjacent layers
- Thus, for atypical layers like the embedding layer (a layer whose input and output data types differ), we should either refactor the code or handle it explicitly
- Room for memory optimization
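A rough sketch of the calcGrad branch from the first bullet, using the usual build-flag guard; the tensor names are illustrative, not the exact diff:
```cpp
// djdw: gradient w.r.t. the embedding weight (name assumed for illustration)
if (djdw.getDataType() == TensorDim::DataType::FP32) {
  // existing fp32 gradient accumulation path
}
#ifdef ENABLE_FP16
else if (djdw.getDataType() == TensorDim::DataType::FP16) {
  // added _FP16 gradient accumulation path
}
#endif
```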
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 04:48:12 +0000 (13:48 +0900)]
[gtest] Add gtest data for embedding layer
- Generating gtest data for the embedding layer must be differentiated from the other data for the following reasons:
1. The embedding layer takes 32-bit input, even when working with fp16 models
2. The embedding layer has a particular object called 'IndexSlices', which needs additional processing to behave the way it does in NNTrainer
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 01:41:02 +0000 (10:41 +0900)]
[gtest/trivial] Change notation in gtest: fp16fp16 to w16a16
- The previous notation 'fp16fp16' does not clearly convey its meaning: a data file for layers with half-precision weights and activations
- Thus, I propose the new notation 'w16a16', both for better understanding and to avoid unnecessary confusion once mixed-precision support arrives in the near future.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 27 Oct 2023 00:29:34 +0000 (09:29 +0900)]
[layer] Support fp16 in dropout layer
- Confirm that the dropout layer needs no code fix to support multiple dataTypes
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 26 Oct 2023 02:33:11 +0000 (11:33 +0900)]
[layer] Support fp16 in lstm layer
- Add _FP16 code block to enable float16 functionality
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 26 Oct 2023 01:58:30 +0000 (10:58 +0900)]
[layer] Support fp16 in concat layer
- Add _FP16 code block to enable float16 functionality
- Add unittest accordingly
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Fri, 3 Nov 2023 05:59:26 +0000 (14:59 +0900)]
[Utils] Conversion to/from half-precision floating point
This PR includes utility functions that convert between the bit representation of a 16-bit floating point number and a 32-bit floating point number in IEEE format.
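As a sketch of what such a conversion involves (a generic illustration, not necessarily the PR's exact code), widening an IEEE binary16 bit pattern to binary32 can be done like this:
```cpp
#include <cstdint>
#include <cstring>

// Convert an IEEE-754 binary16 bit pattern to a binary32 value.
// Handles normals, subnormals, signed zero, inf, and NaN.
float half_bits_to_float(uint16_t h) {
  uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16; // bit 15 -> 31
  uint32_t exp = (h >> 10) & 0x1Fu;                         // 5-bit exponent
  uint32_t mant = h & 0x3FFu;                               // 10-bit mantissa
  uint32_t bits;

  if (exp == 0x1Fu) { // inf / NaN: max exponent, keep the payload
    bits = sign | 0x7F800000u | (mant << 13);
  } else if (exp != 0) { // normal: re-bias exponent 15 -> 127
    bits = sign | ((exp + 112u) << 23) | (mant << 13);
  } else if (mant == 0) { // signed zero
    bits = sign;
  } else { // subnormal: normalize the mantissa
    uint32_t shift = 0;
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      ++shift;
    }
    mant &= 0x3FFu; // drop the now-implicit leading bit
    bits = sign | ((113u - shift) << 23) | (mant << 13);
  }

  float f;
  std::memcpy(&f, &bits, sizeof(f)); // bit-cast without aliasing UB
  return f;
}
```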
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Thu, 2 Nov 2023 00:50:45 +0000 (09:50 +0900)]
[Tensor] Support multiple data types in copyData
Previously, copyData only supported copying data of the same data type. Copying data of different data types is needed given the increased demand for mixed-precision flexibility.
**Changes proposed in this PR:**
- copyData supports copying data of different types using NEON
- Remove the flate function
- Utilize copyData in dequantize
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Seungbaek Hong [Thu, 19 Oct 2023 09:44:31 +0000 (18:44 +0900)]
[docs] add how-to-create-model document
I have added a tutorial document on how users can build their own models using NNTrainer.
The current tutorial is just the most basic draft, and it needs updating.
The API itself also needs to be made easier to use,
since setting up the data is currently very inconvenient for the user.
(For that reason, this example uses a random data generator,
so users cannot yet learn how to train with real data.)
I also added this tutorial link to the README on the first page,
and since the list of maintainers and contributors currently occupies too much space,
I moved that section to the bottom of the README.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Seungbaek Hong [Wed, 18 Oct 2023 06:06:24 +0000 (15:06 +0900)]
[trivial] fix typo errors and delete duplicated script
fix typo error in llama implementation and delete dummy
script (multi_head_attention copy.h).
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
skykongkong8 [Tue, 29 Aug 2023 02:19:56 +0000 (11:19 +0900)]
[TensorDim] Fix TensorDim constructor
- Previously, there was no default constructor for the 1-batch, 1-channel, 1-height, 1-width case that took the newly added TensorType option (regardless of format).
- Since a lot of code relies on this case in the FP32 implementation, I had no choice but to explicitly feed the tensor_type (which contains the fp16 info) into the previously defined tensor.
- For cleaner code, I propose a new default constructor for better construction of TensorDim instances.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Wed, 1 Nov 2023 00:00:57 +0000 (09:00 +0900)]
[bugfix] memory overwrite error fix in unittest_tizen_capi
This PR resolves the failing getWeight_01 test case in the unittest_tizen_capi.
In the ML API common data structure, the maximum rank in Tizen APIs has changed from 4 to 16 since Tizen 8.0.
However, the NNTrainer getWeight test uses a dimension with MAXDIM of 4.
This causes ml_tensors_info_get_tensor_dimension to overwrite unrelated memory, since it expects an array of length 16 while being passed one of length 4.
**Changes proposed in this PR:**
- Switch the order of defining variables to avoid memory overwrites.
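For context, ml_tensor_dimension is an array typedef already sized to the maximum rank, so the callee can fill all 16 slots without touching neighboring variables; a sketch of the safe shape (header name and wrapper function are assumptions):
```cpp
#include <ml-api-common.h> // assumed header for the ML API common types

// ml_tensor_dimension is unsigned int[ML_TENSOR_RANK_LIMIT] (16 on
// Tizen 8.0+), so the get call below cannot write past the array, unlike
// a hand-declared unsigned int dim[4].
int get_first_dim(ml_tensors_info_h info, unsigned int *out) {
  ml_tensor_dimension dim;
  int status = ml_tensors_info_get_tensor_dimension(info, 0, dim);
  if (status == 0)
    *out = dim[0];
  return status;
}
```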
This fixes:
[ RUN ] nntrainer_capi_nnmodel.getWeight_01
[ FAILED ] nntrainer_capi_nnmodel.getWeight_01 (10 ms)
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 31 Oct 2023 05:45:08 +0000 (14:45 +0900)]
[blas/neon] Add copy function for fp32 and fp16
- The user-facing interface for the fp32<->fp16 NEON copy function was missing.
- Add a BLAS interface to use these NEON functions
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Seungbaek Hong [Wed, 18 Oct 2023 05:55:00 +0000 (14:55 +0900)]
[Application] LLaMA weights converter for mha model
I already added a weights converter for the mha model in PR #2287.
There were two converters, supporting the legacy and mha models.
But now the legacy-model converter is useless, so I deleted it.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
hyeonseok lee [Tue, 31 Oct 2023 02:23:10 +0000 (11:23 +0900)]
[Application] fix deepq to make it run
- Make res directory and move DeepQ.ini file to res dir
- If the input batch size does not match the batch size property of the model graph, set the graph's batch size property to the input batch size in the forwarding function
- Allocate weight/tensor memory in train mode for the mainNet and targetNet networks
- Comment out the "return 1" statement so that training starts from scratch
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hs0207.kim [Tue, 24 Oct 2023 06:53:30 +0000 (15:53 +0900)]
Implementation of nndetector
Implementation of an application that runs object detection and learns personal objects on mobile
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: hs0207.kim <hs0207.kim@samsung.com>
Donghyeon Jeong [Mon, 30 Oct 2023 23:46:20 +0000 (08:46 +0900)]
[bugfix] Android ndk-build error fix
This PR resolves an ndk-build issue in the tensor fp16 unit test.
**Changes proposed in this PR:**
- Move implementation to header file to avoid linker error
- Change ambiguous variable and function names
This fixes:
[arm64-v8a] Executable : unittest_nntrainer_tensor_fp16
ld: error: undefined symbol: nntrainer::Tensor::setScaleFactors16(std::__ndk1::vector<_Float16, std::__ndk1::allocator<_Float16> >)
>>> referenced by unittest_nntrainer_tensor_fp16.cpp:5805 (../unittest/unittest_nntrainer_tensor_fp16.cpp:5805)
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
skykongkong8 [Tue, 24 Oct 2023 01:33:21 +0000 (10:33 +0900)]
[gtest] Fix gtest error assessing logic
- Float16 models tend to show unavoidably higher accuracy loss for: 1. huge Tensors 2. Tensors with huge values
- Reassess the value-by-value check with a relative-error criterion when the absolute error is large, as sketched below
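A minimal sketch of such a mixed check (thresholds illustrative):
```cpp
#include <cmath>

// Accept a small absolute error outright; for large-magnitude values,
// where fp16 round-off grows with the exponent, fall back to relative error.
bool close_enough(float expected, float actual, float abs_tol = 1e-3f,
                  float rel_tol = 1e-2f) {
  float diff = std::fabs(expected - actual);
  return diff <= abs_tol || diff <= rel_tol * std::fabs(expected);
}
```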
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Wed, 25 Oct 2023 04:36:33 +0000 (13:36 +0900)]
[bugfix] Fix Tensor save function when float16
- Previously, the wrong getData function was called when saving an fp16 Tensor
- Apply the proper template parameter: _FP16
Resolves:
```
...
23/41 unittest_nntrainer_tensor_fp16 FAIL 2.26 s (exit status 1)
...
[ FAILED ] nntrainer_Tensor.save_read_01_p
...
```
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 20 Oct 2023 01:49:59 +0000 (10:49 +0900)]
[neon] Support scopy for multiple dataTypes in neon
- scopy_int4_to_fp32
- scopy_int8_to_fp32
- scopy_int8_to_fp16
- scopy_int8_or_int4 : since we use uint8_t for int4 Tensors, code can be shared here
- vcvt_fp32_u32_bitwise : a faster conversion is possible with bitwise operations rather than element-wise operations (see the sketch below)
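For illustration, a straightforward widening uint8 -> fp32 NEON copy looks like this (signature assumed; the vcvt_fp32_u32_bitwise variant above swaps the standard converts for a faster bitwise trick):
```cpp
#include <arm_neon.h>
#include <cstdint>

void copy_u8_to_f32(unsigned int N, const uint8_t *X, float *Y) {
  unsigned int i = 0;
  for (; i + 8 <= N; i += 8) {
    uint8x8_t u8 = vld1_u8(X + i);
    uint16x8_t u16 = vmovl_u8(u8); // widen 8 -> 16 bit
    vst1q_f32(Y + i, vcvtq_f32_u32(vmovl_u16(vget_low_u16(u16))));
    vst1q_f32(Y + i + 4, vcvtq_f32_u32(vmovl_u16(vget_high_u16(u16))));
  }
  for (; i < N; ++i) // scalar tail
    Y[i] = static_cast<float>(X[i]);
}
```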
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghak PARK [Tue, 24 Oct 2023 13:56:39 +0000 (22:56 +0900)]
[TFLite] Revisit tflite_opnode.cpp
To fix the TF Lite export method:
- Remove an unnecessary transpose due to the tflite_interpreter logic change
- Change the transpose direction according to the equation change
**Changes proposed in this PR:**
- Change tflite_opnode.cpp
Resolves:
- TFLite Export ( BatchNormalization Fusing, Activation Fusing)
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Tue, 24 Oct 2023 13:52:50 +0000 (22:52 +0900)]
[TFLite] Revisit TF Lite Fusing Operation
The TF Lite export feature had an issue where the exported tflite file produced
different output from the original NNTrainer model.
This commit fixes that issue with a modified fusing operation.
**Changes proposed in this PR:**
- Update tflite_interpreter.cpp
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
skykongkong8 [Tue, 24 Oct 2023 01:09:08 +0000 (10:09 +0900)]
[bugfix] Fix applying erf function
- The std::erf function does not support _Float16
- Fix by temporarily casting to float
Resolves:
```
error: call of overloaded ‘erf(_Float16&)’ is ambiguous
 3394 |   auto f = [](_FP16 in) { return std::erf(in); };
      |                                  ~~~~~~~~^~~~
```
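The shape of the cast fix, as a sketch (assuming nntrainer's _FP16 alias):
```cpp
#include <cmath>

using _FP16 = _Float16; // matches nntrainer's fp16 alias on supported builds

// std::erf has no _Float16 overload, so compute in float and narrow back.
auto f = [](_FP16 in) {
  return static_cast<_FP16>(std::erf(static_cast<float>(in)));
};
```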
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
jijoong.moon [Sun, 17 Sep 2023 02:34:59 +0000 (11:34 +0900)]
[ Model ] Fix incremental output in FP32
This PR fixes the output of incremental inference.
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Donghyeon Jeong [Thu, 12 Oct 2023 02:05:05 +0000 (11:05 +0900)]
[Tensor] Optimize dequantize operation
- Perform dequantization by utilizing tensor operations instead of manual calculation.
- Tensor now contains two types of scale factors (FP32, FP16).
- Add a flate function that copies tensor values across different data types. This function is temporary and should later be replaced by scopy for speed.
**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Tue, 10 Oct 2023 05:33:25 +0000 (14:33 +0900)]
[Tensor] Read quantized tensor from binary file
This patch adds functionality to read quantized tensor from a binary file. Details are as follows.
- Read the tensor in the following order (axis, scale factors, zero points, and values).
- Tensor::read takes an extra datatype argument to identify the datatype of the scale factors and read the exact number of bytes.
- Fix QINT4 tensor print segfault issue.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
jijoong.moon [Mon, 11 Sep 2023 14:40:02 +0000 (23:40 +0900)]
[ Weight ] Add packed property and output axis in weight spec
This PR includes:
- Enable a packed (bool) property for the layer node. It applies only to the weight: if false, the weight follows the global activation datatype; if true, it follows the global weight datatype.
- Add an output axis in the Weight Spec and set a private variable in weight. This is used to find the right direction for multiplying scales and zero points.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Thu, 19 Oct 2023 06:58:50 +0000 (15:58 +0900)]
[ FP16 ] enable fp16 for tizen spec
This PR enables fp16 for Tizen aarch64
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Mon, 11 Sep 2023 05:04:21 +0000 (14:04 +0900)]
[ LLaMA ] apply temperature generator
This PR enables the temperature generator for logits.
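For reference, temperature generation typically divides logits by the temperature before softmax; a minimal self-contained sketch (names illustrative):
```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// T < 1 sharpens the distribution, T > 1 flattens it.
// Subtracting the max logit keeps exp() numerically stable.
std::vector<float> softmax_with_temperature(const std::vector<float> &logits,
                                            float temperature) {
  float max_logit = *std::max_element(logits.begin(), logits.end());
  std::vector<float> probs(logits.size());
  float sum = 0.0f;
  for (std::size_t i = 0; i < logits.size(); ++i) {
    probs[i] = std::exp((logits[i] - max_logit) / temperature);
    sum += probs[i];
  }
  for (auto &p : probs)
    p /= sum;
  return probs;
}
```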
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Fri, 8 Sep 2023 12:56:03 +0000 (21:56 +0900)]
[ GEMM ] Using GEMM for initial sequence
This PR uses GEMM to compute initial sequences.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Mon, 4 Sep 2023 10:29:47 +0000 (19:29 +0900)]
[ Multi Head ] enable cache shifting
This PR enables cache sliding when the cache exceeds its maximum length.
It also fixes the 32-bit computing issue in the RMS Norm layer.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
skykongkong8 [Mon, 16 Oct 2023 06:36:50 +0000 (15:36 +0900)]
[neon/bugFix] Classify which optimized sgemv code block to use by row length
- Since we are relying on float16, using a large batch for float16 computation might hinder accuracy for small-sized Tensors.
- Through unittests, I discovered a huge round-off error when performing sgemv with a batch size of 16 on Tensors whose column count is not divisible by 4, while batch size 8 is almost identical to batch size 1. Thus, I temporarily block the batch-size-16 computation when the column count is not divisible by 4.
- The classification implemented here is somewhat arbitrary, grounded on unittests; an optimal row-length classification is still needed.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghak PARK [Wed, 18 Oct 2023 05:22:45 +0000 (14:22 +0900)]
[Encoder] Remove Open_sources in NNtrainer repo
Remove the open-source files and move them to the resource repo.
These files are needed by encoder.hpp.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Wed, 18 Oct 2023 05:16:45 +0000 (14:16 +0900)]
[Encoder] Add prepare_encoder & modify related files
To use the open-source dependencies (ctre_unicode, json), we upload a tar.gz file to the Android_resource repo and download it.
To use Encoder.hpp, we modify some meson build files and .sh files.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Wed, 13 Sep 2023 06:05:25 +0000 (15:05 +0900)]
[LLaMA] Add korean language
This commit adds Korean language support with UTF-8 encoding.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Thu, 7 Sep 2023 07:55:36 +0000 (16:55 +0900)]
[LLaMA] Add encoder for LLaMA model
Add encoder for LLaMA model
- It encodes a Unicode string into encoded int_64 for the LLaMA model
- It needs the ctre-unicode.hpp and json.hpp files
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Thu, 7 Sep 2023 07:54:35 +0000 (16:54 +0900)]
[LLaMA] Add opensource for encoding
Add OpenSources for Encoder
- Add ctre-unicode.cpp for regex
- Add json.hpp for json parsing
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Thu, 7 Sep 2023 07:53:41 +0000 (16:53 +0900)]
[LLaMA] Add vocab, merges file
Add the vocab and merges files for the LLaMA model
- It will run correctly only for the 2B model
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Thu, 7 Sep 2023 07:52:03 +0000 (16:52 +0900)]
[LLaMA] Add LLaMa main.cpp
Add LLaMA main.cpp
- Only the get-user-input & Unicode-conversion parts are needed
- It will be updated by another PR
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
hyeonseok lee [Thu, 31 Aug 2023 06:18:24 +0000 (15:18 +0900)]
[multi head attention] make freq as static
- make freqs_cis as static
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Fri, 19 May 2023 08:57:44 +0000 (17:57 +0900)]
[ LLaMA2 ] Enable FP16(W)FP16(A)
This PR includes some fixes to run LLaMA2 with W16A16.
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Tue, 29 Aug 2023 12:14:20 +0000 (21:14 +0900)]
[ LLaMA ] apply fp16 to LLaMA
This PR enables FP16 compute for LLaMA
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
Seungbaek Hong [Mon, 28 Aug 2023 08:22:18 +0000 (17:22 +0900)]
[Application] LLaMA v2
Added LLaMA v2 application.
This implementation is based on Meta's llama.
ref url: "https://github.com/facebookresearch/llama/"
It contains...
- implementations of swiglu, rmsprop, and rotary embedding
- loading weights from the PyTorch implementation (HuggingFace)
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
jijoong.moon [Fri, 13 Oct 2023 10:04:43 +0000 (19:04 +0900)]
[ Execution Order ] set execution order according to execution mode
This PR sets the execution order properly according to the execution mode.
It also enables setting the execution mode in initialization().
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Sat, 26 Aug 2023 07:53:16 +0000 (16:53 +0900)]
[ PicoGPT ] Enable memory optimization for picoGPT
This PR includes:
- Fixes to enable memory optimization
- Removal of an unnecessary memory buffer
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Fri, 25 Aug 2023 12:04:52 +0000 (21:04 +0900)]
[ FP16 ] Run PicoGPT with W16A16
This PR includes some fixes to run PicoGPT with W16A16 on Android
using NEON.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Fri, 25 Aug 2023 06:35:16 +0000 (15:35 +0900)]
[ Application ] Fix for running GPT
This PR includes fixes for running GPT.
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
hyeonseok lee [Mon, 31 Jul 2023 13:44:34 +0000 (22:44 +0900)]
[Application] apply incremental inference to pico gpt
- Apply incremental inference to pico gpt
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Tue, 1 Aug 2023 03:15:03 +0000 (12:15 +0900)]
[concat] enable incremental forwarding with multi threads
- Each thread copies data in the batchwise direction
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Tue, 18 Jul 2023 06:51:53 +0000 (15:51 +0900)]
[PoC] incremental inference
- PoC of incremental inference
- Only works if the batch and channel sizes are 1
- For the concat layer, the inference step only works if the concat axis is the width axis
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
skykongkong8 [Mon, 16 Oct 2023 02:05:14 +0000 (11:05 +0900)]
[neon/bugFix] Support every column length for SGEMV transpose
- The previous implementation had potential bugs that could lead to a segmentation fault for Tensors with a non-divisible column length
- Fixed this by value-by-value allocation
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 12 Oct 2023 04:33:46 +0000 (13:33 +0900)]
[neon] Apply inline function style in sgemv_noTrans
- By applying an inline function style in sgemv, we can use more 128-bit variables in a single iteration.
- Since noTrans sgemv is optimized in the column direction, this optimization is valid, as proven by unittest results.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Thu, 12 Oct 2023 01:57:23 +0000 (10:57 +0900)]
[neon] Optimize sgemv_transpose_neon_fp16 w.r.t. ops
- The previous transposed sgemv function used fp32 ops.
- Accelerated it by using fp16 ops and assigning values simultaneously.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
MyungJoo Ham [Fri, 13 Oct 2023 13:40:19 +0000 (22:40 +0900)]
App/Yolo5: compiler error fix!
You cannot use a template function for std::function without
specifying the template type name.
The usage should be updated from
```
similarity.apply_i<float>(nntrainer::absFloat)
```
to
```
similarity.apply_i<float>(nntrainer::absFloat<float>)
```
This fixes:
```
[ 462s] [358/446] Compiling C++ object Applications/YOLOv3/jni/nntrainer_yolov3.p/yolo_v3_loss.cpp.o
[ 462s] FAILED: Applications/YOLOv3/jni/nntrainer_yolov3.p/yolo_v3_loss.cpp.o
[ 462s] c++ -IApplications/YOLOv3/jni/nntrainer_yolov3.p -IApplications/YOLOv3/jni -I../Applications/YOLOv3/jni -I../Applications/utils/jni/includes -Inntrainer -I../nntrainer -Iapi -I../api -I../api/ccapi/include -Inntrainer/compiler -I../nntrainer/compiler -Inntrainer/dataset -I../nntrainer/dataset -Inntrainer/layers/loss -I../nntrainer/layers/loss -Inntrainer/layers -I../nntrainer/layers -Inntrainer/models -I../nntrainer/models -Inntrainer/optimizers -I../nntrainer/optimizers -Inntrainer/tensor -I../nntrainer/tensor -Inntrainer/utils -I../nntrainer/utils -Inntrainer/graph -I../nntrainer/graph -I/usr/include/openblas -I/usr/include/nnstreamer -I/usr/include/dlog -I/usr/include/tensorflow2/ -I/usr/include/opencv4 -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -std=c++17 -D__TIZEN__=1 -DTIZENVERSION=8 -DTIZENVERSIONMINOR=0 -D__FEATURE_CHECK_SUPPORT__ -Wredundant-decls -Wwrite-strings -Wformat -Wformat-nonliteral -Wformat-security -Winit-self -Waddress -Wvla -Wpointer-arith -Wno-error=varargs -ftree-vectorize -Wno-maybe-uninitialized -Wno-unused-variable -DMIN_CPP_VERSION=201703L -DML_API_COMMON=1 -DNNSTREAMER_AVAILABLE=1 -DUSE_BLAS=1 -DNNTR_NUM_THREADS=1 -D__LOGGING__=1 -DENABLE_TEST=1 -DREDUCE_TOLERANCE=1 -DENABLE_NNSTREAMER_BACKBONE=1 -DENABLE_TFLITE_BACKBONE=1 -DENABLE_TFLITE_INTERPRETER=1 -DENABLE_DATA_AUGMENTATION_OPENCV=1 '-DNNTRAINER_CONF_PATH="/etc/nntrainer.ini"' -O2 -g2 -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -Wformat-security -fmessage-length=0 -frecord-gcc-switches -Wl,-z,relro,--as-needed -feliminate-unused-debug-types --param=ssp-buffer-size=4 -fdiagnostics-color=never -m64 -march=nehalem -msse4.2 -mfpmath=sse -fasynchronous-unwind-tables -fno-omit-frame-pointer -g -fopenmp -pthread -MD -MQ Applications/YOLOv3/jni/nntrainer_yolov3.p/yolo_v3_loss.cpp.o -MF Applications/YOLOv3/jni/nntrainer_yolov3.p/yolo_v3_loss.cpp.o.d -o Applications/YOLOv3/jni/nntrainer_yolov3.p/yolo_v3_loss.cpp.o -c ../Applications/YOLOv3/jni/yolo_v3_loss.cpp
[ 462s] ../Applications/YOLOv3/jni/yolo_v3_loss.cpp: In member function 'unsigned int custom::YoloV3LossLayer::find_responsible_anchors(float)':
[ 462s] ../Applications/YOLOv3/jni/yolo_v3_loss.cpp:797:50: error: cannot convert '<unresolved overloaded function type>' to 'std::function<float(float)>'
[ 462s] 797 | similarity.apply_i<float>(nntrainer::absFloat);
[ 462s] | ^
[ 462s] In file included from ../nntrainer/layers/common_properties.h:22,
[ 462s] from ../nntrainer/layers/acti_func.h:19,
[ 462s] from ../Applications/YOLOv3/jni/yolo_v3_loss.h:19,
[ 462s] from ../Applications/YOLOv3/jni/yolo_v3_loss.cpp:15:
[ 462s] ../nntrainer/tensor/tensor.h:1303:65: note: initializing argument 1 of 'int nntrainer::Tensor::apply_i(std::function<T(T)>) [with T = float]'
[ 462s] 1303 | template <typename T = float> int apply_i(std::function<T(T)> f) {
[ 462s]      |                                                     ~~~~~~~~~~~~~~~~~~~~^
```
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Donghak PARK [Mon, 11 Sep 2023 03:05:16 +0000 (12:05 +0900)]
[tflite_export] add Error Message, Fix Application
Currently, an error occurs when exporting resnet to the tflite format.
Previously, the padding prop of the layer was changed in the process of comparing values with PyTorch.
But TFLite only supports ```same``` and ```valid``` padding, while the current application uses the value 1,1.
To correct this, a macro was added, along with a more detailed error message in the tflite exporter.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Seungbaek Hong [Fri, 16 Jun 2023 07:46:56 +0000 (16:46 +0900)]
[Application] LOSS for YOLO v3 in nntrainer
Add Loss layer for YOLO v3.
I added a loss layer for YOLO v3 so the model can be trained with it.
There are still some issues with this application:
- If a validation dataset is added, it raises an error (segmentation fault).
- At the end of the main.cpp script, it raises an error (segmentation fault, too).
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
SeungBaek [Mon, 12 Jun 2023 07:56:30 +0000 (16:56 +0900)]
[Application] LOSS class for YOLO v3 in torch
Loss class for YOLO v3.
Signed-off-by: SeungBaek <baek2sm@gmail.com>
Seungbaek Hong [Thu, 8 Jun 2023 12:01:41 +0000 (21:01 +0900)]
[Application] upsample layer for yolo v3
I added an upsample layer for the YOLO v3 application.
For now it only supports 2x upscaling.
When the YOLO v3 task is completed, I will implement
an upscale layer that supports various scales
and add it as an official layer of NNTrainer.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Seungbaek Hong [Thu, 8 Jun 2023 06:37:27 +0000 (15:37 +0900)]
[Wait for #2221][Application] YOLOv3 implementation for forwarding in torch
Added a YOLO v3 PyTorch implementation for forwarding in PyTorch.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Seungbaek Hong [Wed, 7 Jun 2023 08:15:17 +0000 (17:15 +0900)]
[Application] Load official pre-trained weights of darknet53
Now it can load the official pre-trained weights of
DarkNet53 (the backbone of YOLO v3).
The official pre-trained binary file of darknet53
can be downloaded from the link below:
https://pjreddie.com/media/files/darknet53.conv.74
I've checked that all weights load well.
Seungbaek Hong [Thu, 1 Jun 2023 05:50:47 +0000 (14:50 +0900)]
[Application] darknet53 nntrainer implementation for yolo v3
Added the nntrainer darknet53 model for YOLO v3.
It's used in YOLO v3 as a backbone model.
* ISSUE: it currently cannot import pre-trained weights
from the PyTorch model and needs debugging.
I'll check and update it in a later commit.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Seungbaek Hong [Tue, 30 May 2023 12:23:17 +0000 (21:23 +0900)]
[Application] darknet53 pytorch implementation for yolo v3
Added the PyTorch darknet53 model for YOLO v3.
It is used in YOLO v3 as a backbone model.
I'll add an nntrainer implementation, too.
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
jijoong.moon [Thu, 12 Oct 2023 04:59:23 +0000 (13:59 +0900)]
[ Application ] Add encoder download script
This PR downloads and installs the encoder package.
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
jijoong.moon [Mon, 19 Jun 2023 22:46:27 +0000 (07:46 +0900)]
[ Application ] PicoGPT Android Application with NNTrainer
This PR includes the PicoGPT(https://github.com/jaymody/picoGPT)
Android Application with NNTrainer.
We only use the PicoGPT model binary; the NNTrainer
implementation is provided in #2212. This is the Android
application implementation for that PR.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
DongHak Park [Tue, 9 May 2023 08:05:21 +0000 (17:05 +0900)]
[TFLite Export] Add Unit Test
Add a unit test case for the fused op in the TensorFlow Lite exporter.
The test model is constructed as below:
```Input -> Conv2D -> Batch Norm -> ReLU -> Flatten```
- Build this model with nntrainer and export it to TensorFlow Lite
- Set the input dim to {1,3,4,4} -> in native TensorFlow Lite, make the same input with {3,4,4} and transpose it
Latency (10000 forward passes):
- tflite native (sec) : 0.0069193975830078125
- nntrainer exported (sec) : 0.0070416927337646484
- The nntrainer-exported model has a transpose layer
Signed-off-by: DongHak Park <donghak.park@samsung.com>
Donghak PARK [Tue, 10 Oct 2023 07:43:17 +0000 (16:43 +0900)]
[Encoder] Add prepare_encoder.sh file & remove opensources
The encoder uses external open-source files that were not written by us,
so add a prepare_encoder.sh file to download them into our Application PicoGPT dir.
For running, add some ifdef statements and meson options to check whether they are needed.
- Remove the open-source files and add the prepare_encoder.sh file
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Wed, 7 Jun 2023 07:09:15 +0000 (16:09 +0900)]
[PoC] Add User Input, Comment
- Add PicoGPT's user input
- Add Comment in encoder.hpp
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Fri, 26 May 2023 06:54:18 +0000 (15:54 +0900)]
[WIP][POC] Implements picoGPT Encoder
Implement picoGPT/GPT-2's encoder in C++.
It uses the nlohmann/json.hpp file, so we need to add or make a path to compile the JSON parser.
hyeonseok lee [Fri, 19 May 2023 09:06:25 +0000 (18:06 +0900)]
[PoC] implements PicoGPT
- Added causal mask in attention layer
- Implements PicoGPT
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Thu, 27 Jul 2023 00:39:30 +0000 (09:39 +0900)]
[Poc] implement reinitialize
- To provide dynamic input dimensions, implement a reinitialize function
- This commit is a PoC of reinitialize, so much of the code is just copy & paste of the initialize function.
This commit still needs refining.
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Thu, 29 Dec 2022 07:50:01 +0000 (16:50 +0900)]
[attention] add scaled dot product on attention layer
- To support scaled dot product on the attention layer, as described in the paper "Attention Is All You Need", add a scaled dot product property
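For reference, the scaled dot-product from that paper is
```
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```
where d_k is the key dimension; the property enables the 1/sqrt(d_k) scaling.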
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
hyeonseok lee [Fri, 19 May 2023 08:57:44 +0000 (17:57 +0900)]
[cache_pool] Fix requestMemory bug
- Match requestMemory arguments with memory_pool
- Added override keyword
Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>
Debadri Samaddar [Wed, 11 Oct 2023 09:43:47 +0000 (15:13 +0530)]
[sgemm/neon] Optimized noTrans scenario for SGEMM
Used NEON SIMD to calculate prefixes.
Added vectorization to process 16 rows together.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
skykongkong8 [Wed, 11 Oct 2023 06:29:46 +0000 (15:29 +0900)]
[neon] Optimize sgemm_transB
- We can reduce the number of function calls by reusing register variables
- This optimization is especially valid for large-scale Tensors
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Debadri Samaddar [Wed, 11 Oct 2023 05:04:17 +0000 (10:34 +0530)]
[neon/sgemm] Partial accumulation using both fp16 and fp32 intrinsics
Used partial accumulation using fp16 and fp32 intrinsics to enhance performance.
Modified function calls to inline calls to reduce register spilling.
Dynamically allocated temporary fp32 storage used to enhance accuracy.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Debadri Samaddar <s.debadri@samsung.com>
Donghyeon Jeong [Wed, 23 Aug 2023 06:36:14 +0000 (15:36 +0900)]
[TEST] Add memory reuse test for mixed tensors
This patch adds unit tests to cover scenarios of tensors reusing memory, which test the memory efficiency of using fp16 tensors.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghyeon Jeong [Wed, 23 Aug 2023 00:47:42 +0000 (09:47 +0900)]
[TEST] Add fp16 tensor pool test
TensorPool tests are added to verify TensorPool requests and the subsequent operations for both FP16 and FP32 tensors.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Wed, 9 Aug 2023 11:24:10 +0000 (20:24 +0900)]
[Cifar100] Fix Cifar100 Dataloader
Update the Cifar100 Dataloader, which is not compatible with the real Cifar100 dataset.
- Currently our Cifar100 Dataloader can't load the real Cifar100 dataset because of a mismatch with the dataset's shape
- Before : we assumed the Cifar100 record shape was <100 label><3072 pixel> per image
- After : the actual Cifar100 record shape is <1 coarse label><1 fine label><3072 pixel> per image
Because of this shape mismatch, the Resnet real-dataset example raised an error.
After merging this PR we can run and test resnet with the real dataset.
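For reference, one record in the CIFAR-100 binary format is 3074 bytes; a loader can model it as:
```cpp
#include <cstdint>

// One CIFAR-100 binary record: 1 coarse label, 1 fine label, then
// 3072 pixel bytes (32x32, R plane then G then B, row-major).
struct Cifar100Record {
  uint8_t coarse_label;        // superclass, 0-19
  uint8_t fine_label;          // class, 0-99
  uint8_t pixels[3 * 32 * 32]; // 3072 bytes of image data
};
static_assert(sizeof(Cifar100Record) == 3074, "packed layout expected");
```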
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Seungbaek Hong [Thu, 17 Aug 2023 05:52:26 +0000 (14:52 +0900)]
[Tensor] Add erf operation to Tensor
Added the Gaussian error function (erf) to Tensor,
along with a unittest for it.
It was already added in PR #2208,
but was deleted in PR #2238 (I don't know why).
So I restored that function for our tensor operations.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Seungbaek Hong <sb92.hong@samsung.com>
Donghak PARK [Thu, 14 Sep 2023 12:55:42 +0000 (21:55 +0900)]
[tflite_export] Update get output tensor from tflite
There were some logical errors in getting the output tensor,
so fix the get-output-tensor part.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
skykongkong8 [Tue, 5 Sep 2023 02:01:53 +0000 (11:01 +0900)]
[Tensor/trivial] Erase redundant code
- Erase repeatedly called code block
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Tue, 10 Oct 2023 03:04:12 +0000 (12:04 +0900)]
[neon] Optimize sgemv_transpose_neon_fp16
- Previously, the transposed mv optimization was done from a row-wise perspective, but this does not help reduce the number of function calls.
- In this commit, it is changed to a column-wise perspective, which works with the same logic as the noTrans mv.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
skykongkong8 [Fri, 6 Oct 2023 07:17:36 +0000 (16:17 +0900)]
[neon] Optimize sgemv
- Instead of declaring explicit register variables, declaring the function inline saves on the number of register variables in use.
- This way, we can load more variables to accelerate the sgemv computation.
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Tue, 10 Oct 2023 02:29:13 +0000 (11:29 +0900)]
[Tensor] Add output axis in weight spec
Add an output axis in the weight spec to identify the multiplication direction of scales and zero points
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
jijoong.moon [Tue, 12 Sep 2023 08:18:40 +0000 (17:18 +0900)]
[ Tensor ] Add Output Axis to dequantize api
This PR adds an output axis parameter to the dequantize API.
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>
skykongkong8 [Thu, 5 Oct 2023 01:43:47 +0000 (10:43 +0900)]
[blas/neon] Use inter-fp32 value in dimension shrinking computation
- Previously, sgemm and sgemv in NEON intrinsics depended on two conditions:
1. columns or rows had to be divisible by 8
2. they worked entirely with fp16 variables (which might cause precision loss during accumulation)
- In this commit, sgemm and sgemv are expected to:
1. support every column length with adaptive-length compute optimization
2. use a temporary fp32 array to preserve accumulated values, especially in large-scale Tensors (see the sketch below)
3. accelerate converting such fp32 arrays to fp16 Tensors and vice versa with NEON to improve time performance
4. consider the number of registers to avoid register spilling
- More optimizations w.r.t. time and memory are in progress
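A minimal sketch of point 2 above (kernel shape illustrative; requires the ARMv8.2-A fp16 extension):
```cpp
#include <arm_neon.h>

// Dot product with fp16 inputs but fp32 accumulation, so long reductions
// do not lose precision the way a pure-fp16 accumulator would.
// Assumes N is a multiple of 8 for brevity.
float dot_fp16_acc32(unsigned int N, const __fp16 *A, const __fp16 *B) {
  float32x4_t acc = vdupq_n_f32(0.0f);
  for (unsigned int i = 0; i < N; i += 8) {
    float16x8_t p = vmulq_f16(vld1q_f16(A + i), vld1q_f16(B + i));
    acc = vaddq_f32(acc, vcvt_f32_f16(vget_low_f16(p)));  // widen low half
    acc = vaddq_f32(acc, vcvt_f32_f16(vget_high_f16(p))); // widen high half
  }
  return vaddvq_f32(acc); // horizontal sum of the 4 fp32 lanes
}
```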
Resolves:
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: skykongkong8 <ss.kong@samsung.com>
Donghyeon Jeong [Thu, 5 Oct 2023 05:40:12 +0000 (14:40 +0900)]
[Bug] Fix build error in yolo with fp16
- This patch fixes a build error in yolo_v2_loss with fp16 enabled
**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped
Signed-off-by: Donghyeon Jeong <dhyeon.jeong@samsung.com>
Donghak PARK [Wed, 20 Sep 2023 10:37:39 +0000 (19:37 +0900)]
[Coverity] Fix Coverity issue
To fix an uninitialized output_axis, set the default output_axis to 3
- in the tensor.h file
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Tue, 19 Sep 2023 08:44:52 +0000 (17:44 +0900)]
[Coverity] Fix Coverity Issue
Fix Coverity Issues
- set default output_axis = 3
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Tue, 19 Sep 2023 08:33:12 +0000 (17:33 +0900)]
[Coverity] Fix Coverity Issue
Remove a local reference return
- It is already removed in the latest version but not yet merged into the main branch
Signed-off-by: Donghak PARK <donghak.park@samsung.com>
Donghak PARK [Tue, 19 Sep 2023 08:29:22 +0000 (17:29 +0900)]
[Coverity] Fix Coverity issue
Fix a "may be NULL and is dereferenced" issue in blas_neon.cpp:
check for NULL when malloc fails.
Signed-off-by: Donghak PARK <donghak.park@samsung.com>