review.tizen.org Git - platform/core/ml/nntrainer.git/log

[graph] In-place bug fix

This patch applies a bug fix for the in-place layer optimization.
Because in-place layers and other layers manage derivative memory
differently, in-place layers cannot directly reuse the tensors
of their neighboring layers; instead, they have to use those tensors differently.

See also #878

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[tensor] Support late memory allocation for tensors

Support late memory allocation for tensors.
This makes it possible to create tensor wrapper objects first and
allocate their memory later, when needed.
This allows finalizing the graph, creating all the tensors and graph
connections for all the layers, which can be stored in an offline setup
and then loaded directly from a file.

This decoupling allows tensors to be reused with every call to train:
memory can be allocated and deallocated without having to recreate
the tensors and redo tensor management each time.

In the short term, this is necessary for the in-place layer optimization.
See also #879

The implementation requires caching the source tensor when creating
shared tensors as shared tensors cannot be created unless source
tensor memory has been allocated.
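A minimal sketch of late allocation with a cached source tensor, assuming a float-only tensor whose shape is reduced to a length. `LazyTensor`, `createShared`, `allocate` and `deallocate` are hypothetical names, not nntrainer's API. The shared view stores a pointer to its source and resolves its data pointer only once the source has memory, so in this sketch the view must not outlive (or be moved away from) its source.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical simplified tensor: its size is known when the graph is
// finalized, but memory is attached only when allocate() is called.
class LazyTensor {
public:
  explicit LazyTensor(std::size_t len) : len_(len) {}

  // Create a view of [offset, offset + len). The source is cached so the
  // view can be created before the source has any memory.
  LazyTensor createShared(std::size_t offset, std::size_t len) {
    if (offset + len > len_)
      throw std::out_of_range("shared view exceeds source tensor");
    LazyTensor view(len);
    view.src_ = this;
    view.offset_ = offset;
    return view;
  }

  // Only tensors that own memory allocate; views follow their source.
  void allocate() {
    if (src_ == nullptr)
      buf_ = std::make_shared<std::vector<float>>(len_, 0.0f);
  }

  void deallocate() { buf_.reset(); }

  bool isAllocated() const { return data() != nullptr; }

  // Resolved lazily: a view points into its source only if it is allocated.
  float *data() const {
    if (src_ != nullptr) {
      float *base = src_->data();
      return base == nullptr ? nullptr : base + offset_;
    }
    return buf_ ? buf_->data() : nullptr;
  }

private:
  std::size_t len_;
  std::shared_ptr<std::vector<float>> buf_;
  LazyTensor *src_ = nullptr; // cached source tensor for shared views
  std::size_t offset_ = 0;
};
```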

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[ ResNet ] Add Resnet18 Application

This PR includes the ini configuration file for resnet18

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ DEBIAN ] Fix gcc version for debian build

Set a higher gcc version for the debian build

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ Application ] Bug fix of Android target Lib

Fix the path of the tensorflow-lite library

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ Application ] Fix to use Tensorflow Lite 2.3.0

Fix Android.mk to use Tensorflow Lite 2.3.0

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[app_context] Resolve build issue with gcc5

This solves the build issue with gcc5 when using std::call_once.
The original code binds a copy of the argument to the function,
which fails with gcc5 because a reference is expected.

This patch adds a std::ref explicitly to make this work with gcc5
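A sketch of the pattern, assuming a hypothetical `AppContext` and registration function (not nntrainer's actual code). Wrapping the argument in `std::ref` makes it arrive at the function as a reference even when the library's `std::call_once` forwards arguments through a copying binder.

```cpp
#include <functional>
#include <mutex>

// Hypothetical stand-in for the application context.
struct AppContext {
  int registered_types = 0;
};

// Takes the context by non-const reference, so it must not receive a copy.
void registerDefaultTypes(AppContext &ctx) { ctx.registered_types += 1; }

AppContext &getAppContext() {
  static AppContext ctx;
  static std::once_flag once;
  // std::ref keeps the argument a reference even if call_once internally
  // binds its arguments by copy (as gcc5's implementation did).
  std::call_once(once, registerDefaultTypes, std::ref(ctx));
  return ctx;
}
```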

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[ MESON ] fix capi-nnstreamer dependency

Remove the capi-nnstreamer dependency

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ ANDROID ] change the tflite version

Change the Tensorflow-Lite version from 1.13.1 to 2.3.0

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[ Doc ] add how to run android application

Add documentation on how to build and run on android

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[adam] Reduce temporary memory allocation

Reduce temporary memory allocation for adam.
This is done by reusing gradient memory to calculate the final
update to be applied to the weight.

This removes the temporary memory allocations that were being made every epoch
for each weight, and also reduces peak memory usage.
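A sketch of the idea for a single weight, assuming plain `std::vector<float>` buffers and common Adam defaults; `AdamState` and `adamStep` are hypothetical names. The final update is written into the gradient buffer instead of a freshly allocated temporary.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-weight optimizer state (first and second moments).
struct AdamState {
  std::vector<float> m;
  std::vector<float> v;
  explicit AdamState(std::size_t n) : m(n, 0.0f), v(n, 0.0f) {}
};

// One Adam step (t starts at 1). The gradient buffer is overwritten with
// the final update, so no temporary tensor is allocated per weight.
void adamStep(std::vector<float> &weight, std::vector<float> &grad,
              AdamState &state, int t, float lr = 0.001f, float beta1 = 0.9f,
              float beta2 = 0.999f, float epsilon = 1.0e-7f) {
  const float bias1 = 1.0f - std::pow(beta1, static_cast<float>(t));
  const float bias2 = 1.0f - std::pow(beta2, static_cast<float>(t));
  for (std::size_t i = 0; i < weight.size(); ++i) {
    const float g = grad[i];
    state.m[i] = beta1 * state.m[i] + (1.0f - beta1) * g;
    state.v[i] = beta2 * state.v[i] + (1.0f - beta2) * g * g;
    const float m_hat = state.m[i] / bias1;
    const float v_hat = state.v[i] / bias2;
    grad[i] = lr * m_hat / (std::sqrt(v_hat) + epsilon); // reuse gradient memory
    weight[i] -= grad[i];
  }
}
```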

Resolves #917

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[blas] Add missing return

Fix a missing return in the raw blas implementation.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Fix] Resolve static init order fiasco

As the order of static initialization across translation units is
unspecified, initializing a global object caused undefined behavior.

Initialization of the global app context is now delayed until it is first used.
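The construct-on-first-use idiom behind this kind of fix can be sketched as follows; `AppContext::Global()` and the registry contents are illustrative assumptions, not nntrainer's code. A function-local static is initialized the first time control passes through its declaration, so it no longer depends on the order in which translation units are initialized.

```cpp
#include <string>
#include <vector>

// Hypothetical global context.
class AppContext {
public:
  // Created on the first call to Global(), not during static initialization,
  // so other globals can safely use it from their own initializers.
  static AppContext &Global() {
    static AppContext instance; // initialization is thread-safe since C++11
    return instance;
  }

  std::vector<std::string> &registry() { return registry_; }

private:
  AppContext() : registry_{"fully_connected", "conv2d"} {}
  std::vector<std::string> registry_;
};
```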

See https://gcc.gnu.org/legacy-ml/gcc-patches/2017-03/msg00863.html

Resolves #893

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor] Handle copy edge case

When copying an uninitialized tensor to another uninitialized tensor,
tensor::copy tried to reshape using the uninitialized dimension.

This patch fixes the issue.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[CAPI/acr] Update tizen capi

**Changes proposed in this PR:**
- Setting void *user_data
- Update layer type enum

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor] Rearrange Methods

This patch groups tensor methods by arithmetic operation (tensor.h is
updated accordingly with the rebase)

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor] rearrange methods

- Add missing out param methods
- Change how some methods are delegated
- Rename s/operator_/apply_broadcast
- Remove `operator_i` and `operator_i_util`
- Ensure dimension checks

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor/Clean] Relocate tensor methods

This patch relocates arithmetic methods while adding some missing
outplace operation signatures.

Order is

1. inplace -> outplace -> outplace (with allocated memory)
2. multiply -> divide -> add -> subtract
3. scalar -> tensor
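To illustrate the three flavors in item 1, here is a sketch on a hypothetical `MiniTensor`; the names and signatures (including the `_i` suffix for the in-place variant) are assumptions, not nntrainer's exact API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical minimal tensor illustrating the three method flavors.
struct MiniTensor {
  std::vector<float> data;

  // 1. inplace: modifies this tensor and returns it
  MiniTensor &multiply_i(float value) {
    for (float &x : data)
      x *= value;
    return *this;
  }

  // 2. outplace: allocates and returns a new tensor
  MiniTensor multiply(float value) const {
    MiniTensor out{std::vector<float>(data.size())};
    multiply(value, out);
    return out;
  }

  // 3. outplace with allocated memory: writes into caller-owned storage
  MiniTensor &multiply(float value, MiniTensor &out) const {
    if (out.data.size() != data.size())
      throw std::invalid_argument("output size mismatch");
    for (std::size_t i = 0; i < data.size(); ++i)
      out.data[i] = data[i] * value;
    return out;
  }
};
```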

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[tensor] Update interface for tensor::map

Update the interface for tensor::map to include the size of the original
buffer, ensuring that the buffer contains enough memory for the
tensor shape wrapping around it.
Added another negative unittest for it.
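A sketch of such a size-checked mapping, assuming a hypothetical `MappedTensor::map(buffer, size_in_bytes, shape)`; nntrainer's real signature may differ. Passing a buffer that is too small is the negative case.

```cpp
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical non-owning view over an external buffer.
struct MappedTensor {
  float *data;
  std::vector<std::size_t> shape;

  // The caller passes the buffer size so the shape can be validated.
  static MappedTensor map(float *buf, std::size_t buf_bytes,
                          std::vector<std::size_t> shape) {
    std::size_t count = 1;
    for (std::size_t d : shape)
      count *= d;
    if (buf == nullptr || buf_bytes < count * sizeof(float))
      throw std::invalid_argument("buffer too small for the requested shape");
    return MappedTensor{buf, std::move(shape)};
  }
};
```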

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[manager] bug fix for Tensor::map

This patch fixes a bug exposed by Tensor::map,
where the offset is not checked when assigning the data.

The underlying cause is a bug in the manager for the batch normalization layer.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[manager] Disable user_shared_memory

Disable user_shared_memory, as NNAPI support is not required.
This is also needed to decouple the allocation of the tensor structure
from its internal memory allocation.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Docs] Generate nntrainer docs

Generate nntrainer docs.
See: https://nntrainer.github.io/

- Sub-documents such as Application/* need to be added
- Tables and other elements need to be adjusted to follow the hotdoc rules.

Signed-off-by: Gichan Jang <gichan2.jang@samsung.com>

[spec] add backward compatibility under tizen 6

In tizen 5.5, `ml-error-common` redeclares the error enum inside nnstreamer.
This patch works around the issue by providing a fake ml-api-error.h

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[meson] Clean up ml-api-common dependency

This patch cleans up the ml-api-common dependency.
If it proves stable across multiple platforms, we can remove
`api/capi/include/platform/ml-api-common.h`. Let's keep it for now.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Fix] Reflect changes to upstream/main

Merging some big PRs introduced inconsistencies that caused
a build break. This patch solves the issue.

**Changes proposed in this PR:**
- Use manager.initializeTensor() in the unittest
- Add training signature to forwarding

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

License Fix / Relicense to Apache-2.0

1. Do not use "Apache-2.0-only". It's "Apache-2.0".
2. Relicense files to Apache-2.0. (The author permits; I'm the author.)

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>

[data augmentation] support for random translate

Added support for random translate, which supports fractional translation and applies mirroring.
This is implemented with opencv, but building without opencv is allowed:
the model can still be built, but using this layer without opencv will throw.

Added corresponding unittest as well.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[data augmentation] Support for random flip

Add support for random flip data augmentation along with its unittests
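A minimal sketch of a random horizontal flip, assuming images stored as C x H x W row-major floats and a default flip probability of 0.5; the function names are hypothetical and the actual layer may support other flip directions.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Reverse every row of an image stored as C x H x W (row-major).
void flipHorizontal(std::vector<float> &img, std::size_t channels,
                    std::size_t height, std::size_t width) {
  for (std::size_t c = 0; c < channels; ++c) {
    for (std::size_t y = 0; y < height; ++y) {
      auto row = img.begin() + static_cast<std::ptrdiff_t>((c * height + y) * width);
      std::reverse(row, row + static_cast<std::ptrdiff_t>(width));
    }
  }
}

// Hypothetical augmentation step: flip with probability p and report it.
bool randomFlip(std::vector<float> &img, std::size_t channels,
                std::size_t height, std::size_t width, std::mt19937 &rng,
                double p = 0.5) {
  std::bernoulli_distribution coin(p);
  if (!coin(rng))
    return false;
  flipHorizontal(img, channels, height, width);
  return true;
}
```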

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[dynamic-training] Add dynamic training using derivatives

Added dynamic training using derivatives, where the decision to
apply the gradient is made from the received derivative,
without calculating the gradient itself.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[dynamic training] Adding dynamic-training code

Added dynamic-training code with both max and l2norm modes.
Verified that it works with the existing examples given the threshold.

TODO: support dynamic training with derivative
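The decision described in these two patches can be sketched as a reduction followed by a threshold test. The names, the use of absolute values in `max` mode, and the rule "apply only when the metric exceeds the threshold" are assumptions for illustration, not nntrainer's exact logic.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical reduction modes mirroring the "max" and "l2norm" modes.
enum class DynamicMode { Max, L2Norm };

// Reduce the derivative (or gradient) to one scalar and compare it with the
// threshold; in this sketch the update is applied only above the threshold.
bool shouldApplyGradient(const std::vector<float> &deriv, DynamicMode mode,
                         float threshold) {
  float metric = 0.0f;
  if (mode == DynamicMode::Max) {
    for (float x : deriv)
      metric = std::max(metric, std::fabs(x));
  } else {
    float sum_sq = 0.0f;
    for (float x : deriv)
      sum_sq += x * x;
    metric = std::sqrt(sum_sq);
  }
  return metric > threshold;
}
```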

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Fix/TFlite] Fix tflite allocation

Memory allocation is now handled outside of each layer.
Accordingly, the output tensor shouldn't be allocated inside a layer.

For the same reason, loss layer backwarding needs a fix; for now
it is just commented out and will be handled soon

This patch handles the issue for tflite layer

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[var_grad] Remove redundant argument for initializeWeight

Remove the redundant argument gtrain from initializeWeight,
as weight initialization is independent of whether the weight is
going to be used in training or not.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[weight] Decouple init of weight and gradients

Decouple the initialization of weight variables and their corresponding gradients.
Weights are always initialized and used later for inference/training,
but gradients are initialized only for training, with different
configurations based on the chosen optimization strategies.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[pooling] Do not allocate memory in initialize

Setting the batch size in initialize() for the pooling layer allocates memory.
However, the final batch size is allowed to change for inference/training,
so this unnecessarily affects the peak memory requirement.
For now, this memory is allocated during forwarding.
Later, this will be handled as a tensor by the manager once the int data type is supported.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[manager] Do not allocate adam for inference

Do not allocate adam and gradient memory for weights
when the model is being executed for inference

V2:
Separate memory allocation for weights and gradients.
Gradient memory allocation is decided based on training/inference.
However, weight memory must always be allocated, and this must happen
before readModel() loads the weights, so the two allocations need to be separated.
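A sketch of the separated allocation, assuming a hypothetical `MiniManager` that tracks one buffer per weight: weights are always allocated so a model file can be read into them, while gradients are allocated only for training and released for inference.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for one weight and its gradient.
struct WeightSlot {
  std::size_t len;
  std::vector<float> value;
  std::vector<float> grad;
};

class MiniManager {
public:
  void track(std::size_t len) { slots_.push_back(WeightSlot{len, {}, {}}); }

  // Always called, before the model file is read into the weights.
  void allocateWeights() {
    for (WeightSlot &s : slots_)
      s.value.assign(s.len, 0.0f);
  }

  // Called separately: gradients exist only when training.
  void allocateGradients(bool training) {
    for (WeightSlot &s : slots_) {
      if (training)
        s.grad.assign(s.len, 0.0f);
      else
        std::vector<float>().swap(s.grad); // release the memory for inference
    }
  }

  const std::vector<WeightSlot> &slots() const { return slots_; }

private:
  std::vector<WeightSlot> slots_;
};
```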

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[README] Add coverity badge

Add nntrainer coverity badge to README.

Signed-off-by: Gichan Jang <gichan2.jang@samsung.com>

[optimization] Bug fix for in-place layer optimization

In-place layer optimization is performed for multiple layers (activation and batch normalization layers),
and this list will grow with data augmentation etc.
However, in-place layers cannot work correctly back to back if these layers are trainable.
They work correctly if they don't need to pass the derivative back.

For now, this patch prevents two consecutive layers from both being in-place.
This will be made generic later, depending on the trainable and inPlace properties of the layer.
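The interim rule can be sketched as a single pass over the layers in order; `planInPlace` is a hypothetical helper, where `supports_inplace[i]` marks layers (such as activation or batch normalization) that could run in-place.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical graph pass: a layer that supports in-place execution is made
// in-place only if the layer right before it was not made in-place.
std::vector<bool> planInPlace(const std::vector<bool> &supports_inplace) {
  std::vector<bool> in_place(supports_inplace.size(), false);
  for (std::size_t i = 0; i < supports_inplace.size(); ++i) {
    const bool prev_in_place = i > 0 && in_place[i - 1];
    in_place[i] = supports_inplace[i] && !prev_in_place;
  }
  return in_place;
}
```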

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[inference] Add input validation for inference

Add input validation for inference of the neural network

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Tensor] Add outplace method for arithmetic ops

Add outplace ops that write into an already allocated tensor.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[meson] Update meson for ubuntu 20.04

Update meson to work with ubuntu 20.04
Also add some missing checks

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[docs] Add missing dependencies

Add missing dependencies required to build nntrainer with meson

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Layer] Add eval mode for the training

**Changes proposed in this PR:**
- This patch adds an eval mode to the training forward pass and
fixes the batch normalization layer accordingly
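A sketch of what an eval mode changes for batch normalization, assuming a single feature without learnable scale/shift and illustrative momentum/epsilon values (`MiniBatchNorm` is hypothetical): training uses batch statistics and updates the moving averages, while eval mode uses the stored moving statistics and leaves them untouched.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical batch normalization over a batch of scalars (one feature).
struct MiniBatchNorm {
  float moving_mean = 0.0f;
  float moving_var = 1.0f;
  float momentum = 0.99f;
  float epsilon = 0.001f;

  std::vector<float> forward(const std::vector<float> &x, bool training) {
    float mean = moving_mean;
    float var = moving_var;
    if (training) {
      // Training: use batch statistics and update the moving averages.
      mean = 0.0f;
      for (float v : x)
        mean += v;
      mean /= static_cast<float>(x.size());
      var = 0.0f;
      for (float v : x)
        var += (v - mean) * (v - mean);
      var /= static_cast<float>(x.size());
      moving_mean = momentum * moving_mean + (1.0f - momentum) * mean;
      moving_var = momentum * moving_var + (1.0f - momentum) * var;
    }
    // Eval: the stored moving statistics are used as-is.
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
      y[i] = (x[i] - mean) / std::sqrt(var + epsilon);
    return y;
  }
};
```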

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[ Fix ] Fix Logistic Regression Example Error

This PR includes fixes for the logistic regression application

Change forwarding function

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

Enable trainable property to layer

Set the trainable value to false in the constructors of the activation layer and flatten_layer

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>

[Tools] Fix bug that translayer cannot detect bn

Batch normalization in tf 2.3 was not detected by transLayer, so a
new type was added to detect the batch normalization layer

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[transfer learning] Enable test on ubuntu

Enable testing of the trained model on ubuntu
Added check to ensure that nnstreamer is enabled

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[manager] Optimize input/output memory for inference

Optimize input/output memory for inference by using a shared buffer,
where memory of size max_l(input_l + output_l), taken over all layers l,
is allocated for inference.
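The sizing rule can be written as a one-pass maximum; `LayerIO` and `sharedInferenceBufferSize` are hypothetical names, and how inputs and outputs are placed inside the shared buffer is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-layer sizes (in elements) of input and output tensors.
struct LayerIO {
  std::size_t input;
  std::size_t output;
};

// Only one layer runs at a time during inference, so the shared buffer is
// sized for the layer whose input plus output is largest.
std::size_t sharedInferenceBufferSize(const std::vector<LayerIO> &layers) {
  std::size_t peak = 0;
  for (const LayerIO &l : layers)
    peak = std::max(peak, l.input + l.output);
  return peak;
}
```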

A baseline unittest is added to the models unittest to ensure
that inference works with and without optimizations without any
failures. Value verification tests are done by the nnstreamer subplugin of
nntrainer.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

Support sum value in profiler

The profiler now shows the avg, min, max and sum values

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>

[Test] Disable deriv verification when opt is on

This patch disables derivative verification and only checks the whole
returned derivatives.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Conv2d] Optimize layer loop

This optimizes the layer loops by

- Minimize padding calculation
- Maximize cache hits by transposing the matrix
- Maximize cache hits by reordering the loops
- ~use single offset to minimize offset calculation~
- ~add shortcut when kernel size is 1~

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Conv2d] Reuse im2col array by batch

This patch enables reusing the im2col array across the batch, which also
saves the time spent initializing it to zero.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Conv2d] Change conv2d gemm to dot

- Change conv2dgemm to dot to enable optimization path inside dot
operation
- Add beta option to dot operation (C = alpha*A*B + beta*C)
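A naive row-major reference for the extended semantics, assuming contiguous matrices; this is not nntrainer's `dot`, only an illustration of what `beta` adds. As in BLAS, `beta == 0` is treated as "overwrite C", so the old contents of C are ignored.

```cpp
#include <cstddef>
#include <vector>

// Naive row-major GEMM: C = alpha * A * B + beta * C, where A is M x K,
// B is K x N and C is M x N. With beta = 1 the product accumulates into C.
void gemm(std::size_t M, std::size_t N, std::size_t K, float alpha,
          const std::vector<float> &A, const std::vector<float> &B, float beta,
          std::vector<float> &C) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < K; ++k)
        acc += A[m * K + k] * B[k * N + n];
      // beta == 0 means C is write-only (its previous content is not read).
      const float prev = (beta == 0.0f) ? 0.0f : beta * C[m * N + n];
      C[m * N + n] = alpha * acc + prev;
    }
  }
}
```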

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[bugfix] Fix model path and dataset path in model_loader.cpp

Fix the model path and dataset path to include the working directory path

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>

[dist/tizen] Enable base unittests for tizen build

Enable nntrainer unittests for the tizen build.
Not sure why or when this got commented out,
but let's enable it

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[model] Optimize model input/output

Optimize the model's extra input/output memory allocation, which counts towards peak memory allocation.
Memory is allocated for the input of the input layer and the output/gradient of the output layer.
However, that memory is never used, as train_run() allocates new buffers and passes them to the
input layer/loss layer.
This patch uses the memory already allocated by the input/loss layers to collect input/label data.

This patch also removes the extra parameters from forwarding/backwarding and the corresponding
with_val functions. Further, the two types of forwarding in the loss layer have been merged into one function.
Now, the loss layer and input layer do not need to be distinguished and can be treated as regular layers.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Conv] Optimize im2col

This patch optimizes im2col by:

- Adding padding as an argument instead of passing a pad value
- Skipping the creation of a padded tensor and the assignment for padded indices
- Refactoring variable names for clarity

See also #824

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor] Optimize accessor

This patch:
- inlines some accessors with the noexcept specifier to speed them up
- adds getValuePadded to avoid the memory copy needed to make a padded tensor
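A sketch of a padded accessor, assuming a single H x W channel and symmetric zero padding; the real `getValuePadded` signature in nntrainer may differ. Coordinates in the padded frame that fall outside the original data return 0 instead of reading a padded copy.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accessor: reads (h, w) of an H x W channel as if it were
// zero-padded by pad_h rows and pad_w columns on each side.
inline float getValuePadded(const std::vector<float> &img, std::size_t H,
                            std::size_t W, std::size_t h, std::size_t w,
                            std::size_t pad_h, std::size_t pad_w) noexcept {
  if (h < pad_h || w < pad_w)
    return 0.0f; // in the leading padding
  const std::size_t y = h - pad_h;
  const std::size_t x = w - pad_w;
  if (y >= H || x >= W)
    return 0.0f; // in the trailing padding
  return img[y * W + x];
}
```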

see also #825

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Cc: Parichay Kapoor <pk.kapoor@samsung.com>
Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Fix] Assign default value for max_deriv size

This patch initializes max_dervative_size to avoid unexpected termination

resolves #834

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[model/test] Duplicate models test for optimization

Run models test twice, once with all the optimizations enabled
and then once with all the optimizations disabled.

This ensures that both the modes work properly.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[activation] Making activation in-place

Made the activation layer in-place.
Each layer now allocates memory for its output rather than for its input.

For the activation layer, if its memory is optimized, then the memory
for the layer preceding the activation layer is not allocated.
Also, the memory for the derivative of the activation layer is shared
among all such layers.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[layer] Use gradient instead of variable for derivative

Use gradient instead of variable for derivative.
The manager internally sets the gradient memory to be the same as the variable for the optimization,
but hides this kind of optimization from the layer.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[manager] Manager tracks input/output memory

Manager tracks input/output memory and allocates it
based on whether the execution is training or inference

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[inputlayer] Input layer must be non-trainable

Input layer must always be non-trainable as it does not support backwarding operation

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[layer] Move layer input/output management to manager

Move layer inputs/outputs memory management to the manager.
This is accomplished by replacing NetBuffers with Var_Grad.

Now, all the memory of weights, gradients, inputs, outputs and derivatives
is managed by the manager, which allows more optimizations to be done with
inputs/outputs.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[Profiler] Change profiler specs

- Profiler time unit is changed: millisecond -> microsecond
- Now report is ordered by key

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Profiler] Apply ops level profiler

This patch attaches ops level profiler

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Profiler] Add event registerator

As of this patch, the profiler can dynamically register events and send them to
the profileListener; this patch also fixes a few bugs.

resolves #814

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Manager] Add MMaped memory

There was a requirement to separate the weight memory region and the grad memory
region.
To easily separate the two, this patch introduces a new abstraction,
`MMapedMemory`, while separating the weight and grad mmaps.

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Manager/Fix] Disallow copy ctor of manager

Since the manager holds memory, it shouldn't be copied, as ownership
becomes unclear. This patch deletes the copy ctor / assignment ops, while
changing the signatures of members and functions that use the manager.
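A sketch with a hypothetical `OwningManager`: because a `std::shared_ptr` member would otherwise make the implicit copy compile and silently share the pool, the copy operations are deleted explicitly. Keeping moves enabled is a choice of this sketch, not something the patch states.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical manager that owns a memory pool.
class OwningManager {
public:
  OwningManager() : pool_(std::make_shared<std::vector<float>>(1024)) {}

  // A copy would share the pool and blur ownership, so it is forbidden.
  OwningManager(const OwningManager &) = delete;
  OwningManager &operator=(const OwningManager &) = delete;

  // Moves transfer ownership unambiguously (a choice of this sketch).
  OwningManager(OwningManager &&) noexcept = default;
  OwningManager &operator=(OwningManager &&) noexcept = default;

  std::size_t poolSize() const { return pool_ ? pool_->size() : 0; }

private:
  std::shared_ptr<std::vector<float>> pool_;
};

// Code that uses the manager now takes it by reference instead of by value.
std::size_t reportPool(const OwningManager &manager) { return manager.poolSize(); }
```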

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Android] Manage ndk to deal with changes

1. Upgrade ndk version to 29
2. Add dependent library
3. Fix syntax for Application.mk

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Tensor] Add Tensor Wrap method

Add some Tensor factory methods:
1. borrow external memory and use it
2. create from a shared pointer without a copy

To restrict unwanted use, these methods are static methods
called `Tensor::Wrap`

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[TensorDim] Add initializer list ctor

This patch adds a tensordim initializer list ctor so that
a dimension can easily be passed as a function argument
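A sketch of an initializer-list constructor on a hypothetical 4-D `Dim4` (batch, channel, height, width); filling unspecified leading axes with 1 is an assumption for illustration, not necessarily how TensorDim behaves.

```cpp
#include <cstddef>
#include <initializer_list>
#include <stdexcept>

// Hypothetical 4-D dimension. Shorter lists fill the innermost axes and the
// remaining leading axes stay 1 (an assumption of this sketch).
class Dim4 {
public:
  Dim4(std::initializer_list<std::size_t> dims) {
    if (dims.size() > 4)
      throw std::invalid_argument("at most 4 dimensions are supported");
    std::size_t i = 4 - dims.size();
    for (std::size_t d : dims)
      dim_[i++] = d;
  }

  std::size_t batch() const { return dim_[0]; }
  std::size_t width() const { return dim_[3]; }
  std::size_t size() const { return dim_[0] * dim_[1] * dim_[2] * dim_[3]; }

private:
  std::size_t dim_[4] = {1, 1, 1, 1};
};

// With the non-explicit ctor, callers can write elementCount({1, 3, 28, 28}).
std::size_t elementCount(const Dim4 &d) { return d.size(); }
```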

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[tensor] argmax bugfix

Apply a memory allocation bugfix to argmax,
where an empty vector was being addressed

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[tensor] Set stride for shared tensor

Set stride for shared tensor

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[layer] Support in-place batch normalization

Support in-place batch normalization where the batch normalization
input/output is not stored and is over-written by the next layer.

This patch removes the input/output memory requirement when using
batch normalization layer.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[ ARGMAX ] Fix bug about argmax

Fix the argmax calculation in tensor

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <jijoong.moon@samsung.com>

[Test] Add macro to check if backbone is enabled

When the backbone is not enabled, the test fails.
This patch adds a define in the test so that the test can pass when the backbone
is not enabled

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[svace] Assure uninitialized members

nnstreamer_layer had two uninitialized members.
This patch initializes those two.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[svace] Error handling for applications/test

1. Fix inconsistent alloc/dealloc(new/free)
2. Add try catch to some statements
3. Fix memory leak from `asprintf`

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[svace] assure file is closed before removal

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Docs] Remove unnecessary HTML link for feature/privilege.

This patch removes the unnecessary HTML link for feature/privilege.

Signed-off-by: Sangjung Woo <sangjung.woo@samsung.com>

[Optim] Add shortcut to dot product

When a dimension is 1, the operation is a vector-by-matrix or vector-by-vector
multiplication. This patch adds a shortcut for that case.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Fix] fix lda, ldb param

**Changes proposed in this PR:**
- lda, ldb and ldc describe the layout, so they should be set in terms of the memory
layout; this patch fixes the issue and adds a corresponding test

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Profiler] Add basic profilerlistener

This patch adds a global profiler listener for various purposes.

From this patch on,
1. The profiler can be called globally with a designated event key
2. A listener reporting suite is included
3. The enum key has been changed to an int key to deal with an unhashable
key compile error on a few platforms.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

v2)
1. Change listener to RAII object (with forcing profiler, event
designation)
2. Add unsubscribe method
3. Change the event registration to a set to prevent notifying a listener twice
4. Change the semantics to not allow adding the same listener twice
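Points 3 and 4 can be sketched with a `std::set` per int event key; `MiniProfiler` and `Listener` are hypothetical, the RAII wrapper from point 1 is omitted, and rejecting a duplicate with an exception is an assumed policy.

```cpp
#include <map>
#include <set>
#include <stdexcept>

// Hypothetical listener that counts notifications.
struct Listener {
  int notified = 0;
  void onNotify(int /*event*/) { ++notified; }
};

class MiniProfiler {
public:
  // A set per event key: the same listener cannot be registered twice.
  void subscribe(int event, Listener *l) {
    if (!listeners_[event].insert(l).second)
      throw std::invalid_argument("listener already subscribed");
  }

  void unsubscribe(int event, Listener *l) { listeners_[event].erase(l); }

  // Each registered listener is notified exactly once per event.
  void notify(int event) {
    auto it = listeners_.find(event);
    if (it == listeners_.end())
      return;
    for (Listener *l : it->second)
      l->onNotify(event);
  }

private:
  std::map<int, std::set<Listener *>> listeners_;
};
```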

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Test] Add profiler test

**Changes proposed in this PR:**
- Add profiler test
- Wire profiler sources / header to the build system

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Profiler] Separate Profiler for wider use

This patch extracts profiler from neuralnet.

Also, this separates `ProfileListener`, which
should be used on the client side, while `Profiler`
is used on the library side

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[meson.build] Change join_paths to / in meson.build files

Replace join_paths with / in meson.build files

Check issue #709 for more details

Signed-off-by: hyeonseok lee <hs89.lee@samsung.com>

[Android] Integrate openblas into android

Android ndk was not building on top of openblas

This patch fixes the problem

resolves #794

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[mnist] Update saved model file

As saving the optimizer parameters has been updated, the previous
model file gives wrong results. This patch adds the new model file.

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[network] Rework the backwarding

- remove forwarding from backwarding
backwarding should only do backwarding and nothing more
- moved backwarding back to neuralnetwork so that the graph
does not have to care about how to backward etc.
The graph just provides iterators for iterating over the graph
in reverse. The graph does not know that layers have backwarding etc.

Also this removes dependency of graph from optimizer.

V2:
Added comment fixes for the corresponding PR

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[optimizer] Move optimizer out of layer

This patch moves the optimizer out of the layer.
Now backwarding just calculates derivatives and gradients
but does not apply the gradients.
Applying the gradients is done by the model.

Layers still support the applyGradient operation but require the optimizer
as an argument.
This decouples layers from optimizers so they can operate independently.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[optimizer] Simplify optimizer initialize

As there is just one optimizer, shared by all layers, it must be initialized just once, by the neural network.
Also, addOptimizerVariables() is moved out of initialize(), as initialize() should work
on the optimizer's parameters and should not need the list of weights.

Also remove the set_tensor argument, which was redundant

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[optimizer] Move optimizer variables to weights

Move optimizer variables to weights.
Now all the weight-related tensors are handled by the weights themselves,
so the optimizer can be shared across all layers, with no need to create new
copies for each layer.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[vgg] Added pytorch model for vgg16

Added pytorch model for vgg16
This is to benchmark against tf and nntrainer

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[vgg] Update to official vgg16 model

Update nntrainer and tensorflow to use the official VGG16 model architecture.
The FC layer setup is different, as the cifar100 dataset has just 100 output classes
rather than the 1000 classes of imagenet.
Further, the number of epochs is reduced to 1.
When training, this can be increased appropriately.

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[MNIST] Added pytorch version

Added pytorch version of MNIST for benchmarking purpose
This code is only tested with CPU

**Self evaluation:**
1. Build test: [x]Passed [ ]Failed [ ]Skipped
2. Run test: [x]Passed [ ]Failed [ ]Skipped

Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>

[ndk] Add enable profile flag

This patch adds an enable-profile flag to the ndk build for profiling purposes

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Experiment] Add profiler

This patch adds an `enable-profile` option to enable profiling. This
patch also adds simple profiling logic to `neuralnet::inference`

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Meson] Add ndk-build to be part of ndk build

**Changes proposed in this PR:**
- Add option to build library using ndk

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>

[Chores] CustomShortcut bug

As the ini format has changed, the ini for customshortcut needs to change

This patch handles it.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jihoon Lee <jhoon.it.lee@samsung.com>