From 1cdf02d3a640290111bfd2378a3eb42c9cbf8682 Mon Sep 17 00:00:00 2001
From: =?utf8?q?=D0=A2=D0=B8=D0=BC=D1=83=D1=80=20=D0=9E=D1=82=D0=B5=D0=BB?=
 =?utf8?q?=D0=BB=D0=BE=D0=B2=D0=B8=D1=87=20=D0=90=D0=B1=D0=BB=D1=8F=D0=B7?=
 =?utf8?q?=D0=B8=D0=BC=D0=BE=D0=B2/AI=20Tools=20Lab=20/SRR/Staff=20Enginee?=
 =?utf8?q?r/=EC=82=BC=EC=84=B1=EC=A0=84=EC=9E=90?=
 <t.ablyazimov@samsung.com>
Date: Fri, 7 Dec 2018 12:13:54 +0300
Subject: [PATCH] ACL backend documentation (#2547)

The ACL backend internal structure documentation.

Signed-off-by: Timur Ablyazimov <t.ablyazimov@samsung.com>
---
 .../project/18_NN_Compiler_and_Optimizer_DLD.rst   | 152 ++++++++++++++++++++-
 1 file changed, 151 insertions(+), 1 deletion(-)
diff --git a/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst b/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
index f1e3530..75dffec 100644
--- a/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
+++ b/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
@@ -657,11 +657,161 @@ If an operation defines a temporary variable, then the temporary variable is all
 ``<serialized parameters array> + <data offset for this operation>``
 data offset is defined by ``Serializer``.
 
+ACL Soft backend
+````````````````
 
 Generation of C++ source code for GPU
 #####################################
 
-This backend has to generate the source code in C++ language to perform NN inference using GPU device. This feature of NN Compiler is under developement now. The detailed design will be provided till the project completion.
+Glossary
+~~~~~~~~
++ **ACL** - refers to the ARM Compute Library, the Computer Vision and Machine Learning library
+  consisting of a set of functions optimized for ARM CPUs and GPUs and supplying optimized implementations
+  of many popular neural network layers.
++ **Artifact** - refers to the generated class implementation. This class provides method
+  implementations that make it computationally equivalent to the input neural network.
++ **DOM** - refers to the Document Object Model, an abstract description of a program as a decomposition
+  into hierarchically organized computational and structural blocks using a simple and limited set of
+  basic abstractions. This approach is used to subdivide the process of code generation into two stages:
+  a more sophisticated stage of converting the computational graph into DOM, and a fairly straightforward stage
+  of translating DOM into the final target language representation. This makes it easier to support
+  new target languages in the backends.
+
+Overview
+~~~~~~~~
+The ACL Soft backend walks through the computational graph (Model IR) and converts the encountered
+operation descriptions into lines of C++ code: instantiations of ACL layer classes, calls to the
+methods of those instances, and auxiliary statements such as declarations of the tensors that hold
+operation results and the declaration and setup of ACL layer configuration parameters.
+The output of this backend defines a single (artifact) class and consists of three files:
+an artifact header file, an artifact implementation file and a so-called parameter file containing
+the serialized operation kernel weights.
+
+To invoke the ACL Soft backend, the ``nnc`` executable must be called with the ``--target=arm-gpu-c++`` option.
+The ``-o`` option determines the name of the generated artifact.
+
+The artifact declaration looks like:
+
+.. code-block:: c++
+
+  class AclArtifact {
+  public:
+    AclArtifact();
+    void Inference();
+    arm_compute::CLTensor& get_tensor1();
+    arm_compute::CLTensor& get_tensor2();
+    arm_compute::CLTensor& getInput();
+    arm_compute::CLTensor& getOutput();
+
+  private:
+    std::ifstream _parIn;
+    arm_compute::CLTensor _tensor1;
+    arm_compute::CLTensor _tensor2;
+    arm_compute::CLTensor _tensor2_convolution_layer_weights;
+    arm_compute::CLConvolutionLayer _tensor2_convolution_layer;
+  };
+
+The usage looks like:
+
+.. code-block:: c++
+
+  AclArtifact artifact;
+  CLTensor& artifact_in = artifact.getInput();
+  readTensorFromHDF5File(artifact_in, "in.hdf5");
+
+  artifact.Inference();
+
+  CLTensor& artifact_out = artifact.getOutput();
+  writeTensorToHDF5File(artifact_out, "out", "out.hdf5");
+
+The ACL Soft backend requires shape inference to have been performed on the computational graph before it proceeds.
+It then does its work in three passes: generate a DOM from the computational graph, generate the artifact
+header from the DOM, and generate the implementation source file from the DOM.
+
+* DOM generation is implemented by the ``AclCppOpGenerator`` class,
+* Header generation is done by the ``ArtifactGeneratorCppDecl`` class,
+* Implementation generation is done by the ``ArtifactGeneratorCppCode`` class.
+
+Besides DOM generation, the ``AclCppOpGenerator`` class also serializes the operation kernel weights.
+
+Generation sequence
+~~~~~~~~~~~~~~~~~~~
+The ACL backend generation sequence can be observed in the ``AclCppCodeGenerator::run()`` method:
+
+1. Create the output directory if it is not present.
+2. Create the header, implementation and parameter files in this directory.
+3. Create ``ArtifactGeneratorCppCode`` and ``ArtifactGeneratorCppDecl`` generator instances to generate the source and header files from the DOM.
+4. Create an ``AclCppOpGenerator`` generator instance to produce the DOM from the Model IR. An instance of this
+   class traverses the Model IR computational graph, acting as its ``visitor`` in terms of the ``Visitor`` pattern.
+5. The computational graph ``accepts`` the ``AclCppOpGenerator`` instance as its visitor, letting it traverse the
+   graph and generate a DOM with an ``ArtifactModule`` instance as the root node. As a side effect, the
+   operation kernel weights are serialized during this traversal.
+6. The ``ArtifactModule`` root instance ``accepts`` the ``ArtifactGeneratorCppCode`` and ``ArtifactGeneratorCppDecl``
+   instances in turn, letting them traverse it and generate the source and header artifact files.
+
+Artifact structure
+~~~~~~~~~~~~~~~~~~
+Conceptually, a generated ACL Soft backend artifact is a C++ class. Physically, it is a set of three
+files: a header, an implementation and a (binary) parameter file with the neural network operation kernel weights.
+
+The class has a public default constructor, where all the configuration tasks are performed: all the operation
+layers are configured, all the used tensors are allocated, and the operation kernel weights are read.
+The class has an ``Inference()`` function, where the underlying neural network inference is implemented.
+The class also has a set of ``get_<tensor name>()`` functions returning references to all the named tensors generated.
+There are also two special accessor functions, ``getInput()`` and ``getOutput()``, which are generated if
+the neural network has exactly one input or exactly one output, respectively.
+
+DOM
+~~~
+The Document Object Model of the ACL Soft backend is implemented in the ``ArtifactModel.h`` and ``ArtifactModel.cpp``
+header and source files respectively. The following classes are the main DOM building blocks:
+
+1. ``ArtifactEntity`` is the base class of the DOM hierarchy. All the other DOM classes are derived from ``ArtifactEntity``.
+2. ``ArtifactNamed`` is a base class for all named DOM entities: variables, functions, classes etc.
+3. ``ArtifactExpr`` is a base class for different kinds of expressions: identifiers, unary and binary expressions, function calls etc.
+4. ``ArtifactId`` is used to reference named entities by name.
+5. ``ArtifactRef`` has the ``address of`` (``&``) semantics of ``C/C++``.
+6. ``ArtifactDeref`` has the ``dereference`` (``*``) semantics of ``C/C++``.
+7. ``ArtifactFunctionCall`` is a function call expression.
+8. ``ArtifactUnOp`` is a unary operation.
+9. ``ArtifactBinaryExpr`` is a binary operation.
+10. ``ArtifactIndex`` is an indexing (``[]``) operation.
+11. ``ArtifactRet`` represents a value returned from a function.
+12. ``ArtifactVariable`` represents a variable.
+13. ``ArtifactBlock`` represents a block of instructions.
+14. ``ArtifactFunction`` represents a function definition.
+15. ``ArtifactClass`` represents a class.
+16. ``ArtifactModule`` represents a whole program module.
+
+Several other DOM entities used for the artifact DOM representation are not mentioned in the list above.
+There is also the ``ArtifactFactory`` class, used as a factory for producing the DOM building blocks.
+The DOM is composed as a graph of ``ArtifactEntity`` nodes allocated in dynamic memory, with lifetimes
+controlled by ``std::shared_ptr<>`` instances.
+
+ACL Specifics
+~~~~~~~~~~~~~
+The ACL library supplies its own classes and other types, which should be used when working with the artifact
+generated by the ACL Soft backend. The most important of them is probably the ``CLTensor`` type,
+which, as its name suggests, is a type for tensors. All input, output and intermediate data in the
+generated ACL backend artifacts are variables of the ``CLTensor`` type.
+
+The tensor layout in ACL differs from that in the Model IR, so reshapes must be done when the ACL tensors
+are generated from the corresponding Model IR tensors. The axes order is NHWC in the Model IR and WHCN in the
+ACL library.
+
+A very important ACL feature is the way the ``paddings`` are calculated for ACL tensors.
+There are two practical ways to instruct the ACL library how to calculate tensor ``paddings``:
+
+* Do a tensor allocation after *all* of the operation layers' ``configure()`` function calls which use this tensor.
+  Call the ``TensorInfo::init()`` function in this case. This option is recommended, as it guarantees smart
+  ``paddings`` size calculation and keeps the tensor sizes moderate.
+* Use auto-padding: call ``TensorInfo::init_auto_padding()`` to initialize the tensors' ``TensorInfo``.
+  The order of calls to the tensor allocation function and to the operation layers' ``configure()`` does not matter.
+  This approach is strongly discouraged: in practice it can inflate the memory allocated for
+  tensor storage by several orders of magnitude.
+
+In the generated ACL Soft backend artifact, all tensor allocations are done after all the operation
+layers are configured.
 
 Interface Design
 ================
-- 
2.7.4