From 1cdf02d3a640290111bfd2378a3eb42c9cbf8682 Mon Sep 17 00:00:00 2001
From: =?utf8?q?=D0=A2=D0=B8=D0=BC=D1=83=D1=80=20=D0=9E=D1=82=D0=B5=D0=BB?=
 =?utf8?q?=D0=BB=D0=BE=D0=B2=D0=B8=D1=87=20=D0=90=D0=B1=D0=BB=D1=8F=D0=B7?=
 =?utf8?q?=D0=B8=D0=BC=D0=BE=D0=B2/AI=20Tools=20Lab=20/SRR/Staff=20Enginee?=
 =?utf8?q?r/=EC=82=BC=EC=84=B1=EC=A0=84=EC=9E=90?=
Date: Fri, 7 Dec 2018 12:13:54 +0300
Subject: [PATCH] ACL backend documentation (#2547)

The ACL backend internal structure documentation.

Signed-off-by: Timur Ablyazimov
---
 .../project/18_NN_Compiler_and_Optimizer_DLD.rst | 152 ++++++++++++++++++++-
 1 file changed, 151 insertions(+), 1 deletion(-)

diff --git a/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst b/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
index f1e3530..75dffec 100644
--- a/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
+++ b/contrib/nnc/doc/project/18_NN_Compiler_and_Optimizer_DLD.rst
@@ -657,11 +657,161 @@ If an operation defines a temporary variable, then the temporary variable is all
 `` + `` data offset is defined by ``Serializer``.
 
+ACL Soft backend
+````````````````
 
 Generation of C++ source code for GPU
 #####################################
 
-This backend has to generate the source code in C++ language to perform NN inference using GPU device. This feature of NN Compiler is under developement now. The detailed design will be provided till the project completion.
+Glossary
+~~~~~~~~
++ **ACL** - refers to the ARM Compute Library, the Computer Vision and Machine Learning library
+  consisting of a set of functions optimized for ARM CPUs and GPUs and supplying optimized implementations
+  of many popular neural network layers.
++ **Artifact** - refers to the generated class implementation. This class provides method
+  implementations that make it computationally equivalent to the input neural network.
++ **DOM** - refers to the Document Object Model, an abstract description of a program as a decomposition
+  into hierarchically organized computational and structural blocks using a simple and limited set of
+  basic abstractions. This approach is used to split the process of code generation into two stages:
+  a more sophisticated stage of converting the computational graph into a DOM, and a fairly straightforward stage
+  of translating the DOM into the final target language representation. This makes it easier to support
+  new target languages in the backends.
+
+Overview
+~~~~~~~~
+The ACL Soft backend walks through the computational graph (Model IR) and converts the encountered
+operation descriptions into lines of C++ code: instantiations of the ACL layer classes,
+calls to the methods of those instances, and auxiliary statements such as declarations of the output
+tensors holding the operation results, and declarations and setup of the ACL layer configuration parameters.
+The output of this backend is a single (artifact) class definition spread over three files:
+an artifact header file, an artifact implementation file and a so-called parameter file containing
+the serialized operation kernel weights.
+
+The ``nnc`` executable must be called with the ``--target=arm-gpu-c++`` option to use the ACL Soft backend.
+The ``-o`` option determines the name of the generated artifact.
+
+The artifact declaration looks like:
+
+.. code-block:: c++
+
+  class AclArtifact {
+  public:
+    AclArtifact();
+    void Inference();
+    arm_compute::CLTensor& get_tensor1();
+    arm_compute::CLTensor& get_tensor2();
+    arm_compute::CLTensor& getInput();
+    arm_compute::CLTensor& getOutput();
+
+  private:
+    std::ifstream _parIn;
+    arm_compute::CLTensor _tensor1;
+    arm_compute::CLTensor _tensor2;
+    arm_compute::CLTensor _tensor2_convolution_layer_weights;
+    arm_compute::CLConvolutionLayer _tensor2_convolution_layer;
+  };
+
+The usage looks like:
+
+.. code-block:: c++
+
+  AclArtifact artifact;
+  arm_compute::CLTensor& artifact_in = artifact.getInput();
+  readTensorFromHDF5File(artifact_in, "in.hdf5");
+
+  artifact.Inference();
+
+  arm_compute::CLTensor& artifact_out = artifact.getOutput();
+  writeTensorToHDF5File(artifact_out, "out", "out.hdf5");
+
+The ACL Soft backend requires shape inference to have been performed on the computational graph before it proceeds.
+It then does its work in three passes: it generates a DOM from the computational graph, generates the artifact
+header from the DOM, and generates the implementation source file from the DOM.
+
+* DOM generation is implemented by the ``AclCppOpGenerator`` class,
+* Header generation is done by the ``ArtifactGeneratorCppDecl`` class,
+* Implementation generation is done by the ``ArtifactGeneratorCppCode`` class.
+
+Besides DOM generation, the ``AclCppOpGenerator`` class also serializes the operation kernel weights.
+
+Generation sequence
+~~~~~~~~~~~~~~~~~~~
+The ACL backend generation sequence can be observed in the ``AclCppCodeGenerator::run()`` method:
+1. Create the output directory if it is not present.
+2. Create the header, implementation and parameter files in this directory.
+3. Create instances of the ``ArtifactGeneratorCppCode`` and ``ArtifactGeneratorCppDecl`` generators, which are used
+   to generate the source and header files from the DOM.
+4. Create an ``AclCppOpGenerator`` instance to produce the DOM from the Model IR. This instance
+   traverses the Model IR computational graph, acting as its ``visitor`` in terms of the ``Visitor`` pattern.
+5. The computational graph ``accepts`` the ``AclCppOpGenerator`` instance as its visitor, which lets the generator
+   traverse the graph and build a DOM with an ``ArtifactModule`` instance as its root node. As a side effect, the
+   operation kernel weights are serialized during this traversal.
+6. The root ``ArtifactModule`` instance ``accepts`` the ``ArtifactGeneratorCppCode`` and ``ArtifactGeneratorCppDecl``
+   instances in turn, letting them traverse it and generate the source and header artifact files.
+
+Artifact structure
+~~~~~~~~~~~~~~~~~~
+Conceptually, a generated ACL Soft backend artifact is a C++ class. Physically, it is a set of three
+files: a header, an implementation file and a (binary) parameter file with the neural network operation kernel weights.
+
+The class has a public default constructor, where all the configuration tasks are performed: all the operation
+layers are configured, all the used tensors are allocated, and the operation kernel weights are read.
+The class has an ``Inference()`` function, where the underlying neural network inference is implemented.
+The class also has a set of ``get_()`` functions returning references to all the generated named tensors
+(for example, ``get_tensor1()`` and ``get_tensor2()`` in the declaration above). There are also two special
+accessor functions, ``getInput()`` and ``getOutput()``, generated only if the neural network has a single input or a single output, respectively.
+
+DOM
+~~~
+The Document Object Model of the ACL Soft backend is implemented in the ``ArtifactModel.h`` header and the
+``ArtifactModel.cpp`` source file. The following classes are the main DOM building blocks:
+
+1. ``ArtifactEntity`` is the root base class of the DOM hierarchy; all the other DOM classes derive from it.
+2. ``ArtifactNamed`` is a base class for all named DOM entities: variables, functions, classes, etc.
+3. ``ArtifactExpr`` is a base class for different kinds of expressions: identifiers, unary and binary expressions, function calls, etc.
+4. ``ArtifactId`` is used to reference named entities by name.
+5. ``ArtifactRef`` has the ``address of`` (``&``) semantics of C/C++.
+6. ``ArtifactDeref`` has the ``dereference`` (``*``) semantics of C/C++.
+7. ``ArtifactFunctionCall`` is a function call expression.
+8. ``ArtifactUnOp`` is a unary operation.
+9. ``ArtifactBinaryExpr`` is a binary operation.
+10. ``ArtifactIndex`` is an indexing (``[]``) operation.
+11. ``ArtifactRet`` represents returning a value from a function.
+12. ``ArtifactVariable`` represents a variable.
+13. ``ArtifactBlock`` represents a block of instructions.
+14. ``ArtifactFunction`` represents a function definition.
+15. ``ArtifactClass`` represents a class.
+16. ``ArtifactModule`` represents a whole program module.
+
+Several other DOM entities used in the artifact representation are not mentioned in the list above.
+The ``ArtifactFactory`` class is used as a factory for producing the DOM building blocks.
+The DOM is composed as a graph of ``ArtifactEntity`` nodes allocated in dynamic memory, with lifetimes
+controlled by ``std::shared_ptr<>`` instances.
+
+ACL Specifics
+~~~~~~~~~~~~~
+The ACL library supplies its own classes and other types, which should be used when working with the artifact
+generated by the ACL Soft backend. The most important of them is probably the ``CLTensor`` type,
+which, as its name suggests, is the tensor type. All input, output and intermediate data in the
+generated ACL backend artifacts are variables of the ``CLTensor`` type.
+
+The tensor layout in ACL differs from the one in the Model IR, so the tensor axes must be reordered when the ACL tensors
+are generated from the corresponding Model IR tensors. The axis order is NHWC in the Model IR and WHCN in the
+ACL library.
+
+An important ACL feature is the way ``paddings`` are calculated for ACL tensors.
+There are two practical ways to instruct the ACL library how to calculate tensor ``paddings``:
+
+* Allocate a tensor only after *all* the ``configure()`` calls of the operation layers that use this tensor.
+  Call the ``TensorInfo::init()`` function in this case. This option is recommended, as it guarantees a smart
+  ``paddings`` size calculation and keeps the tensor sizes moderate.
+* Use auto-padding: call ``TensorInfo::init_auto_padding()`` to initialize the tensor's ``TensorInfo``.
+  The order of calls to the tensor allocation function and to the operation layers' ``configure()`` functions does not matter.
+  This approach is *highly*, **HIGHLY** not recommended! In practice, it can inflate the memory allocated for
+  tensor storage by several orders of magnitude!
+
+In the generated ACL Soft backend artifact, all tensor allocations are done after all the operation
+layers are configured.
 
 Interface Design
 ================
-- 
2.7.4
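The ACL Specifics section of this patch states that the axis order is NHWC in the Model IR and WHCN in ACL. Assuming that WHCN lists dimensions from the fastest-varying to the slowest-varying one (the convention of ACL's ``TensorShape``), converting a dense Model IR buffer moves every element rather than only relabelling the shape. The sketch below shows that index arithmetic in plain C++. The helper names ``nhwcShapeToWhcn`` and ``nhwcToWhcn`` are hypothetical and do not come from nnc or ACL, and padding is ignored.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of nnc or ACL): reorders a Model IR shape
// given as {N, H, W, C} into the ACL-style {W, H, C, N} order, where index 0
// is assumed to be the fastest-varying dimension.
std::array<std::size_t, 4> nhwcShapeToWhcn(const std::array<std::size_t, 4>& nhwc) {
  return {nhwc[2], nhwc[1], nhwc[3], nhwc[0]};
}

// Hypothetical helper: copies a dense, unpadded NHWC buffer (assumed to hold
// exactly n * h * w * c elements) into a buffer in which W varies fastest,
// then H, then C, then N (the WHCN order above).
std::vector<float> nhwcToWhcn(const std::vector<float>& src, std::size_t n,
                              std::size_t h, std::size_t w, std::size_t c) {
  std::vector<float> dst(src.size());
  for (std::size_t in = 0; in < n; ++in)
    for (std::size_t ih = 0; ih < h; ++ih)
      for (std::size_t iw = 0; iw < w; ++iw)
        for (std::size_t ic = 0; ic < c; ++ic) {
          // Row-major NHWC offset: C is the innermost dimension.
          std::size_t srcIdx = ((in * h + ih) * w + iw) * c + ic;
          // WHCN offset: W is the innermost dimension, N the outermost.
          std::size_t dstIdx = ((in * c + ic) * h + ih) * w + iw;
          dst[dstIdx] = src[srcIdx];
        }
  return dst;
}
```

For example, a Model IR shape of {1, 2, 3, 4} (N, H, W, C) becomes {3, 2, 4, 1} in this order, and a 1x1x2x3 NHWC buffer {0, 1, 2, 3, 4, 5} becomes {0, 3, 1, 4, 2, 5}.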