docs/runtime/core.md

# Core

Runtime Core is a compilation/execution engine for neural network models.

## Modules

Runtime Core has four modules. These are namespaces as well as directory names in `/runtime/onert/core/src/`.

- `ir` stands for Intermediate Representation, which contains the neural network graph data structures
- `compiler` converts IR to an executable format
- `exec` is the execution module, which is the result of a compilation
- `backend` is an interface for operand memory management and the actual calculation of operations

### Module `ir`

This module contains data structures of pure neural network models. The models from NN Packages or NN API are converted to these structures.

- `Subgraphs` is the entire neural network model, which is a set of subgraphs
- `Subgraph` consists of operands and operations
- `Operand` (a.k.a. Tensor) has a shape, a data type, data, and references to operations
- `Operation` (a.k.a. Operator) has an operation type, params, and references to operands

`Operand` and `Operation` are the nodes of the graph, and the references between them are the edges of the graph.

`Subgraphs` represents the whole model. A model can have more than one `Subgraph` to support control flow operations. Such an operation calls another subgraph, and when execution of that subgraph is done, control returns to the calling subgraph along with the returned operands.

Every graph is a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph), so once the model inputs are given, it can be run in topological order.

Here's a figure of how those data structures are organized.

![Core](core-figure-ir.png)
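
The relationship between operands and operations described in this section can be sketched in a few lines of Python. This is only an illustration of the graph shape, not the runtime's actual classes: the field names (`used_by`, `defined_by`), the helper methods, and the `FullyConnected` example are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Operand:
    shape: Tuple[int, ...]
    dtype: str
    data: Optional[bytes] = None                       # constant data, if any
    used_by: List[int] = field(default_factory=list)   # operations that read this operand
    defined_by: Optional[int] = None                   # operation that writes this operand


@dataclass
class Operation:
    op_type: str
    inputs: List[int]                                  # operand indices
    outputs: List[int]                                 # operand indices
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class Subgraph:
    operands: List[Operand] = field(default_factory=list)
    operations: List[Operation] = field(default_factory=list)

    def add_operand(self, shape, dtype, data=None) -> int:
        self.operands.append(Operand(tuple(shape), dtype, data))
        return len(self.operands) - 1

    def add_operation(self, op_type, inputs, outputs, **params) -> int:
        index = len(self.operations)
        self.operations.append(Operation(op_type, list(inputs), list(outputs), params))
        # Record the edges in both directions.
        for i in inputs:
            self.operands[i].used_by.append(index)
        for o in outputs:
            self.operands[o].defined_by = index
        return index


g = Subgraph()
x = g.add_operand((1, 4), "float32")
w = g.add_operand((4, 2), "float32", data=b"\x00" * 32)
y = g.add_operand((1, 2), "float32")
fc = g.add_operation("FullyConnected", [x, w], [y], activation="none")
```

Operands and operations refer to each other by index, so the edges of the graph are exactly these index lists.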

### Module `compiler`

`Compiler` is the main class of this module. Everything starts from it.

Its job is to make models executable. It schedules the execution order and assigns a backend to each operation. The major phases of compilation are as follows.

#### 1. Lowering

In Lowering, `Compiler` assigns a [backend](#module-backend) to each operation. This means that the operation will be run with the assigned backend's kernel.

One scheduler lets the user specify backends manually via compiler options. Another scheduler assigns backends automatically, based on profile information that was measured and saved in advance.
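
As a rough illustration of the two scheduling styles, the sketch below assigns one backend name per operation, first from a user-supplied mapping and then from measured timings. The function names, the option format, the timing table, and the backend name strings are all hypothetical; the actual schedulers may weigh more factors than this.

```python
def assign_manually(op_types, default_backend, per_op_type=None):
    """Pick the backend the user asked for, falling back to a default."""
    per_op_type = per_op_type or {}
    return [per_op_type.get(op_type, default_backend) for op_type in op_types]


def assign_by_profile(op_types, profile, default_backend):
    """Pick the backend with the smallest measured time for each operation type."""
    assignment = []
    for op_type in op_types:
        timings = profile.get(op_type)          # {backend name: measured time}
        if timings:
            assignment.append(min(timings, key=timings.get))
        else:
            assignment.append(default_backend)  # nothing measured for this type
    return assignment


ops = ["Conv2D", "Relu", "Softmax"]

manual = assign_manually(ops, "cpu", {"Conv2D": "acl_neon"})

profile = {
    "Conv2D": {"cpu": 3.0, "acl_neon": 1.2},
    "Relu": {"cpu": 0.1, "acl_neon": 0.4},
}
automatic = assign_by_profile(ops, profile, "cpu")
```

In this example both calls yield `["acl_neon", "cpu", "cpu"]`: the manual mapping names a backend only for `Conv2D`, and the profile happens to favor the same choice.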

#### 2. Tensor Registration

Each backend manages its own tensors. In this phase, operand information is registered with the corresponding backend. This information is later used to generate tensor objects.

##### Q. What are the differences between 'operand' and 'tensor'?

In the **ONE** runtime, 'operand' refers to an operand in a neural network model, while 'tensor' includes all of the 'operand' info plus actual execution info such as the actual buffer pointer. In short, 'operand' is for `ir` and 'tensor' is for `backend`.
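
The operand/tensor split can be pictured as follows. This is a minimal sketch under assumed names (`OperandInfo`, `Tensor`, `TensorRegistry`, `build_tensors`, and the `ITEM_SIZE` table are all invented for illustration), showing operand info being registered first and a buffer being attached only on the tensor side.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

ITEM_SIZE = {"float32": 4, "int32": 4, "uint8": 1}  # bytes per element


@dataclass(frozen=True)
class OperandInfo:
    """What the model describes: shape and data type only."""
    shape: Tuple[int, ...]
    dtype: str

    def byte_size(self) -> int:
        count = 1
        for dim in self.shape:
            count *= dim
        return count * ITEM_SIZE[self.dtype]


@dataclass
class Tensor:
    """What a backend holds: the operand info plus an actual buffer."""
    info: OperandInfo
    buffer: Optional[bytearray] = None


class TensorRegistry:
    """Hypothetical per-backend registry of operand info."""

    def __init__(self) -> None:
        self._infos: Dict[int, OperandInfo] = {}
        self.tensors: Dict[int, Tensor] = {}

    def register(self, index: int, info: OperandInfo) -> None:
        self._infos[index] = info

    def build_tensors(self) -> None:
        # Tensor objects are generated from the registered operand info.
        for index, info in self._infos.items():
            self.tensors[index] = Tensor(info)


registry = TensorRegistry()
registry.register(0, OperandInfo((1, 4), "float32"))
registry.build_tensors()

tensor = registry.tensors[0]
tensor.buffer = bytearray(tensor.info.byte_size())  # execution-time detail
```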

#### 3. Linearization (Linear Executor Only)

Linearization means sorting operations in topological order. It saves execution time because the executor does not need to resolve the next available operations after every operation at execution time.
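
One common way to produce a topological order is Kahn's algorithm, sketched below. The source does not say which algorithm the runtime uses, so treat this as an assumption; the graph encoding (a dict from operation name to its input and output operand names) is also made up for the example.

```python
from collections import deque


def linearize(operations):
    """Return operation names in topological order.

    `operations` maps an operation name to (input operands, output operands).
    """
    # Which operation produces each operand.
    producer = {out: name for name, (_, outs) in operations.items() for out in outs}

    # For each operation, the set of operations it depends on.
    deps = {
        name: {producer[i] for i in ins if i in producer}
        for name, (ins, _) in operations.items()
    }
    consumers = {name: [] for name in operations}
    for name, needed in deps.items():
        for dep in needed:
            consumers[dep].append(name)

    pending = {name: len(needed) for name, needed in deps.items()}
    ready = deque(name for name in operations if pending[name] == 0)

    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(operations):
        raise ValueError("graph has a cycle")
    return order


# Declared out of order on purpose; "a" and "b" are model inputs.
ops = {
    "mul": (["c", "d"], ["e"]),
    "relu": (["c"], ["d"]),
    "add": (["a", "b"], ["c"]),
}
order = linearize(ops)  # ["add", "relu", "mul"]
```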

Linearization also plans tensor memory. It can save memory by letting operands whose lifetimes do not overlap share the same space. All allocations are done at compile time (after [4. Kernel Generation](#4-kernel-generation)), which saves execution time too.
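
To make the lifetime-based reuse concrete, here is a small greedy first-fit planner. The planning strategy is an assumption chosen for illustration (the source does not name one), and the tensor names, sizes, and step numbers are invented. Each tensor is described by its size and the first and last execution step at which it is live.

```python
def plan_memory(tensors):
    """Assign each tensor an offset in one shared block.

    `tensors` maps a name to (size, first_step, last_step).
    Returns (offsets, total_size).
    """
    placed = []   # (offset, size, first_step, last_step)
    offsets = {}
    # Place larger tensors first; ties keep their declaration order.
    for name, (size, first, last) in sorted(tensors.items(), key=lambda kv: -kv[1][0]):
        # Only blocks whose lifetime overlaps this tensor are in the way.
        busy = sorted(
            (o, s) for o, s, f, l in placed if not (l < first or last < f)
        )
        offset = 0
        for o, s in busy:
            if offset + size <= o:
                break                      # fits in the gap before this block
            offset = max(offset, o + s)    # otherwise skip past it
        placed.append((offset, size, first, last))
        offsets[name] = offset
    total = max((o + s for o, s, _, _ in placed), default=0)
    return offsets, total


# "a" is dead by the time "c" becomes live, so "c" can reuse its space.
tensors = {
    "a": (64, 0, 1),
    "b": (64, 1, 2),
    "c": (64, 2, 3),
}
offsets, total = plan_memory(tensors)
```

Here `offsets` is `{"a": 0, "b": 64, "c": 0}` and `total` is 128 bytes, compared with 192 bytes if every tensor had its own space.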

#### 4. Kernel Generation

A 'kernel' here means an implementation of the actual calculation of an operation.

A backend has been assigned to each operation. In this phase, a kernel is generated for each operation.

As an analogy, say we have some functions written in a certain programming language. Its compiler compiles each function into a chunk of assembly. Here 'function' is like 'operation' and 'assembly' is like 'kernel'.
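
In the same spirit as the analogy, the sketch below turns a list of operations into a list of callables, one per operation, by looking up a factory for the assigned backend. The factory table, the function names, and the list-based "tensors" are hypothetical simplifications.

```python
def make_add(params):
    def add(a, b):
        return [x + y for x, y in zip(a, b)]
    return add


def make_relu(params):
    def relu(a):
        return [max(0.0, x) for x in a]
    return relu


# Hypothetical table: backend name -> operation type -> kernel factory.
KERNEL_FACTORIES = {
    "cpu": {"Add": make_add, "Relu": make_relu},
}


def generate_kernels(operations, assignment):
    """Build one kernel per operation using the backend assigned to it."""
    kernels = []
    for (op_type, params), backend in zip(operations, assignment):
        try:
            factory = KERNEL_FACTORIES[backend][op_type]
        except KeyError:
            raise NotImplementedError(f"{backend} has no kernel for {op_type}") from None
        kernels.append(factory(params))
    return kernels


operations = [("Add", {}), ("Relu", {})]
add_kernel, relu_kernel = generate_kernels(operations, ["cpu", "cpu"])
out = relu_kernel(add_kernel([1.0, -3.0], [1.0, 1.0]))  # [2.0, 0.0]
```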

#### 5. Create Executor

With the generated tensors and kernels, the compiler creates executor objects. Three types of executors are supported: Linear, Dataflow, and Parallel. The Linear executor is the default; the Dataflow and Parallel executors are experimental.

For more about executors, please refer to the [Executors](executors.md) document.

### Module `exec`

`exec` stands for 'execution'. As a result of the compilation, the `Execution` class is created. This class manages the actual execution of the model inference. Typical usage of this class is as follows.

1. Resize the inputs if needed
2. Provide input and output buffers
3. Run the inference in either synchronous or asynchronous mode
4. Read the results, which are stored in the output buffers provided earlier
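
The four steps above can be mimicked with a toy class. `ToyExecution` and all of its method names are invented for this sketch and do not reflect the real `Execution` interface; the "model" is just a Python function, and the asynchronous mode is imitated with a thread.

```python
import threading


class ToyExecution:
    """Hypothetical stand-in that follows the four usage steps."""

    def __init__(self, compute):
        self._compute = compute
        self._input_size = None
        self._input = None
        self._output = None
        self._thread = None

    def resize_input(self, size):
        self._input_size = size

    def set_input(self, buffer):
        if self._input_size is not None and len(buffer) != self._input_size:
            raise ValueError("input buffer does not match the input size")
        self._input = buffer

    def set_output(self, buffer):
        self._output = buffer

    def execute(self):
        # Synchronous run: results are written into the caller's buffer.
        self._output[:] = self._compute(self._input)

    def start_async(self):
        self._thread = threading.Thread(target=self.execute)
        self._thread.start()

    def wait_finish(self):
        self._thread.join()


execution = ToyExecution(lambda xs: [2 * x for x in xs])

execution.resize_input(3)                 # 1. resize the input if needed
inputs, outputs = [1.0, 2.0, 3.0], [0.0] * 3
execution.set_input(inputs)               # 2. provide input and output buffers
execution.set_output(outputs)

execution.execute()                       # 3a. synchronous run
sync_result = list(outputs)               # 4. read the output buffer

outputs[:] = [0.0] * 3
execution.start_async()                   # 3b. asynchronous run
execution.wait_finish()
async_result = list(outputs)
```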

### Module `backend`

Backends are plugins that are loaded dynamically (via `dlopen`), so this module is a set of interface classes for backend implementations. `compiler` can compile with a variety of backends without knowing any specific backend implementation.

Backend interface classes are mostly about memory management and kernel generation. For more, please refer to the [Backend API](backend-api.md) document.
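
The idea of compiling against an interface rather than a concrete backend can be sketched as below. Everything here is an assumption for illustration: the `Backend` base class, its two methods, the `ToyCpuBackend` implementation, and the in-process registry that stands in for dynamic loading are not the actual backend API.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface: memory management and kernel generation."""

    @abstractmethod
    def allocate(self, nbytes):
        """Return a buffer of the requested size."""

    @abstractmethod
    def generate_kernel(self, op_type):
        """Return a callable that computes the given operation."""


class ToyCpuBackend(Backend):
    def allocate(self, nbytes):
        return bytearray(nbytes)

    def generate_kernel(self, op_type):
        if op_type == "Neg":
            return lambda xs: [-x for x in xs]
        raise NotImplementedError(op_type)


# Stand-in for dynamic loading: plugins register themselves by name.
BACKENDS = {}


def register_backend(name, cls):
    BACKENDS[name] = cls


def load_backend(name):
    return BACKENDS[name]()


def compile_operation(backend, op_type):
    # The caller only relies on the Backend interface.
    return backend.generate_kernel(op_type)


register_backend("toy_cpu", ToyCpuBackend)
backend = load_backend("toy_cpu")
buffer = backend.allocate(8)
kernel = compile_operation(backend, "Neg")
result = kernel([1, -2])  # [-1, 2]
```

Because `compile_operation` touches only the abstract methods, a new backend can be added by registering another subclass without changing the compiling code.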