`luci-interpreter` is an inference engine for neural networks represented in luci IR.
See the `compiler/luci/lang` directory for details about the IR.
You can find useful infrastructure, like the importer/exporter and optimizations, in `compiler/luci`.
`luci-interpreter` provides:
- Basic inference functionality, input setters and output getters
- An interface for inspecting hidden interpreter state, like activation values during inference
- Customization mechanisms to fit the interpreter to specific platforms, like MCUs
Public interface headers are placed in the `luci-interpreter/include/luci_interpreter` directory.
## Basic usage

Minimal usage includes:
- Setting input data
- Running inference
- Fetching inference results
The interpreter object is reusable and can run multiple inferences.
Elements in tensors (input/output/internal) are stored contiguously and have a C-like (row-major) layout:
this means for a tensor `t = [[0, 1], [2, 3]]`, `t[0, 1] == 1`.
Input and output tensors have the same indexes as in the original luci model.
```c++
// Note: getTensorSize is a function that computes tensor size;
// it is not part of the interpreter and should be implemented by the user

luci_interpreter::Interpreter interpreter(module);

// Set input (assuming the model has only one input and one output)
const auto input_nodes = loco::input_nodes(module->graph());
const auto *input_node = dynamic_cast<const luci::CircleInput *>(input_nodes[0]);
std::vector<char> input_data(getTensorSize(input_node));
// Initialize input data here
interpreter.writeInputTensor(input_node, input_data.data(), input_data.size());

// Run inference
interpreter.interpret();

// Fetch inference results
const auto output_nodes = loco::output_nodes(module->graph());
const auto *output_node = dynamic_cast<const luci::CircleOutput *>(output_nodes[0]);
std::vector<char> output_data(getTensorSize(output_node));
interpreter.readOutputTensor(output_node, output_data.data(), output_data.size());
```
## Inspecting intermediate state
The interpreter provides interfaces to investigate its internal state during inference.

This is done by an "observer" mechanism:
- The `Interpreter` class has an `attachObserver` method, which takes a pointer to an `ExecutionObserver` object
- `ExecutionObserver` defines several callback methods the user can override to inject custom code
`ExecutionObserver` provides three callbacks:
- `postTensorWrite` checks the contents of an output tensor after operation execution
- `preOperatorExecute` notifies that the interpreter is going to execute an operation
- `postOperatorExecute` notifies that the interpreter has finished execution of an operation

See `luci-interpreter/include/luci_interpreter/Interpreter.h` for details of this interface.
```c++
class CustomExecutionObserver : public luci_interpreter::ExecutionObserver
{
public:
  void postTensorWrite(const luci::CircleNode *node, const Tensor *tensor) override
  {
    if (tensor->element_type() != loco::DataType::FLOAT32)
      return;
    for (int i = 0; i < tensor->shape().num_elements(); ++i)
      std::cout << tensor->data<float>()[i] << ", ";
  }

  // A user observer can override only the needed methods;
  // the others will inherit the empty implementation from the base observer.

  // void preOperatorExecute(const luci::CircleNode *node);
  // void postOperatorExecute(const luci::CircleNode *node);
};
```
```c++
luci_interpreter::Interpreter interpreter(module);
CustomExecutionObserver observer;
interpreter.attachObserver(&observer);

// initialize input_data
interpreter.writeInputTensor(input_node, input_data.data(), input_data.size());

interpreter.interpret();
```
## Customizing inference
The interpreter provides a handle for altering the default memory management mechanism.

This is done through the `MemoryManager` interface; see `luci-interpreter/include/luci_interpreter/MemoryManager.h` for implementation details.

This header contains the `IMemoryManager` abstract class, which is responsible for allocation and deallocation of tensors' memory.

Users can construct an interpreter with one of the predefined memory managers or with their own custom memory manager.
Note that one memory manager can be shared between multiple interpreter instances, because an interpreter does not own the manager object.
List of predefined memory managers:
- `SimpleMemoryManager`: a simple wrapper around new/delete; this is the default.
- `TestMemoryManager`: memorizes all allocated memory and releases it in the manager's destructor; used in kernel unit tests.
- `BuddyMemoryManager`: implements the buddy algorithm and uses an external buffer for tensor data allocations; does not need new/delete.
- `StaticMemoryManager`: uses a precomputed memory allocation plan. Requires preparation with a MemoryPlanner, but can reduce memory consumption in restricted environments (like MCUs).
**SimpleMemoryManager usage example:**

No need to select anything to use this memory manager; it is the default.

```c++
luci_interpreter::Interpreter interpreter(module);
```
**TestMemoryManager usage example:**

```c++
luci_interpreter::TestMemoryManager mm;
luci_interpreter::Interpreter interpreter(module, &mm);
```
**BuddyMemoryManager usage example:**

`BuddyMemoryManager` implements a classic allocation algorithm: https://en.wikipedia.org/wiki/Buddy_memory_allocation.

This allocator uses an external buffer as a memory pool, which makes it possible to use static memory arrays for allocations.

Note:
- The current implementation uses only the largest power-of-two prefix of the given buffer.
  For example, for a 1000-byte buffer, only the lower 512 bytes will be used.
- The current implementation can handle a memory pool of at most 4 gigabytes.
```c++
constexpr int buffer_size = 2048;
static uint8_t buffer[buffer_size];
luci_interpreter::BuddyMemoryManager memory_manager(buffer, buffer_size);
luci_interpreter::Interpreter interpreter(module.get(), &memory_manager);
```
**StaticMemoryManager usage example:**

TBD when it is merged
If you want to participate in development, please read `DEVELOPER.md` for SW architecture details.