`luci-interpreter` is an inference engine for neural networks represented in luci IR.
See the `compiler/luci/lang` directory for details about the IR.
You can find useful infrastructure, like the importer/exporter and optimizations, in `compiler/luci`.
`luci-interpreter` provides:
- Basic inference functionality, input setters and output getters
- An interface for inspecting hidden interpreter state, like activation values during inference
- Customization mechanisms to fit the interpreter to specific platforms, like MCUs
Public interface headers are placed in the `luci-interpreter/include/luci_interpreter` directory.
## Basic usage

Minimal usage includes:
- Setting input data
- Running inference
- Fetching inference results
The interpreter object is reusable and can run multiple inferences.
Elements in tensors (input/output/internal) are stored contiguously and have a C-like (row-major) layout:
this means for a tensor `t = [[0, 1], [2, 3]]`, `t[0, 1] == 1`.
Input and output tensors have the same indexes as in the original luci model.
```c++
// Note: getTensorSize is a function that computes tensor size;
// it is not part of the interpreter and should be implemented by the user

luci_interpreter::Interpreter interpreter(module);

// Set input (assuming the model has only one input and one output)
const auto input_nodes = loco::input_nodes(module->graph());
const auto *input_node = dynamic_cast<const luci::CircleInput *>(input_nodes[0]);
std::vector<char> input_data(getTensorSize(input_node));
// Initialize input data here
interpreter.writeInputTensor(input_node, input_data.data(), input_data.size());

// Run inference
interpreter.interpret();

// Fetch inference results
const auto output_nodes = loco::output_nodes(module->graph());
const auto *output_node = dynamic_cast<const luci::CircleOutput *>(output_nodes[0]);
std::vector<char> output_data(getTensorSize(output_node));
interpreter.readOutputTensor(output_node, output_data.data(), output_data.size());
```
## Inspecting intermediate state
The interpreter provides interfaces to investigate its internal state during inference.

This is done by an "observer" mechanism:
- The `Interpreter` class has an `attachObserver` method, which takes a pointer to an `ExecutionObserver` object
- `ExecutionObserver` defines several callback methods the user can override to inject custom code
`ExecutionObserver` provides three callbacks:
- `postTensorWrite` checks the contents of an output tensor after operation execution
- `preOperatorExecute` notifies that the interpreter is going to execute an operation
- `postOperatorExecute` notifies that the interpreter has finished execution of an operation

See `luci-interpreter/include/luci_interpreter/Interpreter.h` for details of this interface.
```c++
class CustomExecutionObserver : public luci_interpreter::ExecutionObserver
{
public:
  void postTensorWrite(const luci::CircleNode *node, const Tensor *tensor) override
  {
    if (tensor->element_type() != loco::DataType::FLOAT32)
      return;
    for (int i = 0; i < tensor->shape().num_elements(); ++i)
      std::cout << tensor->data<float>()[i] << ", ";
  }

  // A user observer can override only the needed methods;
  // the others will inherit the empty implementation from the base observer.

  // void preOperatorExecute(const luci::CircleNode *node);
  // void postOperatorExecute(const luci::CircleNode *node);
};
```
```c++
luci_interpreter::Interpreter interpreter(module);
CustomExecutionObserver observer;
interpreter.attachObserver(&observer);

// initialize input_data
interpreter.writeInputTensor(input_node, input_data.data(), input_data.size());

interpreter.interpret();
```
## Customizing inference
The interpreter provides a handle for altering the default memory management mechanism.

This is done through the `MemoryManager` interface; see `luci-interpreter/include/luci_interpreter/MemoryManager.h` for implementation details.

This header contains the `IMemoryManager` abstract class, which is responsible for allocation and deallocation of tensors' memory.

Users can construct an interpreter with one of the predefined memory managers or with their own custom memory manager.
Note that one memory manager can be shared between multiple interpreter instances, because an interpreter does not own the manager object.
List of predefined memory managers:
- `SimpleMemoryManager`: a simple wrapper around new/delete; this is the default.
- `TestMemoryManager`: memorizes all allocated memory and releases it in the manager's destructor; used in kernel unit tests.
- `BuddyMemoryManager`: implements the buddy algorithm and uses an external buffer for tensor data allocations; does not need new/delete.
- `StaticMemoryManager`: uses a precomputed memory allocation plan. Requires preparation with a MemoryPlanner, but can reduce memory consumption in restricted environments (like MCUs).
**SimpleMemoryManager usage example:**

No need to select anything to use this memory manager; it is the default.

```c++
luci_interpreter::Interpreter interpreter(module);
```
**TestMemoryManager usage example:**

```c++
luci_interpreter::TestMemoryManager mm;
luci_interpreter::Interpreter interpreter(module, &mm);
```
**BuddyMemoryManager usage example:**

`BuddyMemoryManager` implements a classic allocation algorithm: https://en.wikipedia.org/wiki/Buddy_memory_allocation.

This allocator uses an external buffer as a memory pool, which makes it possible to use static memory arrays for allocations.

Note:
- The current implementation uses only the largest power-of-two prefix of the given buffer.
  For example, for a 1000-byte buffer, only the lower 512 bytes will be used.
- The current implementation can handle a memory pool of at most 4 gigabytes.
```c++
constexpr int buffer_size = 2048;
static uint8_t buffer[buffer_size];
luci_interpreter::BuddyMemoryManager memory_manager(buffer, buffer_size);
luci_interpreter::Interpreter interpreter(module.get(), &memory_manager);
```
**StaticMemoryManager usage example:**

TBD when it is merged
If you want to participate in development, please read `DEVELOPER.md` for SW architecture details.