# How To Introduce a New Operation Into Runtime

**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document
provides lightweight guidance on how to introduce a new operation into these modules so that
onert supports the operation.
## Table of Contents

- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Table of Contents](#table-of-contents)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
  - [Samples (to be updated)](#samples-to-be-updated)
## Core

This module has a graph-based IR (intermediate representation). You have to add an IR node for the
new operation.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst)

```cpp
OP(Select)
```

2. Create a class for the node of the new operation in [here](/runtime/onert/core/include/ir/operation/)
```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```
You can also define the class in a separate source file, as below.
```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

void Select::accept(OperationVisitor &v) const { v.visit(*this); }

Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs}
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```
- [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```
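The `accept`/`visit` pair in the node class is the classic visitor double dispatch that the rest of this guide relies on: OperationValidator, Dumper and the KernelGenerators are all visitors over the IR. A minimal, self-contained sketch of that pattern, using hypothetical stand-in types rather than onert's real classes:

```cpp
#include <cassert>
#include <string>

struct Select; // forward declarations so the visitor can name the node types
struct Add;

// One visit() overload per concrete node type; defaults do nothing so a
// visitor only has to override the operations it cares about.
struct OperationVisitor
{
  virtual ~OperationVisitor() = default;
  virtual void visit(const Select &) {}
  virtual void visit(const Add &) {}
};

struct Operation
{
  virtual ~Operation() = default;
  virtual void accept(OperationVisitor &v) const = 0;
};

// Each node's accept() calls v.visit(*this), so the static type of *this
// selects the matching visit() overload: double dispatch.
struct Select : Operation
{
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

struct Add : Operation
{
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

// A toy visitor that records which node type it saw.
struct Dumper : OperationVisitor
{
  std::string last;
  void visit(const Select &) override { last = "Select"; }
  void visit(const Add &) override { last = "Add"; }
};
```

This is why introducing an operation touches so many files: every visitor that should handle the new node needs a `visit()` overload for it.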
3. Add the operation to the OperationValidator to check whether the node is valid.
- [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```
- [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```
4. Add the operation to the Dumper to dump the IR information of the new operation.
- [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value() << ") Input2("
               << node.getInputs().at(Select::Input::INPUT2).value() << ")" << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```
5. Add code for shape inference
- ONE runtime tries to calculate shapes and allocate memory at compilation time. For output shapes
that cannot be calculated at compilation time, ONE runtime calculates shapes and allocates memory
at execution time.
- Calculation of shapes at compilation time is called _static shape inference_ and calculation of
shapes at execution time is called _dynamic shape inference_.
- [`StaticShapeInferer.h`](/runtime/onert/compiler/StaticShapeInferer.h)

```cpp
void visit(const ir::operation::Select &op) override;
```

- [`StaticShapeInferer.cc`](/runtime/onert/core/src/compiler/StaticShapeInferer.cc)
```cpp
void StaticShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx{op.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto &input_cond = _operands.at(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  ir::Operand &output = ...

  // Select output shape
  ir::Shape new_shape = shape_inference::inferSelectShape(
      input_cond.info().shape(), input_true.info().shape(), input_false.info().shape());
  output.info().shape(new_shape);
}
```
- [`DynamicShapeInference.h`](/runtime/onert/core/include/exec/DynamicShapeInference.h)

```cpp
void visit(const ir::operation::Select &op) override;
```

- [`DynamicShapeInference.cc`](/runtime/onert/core/src/exec/DynamicShapeInference.cc)

```cpp
void DynamicShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx = op.getInputs().at(ir::operation::Select::Input::CONDITION);
  const auto &input_cond = _tensor_registry->getITensor(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  auto output = ...

  if ((!input_cond->is_dynamic()) && (!input_true->is_dynamic()) && (!input_false->is_dynamic()))
  {
    // No dynamic input: the output shape was already decided by static shape inference
    return;
  }

  auto input_cond_shape = input_cond->getShape();
  auto input_true_shape = input_true->getShape();
  auto input_false_shape = input_false->getShape();

  // Select output shape
  ir::Shape new_shape =
      shape_inference::inferSelectShape(input_cond_shape, input_true_shape, input_false_shape);

  output->applyShape(new_shape);
}
```
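Both inference paths call `shape_inference::inferSelectShape`, which broadcasts the three input shapes to produce the output shape. A self-contained sketch of that broadcast rule, using a simplified `Shape` alias and hypothetical helper names rather than onert's real implementation:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int>; // simplified stand-in for ir::Shape

// NumPy-style broadcast of two shapes: align from the trailing dimension,
// treat missing dimensions as 1, and require equal or 1-sized dims.
Shape broadcast(const Shape &a, const Shape &b)
{
  Shape out(std::max(a.size(), b.size()));
  for (std::size_t i = 0; i < out.size(); ++i)
  {
    int da = i < out.size() - a.size() ? 1 : a[i - (out.size() - a.size())];
    int db = i < out.size() - b.size() ? 1 : b[i - (out.size() - b.size())];
    if (da != db && da != 1 && db != 1)
      throw std::runtime_error{"incompatible shapes"};
    out[i] = std::max(da, db);
  }
  return out;
}

// Select's output shape is the broadcast of condition, true and false inputs.
Shape inferSelectShape(const Shape &cond, const Shape &t, const Shape &f)
{
  return broadcast(broadcast(cond, t), f);
}
```

For example, condition `{2, 1}`, true input `{2, 3}` and false input `{3}` broadcast to an output shape of `{2, 3}`.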
## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. First,
Loader loads a model file and generates IR from it. Second, NNAPI generates IR from a model set via
the [Neural Networks API of Android](https://developer.android.com/ndk/guides/neuralnetworks).
### Loaders

#### Base Loader

This is where the common parts of the loaders are implemented.
1. Add to base_loader code that loads the new operation and generates IR from it
- [base_loader](/runtime/onert/frontend/base_loader/include/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```
#### TFLite Loader

This loads a tflite file.
If you want the new operation to be loaded only by the TFLite Loader, you only need to implement
loading the operation here.
#### Circle Loader

This loads a circle file generated by the compiler.
If you want the new operation to be loaded only by the Circle Loader, you only need to implement
loading the operation here.
### NNAPI

1. Add to the OperationFactory a generator for the IR of the new operation
- [OperationFactory](/runtime/onert/frontend/nnapi/wrapper/OperationFactory.cc)

```cpp
_map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
  assert(init_param.input_count == 3 && init_param.output_count == 1);

  OperandIndexSequence outputs{init_param.outputs[0]};

  // Each input should be interpreted as follows:
  //
  //  0 -> Cond Tensor Index
  //  1 -> Input1 Tensor Index
  //  2 -> Input2 Tensor Index
  OperandIndexSequence inputs;
  for (uint32_t n = 0; n < init_param.input_count; ++n)
  {
    inputs.append(OperandIndex{init_param.inputs[n]});
  }

  return new operation::Select{inputs, outputs};
};
```
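The `_map` used by OperationFactory follows a simple factory pattern: an NNAPI operation code is keyed to a lambda that validates the parameters and builds the IR node. A stripped-down, self-contained sketch of that pattern — every type here and the op code value are hypothetical stand-ins, not onert's or NNAPI's real definitions:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the IR node hierarchy.
struct Operation
{
  virtual ~Operation() = default;
  virtual std::string name() const = 0;
};

struct Select : Operation
{
  std::string name() const override { return "Select"; }
};

// Hypothetical stand-in for OperationFactory::Param.
struct Param
{
  int input_count = 0;
  int output_count = 0;
};

using Generator = std::function<std::unique_ptr<Operation>(const Param &)>;

constexpr int kSelectOpCode = 84; // placeholder value, NOT the real NNAPI constant

std::map<int, Generator> makeFactory()
{
  std::map<int, Generator> map;
  // Key: operation type code; value: a generator that checks the parameter
  // counts and creates the corresponding IR node.
  map[kSelectOpCode] = [](const Param &p) -> std::unique_ptr<Operation> {
    assert(p.input_count == 3 && p.output_count == 1);
    return std::make_unique<Select>();
  };
  return map;
}
```

Looking up an unknown code simply fails, which is how unsupported NNAPI operations surface as errors in this design.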
2. If you want NNAPI to support the new operation for TFLite models, you need to update the parts
related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp)
as below

```cpp
case tflite::BuiltinOperator_SELECT:
  nnapi_version = 12; // require NNAPI 1.2
  nn_op_type = ANEURALNETWORKS_SELECT;
  break;
```
## Backend

This module generates the backend's kernels and tensors, for example those of
[ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/), from the generated graph-based
IR. Most of this is handled internally by the runtime, but that is not enough because some parts
depend on the backend. So there are several components that require an additional implementation
for each backend.
### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can differ. This
component therefore modifies and fixes the shapes of the backend's tensors.
#### acl_cl

The ACL kernel for the Add operation requires both operands to have the same rank to support
broadcasting.
- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
        std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```
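What `extendRank` achieves can be illustrated in a few lines: the lower-rank shape is padded with leading 1s until both operands have the same rank, which is a shape-preserving change for broadcasting. A sketch with a simplified `Shape`; onert's real `ir::Shape` is a class, not a vector:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<int>; // simplified stand-in for ir::Shape

// Pad the shape with leading 1s until it reaches the requested rank,
// e.g. {4} extended to rank 3 becomes {1, 1, 4}.
void extendRank(Shape &shape, std::size_t rank)
{
  while (shape.size() < rank)
    shape.insert(shape.begin(), 1);
}
```

After extending both operands to `broadcast_rank`, the ACL kernel sees two tensors of equal rank and can apply its own broadcasting along the size-1 dimensions.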
#### acl_neon

Same implementation as acl_cl is required.
#### cpu

This backend doesn't usually require a change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */ }
```
### KernelGenerator

This component generates the backend's kernels. You have to generate the kernel for the new
operation and then append it to the execution builder. You can obtain information about the node
from the IR, and the necessary tensors from the tensor builder.
#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```
#### acl_neon

A similar implementation to acl_cl is required.
#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(0)};
  const auto condition_index{node.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto true_index{node.getInputs().at(ir::operation::Select::Input::INPUT_TRUE)};
  const auto false_index{node.getInputs().at(ir::operation::Select::Input::INPUT_FALSE)};

  auto output_tensor = _tensor_reg->getPortableTensor(output_index);
  auto condition_tensor = _tensor_reg->getPortableTensor(condition_index);
  auto true_tensor = _tensor_reg->getPortableTensor(true_index);
  auto false_tensor = _tensor_reg->getPortableTensor(false_index);

  auto fn = std::make_unique<ops::SelectLayer>();

  fn->configure(condition_tensor, true_tensor, false_tensor, output_tensor);

  _return_fn = std::move(fn);
}
```
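The `configure()`-then-execute split used by CPU kernels can be sketched as follows. This hypothetical `SelectLayer` computes the element-wise select for same-shape inputs over plain vectors; onert's real `ops::SelectLayer` operates on portable tensors and handles more cases:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch of a CPU kernel layer: configure() stores the operand
// pointers, run() performs output[i] = cond[i] ? input_true[i] : input_false[i].
struct SelectLayer
{
  const std::vector<uint8_t> *cond = nullptr; // BOOL8-like condition
  const std::vector<float> *input_true = nullptr;
  const std::vector<float> *input_false = nullptr;
  std::vector<float> *output = nullptr;

  void configure(const std::vector<uint8_t> *c, const std::vector<float> *t,
                 const std::vector<float> *f, std::vector<float> *o)
  {
    cond = c;
    input_true = t;
    input_false = f;
    output = o;
  }

  void run()
  {
    for (std::size_t i = 0; i < cond->size(); ++i)
      (*output)[i] = (*cond)[i] ? (*input_true)[i] : (*input_false)[i];
  }
};
```

The KernelGenerator only calls `configure()`; `run()` is invoked later by the executor, which is why the configured function object is moved into `_return_fn` rather than executed on the spot.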
### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors and initializes the constant
tensor layer. Most tensors are registered automatically internally, but there are some exceptions.
#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```
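The point of `registerCopyInitializer` is the two-phase pattern: `visit()` only *registers* an initializer per constant operand, and the copies run later, once the backend tensors have been allocated. A self-contained sketch of that pattern with hypothetical simplified types (operand indices as `int`, tensors as `std::vector<float>`):

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// An initializer fills one destination tensor from its source constant data.
using Init = std::function<void(std::vector<float> &)>;

struct ConstantInitializer
{
  std::map<int, Init> _init_map; // operand index -> deferred initializer

  // Phase 1 (during visit): remember how to initialize this operand.
  void registerCopyInitializer(int index, const std::vector<float> &src)
  {
    _init_map[index] = [&src](std::vector<float> &dst) { dst = src; };
  }

  // Phase 2 (after tensor allocation): run every registered initializer.
  void run(std::map<int, std::vector<float>> &tensors)
  {
    for (auto &entry : _init_map)
      entry.second(tensors[entry.first]);
  }
};
```

Deferring the copy matters because at `visit()` time the backend tensors may not exist yet; only the IR operands and their constant data do.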
## Samples (to be updated)

- `Select` operation
  - Simple explanation : `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR : https://github.com/Samsung/ONE/pull/XXX