# How To Introduce a New Operation Into Runtime
**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document
provides some lightweight guidance about how to introduce a new operation into these modules to make
onert support the operation.
- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [TensorRegister (in some cases)](#tensorregister-in-some-cases)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
      - [cpu](#cpu-2)
  - [Samples (to be updated)](#samples-to-be-updated)
## Core

This module has a graph-based IR (intermediate representation). You have to add an IR for the new
operation.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst), as sketched below.
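The list registers every operation through a macro, so the new entry would plausibly be a single line (assuming the file's existing `OP(...)` convention):

```
OP(Select)
```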
2. Create a class for the node of the new operation in [here](/runtime/onert/core/include/ir/operation/)

```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  // Input/output indices used throughout this document
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```
You can also define the class in another source file like below.

```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

void Select::accept(OperationVisitor &v) const { v.visit(*this); }

Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs} // Select takes exactly 3 inputs
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```
- And include the header of the new operation in [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```
3. Add the operation to the OperationValidator to check whether the node is valid.
   - [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```
   - [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  // UNUSED_RELEASE silences unused-variable warnings in release builds,
  // where the asserts below compile away.
  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```
4. Add the operation to the Dumper to dump IR information of the new operation.
   - [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value() << ") Input2("
               << node.getInputs().at(Select::Input::INPUT2).value() << ")" << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```
5. Add code for shape inference
   - ONE runtime tries to calculate shapes and allocate memory at compilation time. For output shapes that cannot be calculated at compilation time, ONE runtime calculates shapes and allocates memory at execution time.
   - Calculation of shapes at compilation time is called _static shape inference_ and calculation of shapes at execution time is called _dynamic shape inference_. (A sketch of the shared `inferSelectShape` helper follows this step.)
   - [`StaticShapeInferer.h`](/runtime/onert/core/src/compiler/StaticShapeInferer.h)

```cpp
void visit(const ir::operation::Select &op) override;
```

   - [`StaticShapeInferer.cc`](/runtime/onert/core/src/compiler/StaticShapeInferer.cc)

```cpp
void StaticShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx{op.getInputs().at(ir::operation::Select::Input::COND)};
  const auto &input_cond = _operands.at(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  ir::Operand &output = ...

  // Select output shape
  ir::Shape new_shape = shape_inference::inferSelectShape(
      input_cond.info().shape(), input_true.info().shape(), input_false.info().shape());
  output.info().shape(new_shape);
}
```
   - [`DynamicShapeInference.h`](/runtime/onert/core/include/exec/DynamicShapeInference.h)

```cpp
void visit(const ir::operation::Select &op) override;
```

   - [`DynamicShapeInference.cc`](/runtime/onert/core/src/exec/DynamicShapeInference.cc)

```cpp
void DynamicShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx = op.getInputs().at(ir::operation::Select::Input::COND);
  const auto &input_cond = _tensor_registry->getITensor(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  auto output = ...

  // If no input is dynamic, static shape inference has already set the
  // output shape, so there is nothing to do here.
  if ((!input_cond->is_dynamic()) && (!input_true->is_dynamic()) && (!input_false->is_dynamic()))
    return;

  auto input_cond_shape = input_cond->getShape();
  auto input_true_shape = input_true->getShape();
  auto input_false_shape = input_false->getShape();

  // Select output shape
  ir::Shape new_shape =
      shape_inference::inferSelectShape(input_cond_shape, input_true_shape, input_false_shape);

  output->applyShape(new_shape);
}
```
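Both visitors delegate the actual shape computation to a shared helper. The following is only a minimal sketch of what a broadcast-aware `inferSelectShape` could look like, assuming `ir::Shape` exposes `rank()` and an indexable `dim()`; the real helper in `shape_inference` may differ:

```cpp
// Sketch only: broadcast the three input shapes pairwise (NumPy-style,
// aligning dimensions from the right) to produce the output shape.
ir::Shape inferSelectShape(const ir::Shape &cond_shape, const ir::Shape &true_shape,
                           const ir::Shape &false_shape)
{
  auto broadcast = [](const ir::Shape &lhs, const ir::Shape &rhs) {
    const int out_rank = std::max(lhs.rank(), rhs.rank());
    ir::Shape out(out_rank);
    for (int i = 0; i < out_rank; ++i)
    {
      // A dimension missing on the left acts as 1 for broadcasting
      const int l = (i < out_rank - lhs.rank()) ? 1 : lhs.dim(i - (out_rank - lhs.rank()));
      const int r = (i < out_rank - rhs.rank()) ? 1 : rhs.dim(i - (out_rank - rhs.rank()));
      assert(l == r || l == 1 || r == 1); // shapes must be broadcast-compatible
      out.dim(i) = std::max(l, r);
    }
    return out;
  };
  return broadcast(broadcast(cond_shape, true_shape), false_shape);
}
```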
## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. First, Loader loads a model file and generates IR from it. Second, NNAPI generates IR from a model set via the [Neural Networks API of Android](https://developer.android.com/ndk/guides/neuralnetworks).

### Loaders
#### Base Loader

This is where the common parts of loaders are implemented.
1. Add code to base_loader to load the new operation and generate IR from it
   - [base_loader](/runtime/onert/frontend/base_loader/include/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```
#### TFLite Loader

This loads a tflite file.
If you want the new operation to be loaded only by the TFLite Loader, you only need to implement loading the operation here.
#### Circle Loader

This loads a circle file generated by the compiler.
If you want the new operation to be loaded only by the Circle Loader, you only need to implement loading the operation here.
### NNAPI

1. Add to the OperationFactory to generate IR of the new operation
   - [OperationFactory](/runtime/onert/frontend/nnapi/wrapper/OperationFactory.cc)

```cpp
_map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
  assert(init_param.input_count == 3 && init_param.output_count == 1);

  OperandIndexSequence outputs{init_param.outputs[0]};

  // Each input should be interpreted as follows:
  //
  //  0 -> Cond Tensor Index
  //  1 -> Input1 Tensor Index
  //  2 -> Input2 Tensor Index
  OperandIndexSequence inputs;
  for (uint32_t n = 0; n < init_param.input_count; ++n)
  {
    inputs.append(OperandIndex{init_param.inputs[n]});
  }

  return new operation::Select{inputs, outputs};
};
```
2. If you want NNAPI to support the new operation for TFLite models, you need to update the parts related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp) like below

```cpp
      case tflite::BuiltinOperator_SELECT:
        nnapi_version = 12;  // require NNAPI 1.2
        nn_op_type = ANEURALNETWORKS_SELECT;
        break;
```
## Backend

This module generates backend kernels and tensors, such as those of [ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/), from the generated graph-based IR. The runtime already handles a fair amount of this internally, but that is not enough because some parts depend on the backend. So there are several components that require additional implementation for each backend.
### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can differ. Therefore, this component modifies and fixes the shapes of the backend's tensors.
#### acl_cl

The ACL kernel for the Add operation requires its two inputs to have the same rank in order to support broadcast.
- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
        std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    // Extend the lower-rank shape up to the broadcast rank
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```
#### acl_neon

The same implementation as acl_cl is required.
#### cpu

This backend doesn't usually require a change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */ }
```
### KernelGenerator

This component generates backend kernels. You have to generate the kernel of the new operation and then append it to the execution builder. You can obtain information about the node from the IR and the necessary tensors from the tensor builder.
#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```
#### acl_neon

A similar implementation to acl_cl is required.
#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  const auto output_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(output_index), _current_op_seq_layout);
  const auto cond_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(cond_index), _current_op_seq_layout);
  const auto input1_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input1_index), _current_op_seq_layout);
  const auto input2_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input2_index), _current_op_seq_layout);

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::onert::backend::cpu::kernel::SelectLayer>();

  fn->configure(cond_alloc->buffer(), cond_backend_descr, input1_alloc->buffer(),
                input1_backend_descr, input2_alloc->buffer(), input2_backend_descr,
                output_alloc->buffer(), output_backend_descr);

  _execution_builder->append(std::move(fn));
}
```
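The `SelectLayer` referenced above is the kernel you implement in the cpu backend's kernel directory. As a rough, self-contained illustration only (the real layer works on the backend's tensor descriptors and supports several data types), an element-wise Select over float buffers boils down to a loop like this:

```cpp
#include <cstddef>
#include <cstdint>

// Illustration only: element-wise Select over flat buffers of identical
// shape (no broadcast). A BOOL8 condition element picks input1 when nonzero,
// input2 otherwise.
void selectFloat32(const uint8_t *cond_data, const float *input1_data, const float *input2_data,
                   float *output_data, size_t num_elements)
{
  for (size_t i = 0; i < num_elements; ++i)
  {
    output_data[i] = cond_data[i] ? input1_data[i] : input2_data[i];
  }
}
```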
### TensorRegister (in some cases)

This component registers tensors. Most tensors will be registered automatically internally. There
are some exceptions, however, where additional implementation is required: when a tensor is treated
unusually in its backend.

The kernel of some operation expects weights in `HWIO` layout (data format) when the input's layout
is `NHWC`, and in `OIHW` when it is `NCHW`. But a TFLite model stores weights in `OHWI` for `NHWC`
and in `OIHW` for `NCHW`. Therefore, to register the appropriate tensor on the backend, you have to
implement it additionally, e.g. by permuting the tensor info as sketched below.
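A minimal sketch of the layout conversion involved (the helper name is hypothetical; the actual onert registration API is not shown): computing the `HWIO` shape for a weight tensor whose model-side shape is `OHWI`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: given a weight shape in OHWI order (as stored in a
// TFLite model for NHWC inputs), return the corresponding HWIO shape. Only
// the dimension values are reordered here; the constant data itself would
// need a matching permutation when the tensor is registered.
std::array<int32_t, 4> ohwiToHwio(const std::array<int32_t, 4> &ohwi)
{
  //      H        W        I        O
  return {ohwi[1], ohwi[2], ohwi[3], ohwi[0]};
}
```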
### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors and initializes the constant
tensor layer. This is similar to TensorRegister: most tensors will be registered automatically
internally, and there are some exceptions.

#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```
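For a backend whose kernels expect constants in a different layout (the TensorRegister case above), a plain copy initializer is not enough; conceptually you register a permuting initializer instead. A hypothetical variant, assuming the backend offers a `registerPermuteInitializer`-style function (the exact name may differ per backend):

```cpp
// Hypothetical: register an initializer that permutes the constant data into
// the layout the backend kernel expects, instead of copying it verbatim.
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerPermuteInitializer(kernel_index, kernel_obj); // name assumed
}
```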
## Samples (to be updated)

- `Select` operation
  - Simple explanation : `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR : https://github.com/Samsung/ONE/pull/XXX