docs/howto/how-to-introduce-a-new-operation-into-runtime.md

# How To Introduce a New Operation Into Runtime

**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document
provides lightweight guidance on how to introduce a new operation into each of these modules so that
onert supports the operation.

## Index

- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Index](#index)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [TensorRegister (in some cases)](#tensorregister-in-some-cases)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
      - [cpu](#cpu-2)
  - [Samples (to be updated)](#samples-to-be-updated)

## Core

This module holds the graph-based IR (intermediate representation). To support a new operation, you
first have to add an IR node for it.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst)

```cpp
OP(Select)
```
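A list file made of `OP(...)` entries is usually consumed with the X-macro technique: the including file defines `OP`, and the preprocessor expands every entry. The sketch below is only an illustration of that idea. To stay self-contained it replaces the include of the list file with a hypothetical `OP_LIST` macro, and `OpCode`, `toString` and `operationCount` are illustrative names, not necessarily how onert expands the list.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the contents of a list file such as Operations.lst.
#define OP_LIST \
  OP(Add)       \
  OP(Conv2D)    \
  OP(Select)

// First expansion: one enumerator per listed operation.
enum class OpCode
{
#define OP(Name) Name,
  OP_LIST
#undef OP
  COUNT // number of listed operations
};

// Second expansion: map each enumerator back to its name.
inline std::string toString(OpCode code)
{
  switch (code)
  {
#define OP(Name)     \
  case OpCode::Name: \
    return #Name;
    OP_LIST
#undef OP
    default:
      return "Unknown";
  }
}

// Adding one OP(...) line to the list grows the enum automatically.
inline int operationCount()
{
  const int count = static_cast<int>(OpCode::COUNT);
  assert(count > 0);
  return count;
}
```

With this layout, a single new line in the list is enough for every expansion site to pick up the operation, which is presumably why the list is the first file to touch.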

2. Create a class for the node of the new operation in [this directory](/runtime/onert/core/include/ir/operation/)

```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```

The member functions can be defined in a separate source file, as shown below.

```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

void Select::accept(OperationVisitor &v) const { v.visit(*this); }

// Select takes exactly three inputs: the condition and the two candidates.
Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs}
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```

  - Include the new header in [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```
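The `accept`/`visit` pair above is the visitor pattern with double dispatch: the node selects the `visit` overload for its own concrete type. The following self-contained sketch shows the mechanism with simplified stand-in types. `NameCollector` and `nameOf` are hypothetical helpers used only for illustration, and the real onert classes carry more state than shown here.

```cpp
#include <cassert>
#include <string>

struct Select; // forward declaration of the new node type

// Simplified visitor: one visit() overload per operation, empty by default.
struct OperationVisitor
{
  virtual ~OperationVisitor() = default;
  virtual void visit(const Select &) {}
};

struct Operation
{
  virtual ~Operation() = default;
  virtual void accept(OperationVisitor &v) const = 0;
};

struct Select : Operation
{
  // Double dispatch: the node picks the overload for its own type.
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

// A visitor that records which node type it saw.
struct NameCollector : OperationVisitor
{
  std::string last;
  void visit(const Select &) override { last = "Select"; }
};

inline std::string nameOf(const Operation &op)
{
  NameCollector collector;
  op.accept(collector);
  assert(!collector.last.empty()); // every node type must be handled
  return collector.last;
}
```

Components such as a validator, a dumper or a kernel generator can then be written as visitors that override `visit` only for the operations they care about.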

3. Add a `visit` function to the OperationValidator to check whether the node is valid.
  - [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```

  - [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  // The condition must be boolean; both inputs must have the output's type.
  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  // All operands must share the same shape.
  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```
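The validator above relies on `assert`, which is compiled out when `NDEBUG` is defined. Presumably that is why every local is passed to `UNUSED_RELEASE`: without it, release builds would warn about unused variables. The sketch below restates the same rules as a plain predicate over hypothetical, simplified types (`DataType`, `OperandInfo`, `isValidSelect` and `validateSelect` are illustrative names, not onert's API), so the checks can be exercised in isolation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified operand description.
enum class DataType
{
  FLOAT32,
  INT32,
  QUANT8_ASYMM,
  BOOL8
};

struct OperandInfo
{
  DataType type;
  std::vector<int> shape;
};

// Returns true when the operands satisfy the rules the asserts above check.
inline bool isValidSelect(const OperandInfo &cond, const OperandInfo &input1,
                          const OperandInfo &input2, const OperandInfo &output)
{
  const bool cond_ok = cond.type == DataType::BOOL8;
  const bool out_ok = output.type == DataType::FLOAT32 || output.type == DataType::INT32 ||
                      output.type == DataType::QUANT8_ASYMM;
  const bool types_match = output.type == input1.type && output.type == input2.type;
  const bool shapes_match = output.shape == input1.shape && cond.shape == input1.shape &&
                            input2.shape == input1.shape;
  return cond_ok && out_ok && types_match && shapes_match;
}

// Debug-only check in the style of the validator.
inline void validateSelect(const OperandInfo &cond, const OperandInfo &input1,
                           const OperandInfo &input2, const OperandInfo &output)
{
  const bool ok = isValidSelect(cond, input1, input2, output);
  assert(ok);
  (void)ok; // keeps release builds (NDEBUG) free of unused-variable warnings
}
```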

4. Add a `visit` function to the Dumper to dump the IR information of the new operation.
- [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value() << ") Input2("
               << node.getInputs().at(Select::Input::INPUT2).value() << ")" << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```

5. Add code for shape inference
- ONE runtime tries to calculate shapes and allocate memory at compile time. When an output shape cannot be calculated at compile time, ONE runtime calculates the shape and allocates memory at execution time.
- Calculating shapes at compile time is called _static shape inference_, and calculating shapes at execution time is called _dynamic shape inference_.
- [`StaticShapeInferer.h`](/runtime/onert/compiler/StaticShapeInferer.h)

```cpp
  void visit(const ir::operation::Select &op) override;
```

- [`StaticShapeInferer.cc`](/runtime/onert/core/src/compiler/StaticShapeInferer.cc)

```cpp
void StaticShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx{op.getInputs().at(ir::operation::Select::Input::COND)};
  const auto &input_cond = _operands.at(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  ir::Operand &output = ...

  // Select output shape
  ir::Shape new_shape = shape_inference::inferSelectShape(
      input_cond.info().shape(), input_true.info().shape(), input_false.info().shape());
  output.info().shape(new_shape);
}
```

- [`DynamicShapeInference.h`](/runtime/onert/core/include/exec/DynamicShapeInference.h)

```cpp
  void visit(const ir::operation::Select &op) override;
```

- [`DynamicShapeInference.cc`](/runtime/onert/core/src/exec/DynamicShapeInference.cc)

```cpp
void DynamicShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx = op.getInputs().at(ir::operation::Select::Input::COND);
  const auto &input_cond = _tensor_registry->getITensor(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  auto output = ...

  // Nothing to do when every input shape was already fixed at compile time.
  if ((!input_cond->is_dynamic()) && (!input_true->is_dynamic()) && (!input_false->is_dynamic()))
  {
    return;
  }

  auto input_cond_shape = input_cond->getShape();
  auto input_true_shape = input_true->getShape();
  auto input_false_shape = input_false->getShape();

  // Select output shape
  ir::Shape new_shape =
      shape_inference::inferSelectShape(input_cond_shape, input_true_shape, input_false_shape);

  output->applyShape(new_shape);
}
```
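Both inferers delegate to one shared shape function, so the rule is written once and reused at compile time and at execution time. The sketch below shows what such a function could look like if the output shape were the NumPy-style broadcast of the three input shapes. That rule is an assumption made for illustration, and `Shape`, `broadcast` and `inferSelectShapeSketch` are hypothetical names; the actual `shape_inference::inferSelectShape` may apply different rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int>;

// Broadcast two shapes, aligning dimensions from the right.
inline Shape broadcast(const Shape &a, const Shape &b)
{
  const std::size_t rank = std::max(a.size(), b.size());
  Shape out(rank, 1);
  for (std::size_t i = 0; i < rank; ++i)
  {
    // Missing leading dimensions are treated as 1.
    const int da = i < a.size() ? a[a.size() - 1 - i] : 1;
    const int db = i < b.size() ? b[b.size() - 1 - i] : 1;
    if (da != db && da != 1 && db != 1)
      throw std::invalid_argument("shapes are not broadcastable");
    out[rank - 1 - i] = std::max(da, db);
  }
  return out;
}

// Hypothetical stand-in for a shared Select shape function.
inline Shape inferSelectShapeSketch(const Shape &cond, const Shape &input_true,
                                    const Shape &input_false)
{
  Shape out = broadcast(cond, broadcast(input_true, input_false));
  assert(out.size() >= cond.size()); // broadcasting never lowers the rank
  return out;
}
```

For example, under this assumed rule `inferSelectShapeSketch({1}, {4, 1, 3}, {2, 3})` yields `{4, 2, 3}`.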

## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. A Loader reads a model file and generates IR from it. NNAPI generates IR from a model that is set via the [Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks).

### Loaders

#### Base Loader

This is where the common parts of the loaders are implemented.

1. Add code to base_loader that loads the new operation and generates IR from it
- [base_loader](/runtime/onert/frontend/base_loader/include/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```

#### TFLite Loader

This loads a tflite file.
If the new operation should be loaded only by the TFLite Loader, implement the loading code only here.

#### Circle Loader

This loads a circle file generated by the compiler.
If the new operation should be loaded only by the Circle Loader, implement the loading code only here.

### NNAPI

1. Add a generator to the OperationFactory to generate the IR of the new operation
- [OperationFactory](/runtime/onert/frontend/nnapi/wrapper/OperationFactory.cc)

```cpp
  _map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
    assert(init_param.input_count == 3 && init_param.output_count == 1);

    OperandIndexSequence outputs{init_param.outputs[0]};

    // Each input should be interpreted as follows:
    //
    //  0 -> Cond Tensor Index
    //  1 -> Input1 Tensor Index
    //  2 -> Input2 Tensor Index
    OperandIndexSequence inputs;
    for (uint32_t n = 0; n < init_param.input_count; ++n)
    {
      inputs.append(OperandIndex{init_param.inputs[n]});
    }

    return new operation::Select{inputs, outputs};
  };
```
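The `_map` above is a registry that maps an operation code to a generator function. A minimal self-contained sketch of that registry pattern follows. `Node`, `Param`, `kOpSelect` and the `create` member are hypothetical simplifications for illustration; the real factory uses NNAPI constants and onert's IR types, and its interface may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the IR node and the NNAPI parameters.
struct Node
{
  std::string name;
  std::vector<uint32_t> inputs;
  std::vector<uint32_t> outputs;
};

struct Param
{
  std::vector<uint32_t> inputs;
  std::vector<uint32_t> outputs;
};

// Placeholder operation code; the real value comes from the NNAPI headers.
constexpr int kOpSelect = 1;

class OperationFactory
{
public:
  using Generator = std::function<std::unique_ptr<Node>(const Param &)>;

  OperationFactory()
  {
    // Register one generator per supported operation code.
    _map[kOpSelect] = [](const Param &param) {
      assert(param.inputs.size() == 3 && param.outputs.size() == 1);
      auto node = std::make_unique<Node>();
      node->name = "Select";
      node->inputs = param.inputs;   // 0: cond, 1: input1, 2: input2
      node->outputs = param.outputs; // 0: output
      return node;
    };
  }

  // Returns nullptr for operation codes that nobody registered.
  std::unique_ptr<Node> create(int code, const Param &param) const
  {
    const auto it = _map.find(code);
    if (it == _map.end())
      return nullptr;
    return it->second(param);
  }

private:
  std::map<int, Generator> _map;
};
```

Supporting another operation in this scheme means adding one more entry to the map, without touching the lookup code.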

2. If you want NNAPI to support the new operation for TFLite models, update the code related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp) as shown below

```cpp
      case tflite::BuiltinOperator_SELECT:
        nnapi_version = 12;  // require NNAPI 1.2
        nn_op_type = ANEURALNETWORKS_SELECT;
        break;
```

## Backend

This module generates the kernels and tensors of a backend, such as [ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/), from the generated graph-based IR. The runtime handles much of this work internally, but some of it depends on the backend. Therefore, several components require additional implementation in each backend.

### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can differ. This component therefore modifies and fixes the shapes of the backend's tensors.

#### acl_cl

The ACL kernel for the Add operation requires both inputs to have the same rank in order to support broadcasting.
- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  // Extend both operands to a common rank only when broadcasting is needed.
  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
        std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```
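To make the rank fix concrete, here is a small sketch of what extending the rank could mean. It assumes that `extendRank` pads a shape with leading dimensions of size 1, which matches the usual broadcasting convention but is not confirmed by this document. `Shape`, the free function `extendRank` and `fixBroadcastRank` are hypothetical stand-ins, not onert's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<int>;

// Assumed behaviour: prepend 1s until the shape has `rank` dimensions.
inline void extendRank(Shape &shape, std::size_t rank)
{
  assert(rank >= shape.size());
  shape.insert(shape.begin(), rank - shape.size(), 1);
}

// Mirrors the visit() above: touch the shapes only when they differ.
inline void fixBroadcastRank(Shape &lhs, Shape &rhs)
{
  if (lhs == rhs)
    return;
  const std::size_t broadcast_rank = std::max(lhs.size(), rhs.size());
  extendRank(lhs, broadcast_rank);
  extendRank(rhs, broadcast_rank);
}
```

Under this assumption, an operand of shape `{4}` added to one of shape `{2, 3, 4}` becomes `{1, 1, 4}`, so both operands reach rank 3 without changing the number of elements.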

#### acl_neon

The same implementation as for acl_cl is required.

#### cpu

This backend usually doesn't require any change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */ }
```

### KernelGenerator

This component generates the kernels of a backend. You have to generate a kernel for the new operation and then append it to the execution builder. You can obtain the information of the node from the IR and the necessary tensors from the tensor builder.

#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  // Look up the backend tensors that belong to the operands.
  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```

#### acl_neon

An implementation similar to that of acl_cl is required.

#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  const auto output_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(output_index), _current_op_seq_layout);
  const auto cond_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(cond_index), _current_op_seq_layout);
  const auto input1_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input1_index), _current_op_seq_layout);
  const auto input2_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input2_index), _current_op_seq_layout);

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::onert::backend::cpu::kernel::SelectLayer>();

  fn->configure(cond_alloc->buffer(), cond_backend_descr, input1_alloc->buffer(),
                input1_backend_descr, input2_alloc->buffer(), input2_backend_descr,
                output_alloc->buffer(), output_backend_descr);

  _execution_builder->append(std::move(fn));
}
```
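Both generators follow the same flow: create a kernel object, `configure` it with its tensors, and append it to the execution builder, which runs the kernels later. The sketch below reproduces that flow with hypothetical, simplified types. `IFunction`, `SelectKernel` and this `ExecutionBuilder` are illustrative stand-ins that use `std::vector` buffers; the real `SelectLayer` interface and tensor types differ.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Minimal kernel interface: everything the executor needs is run().
struct IFunction
{
  virtual ~IFunction() = default;
  virtual void run() = 0;
};

// Hypothetical kernel: configure() stores the buffers, run() does the work.
class SelectKernel : public IFunction
{
public:
  void configure(const std::vector<bool> *cond, const std::vector<float> *input1,
                 const std::vector<float> *input2, std::vector<float> *output)
  {
    _cond = cond;
    _input1 = input1;
    _input2 = input2;
    _output = output;
  }

  void run() override
  {
    assert(_cond && _input1 && _input2 && _output); // configure() must come first
    _output->resize(_cond->size());
    for (std::size_t i = 0; i < _cond->size(); ++i)
      (*_output)[i] = (*_cond)[i] ? (*_input1)[i] : (*_input2)[i];
  }

private:
  const std::vector<bool> *_cond = nullptr;
  const std::vector<float> *_input1 = nullptr;
  const std::vector<float> *_input2 = nullptr;
  std::vector<float> *_output = nullptr;
};

// Collects configured kernels and runs them in insertion order.
class ExecutionBuilder
{
public:
  void append(std::unique_ptr<IFunction> fn) { _functions.push_back(std::move(fn)); }
  void runAll()
  {
    for (auto &fn : _functions)
      fn->run();
  }

private:
  std::vector<std::unique_ptr<IFunction>> _functions;
};
```

Separating `configure` from `run` lets the generator do all lookups once during compilation, so that execution only iterates over ready-to-run kernels.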

### TensorRegister (in some cases)

This component registers tensors. Most tensors are registered automatically, but there are some
exceptions where an additional implementation is required. This is the case when a tensor is treated
unusually in its backend.

For example, the kernels of some operations expect weights in the `HWIO` layout (data format) when
the input layout is `NHWC`, and in the `OIHW` layout when the input layout is `NCHW`. A TFLite model,
however, stores weights as `OHWI` for `NHWC` and `OIHW` for `NCHW`. Therefore, to register the
appropriate tensor on the backend, you have to implement the registration additionally.
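As an illustration of the layout mismatch described above, the following sketch permutes 4-D weights from `OHWI` to `HWIO`. It is a plain reference loop over a flat buffer, written only to show the index arithmetic; `ohwiToHwio` is a hypothetical helper, and the way a backend actually registers or converts such tensors is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Permute weights stored as OHWI into HWIO order.
// dims holds the source extents in the order {O, H, W, I}.
inline std::vector<float> ohwiToHwio(const std::vector<float> &src,
                                     const std::array<std::size_t, 4> &dims)
{
  const std::size_t O = dims[0], H = dims[1], W = dims[2], I = dims[3];
  assert(src.size() == O * H * W * I);

  std::vector<float> dst(src.size());
  for (std::size_t o = 0; o < O; ++o)
    for (std::size_t h = 0; h < H; ++h)
      for (std::size_t w = 0; w < W; ++w)
        for (std::size_t i = 0; i < I; ++i)
        {
          const std::size_t src_idx = ((o * H + h) * W + w) * I + i; // OHWI offset
          const std::size_t dst_idx = ((h * W + w) * I + i) * O + o; // HWIO offset
          dst[dst_idx] = src[src_idx];
        }
  return dst;
}
```

With `O = 2`, `H = W = 1` and `I = 3`, the source `{0, 1, 2, 10, 11, 12}` becomes `{0, 10, 1, 11, 2, 12}`: the output-channel index moves from the slowest to the fastest varying position.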

### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors and initializes the constant
tensor layer. As with TensorRegister, most tensors are handled automatically, but there are some
exceptions.

#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  // Register a copy initializer for the constant kernel (weights).
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  // Register a copy initializer for the constant bias.
  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```

## Samples (to be updated)

- `Select` operation
  - Simple explanation: `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR: https://github.com/Samsung/ONE/pull/XXX