# How To Introduce a New Operation Into Runtime

**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document
provides some lightweight guidance about how to introduce a new operation into these modules to make
onert support the operation.

## Index

- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Index](#index)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
      - [cpu](#cpu-2)
  - [Samples (to be updated)](#samples-to-be-updated)

## Core

This module has a graph-based IR (intermediate representation). You have to add IR for the new
operation.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst)

```cpp
OP(Select)
```
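A list of `OP(...)` entries like this is typically consumed with the X-macro technique: the including code defines `OP`, expands the list, and undefines `OP` again. The sketch below illustrates that idea under this assumption. It inlines the list as a macro instead of including `Operations.lst`, and `OP_LIST`, `OpCode::COUNT` and `opName` are hypothetical names, not onert's actual definitions.

```cpp
#include <string>

// Stand-in for the contents of a list file such as Operations.lst.
// (Assumption: inlined as a macro so that the example is self-contained.)
#define OP_LIST \
  OP(Add)       \
  OP(Conv2D)    \
  OP(Select)

// First expansion: one enumerator per listed operation.
enum class OpCode
{
#define OP(Name) Name,
  OP_LIST
#undef OP
  COUNT
};

// Second expansion: map each enumerator back to its name.
inline std::string opName(OpCode code)
{
  switch (code)
  {
#define OP(Name)     \
  case OpCode::Name: \
    return #Name;
    OP_LIST
#undef OP
    default:
      return "Unknown";
  }
}
```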

2. Create a class for the node of the new operation in [this directory](/runtime/onert/core/include/ir/operation/)

```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```

You can define the member functions of the class in a separate source file, as shown below

```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

void Select::accept(OperationVisitor &v) const { v.visit(*this); }

Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs}
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```

- Include the new header in [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```
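The `accept`/`visit` pair in the class above follows the visitor pattern: each node forwards itself to the visitor, so passes such as a validator or a dumper can add per-operation behavior without changing the node classes. Here is a minimal, self-contained sketch of that mechanism. The types are simplified stand-ins that only borrow the names used above, and `NameCollector` is a hypothetical pass.

```cpp
#include <string>
#include <vector>

struct Select; // forward declaration needed by the visitor interface

// Simplified visitor: one visit() overload per operation type.
struct OperationVisitor
{
  virtual ~OperationVisitor() = default;
  virtual void visit(const Select &) {}
};

// Simplified operation base class.
struct Operation
{
  virtual ~Operation() = default;
  virtual void accept(OperationVisitor &v) const = 0;
};

struct Select : Operation
{
  // Double dispatch: the static type of *this selects the visit() overload.
  void accept(OperationVisitor &v) const override { v.visit(*this); }
};

// Hypothetical pass that records the name of every operation it visits.
struct NameCollector : OperationVisitor
{
  std::vector<std::string> names;
  void visit(const Select &) override { names.push_back("Select"); }
};
```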

3. Add a `visit` function to the OperationValidator to check that the node is valid.
  - [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```

  - [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```

4. Add a `visit` function to the Dumper to dump the IR information of the new operation.
- [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value() << ") Input2("
               << node.getInputs().at(Select::Input::INPUT2).value() << ")" << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```

5. Add code for shape inference
- The ONE runtime tries to calculate shapes and allocate memory at compilation time. When an output shape cannot be calculated at compilation time, the runtime calculates the shape and allocates memory at execution time.
- Calculation of shapes at compilation time is called _static shape inference_ and calculation of shapes at execution time is called _dynamic shape inference_.
- [`StaticShapeInferer.h`](/runtime/onert/compiler/StaticShapeInferer.h)

```cpp
  void visit(const ir::operation::Select &op) override;
```

- [`StaticShapeInferer.cc`](/runtime/onert/core/src/compiler/StaticShapeInferer.cc)

```cpp
void StaticShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx{op.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto &input_cond = _operands.at(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  ir::Operand &output = ...

  // Select output shape
  ir::Shape new_shape = shape_inference::inferSelectShape(
      input_cond.info().shape(), input_true.info().shape(), input_false.info().shape());
  output.info().shape(new_shape);
}
```

- [`DynamicShapeInference.h`](/runtime/onert/core/include/exec/DynamicShapeInference.h)

```cpp
  void visit(const ir::operation::Select &op) override;
```

- [`DynamicShapeInference.cc`](/runtime/onert/core/src/exec/DynamicShapeInference.cc)

```cpp
void DynamicShapeInferer::visit(const ir::operation::Select &op)
{
  const auto input_cond_idx = op.getInputs().at(ir::operation::Select::Input::CONDITION);
  const auto &input_cond = _tensor_registry->getITensor(input_cond_idx);

  const auto &input_true = ...
  const auto &input_false = ...
  auto output = ...

  if ((!input_cond->is_dynamic()) && (!input_true->is_dynamic()) && (!input_false->is_dynamic()))
  {
    return;
  }

  auto input_cond_shape = input_cond->getShape();
  auto input_true_shape = input_true->getShape();
  auto input_false_shape = input_false->getShape();

  // Select output shape
  ir::Shape new_shape =
      shape_inference::inferSelectShape(input_cond_shape, input_true_shape, input_false_shape);

  output->applyShape(new_shape);
}
```
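Both inferers delegate the actual calculation to one shared helper, so the static and dynamic paths cannot disagree on the result. The body of `shape_inference::inferSelectShape` is not shown in this document. The sketch below is only an illustration that assumes a plain right-aligned broadcasting rule; `Shape`, `broadcastShapes` and `inferSelectShapeSketch` are hypothetical names, and the real helper may apply different rules.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Shape = std::vector<int>; // stand-in for ir::Shape

// Broadcast two shapes, aligning dimensions from the right.
inline Shape broadcastShapes(const Shape &lhs, const Shape &rhs)
{
  const std::size_t rank = std::max(lhs.size(), rhs.size());
  Shape out(rank, 1);
  for (std::size_t i = 0; i < rank; ++i)
  {
    // Missing leading dimensions are treated as 1.
    const int l = i < lhs.size() ? lhs[lhs.size() - 1 - i] : 1;
    const int r = i < rhs.size() ? rhs[rhs.size() - 1 - i] : 1;
    if (l != r && l != 1 && r != 1)
      throw std::invalid_argument("shapes are not broadcastable");
    out[rank - 1 - i] = (l == 1) ? r : l;
  }
  return out;
}

// Hypothetical stand-in for shape_inference::inferSelectShape.
inline Shape inferSelectShapeSketch(const Shape &cond, const Shape &input_true,
                                    const Shape &input_false)
{
  return broadcastShapes(cond, broadcastShapes(input_true, input_false));
}
```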

## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. A Loader loads a model file and generates IR from it. NNAPI generates IR from a model set via the [Neural Networks API of Android](https://developer.android.com/ndk/guides/neuralnetworks).

### Loaders

#### Base Loader

This is where the common parts of loaders are implemented.

1. Add code to base_loader to load the new operation and to generate IR from it
- [base_loader](/runtime/onert/frontend/base_loader/include/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```

#### TFLite Loader

This loads a tflite file.
If you want the new operation to be loaded only by the TFLite Loader, you only need to implement loading of the operation here.

#### Circle Loader

This loads a circle file generated by the compiler.
If you want the new operation to be loaded only by the Circle Loader, you only need to implement loading of the operation here.

### NNAPI

1. Add a generator to the OperationFactory to generate the IR of the new operation
- [OperationFactory](/runtime/onert/frontend/nnapi/wrapper/OperationFactory.cc)

```cpp
  _map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
    assert(init_param.input_count == 3 && init_param.output_count == 1);

    OperandIndexSequence outputs{init_param.outputs[0]};

    // Each input should be interpreted as follows:
    //
    //  0 -> Cond Tensor Index
    //  1 -> Input1 Tensor Index
    //  2 -> Input2 Tensor Index
    OperandIndexSequence inputs;
    for (uint32_t n = 0; n < init_param.input_count; ++n)
    {
      inputs.append(OperandIndex{init_param.inputs[n]});
    }

    return new operation::Select{inputs, outputs};
  };
```
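The registration above stores one generator per operation code in a map, and the generator is looked up when a model is built. A self-contained sketch of that registry idea follows. `Param`, `Node`, `Factory`, `makeFactory` and the numeric code `kSelectOp` are simplified, hypothetical stand-ins rather than the actual NNAPI frontend types.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for the factory's parameter and IR node types.
struct Param
{
  std::vector<std::uint32_t> inputs;
  std::vector<std::uint32_t> outputs;
};

struct Node
{
  std::string name;
  std::vector<std::uint32_t> inputs;
  std::vector<std::uint32_t> outputs;
};

class Factory
{
public:
  using Generator = std::function<std::unique_ptr<Node>(const Param &)>;

  // Register the generator that builds the node for one operation code.
  void add(int op_code, Generator gen) { _map[op_code] = std::move(gen); }

  // Look up the generator and build the node; unknown codes are an error.
  std::unique_ptr<Node> create(int op_code, const Param &param) const
  {
    const auto it = _map.find(op_code);
    if (it == _map.end())
      throw std::runtime_error("unsupported operation");
    return it->second(param);
  }

private:
  std::unordered_map<int, Generator> _map;
};

// Hypothetical operation code; the real value comes from the NNAPI headers.
constexpr int kSelectOp = 1;

inline Factory makeFactory()
{
  Factory factory;
  factory.add(kSelectOp, [](const Param &p) {
    if (p.inputs.size() != 3 || p.outputs.size() != 1)
      throw std::invalid_argument("Select expects 3 inputs and 1 output");
    return std::unique_ptr<Node>(new Node{"Select", p.inputs, p.outputs});
  });
  return factory;
}
```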

2. If you want NNAPI to support the new operation in TFLite models, you need to update the code related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp) as shown below

```cpp
      case tflite::BuiltinOperator_SELECT:
        nnapi_version = 12;  // require NNAPI 1.2
        nn_op_type = ANEURALNETWORKS_SELECT;
        break;
```

## Backend

This module generates the kernels and tensors of a backend such as [ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/) from the generated graph-based IR. The runtime handles most of this work internally, but some of it depends on the backend. So, there are several components that require additional implementation on each backend.

### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can be different. Therefore, this component modifies and fixes the shapes of the backend's tensors.

#### acl_cl

The ACL kernel for the Add operation requires both inputs to have the same rank in order to support broadcasting.
- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
        std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```
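The behavior of `extendRank` is not shown in this document. Assuming that it pads a shape with leading 1s until the requested rank is reached, which is the usual way to align operands for broadcasting, the effect of the code above can be sketched as follows. `Shape` and `extendRankSketch` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<int>; // stand-in for ir::Shape

// Pad a shape with leading 1s until it has the requested rank.
// (Assumption: this mirrors what an extendRank-style helper does.)
inline Shape extendRankSketch(Shape shape, std::size_t to_rank)
{
  assert(to_rank >= shape.size());
  shape.insert(shape.begin(), to_rank - shape.size(), 1);
  return shape;
}
```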

#### acl_neon

The same implementation as for acl_cl is required.

#### cpu

This backend doesn't usually require a change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */}
```

### KernelGenerator

This component generates the kernels of a backend. You have to generate a kernel for the new operation and then append it to the execution builder. You can obtain information about the node from the IR and the necessary tensors from the tensor builder.

#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```

#### acl_neon

An implementation similar to that of acl_cl is required.

#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(0)};
  const auto condition_index{node.getInputs().at(ir::operation::Select::Input::CONDITION)};
  const auto true_index{node.getInputs().at(ir::operation::Select::Input::INPUT_TRUE)};
  const auto false_index{node.getInputs().at(ir::operation::Select::Input::INPUT_FALSE)};

  auto output_tensor = _tensor_reg->getPortableTensor(output_index);
  auto condition_tensor = _tensor_reg->getPortableTensor(condition_index);
  auto true_tensor = _tensor_reg->getPortableTensor(true_index);
  auto false_tensor = _tensor_reg->getPortableTensor(false_index);

  auto fn = std::make_unique<ops::SelectLayer>();

  fn->configure(condition_tensor, true_tensor, false_tensor, output_tensor);

  _return_fn = std::move(fn);
}
```
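Both generators follow the same configure-then-run structure: a kernel object is created, bound to its tensors with `configure`, and handed over to be executed later. The internals of `ops::SelectLayer` are not shown in this document, so the following is only a sketch of what such a kernel could look like. `IFunction` is reduced to a single `run()` method here, and `SelectLayerSketch` with its flat `std::vector` buffers is a hypothetical simplification of the real tensor types.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Minimal kernel interface: the executor only needs to call run().
struct IFunction
{
  virtual ~IFunction() = default;
  virtual void run() = 0;
};

// Hypothetical element-wise Select kernel over flat float buffers.
class SelectLayerSketch : public IFunction
{
public:
  // configure() only stores the tensors; no computation happens here.
  void configure(const std::vector<bool> *cond, const std::vector<float> *input_true,
                 const std::vector<float> *input_false, std::vector<float> *output)
  {
    _cond = cond;
    _true = input_true;
    _false = input_false;
    _output = output;
  }

  // run() computes output[i] = cond[i] ? input_true[i] : input_false[i].
  void run() override
  {
    _output->resize(_cond->size());
    for (std::size_t i = 0; i < _cond->size(); ++i)
      (*_output)[i] = (*_cond)[i] ? (*_true)[i] : (*_false)[i];
  }

private:
  const std::vector<bool> *_cond = nullptr;
  const std::vector<float> *_true = nullptr;
  const std::vector<float> *_false = nullptr;
  std::vector<float> *_output = nullptr;
};
```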

### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors and initializes the constant
tensor layer. Most tensors are registered automatically by the runtime, but there are some exceptions.

#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```
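The registration step can be pictured as filling a map from operand index to a function that later writes the constant data into the backend tensor. The sketch below illustrates that idea only. `ConstantInitializerSketch`, `Buffer` and the `run` method are hypothetical, and the real `registerCopyInitializer` takes an operand object rather than a plain buffer.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Buffer = std::vector<float>; // stand-in for tensor storage

// Hypothetical registry: operand index -> function that fills the backend tensor.
class ConstantInitializerSketch
{
public:
  using Initializer = std::function<void(Buffer &)>;

  // Register an initializer that copies the constant data into the tensor.
  void registerCopyInitializer(int index, Buffer data)
  {
    _init_map[index] = [data = std::move(data)](Buffer &tensor) { tensor = data; };
  }

  // Run every registered initializer against its tensor, then clear the registry.
  void run(std::unordered_map<int, Buffer> &tensors)
  {
    for (auto &entry : _init_map)
      entry.second(tensors[entry.first]);
    _init_map.clear();
  }

private:
  std::unordered_map<int, Initializer> _init_map;
};
```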

## Samples (to be updated)

- `Select` operation
  - Simple explanation : `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR : https://github.com/Samsung/ONE/pull/XXX
