# How To Introduce a New Operation Into Runtime

**ONE**'s runtime has three main modules: **core**, **frontend** and **backend**. This document
provides some lightweight guidance about how to introduce a new operation into these modules to make
onert support the operation.

## Index

- [How To Introduce a New Operation Into Runtime](#how-to-introduce-a-new-operation-into-runtime)
  - [Index](#index)
  - [Core](#core)
  - [Frontend](#frontend)
    - [Loaders](#loaders)
      - [Base Loader](#base-loader)
      - [TFLite Loader](#tflite-loader)
      - [Circle Loader](#circle-loader)
    - [NNAPI](#nnapi)
  - [Backend](#backend)
    - [ShapeFixer](#shapefixer)
      - [acl_cl](#acl_cl)
      - [acl_neon](#acl_neon)
      - [cpu](#cpu)
    - [KernelGenerator](#kernelgenerator)
      - [acl_cl](#acl_cl-1)
      - [acl_neon](#acl_neon-1)
      - [cpu](#cpu-1)
    - [TensorRegister (in some cases)](#tensorregister-in-some-cases)
    - [ConstantInitializer (in some cases)](#constantinitializer-in-some-cases)
      - [cpu](#cpu-2)
  - [Samples (to be updated)](#samples-to-be-updated)

## Core

This module holds the graph-based IR (intermediate representation). You have to add an IR node for
the new operation.

1. Add the name of the new operation to [Operations.lst](/runtime/onert/core/include/ir/Operations.lst)

```cpp
OP(Select)
```

2. Create a class for the node of the new operation in [this directory](/runtime/onert/core/include/ir/operation/)

```cpp
#include "ir/Operation.h"

namespace onert
{
namespace ir
{
namespace operation
{

class Select : public Operation
{
public:
  enum Input
  {
    COND = 0,
    INPUT1 = 1,
    INPUT2 = 2
  };

  enum Output
  {
    OUTPUT = 0,
  };

public:
  Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs);

public:
  void accept(OperationVisitor &v) const override;
  OpCode opcode() const final { return OpCode::Select; }
};

} // namespace operation
} // namespace ir
} // namespace onert
```

Define the member functions of the class in a separate source file, as below:

```cpp
#include "ir/operation/Select.h"

#include "ir/OperationVisitor.h"

namespace onert
{
namespace ir
{
namespace operation
{

void Select::accept(OperationVisitor &v) const { v.visit(*this); }

// Select takes exactly three inputs: COND, INPUT1 and INPUT2.
Select::Select(const OperandIndexSequence &inputs, const OperandIndexSequence &outputs)
    : Operation{OperandConstraint::createExact(3u), inputs, outputs}
{
}

} // namespace operation
} // namespace ir
} // namespace onert
```

Then include the new header in the list of operation headers:

  - [Operations.Include.h](/runtime/onert/core/include/ir/Operations.Include.h)

```cpp
#include "ir/operation/Select.h"
```

3. Add a visitor to the OperationValidator to check that the node is valid.
  - [OperationValidator.h](/runtime/onert/core/src/compiler/OperationValidator.h)

```cpp
void visit(const operation::Select &node) override;
```

  - [OperationValidator.cc](/runtime/onert/core/src/compiler/OperationValidator.cc)

```cpp
void OperationValidator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  UNUSED_RELEASE(output_index);
  UNUSED_RELEASE(cond_index);
  UNUSED_RELEASE(input1_index);
  UNUSED_RELEASE(input2_index);

  const auto output_type = _ctx.at(output_index).typeInfo();
  const auto cond_type = _ctx.at(cond_index).typeInfo();
  const auto input1_type = _ctx.at(input1_index).typeInfo();
  const auto input2_type = _ctx.at(input2_index).typeInfo();

  UNUSED_RELEASE(output_type);
  UNUSED_RELEASE(cond_type);
  UNUSED_RELEASE(input1_type);
  UNUSED_RELEASE(input2_type);

  assert(cond_type.type() == ir::DataType::BOOL8);
  assert(output_type.type() == ir::DataType::FLOAT32 || output_type.type() == ir::DataType::INT32 ||
         output_type.type() == ir::DataType::QUANT8_ASYMM);
  assert(output_type.type() == input1_type.type());
  assert(output_type.type() == input2_type.type());

  const auto output_shape = _ctx.at(output_index).shape();
  const auto cond_shape = _ctx.at(cond_index).shape();
  const auto input1_shape = _ctx.at(input1_index).shape();
  const auto input2_shape = _ctx.at(input2_index).shape();

  UNUSED_RELEASE(output_shape);
  UNUSED_RELEASE(cond_shape);
  UNUSED_RELEASE(input1_shape);
  UNUSED_RELEASE(input2_shape);

  assert(output_shape == input1_shape);
  assert(cond_shape == input1_shape);
  assert(input2_shape == input1_shape);
}
```

4. Add a visitor to the Dumper to dump the IR information of the new operation.
- [Dumper.cc](/runtime/onert/core/src/ir/dumper/Dumper.cc)

```cpp
void Dumper::visit(const Select &node)
{
  VERBOSE(LIR) << "* Select" << std::endl;
  VERBOSE(LIR) << "  - Inputs : Cond(" << node.getInputs().at(Select::Input::COND).value()
               << ") Input1(" << node.getInputs().at(Select::Input::INPUT1).value()
               << ") Input2(" << node.getInputs().at(Select::Input::INPUT2).value() << ")"
               << std::endl;
  VERBOSE(LIR) << "  - Output : Output(" << node.getOutputs().at(Select::Output::OUTPUT).value()
               << ")" << std::endl;
}
```
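
The steps above rely on two common C++ idioms: a list of `OP(...)` entries that is expanded by a macro, and a visitor whose `visit` overload is selected through `accept`. The following is a minimal, self-contained sketch of how such pieces can fit together. It is not onert's actual code: `SKETCH_OPERATIONS`, `Visitor`, `Node`, `NameCollector` and `dumpName` are illustrative names, and the node classes are generated by a macro here only to keep the example short (in the steps above the node class is written by hand).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Operations.lst: one OP(...) entry per operation.
#define SKETCH_OPERATIONS(OP) \
  OP(Add)                     \
  OP(Select)

namespace sketch
{

// Expand the list into an enum with one enumerator per entry.
enum class OpCode
{
#define OP(Name) Name,
  SKETCH_OPERATIONS(OP)
#undef OP
};

// Forward-declare one node class per entry.
#define OP(Name) struct Name;
SKETCH_OPERATIONS(OP)
#undef OP

// Expand the list again into one pure virtual visit() overload per entry.
struct Visitor
{
  virtual ~Visitor() = default;
#define OP(Name) virtual void visit(const Name &) = 0;
  SKETCH_OPERATIONS(OP)
#undef OP
};

struct Node
{
  virtual ~Node() = default;
  virtual void accept(Visitor &v) const = 0;
  virtual OpCode opcode() const = 0;
};

// Define each node; accept() dispatches to the matching visit() overload.
#define OP(Name)                                               \
  struct Name : Node                                           \
  {                                                            \
    void accept(Visitor &v) const override { v.visit(*this); } \
    OpCode opcode() const override { return OpCode::Name; }    \
  };
SKETCH_OPERATIONS(OP)
#undef OP

// A concrete visitor, comparable in spirit to a dumper: it records the visited node's name.
struct NameCollector : Visitor
{
  std::string last;
#define OP(Name) \
  void visit(const Name &) override { last = #Name; }
  SKETCH_OPERATIONS(OP)
#undef OP
};

// Run the visitor over a node and return the recorded name.
inline std::string dumpName(const Node &node)
{
  NameCollector collector;
  node.accept(collector);
  assert(!collector.last.empty()); // every visit() overload sets a name
  return collector.last;
}

} // namespace sketch
```

In this sketch, adding a new entry to the list makes every visitor that does not yet implement the new `visit` overload fail to compile, which is one reason a list file and a visitor are often combined.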

## Frontend

This module generates IR from a model. There are two kinds of frontend: Loader and NNAPI. A Loader
loads a model file and generates IR from it. NNAPI generates IR from a model that is set via the
[Android Neural Networks API](https://developer.android.com/ndk/guides/neuralnetworks).

### Loaders

#### Base Loader

This is where the common parts of loaders are implemented.

1. Add a case to base_loader to load the new operation and to generate IR from it
- [base_loader](/runtime/onert/frontend/base_loader/include/base_loader.h)

```cpp
    case BuiltinOperator::BuiltinOperator_SELECT:
      loadSelect(op);
      return;
```

```cpp
template <typename LoaderDomain, typename SpecificLoader>
void BaseLoader<LoaderDomain, SpecificLoader>::loadSelect(const Operator *op)
{
  ir::OperandIndexSequence inputs;
  ir::OperandIndexSequence outputs;

  loadOperationIO(op, inputs, outputs);

  std::unique_ptr<ir::Operation> new_op{new ir::operation::Select{inputs, outputs}};
  _graph.addOperation(std::move(new_op));
}
```
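
The two snippets above can be read together as "dispatch on the operator code, then build an IR node and add it to the graph". Below is a minimal, self-contained sketch of that flow. All types here (`BuiltinOperator`, `Operator`, `Operation`, `Loader`) are hypothetical stand-ins, not the real schema or onert classes.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch
{

// Hypothetical stand-ins for the model schema and the IR types.
enum class BuiltinOperator
{
  ADD,
  SELECT
};

struct Operator
{
  BuiltinOperator code;
  std::vector<int> inputs;
  std::vector<int> outputs;
};

struct Operation
{
  std::string name;
  std::vector<int> inputs;
  std::vector<int> outputs;
};

class Loader
{
public:
  // Dispatch on the operator code; each supported operator has its own load function.
  void loadOperation(const Operator &op)
  {
    switch (op.code)
    {
      case BuiltinOperator::SELECT:
        loadSelect(op);
        return;
      default:
        throw std::runtime_error("Unsupported operation");
    }
  }

  const std::vector<std::unique_ptr<Operation>> &operations() const { return _operations; }

private:
  // Build the IR node from the operator's inputs and outputs and store it.
  void loadSelect(const Operator &op)
  {
    assert(op.inputs.size() == 3 && op.outputs.size() == 1);
    auto new_op = std::make_unique<Operation>(Operation{"Select", op.inputs, op.outputs});
    _operations.push_back(std::move(new_op));
  }

  std::vector<std::unique_ptr<Operation>> _operations;
};

} // namespace sketch
```

In this sketch an operator without a `case` is rejected with an exception, so a missing loader entry shows up as soon as a model containing that operator is loaded.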

#### TFLite Loader

This loads a tflite file.
If the new operation should be loaded only by the TFLite Loader, implement the loading of the operation only here.

#### Circle Loader

This loads a circle file generated by the compiler.
If the new operation should be loaded only by the Circle Loader, implement the loading of the operation only here.

### NNAPI

1. Add a generator to the OperationFactory to create the IR of the new operation
- [OperationFactory](/runtime/onert/frontend/nnapi/wrapper/OperationFactory.cc)

```cpp
  _map[ANEURALNETWORKS_SELECT] = [](const OperationFactory::Param &init_param, Operands &) {
    assert(init_param.input_count == 3 && init_param.output_count == 1);

    OperandIndexSequence outputs{init_param.outputs[0]};

    // Each input should be interpreted as follows:
    //
    //  0 -> Cond Tensor Index
    //  1 -> Input1 Tensor Index
    //  2 -> Input2 Tensor Index
    OperandIndexSequence inputs;
    for (uint32_t n = 0; n < init_param.input_count; ++n)
    {
      inputs.append(OperandIndex{init_param.inputs[n]});
    }

    return new operation::Select{inputs, outputs};
  };
```

2. If you want NNAPI to support the new operation for TFLite models, update the code related to the operation in [nnapi_delegate](/runtime/libs/tflite/port/1.13.1/src/nnapi_delegate.cpp) as below

```cpp
      case tflite::BuiltinOperator_SELECT:
        nnapi_version = 12;  // require NNAPI 1.2
        nn_op_type = ANEURALNETWORKS_SELECT;
        break;
```
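
The `_map[...] = lambda` registration in step 1 is a factory keyed by operation code. The following self-contained sketch illustrates that idea under simplified assumptions: `kSelect` is a made-up code (the real value of `ANEURALNETWORKS_SELECT` is not assumed), and `Param`, `Operation` and `OperationFactory` are reduced stand-ins that only mirror the field names used above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch
{

// Hypothetical operation code, chosen only for this example.
constexpr int kSelect = 1;

struct Param
{
  std::uint32_t input_count;
  const std::uint32_t *inputs;
  std::uint32_t output_count;
  const std::uint32_t *outputs;
};

struct Operation
{
  std::string name;
  std::vector<std::uint32_t> inputs;
  std::vector<std::uint32_t> outputs;
};

class OperationFactory
{
public:
  using Generator = std::function<std::unique_ptr<Operation>(const Param &)>;

  OperationFactory()
  {
    // Register one generator per operation code.
    _map[kSelect] = [](const Param &p) {
      assert(p.inputs != nullptr && p.outputs != nullptr);
      if (p.input_count != 3 || p.output_count != 1)
        throw std::invalid_argument("Select expects 3 inputs and 1 output");

      auto op = std::make_unique<Operation>();
      op->name = "Select";
      // Inputs are ordered as: 0 -> Cond, 1 -> Input1, 2 -> Input2
      op->inputs.assign(p.inputs, p.inputs + p.input_count);
      op->outputs.assign(p.outputs, p.outputs + p.output_count);
      return op;
    };
  }

  // Look up the generator for `type`; throws std::out_of_range if none is registered.
  std::unique_ptr<Operation> create(int type, const Param &param) const
  {
    return _map.at(type)(param);
  }

private:
  std::unordered_map<int, Generator> _map;
};

} // namespace sketch
```

Unlike the snippet above, this sketch reports a wrong operand count with an exception instead of `assert`, so that the check also runs in release builds; that is a choice of the sketch, not a statement about onert.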

## Backend

This module generates kernels and tensors of a backend, such as
[ComputeLibrary](https://github.com/ARM-software/ComputeLibrary/), from the generated graph-based
IR. The runtime handles most of this work internally, but not all of it, because some parts depend
on the backend. So, there are several components that require additional implementation on each
backend.

### ShapeFixer

Even for tensors of the same operation, the shape required by each backend can be different.
Therefore, this component modifies and fixes the shape of the backend's tensors.

#### acl_cl

The ACL kernel for the Add operation needs both inputs to have the same rank in order to support broadcasting.
- [ShapeFixer.h](/runtime/onert/backend/acl_cl/ShapeFixer.h)

```cpp
void visit(const ir::operation::Add &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/acl_cl/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Add &node)
{
  const auto lhs_index{node.getInputs().at(ir::operation::Add::Input::LHS)};
  const auto rhs_index{node.getInputs().at(ir::operation::Add::Input::RHS)};

  if (!(_ctx.at(lhs_index).shape() == _ctx.at(rhs_index).shape()))
  {
    const auto broadcast_rank =
        std::max(_ctx.at(lhs_index).shape().rank(), _ctx.at(rhs_index).shape().rank());
    const_cast<ir::Shape &>(_ctx.at(lhs_index).shape()).extendRank(broadcast_rank);
    const_cast<ir::Shape &>(_ctx.at(rhs_index).shape()).extendRank(broadcast_rank);
  }
}
```
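
To make the rank matching concrete, here is a small self-contained sketch. It assumes that extending the rank means prepending dimensions of size 1 until the requested rank is reached, which is the usual convention for broadcasting; this document does not show what `ir::Shape::extendRank` actually does, so treat that as an assumption. `Shape`, `extendRank` and `alignRanks` below are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{

// A shape is just a list of dimension sizes in this sketch.
using Shape = std::vector<int>;

// Return `shape` with leading dimensions of size 1 added until it has `to_rank` dimensions.
inline Shape extendRank(Shape shape, std::size_t to_rank)
{
  assert(to_rank >= shape.size());
  shape.insert(shape.begin(), to_rank - shape.size(), 1);
  return shape;
}

// Give both operands the same rank when their shapes differ.
inline void alignRanks(Shape &lhs, Shape &rhs)
{
  if (lhs == rhs)
    return;

  const std::size_t broadcast_rank = std::max(lhs.size(), rhs.size());
  lhs = extendRank(lhs, broadcast_rank);
  rhs = extendRank(rhs, broadcast_rank);
}

} // namespace sketch
```

For example, aligning the shapes `{2, 3, 4}` and `{4}` leaves the first one unchanged and turns the second one into `{1, 1, 4}`.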

#### acl_neon

The same implementation as for acl_cl is required.

#### cpu

This backend doesn't usually require a change of shape.
- [ShapeFixer.h](/runtime/onert/backend/cpu/ShapeFixer.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [ShapeFixer.cc](/runtime/onert/backend/cpu/ShapeFixer.cc)

```cpp
void ShapeFixer::visit(const ir::operation::Select &) { /* DO NOTHING */ }
```

### KernelGenerator

This component generates the kernels of a backend. You have to generate a kernel for the new
operation and then append it to the execution builder. You can obtain the node's information from
the IR and the necessary tensors from the tensor builder.

#### acl_cl

- [KernelGenerator.h](/runtime/onert/backend/acl_cl/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/acl_cl/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::arm_compute::CLSelect>();

  fn->configure(cond_alloc->handle(), input1_alloc->handle(), input2_alloc->handle(),
                output_alloc->handle());

  auto acl_fn = asAclFunction(std::move(fn));

  _execution_builder->append(std::move(acl_fn));
}
```

#### acl_neon

An implementation similar to that of acl_cl is required.

#### cpu

- [KernelGenerator.h](/runtime/onert/backend/cpu/KernelGenerator.h)

```cpp
void visit(const ir::operation::Select &) override;
```

- [KernelGenerator.cc](/runtime/onert/backend/cpu/KernelGenerator.cc)

```cpp
void KernelGenerator::visit(const ir::operation::Select &node)
{
  const auto output_index{node.getOutputs().at(ir::operation::Select::Output::OUTPUT)};
  const auto cond_index{node.getInputs().at(ir::operation::Select::Input::COND)};
  const auto input1_index{node.getInputs().at(ir::operation::Select::Input::INPUT1)};
  const auto input2_index{node.getInputs().at(ir::operation::Select::Input::INPUT2)};

  const auto output_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(output_index), _current_op_seq_layout);
  const auto cond_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(cond_index), _current_op_seq_layout);
  const auto input1_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input1_index), _current_op_seq_layout);
  const auto input2_backend_descr = ::onert::backend::cpu::kernel::getTensorDescriptor(
      _ctx.at(input2_index), _current_op_seq_layout);

  auto output_alloc = _tensor_builder->at(output_index).get();
  auto cond_alloc = _tensor_builder->at(cond_index).get();
  auto input1_alloc = _tensor_builder->at(input1_index).get();
  auto input2_alloc = _tensor_builder->at(input2_index).get();

  auto fn = std::make_unique<::onert::backend::cpu::kernel::SelectLayer>();

  fn->configure(cond_alloc->buffer(), cond_backend_descr, input1_alloc->buffer(),
                input1_backend_descr, input2_alloc->buffer(), input2_backend_descr,
                output_alloc->buffer(), output_backend_descr);

  _execution_builder->append(std::move(fn));
}
```
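
Both generators above follow the same pattern: create a kernel object, `configure` it with the tensors, and `append` it to the execution builder so that it can be run later. The sketch below shows that pattern in isolation for an element-wise select on `float` data. `IFunction`, `SelectLayer` and `ExecutionBuilder` are simplified, hypothetical stand-ins; the real classes take tensor descriptors and have different signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

namespace sketch
{

// Hypothetical kernel interface: a configured function that can be run later.
struct IFunction
{
  virtual ~IFunction() = default;
  virtual void run() = 0;
};

// Element-wise select: output[i] = cond[i] ? input1[i] : input2[i]
class SelectLayer : public IFunction
{
public:
  // Remember the buffers; no computation happens until run() is called.
  void configure(const bool *cond, const float *input1, const float *input2, float *output,
                 std::size_t size)
  {
    assert(cond && input1 && input2 && output);
    _cond = cond;
    _input1 = input1;
    _input2 = input2;
    _output = output;
    _size = size;
  }

  void run() override
  {
    for (std::size_t i = 0; i < _size; ++i)
      _output[i] = _cond[i] ? _input1[i] : _input2[i];
  }

private:
  const bool *_cond = nullptr;
  const float *_input1 = nullptr;
  const float *_input2 = nullptr;
  float *_output = nullptr;
  std::size_t _size = 0;
};

// Collects the generated kernels and runs them in the order they were appended.
class ExecutionBuilder
{
public:
  void append(std::unique_ptr<IFunction> fn) { _functions.push_back(std::move(fn)); }

  void runAll()
  {
    for (auto &fn : _functions)
      fn->run();
  }

  std::size_t size() const { return _functions.size(); }

private:
  std::vector<std::unique_ptr<IFunction>> _functions;
};

} // namespace sketch
```

Separating `configure` from `run` lets the generator do all lookups once at build time, while execution only iterates over the stored kernels.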

### TensorRegister (in some cases)

This component registers tensors. Most tensors will be automatically registered internally. There
are some exceptions, however, where additional implementations are required. It is the case when a
tensor is treated unusually in its backend.

The kernel of some operations takes weights in the `HWIO` layout (data format) when the input's
layout is `NHWC`, and in `OIHW` when it is `NCHW`. But a TFLite model has weights in `OHWI` for
`NHWC` and `OIHW` for `NCHW`. Therefore, to register the appropriate tensor on the backend, you have
to implement it additionally.
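
As an illustration of the layout mismatch described above, the following sketch reorders a flat weight buffer from `OHWI` to `HWIO`. It only demonstrates the index arithmetic; `permuteOHWItoHWIO` is a hypothetical helper, and how a backend actually registers or converts such tensors is not shown in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{

// Reorder weights stored as [O][H][W][I] into [H][W][I][O].
// O: output channels, H: kernel height, W: kernel width, I: input channels.
inline std::vector<float> permuteOHWItoHWIO(const std::vector<float> &src, std::size_t O,
                                            std::size_t H, std::size_t W, std::size_t I)
{
  assert(src.size() == O * H * W * I);

  std::vector<float> dst(src.size());
  for (std::size_t o = 0; o < O; ++o)
    for (std::size_t h = 0; h < H; ++h)
      for (std::size_t w = 0; w < W; ++w)
        for (std::size_t i = 0; i < I; ++i)
        {
          // Row-major offsets in the source and destination layouts.
          const std::size_t src_offset = ((o * H + h) * W + w) * I + i;
          const std::size_t dst_offset = ((h * W + w) * I + i) * O + o;
          dst[dst_offset] = src[src_offset];
        }
  return dst;
}

} // namespace sketch
```

With `O = 2`, `H = W = 1` and `I = 3`, the buffer `{0, 1, 2, 3, 4, 5}` becomes `{0, 3, 1, 4, 2, 5}`: the two output channels end up interleaved for each input channel.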

### ConstantInitializer (in some cases)

This component registers functions that initialize constant tensors, and initializes the constant
tensor layer. This is similar to TensorRegister: most tensors will be automatically registered
internally, with some exceptions.

#### cpu

- [ConstantInitializer.h](/runtime/onert/backend/cpu/ConstantInitializer.h)

```cpp
void visit(const ir::operation::Conv2D &) override;
```

- [ConstantInitializer.cc](/runtime/onert/backend/cpu/ConstantInitializer.cc)

```cpp
void ConstantInitializer::visit(const ir::operation::Conv2D &node)
{
  const auto &kernel_index = node.getInputs().at(ir::operation::Conv2D::KERNEL);
  const auto &kernel_obj = _operands.at(kernel_index);
  registerCopyInitializer(kernel_index, kernel_obj);

  const auto &bias_index = node.getInputs().at(ir::operation::Conv2D::BIAS);
  const auto &bias_obj = _operands.at(bias_index);
  registerCopyInitializer(bias_index, bias_obj);
}
```
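
One way to picture "registering a function that initializes a constant tensor" is a map from operand index to a deferred copy, which is executed once the backend tensors exist. The sketch below follows that idea. It is an assumption about the general mechanism, not onert's implementation: `Tensor`, `run` and `pending` are illustrative, and only the name `registerCopyInitializer` is taken from the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

namespace sketch
{

// A tensor is a plain float buffer in this sketch.
using Tensor = std::vector<float>;

class ConstantInitializer
{
public:
  using Initializer = std::function<void(Tensor &)>;

  // Register a function that copies `data` into the backend tensor of operand `index`.
  void registerCopyInitializer(int index, Tensor data)
  {
    _init_map[index] = [data = std::move(data)](Tensor &dst) { dst = data; };
  }

  // Run every registered initializer once against the backend tensors, then forget them.
  void run(std::map<int, Tensor> &tensors)
  {
    for (const auto &entry : _init_map)
    {
      assert(static_cast<bool>(entry.second));
      entry.second(tensors[entry.first]);
    }
    _init_map.clear();
  }

  // Number of initializers that have been registered but not yet run.
  std::size_t pending() const { return _init_map.size(); }

private:
  std::map<int, Initializer> _init_map;
};

} // namespace sketch
```

A different initializer (for example one that permutes the layout while copying) could be registered under the same index type, which is why the work is stored as a function rather than as raw data.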

## Samples (to be updated)

- `Select` operation
  - Simple explanation : `Output[i] = Condition[i] ? input1[i] : input2[i]`
  - PR : https://github.com/Samsung/ONE/pull/XXX