SimpleNet Example {#ex_simplenet}
=================================

This C++ API example demonstrates how to build an AlexNet neural
network topology for forward-pass inference. Some key takeaways
include:

* How tensors are implemented and submitted to primitives.
* How primitives are created.
* How primitives are sequentially submitted to the network, where the output from
  one primitive is passed as input to the next primitive. This ordering specifies
  the dependency between primitive input and output data.
* Specific 'inference-only' configurations.
* How to limit the number of reorders performed, which are detrimental to performance.

The simple_net.cpp example implements the AlexNet layers
as numbered primitives (e.g. conv1, pool1, conv2).

## Highlights for implementing the simple_net.cpp Example:

1. Initialize a CPU engine. The last parameter in the `engine()` call represents the index of the
   engine.
~~~cpp
using namespace mkldnn;
auto cpu_engine = engine(engine::cpu, 0);
~~~

2. Create a primitives vector that represents the net.
~~~cpp
std::vector<primitive> net;
~~~

3. Additionally, create a separate vector that holds the weight-related primitives.
   This allows the weight transformations to be executed only once, outside the
   topology stream.
~~~cpp
std::vector<primitive> net_weights;
~~~

4. Allocate a vector for the input data and define the tensor dimensions.
~~~cpp
memory::dims conv1_src_tz = { batch, 3, 227, 227 };
std::vector<float> user_src(batch * 3 * 227 * 227);
/* similarly, specify tensor structure for output, weights and bias */
~~~

5. Create a memory primitive for the user data in `nchw` format
   (minibatch-channels-height-width). Create a memory descriptor
   for the convolution input, selecting `any` for the data format.
   The `any` format allows the convolution primitive to choose the data format
   that is most suitable for its input parameters (convolution kernel
   sizes, strides, padding, and so on). If the resulting format is different
   from `nchw`, the user data must be transformed to the format required for
   the convolution (as explained below).
~~~cpp
auto user_src_memory = memory({ { { conv1_src_tz }, memory::data_type::f32,
    memory::format::nchw }, cpu_engine}, user_src.data());
auto conv1_src_md = memory::desc({conv1_src_tz},
    memory::data_type::f32, memory::format::any);
/* similarly create conv1_weights_md and conv1_dst_md in format::any */
~~~
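As a standalone illustration of what `nchw` means for the layout of `user_src`, the sketch below computes the element count of a 4-D tensor and the linear offset of one element in a dense `nchw` buffer. It does not use the Intel MKL-DNN API, and the names `Dims4`, `element_count`, and `nchw_offset` are hypothetical helpers introduced only for this illustration. For dimensions `{ batch, 3, 227, 227 }`, `element_count` equals the size of the `user_src` vector allocated in step 4.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: logical dimensions in {n, c, h, w} order.
using Dims4 = std::array<std::size_t, 4>;

// Number of elements in a dense tensor with the given dimensions.
inline std::size_t element_count(const Dims4 &d) {
    return d[0] * d[1] * d[2] * d[3];
}

// Linear offset of element (n, c, h, w) in a dense nchw buffer:
// width varies fastest, then height, then channels, then minibatch.
inline std::size_t nchw_offset(const Dims4 &d, std::size_t n, std::size_t c,
        std::size_t h, std::size_t w) {
    assert(n < d[0] && c < d[1] && h < d[2] && w < d[3]);
    return ((n * d[1] + c) * d[2] + h) * d[3] + w;
}
```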

6. Create a convolution descriptor by specifying the algorithm
   ([convolution algorithms](@ref winograd_convolution)), propagation
   kind, shapes of input, weights, bias, output, convolution strides,
   padding, and kind of padding. Propagation kind is set to *forward_inference*,
   which is optimized for inference execution and omits computations that are
   only necessary for backward propagation.
~~~cpp
auto conv1_desc = convolution_forward::desc(
    prop_kind::forward_inference, algorithm::convolution_direct,
    conv1_src_md, conv1_weights_md, conv1_bias_md, conv1_dst_md,
    conv1_strides, conv1_padding, padding_kind::zero);
~~~
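The shapes passed to the convolution descriptor have to be mutually consistent. The sketch below shows the standard relation between input size, kernel size, stride, and padding along one spatial dimension. The function name `conv_out_size` is hypothetical and not part of the library. As an assumption not taken from the text above, the usual AlexNet first-layer values are a kernel size of 11 and a stride of 4 with no padding; with those values and the 227-wide input, `conv_out_size(227, 11, 4, 0)` evaluates to 55.

```cpp
#include <cassert>

// Output extent of a convolution along one spatial dimension, assuming
// symmetric zero padding and no dilation.
inline int conv_out_size(int in, int kernel, int stride, int pad) {
    assert(stride > 0 && kernel > 0 && in + 2 * pad >= kernel);
    return (in + 2 * pad - kernel) / stride + 1;
}
```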

7. Create a descriptor of the convolution primitive. Once created, this
   descriptor has specific formats instead of the `any` format specified
   in the convolution descriptor.
~~~cpp
auto conv1_prim_desc = convolution_forward::primitive_desc(conv1_desc, cpu_engine);
~~~

8. Create a convolution memory primitive from the user memory and check whether the user
   data format differs from the format that the convolution requires. In
   case it is different, create a reorder primitive that transforms the user data
   to the convolution format and add it to the net. Repeat this process for weights as well.
~~~cpp
auto conv1_src_memory = user_src_memory;

/* Check whether a reorder is necessary */
if (memory::primitive_desc(conv1_prim_desc.src_primitive_desc())
        != user_src_memory.get_primitive_desc()) {
    /* Yes, a reorder is necessary */

    /* The convolution primitive descriptor contains the descriptor of a memory
     * primitive it requires as input. Because a pointer to the allocated
     * memory is not specified, Intel MKL-DNN allocates the memory. */
    conv1_src_memory = memory(conv1_prim_desc.src_primitive_desc());

    /* create a reorder between user and convolution data and put the reorder
     * into the net. The conv1_src_memory will be the input for the convolution */
    net.push_back(reorder(user_src_memory, conv1_src_memory));
}
~~~
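The check-then-reorder pattern of step 8 can be illustrated without the library. The following sketch assumes a simplified tensor with only two possible layouts, `nchw` and `nhwc`. The format that the convolution primitive actually selects may be a different one (for example a blocked layout), so this only shows the idea of transforming data when, and only when, the formats differ. `Layout`, `Tensor`, and `ensure_layout` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for a memory format tag and a memory object.
enum class Layout { nchw, nhwc };

struct Tensor {
    std::size_t n, c, h, w; // logical dimensions
    Layout layout;          // physical ordering of `data`
    std::vector<float> data;
};

// Returns a copy of `src` as-is when it already has the wanted layout;
// otherwise returns a copy with the same logical contents rearranged into
// the wanted layout. `reordered` reports whether elements were rearranged.
inline Tensor ensure_layout(const Tensor &src, Layout wanted, bool &reordered) {
    assert(src.data.size() == src.n * src.c * src.h * src.w);
    reordered = (src.layout != wanted);
    if (!reordered) return src;

    Tensor dst{src.n, src.c, src.h, src.w, wanted,
            std::vector<float>(src.data.size())};
    for (std::size_t n = 0; n < src.n; ++n)
    for (std::size_t c = 0; c < src.c; ++c)
    for (std::size_t h = 0; h < src.h; ++h)
    for (std::size_t w = 0; w < src.w; ++w) {
        const std::size_t nchw = ((n * src.c + c) * src.h + h) * src.w + w;
        const std::size_t nhwc = ((n * src.h + h) * src.w + w) * src.c + c;
        if (wanted == Layout::nhwc)
            dst.data[nhwc] = src.data[nchw]; // nchw -> nhwc
        else
            dst.data[nchw] = src.data[nhwc]; // nhwc -> nchw
    }
    return dst;
}
```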

9. Create a memory primitive for output.
~~~cpp
auto conv1_dst_memory = memory(conv1_prim_desc.dst_primitive_desc());
~~~

10. Create a convolution primitive and add it to the net.
~~~cpp
/* Note that the conv_reorder_src primitive
 * is an input dependency for the convolution primitive, which means that the
 * convolution primitive will not be executed before the data is ready. */
net.push_back(convolution_forward(conv1_prim_desc, conv1_src_memory, conv1_weights_memory,
                                  user_bias_memory, conv1_dst_memory));
~~~

11. Create a ReLU primitive. For better performance, keep the input data of ReLU
    (and of any other operation primitive, until another convolution or
    inner product is encountered) in the same format that the convolution chose.
    Furthermore, ReLU is executed in-place by using the conv1 destination memory.
~~~cpp
auto relu1_desc = eltwise_forward::desc(prop_kind::forward_inference,
    algorithm::eltwise_relu, conv1_dst_memory.get_primitive_desc().desc(), negative1_slope);
auto relu1_prim_desc = eltwise_forward::primitive_desc(relu1_desc, cpu_engine);
net.push_back(eltwise_forward(relu1_prim_desc, conv1_dst_memory, conv1_dst_memory));
~~~
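In-place execution means that the source and the destination refer to the same buffer, which is why `conv1_dst_memory` appears twice in the call above. A minimal standalone sketch of the element-wise operation follows, assuming the common definition of ReLU with a negative slope (positive values pass through, negative values are scaled). The helper name `relu_inplace` is hypothetical, and this is not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Element-wise ReLU with a negative slope that overwrites its input buffer:
// negative values are multiplied by `negative_slope`, all others are kept.
inline void relu_inplace(float *data, std::size_t count, float negative_slope) {
    assert(data != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        if (data[i] < 0.0f) data[i] *= negative_slope;
}
```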

12. For training execution, pooling requires a private workspace memory to perform
    the backward pass. However, pooling should not use a workspace for inference,
    as this is detrimental to performance.
~~~cpp
/* create pooling indices memory from pooling primitive descriptor */
// auto pool1_indices_memory = memory(pool1_pd.workspace_primitive_desc());
auto pool1_dst_memory = memory(pool1_pd.dst_primitive_desc());

/* create pooling primitive and add it to net */
net.push_back(pooling_forward(pool1_pd, lrn1_dst_memory, pool1_dst_memory
    /* pool1_indices_memory */));
~~~
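To see why the workspace matters only for training, consider what it holds for max pooling: the position of each window's maximum, which a backward pass would use to route gradients. The sketch below is a simplified 1-D version written for illustration only. `max_pool_1d` and its optional `workspace` parameter are hypothetical and do not mirror the library's interface. Passing no workspace skips the index bookkeeping, which is the inference case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D max pooling without padding. When `workspace` is non-null, the index of
// each window's maximum is recorded, which is what a backward pass would need.
// Passing nullptr skips that bookkeeping, as appropriate for inference.
inline std::vector<float> max_pool_1d(const std::vector<float> &src,
        std::size_t kernel, std::size_t stride,
        std::vector<std::size_t> *workspace = nullptr) {
    assert(kernel > 0 && stride > 0 && src.size() >= kernel);
    const std::size_t out = (src.size() - kernel) / stride + 1;
    std::vector<float> dst(out);
    if (workspace) workspace->assign(out, 0);
    for (std::size_t o = 0; o < out; ++o) {
        std::size_t best = o * stride; // index of the current window maximum
        for (std::size_t k = 1; k < kernel; ++k)
            if (src[o * stride + k] > src[best]) best = o * stride + k;
        dst[o] = src[best];
        if (workspace) (*workspace)[o] = best;
    }
    return dst;
}
```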
The example continues to create more layers according to
the AlexNet topology.

13. Finally, create a stream to execute the weights data transformation. This is only
    required once. Create another stream that will execute the 'net' primitives. For
    this example, the net is executed multiple times and each execution is timed
    individually.
~~~cpp
/* Weight transformation - executed once */
stream(stream::kind::eager).submit(net_weights).wait();

/* Execute the topology */
mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
~~~
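Timing each execution individually can be sketched with `std::chrono`. The helper below is a hypothetical illustration and not the code used in simple_net.cpp: `time_runs` and its `run_once` callback are assumed names, and `run_once` stands in for submitting `net` to a stream and waiting for completion, as in the snippet above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `run_once` the requested number of times and returns the wall-clock
// duration of each individual call in milliseconds.
inline std::vector<double> time_runs(const std::function<void()> &run_once,
        int times) {
    assert(times >= 0);
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(times));
    for (int i = 0; i < times; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run_once();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(
                std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}
```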
---

[Legal information](@ref legal_information)