inference-engine/thirdparty/mkl-dnn/doc/mainpage.md

   1 A Performance Library for Deep Learning
   2 ================
   3
   4 The Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) is an
   5 open source performance library for Deep Learning (DL) applications intended
   6 for acceleration of DL frameworks on Intel(R) architecture. Intel MKL-DNN
   7 includes highly vectorized and threaded building blocks for implementation of
   8 convolutional neural networks (CNNs) and recurrent neural networks (RNNs) with
   9 C and C++ interfaces. This project was created to help the DL community innovate
  10 on the Intel(R) processor family.
  11
  12 The library provides optimized implementations for the most common
  13 computational functions (also called primitives) used in deep neural
  14 networks covering a wide range of applications, including image recognition,
  15 object detection, semantic segmentation, neural machine translation,
  16 and speech recognition.
  17 The table below summarizes the list of supported functions and their variants.
  18
  19 | Primitive class   | Primitive                | fp32 training | fp32 inference | int8 inference |
  20 | :---------------- | :----------------------- | :-----------: | :------------: | :------------: |
  21 | Convolution       | 1D direct convolution    | x             | x              |                |
  22 |                   | 2D direct convolution    | x             | x              | x              |
  23 |                   | 2D direct deconvolution  | x             | x              | x              |
  24 |                   | 2D Winograd convolution  | x             | x              | x              |
  25 |                   | 3D direct convolution    | x             | x              |                |
  26 |                   | 3D direct deconvolution  | x             | x              |                |
  27 | Inner Product     | 2D inner product         | x             | x              | x              |
  28 |                   | 3D inner product         | x             | x              |                |
  29 | RNN               | Vanilla RNN              | x             | x              |                |
  30 |                   | LSTM                     | x             | x              | x              |
  31 |                   | GRU                      | x             | x              |                |
  32 | Pooling           | 2D maximum pooling       | x             | x              | x              |
  33 |                   | 2D average pooling       | x             | x              | x              |
  34 |                   | 3D maximum pooling       | x             | x              |                |
  35 |                   | 3D average pooling       | x             | x              |                |
  36 | Normalization     | 2D LRN (within channel)  | x             | x              |                |
  37 |                   | 2D LRN (across channels) | x             | x              |                |
  38 |                   | 2D batch normalization   | x             | x              |                |
  39 |                   | 3D batch normalization   | x             | x              |                |
  40 | Activation and    | ReLU                     | x             | x              | x              |
  41 | elementwise       | Tanh                     | x             | x              |                |
  42 | functions         | ELU                      | x             | x              |                |
  43 |                   | Square                   | x             | x              |                |
  44 |                   | Sqrt                     | x             | x              |                |
  45 |                   | Abs                      | x             | x              |                |
  46 |                   | Linear                   | x             | x              |                |
  47 |                   | Bounded ReLU             | x             | x              |                |
  48 |                   | Soft ReLU                | x             | x              |                |
  49 |                   | Logistic                 | x             | x              |                |
  50 |                   | Softmax                  | x             | x              |                |
  51 | Data manipulation | Reorder/quantization     | x             | x              | x              |
  52 |                   | Sum                      | x             | x              | x              |
  53 |                   | Concat                   | x             | x              | x              |
  54 |                   | Shuffle                  | x             | x              | x              |
  55
  56 ## Programming Model
  57
  58 Intel MKL-DNN models memory as a primitive similar to an operation
  59 primitive.  This allows reconstruction of the graph of computations
  60 at run time.
  61
  62 ### Basic Terminology
  63
  64 Intel MKL-DNN operates on the following main objects:
  65
  66 * **Primitive** - any operation, such as convolution, data format reordering, and even
  67   memory. Primitives can have other primitives as inputs, but can have only
  68   memory primitives as outputs.
  69
  70 * **Engine** - an execution device. Currently the only supported engine is CPU.
  71   Every primitive is mapped to a specific engine.
  72
  73 * **Stream** - an execution context. You submit primitives to a stream and
  74   wait for their completion. Primitives submitted to the same stream can have
  75   different engines. The stream also tracks dependencies between the primitives.
  76
  77 A typical workflow is to create a set of primitives to run,
  78 push them to a stream all at once or one at a time, and wait for completion.
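
The workflow above can be sketched with a toy model. The `Engine`, `Primitive`, and `Stream` types below are hypothetical stand-ins written for this illustration; they are not the Intel MKL-DNN API, and the toy stream simply runs primitives in submission order instead of tracking dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical stand-ins, not the Intel MKL-DNN API.
enum class Engine { cpu };

struct Primitive {
    std::string name;
    Engine engine;              // every primitive is mapped to one engine
    std::function<void()> run;  // the work performed when the primitive executes
};

class Stream {
public:
    // Queue one or several primitives in submission order.
    Stream &submit(std::vector<Primitive> batch) {
        for (auto &p : batch) pending_.push_back(std::move(p));
        return *this;
    }
    // Execute everything queued so far and return how many primitives ran.
    std::size_t wait() {
        std::size_t count = pending_.size();
        for (auto &p : pending_) {
            assert(static_cast<bool>(p.run));  // a primitive must carry work
            p.run();
        }
        pending_.clear();
        return count;
    }
private:
    std::vector<Primitive> pending_;
};

// Builds a two-step "network", runs it, and returns the execution order.
std::vector<std::string> run_toy_net() {
    std::vector<std::string> log;
    Primitive conv{"conv", Engine::cpu, [&log] { log.push_back("conv"); }};
    Primitive relu{"relu", Engine::cpu, [&log] { log.push_back("relu"); }};
    Stream s;
    s.submit({conv});  // one at a time ...
    s.submit({relu});  // ... or all at once with s.submit({conv, relu})
    s.wait();
    return log;
}
```

Calling `run_toy_net()` returns the primitive names in the order the stream executed them.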
  79
  80 ### Creating Primitives
  81
  82 In Intel MKL-DNN, creating primitives involves three levels of
  83 abstraction:
  84
  85 * **Operation/memory descriptor** - a high-level description with logical
  86   parameters of an operation or memory. It is a lightweight structure that does
  87   not allocate any physical memory or computation resources.
  88
  89 * **Primitive descriptor** - a complete description of a primitive that contains
  90   an operation descriptor, descriptors of primitive inputs and outputs, and the
  91   target engine. This permits future API extensions to enable querying the descriptor
  92   for estimated performance, memory consumption, and so on. A primitive
  93   descriptor is also a lightweight structure.
  94
  95 * **Primitive** - a specific instance of a primitive created using the
  96   corresponding primitive descriptor. A primitive structure contains pointers to
  97   input primitives and output memory. Creating a primitive is a potentially
  98   expensive operation because when a primitive is created, Intel MKL-DNN
  99   allocates the necessary resources to execute the primitive.
 100
 101 To create a memory primitive:
 102
 103 1. Create a memory descriptor. The memory descriptor contains the dimensions, precision, and
 104    format of the data layout in memory. The data layout can be either user-specified
 105    or set to `any`. The `any` format allows the operation primitives
 106    (convolution and inner product) to choose the best memory format for optimal
 107    performance.
 108 2. Create a memory primitive descriptor. The memory primitive descriptor contains the memory
 109    descriptor and the
 110    target engine.
 111 3. Create a memory primitive. The memory primitive requires allocating a memory buffer and
 112    attaching the data handle to the memory primitive descriptor.
 113    **Note:** In the C++ API, when creating an output memory primitive, you
 114    do not need to allocate a buffer unless the output is needed in a
 115    user-defined format.
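
The three steps for a memory primitive can be sketched as follows. All type and function names here (`MemoryDesc`, `MemoryPrimitiveDesc`, `Memory`, `make_user_memory`) are hypothetical and chosen for illustration; the real library types and constructors differ, so treat this as a model of the layering and not as API usage.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, not the Intel MKL-DNN API.
enum class DataType { f32 };
enum class Format { any, nchw };
enum class Engine { cpu };

// Step 1: logical description (dimensions, precision, layout); allocates nothing.
struct MemoryDesc {
    std::vector<std::size_t> dims;
    DataType type;
    Format format;
    std::size_t elements() const {
        return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                               std::multiplies<std::size_t>());
    }
};

// Step 2: memory descriptor plus target engine; still a lightweight structure.
struct MemoryPrimitiveDesc {
    MemoryDesc desc;
    Engine engine;
};

// Step 3: the memory primitive, bound to a data handle.
struct Memory {
    MemoryPrimitiveDesc pd;
    float *handle;  // user-owned buffer; this toy model never frees it
};

Memory make_user_memory(std::vector<float> &buffer, std::vector<std::size_t> dims) {
    MemoryDesc md{std::move(dims), DataType::f32, Format::nchw};
    assert(buffer.size() == md.elements());  // buffer must match the described shape
    MemoryPrimitiveDesc pd{md, Engine::cpu};
    return Memory{pd, buffer.data()};
}
```

Only the last step touches a buffer, which mirrors the point that descriptors stay cheap to create and copy.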
 116
 117 To create an operation primitive:
 118
 119 1. Create a logical description of the operation. For example, the description
 120    of a convolution operation contains parameters such as sizes, strides, and
 121    propagation type. It also contains the input and output memory descriptors.
 122 2. Create a primitive descriptor by attaching the target engine to the logical
 123    description.
 124 3. Create an instance of the primitive and specify the input and output
 125    primitives.
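
A minimal sketch of the same three levels for an operation, under the assumption that attaching an engine is the point where an `any` layout gets resolved. The types (`ConvDesc`, `ConvPrimitiveDesc`, `Conv`) and the `blocked` layout name are hypothetical and exist only in this example; they are not the Intel MKL-DNN API.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, not the Intel MKL-DNN API.
enum class Format { any, nchw, blocked };
enum class Engine { cpu };
enum class PropKind { forward_training, forward_inference };

struct MemoryDesc {
    std::vector<std::size_t> dims;
    Format format;
};

// Step 1: logical description of the operation, including memory descriptors.
struct ConvDesc {
    PropKind prop;
    MemoryDesc src, weights, dst;
    std::size_t stride;
};

// Step 2: attaching the engine fixes the implementation and resolves `any`.
struct ConvPrimitiveDesc {
    ConvDesc desc;
    Engine engine;
    ConvPrimitiveDesc(ConvDesc d, Engine e) : desc(std::move(d)), engine(e) {
        // Assumption for this sketch: the engine prefers the `blocked` layout.
        for (MemoryDesc *md : {&desc.src, &desc.weights, &desc.dst})
            if (md->format == Format::any) md->format = Format::blocked;
    }
};

// Step 3: the primitive instance, bound to concrete inputs and outputs.
struct Conv {
    ConvPrimitiveDesc pd;
    const float *src;
    const float *weights;
    float *dst;
};

ConvPrimitiveDesc describe_conv() {
    ConvDesc d{PropKind::forward_inference,
               {{1, 3, 8, 8}, Format::nchw},  // src: user-specified layout
               {{4, 3, 3, 3}, Format::any},   // weights: let the primitive choose
               {{1, 4, 6, 6}, Format::any},   // dst: let the primitive choose
               1};
    return ConvPrimitiveDesc(d, Engine::cpu);
}

Conv make_conv(const ConvPrimitiveDesc &pd, const float *src,
               const float *weights, float *dst) {
    assert(src != nullptr && weights != nullptr && dst != nullptr);
    return Conv{pd, src, weights, dst};
}
```

After `describe_conv()`, the source keeps its user-specified layout, while the weights and destination carry whatever layout the primitive descriptor selected.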
 126
 127 ## Examples
 128
 129 A walk-through example for implementing an AlexNet topology using the C++ API:
 130
 131 * [SimpleNet Example](@ref ex_simplenet)
 132
 133 An introductory example to low-precision 8-bit computations:
 134
 135 * [Int8 SimpleNet Example](@ref ex_int8_simplenet)
 136
 137 The following examples are available in the /examples directory and provide more details about the API.
 138 * Creation of forward primitives
 139     - C: simple_net.c
 140     - C++: simple_net.cpp
 141
 142 * Creation of full training net (forward and backward primitives)
 143     - C: simple_training.c
 144     - C++: simple_training_net.cpp
 145
 146 * Creation of forward propagation of GNMT topology
 147     - C++: simple_rnn.cpp
 148
 149 * Training RNN with sequences of variable length
 150     - C++: simple_rnn_training.cpp
 151
 152 ### Performance Considerations
 153
 154 *  Convolution and inner product primitives choose the memory format when you create them with the unspecified memory
 155    format `any` for input or output.
 156    The chosen memory format depends on factors such as the hardware and the
 157    convolution parameters.
 158 *  Convolution can be executed using the [Winograd algorithm](@ref winograd_convolution), which can provide a significant performance boost.
 159 *  Operation primitives (such as ReLU, LRN, or pooling) that follow a convolution or
 160    inner product should have their input in the same memory format as the
 161    convolution or inner product. Reordering can be an expensive
 162    operation, so you should avoid it unless it is necessary for performance in
 163    convolution, inner product, or output specifications.
 164 *  Pooling, concat, and sum can be created with the output memory format `any`.
 165 *  An operation primitive (typically operations such as pooling, LRN, or softmax)
 166    might need workspace memory for storing results of intermediate operations
 167    that help with backward propagation.
 168
 169
 170 The following link provides a guide to the Intel MKL-DNN verbose mode for profiling execution:
 171
 172 * [Performance profiling](@ref perf_profile)
 173
 174 ### Operational Details
 175
 176 *  You might need to create a reorder primitive to convert the data from a user
 177    format to the format preferred by convolution or inner product.
 178 *  All operations should be queried for workspace memory requirements.
 179    If workspace is needed, it should only be created during the forward
 180    propagation and then shared with the corresponding primitive on
 181    backward propagation.
 182 *  A primitive descriptor from forward propagation must be provided when
 183    creating the corresponding primitive descriptor for backward propagation. This
 184    tells the backward operation exactly which implementation was chosen for
 185    the primitive on forward propagation. This in turn helps the backward operation
 186    decode the workspace memory correctly.
 187 *  You should always check the correspondence between the current data format and
 188    the format that is required by a primitive. For instance, forward convolution
 189    and backward convolution with respect to source might choose different memory
 190    formats for weights (if created with `any`). In this case, you should create
 191    a reorder primitive for weights.
 192    Similarly, a reorder primitive might be required for source data between
 193    forward convolution and backward convolution with respect to weights.
 194
 195    **Note:** Refer to the extended examples, which illustrate these details.
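
The format check described above can be sketched as a helper that reorders only when the current layout differs from the required one. `Tensor`, `Format`, and `reorder_if_needed` are hypothetical names for this illustration and not part of the Intel MKL-DNN API; the sketch handles just two plain layouts, `nchw` and `nhwc`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical names, not the Intel MKL-DNN API.
enum class Format { nchw, nhwc };

struct Tensor {
    std::size_t n, c, h, w;
    Format format;
    std::vector<float> data;
};

// Returns `src` in the `wanted` layout. When the formats already match,
// *reordered is set to false and an unchanged copy is returned.
Tensor reorder_if_needed(const Tensor &src, Format wanted, bool *reordered) {
    assert(src.data.size() == src.n * src.c * src.h * src.w);
    *reordered = (src.format != wanted);
    if (!*reordered) return src;

    Tensor dst{src.n, src.c, src.h, src.w, wanted,
               std::vector<float>(src.data.size())};
    for (std::size_t n = 0; n < src.n; ++n)
        for (std::size_t c = 0; c < src.c; ++c)
            for (std::size_t h = 0; h < src.h; ++h)
                for (std::size_t w = 0; w < src.w; ++w) {
                    // Linear offsets of element (n, c, h, w) in each layout.
                    std::size_t nchw = ((n * src.c + c) * src.h + h) * src.w + w;
                    std::size_t nhwc = ((n * src.h + h) * src.w + w) * src.c + c;
                    if (wanted == Format::nhwc) dst.data[nhwc] = src.data[nchw];
                    else dst.data[nchw] = src.data[nhwc];
                }
    return dst;
}
```

The `reordered` flag makes the cost visible to the caller, which is useful when deciding whether a layout mismatch between forward and backward primitives is worth paying for.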
 196
 197 ### Auxiliary Types
 198
 199 * **Primitive_at** - a structure that contains a primitive and an index. This
 200   structure specifies which output of the primitive to use as an input for
 201   another primitive. For a memory primitive, the index is always `0`
 202   because it does not have an output.
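
As a rough illustration of the idea, the structure can be modeled as a pointer to a primitive paired with an output index that defaults to `0`. The `Primitive`, `PrimitiveAt`, and `describe` names below are hypothetical and serve only this sketch; they are not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model only: hypothetical names, not the Intel MKL-DNN API.
struct Primitive {
    std::string name;
};

struct PrimitiveAt {
    const Primitive *primitive;
    std::size_t output_index;
    // The index defaults to 0, the value always used for a memory primitive.
    PrimitiveAt(const Primitive &p, std::size_t index = 0)
        : primitive(&p), output_index(index) {}
};

// Formats an input reference as "<primitive name>:<output index>".
std::string describe(const PrimitiveAt &at) {
    assert(at.primitive != nullptr);
    return at.primitive->name + ":" + std::to_string(at.output_index);
}
```

Because the constructor is not `explicit`, a primitive can be passed directly wherever a `PrimitiveAt` is expected, and an index is spelled out only when a specific output is needed.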
 203
 204
 205 ## Architecture and Design of Intel MKL-DNN
 206
 207 To better understand the architecture and design of Intel MKL-DNN,
 208 as well as the concepts used in the library, read the following
 209 topic:
 210
 211 [Understanding Memory Formats](@ref understanding_memory_formats)
 212
 213
 214 --------
 215
 216 [Legal information](@ref legal_information)