| | 3D direct deconvolution | x | x | |
| Inner Product | 2D inner product | x | x | x |
| | 3D inner product | x | x | |
-| RNN (experimental)| Vanilla RNN cell | x | x | |
-| | LSTM cell | x | x | |
-| | GRU cell | x | x | |
+| RNN | Vanilla RNN | x | x | |
+| | LSTM | x | x | x |
+| | GRU | x | x | |
| Pooling | 2D maximum pooling | x | x | x |
| | 2D average pooling | x | x | x |
| | 3D maximum pooling | x | x | |
| Normalization | 2D LRN (within channel) | x | x | |
| | 2D LRN (across channels) | x | x | |
| | 2D batch normalization | x | x | |
-| | 3D Batch Normalization | x | x | |
-| Activation | ReLU | x | x | x |
-| | Tanh | | x | |
-| | ELU | | x | |
-| | Bounded ReLU | | x | |
-| | Soft ReLU | | x | |
-| | Logistic regression | | x | |
+| | 3D batch normalization | x | x | |
+| Activation and | ReLU | x | x | x |
+| elementwise | Tanh | x | x | |
+| functions | ELU | x | x | |
+| | Square | x | x | |
+| | Sqrt | x | x | |
+| | Abs | x | x | |
+| | Linear | x | x | |
+| | Bounded ReLU | x | x | |
+| | Soft ReLU | x | x | |
+| | Logistic | x | x | |
| | Softmax | x | x | |
| Data manipulation | Reorder/quantization | x | x | x |
| | Sum | x | x | x |
| | Concat | x | x | x |
-| | Elementwise operations | | x | |
-| | Channel Shuffle | x | x | x |
+| | Shuffle | x | x | x |
## Programming Model
- C: simple_training.c
- C++: simple_training_net.cpp
-* Creation of forward propagation of GNMT topology (experimental support)
+* Creation of forward propagation of GNMT topology
- C++: simple_rnn.cpp
* Training RNN with sequences of variable length
* Convolution and inner product primitives choose the memory format when you
create them with the memory format `any` for input or output.
The memory format chosen depends on circumstances such as the hardware and
the convolution parameters.
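As a sketch of how format `any` can be used with the MKL-DNN v0.x C++ API (the
AlexNet-like sizes below are illustrative, not taken from this document):

```cpp
#include <mkldnn.hpp>
using namespace mkldnn;

int main() {
    auto cpu_engine = engine(engine::cpu, 0);

    // Describe all memory with format 'any' so the library picks the layouts.
    memory::desc src_md({2, 3, 227, 227}, memory::data_type::f32,
            memory::format::any);
    memory::desc weights_md({96, 3, 11, 11}, memory::data_type::f32,
            memory::format::any);
    memory::desc dst_md({2, 96, 55, 55}, memory::data_type::f32,
            memory::format::any);

    // Creating the primitive descriptor resolves 'any' into concrete,
    // hardware- and parameter-dependent memory formats.
    auto conv_d = convolution_forward::desc(prop_kind::forward,
            convolution_direct, src_md, weights_md, dst_md,
            {4, 4} /* strides */, {0, 0}, {0, 0} /* padding */,
            padding_kind::zero);
    auto conv_pd = convolution_forward::primitive_desc(conv_d, cpu_engine);
    return 0;
}
```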
+* Convolution can be executed using the [Winograd algorithm](@ref winograd_convolution) for a significant performance boost.
* Operation primitives (such as ReLU, LRN, or pooling) following a convolution
or inner product should take input in the same memory format as the
convolution or inner product. Reordering can be an expensive
operation, so it should be avoided unless it is necessary for performance.
* Primitives such as pooling, LRN, and batch normalization
might need workspace memory for storing results of intermediate operations
that help with backward propagation.
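When the formats do differ, a reorder primitive converts the data. A minimal
sketch with the MKL-DNN v0.x C++ API (the `nchw`-to-`nChw8c` conversion and
the tensor sizes are illustrative assumptions):

```cpp
#include <mkldnn.hpp>
#include <vector>
using namespace mkldnn;

int main() {
    auto eng = engine(engine::cpu, 0);

    // User data lives in plain nchw; suppose a convolution expects a blocked
    // format such as nChw8c. A reorder primitive converts between the two.
    memory::dims dims = {2, 16, 13, 13};
    auto user_md = memory::desc(dims, memory::data_type::f32,
            memory::format::nchw);
    auto blocked_md = memory::desc(dims, memory::data_type::f32,
            memory::format::nChw8c);

    auto user_mem = memory({user_md, eng});
    auto blocked_mem = memory({blocked_md, eng});

    // Submit the reorder; if the formats already match, it can be skipped.
    std::vector<primitive> net;
    net.push_back(reorder(user_mem, blocked_mem));
    stream(stream::kind::eager).submit(net).wait();
    return 0;
}
```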
+
The following link provides a guide to the Intel MKL-DNN verbose mode for profiling execution:
* [Performance profiling](@ref perf_profile)