SimpleNet Example {#ex_simplenet}
================================
This C++ API example demonstrates how to build an AlexNet neural
network topology for forward-pass inference. Some key take-aways include:

* How tensors are implemented and submitted to primitives.
* How primitives are created.
* How primitives are sequentially submitted to the network, where the output from
  primitives is passed as input to the next primitive. The latter specifies
  a dependency between primitive input and output data.
* Specific 'inference-only' configurations.
* Limiting the number of reorders performed, which are detrimental to performance.
The simple_net.cpp example implements the AlexNet layers
as numbered primitives (e.g., conv1, pool1, conv2).
## Highlights for implementing the simple_net.cpp Example:
1. Initialize a CPU engine. The last parameter in the engine() call represents the
index of the engine of the specified kind.

```cpp
using namespace mkldnn;
auto cpu_engine = engine(engine::cpu, 0);
```
2. Create a primitives vector that represents the net.

```cpp
std::vector<primitive> net;
```
3. Additionally, create a separate vector holding the weights. This allows
executing the weight transformations only once, outside the topology stream.

```cpp
std::vector<primitive> net_weights;
```
4. Allocate a vector for the input data and create the tensor to configure the dimensions.

```cpp
memory::dims conv1_src_tz = { batch, 3, 227, 227 };
std::vector<float> user_src(batch * 3 * 227 * 227);
/* similarly, specify tensor structure for output, weights and bias */
```
5. Create a memory primitive for the data in the user's format, `nchw`
(minibatch-channels-height-width). Create a memory descriptor
for the convolution input, selecting `any` for the data format.
The `any` format allows the convolution primitive to choose the data format
that is most suitable for its input parameters (convolution kernel
sizes, strides, padding, and so on). If the resulting format is different
from `nchw`, the user data must be transformed to the format required for
the convolution (as explained below).

```cpp
auto user_src_memory = memory({ { { conv1_src_tz }, memory::data_type::f32,
        memory::format::nchw }, cpu_engine }, user_src.data());
auto conv1_src_md = memory::desc({ conv1_src_tz },
        memory::data_type::f32, memory::format::any);
/* similarly create conv1_weights_md and conv1_dst_md in format::any */
```
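The elided descriptors follow the same pattern as `conv1_src_md`. A sketch only, assuming `conv1_weights_tz`, `conv1_bias_tz`, and `conv1_dst_tz` are `memory::dims` vectors defined alongside `conv1_src_tz`:

```cpp
/* Sketch: same pattern as conv1_src_md above; the *_tz dimension
 * vectors are assumed to have been defined in step 4. */
auto conv1_weights_md = memory::desc({ conv1_weights_tz },
        memory::data_type::f32, memory::format::any);
auto conv1_bias_md = memory::desc({ conv1_bias_tz },
        memory::data_type::f32, memory::format::any);
auto conv1_dst_md = memory::desc({ conv1_dst_tz },
        memory::data_type::f32, memory::format::any);
```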
6. Create a convolution descriptor by specifying the algorithm
(see [convolution algorithms](@ref winograd_convolution)), the propagation
kind, the shapes of input, weights, bias, and output, the convolution strides,
padding, and kind of padding. The propagation kind is set to *forward_inference*,
which is optimized for inference execution and omits computations that are only
necessary for backward propagation.

```cpp
auto conv1_desc = convolution_forward::desc(
        prop_kind::forward_inference, algorithm::convolution_direct,
        conv1_src_md, conv1_weights_md, conv1_bias_md, conv1_dst_md,
        conv1_strides, conv1_padding, padding_kind::zero);
```
7. Create a primitive descriptor for the convolution. Once created, this
descriptor has specific formats instead of the `any` format specified
in the convolution descriptor.

```cpp
auto conv1_prim_desc = convolution_forward::primitive_desc(conv1_desc, cpu_engine);
```
8. Create a convolution memory primitive from the user memory and check whether the
user data format differs from the format that the convolution requires. If
it is different, create a reorder primitive that transforms the user data
to the convolution format and add it to the net. Repeat this process for the weights.

```cpp
auto conv1_src_memory = user_src_memory;

/* Check whether a reorder is necessary */
if (memory::primitive_desc(conv1_prim_desc.src_primitive_desc())
        != user_src_memory.get_primitive_desc()) {
    /* Yes, a reorder is necessary */

    /* The convolution primitive descriptor contains the descriptor of a memory
     * primitive it requires as input. Because a pointer to the allocated
     * memory is not specified, Intel MKL-DNN allocates the memory. */
    conv1_src_memory = memory(conv1_prim_desc.src_primitive_desc());

    /* Create a reorder between the user and convolution data and put the reorder
     * into the net. conv1_src_memory will be the input for the convolution. */
    net.push_back(reorder(user_src_memory, conv1_src_memory));
}
```
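The weights check looks the same; the one difference worth sketching is that weights do not change between executions, so their reorder goes into the `net_weights` vector from step 3 rather than into `net` (`user_weights_memory` is an assumed name for the user's weights memory primitive):

```cpp
/* Sketch: weights reorder, assuming a user_weights_memory primitive exists. */
auto conv1_weights_memory = user_weights_memory;
if (memory::primitive_desc(conv1_prim_desc.weights_primitive_desc())
        != user_weights_memory.get_primitive_desc()) {
    conv1_weights_memory = memory(conv1_prim_desc.weights_primitive_desc());
    /* Weights are constant across executions, so this reorder is submitted
     * once via net_weights instead of with every net run. */
    net_weights.push_back(reorder(user_weights_memory, conv1_weights_memory));
}
```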
9. Create a memory primitive for the output.

```cpp
auto conv1_dst_memory = memory(conv1_prim_desc.dst_primitive_desc());
```
10. Create a convolution primitive and add it to the net.

```cpp
/* Note that the src reorder created above is an input dependency for the
 * convolution primitive, which means that the convolution primitive will
 * not be executed before the data is ready. */
net.push_back(convolution_forward(conv1_prim_desc, conv1_src_memory,
        conv1_weights_memory, user_bias_memory, conv1_dst_memory));
```
11. Create a ReLU primitive. For better performance, keep the ReLU input data
in the same format as the one chosen by the convolution; the same applies to
subsequent operation primitives until another convolution or inner product is
encountered. Furthermore, ReLU is done in place by reusing the conv1 memory.

```cpp
auto relu1_desc = eltwise_forward::desc(prop_kind::forward_inference,
        algorithm::eltwise_relu, conv1_dst_memory.get_primitive_desc().desc(),
        negative1_slope);
auto relu1_prim_desc = eltwise_forward::primitive_desc(relu1_desc, cpu_engine);
net.push_back(eltwise_forward(relu1_prim_desc, conv1_dst_memory, conv1_dst_memory));
```
12. For training execution, pooling requires a private workspace memory to perform
the backward pass. However, pooling should not use a workspace for inference,
as this is detrimental to performance.

```cpp
/* create pooling indices memory from pooling primitive descriptor */
// auto pool1_indices_memory = memory(pool1_pd.workspace_primitive_desc());
auto pool1_dst_memory = memory(pool1_pd.dst_primitive_desc());

/* create pooling primitive and add it to the net */
net.push_back(pooling_forward(pool1_pd, lrn1_dst_memory, pool1_dst_memory
        /* pool1_indices_memory */));
```
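The `pool1_pd` used above is created like any other primitive descriptor. A sketch, where the 3x3 kernel and stride 2 are assumptions based on the standard AlexNet pool1, and `pool1_dst_tz` is an assumed output dimension vector:

```cpp
/* Sketch: max-pooling primitive descriptor (values assumed from AlexNet). */
memory::dims pool1_kernel = { 3, 3 };
memory::dims pool1_strides = { 2, 2 };
memory::dims pool1_padding = { 0, 0 };
auto pool1_dst_md = memory::desc({ pool1_dst_tz },
        memory::data_type::f32, memory::format::any);
auto pool1_desc = pooling_forward::desc(prop_kind::forward_inference,
        algorithm::pooling_max, lrn1_dst_memory.get_primitive_desc().desc(),
        pool1_dst_md, pool1_strides, pool1_kernel, pool1_padding,
        pool1_padding, padding_kind::zero);
auto pool1_pd = pooling_forward::primitive_desc(pool1_desc, cpu_engine);
```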
The example continues to create more layers according to
the AlexNet topology.
13. Finally, create a stream to execute the weights data transformation. This is
only required once. Create another stream that will execute the 'net' primitives. In
this example, the net is executed multiple times and each execution is timed.

```cpp
/* Weight transformation - executed once */
stream(stream::kind::eager).submit(net_weights).wait();

/* Execute the topology */
mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait();
```
[Legal information](@ref legal_information)