docs/tutorial/layers.md

   1 ---
   2 title: Layer Catalogue
   3 ---
   4 # Layers
   5
   6 To create a Caffe model you need to define the model architecture in a protocol buffer definition file (prototxt).
   7
   8 Caffe layers and their parameters are defined in the protocol buffer definitions for the project in [caffe.proto](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto). The latest definitions are in the [dev caffe.proto](https://github.com/BVLC/caffe/blob/dev/src/caffe/proto/caffe.proto).
   9
  10 TODO complete list of layers linking to headings
  11
  12 ### Vision Layers
  13
  14 * Header: `./include/caffe/vision_layers.hpp`
  15
  16 Vision layers usually take *images* as input and produce other *images* as output.
  17 A typical "image" in the real world may have one color channel ($$c = 1$$), as in a grayscale image, or three color channels ($$c = 3$$) as in an RGB (red, green, blue) image.
  18 But in this context, the distinguishing characteristic of an image is its spatial structure: usually an image has some non-trivial height $$h > 1$$ and width $$w > 1$$.
  19 This 2D geometry naturally lends itself to certain decisions about how to process the input.
  20 In particular, most of the vision layers work by applying a particular operation to some region of the input to produce a corresponding region of the output.
  21 In contrast, other layers (with few exceptions) ignore the spatial structure of the input, effectively treating it as "one big vector" with dimension $$chw$$.
  22
  23
  24 #### Convolution
  25
  26 * Layer type: `Convolution`
  27 * CPU implementation: `./src/caffe/layers/convolution_layer.cpp`
  28 * CUDA GPU implementation: `./src/caffe/layers/convolution_layer.cu`
  29 * Parameters (`ConvolutionParameter convolution_param`)
  30     - Required
  31         - `num_output` (`c_o`): the number of filters
  32         - `kernel_size` (or `kernel_h` and `kernel_w`): specifies height and width of each filter
  33     - Strongly recommended
  34         - `weight_filler` [default `type: 'constant' value: 0`]
  35     - Optional
  36         - `bias_term` [default `true`]: specifies whether to learn and apply a set of additive biases to the filter outputs
  37         - `pad` (or `pad_h` and `pad_w`) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  38         - `stride` (or `stride_h` and `stride_w`) [default 1]: specifies the intervals at which to apply the filters to the input
  39         - `group` (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the $$i$$th output group channels will be only connected to the $$i$$th input group channels.
  40 * Input
  41     - `n * c_i * h_i * w_i`
  42 * Output
  43     - `n * c_o * h_o * w_o`, where `h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1` and `w_o` likewise.
  44 * Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`)
  45
  46       layer {
  47         name: "conv1"
  48         type: "Convolution"
  49         bottom: "data"
  50         top: "conv1"
  51         # learning rate and decay multipliers for the filters
  52         param { lr_mult: 1 decay_mult: 1 }
  53         # learning rate and decay multipliers for the biases
  54         param { lr_mult: 2 decay_mult: 0 }
  55         convolution_param {
  56           num_output: 96     # learn 96 filters
  57           kernel_size: 11    # each filter is 11x11
  58           stride: 4          # step 4 pixels between each filter application
  59           weight_filler {
  60             type: "gaussian" # initialize the filters from a Gaussian
  61             std: 0.01        # distribution with stdev 0.01 (default mean: 0)
  62           }
  63           bias_filler {
  64             type: "constant" # initialize the biases to zero (0)
  65             value: 0
  66           }
  67         }
  68       }
  69
  70 The `Convolution` layer convolves the input image with a set of learnable filters, each producing one feature map in the output image.
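
As a rough check of the output-size formula above, the following Python sketch evaluates `h_o` for the `conv1` sample. The helper name `conv_output_size` and the 227-pixel input height are assumptions made for illustration, and floor division is assumed when the stride does not divide evenly.

```python
def conv_output_size(in_size, kernel, pad=0, stride=1):
    """Output size along one axis: (in + 2 * pad - kernel) / stride + 1."""
    # Floor division is an assumption for strides that do not divide evenly.
    return (in_size + 2 * pad - kernel) // stride + 1


# conv1 sample: kernel_size 11, stride 4, default pad 0.
# The 227-pixel input height is a hypothetical value, not taken from the sample.
h_o = conv_output_size(227, kernel=11, pad=0, stride=4)
print(h_o)  # 55
```

The same helper applies to the width by passing `w_i`, `kernel_w`, `pad_w`, and `stride_w`.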
  71
  72 #### Pooling
  73
  74 * Layer type: `Pooling`
  75 * CPU implementation: `./src/caffe/layers/pooling_layer.cpp`
  76 * CUDA GPU implementation: `./src/caffe/layers/pooling_layer.cu`
  77 * Parameters (`PoolingParameter pooling_param`)
  78     - Required
  79         - `kernel_size` (or `kernel_h` and `kernel_w`): specifies the height and width of each pooling window
  80     - Optional
  81         - `pool` [default MAX]: the pooling method. Currently MAX, AVE, or STOCHASTIC
  82         - `pad` (or `pad_h` and `pad_w`) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
  83         - `stride` (or `stride_h` and `stride_w`) [default 1]: specifies the intervals at which to apply the pooling window to the input
  84 * Input
  85     - `n * c * h_i * w_i`
  86 * Output
  87     - `n * c * h_o * w_o`, where h_o and w_o are computed in the same way as convolution.
  88 * Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`)
  89
  90       layer {
  91         name: "pool1"
  92         type: "Pooling"
  93         bottom: "conv1"
  94         top: "pool1"
  95         pooling_param {
  96           pool: MAX
  97           kernel_size: 3 # pool over a 3x3 region
  98           stride: 2      # step two pixels (in the bottom blob) between pooling regions
  99         }
 100       }
 101
 102 #### Local Response Normalization (LRN)
 103
 104 * Layer type: `LRN`
 105 * CPU implementation: `./src/caffe/layers/lrn_layer.cpp`
 106 * CUDA GPU implementation: `./src/caffe/layers/lrn_layer.cu`
 107 * Parameters (`LRNParameter lrn_param`)
 108     - Optional
 109         - `local_size` [default 5]: the number of channels to sum over (for cross channel LRN) or the side length of the square region to sum over (for within channel LRN)
 110         - `alpha` [default 1]: the scaling parameter (see below)
 111         - `beta` [default 0.75]: the exponent (see below)
 112         - `norm_region` [default `ACROSS_CHANNELS`]: whether to sum over adjacent channels (`ACROSS_CHANNELS`) or nearby spatial locations (`WITHIN_CHANNEL`)
 113
 114 The local response normalization layer performs a kind of "lateral inhibition" by normalizing over local input regions. In `ACROSS_CHANNELS` mode, the local regions extend across nearby channels, but have no spatial extent (i.e., they have shape `local_size x 1 x 1`). In `WITHIN_CHANNEL` mode, the local regions extend spatially, but are in separate channels (i.e., they have shape `1 x local_size x local_size`). Each input value is divided by $$(1 + (\alpha/n) \sum_i x_i^2)^\beta$$, where $$n$$ is the size of each local region, and the sum is taken over the region centered at that value (zero padding is added where necessary).
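
The normalization described above can be sketched in plain Python for the `ACROSS_CHANNELS` case at a single spatial location. The function name `lrn_across_channels` is hypothetical, an odd `local_size` is assumed so the window can be centered, and this is an illustration of the formula rather than Caffe's implementation.

```python
def lrn_across_channels(x, local_size, alpha, beta):
    """Normalize one spatial location; x holds one value per channel."""
    half = local_size // 2  # assumes an odd local_size
    out = []
    for c, value in enumerate(x):
        # Sum of squares over the window centered at channel c.
        # Channels that fall outside the input count as zero (zero padding).
        lo, hi = max(0, c - half), min(len(x), c + half + 1)
        sum_sq = sum(v * v for v in x[lo:hi])
        scale = (1.0 + (alpha / local_size) * sum_sq) ** beta
        out.append(value / scale)
    return out


print(lrn_across_channels([1.0, 2.0, 3.0], local_size=3, alpha=1.0, beta=1.0))
```

Here `n` in the formula corresponds to `local_size`, the number of channels in each window.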
 115
 116 #### im2col
 117
 118 `Im2col` is a helper for doing the image-to-column transformation that you most likely do not need to know about. This is used in Caffe's original convolution to do matrix multiplication by laying out all patches into a matrix.
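
For readers who do want the idea, here is a minimal single-channel sketch of laying out patches so that a convolution becomes a matrix product. The names `im2col` and `conv_via_matmul` are used illustratively; each patch is stored as a row here for readability, which may not match the memory layout Caffe uses, and padding is omitted.

```python
def im2col(image, kernel, stride=1):
    """Collect every kernel x kernel patch of a 2D image as a flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for top in range(0, h - kernel + 1, stride):
        for left in range(0, w - kernel + 1, stride):
            rows.append([image[top + i][left + j]
                         for i in range(kernel) for j in range(kernel)])
    return rows


def conv_via_matmul(image, filt, stride=1):
    """Apply one square filter as a matrix-vector product over the patches."""
    flat = [v for row in filt for v in row]
    return [sum(p * f for p, f in zip(patch, flat))
            for patch in im2col(image, len(filt), stride)]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(im2col(image, kernel=2))
print(conv_via_matmul(image, [[1, 0], [0, 1]]))
```

With several filters, the flattened filters form the rows of a weight matrix, and one matrix-matrix product yields all feature maps at once.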
 119
 120 ### Loss Layers
 121
 122 Loss drives learning by comparing an output to a target and assigning a cost to be minimized. The loss itself is computed by the forward pass, and the gradient with respect to the loss is computed by the backward pass.
 123
 124 #### Softmax
 125
 126 * Layer type: `SoftmaxWithLoss`
 127
 128 The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs. It's conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient.
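
To illustrate why fusing the two steps can help numerically, the sketch below computes the loss for one example with the log-sum-exp trick. This is a general technique assumed here for illustration, not a transcription of Caffe's code, and the function name `softmax_with_loss` is hypothetical.

```python
import math


def softmax_with_loss(scores, label):
    """Multinomial logistic loss of softmax(scores) for one example."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    # -log(softmax(scores)[label]) simplifies to log_sum - scores[label].
    return log_sum - scores[label]


print(softmax_with_loss([1.0, 2.0, 3.0], label=2))
print(softmax_with_loss([1000.0, 1001.0, 1002.0], label=2))
```

Both calls give the same loss, because shifting every score by a constant leaves the softmax unchanged; a naive `exp(1002.0)` would overflow instead.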
 129
 130 #### Sum-of-Squares / Euclidean
 131
 132 * Layer type: `EuclideanLoss`
 133
 134 The Euclidean loss layer computes the sum of squares of differences of its two inputs, $$\frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2$$.
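
The formula above can be evaluated directly. In this sketch each input is a list of `N` flattened examples; the function name `euclidean_loss` is an assumption for illustration.

```python
def euclidean_loss(x1, x2):
    """1 / (2N) times the sum over examples of the squared L2 distance."""
    n = len(x1)
    total = 0.0
    for a, b in zip(x1, x2):
        total += sum((u - v) ** 2 for u, v in zip(a, b))
    return total / (2 * n)


# Two examples (N = 2) with two values each.
print(euclidean_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 4.0]]))  # 3.25
```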
 135
 136 #### Hinge / Margin
 137
 138 * Layer type: `HingeLoss`
 139 * CPU implementation: `./src/caffe/layers/hinge_loss_layer.cpp`
 140 * CUDA GPU implementation: none yet
 141 * Parameters (`HingeLossParameter hinge_loss_param`)
 142     - Optional
 143         - `norm` [default L1]: the norm used. Currently L1, L2
 144 * Inputs
 145     - `n * c * h * w` Predictions
 146     - `n * 1 * 1 * 1` Labels
 147 * Output
 148     - `1 * 1 * 1 * 1` Computed Loss
 149 * Samples
 150
 151       # L1 Norm
 152       layer {
 153         name: "loss"
 154         type: "HingeLoss"
 155         bottom: "pred"
 156         bottom: "label"
 157       }
 158
 159       # L2 Norm
 160       layer {
 161         name: "loss"
 162         type: "HingeLoss"
 163         bottom: "pred"
 164         bottom: "label"
 165         top: "loss"
 166         hinge_loss_param {
 167           norm: L2
 168         }
 169       }
 170
 171 The hinge loss layer computes a one-vs-all hinge or squared hinge loss.
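
One way to read "one-vs-all" is that every class score is treated as its own binary problem, with target +1 for the labelled class and -1 for all others. The sketch below follows that reading and averages over the batch; the function name `hinge_loss` and the averaging convention are assumptions for illustration.

```python
def hinge_loss(preds, labels, norm="L1"):
    """One-vs-all hinge (L1) or squared hinge (L2) loss, averaged over the batch."""
    total = 0.0
    for scores, label in zip(preds, labels):
        for k, s in enumerate(scores):
            target = 1.0 if k == label else -1.0
            margin = max(0.0, 1.0 - target * s)
            total += margin if norm == "L1" else margin ** 2
    return total / len(preds)


preds = [[2.0, -1.0, 0.5]]  # one example, three class scores
print(hinge_loss(preds, [0], norm="L1"))  # 1.5
print(hinge_loss(preds, [0], norm="L2"))  # 2.25
```

In this example only the third class violates the margin (its score 0.5 is above -1), so it alone contributes to the loss.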
 172
 173 #### Sigmoid Cross-Entropy
 174
 175 `SigmoidCrossEntropyLoss`
 176
 177 #### Infogain
 178
 179 `InfogainLoss`
 180
 181 #### Accuracy and Top-k
 182
 183 `Accuracy` scores the output as the accuracy of the output with respect to the target. It is not actually a loss and has no backward step.
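
As a sketch of what a top-k accuracy score means, the following Python counts an example as correct when its label is among the `k` highest-scoring classes. The function name `top_k_accuracy` and the tie-breaking rule (lower index first) are assumptions for illustration.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of examples whose label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest scores; ties keep the lower index first.
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top
    return hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [1, 1]
print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 1.0
```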
 184
 185 ### Activation / Neuron Layers
 186
 187 In general, activation / neuron layers are element-wise operators, taking one bottom blob and producing one top blob of the same size. In the layers below, we omit the input and output sizes, as they are identical:
 188
 189 * Input
 190     - n * c * h * w
 191 * Output
 192     - n * c * h * w
 193
 194 #### ReLU / Rectified-Linear and Leaky-ReLU
 195
 196 * Layer type: `ReLU`
 197 * CPU implementation: `./src/caffe/layers/relu_layer.cpp`
 198 * CUDA GPU implementation: `./src/caffe/layers/relu_layer.cu`
 199 * Parameters (`ReLUParameter relu_param`)
 200     - Optional
 201         - `negative_slope` [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0.
 202 * Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`)
 203
 204       layer {
 205         name: "relu1"
 206         type: "ReLU"
 207         bottom: "conv1"
 208         top: "conv1"
 209       }
 210
 211 Given an input value x, the `ReLU` layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, it is equivalent to the standard ReLU function max(x, 0). It also supports in-place computation, meaning that the bottom and top blobs can be the same, which reduces memory consumption.
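
The element-wise rule can be written as a one-line Python function. This is only an illustration of the formula; the function name `relu` is an assumption.

```python
def relu(x, negative_slope=0.0):
    """Return x for positive inputs and negative_slope * x otherwise."""
    return x if x > 0 else negative_slope * x


print(relu(3.0))                       # standard ReLU on a positive input
print(relu(-2.0))                      # standard ReLU clips negatives to zero
print(relu(-2.0, negative_slope=0.1))  # leaky ReLU keeps a scaled negative part
```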
 212
 213 #### Sigmoid
 214
 215 * Layer type: `Sigmoid`
 216 * CPU implementation: `./src/caffe/layers/sigmoid_layer.cpp`
 217 * CUDA GPU implementation: `./src/caffe/layers/sigmoid_layer.cu`
 218 * Sample (as seen in `./examples/mnist/mnist_autoencoder.prototxt`)
 219
 220       layer {
 221         name: "encode1neuron"
 222         bottom: "encode1"
 223         top: "encode1neuron"
 224         type: "Sigmoid"
 225       }
 226
 227 The `Sigmoid` layer computes the output as sigmoid(x) for each input element x.
 228
 229 #### TanH / Hyperbolic Tangent
 230
 231 * Layer type: `TanH`
 232 * CPU implementation: `./src/caffe/layers/tanh_layer.cpp`
 233 * CUDA GPU implementation: `./src/caffe/layers/tanh_layer.cu`
 234 * Sample
 235
 236       layer {
 237         name: "layer"
 238         bottom: "in"
 239         top: "out"
 240         type: "TanH"
 241       }
 242
 243 The `TanH` layer computes the output as tanh(x) for each input element x.
 244
 245 #### Absolute Value
 246
 247 * Layer type: `AbsVal`
 248 * CPU implementation: `./src/caffe/layers/absval_layer.cpp`
 249 * CUDA GPU implementation: `./src/caffe/layers/absval_layer.cu`
 250 * Sample
 251
 252       layer {
 253         name: "layer"
 254         bottom: "in"
 255         top: "out"
 256         type: "AbsVal"
 257       }
 258
 259 The `AbsVal` layer computes the output as abs(x) for each input element x.
 260
 261 #### Power
 262
 263 * Layer type: `Power`
 264 * CPU implementation: `./src/caffe/layers/power_layer.cpp`
 265 * CUDA GPU implementation: `./src/caffe/layers/power_layer.cu`
 266 * Parameters (`PowerParameter power_param`)
 267     - Optional
 268         - `power` [default 1]
 269         - `scale` [default 1]
 270         - `shift` [default 0]
 271 * Sample
 272
 273       layer {
 274         name: "layer"
 275         bottom: "in"
 276         top: "out"
 277         type: "Power"
 278         power_param {
 279           power: 1
 280           scale: 1
 281           shift: 0
 282         }
 283       }
 284
 285 The `Power` layer computes the output as (shift + scale * x) ^ power for each input element x.
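
A short Python illustration of the formula, using the same defaults as the parameter list above (the function name `power_layer` is an assumption):

```python
def power_layer(x, power=1.0, scale=1.0, shift=0.0):
    """Compute (shift + scale * x) ** power for one input element."""
    return (shift + scale * x) ** power


print(power_layer(3.0))                                   # defaults give the identity
print(power_layer(3.0, power=2.0, scale=2.0, shift=1.0))  # (1 + 2 * 3) ** 2 = 49.0
```

With the default parameters the layer passes its input through unchanged, which is why the sample above is effectively an identity layer.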
 286
 287 #### BNLL
 288
 289 * Layer type: `BNLL`
 290 * CPU implementation: `./src/caffe/layers/bnll_layer.cpp`
 291 * CUDA GPU implementation: `./src/caffe/layers/bnll_layer.cu`
 292 * Sample
 293
 294       layer {
 295         name: "layer"
 296         bottom: "in"
 297         top: "out"
 298         type: "BNLL"
 299       }
 300
 301 The `BNLL` (binomial normal log likelihood) layer computes the output as log(1 + exp(x)) for each input element x.
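
Evaluating log(1 + exp(x)) literally overflows for large positive x. The sketch below uses an algebraically equivalent form for that case; it illustrates the formula and is not presented as Caffe's exact code. The function name `bnll` is an assumption.

```python
import math


def bnll(x):
    """Compute log(1 + exp(x)) without overflowing for large positive x."""
    if x > 0:
        # log(1 + exp(x)) = x + log(1 + exp(-x)); exp(-x) stays small here.
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


print(bnll(0.0))     # log(2)
print(bnll(1000.0))  # approximately 1000, with no overflow
```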
 302
 303
 304 ### Data Layers
 305
 306 Data enters Caffe through data layers: they lie at the bottom of nets. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.
 307
 308 Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying `TransformationParameter`s.
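
To make the four preprocessing steps concrete, here is a minimal Python sketch on a single-channel image stored as a list of rows. The function name `transform`, its parameters, and the order of the steps are assumptions for illustration; the crop offset is passed explicitly so the example is repeatable, whereas random cropping would draw it at random.

```python
def transform(image, mean=0.0, scale=1.0, crop=None, offset=(0, 0), mirror=False):
    """Crop, optionally mirror, subtract the mean, then scale."""
    if crop is not None:
        top, left = offset  # random cropping would pick this offset at random
        image = [row[left:left + crop] for row in image[top:top + crop]]
    if mirror:
        image = [row[::-1] for row in image]  # horizontal flip
    return [[(v - mean) * scale for v in row] for row in image]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(transform(image, mean=1, scale=2, crop=2, offset=(1, 1), mirror=True))
```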
 309
 310 #### Database
 311
 312 * Layer type: `Data`
 313 * Parameters
 314     - Required
 315         - `source`: the name of the directory containing the database
 316         - `batch_size`: the number of inputs to process at one time
 317     - Optional
 318         - `rand_skip`: skip up to this number of inputs at the beginning; useful for asynchronous SGD
 319         - `backend` [default `LEVELDB`]: choose whether to use a `LEVELDB` or `LMDB`
 320
 321
 322
 323 #### In-Memory
 324
 325 * Layer type: `MemoryData`
 326 * Parameters
 327     - Required
 328         - `batch_size`, `channels`, `height`, `width`: specify the size of input chunks to read from memory
 329
 330 The memory data layer reads data directly from memory, without copying it. In order to use it, one must call `MemoryDataLayer::Reset` (from C++) or `Net.set_input_arrays` (from Python) in order to specify a source of contiguous data (as 4D row major array), which is read one batch-sized chunk at a time.
 331
 332 #### HDF5 Input
 333
 334 * Layer type: `HDF5Data`
 335 * Parameters
 336     - Required
 337         - `source`: the name of the file to read from
 338         - `batch_size`
 339
 340 #### HDF5 Output
 341
 342 * Layer type: `HDF5Output`
 343 * Parameters
 344     - Required
 345         - `file_name`: name of file to write to
 346
 347 The HDF5 output layer performs the opposite function of the other layers in this section: it writes its input blobs to disk.
 348
 349 #### Images
 350
 351 * Layer type: `ImageData`
 352 * Parameters
 353     - Required
 354         - `source`: name of a text file, with each line giving an image filename and label
 355         - `batch_size`: number of images to batch together
 356     - Optional
 357         - `rand_skip`
 358         - `shuffle` [default false]
 359         - `new_height`, `new_width`: if provided, resize all images to this size
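
As an illustration of the `source` text file, the sketch below parses a listing with one image filename and label per line. It assumes the two fields are separated by whitespace, the label is an integer, and filenames contain no spaces; the filenames shown are hypothetical.

```python
def parse_image_list(text):
    """Parse lines of the form '<filename> <label>' into (filename, label) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        filename, label = line.split()
        entries.append((filename, int(label)))
    return entries


# Hypothetical listing contents.
listing = "images/cat_001.jpg 0\nimages/dog_001.jpg 1\n"
print(parse_image_list(listing))
```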
 360
 361 #### Windows
 362
 363 `WindowData`
 364
 365 #### Dummy
 366
 367 `DummyData` is for development and debugging. See `DummyDataParameter`.
 368
 369 ### Common Layers
 370
 371 #### Inner Product
 372
 373 * Layer type: `InnerProduct`
 374 * CPU implementation: `./src/caffe/layers/inner_product_layer.cpp`
 375 * CUDA GPU implementation: `./src/caffe/layers/inner_product_layer.cu`
 376 * Parameters (`InnerProductParameter inner_product_param`)
 377     - Required
 378         - `num_output` (`c_o`): the number of filters
 379     - Strongly recommended
 380         - `weight_filler` [default `type: 'constant' value: 0`]
 381     - Optional
 382         - `bias_filler` [default `type: 'constant' value: 0`]
 383         - `bias_term` [default `true`]: specifies whether to learn and apply a set of additive biases to the filter outputs
 384 * Input
 385     - `n * c_i * h_i * w_i`
 386 * Output
 387     - `n * c_o * 1 * 1`
 388 * Sample
 389
 390       layer {
 391         name: "fc8"
 392         type: "InnerProduct"
 393         # learning rate and decay multipliers for the weights
 394         param { lr_mult: 1 decay_mult: 1 }
 395         # learning rate and decay multipliers for the biases
 396         param { lr_mult: 2 decay_mult: 0 }
 397         inner_product_param {
 398           num_output: 1000
 399           weight_filler {
 400             type: "gaussian"
 401             std: 0.01
 402           }
 403           bias_filler {
 404             type: "constant"
 405             value: 0
 406           }
 407         }
 408         bottom: "fc7"
 409         top: "fc8"
 410       }
 411
 412 The `InnerProduct` layer (also usually referred to as the fully connected layer) treats the input as a simple vector and produces an output in the form of a single vector (with the blob's height and width set to 1).
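
For a single example, the computation amounts to a matrix-vector product plus an optional bias. The following sketch assumes the input has already been flattened to a vector of length `c_i * h_i * w_i`; the function name `inner_product` and the row-per-output weight layout are assumptions for illustration.

```python
def inner_product(x, weights, bias=None):
    """y[o] = sum_i weights[o][i] * x[i], plus bias[o] when a bias is given."""
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    if bias is not None:
        out = [y + b for y, b in zip(out, bias)]
    return out


x = [1.0, 2.0, 3.0]                            # flattened input
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # num_output = 2
print(inner_product(x, weights, bias=[0.5, -1.0]))  # [1.5, 4.0]
```

Passing `bias=None` corresponds to setting `bias_term` to `false`.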
 413
 414 #### Splitting
 415
 416 The `Split` layer is a utility layer that splits an input blob to multiple output blobs. This is used when a blob is fed into multiple output layers.
 417
 418 #### Flattening
 419
 420 The `Flatten` layer is a utility layer that flattens an input of shape `n * c * h * w` to a simple vector output of shape `n * (c*h*w)`.
 421
 422 #### Concatenation
 423
 424 * Layer type: `Concat`
 425 * CPU implementation: `./src/caffe/layers/concat_layer.cpp`
 426 * CUDA GPU implementation: `./src/caffe/layers/concat_layer.cu`
 427 * Parameters (`ConcatParameter concat_param`)
 428     - Optional
 429         - `axis` [default 1]: 0 for concatenation along num and 1 for channels.
 430 * Input
 431     - `n_i * c_i * h * w` for each input blob i from 1 to K.
 432 * Output
 433     - if `axis = 0`: `(n_1 + n_2 + ... + n_K) * c_1 * h * w`, and all input `c_i` should be the same.
 434     - if `axis = 1`: `n_1 * (c_1 + c_2 + ... + c_K) * h * w`, and all input `n_i` should be the same.
 435 * Sample
 436
 437       layer {
 438         name: "concat"
 439         bottom: "in1"
 440         bottom: "in2"
 441         top: "out"
 442         type: "Concat"
 443         concat_param {
 444           axis: 1
 445         }
 446       }
 447
 448 The `Concat` layer is a utility layer that concatenates its multiple input blobs to one single output blob.
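
The output-shape rules listed above can be checked with a small Python helper. The function name `concat_shape` is an assumption for illustration; it only computes shapes and does not move any data.

```python
def concat_shape(shapes, axis=1):
    """Output shape when concatenating blobs of the given shapes along axis."""
    first = shapes[0]
    for shape in shapes[1:]:
        for d, (a, b) in enumerate(zip(first, shape)):
            # Every dimension except the concatenation axis must agree.
            if d != axis and a != b:
                raise ValueError("shapes differ outside the concat axis")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


print(concat_shape([(2, 3, 4, 4), (2, 5, 4, 4)], axis=1))  # (2, 8, 4, 4)
print(concat_shape([(2, 3, 4, 4), (1, 3, 4, 4)], axis=0))  # (3, 3, 4, 4)
```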
 449
 450 #### Slicing
 451
 452 The `Slice` layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.
 453
 454 * Sample
 455
 456       layer {
 457         name: "slicer_label"
 458         type: "Slice"
 459         bottom: "label"
 460         ## Example of label with a shape N x 3 x 1 x 1
 461         top: "label1"
 462         top: "label2"
 463         top: "label3"
 464         slice_param {
 465           axis: 1
 466           slice_point: 1
 467           slice_point: 2
 468         }
 469       }
 470
 471 `axis` indicates the target axis; `slice_point` indicates indexes in the selected dimension (the number of indices must be equal to the number of top blobs minus one).
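
To see how the slice points divide the chosen axis, the sketch below turns them into half-open index ranges, one per top blob. The function name `slice_ranges` is an assumption for illustration.

```python
def slice_ranges(axis_size, slice_points):
    """Half-open (start, end) ranges along the axis, one per top blob."""
    bounds = [0] + list(slice_points) + [axis_size]
    return list(zip(bounds[:-1], bounds[1:]))


# The sample above: axis 1 has size 3, with slice points 1 and 2.
print(slice_ranges(3, [1, 2]))  # [(0, 1), (1, 2), (2, 3)]
```

Two slice points produce three ranges, matching the three top blobs `label1`, `label2`, and `label3`.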
 472
 473
 474 #### Elementwise Operations
 475
 476 `Eltwise`
 477
 478 #### Argmax
 479
 480 `ArgMax`
 481
 482 #### Softmax
 483
 484 `Softmax`
 485
 486 #### Mean-Variance Normalization
 487
 488 `MVN`