docs/tutorial/layers.md

   1 ---
   2 title: Layer Catalogue
   3 ---
   4
   5 # Layers
   6
   7 To create a Caffe model you need to define the model architecture in a protocol buffer definition file (prototxt).
   8
   9 Caffe layers and their parameters are defined in the protocol buffer definitions for the project in [caffe.proto](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto).
  10
  11 ## Data Layers
  12
  13 Data enters Caffe through data layers: they lie at the bottom of nets. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.
  14
  15 Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available in some of these layers by specifying a `TransformationParameter`.
  16 The [bias](layers/bias.html), [scale](layers/scale.html), and [crop](layers/crop.html) layers can help transform the inputs when `TransformationParameter` isn't available.
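
As a rough illustration of what this preprocessing does to one single-channel image, the pure-Python sketch below applies mean subtraction and scaling as `(pixel - mean) * scale`, followed by a crop and an optional horizontal mirror. The function name `transform` and its arguments are hypothetical and are not Caffe's API; the crop here is a fixed center crop rather than a random one, so that the result is reproducible.

```python
def transform(image, mean=0.0, scale=1.0, crop_size=None, mirror=False):
    """Hypothetical sketch of input preprocessing on an h x w list of rows."""
    h, w = len(image), len(image[0])
    # Without a crop size, keep the full image.
    crop_h = crop_size if crop_size else h
    crop_w = crop_size if crop_size else w
    # Center crop offsets (a random crop would sample these instead).
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    out = []
    for r in range(top, top + crop_h):
        # Mean subtraction first, then scaling.
        row = [(image[r][c] - mean) * scale for c in range(left, left + crop_w)]
        if mirror:
            row.reverse()  # horizontal flip
        out.append(row)
    return out


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
result = transform(image, mean=10.0, scale=0.5, crop_size=2, mirror=True)
print(result)
```

In a real net these settings would be given in the layer definition rather than in code; the sketch only shows the arithmetic and the order of operations.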
  17
  18 Layers:
  19
  20 * [Image Data](layers/imagedata.html) - read raw images.
  21 * [Database](layers/data.html) - read data from LevelDB or LMDB.
  22 * [HDF5 Input](layers/hdf5data.html) - read HDF5 data, allows data of arbitrary dimensions.
  23 * [HDF5 Output](layers/hdf5output.html) - write data as HDF5.
  24 * [Input](layers/input.html) - typically used for networks that are being deployed.
  25 * [Window Data](layers/windowdata.html) - read window data file.
  26 * [Memory Data](layers/memorydata.html) - read data directly from memory.
  27 * [Dummy Data](layers/dummydata.html) - for static data and debugging.
  28
  29 Note that the [Python](layers/python.html) Layer can be useful for creating custom data layers.
  30
  31 ## Vision Layers
  32
  33 Vision layers usually take *images* as input and produce other *images* as output, although they can take data of other types and dimensions.
  34 A typical "image" in the real world may have one color channel ($$c = 1$$), as in a grayscale image, or three color channels ($$c = 3$$), as in an RGB (red, green, blue) image.
  35 But in this context, the distinguishing characteristic of an image is its spatial structure: usually an image has some non-trivial height $$h > 1$$ and width $$w > 1$$.
  36 This 2D geometry naturally lends itself to certain decisions about how to process the input.
  37 In particular, most of the vision layers work by applying a particular operation to some region of the input to produce a corresponding region of the output.
  38 In contrast, other layers (with few exceptions) ignore the spatial structure of the input, effectively treating it as "one big vector" with dimension $$chw$$.
  39
  40 Layers:
  41
  42 * [Convolution Layer](layers/convolution.html) - convolves the input image with a set of learnable filters, each producing one feature map in the output image.
  43 * [Pooling Layer](layers/pooling.html) - max, average, or stochastic pooling.
  44 * [Spatial Pyramid Pooling (SPP)](layers/spp.html)
  45 * [Crop](layers/crop.html) - perform cropping transformation.
  46 * [Deconvolution Layer](layers/deconvolution.html) - transposed convolution.
  47
  48 * [Im2Col](layers/im2col.html) - relic helper layer that is not used much anymore.
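
Because these layers map input regions to output regions, the spatial size of the output follows from the input size, kernel size, stride, and padding. The sketch below uses the standard size arithmetic for a convolution and its transposed counterpart along one spatial axis. The function names are hypothetical, and the formulas are stated here as an assumption about typical behavior; check the individual layer pages for the exact rules (pooling, for instance, may round differently).

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0, dilation=1):
    """Output length of a convolution along one spatial axis."""
    # Effective kernel footprint once dilation is taken into account.
    kernel_extent = dilation * (kernel_size - 1) + 1
    return (input_size + 2 * pad - kernel_extent) // stride + 1


def deconv_output_size(input_size, kernel_size, stride=1, pad=0, dilation=1):
    """Output length of a transposed convolution along one spatial axis."""
    kernel_extent = dilation * (kernel_size - 1) + 1
    return stride * (input_size - 1) + kernel_extent - 2 * pad


conv_out = conv_output_size(227, kernel_size=11, stride=4)
deconv_out = deconv_output_size(conv_out, kernel_size=11, stride=4)
print(conv_out, deconv_out)
```

With these example numbers the transposed convolution restores the original length, which is the sense in which it inverts the shape change of the convolution (not its values).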
  49
  50 ## Recurrent Layers
  51
  52 Layers:
  53
  54 * [Recurrent](layers/recurrent.html)
  55 * [RNN](layers/rnn.html)
  56 * [Long Short-Term Memory (LSTM)](layers/lstm.html)
  57
  58 ## Common Layers
  59
  60 Layers:
  61
  62 * [Inner Product](layers/innerproduct.html) - fully connected layer.
  63 * [Dropout](layers/dropout.html)
  64 * [Embed](layers/embed.html) - for learning embeddings of one-hot encoded vectors (takes the index as input).
  65
  66 ## Normalization Layers
  67
  68 * [Local Response Normalization (LRN)](layers/lrn.html) - performs a kind of "lateral inhibition" by normalizing over local input regions.
  69 * [Mean Variance Normalization (MVN)](layers/mvn.html) - performs contrast normalization / instance normalization.
  70 * [Batch Normalization](layers/batchnorm.html) - performs normalization over mini-batches.
  71
  72 The [bias](layers/bias.html) and [scale](layers/scale.html) layers can be helpful in combination with normalization.
  73
  74 ## Activation / Neuron Layers
  75
  76 In general, activation / neuron layers are element-wise operators, taking one bottom blob and producing one top blob of the same size. In the layers below, we will ignore the input and output sizes as they are identical:
  77
  78 * Input
  79     - n * c * h * w
  80 * Output
  81     - n * c * h * w
  82
  83 Layers:
  84
  85 * [ReLU / Rectified-Linear and Leaky-ReLU](layers/relu.html) - ReLU and Leaky-ReLU rectification.
  86 * [PReLU](layers/prelu.html) - parametric ReLU.
  87 * [ELU](layers/elu.html) - exponential linear rectification.
  88 * [Sigmoid](layers/sigmoid.html)
  89 * [TanH](layers/tanh.html)
  90 * [Absolute Value](layers/abs.html)
  91 * [Power](layers/power.html) - f(x) = (shift + scale * x) ^ power.
  92 * [Exp](layers/exp.html) - f(x) = base ^ (shift + scale * x).
  93 * [Log](layers/log.html) - f(x) = log(x).
  94 * [BNLL](layers/bnll.html) - f(x) = log(1 + exp(x)).
  95 * [Threshold](layers/threshold.html) - performs step function at user defined threshold.
  96 * [Bias](layers/bias.html) - adds a bias to a blob that can either be learned or fixed.
  97 * [Scale](layers/scale.html) - scales a blob by an amount that can either be learned or fixed.
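
To make the element-wise nature of these layers concrete, the sketch below writes several of the functions listed above in plain Python and applies them to a flat list standing in for a blob. The function and parameter names are illustrative assumptions rather than Caffe's implementation, and the default values are chosen for the example only. BNLL is rearranged for positive inputs so that `exp` does not overflow.

```python
import math


def relu(x, negative_slope=0.0):
    # negative_slope = 0 gives plain ReLU; a small positive value gives Leaky-ReLU.
    return x if x > 0 else negative_slope * x


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def power_fn(x, power=1.0, scale=1.0, shift=0.0):
    # f(x) = (shift + scale * x) ^ power
    return (shift + scale * x) ** power


def exp_fn(x, base=math.e, scale=1.0, shift=0.0):
    # f(x) = base ^ (shift + scale * x)
    return base ** (shift + scale * x)


def bnll(x):
    # f(x) = log(1 + exp(x)), written as x + log(1 + exp(-x)) for x > 0.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def threshold(x, threshold=0.0):
    # Step function: 1 above the threshold, 0 otherwise.
    return 1.0 if x > threshold else 0.0


def apply_elementwise(f, blob, **params):
    # The output has exactly as many elements as the input.
    return [f(x, **params) for x in blob]


blob = [-2.0, 0.0, 3.0]
relu_out = apply_elementwise(relu, blob)
leaky_out = apply_elementwise(relu, blob, negative_slope=0.1)
power_out = apply_elementwise(power_fn, blob, power=2.0, scale=2.0, shift=1.0)
print(relu_out, leaky_out, power_out)
```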
  98
  99 ## Utility Layers
 100
 101 Layers:
 102
 103 * [Flatten](layers/flatten.html)
 104 * [Reshape](layers/reshape.html)
 105 * [Batch Reindex](layers/batchreindex.html)
 106
 107 * [Split](layers/split.html)
 108 * [Concat](layers/concat.html)
 109 * [Slicing](layers/slice.html)
 110 * [Eltwise](layers/eltwise.html) - element-wise operations such as product or sum between two blobs.
 111 * [Filter / Mask](layers/filter.html) - mask or select output using last blob.
 112 * [Parameter](layers/parameter.html) - enable parameters to be shared between layers.
 113 * [Reduction](layers/reduction.html) - reduce input blob to scalar blob using operations such as sum or mean.
 114 * [Silence](layers/silence.html) - prevent top-level blobs from being printed during training.
 115
 116 * [ArgMax](layers/argmax.html)
 117 * [Softmax](layers/softmax.html)
 118
 119 * [Python](layers/python.html) - allows custom Python layers.
 120
 121 ## Loss Layers
 122
 123 Loss drives learning by comparing an output to a target and assigning a cost to minimize. The loss itself is computed by the forward pass, and the gradient with respect to the loss is computed by the backward pass.
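
The split between the two passes can be sketched with a sum-of-squares loss of the form $$\frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2$$: the forward function returns the scalar cost, and the backward function returns its derivative with respect to the first input. This is a minimal pure-Python illustration with hypothetical function names, not Caffe's implementation.

```python
def euclidean_forward(x1, x2):
    """Forward pass: scalar loss over N samples (each a list of floats)."""
    n = len(x1)
    total = 0.0
    for a, b in zip(x1, x2):
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return total / (2.0 * n)


def euclidean_backward(x1, x2):
    """Backward pass: gradient of the loss with respect to x1."""
    n = len(x1)
    # d/dx1 of (1 / 2N) * (x1 - x2)^2 is (x1 - x2) / N; for x2 the sign flips.
    return [[(ai - bi) / n for ai, bi in zip(a, b)] for a, b in zip(x1, x2)]


predictions = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.0, 0.0], [0.0, 4.0]]
loss = euclidean_forward(predictions, targets)
grad = euclidean_backward(predictions, targets)
print(loss, grad)
```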
 124
 125 Layers:
 126
 127 * [Multinomial Logistic Loss](layers/multinomiallogisticloss.html)
 128 * [Infogain Loss](layers/infogainloss.html) - a generalization of MultinomialLogisticLossLayer.
 129 * [Softmax with Loss](layers/softmaxwithloss.html) - computes the multinomial logistic loss of the softmax of its inputs. It's conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient.
 130 * [Sum-of-Squares / Euclidean](layers/euclideanloss.html) - computes the sum of squares of differences of its two inputs, $$\frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2$$.
 131 * [Hinge / Margin](layers/hingeloss.html) - computes a one-vs-all hinge (L1) or squared hinge (L2) loss.
 132 * [Sigmoid Cross-Entropy Loss](layers/sigmoidcrossentropyloss.html) - computes the cross-entropy (logistic) loss, often used for predicting targets interpreted as probabilities.
 133 * [Accuracy / Top-k layer](layers/accuracy.html) - scores the output as an accuracy with respect to target -- it is not actually a loss and has no backward step.
 134 * [Contrastive Loss](layers/contrastiveloss.html)
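
The numerical-stability point made for Softmax with Loss can be illustrated with a small sketch. Computing the softmax and then taking a logarithm separately can overflow or hit `log(0)` for large scores, whereas the fused form below subtracts the maximum score first, and its gradient is simply the softmax probabilities minus the one-hot target. The function names are hypothetical, and this is one common way to write the fused computation, not a transcription of Caffe's code.

```python
import math


def softmax_with_loss(scores, label):
    """Multinomial logistic loss of softmax(scores) for one sample."""
    m = max(scores)
    # log-sum-exp with the maximum subtracted, so exp() never overflows.
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def softmax_with_loss_gradient(scores, label):
    """Gradient of the loss with respect to the scores: softmax - one_hot."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    grad = [e / total for e in exps]
    grad[label] -= 1.0
    return grad


# Scores this large would overflow a naive exp(score) computation.
scores = [1000.0, 0.0]
loss_correct = softmax_with_loss(scores, label=0)
loss_wrong = softmax_with_loss(scores, label=1)
grad = softmax_with_loss_gradient(scores, label=1)
print(loss_correct, loss_wrong, grad)
```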
 135