Introduction to Support Vector Machines {#tutorial_introduction_to_svm}
=======================================

@next_tutorial{tutorial_non_linear_svms}

Goal
----

In this tutorial you will learn how to:

-   Use the OpenCV functions @ref cv::ml::SVM::train to build a classifier based on SVMs and @ref
    cv::ml::SVM::predict to test its performance.

What is a SVM?
--------------

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating
hyperplane. In other words, given labeled training data (*supervised learning*), the algorithm
outputs an optimal hyperplane which categorizes new examples.

In which sense is the hyperplane obtained optimal? Let's consider the following simple problem:

For a linearly separable set of 2D-points which belong to one of two classes, find a separating
straight line.

![](images/separating-lines.png)

@note In this example we deal with lines and points in the Cartesian plane instead of hyperplanes
and vectors in a high dimensional space. This is a simplification of the problem. It is important to
understand that this is done only because our intuition is better built from examples that are easy
to imagine. However, the same concepts apply to tasks where the examples to classify lie in a space
whose dimension is higher than two.

In the above picture you can see that there exist multiple lines that offer a solution to the
problem. Is any of them better than the others? We can intuitively define a criterion to estimate
the worth of the lines: <em> A line is bad if it passes too close to the points because it will be
noise sensitive and it will not generalize correctly. </em> Therefore, our goal should be to find
the line passing as far as possible from all points.

Then, the operation of the SVM algorithm is based on finding the hyperplane that gives the largest
minimum distance to the training examples. Twice this minimum distance is known as the **margin**
in SVM theory. Therefore, the optimal separating hyperplane *maximizes* the margin of the training
data.

![](images/optimal-hyperplane.png)

How is the optimal hyperplane computed?
---------------------------------------

Let's introduce the notation used to formally define a hyperplane:

\f[f(x) = \beta_{0} + \beta^{T} x,\f]

where \f$\beta\f$ is known as the *weight vector* and \f$\beta_{0}\f$ as the *bias*.

@note A more in-depth description of this and of hyperplanes in general can be found in Section 4.5
(*Separating Hyperplanes*) of the book *Elements of Statistical Learning* by T. Hastie, R. Tibshirani
and J. H. Friedman (@cite HTF01).

The optimal hyperplane can be represented in an infinite number of different ways by
scaling \f$\beta\f$ and \f$\beta_{0}\f$. As a matter of convention, among all the possible
representations of the hyperplane, the one chosen is

\f[|\beta_{0} + \beta^{T} x| = 1,\f]

where \f$x\f$ symbolizes the training examples closest to the hyperplane. In general, the training
examples that are closest to the hyperplane are called **support vectors**. This representation is
known as the **canonical hyperplane**.

Now, we use the result of geometry that gives the distance between a point \f$x\f$ and a hyperplane
\f$(\beta, \beta_{0})\f$:

\f[\mathrm{distance} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||}.\f]

In particular, for the canonical hyperplane, the numerator is equal to one and the distance to the
support vectors is

\f[\mathrm{distance}_{\text{ support vectors}} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||} = \frac{1}{||\beta||}.\f]

Recall that the margin introduced in the previous section, here denoted as \f$M\f$, is twice the
distance to the closest examples:

\f[M = \frac{2}{||\beta||}.\f]
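
As a quick sanity check with concrete numbers (an illustration, not part of the derivation): take
\f$\beta = (2, 0)^{T}\f$ and \f$\beta_{0} = -2\f$. The hyperplane is the vertical line
\f$x_{1} = 1\f$, the canonical constraint \f$|\beta_{0} + \beta^{T} x| = 1\f$ places the support
vectors on the lines \f$x_{1} = 0.5\f$ and \f$x_{1} = 1.5\f$, and therefore

\f[M = \frac{2}{||\beta||} = \frac{2}{2} = 1.\f]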

Finally, the problem of maximizing \f$M\f$ is equivalent to the problem of minimizing a function
\f$L(\beta)\f$ subject to some constraints. The constraints model the requirement for the hyperplane to
classify correctly all the training examples \f$x_{i}\f$. Formally,

\f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i,\f]

where \f$y_{i}\f$ represents each of the labels of the training examples, taking the value \f$+1\f$
for one class and \f$-1\f$ for the other.

This is a problem of Lagrangian optimization that can be solved using Lagrange multipliers to obtain
the weight vector \f$\beta\f$ and the bias \f$\beta_{0}\f$ of the optimal hyperplane.
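
As a sketch of that derivation (the details are beyond the scope of this tutorial), the constraints
are absorbed into a Lagrangian with multipliers \f$\alpha_{i} \geq 0\f$:

\f[\mathcal{L}(\beta, \beta_{0}, \alpha) = \frac{1}{2}||\beta||^{2} - \sum_{i} \alpha_{i} \left[ y_{i}(\beta^{T} x_{i} + \beta_{0}) - 1 \right].\f]

Setting the derivatives with respect to \f$\beta\f$ and \f$\beta_{0}\f$ to zero gives
\f$\beta = \sum_{i} \alpha_{i} y_{i} x_{i}\f$ and \f$\sum_{i} \alpha_{i} y_{i} = 0\f$; only the
support vectors end up with non-zero \f$\alpha_{i}\f$, which is what gives them their name.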

Source Code
-----------

-   **Downloadable code** (C++): Click
    [here](https://github.com/opencv/opencv/tree/3.4/samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp)

-   **Code at a glance:**
    @include samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp

-   **Downloadable code** (Java): Click
    [here](https://github.com/opencv/opencv/tree/3.4/samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java)

-   **Code at a glance:**
    @include samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java

-   **Downloadable code** (Python): Click
    [here](https://github.com/opencv/opencv/tree/3.4/samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py)

-   **Code at a glance:**
    @include samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py

Explanation
-----------

-   **Set up the training data**

The training data of this exercise is formed by a set of labeled 2D-points that belong to one of
two different classes; one of the classes consists of one point and the other of three points.

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup1

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup1

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1

The function @ref cv::ml::SVM::train that will be used afterwards requires the training data to be
stored as @ref cv::Mat objects of floats. Therefore, we create these objects from the arrays
defined above (a rough C++ sketch of this step follows the snippets below):

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp setup2

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java setup2

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py setup1
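
For orientation, a minimal C++ sketch of what this step amounts to is shown below; the variable
names and the concrete point coordinates are illustrative, not necessarily those of the sample:

@code{.cpp}
// Four labeled 2D training points: one point in the class with label +1,
// three points in the class with label -1.
int labels[4] = {1, -1, -1, -1};
float trainingData[4][2] = { {501, 10}, {255, 10}, {501, 255}, {10, 501} };

// SVM::train expects the samples as a CV_32F Mat (one sample per row)
// and the labels as a CV_32S Mat.
Mat trainingDataMat(4, 2, CV_32F, trainingData);
Mat labelsMat(4, 1, CV_32S, labels);
@endcode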

-   **Set up SVM's parameters**

In this tutorial we have introduced the theory of SVMs in the simplest case, when the
training examples are spread into two classes that are linearly separable. However, SVMs can be
used in a wide variety of problems (e.g. problems with non-linearly separable data, or an SVM using
a kernel function to raise the dimensionality of the examples, etc). As a consequence of this,
we have to define some parameters before training the SVM. These parameters are stored in an
object of the class @ref cv::ml::SVM.

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp init

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java init

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py init

-   *Type of SVM*. We choose here the type @ref cv::ml::SVM::C_SVC "C_SVC" that can be used for
    n-class classification (n \f$\geq\f$ 2). The important feature of this type is that it deals
    with imperfect separation of classes (i.e. when the training data is non-linearly separable).
    This feature is not important here since the data is linearly separable; we chose this SVM
    type only because it is the most commonly used.

-   *Type of SVM kernel*. We have not talked about kernel functions since they are not needed for
    the training data we are dealing with. Nevertheless, let's briefly explain the main idea behind
    a kernel function. It is a mapping applied to the training data to improve its resemblance to a
    linearly separable set of data. This mapping consists of increasing the dimensionality of the
    data and is done efficiently using a kernel function. We choose here the type
    @ref cv::ml::SVM::LINEAR "LINEAR", which means that no mapping is done. This parameter is
    defined using cv::ml::SVM::setKernel.

-   *Termination criteria of the algorithm*. The SVM training procedure is implemented by solving a
    constrained quadratic optimization problem in an **iterative** fashion. Here we specify a
    maximum number of iterations and a tolerance error so that we allow the algorithm to finish in
    fewer steps even if the optimal hyperplane has not been computed yet. This parameter is defined
    in a structure @ref cv::TermCriteria. A minimal C++ sketch of these three parameter calls is
    shown right after this list.
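
The sketch below illustrates the configuration described above; it is an approximation of the
sample, with the termination values (100 iterations, tolerance 1e-6) chosen only as an example:

@code{.cpp}
// Create an empty SVM model and configure it.
Ptr<ml::SVM> svm = ml::SVM::create();
svm->setType(ml::SVM::C_SVC);        // n-class classification with a soft margin
svm->setKernel(ml::SVM::LINEAR);     // no mapping to a higher-dimensional space
svm->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER, 100, 1e-6));
@endcode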

We call the method @ref cv::ml::SVM::train to build the SVM model.

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp train

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java train

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py train
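
In C++ this boils down to a single call along these lines (assuming the Mats from the sketch above):

@code{.cpp}
// Train the SVM: samples are laid out one per row, labels are the responses.
svm->train(trainingDataMat, ml::ROW_SAMPLE, labelsMat);
@endcode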

-   **Regions classified by the SVM**

The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
this example we have used this method in order to color the space depending on the prediction made
by the SVM. In other words, an image is traversed interpreting its pixels as points of the
Cartesian plane. Each of the points is colored depending on the class predicted by the SVM: in
green if it is the class with label 1 and in blue if it is the class with label -1.

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show
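
A condensed C++ sketch of this coloring loop is given below; the image size and the exact colors
are illustrative:

@code{.cpp}
// Color every pixel of a 512x512 image according to the SVM prediction for that point.
Mat image = Mat::zeros(512, 512, CV_8UC3);
Vec3b green(0, 255, 0), blue(255, 0, 0);
for (int i = 0; i < image.rows; i++)
{
    for (int j = 0; j < image.cols; j++)
    {
        Mat sampleMat = (Mat_<float>(1, 2) << j, i);   // the point (x, y) = (j, i)
        float response = svm->predict(sampleMat);
        image.at<Vec3b>(i, j) = (response == 1) ? green : blue;
    }
}
@endcode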

-   **Support vectors**

We use here the method @ref cv::ml::SVM::getSupportVectors to obtain information about the support
vectors: it returns all of the support vectors of the trained model. We have used this method here
to find the training examples that are support vectors and highlight them.

@snippet samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp show_vectors

@snippet samples/java/tutorial_code/ml/introduction_to_svm/IntroductionToSVMDemo.java show_vectors

@snippet samples/python/tutorial_code/ml/introduction_to_svm/introduction_to_svm.py show_vectors
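
As a rough C++ sketch of this highlighting step (ring radius, thickness and color are illustrative):

@code{.cpp}
// Draw a gray ring around every support vector (one support vector per row of the returned Mat).
Mat sv = svm->getSupportVectors();
for (int i = 0; i < sv.rows; i++)
{
    const float* v = sv.ptr<float>(i);
    circle(image, Point((int)v[0], (int)v[1]), 6, Scalar(128, 128, 128), 2);
}
@endcode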

Results
-------

-   The code opens an image and shows the training examples of both classes. The points of one class
    are represented with white circles and black ones are used for the other class.
-   The SVM is trained and used to classify all the pixels of the image. This results in a division
    of the image into a blue region and a green region. The boundary between both regions is the
    optimal separating hyperplane.
-   Finally the support vectors are shown using gray rings around the training examples.

![](images/svm_intro_result.png)