doc/tutorials/ml/introduction_to_svm/introduction_to_svm.rst

   1 .. _introductiontosvms:
   2
   3 Introduction to Support Vector Machines
   4 ***************************************
   5
   6 Goal
   7 ====
   8
   9 In this tutorial you will learn how to:
  10
  11 .. container:: enumeratevisibleitemswithsquare
  12
  13    + Use the OpenCV functions :svms:`CvSVM::train <cvsvm-train>` to build a classifier based on SVMs and :svms:`CvSVM::predict <cvsvm-predict>` to test its performance.
  14
  15 What is a SVM?
  16 ==============
  17
  18 A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (*supervised learning*), the algorithm outputs an optimal hyperplane which categorizes new examples.
  19
  20 In which sense is the hyperplane obtained optimal? Let's consider the following
  21 simple problem:
  22
  23   For a linearly separable set of 2D-points which belong to one of two classes, find a separating straight line.
  24
  25 .. image:: images/separating-lines.png
  26    :alt: A seperation example
  27    :align: center
  28
  29 .. note:: In this example we deal with lines and points in the Cartesian plane instead of hyperplanes and vectors in a high dimensional space. This is a simplification of the problem.It is important to understand that this is done only because our intuition is better built from examples that are easy to imagine. However, the same concepts apply to tasks where the examples to classify lie in a space whose dimension is higher than two.
  30
  31 In the above picture you can see that there exists multiple lines that offer a solution to the problem. Is any of them better than the others? We can intuitively define a criterion to estimate the worth of the lines:
  32
  33   A line is bad if it passes too close to the points because it will be noise sensitive and it will not generalize correctly. Therefore, our goal should be to find the line passing as far as possible from all points.
  34
  35 Then, the operation of the SVM algorithm is based on finding the hyperplane that gives the largest minimum distance to the training examples. Twice, this distance receives the important name of **margin** within SVM's theory. Therefore, the optimal separating hyperplane *maximizes* the margin of the training data.
  36
  37 .. image:: images/optimal-hyperplane.png
  38    :alt: The Optimal hyperplane
  39    :align: center
  40
  41 How is the optimal hyperplane computed?
  42 =======================================
  43
  44 Let's introduce the notation used to define formally a hyperplane:
  45
  46 .. math::
  47   f(x) = \beta_{0} + \beta^{T} x,
  48
  49 where :math:`\beta` is known as the *weight vector* and :math:`\beta_{0}` as the *bias*.
  50
  51 .. seealso:: A more in depth description of this and hyperplanes you can find in the section 4.5 (*Seperating Hyperplanes*) of the book: *Elements of Statistical Learning* by T. Hastie, R. Tibshirani and J. H. Friedman.
  52
  53 The optimal hyperplane can be represented in an infinite number of different ways by scaling of :math:`\beta` and :math:`\beta_{0}`. As a matter of convention, among all the possible representations of the hyperplane, the one chosen is
  54
  55 .. math::
  56   |\beta_{0} + \beta^{T} x| = 1
  57
  58 where :math:`x` symbolizes the training examples closest to the hyperplane. In general, the training examples that are closest to the hyperplane are called **support vectors**. This representation is known as the **canonical hyperplane**.
  59
  60 Now, we use the result of geometry that gives the distance between a point :math:`x` and a hyperplane :math:`(\beta, \beta_{0})`:
  61
  62 .. math::
  63    \mathrm{distance} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||}.
  64
  65 In particular, for the canonical hyperplane, the numerator is equal to one and the distance to the support vectors is
  66
  67 .. math::
  68   \mathrm{distance}_{\text{ support vectors}} = \frac{|\beta_{0} + \beta^{T} x|}{||\beta||} = \frac{1}{||\beta||}.
  69
  70 Recall that the margin introduced in the previous section, here denoted as :math:`M`, is twice the distance to the closest examples:
  71
  72 .. math::
  73   M = \frac{2}{||\beta||}
  74
  75 Finally, the problem of maximizing :math:`M` is equivalent to the problem of minimizing a function :math:`L(\beta)` subject to some constraints. The constraints model the requirement for the hyperplane to classify correctly all the training examples :math:`x_{i}`. Formally,
  76
  77 .. math::
  78   \min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i,
  79
  80 where :math:`y_{i}` represents each of the labels of the training examples.
  81
  82 This is a problem of Lagrangian optimization that can be solved using Lagrange multipliers to obtain the weight vector :math:`\beta` and the bias :math:`\beta_{0}` of the optimal hyperplane.
  83
  84 Source Code
  85 ===========
  86
  87 .. literalinclude:: ../../../../samples/cpp/tutorial_code/ml/introduction_to_svm/introduction_to_svm.cpp
  88    :language: cpp
  89    :linenos:
  90    :tab-width: 4
  91
  92 Explanation
  93 ===========
  94
  95 1. **Set up the training data**
  96
  97   The training data of this exercise is formed by a set of labeled 2D-points that belong to one of two different classes; one of the classes consists of one point and the other of three points.
  98
  99   .. code-block:: cpp
 100
 101      float labels[4] = {1.0, -1.0, -1.0, -1.0};
 102      float trainingData[4][2] = {{501, 10}, {255, 10}, {501, 255}, {10, 501}};
 103
 104   The function :svms:`CvSVM::train <cvsvm-train>` that will be used afterwards requires the training data to be stored as :basicstructures:`Mat <mat>` objects of floats. Therefore, we create these objects from the arrays defined above:
 105
 106   .. code-block:: cpp
 107
 108      Mat trainingDataMat(3, 2, CV_32FC1, trainingData);
 109      Mat labelsMat      (3, 1, CV_32FC1, labels);
 110
 111 2. **Set up SVM's parameters**
 112
 113    In this tutorial we have introduced the theory of SVMs in the most simple case, when the training examples are spread into two classes that are linearly separable. However, SVMs can be used in a wide variety of problems (e.g. problems with non-linearly separable data, a SVM using a kernel function to raise the dimensionality of the examples, etc). As a consequence of this, we have to define some parameters before training the SVM. These parameters are stored in an object of the class :svms:`CvSVMParams <cvsvmparams>` .
 114
 115    .. code-block:: cpp
 116
 117       CvSVMParams params;
 118       params.svm_type    = CvSVM::C_SVC;
 119       params.kernel_type = CvSVM::LINEAR;
 120       params.term_crit   = cvTermCriteria(CV_TERMCRIT_ITER, 100, 1e-6);
 121
 122    * *Type of SVM*. We choose here the type **CvSVM::C_SVC** that can be used for n-class classification (n :math:`\geq` 2). This parameter is defined in the attribute *CvSVMParams.svm_type*.
 123
 124      .. note:: The important feature of the type of SVM **CvSVM::C_SVC** deals with imperfect separation of classes (i.e. when the training data is non-linearly separable). This feature is not important here since the data is linearly separable and we chose this SVM type only for being the most commonly used.
 125
 126    * *Type of SVM kernel*. We have not talked about kernel functions since they are not interesting for the training data we are dealing with. Nevertheless, let's explain briefly now the main idea behind a kernel function. It is a mapping done to the training data to improve its resemblance to a linearly separable set of data. This mapping consists of increasing the dimensionality of the data and is done efficiently using a kernel function. We choose here the type **CvSVM::LINEAR** which means that no mapping is done. This parameter is defined in the attribute *CvSVMParams.kernel_type*.
 127
 128    * *Termination criteria of the algorithm*. The SVM training procedure is implemented solving a constrained quadratic optimization problem in an **iterative** fashion. Here we specify a maximum number of iterations and a tolerance error so we allow the algorithm to finish in less number of steps even if the optimal hyperplane has not been computed yet. This parameter is defined in a structure :oldbasicstructures:`cvTermCriteria <cvtermcriteria>`.
 129
 130 3. **Train the SVM**
 131
 132    We call the method `CvSVM::train <http://docs.opencv.org/modules/ml/doc/support_vector_machines.html#cvsvm-train>`_ to build the SVM model.
 133
 134    .. code-block:: cpp
 135
 136       CvSVM SVM;
 137       SVM.train(trainingDataMat, labelsMat, Mat(), Mat(), params);
 138
 139 4. **Regions classified by the SVM**
 140
 141   The method :svms:`CvSVM::predict <cvsvm-predict>` is used to classify an input sample using a trained SVM. In this example we have used this method in order to color the space depending on the prediction done by the SVM. In other words, an image is traversed interpreting its pixels as points of the Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in green if it is the class with label 1 and in blue if it is the class with label -1.
 142
 143   .. code-block:: cpp
 144
 145      Vec3b green(0,255,0), blue (255,0,0);
 146
 147      for (int i = 0; i < image.rows; ++i)
 148          for (int j = 0; j < image.cols; ++j)
 149          {
 150          Mat sampleMat = (Mat_<float>(1,2) << i,j);
 151          float response = SVM.predict(sampleMat);
 152
 153          if (response == 1)
 154             image.at<Vec3b>(j, i)  = green;
 155          else
 156          if (response == -1)
 157             image.at<Vec3b>(j, i)  = blue;
 158          }
 159
 160 5. **Support vectors**
 161
 162    We use here a couple of methods to obtain information about the support vectors. The method :svms:`CvSVM::get_support_vector_count <cvsvm-get-support-vector>` outputs the total number of support vectors used in the problem and with the method :svms:`CvSVM::get_support_vector <cvsvm-get-support-vector>` we obtain each of the support vectors using an index. We have used this methods here to find the training examples that are support vectors and highlight them.
 163
 164    .. code-block:: cpp
 165
 166       int c     = SVM.get_support_vector_count();
 167
 168       for (int i = 0; i < c; ++i)
 169       {
 170       const float* v = SVM.get_support_vector(i); // get and then highlight with grayscale
 171       circle(   image,  Point( (int) v[0], (int) v[1]),   6,  Scalar(128, 128, 128), thickness, lineType);
 172       }
 173
 174 Results
 175 =======
 176
 177 .. container:: enumeratevisibleitemswithsquare
 178
 179    * The code opens an image and shows the training examples of both classes. The points of one class are represented with white circles and black ones are used for the other class.
 180
 181    * The SVM is trained and used to classify all the pixels of the image. This results in a division of the image in a blue region and a green region. The boundary between both regions is the optimal separating hyperplane.
 182
 183    * Finally the support vectors are shown using gray rings around the training examples.
 184
 185 .. image:: images/result.png
 186   :alt: The seperated planes
 187   :align: center
 188