doc/tutorials/ml/non_linear_svms/non_linear_svms.markdown

   1 Support Vector Machines for Non-Linearly Separable Data {#tutorial_non_linear_svms}
   2 =======================================================
   3
   4 Goal
   5 ----
   6
   7 In this tutorial you will learn how to:
   8
   9 -   Define the optimization problem for SVMs when it is not possible to separate linearly the
  10     training data.
  11 -   How to configure the parameters to adapt your SVM for this class of problems.
  12
  13 Motivation
  14 ----------
  15
  16 Why is it interesting to extend the SVM optimation problem in order to handle non-linearly separable
  17 training data? Most of the applications in which SVMs are used in computer vision require a more
  18 powerful tool than a simple linear classifier. This stems from the fact that in these tasks __the
  19 training data can be rarely separated using an hyperplane__.
  20
  21 Consider one of these tasks, for example, face detection. The training data in this case is composed
  22 by a set of images that are faces and another set of images that are non-faces (_every other thing
  23 in the world except from faces_). This training data is too complex so as to find a representation
  24 of each sample (_feature vector_) that could make the whole set of faces linearly separable from the
  25 whole set of non-faces.
  26
  27 Extension of the Optimization Problem
  28 -------------------------------------
  29
  30 Remember that using SVMs we obtain a separating hyperplane. Therefore, since the training data is
  31 now non-linearly separable, we must admit that the hyperplane found will misclassify some of the
  32 samples. This _misclassification_ is a new variable in the optimization that must be taken into
  33 account. The new model has to include both the old requirement of finding the hyperplane that gives
  34 the biggest margin and the new one of generalizing the training data correctly by not allowing too
  35 many classification errors.
  36
  37 We start here from the formulation of the optimization problem of finding the hyperplane which
  38 maximizes the __margin__ (this is explained in the previous tutorial (@ref tutorial_introduction_to_svm):
  39
  40 \f[\min_{\beta, \beta_{0}} L(\beta) = \frac{1}{2}||\beta||^{2} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 \text{ } \forall i\f]
  41
  42 There are multiple ways in which this model can be modified so it takes into account the
  43 misclassification errors. For example, one could think of minimizing the same quantity plus a
  44 constant times the number of misclassification errors in the training data, i.e.:
  45
  46 \f[\min ||\beta||^{2} + C \text{(\# misclassication errors)}\f]
  47
  48 However, this one is not a very good solution since, among some other reasons, we do not distinguish
  49 between samples that are misclassified with a small distance to their appropriate decision region or
  50 samples that are not. Therefore, a better solution will take into account the _distance of the
  51 misclassified samples to their correct decision regions_, i.e.:
  52
  53 \f[\min ||\beta||^{2} + C \text{(distance of misclassified samples to their correct regions)}\f]
  54
  55 For each sample of the training data a new parameter \f$\xi_{i}\f$ is defined. Each one of these
  56 parameters contains the distance from its corresponding training sample to their correct decision
  57 region. The following picture shows non-linearly separable training data from two classes, a
  58 separating hyperplane and the distances to their correct regions of the samples that are
  59 misclassified.
  60
  61 ![](images/sample-errors-dist.png)
  62
  63 @note Only the distances of the samples that are misclassified are shown in the picture. The
  64 distances of the rest of the samples are zero since they lay already in their correct decision
  65 region.
  66
  67 The red and blue lines that appear on the picture are the margins to each one of the
  68 decision regions. It is very __important__ to realize that each of the \f$\xi_{i}\f$ goes from a
  69 misclassified training sample to the margin of its appropriate region.
  70
  71 Finally, the new formulation for the optimization problem is:
  72
  73 \f[\min_{\beta, \beta_{0}} L(\beta) = ||\beta||^{2} + C \sum_{i} {\xi_{i}} \text{ subject to } y_{i}(\beta^{T} x_{i} + \beta_{0}) \geq 1 - \xi_{i} \text{ and } \xi_{i} \geq 0 \text{ } \forall i\f]
  74
  75 How should the parameter C be chosen? It is obvious that the answer to this question depends on how
  76 the training data is distributed. Although there is no general answer, it is useful to take into
  77 account these rules:
  78
  79 -   Large values of C give solutions with _less misclassification errors_ but a _smaller margin_.
  80     Consider that in this case it is expensive to make misclassification errors. Since the aim of
  81     the optimization is to minimize the argument, few misclassifications errors are allowed.
  82 -   Small values of C give solutions with _bigger margin_ and _more classification errors_. In this
  83     case the minimization does not consider that much the term of the sum so it focuses more on
  84     finding a hyperplane with big margin.
  85
  86 Source Code
  87 -----------
  88
  89 You may also find the source code in `samples/cpp/tutorial_code/ml/non_linear_svms` folder of the OpenCV source library or
  90 [download it from here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp).
  91
  92 @note The following code has been implemented with OpenCV 3.0 classes and functions. An equivalent version of the code
  93 using OpenCV 2.4 can be found in [this page.](http://docs.opencv.org/2.4/doc/tutorials/ml/non_linear_svms/non_linear_svms.html#nonlinearsvms)
  94
  95 @add_toggle_cpp
  96 -   **Downloadable code**: Click
  97     [here](https://github.com/opencv/opencv/tree/master/samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp)
  98
  99 -   **Code at glance:**
 100     @include samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp
 101 @end_toggle
 102
 103 @add_toggle_java
 104 -   **Downloadable code**: Click
 105     [here](https://github.com/opencv/opencv/tree/master/samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java)
 106
 107 -   **Code at glance:**
 108     @include samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java
 109 @end_toggle
 110
 111 @add_toggle_python
 112 -   **Downloadable code**: Click
 113     [here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py)
 114
 115 -   **Code at glance:**
 116     @include samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py
 117 @end_toggle
 118
 119 Explanation
 120 -----------
 121
 122 -   __Set up the training data__
 123
 124 The training data of this exercise is formed by a set of labeled 2D-points that belong to one of
 125 two different classes. To make the exercise more appealing, the training data is generated
 126 randomly using a uniform probability density functions (PDFs).
 127
 128 We have divided the generation of the training data into two main parts.
 129
 130 In the first part we generate data for both classes that is linearly separable.
 131
 132 @add_toggle_cpp
 133 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup1
 134 @end_toggle
 135
 136 @add_toggle_java
 137 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup1
 138 @end_toggle
 139
 140 @add_toggle_python
 141 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup1
 142 @end_toggle
 143
 144 In the second part we create data for both classes that is non-linearly separable, data that
 145 overlaps.
 146
 147 @add_toggle_cpp
 148 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp setup2
 149 @end_toggle
 150
 151 @add_toggle_java
 152 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java setup2
 153 @end_toggle
 154
 155 @add_toggle_python
 156 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py setup2
 157 @end_toggle
 158
 159 -   __Set up SVM's parameters__
 160
 161 @note In the previous tutorial @ref tutorial_introduction_to_svm there is an explanation of the
 162 attributes of the class @ref cv::ml::SVM that we configure here before training the SVM.
 163
 164 @add_toggle_cpp
 165 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp init
 166 @end_toggle
 167
 168 @add_toggle_java
 169 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java init
 170 @end_toggle
 171
 172 @add_toggle_python
 173 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py init
 174 @end_toggle
 175
 176 There are just two differences between the configuration we do here and the one that was done in
 177 the previous tutorial (@ref tutorial_introduction_to_svm) that we use as reference.
 178
 179 -   _C_. We chose here a small value of this parameter in order not to punish too much the
 180     misclassification errors in the optimization. The idea of doing this stems from the will of
 181     obtaining a solution close to the one intuitively expected. However, we recommend to get a
 182     better insight of the problem by making adjustments to this parameter.
 183
 184     @note In this case there are just very few points in the overlapping region between classes.
 185     By giving a smaller value to __FRAC_LINEAR_SEP__ the density of points can be incremented and the
 186     impact of the parameter _C_ explored deeply.
 187
 188 -   _Termination Criteria of the algorithm_. The maximum number of iterations has to be
 189     increased considerably in order to solve correctly a problem with non-linearly separable
 190     training data. In particular, we have increased in five orders of magnitude this value.
 191
 192 -   __Train the SVM__
 193
 194 We call the method @ref cv::ml::SVM::train to build the SVM model. Watch out that the training
 195 process may take a quite long time. Have patiance when your run the program.
 196
 197 @add_toggle_cpp
 198 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp train
 199 @end_toggle
 200
 201 @add_toggle_java
 202 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java train
 203 @end_toggle
 204
 205 @add_toggle_python
 206 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py train
 207 @end_toggle
 208
 209 -   __Show the Decision Regions__
 210
 211 The method @ref cv::ml::SVM::predict is used to classify an input sample using a trained SVM. In
 212 this example we have used this method in order to color the space depending on the prediction done
 213 by the SVM. In other words, an image is traversed interpreting its pixels as points of the
 214 Cartesian plane. Each of the points is colored depending on the class predicted by the SVM; in
 215 dark green if it is the class with label 1 and in dark blue if it is the class with label 2.
 216
 217 @add_toggle_cpp
 218 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show
 219 @end_toggle
 220
 221 @add_toggle_java
 222 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show
 223 @end_toggle
 224
 225 @add_toggle_python
 226 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show
 227 @end_toggle
 228
 229 -   __Show the training data__
 230
 231 The method @ref cv::circle is used to show the samples that compose the training data. The samples
 232 of the class labeled with 1 are shown in light green and in light blue the samples of the class
 233 labeled with 2.
 234
 235 @add_toggle_cpp
 236 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_data
 237 @end_toggle
 238
 239 @add_toggle_java
 240 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_data
 241 @end_toggle
 242
 243 @add_toggle_python
 244 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_data
 245 @end_toggle
 246
 247 -   __Support vectors__
 248
 249 We use here a couple of methods to obtain information about the support vectors. The method
 250 @ref cv::ml::SVM::getSupportVectors obtain all support vectors. We have used this methods here
 251 to find the training examples that are support vectors and highlight them.
 252
 253 @add_toggle_cpp
 254 @snippet samples/cpp/tutorial_code/ml/non_linear_svms/non_linear_svms.cpp show_vectors
 255 @end_toggle
 256
 257 @add_toggle_java
 258 @snippet samples/java/tutorial_code/ml/non_linear_svms/NonLinearSVMsDemo.java show_vectors
 259 @end_toggle
 260
 261 @add_toggle_python
 262 @snippet samples/python/tutorial_code/ml/non_linear_svms/non_linear_svms.py show_vectors
 263 @end_toggle
 264
 265 Results
 266 -------
 267
 268 -   The code opens an image and shows the training examples of both classes. The points of one class
 269     are represented with light green and light blue ones are used for the other class.
 270 -   The SVM is trained and used to classify all the pixels of the image. This results in a division
 271     of the image in a blue region and a green region. The boundary between both regions is the
 272     separating hyperplane. Since the training data is non-linearly separable, it can be seen that
 273     some of the examples of both classes are misclassified; some green points lay on the blue region
 274     and some blue points lay on the green one.
 275 -   Finally the support vectors are shown using gray rings around the training examples.
 276
 277 ![](images/svm_non_linear_result.png)
 278
 279 You may observe a runtime instance of this on the [YouTube here](https://www.youtube.com/watch?v=vFv2yPcSo-Q).
 280
 281 @youtube{vFv2yPcSo-Q}