Purpose: completed the ml chapter

author Elena Fedotova <no@email>

Sun, 15 May 2011 19:15:36 +0000 (19:15 +0000)

committer Elena Fedotova <no@email>

Sun, 15 May 2011 19:15:36 +0000 (19:15 +0000)
diff --git a/modules/ml/doc/boosting.rst b/modules/ml/doc/boosting.rst

index c45557f..9ee439a 100644
--- a/modules/ml/doc/boosting.rst
+++ b/modules/ml/doc/boosting.rst
@@ -8,10 +8,10 @@ A common machine learning task is supervised learning. In supervised learning, t
  :math:`x` and the output
  :math:`y` . Predicting the qualitative output is called classification, while predicting the quantitative output is called regression.
  
-Boosting is a powerful learning concept, which provide a solution to the supervised classification learning task. It combines the performance of many "weak" classifiers to produce a powerful 'committee'
-:ref:`[HTF01] <HTF01>` . A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. Many of them smartly combined, however, results in a strong classifier, which often outperforms most 'monolithic' strong classifiers such as SVMs and Neural Networks.
+Boosting is a powerful learning concept that provides a solution to the supervised classification learning task. It combines the performance of many "weak" classifiers to produce a powerful 'committee'
+:ref:`[HTF01] <HTF01>` . A weak classifier is only required to be better than chance, and thus can be very simple and computationally inexpensive. However, many of them, smartly combined, result in a strong classifier that often outperforms most "monolithic" strong classifiers such as SVMs and neural networks.
  
-Decision trees are the most popular weak classifiers used in boosting schemes. Often the simplest decision trees with only a single split node per tree (called stumps) are sufficient.
+Decision trees are the most popular weak classifiers used in boosting schemes. Often the simplest decision trees with only a single split node per tree (called ``stumps`` ) are sufficient.
  
  The boosted model is based on
  :math:`N` training examples
@@ -21,66 +21,68 @@ The boosted model is based on
  :math:`x_i` is a
  :math:`K` -component vector. Each component encodes a feature relevant for the learning task at hand. The desired two-class output is encoded as -1 and +1.
  
-Different variants of boosting are known such as Discrete Adaboost, Real AdaBoost, LogitBoost, and Gentle AdaBoost
-:ref:`[FHT98] <FHT98>` . All of them are very similar in their overall structure. Therefore, we will look only at the standard two-class Discrete AdaBoost algorithm as shown in the box below. Each sample is initially assigned the same weight (step 2). Next a weak classifier
+Different variants of boosting are known, such as Discrete AdaBoost, Real AdaBoost, LogitBoost, and Gentle AdaBoost
+:ref:`[FHT98] <FHT98>` . All of them are very similar in their overall structure. Therefore, this chapter focuses only on the standard two-class Discrete AdaBoost algorithm as shown in the box below. Each sample is initially assigned the same weight (step 2). Then, a weak classifier
  :math:`f_m(x)` is trained on the weighted training data (step 3a). Its weighted training error and scaling factor
-:math:`c_m` is computed (step 3b). The weights are increased for training samples, which have been misclassified (step 3c). All weights are then normalized, and the process of finding the next weak classifier continues for another
+:math:`c_m` are computed (step 3b). The weights are increased for training samples that have been misclassified (step 3c). All weights are then normalized, and the process of finding the next weak classifier continues for another
  :math:`M` -1 times. The final classifier
  :math:`F(x)` is the sign of the weighted sum over the individual weak classifiers (step 4).
  
-*
-    Given
+#.
+    Given
      :math:`N`     examples
      :math:`\{(x_i,y_i)\}_{i=1}^{N}`     with
      :math:`x_i \in R^K, y_i \in \{-1, +1\}`     .
  
-*
-    Start with weights
+#.
+    Assign weights as
      :math:`w_i = 1/N, i = 1,...,N`     .
  
-*
+#.
      Repeat for
      :math:`m`     =
      :math:`1,2,...,M`     :
  
-    *
+    #.
          Fit the classifier
          :math:`f_m(x) \in \{-1,1\}`         , using weights
          :math:`w_i`         on the training data.
  
-    *
+    #.
          Compute
          :math:`err_m = E_w [1_{(y \neq f_m(x))}], c_m = \log((1 - err_m)/err_m)`         .
  
-    *
+    #.
          Set
          :math:`w_i \Leftarrow w_i \exp[c_m 1_{(y_i \neq f_m(x_i))}], i = 1,2,...,N,`         and renormalize so that
          :math:`\sum_i w_i = 1`         .
  
-    *
+#.
      Output the classifier
      :math:`\mathrm{sign}\left[\sum_{m=1}^{M} c_m f_m(x)\right]`     .
  
-Two-class Discrete AdaBoost Algorithm: Training (steps 1 to 3) and Evaluation (step 4)
+Two-class Discrete AdaBoost Algorithm: Training (steps 1 to 3) and Evaluation (step 4)
+
  **NOTE:**
-As well as the classical boosting methods, the current implementation supports 2-class classifiers only. For M
-:math:`>` 2 classes there is the
+
+Similar to the classical boosting methods, the current implementation supports two-class classifiers only. For M
+:math:`>` 2 classes, there is the
  **AdaBoost.MH**
-algorithm, described in
-:ref:`[FHT98] <FHT98>` , that reduces the problem to the 2-class problem, yet with a much larger training set.
+algorithm (described in
+:ref:`[FHT98] <FHT98>` ) that reduces the problem to the two-class problem, yet with a much larger training set.
  
-In order to reduce computation time for boosted models without substantially losing accuracy, the influence trimming technique may be employed. As the training algorithm proceeds and the number of trees in the ensemble is increased, a larger number of the training samples are classified correctly and with increasing confidence, thereby those samples receive smaller weights on the subsequent iterations. Examples with very low relative weight have small impact on training of the weak classifier. Thus such examples may be excluded during the weak classifier training without having much effect on the induced classifier. This process is controlled with the weight_trim_rate parameter. Only examples with the summary fraction weight_trim_rate of the total weight mass are used in the weak classifier training. Note that the weights for
+To reduce computation time for boosted models without substantially losing accuracy, the influence trimming technique may be employed. As the training algorithm proceeds and the number of trees in the ensemble increases, more training samples are classified correctly and with increasing confidence, so those samples receive smaller weights in subsequent iterations. Examples with a very low relative weight have little impact on the weak classifier training. Thus, such examples may be excluded during the weak classifier training without having much effect on the induced classifier. This process is controlled with the ``weight_trim_rate`` parameter. Only the examples that together account for the fraction ``weight_trim_rate`` of the total weight mass are used in the weak classifier training. Note that the weights for
  **all**
  training examples are recomputed at each training iteration. Examples deleted at a particular iteration may be used again for learning some of the later weak classifiers
  :ref:`[FHT98] <FHT98>` .
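
The influence trimming step described above can be sketched as follows. This is only an illustration of the idea, not the ``CvBoost`` implementation: the helper name ``trim_by_weight`` and the exact stopping rule (keep the heaviest samples until their accumulated weight reaches the requested fraction) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the CvBoost API): returns the indices of
// the samples that would be kept for training the next weak classifier.
std::vector<std::size_t> trim_by_weight(const std::vector<double>& weights,
                                        double weight_trim_rate)
{
    assert(weight_trim_rate > 0.0 && weight_trim_rate <= 1.0);

    // Order sample indices from the heaviest to the lightest weight.
    std::vector<std::size_t> order(weights.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return weights[a] > weights[b];
                     });

    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    const double target = weight_trim_rate * total;

    // Keep the heaviest samples until they cover the target weight mass.
    std::vector<std::size_t> kept;
    double mass = 0.0;
    for (std::size_t idx : order) {
        if (mass >= target) break;
        kept.push_back(idx);
        mass += weights[idx];
    }
    std::sort(kept.begin(), kept.end());  // restore the original sample order
    return kept;
}
```

Because the weights of all samples are recomputed at every iteration, such a selection would be repeated before each weak classifier is trained, so a sample dropped once can reappear later.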
  
-.. _HTF01:
+.. _HTF01:
  
-[HTF01] Hastie, T., Tibshirani, R., Friedman, J. H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. 2001.**
+[HTF01] Hastie, T., Tibshirani, R., Friedman, J. H. *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. Springer Series in Statistics. 2001.
  
-.. _FHT98:
+.. _FHT98:
  
-[FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. Technical Report, Dept. of Statistics, Stanford University, 1998.**
+[FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. *Additive Logistic Regression: a Statistical View of Boosting*. Technical Report, Dept. of Statistics, Stanford University, 1998.
  
  .. index:: CvBoostParams
  
@@ -90,7 +92,7 @@ CvBoostParams
  -------------
  .. c:type:: CvBoostParams
  
-Boosting training parameters. ::
+Boosting training parameters ::
  
      struct CvBoostParams : public CvDTreeParams
      {
@@ -106,7 +108,7 @@ Boosting training parameters. ::
  
  
  The structure is derived from
-:ref:`CvDTreeParams` , but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
+:ref:`CvDTreeParams` , but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
  
  .. index:: CvBoostTree
  
@@ -116,7 +118,7 @@ CvBoostTree
  -----------
  .. c:type:: CvBoostTree
  
-Weak tree classifier. ::
+Weak tree classifier ::
  
      class CvBoostTree: public CvDTree
      {
@@ -139,13 +141,15 @@ Weak tree classifier. ::
  
  The weak classifier, a component of the boosted tree classifier
  :ref:`CvBoost` , is a derivative of
-:ref:`CvDTree` . Normally, there is no need to use the weak classifiers directly, however they can be accessed as elements of the sequence ``CvBoost::weak`` , retrieved by ``CvBoost::get_weak_predictors`` .
+:ref:`CvDTree` . Normally, there is no need to use the weak classifiers directly. However, they can be accessed as elements of the sequence ``CvBoost::weak`` , retrieved by ``CvBoost::get_weak_predictors`` .
+
+**Note:**
  
-Note, that in the case of LogitBoost and Gentle AdaBoost each weak predictor is a regression tree, rather than a classification tree. Even in the case of Discrete AdaBoost and Real AdaBoost the ``CvBoostTree::predict`` return value ( ``CvDTreeNode::value`` ) is not the output class label; a negative value "votes" for class
+In the case of LogitBoost and Gentle AdaBoost, each weak predictor is a regression tree rather than a classification tree. Even in the case of Discrete AdaBoost and Real AdaBoost, the ``CvBoostTree::predict`` return value ( ``CvDTreeNode::value`` ) is not an output class label. A negative value "votes" for class
  #
  0, a positive - for class
  #
-1. And the votes are weighted. The weight of each individual tree may be increased or decreased using the method ``CvBoostTree::scale`` .
+1. The votes are weighted. The weight of each individual tree may be increased or decreased using the method ``CvBoostTree::scale`` .
  
  .. index:: CvBoost
  
@@ -155,7 +159,7 @@ CvBoost
  -------
  .. c:type:: CvBoost
  
-Boosted tree classifier. ::
+Boosted tree classifier ::
  
      class CvBoost : public CvStatModel
      {
@@ -221,7 +225,7 @@ CvBoost::train
  
      Trains a boosted tree classifier.
  
-The train method follows the common template; the last parameter ``update`` specifies whether the classifier needs to be updated (i.e. the new weak tree classifiers added to the existing ensemble), or the classifier needs to be rebuilt from scratch. The responses must be categorical, i.e. boosted trees can not be built for regression, and there should be 2 classes.
+The train method follows the common template. The last parameter ``update`` specifies whether the classifier needs to be updated (the new weak tree classifiers added to the existing ensemble) or the classifier needs to be rebuilt from scratch. The responses must be categorical, which means that boosted trees cannot be built for regression, and there should be two classes.
  
  .. index:: CvBoost::predict
  
@@ -231,7 +235,7 @@ CvBoost::predict
  ----------------
  .. c:function:: float CvBoost::predict(  const CvMat* sample,  const CvMat* missing=0,                          CvMat* weak_responses=0,  CvSlice slice=CV_WHOLE_SEQ,                          bool raw_mode=false ) const
  
-    Predicts a response for the input sample.
+    Predicts a response for an input sample.
  
  The method ``CvBoost::predict`` runs the sample through the trees in the ensemble and returns the output class label based on the weighted voting.
  
@@ -245,7 +249,11 @@ CvBoost::prune
  
      Removes the specified weak classifiers.
  
-The method removes the specified weak classifiers from the sequence. Note that this method should not be confused with the pruning of individual decision trees, which is currently not supported.
+The method removes the specified weak classifiers from the sequence. 
+
+**Note:**
+
+Do not confuse this method with the pruning of individual decision trees, which is currently not supported.
  
  .. index:: CvBoost::get_weak_predictors
  
@@ -257,5 +265,5 @@ CvBoost::get_weak_predictors
  
      Returns the sequence of weak tree classifiers.
  
-The method returns the sequence of weak classifiers. Each element of the sequence is a pointer to a ``CvBoostTree`` class (or, probably, to some of its derivatives).
+The method returns the sequence of weak classifiers. Each element of the sequence is a pointer to the ``CvBoostTree`` class or, possibly, to one of its derived classes.
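
The two-class Discrete AdaBoost procedure described in the ``boosting.rst`` changes above (steps 1 to 4) can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the ``CvBoost`` code: the ``Stump`` type, the exhaustive stump search, and the clamping of the error (so that the logarithm stays finite when a stump classifies everything correctly) are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical weak classifier for illustration; it is not CvBoostTree.
struct Stump {
    std::size_t feature = 0;  // index of the feature tested by the single split
    double threshold = 0.0;   // split threshold
    int polarity = 1;         // +1: predict +1 when x[feature] >= threshold
    double c = 0.0;           // scaling factor c_m

    int predict(const std::vector<double>& x) const {
        return (x[feature] >= threshold) ? polarity : -polarity;
    }
};

// Step 3a: exhaustive search for the stump with the smallest weighted error.
static Stump fit_stump(const std::vector<std::vector<double>>& X,
                       const std::vector<int>& y,
                       const std::vector<double>& w, double& best_err)
{
    Stump best;
    best_err = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < X[0].size(); ++k)
        for (std::size_t j = 0; j < X.size(); ++j)
            for (int polarity : {1, -1}) {
                Stump s;
                s.feature = k;
                s.threshold = X[j][k];
                s.polarity = polarity;
                double err = 0.0;  // weighted training error of this stump
                for (std::size_t i = 0; i < X.size(); ++i)
                    if (s.predict(X[i]) != y[i]) err += w[i];
                if (err < best_err) { best_err = err; best = s; }
            }
    return best;
}

// Steps 1 to 3: labels in y must be -1 or +1.
std::vector<Stump> train_adaboost(const std::vector<std::vector<double>>& X,
                                  const std::vector<int>& y, int M)
{
    assert(!X.empty() && X.size() == y.size());
    const std::size_t N = X.size();
    std::vector<double> w(N, 1.0 / N);             // step 2: equal weights
    std::vector<Stump> ensemble;
    for (int m = 0; m < M; ++m) {                  // step 3
        double err = 0.0;
        Stump f = fit_stump(X, y, w, err);         // step 3a
        // Assumption of this sketch: clamp err to keep log() finite.
        err = std::min(std::max(err, 1e-10), 1.0 - 1e-10);
        f.c = std::log((1.0 - err) / err);         // step 3b
        double sum = 0.0;
        for (std::size_t i = 0; i < N; ++i) {      // step 3c
            if (f.predict(X[i]) != y[i]) w[i] *= std::exp(f.c);
            sum += w[i];
        }
        for (double& wi : w) wi /= sum;            // renormalize to sum 1
        ensemble.push_back(f);
    }
    return ensemble;
}

// Step 4: sign of the weighted sum of the weak classifier outputs.
int predict_adaboost(const std::vector<Stump>& ensemble,
                     const std::vector<double>& x)
{
    double s = 0.0;
    for (const Stump& f : ensemble) s += f.c * f.predict(x);
    return s >= 0.0 ? 1 : -1;
}
```

A caller would pass feature vectors with labels encoded as -1 and +1, for example ``train_adaboost(X, y, 3)``, and then classify new samples with ``predict_adaboost``.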
  
diff --git a/modules/ml/doc/decision_trees.rst b/modules/ml/doc/decision_trees.rst

index 5ad3ce1..e3a6c38 100644
--- a/modules/ml/doc/decision_trees.rst
+++ b/modules/ml/doc/decision_trees.rst
@@ -1,7 +1,7 @@
  Decision Trees
  ==============
  
-The ML classes discussed in this section implement Classification And Regression Tree algorithms, which are described in `[Breiman84] <#paper_Breiman84>`_
+The ML classes discussed in this section implement Classification and Regression Tree algorithms described in `[Breiman84] <#paper_Breiman84>`_
  .
  
  The class
@@ -9,56 +9,53 @@ The class
  :ref:`Boosting` and
  :ref:`Random Trees` ).
  
-A decision tree is a binary tree (i.e. tree where each non-leaf node has exactly 2 child nodes). It can be used either for classification, when each tree leaf is marked with some class label (multiple leafs may have the same label), or for regression, when each tree leaf is also assigned a constant (so the approximation function is piecewise constant).
+A decision tree is a binary tree (a tree in which each non-leaf node has exactly two child nodes). It can be used either for classification or for regression. For classification, each tree leaf is marked with a class label; multiple leaves may have the same label. For regression, a constant is assigned to each tree leaf, so the approximation function is piecewise constant.
  
  Predicting with Decision Trees
  ------------------------------
  
-To reach a leaf node, and to obtain a response for the input feature
+To reach a leaf node and to obtain a response for the input feature
  vector, the prediction procedure starts with the root node. From each
-non-leaf node the procedure goes to the left (i.e. selects the left
-child node as the next observed node), or to the right based on the
-value of a certain variable, whose index is stored in the observed
-node. The variable can be either ordered or categorical. In the first
-case, the variable value is compared with the certain threshold (which
-is also stored in the node); if the value is less than the threshold,
-the procedure goes to the left, otherwise, to the right (for example,
-if the weight is less than 1 kilogram, the procedure goes to the left,
-else to the right). And in the second case the discrete variable value is
-tested to see if it belongs to a certain subset of values (also stored
-in the node) from a limited set of values the variable could take; if
-yes, the procedure goes to the left, else - to the right (for example,
-if the color is green or red, go to the left, else to the right). That
-is, in each node, a pair of entities (variable_index, decision_rule
-(threshold/subset)) is used. This pair is called a split (split on
-the variable variable_index). Once a leaf node is reached, the value
-assigned to this node is used as the output of prediction procedure.
-
-Sometimes, certain features of the input vector are missed (for example, in the darkness it is difficult to determine the object color), and the prediction procedure may get stuck in the certain node (in the mentioned example if the node is split by color). To avoid such situations, decision trees use so-called surrogate splits. That is, in addition to the best "primary" split, every tree node may also be split on one or more other variables with nearly the same results.
+non-leaf node the procedure goes to the left (selects the left
+child node as the next observed node) or to the right based on the
+value of a certain variable whose index is stored in the observed
+node. The following variables are possible:
+
+* 
+  **Ordered variables.** The variable value is compared with a threshold that is also stored in the node. If the value is less than the threshold, the procedure goes to the left. Otherwise, it goes to the right. For example, if the weight is less than 1 kilogram, the procedure goes to the left, else to the right.
+* 
+  **Categorical variables.**  A discrete variable value is tested to see whether it belongs to a certain subset of values (also stored in the node) from a limited set of values the variable could take. If it does, the procedure goes to the left. Otherwise, it goes to the right. For example, if the color is green or red, go to the left, else to the right.
+
+So, in each node, a pair of entities (``variable_index`` , ``decision_rule
+(threshold/subset)`` ) is used. This pair is called a *split* (split on
+the variable ``variable_index`` ). Once a leaf node is reached, the value
+assigned to this node is used as the output of the prediction procedure.
+
+Sometimes, certain features of the input vector are missing (for example, in the darkness it is difficult to determine the object color), and the prediction procedure may get stuck in a certain node (in the mentioned example, if the node is split by color). To avoid such situations, decision trees use so-called *surrogate splits*. That is, in addition to the best "primary" split, every tree node may also be split on one or more other variables with nearly the same results.
  
  Training Decision Trees
  -----------------------
  
-The tree is built recursively, starting from the root node. All of the training data (feature vectors and the responses) is used to split the root node. In each node the optimum decision rule (i.e. the best "primary" split) is found based on some criteria (in ML ``gini`` "purity" criteria is used for classification, and sum of squared errors is used for regression). Then, if necessary, the surrogate splits are found that resemble the results of the primary split on the training data; all of the data is divided using the primary and the surrogate splits (just like it is done in the prediction procedure) between the left and the right child node. Then the procedure recursively splits both left and right nodes. At each node the recursive procedure may stop (i.e. stop splitting the node further) in one of the following cases:
+The tree is built recursively, starting from the root node. All training data (feature vectors and responses) is used to split the root node. In each node, the optimum decision rule (the best "primary" split) is found based on some criterion. In ML, the ``gini`` "purity" criterion is used for classification, and the sum of squared errors is used for regression. Then, if necessary, surrogate splits are found that resemble the results of the primary split on the training data. All the data is divided between the left and the right child node using the primary and the surrogate splits (as it is done in the prediction procedure). Then, the procedure recursively splits both left and right nodes. At each node, the recursive procedure may stop (that is, stop splitting the node further) in one of the following cases:
  
-* depth of the tree branch being constructed has reached the specified maximum value.
+* Depth of the constructed tree branch has reached the specified maximum value.
  
-* number of training samples in the node is less than the specified threshold, when it is not statistically representative to split the node further.
+* Number of training samples in the node is less than the specified threshold when it is not statistically representative to split the node further.
  
-* all the samples in the node belong to the same class (or, in the case of regression, the variation is too small).
+* All the samples in the node belong to the same class or, in case of regression, the variation is too small.
  
-* the best split found does not give any noticeable improvement compared to a random choice.
+* The best found split does not give any noticeable improvement compared to a random choice.
  
-When the tree is built, it may be pruned using a cross-validation procedure, if necessary. That is, some branches of the tree that may lead to the model overfitting are cut off. Normally this procedure is only applied to standalone decision trees, while tree ensembles usually build small enough trees and use their own protection schemes against overfitting.
+When the tree is built, it may be pruned using a cross-validation procedure, if necessary. That is, some branches of the tree that may lead to the model overfitting are cut off. Normally, this procedure is only applied to standalone decision trees. Tree ensembles usually build trees that are small enough and use their own protection schemes against overfitting.
  
-Variable importance
+Variable Importance
  -------------------
  
-Besides the obvious use of decision trees - prediction, the tree can be also used for various data analysis. One of the key properties of the constructed decision tree algorithms is that it is possible to compute importance (relative decisive power) of each variable. For example, in a spam filter that uses a set of words occurred in the message as a feature vector, the variable importance rating can be used to determine the most "spam-indicating" words and thus help to keep the dictionary size reasonable.
+Besides prediction, which is the obvious use of decision trees, a tree can also be used for various data analyses. One of the key properties of the constructed decision tree algorithms is the ability to compute the importance (relative decisive power) of each variable. For example, in a spam filter that uses the set of words occurring in a message as a feature vector, the variable importance rating can be used to determine the most "spam-indicating" words and thus help keep the dictionary size reasonable.
  
  Importance of each variable is computed over all the splits on this variable in the tree, primary and surrogate ones. Thus, to compute variable importance correctly, the surrogate splits must be enabled in the training parameters, even if there is no missing data.
  
-**[Breiman84] Breiman, L., Friedman, J. Olshen, R. and Stone, C. (1984), "Classification and Regression Trees", Wadsworth.**
+[Breiman84] Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984), *Classification and Regression Trees*, Wadsworth.
  
  .. index:: CvDTreeSplit
  
@@ -68,7 +65,7 @@ CvDTreeSplit
  ------------
  .. c:type:: CvDTreeSplit
  
-Decision tree node split. ::
+Decision tree node split ::
  
      struct CvDTreeSplit
      {
@@ -97,7 +94,7 @@ CvDTreeNode
  -----------
  .. c:type:: CvDTreeNode
  
-Decision tree node. ::
+Decision tree node ::
  
      struct CvDTreeNode
      {
@@ -127,7 +124,7 @@ CvDTreeParams
  -------------
  .. c:type:: CvDTreeParams
  
-Decision tree training parameters. ::
+Decision tree training parameters ::
  
      struct CvDTreeParams
      {
@@ -154,7 +151,7 @@ Decision tree training parameters. ::
      };
  
  
-The structure contains all the decision tree training parameters. There is a default constructor that initializes all the parameters with the default values tuned for standalone classification tree. Any of the parameters can be overridden then, or the structure may be fully initialized using the advanced variant of the constructor.
+The structure contains all the decision tree training parameters. There is a default constructor that initializes all the parameters with the default values tuned for a standalone classification tree. Any of the parameters can then be overridden, or the structure may be fully initialized using the advanced variant of the constructor.
  
  .. index:: CvDTreeTrainData
  
@@ -164,7 +161,7 @@ CvDTreeTrainData
  ----------------
  .. c:type:: CvDTreeTrainData
  
-Decision tree training data and shared data for tree ensembles. ::
+Decision tree training data and shared data for tree ensembles ::
  
      struct CvDTreeTrainData
      {
@@ -260,26 +257,26 @@ Decision tree training data and shared data for tree ensembles. ::
      };
  
  
-This structure is mostly used internally for storing both standalone trees and tree ensembles efficiently. Basically, it contains 3 types of information:
+This structure is mostly used internally for storing both standalone trees and tree ensembles efficiently. Basically, it contains the following types of information:
  
-#. The training parameters, an instance of :ref:`CvDTreeParams`.
+#. Training parameters, an instance of :ref:`CvDTreeParams`.
  
-#. The training data, preprocessed in order to find the best splits more efficiently. For tree ensembles this preprocessed data is reused by all the trees. Additionally, the training data characteristics that are shared by all trees in the ensemble are stored here: variable types, the number of classes, class label compression map etc.
+#. Training data, preprocessed to find the best splits more efficiently. For tree ensembles, this preprocessed data is reused by all trees. Additionally, the training data characteristics shared by all trees in the ensemble are stored here: variable types, the number of classes, class label compression map, and so on.
  
-#. Buffers, memory storages for tree nodes, splits and other elements of the trees constructed.
+#. Buffers, memory storages for tree nodes, splits, and other elements of the constructed trees.
  
-There are 2 ways of using this structure. In simple cases (e.g. a standalone tree, or the ready-to-use "black box" tree ensemble from ML, like
+There are two ways of using this structure. In simple cases (for example, a standalone tree or the ready-to-use "black box" tree ensemble from ML, like
  :ref:`Random Trees` or
-:ref:`Boosting` ) there is no need to care or even to know about the structure - just construct the needed statistical model, train it and use it. The ``CvDTreeTrainData`` structure will be constructed and used internally. However, for custom tree algorithms, or another sophisticated cases, the structure may be constructed and used explicitly. The scheme is the following:
+:ref:`Boosting` ), there is no need to care or even to know about the structure. You just construct the needed statistical model, train it, and use it. The ``CvDTreeTrainData`` structure is constructed and used internally. However, for custom tree algorithms or other sophisticated cases, the structure may be constructed and used explicitly. The scheme is the following:
  
-*
-    The structure is initialized using the default constructor, followed by ``set_data``     (or it is built using the full form of constructor). The parameter ``_shared``     must be set to ``true``     .
+#.
+    The structure is initialized using the default constructor, followed by ``set_data`` , or it is built using the full form of constructor. The parameter ``_shared``  must be set to ``true`` .
  
-*
-    One or more trees are trained using this data, see the special form of the method ``CvDTree::train``     .
+#.
+    One or more trees are trained using this data (see the special form of the method ``CvDTree::train``  ).
  
-*
-    Finally, the structure can be released only after all the trees using it are released.
+#.
+    The structure can be released only after all the trees using it are released.
  
  .. index:: CvDTree
  
@@ -289,7 +286,7 @@ CvDTree
  -------
  .. c:type:: CvDTree
  
-Decision tree. ::
+Decision tree ::
  
      class CvDTree : public CvStatModel
      {
@@ -380,13 +377,13 @@ CvDTree::train
  
      Trains a decision tree.
  
-There are 2 ``train`` methods in ``CvDTree`` .
+There are two ``train`` methods in ``CvDTree`` :
  
-The first method follows the generic ``CvStatModel::train`` conventions,  it is the most complete form. Both data layouts ( ``_tflag=CV_ROW_SAMPLE`` and ``_tflag=CV_COL_SAMPLE`` ) are supported, as well as sample and variable subsets, missing measurements, arbitrary combinations of input and output variable types etc. The last parameter contains all of the necessary training parameters, see the
-:ref:`CvDTreeParams` description.
+* The first method follows the generic ``CvStatModel::train`` conventions. It is the most complete form. Both data layouts ( ``_tflag=CV_ROW_SAMPLE`` and ``_tflag=CV_COL_SAMPLE`` ) are supported, as well as sample and variable subsets, missing measurements, arbitrary combinations of input and output variable types, and so on. The last parameter contains all of the necessary training parameters (see the
+  :ref:`CvDTreeParams` description).
  
-The second method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed
-:ref:`CvDTreeTrainData` instance and the optional subset of training set. The indices in ``_subsample_idx`` are counted relatively to the ``_sample_idx`` , passed to ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]`` , then ``_subsample_idx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
+* The second method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed
+  :ref:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``_subsample_idx`` are counted relative to ``_sample_idx`` , passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]`` , then ``_subsample_idx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
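
The relation between ``_subsample_idx`` and ``_sample_idx`` in the example above can be sketched as a simple index lookup. The helper ``resolve_subsample`` is hypothetical and exists only for this illustration; it is not part of the ``CvDTree`` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: translates positions given in subsample_idx into
// sample indices of the original training set by looking them up in
// sample_idx.
std::vector<int> resolve_subsample(const std::vector<int>& sample_idx,
                                   const std::vector<int>& subsample_idx)
{
    std::vector<int> original;
    original.reserve(subsample_idx.size());
    for (int pos : subsample_idx) {
        // Each position must refer to an existing entry of sample_idx.
        assert(pos >= 0 && static_cast<std::size_t>(pos) < sample_idx.size());
        original.push_back(sample_idx[pos]);
    }
    return original;
}
```

With ``sample_idx = {1, 5, 7, 100}`` and ``subsample_idx = {0, 3}`` , the helper yields the samples 1 and 100, matching the example in the text.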
  
  .. index:: CvDTree::predict
  
@@ -396,9 +393,9 @@ CvDTree::predict
  ----------------
  .. c:function:: CvDTreeNode* CvDTree::predict(  const CvMat* _sample,  const CvMat* _missing_data_mask=0,                                 bool raw_mode=false ) const
  
-    Returns the leaf node of the decision tree corresponding to the input vector.
+    Returns the leaf node of a decision tree corresponding to the input vector.
  
-The method takes the feature vector and the optional missing measurement mask on input, traverses the decision tree and returns the reached leaf node on output. The prediction result, either the class label or the estimated function value, may be retrieved as the ``value`` field of the
+The method takes the feature vector and an optional missing measurement mask as input, traverses the decision tree, and returns the reached leaf node as output. The prediction result, either the class label or the estimated function value, may be retrieved as the ``value`` field of the
  :ref:`CvDTreeNode` structure, for example: ``dtree->predict(sample,mask)->value`` .
@@ -407,10 +404,10 @@ The last parameter is normally set to ``false`` , implying a regular
  input. If it is ``true`` , the method assumes that all the values of
  the discrete input variables have been already normalized to
  :math:`0` to
-:math:`num\_of\_categories_i-1` ranges. (as the decision tree uses such
-normalized representation internally). It is useful for faster prediction
-with tree ensembles. For ordered input variables the flag is not used.
+:math:`num\_of\_categories_i-1` ranges since the decision tree uses such a
+normalized representation internally. It is useful for faster prediction
+with tree ensembles. For ordered input variables, the flag is not used.
  
-Example: Building A Tree for Classifying Mushrooms.  See the ``mushroom.cpp`` sample that demonstrates how to build and use the
+Example: building a tree for classifying mushrooms.  See the ``mushroom.cpp`` sample that demonstrates how to build and use the
  decision tree.
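
The relative indexing of ``_subsample_idx`` described for the second ``train`` method can be illustrated with a minimal sketch. The helper ``resolve_subsample`` is hypothetical and is not part of the ML API; it only shows how an index counted relative to ``_sample_idx`` maps back to the original training set.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): translate indices that are
// counted relative to sample_idx into indices of the original training set.
std::vector<int> resolve_subsample(const std::vector<int>& sample_idx,
                                   const std::vector<int>& subsample_idx)
{
    std::vector<int> resolved;
    resolved.reserve(subsample_idx.size());
    for (std::size_t i = 0; i < subsample_idx.size(); ++i)
    {
        // at() throws std::out_of_range if a relative index is invalid
        resolved.push_back(sample_idx.at(subsample_idx[i]));
    }
    return resolved;
}
```

With ``sample_idx = [1, 5, 7, 100]`` and ``subsample_idx = [0, 3]``, the helper returns ``[1, 100]``, which matches the example in the text.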
  
diff --git a/modules/ml/doc/expectation_maximization.rst b/modules/ml/doc/expectation_maximization.rst

index 8c8ed9c..1a54225 100644
--- a/modules/ml/doc/expectation_maximization.rst
+++ b/modules/ml/doc/expectation_maximization.rst
@@ -1,10 +1,10 @@
-Expectation-Maximization
+Expectation Maximization
  ========================
  
-The EM (Expectation-Maximization) algorithm estimates the parameters of the multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
+The EM (Expectation Maximization) algorithm estimates the parameters of the multivariate probability density function in the form of a Gaussian mixture distribution with a specified number of mixtures.
  
-Consider the set of the feature vectors
-:math:`x_1, x_2,...,x_{N}` : N vectors from a d-dimensional Euclidean space drawn from a Gaussian mixture:
+Consider the set of N feature vectors
+:math:`x_1, x_2,...,x_{N}` from a d-dimensional Euclidean space drawn from a Gaussian mixture:
  
  .. math::
  
@@ -19,12 +19,15 @@ where
  :math:`p_k` is the normal distribution
  density with the mean
  :math:`a_k` and covariance matrix
-:math:`S_k`,:math:`\pi_k` is the weight of the k-th mixture. Given the number of mixtures
+:math:`S_k`,
+:math:`\pi_k` is the weight of the k-th mixture. Given the number of mixtures
  :math:`M` and the samples
-:math:`x_i`,:math:`i=1..N` the algorithm finds the
-maximum-likelihood estimates (MLE) of the all the mixture parameters,
-i.e.
-:math:`a_k`,:math:`S_k` and
+:math:`x_i`,
+:math:`i=1..N` the algorithm finds the
+maximum-likelihood estimates (MLE) of all the mixture parameters,
+that is,
+:math:`a_k`,
+:math:`S_k` and
  :math:`\pi_k` :
  
  .. math::
@@ -35,8 +38,8 @@ i.e.
  
      \Theta = \left \{ (a_k,S_k, \pi _k): a_k  \in \mathbbm{R} ^d,S_k=S_k^T>0,S_k  \in \mathbbm{R} ^{d  \times d}, \pi _k \geq 0, \sum _{k=1}^{m} \pi _k=1 \right \} .
  
-EM algorithm is an iterative procedure. Each iteration of it includes
-two steps. At the first step (Expectation-step, or E-step), we find a
+The EM algorithm is an iterative procedure. Each iteration includes
+two steps. At the first step (Expectation step or E-step), you find a
  probability
  :math:`p_{i,k}` (denoted
  :math:`\alpha_{i,k}` in the formula below) of
@@ -47,30 +50,30 @@ available mixture parameter estimates:
  
      \alpha _{ki} =  \frac{\pi_k\varphi(x;a_k,S_k)}{\sum\limits_{j=1}^{m}\pi_j\varphi(x;a_j,S_j)} .
  
-At the second step (Maximization-step, or M-step) the mixture parameter estimates are refined using the computed probabilities:
+At the second step (Maximization step or M-step), the mixture parameter estimates are refined using the computed probabilities:
  
  .. math::
  
-    \pi _k= \frac{1}{N} \sum _{i=1}^{N} \alpha _{ki},  \quad a_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}x_i}{\sum\limits_{i=1}^{N}\alpha_{ki}} ,  \quad S_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}(x_i-a_k)(x_i-a_k)^T}{\sum\limits_{i=1}^{N}\alpha_{ki}} ,
+    \pi _k= \frac{1}{N} \sum _{i=1}^{N} \alpha _{ki},  \quad a_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}x_i}{\sum\limits_{i=1}^{N}\alpha_{ki}} ,  \quad S_k= \frac{\sum\limits_{i=1}^{N}\alpha_{ki}(x_i-a_k)(x_i-a_k)^T}{\sum\limits_{i=1}^{N}\alpha_{ki}} 
  
  Alternatively, the algorithm may start with the M-step when the initial values for
  :math:`p_{i,k}` can be provided. Another alternative when
-:math:`p_{i,k}` are unknown, is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
-:math:`p_{i,k}` . Often (and in ML) the
+:math:`p_{i,k}` are unknown is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
+:math:`p_{i,k}` . Often (including in ML), the
  :ref:`kmeans` algorithm is used for that purpose.
  
-One of the main that EM algorithm should deal with is the large number
-of parameters to estimate. The majority of the parameters sits in
+One of the main problems the EM algorithm has to deal with is the large number
+of parameters to estimate. The majority of the parameters reside in
  covariance matrices, which are
  :math:`d \times d` elements each
-(where
-:math:`d` is the feature space dimensionality). However, in
-many practical problems the covariance matrices are close to diagonal,
+where
+:math:`d` is the feature space dimensionality. However, in
+many practical problems, the covariance matrices are close to diagonal
  or even to
  :math:`\mu_k*I` , where
-:math:`I` is identity matrix and
-:math:`\mu_k` is mixture-dependent "scale" parameter. So a robust computation
-scheme could be to start with the harder constraints on the covariance
+:math:`I` is an identity matrix and
+:math:`\mu_k` is a mixture-dependent "scale" parameter. So, a robust computation
+scheme could start with harder constraints on the covariance
  matrices and then use the estimated parameters as an input for a less
  constrained optimization problem (often a diagonal covariance matrix is
  already a good enough approximation).
@@ -78,7 +81,7 @@ already a good enough approximation).
  **References:**
  
  *
-    Bilmes98 J. A. Bilmes. A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
+    Bilmes98 J. A. Bilmes. *A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models*. Technical Report TR-97-021, International Computer Science Institute and Computer Science Division, University of California at Berkeley, April 1998.
  
  .. index:: CvEMParams
  
@@ -88,7 +91,7 @@ CvEMParams
  ----------
  .. c:type:: CvEMParams
  
-Parameters of the EM algorithm. ::
+Parameters of the EM algorithm ::
  
      struct CvEMParams
      {
@@ -124,7 +127,7 @@ Parameters of the EM algorithm. ::
      };
  
  
-The structure has 2 constructors, the default one represents a rough rule-of-thumb, with another one it is possible to override a variety of parameters, from a single number of mixtures (the only essential problem-dependent parameter), to the initial values for the mixture parameters.
+The structure has two constructors. The default one represents a rough rule of thumb. With the other one, it is possible to override a variety of parameters, from a single number of mixtures (the only essential problem-dependent parameter) to the initial values for the mixture parameters.
  
  .. index:: CvEM
  
@@ -134,7 +137,7 @@ CvEM
  ----
  .. c:type:: CvEM
  
-EM model. ::
+EM model ::
  
      class CV_EXPORTS CvEM : public CvStatModel
      {
@@ -142,7 +145,7 @@ EM model. ::
          // Type of covariance matrices
          enum { COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2 };
  
-        // The initial step
+        // Initial step
          enum { START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0 };
  
          CvEM();
@@ -194,14 +197,17 @@ CvEM::train
  -----------
  .. c:function:: void CvEM::train(  const CvMat* samples,  const CvMat*  sample_idx=0,                    CvEMParams params=CvEMParams(),  CvMat* labels=0 )
  
-    Estimates the Gaussian mixture parameters from the sample set.
+    Estimates the Gaussian mixture parameters from a sample set.
  
-Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or the function values) on input. Instead, it computes the
-*Maximum Likelihood Estimate* of the Gaussian mixture parameters from the input sample set, stores all the parameters inside the structure:
-:math:`p_{i,k}` in ``probs``,:math:`a_k` in ``means`` :math:`S_k` in ``covs[k]``,:math:`\pi_k` in ``weights`` and optionally computes the output "class label" for each sample:
-:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (i.e. indices of the most-probable mixture for each sample).
+Unlike many of the ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or function values) as input. Instead, it computes the
+*Maximum Likelihood Estimate* of the Gaussian mixture parameters from an input sample set, stores all the parameters inside the structure:
+:math:`p_{i,k}` in ``probs``,
+:math:`a_k` in ``means`` ,
+:math:`S_k` in ``covs[k]``,
+:math:`\pi_k` in ``weights`` , and optionally computes the output "class label" for each sample:
+:math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture for each sample).
  
-The trained model can be used further for prediction, just like any other classifier. The model trained is similar to the
+The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
  :ref:`Bayes classifier`.
  
  Example: Clustering random samples of multi-Gaussian distribution using EM ::
@@ -244,7 +250,7 @@ Example: Clustering random samples of multi-Gaussian distribution using EM ::
          }
          cvReshape( samples, samples, 1, 0 );
  
-        // initialize model's parameters
+        // initialize model parameters
          params.covs      = NULL;
          params.means     = NULL;
          params.weights   = NULL;
@@ -263,7 +269,7 @@ Example: Clustering random samples of multi-Gaussian distribution using EM ::
          // the piece of code shows how to repeatedly optimize the model
          // with less-constrained parameters
          //(COV_MAT_DIAGONAL instead of COV_MAT_SPHERICAL)
-        // when the output of the first stage is used as input for the second.
+        // when the output of the first stage is used as input for the second one.
          CvEM em_model2;
          params.cov_mat_type = CvEM::COV_MAT_DIAGONAL;
          params.start_step = CvEM::START_E_STEP;
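
The E-step and M-step formulas from the introduction of the EM section can be sketched for the one-dimensional case, where each covariance matrix reduces to a scalar variance. This is an illustration under stated assumptions, not the ``CvEM`` implementation: the names ``Mixture1D``, ``normal_pdf``, and ``em_iteration`` are hypothetical, the data is assumed to be non-empty, and no safeguards against collapsing variances or numerical underflow are included.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for the parameters of a 1-D Gaussian mixture.
struct Mixture1D
{
    std::vector<double> weight;  // pi_k
    std::vector<double> mean;    // a_k
    std::vector<double> var;     // S_k (a scalar variance in one dimension)
};

// Density of the normal distribution N(mean, var) at x.
double normal_pdf(double x, double mean, double var)
{
    const double pi = 3.14159265358979323846;
    const double d = x - mean;
    return std::exp(-0.5 * d * d / var) / std::sqrt(2.0 * pi * var);
}

// One EM iteration: the E-step followed by the M-step.
void em_iteration(const std::vector<double>& x, Mixture1D& m)
{
    const std::size_t N = x.size(), M = m.weight.size();
    std::vector<std::vector<double> > alpha(M, std::vector<double>(N));

    // E-step: alpha[k][i] is the probability that sample i belongs to mixture k
    for (std::size_t i = 0; i < N; ++i)
    {
        double total = 0.0;
        for (std::size_t k = 0; k < M; ++k)
        {
            alpha[k][i] = m.weight[k] * normal_pdf(x[i], m.mean[k], m.var[k]);
            total += alpha[k][i];
        }
        for (std::size_t k = 0; k < M; ++k)
            alpha[k][i] /= total;
    }

    // M-step: refine weights, means, and variances using the probabilities
    for (std::size_t k = 0; k < M; ++k)
    {
        double sum_a = 0.0, sum_ax = 0.0, sum_sq = 0.0;
        for (std::size_t i = 0; i < N; ++i)
        {
            sum_a += alpha[k][i];
            sum_ax += alpha[k][i] * x[i];
        }
        m.weight[k] = sum_a / static_cast<double>(N);
        m.mean[k] = sum_ax / sum_a;
        for (std::size_t i = 0; i < N; ++i)
        {
            const double d = x[i] - m.mean[k];
            sum_sq += alpha[k][i] * d * d;
        }
        m.var[k] = sum_sq / sum_a;
    }
}
```

Calling ``em_iteration`` repeatedly until the parameters stop changing gives the iterative procedure described in the text. A practical implementation would also bound the variances from below.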
diff --git a/modules/ml/doc/k_nearest_neighbors.rst b/modules/ml/doc/k_nearest_neighbors.rst

index de42ffb..8d349d2 100644
--- a/modules/ml/doc/k_nearest_neighbors.rst
+++ b/modules/ml/doc/k_nearest_neighbors.rst
@@ -1,9 +1,9 @@
  K Nearest Neighbors
  ===================
  
-The algorithm caches all of the training samples, and predicts the response for a new sample by analyzing a certain number (
+The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (
  **K**
-) of the nearest neighbors of the sample (using voting, calculating weighted sum etc.) The method is sometimes referred to as "learning by example", because for prediction it looks for the feature vector with a known response that is closest to the given vector.
+) of the nearest neighbors of the sample (using voting, calculating weighted sum, and so on). The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.
  
  .. index:: CvKNearest
  
@@ -13,7 +13,7 @@ CvKNearest
  ----------
  .. c:type:: CvKNearest
  
-K Nearest Neighbors model. ::
+K-Nearest Neighbors model ::
  
      class CvKNearest : public CvStatModel
      {
@@ -53,12 +53,16 @@ CvKNearest::train
  
      Trains the model.
  
-The method trains the K-Nearest model. It follows the conventions of generic ``train`` "method" with the following limitations: only CV_ROW_SAMPLE data layout is supported, the input variables are all ordered, the output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ), variable subsets ( ``var_idx`` ) and missing measurements are not supported.
+The method trains the K-Nearest model. It follows the conventions of the generic ``train`` method with the following limitations:
+
+* Only ``CV_ROW_SAMPLE`` data layout is supported.
+* Input variables are all ordered.
+* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
+* Variable subsets ( ``var_idx`` ) and missing measurements are not supported.
  
  The parameter ``_max_k`` specifies the number of maximum neighbors that may be passed to the method ``find_nearest`` .
  
  The parameter ``_update_base`` specifies whether the model is trained from scratch
-( ``_update_base=false`` ), or it is updated using the new training data ( ``_update_base=true`` ). In the latter case the parameter ``_max_k`` must not be larger than the original value.
+( ``_update_base=false`` ), or it is updated using the new training data ( ``_update_base=true`` ). In the latter case, the parameter ``_max_k`` must not be larger than the original value.
  
  .. index:: CvKNearest::find_nearest
  
@@ -68,18 +72,18 @@ CvKNearest::find_nearest
  ------------------------
  .. c:function:: float CvKNearest::find_nearest(  const CvMat* _samples,  int k, CvMat* results=0,          const float** neighbors=0,  CvMat* neighbor_responses=0,  CvMat* dist=0 ) const
  
-    Finds the neighbors for the input vectors.
+    Finds the neighbors for input vectors.
  
-For each input vector (which are the rows of the matrix ``_samples`` ) the method finds the
+For each input vector (a row of the matrix ``_samples`` ), the method finds the
  :math:`\texttt{k} \le
-\texttt{get\_max\_k()}` nearest neighbor.  In the case of regression,
-the predicted result will be a mean value of the particular vector's
-neighbor responses. In the case of classification the class is determined
+\texttt{get\_max\_k()}` nearest neighbors.  In case of regression,
+the predicted result is a mean value of the particular vector's
+neighbor responses. In case of classification, the class is determined
  by voting.
  
-For custom classification/regression prediction, the method can optionally return pointers to the neighbor vectors themselves ( ``neighbors`` , an array of ``k*_samples->rows`` pointers), their corresponding output values ( ``neighbor_responses`` , a vector of ``k*_samples->rows`` elements) and the distances from the input vectors to the neighbors ( ``dist`` , also a vector of ``k*_samples->rows`` elements).
+For a custom classification/regression prediction, the method can optionally return pointers to the neighbor vectors themselves ( ``neighbors`` , an array of ``k*_samples->rows`` pointers), their corresponding output values ( ``neighbor_responses`` , a vector of ``k*_samples->rows`` elements), and the distances from the input vectors to the neighbors ( ``dist`` , also a vector of ``k*_samples->rows`` elements).
  
-For each input vector the neighbors are sorted by their distances to the vector.
+For each input vector, the neighbors are sorted by their distances to the vector.
  
  If only a single input vector is passed, all output matrices are optional and the predicted value is returned by the method. ::
  
@@ -126,7 +130,7 @@ If only a single input vector is passed, all output matrices are optional and th
                  sample.data.fl[0] = (float)j;
                  sample.data.fl[1] = (float)i;
  
-                // estimates the response and get the neighbors' labels
+                // estimate the response and get the neighbors' labels
                  response = knn.find_nearest(&sample,K,0,0,nearests,0);
  
                  // compute the number of neighbors representing the majority
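
The voting rule that ``find_nearest`` is described as applying in the classification case can be sketched without the library. The code below is a simplified illustration, not the ``CvKNearest`` implementation: ``LabeledPoint`` and ``knn_classify`` are hypothetical names, the distance is assumed to be Euclidean, and ties are resolved in favor of the label that reaches the top vote count first when neighbors are visited in order of increasing distance.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical 2-D training sample with a class label.
struct LabeledPoint
{
    float x, y;
    int label;
};

// Classify the query point (qx, qy) by majority vote among its k nearest
// training points. Returns -1 if the training set is empty.
int knn_classify(const std::vector<LabeledPoint>& train,
                 float qx, float qy, std::size_t k)
{
    // (squared distance, label) pairs; squared distance preserves the ordering
    std::vector<std::pair<float, int> > dist;
    dist.reserve(train.size());
    for (std::size_t i = 0; i < train.size(); ++i)
    {
        const float dx = train[i].x - qx, dy = train[i].y - qy;
        dist.push_back(std::make_pair(dx * dx + dy * dy, train[i].label));
    }

    // Only the k smallest distances need to be sorted
    k = std::min(k, dist.size());
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());

    // Count votes among the k nearest neighbors
    std::map<int, int> votes;
    int best_label = -1, best_votes = 0;
    for (std::size_t i = 0; i < k; ++i)
    {
        const int v = ++votes[dist[i].second];
        if (v > best_votes)
        {
            best_votes = v;
            best_label = dist[i].second;
        }
    }
    return best_label;
}
```

For regression, the same neighbor search would be followed by averaging the neighbor responses instead of counting votes.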
diff --git a/modules/ml/doc/ml.rst b/modules/ml/doc/ml.rst

index 644858d..4f0d6bb 100644
--- a/modules/ml/doc/ml.rst
+++ b/modules/ml/doc/ml.rst
@@ -2,9 +2,9 @@
  ml. Machine Learning
  ********************
  
-The Machine Learning Library (MLL) is a set of classes and functions for statistical classification, regression and clustering of data.
+The Machine Learning Library (MLL) is a set of classes and functions for statistical classification, regression, and clustering of data.
  
-Most of the classification and regression algorithms are implemented as C++ classes. As the algorithms have different seta of features (like the ability to handle missing measurements, or categorical input variables etc.), there is a little common ground between the classes. This common ground is defined by the class `CvStatModel` that all the other ML classes are derived from.
+Most of the classification and regression algorithms are implemented as C++ classes. As the algorithms have different sets of features (such as the ability to handle missing measurements or categorical input variables), there is little common ground between the classes. This common ground is defined by the class `CvStatModel` that all the other ML classes are derived from.
  
  .. toctree::
      :maxdepth: 2
diff --git a/modules/ml/doc/neural_networks.rst b/modules/ml/doc/neural_networks.rst

index 54e3d50..32390be 100644
--- a/modules/ml/doc/neural_networks.rst
+++ b/modules/ml/doc/neural_networks.rst
@@ -1,12 +1,12 @@
  Neural Networks
  ===============
  
-ML implements feed-forward artificial neural networks, more particularly, multi-layer perceptrons (MLP), the most commonly used type of neural networks. MLP consists of the input layer, output layer and one or more hidden layers. Each layer of MLP includes one or more neurons that are directionally linked with the neurons from the previous and the next layer. Here is an example of a 3-layer perceptron with 3 inputs, 2 outputs and the hidden layer including 5 neurons:
+ML implements feed-forward artificial neural networks, more particularly, multi-layer perceptrons (MLP), the most commonly used type of neural networks. MLP consists of the input layer, output layer, and one or more hidden layers. Each layer of MLP includes one or more neurons that are directionally linked with the neurons from the previous and the next layer. The example below represents a 3-layer perceptron with three inputs, two outputs, and the hidden layer including five neurons:
  
  .. image:: pics/mlp.png
  
-All the neurons in MLP are similar. Each of them has several input links (i.e. it takes the output values from several neurons in the previous layer on input) and several output links (i.e. it passes the response to several neurons in the next layer). The values retrieved from the previous layer are summed with certain weights, individual for each neuron, plus the bias term, and the sum is transformed using the activation function
-:math:`f` that may be also different for different neurons. Here is the picture:
+All the neurons in MLP are similar. Each of them has several input links (it takes the output values from several neurons in the previous layer as input) and several output links (it passes the response to several neurons in the next layer). The values retrieved from the previous layer are summed up with certain weights, individual for each neuron, plus the bias term. The sum is transformed using the activation function
+:math:`f` that may also be different for different neurons.
  
  .. image:: pics/neuron_model.png
  
@@ -24,59 +24,64 @@ In other words, given the outputs
  
      y_i = f(u_i)
  
-Different activation functions may be used, ML implements 3 standard ones:
+Different activation functions may be used. ML implements three standard functions:
  
  *
      Identity function ( ``CvANN_MLP::IDENTITY``     ):
      :math:`f(x)=x`
  *
      Symmetrical sigmoid ( ``CvANN_MLP::SIGMOID_SYM``     ):
-    :math:`f(x)=\beta*(1-e^{-\alpha x})/(1+e^{-\alpha x}`     ), the default choice for MLP; the standard sigmoid with
+    :math:`f(x)=\beta*(1-e^{-\alpha x})/(1+e^{-\alpha x})`     , which is the default choice for MLP. The standard sigmoid with
      :math:`\beta =1, \alpha =1`     is shown below:
  
      .. image:: pics/sigmoid_bipolar.png
  
  *
      Gaussian function ( ``CvANN_MLP::GAUSSIAN``     ):
-    :math:`f(x)=\beta e^{-\alpha x*x}`     , not completely supported by the moment.
+    :math:`f(x)=\beta e^{-\alpha x*x}`     , which is not completely supported at the moment.
  
-In ML all the neurons have the same activation functions, with the same free parameters (
+In ML, all the neurons have the same activation functions, with the same free parameters (
  :math:`\alpha, \beta` ) that are specified by user and are not altered by the training algorithms.
  
-So the whole trained network works as follows: It takes the feature vector on input, the vector size is equal to the size of the input layer, when the values are passed as input to the first hidden layer, the outputs of the hidden layer are computed using the weights and the activation functions and passed further downstream, until we compute the output layer.
+So, the whole trained network works as follows: 
  
-So, in order to compute the network one needs to know all the
+#. It takes the feature vector as input. The vector size is equal to the size of the input layer.
+#. Values are passed as input to the first hidden layer.
+#. Outputs of the hidden layer are computed using the weights and the activation functions.
+#. Outputs are passed further downstream until the output layer is computed.
+
+So, to compute the network, you need to know all the
  weights
  :math:`w^{n+1)}_{i,j}` . The weights are computed by the training
-algorithm. The algorithm takes a training set: multiple input vectors
+algorithm. The algorithm takes a training set, that is, multiple input vectors
  with the corresponding output vectors, and iteratively adjusts the
-weights to try to make the network give the desired response on the
+weights to enable the network to give the desired response to the
  provided input vectors.
  
-The larger the network size (the number of hidden layers and their sizes),
-the more is the potential network flexibility, and the error on the
+The larger the network size (the number of hidden layers and their sizes),
+the greater the potential network flexibility. The error on the
  training set could be made arbitrarily small. But at the same time the
-learned network will also "learn" the noise present in the training set,
+learned network also "learns" the noise present in the training set,
  so the error on the test set usually starts increasing after the network
-size reaches some limit. Besides, the larger networks are train much
-longer than the smaller ones, so it is reasonable to preprocess the data
-(using
-:ref:`PCA::operator ()` or similar technique) and train a smaller network
-on only the essential features.
-
-Another feature of the MLP's is their inability to handle categorical
-data as is, however there is a workaround. If a certain feature in the
-input or output (i.e. in the case of ``n`` -class classifier for
+size reaches a limit. Besides, larger networks take much longer to train
+than smaller ones, so it is reasonable to pre-process the data,
+using
+:ref:`PCA::operator ()` or a similar technique, and train a smaller network
+only on the essential features.
+
+Another feature of MLPs is their inability to handle categorical
+data as is. However, there is a workaround. If a certain feature in the
+input or output (in case of an ``n`` -class classifier for
  :math:`n>2` ) layer is categorical and can take
-:math:`M>2` different values, it makes sense to represent it as binary tuple of ``M`` elements, where ``i`` -th element is 1 if and only if the
+:math:`M>2` different values, it makes sense to represent it as a binary tuple of ``M`` elements, where the ``i`` -th element is 1 if and only if the
  feature is equal to the ``i`` -th value out of ``M`` possible. It
-will increase the size of the input/output layer, but will speedup the
-training algorithm convergence and at the same time enable "fuzzy" values
-of such variables, i.e. a tuple of probabilities instead of a fixed value.
+increases the size of the input/output layer but speeds up the
+training algorithm convergence and at the same time enables "fuzzy" values
+of such variables, that is, a tuple of probabilities instead of a fixed value.
  
-ML implements 2 algorithms for training MLP's. The first is the classical
-random sequential back-propagation algorithm
-and the second (default one) is batch RPROP algorithm.
+ML implements two algorithms for training MLPs. The first algorithm is the classical
+random sequential back-propagation algorithm.
+The second (default) one is a batch RPROP algorithm.
  
  References:
  
@@ -85,10 +90,10 @@ References:
      . Wikipedia article about the back-propagation algorithm.
  
  *
-    Y. LeCun, L. Bottou, G.B. Orr and K.-R. Muller, "Efficient backprop", in Neural Networks---Tricks of the Trade, Springer Lecture Notes in Computer Sciences 1524, pp.5-50, 1998.
+    Y. LeCun, L. Bottou, G.B. Orr and K.-R. Muller, *Efficient backprop*, in Neural Networks---Tricks of the Trade, Springer Lecture Notes in Computer Sciences 1524, pp.5-50, 1998.
  
  *
-    M. Riedmiller and H. Braun, "A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm", Proc. ICNN, San Francisco (1993).
+    M. Riedmiller and H. Braun, *A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm*, Proc. ICNN, San Francisco (1993).
  
  .. index:: CvANN_MLP_TrainParams
  
@@ -98,7 +103,7 @@ CvANN_MLP_TrainParams
  ---------------------
  .. c:type:: CvANN_MLP_TrainParams
  
-Parameters of the MLP training algorithm. ::
+Parameters of the MLP training algorithm ::
  
      struct CvANN_MLP_TrainParams
      {
@@ -112,7 +117,7 @@ Parameters of the MLP training algorithm. ::
          CvTermCriteria term_crit;
          int train_method;
  
-        // backpropagation parameters
+        // back-propagation parameters
          double bp_dw_scale, bp_moment_scale;
  
          // rprop parameters
@@ -121,7 +126,7 @@ Parameters of the MLP training algorithm. ::
  
  
  
-The structure has default constructor that initializes parameters for ``RPROP`` algorithm. There is also more advanced constructor to customize the parameters and/or choose backpropagation algorithm. Finally, the individual parameters can be adjusted after the structure is created.
+The structure has a default constructor that initializes parameters for the ``RPROP`` algorithm. There is also a more advanced constructor to customize the parameters and/or choose the back-propagation algorithm. Finally, the individual parameters can be adjusted after the structure is created.
  
  .. index:: CvANN_MLP
  
@@ -131,7 +136,7 @@ CvANN_MLP
  ---------
  .. c:type:: CvANN_MLP
  
-MLP model. ::
+MLP model ::
  
      class CvANN_MLP : public CvStatModel
      {
@@ -212,7 +217,7 @@ MLP model. ::
      
  
  
-Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the non-default constructor or the method ``create`` . All the weights are set to zeros. Then the network is trained using the set of input and output vectors. The training procedure can be repeated more than once, i.e. the weights can be adjusted based on the new training data.
+Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the non-default constructor or the method ``create`` . All the weights are set to zeros. Then, the network is trained using a set of input and output vectors. The training procedure can be repeated more than once, that is, the weights can be adjusted based on the new training data.
  
  .. index:: CvANN_MLP::create
  
@@ -222,15 +227,15 @@ CvANN_MLP::create
  -----------------
  .. c:function:: void CvANN_MLP::create(  const CvMat* _layer_sizes,                          int _activ_func=SIGMOID_SYM,                          double _f_param1=0,  double _f_param2=0 )
  
-    Constructs the MLP with the specified topology
+    Constructs MLP with the specified topology.
  
-    :param _layer_sizes: The integer vector specifies the number of neurons in each layer including the input and output layers.
+    :param _layer_sizes: Integer vector specifying the number of neurons in each layer including the input and output layers.
  
-    :param _activ_func: Specifies the activation function for each neuron; one of  ``CvANN_MLP::IDENTITY`` ,  ``CvANN_MLP::SIGMOID_SYM``  and  ``CvANN_MLP::GAUSSIAN`` .
+    :param _activ_func: Parameter specifying the activation function for each neuron: one of  ``CvANN_MLP::IDENTITY`` ,  ``CvANN_MLP::SIGMOID_SYM`` , and  ``CvANN_MLP::GAUSSIAN`` .
  
      :param _f_param1,_f_param2: Free parameters of the activation function,  :math:`\alpha`  and  :math:`\beta` , respectively. See the formulas in the introduction section.
  
-The method creates a MLP network with the specified topology and assigns the same activation function to all the neurons.
+The method creates an MLP network with the specified topology and assigns the same activation function to all the neurons.
  
  .. index:: CvANN_MLP::train
  
@@ -242,23 +247,23 @@ CvANN_MLP::train
  
      Trains/updates MLP.
  
-    :param _inputs: A floating-point matrix of input vectors, one vector per row.
+    :param _inputs: Floating-point matrix of input vectors, one vector per row.
  
-    :param _outputs: A floating-point matrix of the corresponding output vectors, one vector per row.
+    :param _outputs: Floating-point matrix of the corresponding output vectors, one vector per row.
  
-    :param _sample_weights: (RPROP only) The optional floating-point vector of weights for each sample. Some samples may be more important than others for training, and the user may want to raise the weight of certain classes to find the right balance between hit-rate and false-alarm rate etc.
+    :param _sample_weights: (RPROP only) Optional floating-point vector of weights for each sample. Some samples may be more important than others for training. You may want to raise the weight of certain classes to find the right balance between hit-rate and false-alarm rate, and so on.
  
-    :param _sample_idx: The optional integer vector indicating the samples (i.e. rows of  ``_inputs``  and  ``_outputs`` ) that are taken into account.
+    :param _sample_idx: Optional integer vector indicating the samples (rows of  ``_inputs``  and  ``_outputs`` ) that are taken into account.
  
-    :param _params: The training params. See  ``CvANN_MLP_TrainParams``  description.
+    :param _params: Training parameters. See the ``CvANN_MLP_TrainParams``  description.
  
-    :param _flags: The various parameters to control the training algorithm. May be a combination of the following:
+    :param _flags: Various parameters to control the training algorithm. A combination of the following parameters is possible:
  
-            * **UPDATE_WEIGHTS = 1** algorithm updates the network weights, rather than computes them from scratch (in the latter case the weights are initialized using  *Nguyen-Widrow*  algorithm).
+            * **UPDATE_WEIGHTS = 1** Algorithm updates the network weights rather than computing them from scratch. In the latter case, the weights are initialized using the Nguyen-Widrow algorithm.
  
-            * **NO_INPUT_SCALE** algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation =1. If the network is assumed to be updated frequently, the new training data could be much different from original one. In this case user should take care of proper normalization.
+            * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and scaling its standard deviation to 1. If the network is expected to be updated frequently, the new training data could differ considerably from the original data. In this case, you should take care of proper normalization.
  
-            * **NO_OUTPUT_SCALE** algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output features independently, by transforming it to the certain range depending on the activation function used.
+            * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently by transforming it to a certain range that depends on the activation function used.
  
-This method applies the specified training algorithm to compute/adjust the network weights. It returns the number of done iterations.
+This method applies the specified training algorithm to compute or adjust the network weights. It returns the number of iterations performed.
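
The per-feature input normalization described for ``NO_INPUT_SCALE`` can be illustrated with a minimal, self-contained sketch. The helper ``normalize_features`` is hypothetical and is not part of the ML module. It assumes row-major samples and the population standard deviation (division by the number of samples); the library may compute the scale differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV function): shifts every feature (column)
// to zero mean and scales it to unit standard deviation.
// Assumption: samples are stored as rows, all rows have the same length.
void normalize_features(std::vector<std::vector<double> >& samples)
{
    if (samples.empty())
        return;
    const std::size_t n = samples.size();
    const std::size_t dims = samples[0].size();

    for (std::size_t j = 0; j < dims; ++j)
    {
        double mean = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            assert(samples[i].size() == dims);
            mean += samples[i][j];
        }
        mean /= static_cast<double>(n);

        double variance = 0.0;
        for (std::size_t i = 0; i < n; ++i)
        {
            const double d = samples[i][j] - mean;
            variance += d * d;
        }
        variance /= static_cast<double>(n);

        // A constant feature has zero deviation: only shift it, do not divide by 0.
        double deviation = std::sqrt(variance);
        if (deviation == 0.0)
            deviation = 1.0;

        for (std::size_t i = 0; i < n; ++i)
            samples[i][j] = (samples[i][j] - mean) / deviation;
    }
}
```

If the network is updated later with data from a different distribution, the same kind of transform would have to be applied consistently by the caller, which is the situation the flag description warns about.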
  
diff --git a/modules/ml/doc/normal_bayes_classifier.rst b/modules/ml/doc/normal_bayes_classifier.rst

index a134376..4a047b3 100644
--- a/modules/ml/doc/normal_bayes_classifier.rst
+++ b/modules/ml/doc/normal_bayes_classifier.rst
@@ -3,9 +3,9 @@
  Normal Bayes Classifier
  =======================
  
-This is a simple classification model assuming that feature vectors from each class are normally distributed (though, not necessarily independently distributed), so the whole data distribution function is assumed to be a Gaussian mixture, one component per  class. Using the training data the algorithm estimates mean vectors and covariance matrices for every class, and then it uses them for prediction.
+This is a simple classification model assuming that feature vectors from each class are normally distributed (though not necessarily independently distributed). So, the whole data distribution function is assumed to be a Gaussian mixture, one component per class. Using the training data, the algorithm estimates mean vectors and covariance matrices for every class and then uses them for prediction.
  
-**[Fukunaga90] K. Fukunaga. Introduction to Statistical Pattern Recognition. second ed., New York: Academic Press, 1990.**
+[Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. second ed., New York: Academic Press, 1990.
  
  .. index:: CvNormalBayesClassifier
  
@@ -13,7 +13,7 @@ CvNormalBayesClassifier
  -----------------------
  .. c:type:: CvNormalBayesClassifier
  
-Bayes classifier for normally distributed data. ::
+Bayes classifier for normally distributed data ::
  
      class CvNormalBayesClassifier : public CvStatModel
      {
@@ -50,7 +50,12 @@ CvNormalBayesClassifier::train
  
      Trains the model.
  
-The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` "method" with the following limitations: only CV_ROW_SAMPLE data layout is supported; the input variables are all ordered; the output variable is categorical (i.e. elements of ``_responses`` must be integer numbers, though the vector may have ``CV_32FC1`` type), and missing measurements are not supported.
+The method trains the Normal Bayes classifier. It follows the conventions of the generic ``train`` "method" with the following limitations: 
+
+* Only ``CV_ROW_SAMPLE`` data layout is supported.
+* Input variables are all ordered.
+* Output variable is categorical, which means that elements of ``_responses`` must be integer numbers, though the vector may have the ``CV_32FC1`` type.
+* Missing measurements are not supported.
  
  In addition, there is an ``update`` flag that identifies whether the model should be trained from scratch ( ``update=false`` ) or should be updated using the new training data ( ``update=true`` ).
  
@@ -62,7 +67,7 @@ CvNormalBayesClassifier::predict
  --------------------------------
  .. c:function:: float CvNormalBayesClassifier::predict(  const CvMat* samples,  CvMat* results=0 ) const
  
-    Predicts the response for sample(s)
+    Predicts the response for sample(s).
  
-The method ``predict`` estimates the most probable classes for the input vectors. The input vectors (one or more) are stored as rows of the matrix ``samples`` . In the case of multiple input vectors, there should be one output vector ``results`` . The predicted class for a single input vector is returned by the method.
+The method ``predict`` estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples`` . In case of multiple input vectors, there should be one output vector ``results`` . The predicted class for a single input vector is returned by the method.
  
diff --git a/modules/ml/doc/random_trees.rst b/modules/ml/doc/random_trees.rst

index d7cbf78..7048841 100644
--- a/modules/ml/doc/random_trees.rst
+++ b/modules/ml/doc/random_trees.rst
@@ -6,24 +6,24 @@ Random Trees
  Random trees have been introduced by Leo Breiman and Adele Cutler:
  http://www.stat.berkeley.edu/users/breiman/RandomForests/
  . The algorithm can deal with both classification and regression problems. Random trees is a collection (ensemble) of tree predictors that is called
-**forest**
-further in this section (the term has been also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that recieved the majority of "votes". In the case of regression the classifier response is the average of the responses over all the trees in the forest.
+*forest*
+further in this section (the term was also introduced by L. Breiman). The classification works as follows: the random trees classifier takes the input feature vector, classifies it with every tree in the forest, and outputs the class label that received the majority of "votes". In case of regression, the classifier response is the average of the responses over all the trees in the forest.
  
-All the trees are trained with the same parameters, but on the different training sets, which are generated from the original training set using the bootstrap procedure: for each training set we randomly select the same number of vectors as in the original set ( ``=N`` ). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each tree trained not all the variables are used to find the best split, rather than a random subset of them. With each node a new subset is generated, however its size is fixed for all the nodes and all the trees. It is a training parameter, set to
-:math:`\sqrt{number\_of\_variables}` by default. None of the trees that are built are pruned.
+All the trees are trained with the same parameters but on different training sets that are generated from the original training set using the bootstrap procedure: for each training set, you randomly select the same number of vectors as in the original set ( ``=N`` ). The vectors are chosen with replacement. That is, some vectors will occur more than once and some will be absent. At each node of each trained tree, not all the variables are used to find the best split, but only a random subset of them. A new subset is generated for each node. However, its size is fixed for all the nodes and all the trees. It is a training parameter set to
+:math:`\sqrt{number\_of\_variables}` by default. None of the built trees are pruned.
  
  In random trees there is no need for any accuracy estimation procedures, such as cross-validation or bootstrap, or a separate test set to get an estimate of the training error. The error is estimated internally during the training. When the training set for the current tree is drawn by sampling with replacement, some vectors are left out (so-called
  *oob (out-of-bag) data*
-). The size of oob data is about ``N/3`` . The classification error is estimated by using this oob-data as following:
+). The size of oob data is about ``N/3`` . The classification error is estimated by using this oob-data as follows:
  
-*
-    Get a prediction for each vector, which is oob relatively to the i-th tree, using the very i-th tree.
+#.
+    Get a prediction for each vector, which is oob relative to the i-th tree, using the very i-th tree.
  
-*
-    After all the trees have been trained, for each vector that has ever been oob, find the class-"winner" for it (i.e. the class that has got the majority of votes in the trees, where the vector was oob) and compare it to the ground-truth response.
+#.
+    After all the trees have been trained, for each vector that has ever been oob, find the class-"winner" for it (the class that has got the majority of votes in the trees where the vector was oob) and compare it to the ground-truth response.
  
-*
-    Then the classification error estimate is computed as ratio of number of misclassified oob vectors to all the vectors in the original data. In the case of regression the oob-error is computed as the squared error for oob vectors difference divided by the total number of vectors.
+#.
+    Compute the classification error estimate as the ratio of the number of misclassified oob vectors to the number of all vectors in the original data. In case of regression, the oob-error is computed as the sum of squared errors over the oob vectors divided by the total number of vectors.
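
The bootstrap step that produces the oob data can be sketched in a few lines of standard C++. The function names below are hypothetical and only illustrate sampling with replacement; they are not taken from the ML module. For large ``N`` the expected oob share is about ``1/e`` (roughly 0.37), which is consistent with the "about ``N/3``" figure above.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws n indices from [0, n) with replacement and
// returns one flag per original vector. A flag stays true if the vector was
// never drawn, that is, if it is out-of-bag (oob) for this tree.
std::vector<bool> bootstrap_oob_mask(std::size_t n, std::mt19937& rng)
{
    std::vector<bool> oob(n, true);
    if (n == 0)
        return oob;
    std::uniform_int_distribution<std::size_t> pick(0, n - 1);
    for (std::size_t i = 0; i < n; ++i)
        oob[pick(rng)] = false;  // the drawn vector is in the training set
    return oob;
}

// Share of vectors that are oob for one tree.
double oob_fraction(const std::vector<bool>& oob)
{
    if (oob.empty())
        return 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < oob.size(); ++i)
        if (oob[i])
            ++count;
    return static_cast<double>(count) / static_cast<double>(oob.size());
}
```

Each tree would get its own mask, and the votes for a vector are collected only from the trees whose mask marks that vector as oob.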
  
  **References:**
  
@@ -55,7 +55,7 @@ CvRTParams
  ----------
  .. c:type:: CvRTParams
  
-Training Parameters of Random Trees. ::
+Training parameters of random trees ::
  
      struct CvRTParams : public CvDTreeParams
      {
@@ -78,7 +78,7 @@ Training Parameters of Random Trees. ::
      };
  
  
-The set of training parameters for the forest is the superset of the training parameters for a single tree. However, Random trees do not need all the functionality/features of decision trees, most noticeably, the trees are not pruned, so the cross-validation parameters are not used.
+The set of training parameters for the forest is a superset of the training parameters for a single tree. However, random trees do not need all the functionality/features of decision trees. Most noticeably, the trees are not pruned, so the cross-validation parameters are not used.
  
  .. index:: CvRTrees
  
@@ -88,7 +88,7 @@ CvRTrees
  --------
  .. c:type:: CvRTrees
  
-Random Trees. ::
+Random trees ::
  
      class CvRTrees : public CvStatModel
      {
@@ -138,9 +138,9 @@ CvRTrees::train
  ---------------
  .. c:function:: bool CvRTrees::train(  const CvMat* train_data,  int tflag,                      const CvMat* responses,  const CvMat* comp_idx=0,                      const CvMat* sample_idx=0,  const CvMat* var_type=0,                      const CvMat* missing_mask=0,                      CvRTParams params=CvRTParams() )
  
-    Trains the Random Trees model.
+    Trains the Random Trees model.
  
-The method ``CvRTrees::train`` is very similar to the first form of ``CvDTree::train`` () and follows the generic method ``CvStatModel::train`` conventions. All of the specific to the algorithm training parameters are passed as a
+The method ``CvRTrees::train`` is very similar to the first form of ``CvDTree::train`` () and follows the generic method ``CvStatModel::train`` conventions. All the parameters specific to the algorithm training are passed as a
  :ref:`CvRTParams` instance. The estimate of the training error ( ``oob-error`` ) is stored in the protected class member ``oob_error`` .
  
  .. index:: CvRTrees::predict
@@ -151,9 +151,9 @@ CvRTrees::predict
  -----------------
  .. c:function:: double CvRTrees::predict(  const CvMat* sample,  const CvMat* missing=0 ) const
  
-    Predicts the output for the input sample.
+    Predicts the output for an input sample.
  
-The input parameters of the prediction method are the same as in ``CvDTree::predict`` , but the return value type is different. This method returns the cumulative result from all the trees in the forest (the class that receives the majority of voices, or the mean of the regression function estimates).
+The input parameters of the prediction method are the same as in ``CvDTree::predict`` , but the return value type is different. This method returns the cumulative result from all the trees in the forest (the class that receives the majority of votes, or the mean of the regression function estimates).
  
  .. index:: CvRTrees::get_var_importance
  
@@ -165,7 +165,7 @@ CvRTrees::get_var_importance
  
      Retrieves the variable importance array.
  
-The method returns the variable importance vector, computed at the training stage when ``:ref:`CvRTParams`::calc_var_importance`` is set. If the training flag is not set, then the ``NULL`` pointer is returned. This is unlike decision trees, where variable importance can be computed anytime after the training.
+The method returns the variable importance vector, computed at the training stage when ``:ref:`CvRTParams`::calc_var_importance`` is set. If the training flag is not set, the ``NULL`` pointer is returned. This differs from decision trees, where variable importance can be computed any time after the training.
  
  .. index:: CvRTrees::get_proximity
  
@@ -177,9 +177,9 @@ CvRTrees::get_proximity
  
      Retrieves the proximity measure between two training samples.
  
-The method returns proximity measure between any two samples (the ratio of the those trees in the ensemble, in which the samples fall into the same leaf node, to the total number of the trees).
+The method returns the proximity measure between any two samples, which is the ratio of the number of trees in the ensemble in which the samples fall into the same leaf node to the total number of trees.
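
The ratio described above can be written as a short stand-alone function. This is only an illustration of the definition: the ``leaf_ids`` table (one row per tree, one leaf index per sample) is a hypothetical layout and does not reflect how ``CvRTrees`` stores its trees.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the proximity definition.
// leaf_ids[t][s] is the index of the leaf that sample s reaches in tree t.
// Returns (number of trees where a and b share a leaf) / (number of trees).
double proximity(const std::vector<std::vector<int> >& leaf_ids,
                 std::size_t a, std::size_t b)
{
    if (leaf_ids.empty())
        return 0.0;
    std::size_t same_leaf = 0;
    for (std::size_t t = 0; t < leaf_ids.size(); ++t)
    {
        assert(a < leaf_ids[t].size() && b < leaf_ids[t].size());
        if (leaf_ids[t][a] == leaf_ids[t][b])
            ++same_leaf;
    }
    return static_cast<double>(same_leaf) / static_cast<double>(leaf_ids.size());
}
```

With this definition, a sample always has proximity 1 to itself, and two samples that never share a leaf have proximity 0.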
  
-Example: Prediction of mushroom goodness using random trees classifier ::
+Example: Prediction of mushroom goodness using the random-tree classifier ::
  
      #include <float.h>
      #include <stdio.h>
diff --git a/modules/ml/doc/statistical_models.rst b/modules/ml/doc/statistical_models.rst

index b99c620..ed3bd2f 100644
--- a/modules/ml/doc/statistical_models.rst
+++ b/modules/ml/doc/statistical_models.rst
@@ -9,7 +9,7 @@ CvStatModel
  -----------
  .. c:type:: CvStatModel
  
-Base class for the statistical models in ML. ::
+Base class for statistical models in ML ::
  
      class CvStatModel
      {
@@ -38,7 +38,7 @@ Base class for the statistical models in ML. ::
      };
  
  
-In this declaration some methods are commented off. Actually, these are methods for which there is no unified API (with the exception of the default constructor), however, there are many similarities in the syntax and semantics that are briefly described below in this section, as if they are a part of the base class.
+In this declaration, some methods are commented out. These are methods for which there is no unified API (with the exception of the default constructor). However, there are many similarities in the syntax and semantics that are briefly described below in this section, as if they are part of the base class.
  
  .. index:: CvStatModel::CvStatModel
  
@@ -48,9 +48,9 @@ CvStatModel::CvStatModel
  ------------------------
  .. c:function:: CvStatModel::CvStatModel()
  
-    Default constructor.
+    Serves as a default constructor.
  
-Each statistical model class in ML has a default constructor without parameters. This constructor is useful for 2-stage model construction, when the default constructor is followed by ``train()`` or ``load()`` .
+Each statistical model class in ML has a default constructor without parameters. This constructor is useful for a 2-stage model construction, when the default constructor is followed by ``train()`` or ``load()`` .
  
  .. index:: CvStatModel::CvStatModel(...)
  
@@ -60,9 +60,9 @@ CvStatModel::CvStatModel(...)
  -----------------------------
  .. c:function:: CvStatModel::CvStatModel( const CvMat* train_data ... )
  
-    Training constructor.
+    Serves as a training constructor.
  
-Most ML classes provide single-step construct and train constructors. This constructor is equivalent to the default constructor, followed by the ``train()`` method with the parameters that are passed to the constructor.
+Most ML classes provide a single-step construct-and-train constructor. This constructor is equivalent to the default constructor, followed by the ``train()`` method with the parameters that are passed to the constructor.
  
  .. index:: CvStatModel::~CvStatModel
  
@@ -72,9 +72,9 @@ CvStatModel::~CvStatModel
  -------------------------
  .. c:function:: CvStatModel::~CvStatModel()
  
-    Virtual destructor.
+    Serves as a virtual destructor.
  
-The destructor of the base class is declared as virtual, so it is safe to write the following code: ::
+The destructor of the base class is declared as virtual. So, it is safe to write the following code: ::
  
      CvStatModel* model;
      if( use_svm )
@@ -85,7 +85,7 @@ The destructor of the base class is declared as virtual, so it is safe to write
      delete model;
  
  
-Normally, the destructor of each derived class does nothing, but in this instance it calls the overridden method ``clear()`` that deallocates all the memory.
+Normally, the destructor of each derived class does nothing. But in this instance, it calls the overridden method ``clear()`` that deallocates all the memory.
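
The pattern described here (a virtual base destructor, with the derived destructor releasing memory through ``clear()`` ) can be shown with a toy hierarchy. ``BaseModel`` and ``ToyModel`` are hypothetical stand-ins and not the actual ML classes; the release counter exists only to make the effect observable.

```cpp
#include <cassert>

// Hypothetical stand-in for a base class with a virtual destructor.
class BaseModel
{
public:
    virtual ~BaseModel() {}
    virtual void clear() {}
};

// Hypothetical derived model that owns a buffer.
class ToyModel : public BaseModel
{
public:
    explicit ToyModel(int* release_counter)
        : weights(new float[16]), releases(release_counter)
    {
        assert(release_counter != 0);
    }

    // Runs when the object is deleted through a BaseModel pointer,
    // because the base destructor is virtual.
    ~ToyModel() { clear(); }

    // Frees the buffer; the object itself stays usable and clear() is
    // safe to call more than once.
    void clear()
    {
        if (weights)
        {
            delete[] weights;
            weights = 0;
            ++*releases;
        }
    }

private:
    ToyModel(const ToyModel&);             // copying disabled (not implemented)
    ToyModel& operator=(const ToyModel&);  // copying disabled (not implemented)

    float* weights;
    int* releases;
};
```

Deleting a ``ToyModel`` through a ``BaseModel*`` releases the buffer exactly once. Without the ``virtual`` keyword on the base destructor, such a delete would have undefined behavior.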
  
  .. index:: CvStatModel::clear
  
@@ -97,7 +97,7 @@ CvStatModel::clear
  
      Deallocates memory and resets the model state.
  
-The method ``clear`` does the same job as the destructor; it deallocates all the memory occupied by the class members. But the object itself is not destructed, and can be reused further. This method is called from the destructor, from the ``train`` methods of the derived classes, from the methods ``load()``,``read()`` or even explicitly by the user.
+The method ``clear`` does the same job as the destructor: it deallocates all the memory occupied by the class members. But the object itself is not destructed and can be reused further. This method is called from the destructor, from the ``train`` methods of the derived classes, from the methods ``load()`` and ``read()`` , or even explicitly by the user.
  
  .. index:: CvStatModel::save
  
@@ -109,7 +109,7 @@ CvStatModel::save
  
      Saves the model to a file.
  
-The method ``save`` stores the complete model state to the specified XML or YAML file with the specified name or default name (that depends on the particular class). ``Data persistence`` functionality from CxCore is used.
+The method ``save`` saves the complete model state to the specified XML or YAML file with the specified name or default name (which depends on a particular class). *Data persistence* functionality from ``CxCore`` is used.
  
  .. index:: CvStatModel::load
  
@@ -123,9 +123,10 @@ CvStatModel::load
  
  The method ``load`` loads the complete model state with the specified name (or default model-dependent name) from the specified XML or YAML file. The previous model state is cleared by ``clear()`` .
  
-Note that the method is virtual, so any model can be loaded using this virtual method. However, unlike the C types of OpenCV that can be loaded using the generic
-\
-cross{cvLoad}, here the model type must be known, because an empty model must be constructed beforehand. This limitation will be removed in the later ML versions.
+**Note**:
+
+The method is virtual, so any model can be loaded using this virtual method. However, unlike the C types of OpenCV that can be loaded using the generic
+``cvLoad`` function, the model type must be known here because an empty model must be constructed beforehand. This limitation will be removed in later ML versions.
  
  .. index:: CvStatModel::write
  
@@ -135,9 +136,9 @@ CvStatModel::write
  ------------------
  .. c:function:: void CvStatModel::write( CvFileStorage* storage, const char* name )
  
-    Writes the model to file storage.
+    Writes the model to the file storage.
  
-The method ``write`` stores the complete model state to the file storage with the specified name or default name (that depends on the particular class). The method is called by ``save()`` .
+The method ``write`` stores the complete model state in the file storage with the specified name or default name (which depends on the particular class). The method is called by ``save()`` .
  
  .. index:: CvStatModel::read
  
@@ -147,10 +148,10 @@ CvStatModel::read
  -----------------
  .. c:function:: void CvStatMode::read( CvFileStorage* storage, CvFileNode* node )
  
-    Reads the model from file storage.
+    Reads the model from the file storage.
  
-The method ``read`` restores the complete model state from the specified node of the file storage. The node must be located by the user using the function
-:ref:`GetFileNodeByName` .
+The method ``read`` restores the complete model state from the specified node of the file storage. Use the function
+:ref:`GetFileNodeByName` to locate the node.
  
  The previous model state is cleared by ``clear()`` .
  
@@ -164,25 +165,25 @@ CvStatModel::train
  
      Trains the model.
  
-The method trains the statistical model using a set of input feature vectors and the corresponding output values (responses). Both input and output vectors/values are passed as matrices. By default the input feature vectors are stored as ``train_data`` rows, i.e. all the components (features) of a training vector are stored continuously. However, some algorithms can handle the transposed representation, when all values of each particular feature (component/input variable) over the whole input set are stored continuously. If both layouts are supported, the method includes ``tflag`` parameter that specifies the orientation:
+The method trains the statistical model using a set of input feature vectors and the corresponding output values (responses). Both input and output vectors/values are passed as matrices. By default, the input feature vectors are stored as ``train_data`` rows, that is, all the components (features) of a training vector are stored contiguously. However, some algorithms can handle the transposed representation, in which all values of each particular feature (component/input variable) over the whole input set are stored contiguously. If both layouts are supported, the method includes the ``tflag`` parameter that specifies the orientation as follows:
  
-* ``tflag=CV_ROW_SAMPLE``     means that the feature vectors are stored as rows,
+* ``tflag=CV_ROW_SAMPLE``     The feature vectors are stored as rows.
  
-* ``tflag=CV_COL_SAMPLE``     means that the feature vectors are stored as columns.
+* ``tflag=CV_COL_SAMPLE``     The feature vectors are stored as columns.
  
-The ``train_data`` must have a ``CV_32FC1`` (32-bit floating-point, single-channel) format. Responses are usually stored in the 1d vector (a row or a column) of ``CV_32SC1`` (only in the classification problem) or ``CV_32FC1`` format, one value per input vector (although some algorithms, like various flavors of neural nets, take vector responses).
+The ``train_data`` must have the ``CV_32FC1`` (32-bit floating-point, single-channel) format. Responses are usually stored in a 1D vector (a row or a column) of ``CV_32SC1`` (only in the classification problem) or ``CV_32FC1`` format, one value per input vector. However, some algorithms, like various flavors of neural nets, take vector responses.
  
-For classification problems the responses are discrete class labels; for regression problems the responses are values of the function to be approximated. Some algorithms can deal only with classification problems, some - only with regression problems, and some can deal with both problems. In the latter case the type of output variable is either passed as separate parameter, or as a last element of ``var_type`` vector:
+For classification problems, the responses are discrete class labels. For regression problems, the responses are values of the function to be approximated. Some algorithms can deal only with classification problems, some only with regression problems, and some can deal with both. In the latter case, the type of output variable is passed either as a separate parameter or as the last element of the ``var_type`` vector:
  
-* ``CV_VAR_CATEGORICAL``     means that the output values are discrete class labels,
+* ``CV_VAR_CATEGORICAL``     The output values are discrete class labels.
  
-* ``CV_VAR_ORDERED(=CV_VAR_NUMERICAL)``     means that the output values are ordered, i.e. 2 different values can be compared as numbers, and this is a regression problem
+* ``CV_VAR_ORDERED(=CV_VAR_NUMERICAL)``     The output values are ordered. This means that two different values can be compared as numbers, and this is a regression problem.
  
-The types of input variables can be also specified using ``var_type`` . Most algorithms can handle only ordered input variables.
+Types of input variables can also be specified using ``var_type`` . Most algorithms can handle only ordered input variables.
  
-Many models in the ML may be trained on a selected feature subset, and/or on a selected sample subset of the training set. To make it easier for the user, the method ``train`` usually includes ``var_idx`` and ``sample_idx`` parameters. The former identifies variables (features) of interest, and the latter identifies samples of interest. Both vectors are either integer ( ``CV_32SC1`` ) vectors, i.e. lists of 0-based indices, or 8-bit ( ``CV_8UC1`` ) masks of active variables/samples. The user may pass ``NULL`` pointers instead of either of the arguments, meaning that all of the variables/samples are used for training.
+Many models in ML can be trained on a selected feature subset and/or on a selected sample subset of the training set. To make it easier for you, the method ``train`` usually includes the ``var_idx`` and ``sample_idx`` parameters. The former parameter identifies variables (features) of interest, and the latter one identifies samples of interest. Both vectors are either integer ( ``CV_32SC1`` ) vectors (lists of 0-based indices) or 8-bit ( ``CV_8UC1`` ) masks of active variables/samples. You may pass ``NULL`` pointers instead of either of the arguments, meaning that all of the variables/samples are used for training.
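
The two equivalent ways of selecting variables or samples (a list of 0-based indices or an 8-bit mask) can be related with a small conversion routine. ``mask_to_indices`` is a hypothetical helper written with standard containers instead of ``CvMat`` ; it only illustrates that a non-zero mask element marks an active entry.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the ML module): converts an 8-bit mask of
// active variables/samples into the equivalent list of 0-based indices.
// Any non-zero mask element marks the entry as active.
std::vector<int> mask_to_indices(const std::vector<unsigned char>& mask)
{
    std::vector<int> indices;
    for (std::size_t i = 0; i < mask.size(); ++i)
        if (mask[i] != 0)
            indices.push_back(static_cast<int>(i));
    return indices;
}
```

For example, the mask ``{1, 0, 0, 255, 1}`` selects the same entries as the index list ``{0, 3, 4}`` .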
  
-Additionally some algorithms can handle missing measurements, that is when certain features of certain training samples have unknown values (for example, they forgot to measure a temperature of patient A on Monday). The parameter ``missing_mask`` , an 8-bit matrix the same size as ``train_data`` , is used to mark the missed values (non-zero elements of the mask).
+Additionally, some algorithms can handle missing measurements, that is, cases when certain features of certain training samples have unknown values (for example, the temperature of patient A was not measured on Monday). The parameter ``missing_mask`` , an 8-bit matrix of the same size as ``train_data`` , is used to mark the missing values (non-zero elements of the mask).
  
  Usually, the previous model state is cleared by ``clear()`` before running the training procedure. However, some algorithms may optionally update the model state with the new training data, instead of resetting it.
  
@@ -194,9 +195,9 @@ CvStatModel::predict
  --------------------
  .. c:function:: float CvStatMode::predict( const CvMat* sample[, <prediction_params>] ) const
  
-    Predicts the response for the sample.
+    Predicts the response for a sample.
  
-The method is used to predict the response for a new sample. In the case of classification the method returns the class label, in the case of regression - the output function value. The input sample must have as many components as the ``train_data`` passed to ``train`` contains. If the ``var_idx`` parameter is passed to ``train`` , it is remembered and then is used to extract only the necessary components from the input sample in the method ``predict`` .
+The method is used to predict the response for a new sample. In case of classification, the method returns the class label. In case of regression, the method returns the output function value. The input sample must have as many components as the ``train_data`` passed to ``train`` contains. If the ``var_idx`` parameter is passed to ``train`` , it is remembered and then used to extract only the necessary components from the input sample in the method ``predict`` .
  
-The suffix "const" means that prediction does not affect the internal model state, so the method can be safely called from within different threads.
+The suffix ``const`` means that prediction does not affect the internal model state, so the method can be safely called from within different threads.
  
diff --git a/modules/ml/doc/support_vector_machines.rst b/modules/ml/doc/support_vector_machines.rst

index c78c3b2..c48940f 100644
--- a/modules/ml/doc/support_vector_machines.rst
+++ b/modules/ml/doc/support_vector_machines.rst
@@ -3,20 +3,20 @@ Support Vector Machines
  
  .. highlight:: cpp
  
-Originally, support vector machines (SVM) was a technique for building an optimal (in some sense) binary (2-class) classifier. Then the technique has been extended to regression and clustering problems. SVM is a partial case of kernel-based methods, it maps feature vectors into higher-dimensional space using some kernel function, and then it builds an optimal linear discriminating function in this space (or an optimal hyper-plane that fits into the training data, ...). in the case of SVM the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
+Originally, support vector machines (SVM) was a technique for building an optimal binary (2-class) classifier. Later, the technique was extended to regression and clustering problems. SVM is a special case of kernel-based methods. It maps feature vectors into a higher-dimensional space using a kernel function and builds an optimal linear discriminating function in this space or an optimal hyper-plane that fits into the training data. In case of SVM, the kernel is not defined explicitly. Instead, a distance between any 2 points in the hyper-space needs to be defined.
  
-The solution is optimal in a sense that the margin between the separating hyper-plane and the nearest feature vectors from the both classes (in the case of 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called "support vectors", meaning that the position of other vectors does not affect the hyper-plane (the decision function).
+The solution is optimal, which means that the margin between the separating hyper-plane and the nearest feature vectors from both classes (in case of a 2-class classifier) is maximal. The feature vectors that are the closest to the hyper-plane are called "support vectors", which means that the position of other vectors does not affect the hyper-plane (the decision function).
  
-There are a lot of good references on SVM. Here are only a few ones to start with.
+There are a lot of good references on SVM. You may consider starting with the following:
  
  *
-    **[Burges98] C. Burges. "A tutorial on support vector machines for pattern recognition", Knowledge Discovery and Data Mining 2(2), 1998.**
+    [Burges98] C. Burges. *A tutorial on support vector machines for pattern recognition*, Knowledge Discovery and Data Mining 2(2), 1998.
      (available online at
      http://citeseer.ist.psu.edu/burges98tutorial.html
      ).
  
  *
-    **LIBSVM - A Library for Support Vector Machines. By Chih-Chung Chang and Chih-Jen Lin**
+    Chih-Chung Chang and Chih-Jen Lin. *LIBSVM - A Library for Support Vector Machines* 
      (
      http://www.csie.ntu.edu.tw/~cjlin/libsvm/
      )
@@ -29,7 +29,7 @@ CvSVM
  -----
  .. c:type:: CvSVM
  
-Support Vector Machines. ::
+Support Vector Machines ::
  
      class CvSVM : public CvStatModel
      {
@@ -92,7 +92,7 @@ CvSVMParams
  -----------
  .. c:type:: CvSVMParams
  
-SVM training parameters. ::
+SVM training parameters ::
  
      struct CvSVMParams
      {
@@ -129,9 +129,14 @@ CvSVM::train
  
      Trains SVM.
  
-The method trains the SVM model. It follows the conventions of the generic ``train`` "method" with the following limitations: only the CV_ROW_SAMPLE data layout is supported, the input variables are all ordered, the output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ), missing measurements are not supported.
+The method trains the SVM model. It follows the conventions of the generic ``train`` "method" with the following limitations: 
  
-All the other parameters are gathered in
+* Only the ``CV_ROW_SAMPLE`` data layout is supported.
+* Input variables are all ordered.
+* Output variables can be either categorical ( ``_params.svm_type=CvSVM::C_SVC`` or ``_params.svm_type=CvSVM::NU_SVC`` ), or ordered ( ``_params.svm_type=CvSVM::EPS_SVR`` or ``_params.svm_type=CvSVM::NU_SVR`` ), or not required at all ( ``_params.svm_type=CvSVM::ONE_CLASS`` ).
+* Missing measurements are not supported.
+
+All the other parameters are gathered in the
  :ref:`CvSVMParams` structure.
  
  .. index:: CvSVM::train_auto
@@ -144,20 +149,23 @@ CvSVM::train_auto
  
      Trains SVM with optimal parameters.
  
-    :param k_fold: Cross-validation parameter. The training set is divided into  ``k_fold``  subsets, one subset being used to train the model, the others forming the test set. So, the SVM algorithm is executed  ``k_fold``  times.
+    :param k_fold: Cross-validation parameter. The training set is divided into  ``k_fold``  subsets. One subset is used to test the model, the others form the training set. So, the SVM algorithm is executed  ``k_fold``  times.
  
  The method trains the SVM model automatically by choosing the optimal
-parameters ``C``,``gamma``,``p``,``nu``,``coef0``,``degree`` from
-:ref:`CvSVMParams` . By optimal
-one means that the cross-validation estimate of the test set error
+parameters ``C`` , ``gamma`` , ``p`` , ``nu`` , ``coef0`` , ``degree`` from
+:ref:`CvSVMParams`. Parameters are considered optimal
+when the cross-validation estimate of the test set error
  is minimal. The parameters are iterated by a logarithmic grid, for
  example, the parameter ``gamma`` takes the values in the set
  (
-:math:`min`,:math:`min*step`,:math:`min*{step}^2` , ...
+:math:`min`,
+:math:`min*step`,
+:math:`min*{step}^2` , ...
  :math:`min*{step}^n` )
  where
-:math:`min` is ``gamma_grid.min_val``,:math:`step` is ``gamma_grid.step`` , and
-:math:`n` is the maximal index such, that
+:math:`min` is ``gamma_grid.min_val`` ,
+:math:`step` is ``gamma_grid.step`` , and
+:math:`n` is the maximal index such that
  
  .. math::
  
@@ -165,16 +173,15 @@ where
  
  So ``step`` must always be greater than 1.
  
-If there is no need in optimization in some parameter, the according grid step should be set to any value less or equal to 1. For example, to avoid optimization in ``gamma`` one should set ``gamma_grid.step = 0``,``gamma_grid.min_val``,``gamma_grid.max_val`` being arbitrary numbers. In this case, the value ``params.gamma`` will be taken for ``gamma`` .
+If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma`` , set ``gamma_grid.step = 0`` ; ``gamma_grid.min_val`` and ``gamma_grid.max_val`` can then be arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma`` .
  
-And, finally, if the optimization in some parameter is required, but
-there is no idea of the corresponding grid, one may call the function ``CvSVM::get_default_grid`` . In
-order to generate a grid, say, for ``gamma`` , call ``CvSVM::get_default_grid(CvSVM::GAMMA)`` .
+Finally, if a parameter needs to be optimized but
+the corresponding grid is unknown, you may call the function ``CvSVM::get_default_grid`` . For example, to generate a grid for ``gamma`` , call ``CvSVM::get_default_grid(CvSVM::GAMMA)`` .
  
  This function works for the case of classification
  ( ``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC`` )
  as well as for the regression
-( ``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR`` ). If ``params.svm_type=CvSVM::ONE_CLASS`` , no optimization is made and the usual SVM with specified in ``params`` parameters is executed.
+( ``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR`` ). If ``params.svm_type=CvSVM::ONE_CLASS`` , no optimization is made and the usual SVM with parameters specified in ``params``  is executed.
  
  .. index:: CvSVM::get_default_grid
  
@@ -184,9 +191,9 @@ CvSVM::get_default_grid
  -----------------------
  .. c:function:: CvParamGrid CvSVM::get_default_grid( int param_id )
  
-    Generates a grid for the SVM parameters.
+    Generates a grid for SVM parameters.
  
-    :param param_id: Must be one of the following:
+    :param param_id: SVM parameter ID that must be one of the following:
  
              * **CvSVM::C**
  
@@ -214,7 +221,7 @@ CvSVM::get_params
  
      Returns the current SVM parameters.
  
-This function may be used to get the optimal parameters that were obtained while automatically training ``CvSVM::train_auto`` .
+This function may be used to get the optimal parameters obtained during automatic training with ``CvSVM::train_auto`` .
  
  .. index:: CvSVM::get_support_vector*
  
@@ -226,7 +233,7 @@ CvSVM::get_support_vector*
  
  .. c:function:: const float* CvSVM::get_support_vector(int i) const
  
-    Retrieves the number of support vectors and the particular vector.
+    Retrieves the number of support vectors and a particular vector.
  
-The methods can be used to retrieve the set of support vectors.
+The methods can be used to retrieve a set of support vectors.
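
The logarithmic grid described for ``CvSVM::train_auto`` can be illustrated with a small standalone sketch. It does not use OpenCV: ``ParamGrid`` and ``enumerate_log_grid`` are hypothetical names that only mirror the ``min_val``, ``max_val`` and ``step`` fields mentioned in the text. The stopping condition (values stay strictly below ``max_val``) is an assumption, because the formula itself is not shown in this hunk, and the guard against a non-positive ``min_val`` is added here only to keep the loop finite.

```cpp
#include <vector>

// Hypothetical stand-in for a parameter grid; it only carries the three
// fields named in the documentation text (min_val, max_val, step).
struct ParamGrid {
    double min_val;
    double max_val;
    double step;
};

// Enumerates min_val, min_val*step, min_val*step^2, ... while the value
// stays strictly below max_val (assumed stopping condition).
// An empty result means "do not optimize this parameter": that is the
// case when step <= 1, as described in the text. A non-positive min_val
// is also rejected (assumption) because it would never reach max_val.
std::vector<double> enumerate_log_grid(const ParamGrid& grid) {
    std::vector<double> values;
    if (grid.step <= 1.0 || grid.min_val <= 0.0) {
        return values;
    }
    for (double v = grid.min_val; v < grid.max_val; v *= grid.step) {
        values.push_back(v);
    }
    return values;
}
```

Under these assumptions, a grid with ``min_val = 0.5`` , ``max_val = 5`` and ``step = 2`` yields the candidates 0.5, 1, 2 and 4, while a grid with ``step = 0`` yields no candidates, which corresponds to keeping the fixed value from ``params`` .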
modules/ml/doc/boosting.rst
modules/ml/doc/decision_trees.rst
modules/ml/doc/expectation_maximization.rst
modules/ml/doc/k_nearest_neighbors.rst
modules/ml/doc/ml.rst
modules/ml/doc/neural_networks.rst
modules/ml/doc/normal_bayes_classifier.rst
modules/ml/doc/random_trees.rst
modules/ml/doc/statistical_models.rst
modules/ml/doc/support_vector_machines.rst