From c6593d02a6da5373f6b2a8cbb4afb11ef37ecaff Mon Sep 17 00:00:00 2001
From: Vadim Pisarevsky
Date: Sun, 3 Aug 2014 01:41:30 +0400
Subject: [PATCH] updated docs

---
 modules/ml/doc/boosting.rst                 | 164 ++++----------
 modules/ml/doc/decision_trees.rst           | 279 +++++++++--------
 modules/ml/doc/ertrees.rst                  |  15 --
 modules/ml/doc/expectation_maximization.rst |  99 +++++----
 modules/ml/doc/gradient_boosted_trees.rst   | 272 -----------------------
 modules/ml/doc/k_nearest_neighbors.rst      | 181 +++-------------
 modules/ml/doc/ml.rst                       |   2 -
 modules/ml/doc/mldata.rst                   | 329 ++++++++--------------------
 modules/ml/doc/neural_networks.rst          | 200 +++++++----------
 modules/ml/doc/normal_bayes_classifier.rst  |  53 +----
 modules/ml/doc/random_trees.rst             | 164 +++----------
 modules/ml/doc/support_vector_machines.rst  | 238 ++++++++------------
 12 files changed, 513 insertions(+), 1483 deletions(-)
 delete mode 100644 modules/ml/doc/ertrees.rst
 delete mode 100644 modules/ml/doc/gradient_boosted_trees.rst

diff --git a/modules/ml/doc/boosting.rst b/modules/ml/doc/boosting.rst
index 7c5bc83..76a9293 100644
--- a/modules/ml/doc/boosting.rst
+++ b/modules/ml/doc/boosting.rst
@@ -63,41 +63,30 @@ training examples are recomputed at each training iteration. Examples deleted at

 .. [FHT98] Friedman, J. H., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting. Technical Report, Dept. of Statistics*, Stanford University, 1998.

-CvBoostParams
+Boost::Params
 -------------
-.. ocv:struct:: CvBoostParams : public CvDTreeParams
+.. ocv:struct:: Boost::Params : public DTrees::Params

     Boosting training parameters.

-    There is one structure member that you can set directly:
-
-    .. ocv:member:: int split_criteria
-
-        Splitting criteria used to choose optimal splits during a weak tree construction. Possible values are:
-
-        * **CvBoost::DEFAULT** Use the default for the particular boosting method, see below.
-        * **CvBoost::GINI** Use Gini index. This is default option for Real AdaBoost; may be also used for Discrete AdaBoost.
-        * **CvBoost::MISCLASS** Use misclassification rate. This is default option for Discrete AdaBoost; may be also used for Real AdaBoost.
-        * **CvBoost::SQERR** Use least squares criteria. This is default and the only option for LogitBoost and Gentle AdaBoost.
-
-The structure is derived from :ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.
+The structure is derived from ``DTrees::Params`` but not all of the decision tree parameters are supported. In particular, cross-validation is not supported.

 All parameters are public. You can initialize them by a constructor and then override some of them directly if you want.

-CvBoostParams::CvBoostParams
+Boost::Params::Params
 ----------------------------
 The constructors.

-.. ocv:function:: CvBoostParams::CvBoostParams()
+.. ocv:function:: Boost::Params::Params()

-.. ocv:function:: CvBoostParams::CvBoostParams( int boost_type, int weak_count, double weight_trim_rate, int max_depth, bool use_surrogates, const float* priors )
+.. ocv:function:: Boost::Params::Params( int boost_type, int weak_count, double weight_trim_rate, int max_depth, bool use_surrogates, const float* priors )

     :param boost_type: Type of the boosting algorithm. Possible values are:

-        * **CvBoost::DISCRETE** Discrete AdaBoost.
-        * **CvBoost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
-        * **CvBoost::LOGIT** LogitBoost. It can produce good regression fits.
-        * **CvBoost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.
+        * **Boost::DISCRETE** Discrete AdaBoost.
+        * **Boost::REAL** Real AdaBoost. It is a technique that utilizes confidence-rated predictions and works well with categorical data.
+        * **Boost::LOGIT** LogitBoost. It can produce good regression fits.
+        * **Boost::GENTLE** Gentle AdaBoost. It puts less weight on outlier data points and for that reason is often good with regression data.

         Gentle AdaBoost and Real AdaBoost are often the preferable choices.

@@ -105,131 +94,54 @@ The constructors.

     :param weight_trim_rate: A threshold between 0 and 1 used to save computational time. Samples with summary weight :math:`\leq 1 - weight\_trim\_rate` do not participate in the *next* iteration of training. Set this parameter to 0 to turn off this functionality.

-See :ocv:func:`CvDTreeParams::CvDTreeParams` for description of other parameters.
+See ``DTrees::Params`` for a description of other parameters.

 Default parameters are: ::

-    CvBoostParams::CvBoostParams()
+    Boost::Params::Params()
     {
-        boost_type = CvBoost::REAL;
-        weak_count = 100;
-        weight_trim_rate = 0.95;
-        cv_folds = 0;
-        max_depth = 1;
+        boostType = Boost::REAL;
+        weakCount = 100;
+        weightTrimRate = 0.95;
+        CVFolds = 0;
+        maxDepth = 1;
     }

-CvBoostTree
------------
-.. ocv:class:: CvBoostTree : public CvDTree
-
-The weak tree classifier, a component of the boosted tree classifier :ocv:class:`CvBoost`, is a derivative of :ocv:class:`CvDTree`. Normally, there is no need to use the weak classifiers directly. However, they can be accessed as elements of the sequence ``CvBoost::weak``, retrieved by :ocv:func:`CvBoost::get_weak_predictors`.
-
-.. note:: In case of LogitBoost and Gentle AdaBoost, each weak predictor is a regression tree, rather than a classification tree. Even in case of Discrete AdaBoost and Real AdaBoost, the ``CvBoostTree::predict`` return value (:ocv:member:`CvDTreeNode::value`) is not an output class label. A negative value "votes" for class #0, a positive value - for class #1. The votes are weighted. The weight of each individual tree may be increased or decreased using the method ``CvBoostTree::scale``.
-
-CvBoost
+Boost
 -------
-.. ocv:class:: CvBoost : public CvStatModel
-
-Boosted tree classifier derived from :ocv:class:`CvStatModel`.
-
-CvBoost::CvBoost
-----------------
-Default and training constructors.
-
-.. ocv:function:: CvBoost::CvBoost()
-
-.. ocv:function:: CvBoost::CvBoost( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvBoostParams params=CvBoostParams() )
-
-.. ocv:function:: CvBoost::CvBoost( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvBoostParams params=CvBoostParams() )
-
-.. ocv:pyfunction:: cv2.Boost([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <Boost object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvBoost::train
---------------
-Trains a boosted tree classifier.
-
-.. ocv:function:: bool CvBoost::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:function:: bool CvBoost::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:function:: bool CvBoost::train( CvMLData* data, CvBoostParams params=CvBoostParams(), bool update=false )
-
-.. ocv:pyfunction:: cv2.Boost.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval
-
-    :param update: Specifies whether the classifier needs to be updated (``true``, the new weak tree classifiers added to the existing ensemble) or the classifier needs to be rebuilt from scratch (``false``).
+.. ocv:class:: Boost : public DTrees

-The train method follows the common template of :ocv:func:`CvStatModel::train`. The responses must be categorical, which means that boosted trees cannot be built for regression, and there should be two classes.
+Boosted tree classifier derived from ``DTrees``.

-CvBoost::predict
+Boost::create
 ----------------
-Predicts a response for an input sample.
+Creates the empty model

-.. ocv:function:: float CvBoost::predict( const cv::Mat& sample, const cv::Mat& missing=Mat(), const cv::Range& slice=Range::all(), bool rawMode=false, bool returnSum=false ) const
+.. ocv:function:: Ptr<Boost> Boost::create(const Params& params=Params())

-.. ocv:function:: float CvBoost::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weak_responses=0, CvSlice slice=CV_WHOLE_SEQ, bool raw_mode=false, bool return_sum=false ) const
-
-.. ocv:pyfunction:: cv2.Boost.predict(sample[, missing[, slice[, rawMode[, returnSum]]]]) -> retval
-
-    :param sample: Input sample.
-
-    :param missing: Optional mask of missing measurements. To handle missing measurements, the weak classifiers must include surrogate splits (see ``CvDTreeParams::use_surrogates``).
-
-    :param weak_responses: Optional output parameter, a floating-point vector with responses of each individual weak classifier. The number of elements in the vector must be equal to the slice length.
-
-    :param slice: Continuous subset of the sequence of weak classifiers to be used for prediction. By default, all the weak classifiers are used.
-
-    :param rawMode: Normally, it should be set to ``false``.
-
-    :param returnSum: If ``true`` then return sum of votes instead of the class label.
-
-The method runs the sample through the trees in the ensemble and returns the output class label based on the weighted voting.
-
-CvBoost::prune
---------------
-Removes the specified weak classifiers.
-
-.. ocv:function:: void CvBoost::prune( CvSlice slice )
-
-.. ocv:pyfunction:: cv2.Boost.prune(slice) -> None
-
-    :param slice: Continuous subset of the sequence of weak classifiers to be removed.
-
-The method removes the specified weak classifiers from the sequence.
-
-.. note:: Do not confuse this method with the pruning of individual decision trees, which is currently not supported.
-
-
-CvBoost::calc_error
--------------------
-Returns error of the boosted tree classifier.
-
-.. ocv:function:: float CvBoost::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )
-
-The method is identical to :ocv:func:`CvDTree::calc_error` but uses the boosted tree classifier as predictor.
+Use ``StatModel::train`` to train the model, ``StatModel::train(traindata, params)`` to create and train the model, ``StatModel::load(filename)`` to load the pre-trained model.

-CvBoost::get_weak_predictors
-----------------------------
-Returns the sequence of weak tree classifiers.
+Boost::getBParams
+-----------------
+Returns the boosting parameters

-.. ocv:function:: CvSeq* CvBoost::get_weak_predictors()
+.. ocv:function:: Params Boost::getBParams() const

-The method returns the sequence of weak classifiers. Each element of the sequence is a pointer to the :ocv:class:`CvBoostTree` class or to some of its derivatives.
+The method returns the training parameters.

-CvBoost::get_params
--------------------
-Returns current parameters of the boosted tree classifier.
+Boost::setBParams
+-----------------
+Sets the boosting parameters

-.. ocv:function:: const CvBoostParams& CvBoost::get_params() const
+.. ocv:function:: void Boost::setBParams( const Params& p )

+    :param p: Training parameters of type Boost::Params.

+The method sets the training parameters.

-CvBoost::get_data
------------------
-Returns used train data of the boosted tree classifier.
+Prediction with Boost
+---------------------

-.. ocv:function:: const CvDTreeTrainData* CvBoost::get_data() const
+``StatModel::predict(samples, results, flags)`` should be used. Pass ``flags=StatModel::RAW_OUTPUT`` to get the raw sum from the Boost classifier.
diff --git a/modules/ml/doc/decision_trees.rst b/modules/ml/doc/decision_trees.rst
index de6fc99..9400ae4 100644
--- a/modules/ml/doc/decision_trees.rst
+++ b/modules/ml/doc/decision_trees.rst
@@ -3,10 +3,7 @@ Decision Trees

 The ML classes discussed in this section implement Classification and Regression Tree algorithms described in [Breiman84]_.

-The class
-:ocv:class:`CvDTree` represents a single decision tree that may be used alone or as a base class in tree ensembles (see
-:ref:`Boosting` and
-:ref:`Random Trees` ).
+The class ``cv::ml::DTrees`` represents a single decision tree or a collection of decision trees. It's also a base class for ``RTrees`` and ``Boost``.

 A decision tree is a binary tree (tree where each non-leaf node has two child nodes). It can be used either for classification or for regression. For classification, each tree leaf is marked with a class label; multiple leaves may have the same label. For regression, a constant is also assigned to each tree leaf, so the approximation function is piecewise constant.

@@ -55,123 +52,107 @@ Besides the prediction that is an obvious use of decision trees, the tree can be

 Importance of each variable is computed over all the splits on this variable in the tree, primary and surrogate ones. Thus, to compute variable importance correctly, the surrogate splits must be enabled in the training parameters, even if there is no missing data.

-CvDTreeSplit
+DTrees::Split
 -------------
-.. ocv:struct:: CvDTreeSplit
+.. ocv:class:: DTrees::Split

+    The class represents a split in a decision tree. It has public members:

-    The structure represents a possible decision tree node split. It has public members:
-
-    .. ocv:member:: int var_idx
+    .. ocv:member:: int varIdx

         Index of variable on which the split is created.

-    .. ocv:member:: int inversed
+    .. ocv:member:: bool inversed

-        If it is not null then inverse split rule is used that is left and right branches are exchanged in the rule expressions below.
+        If true, then the inverse split rule is used (i.e. left and right branches are exchanged in the rule expressions below).

     .. ocv:member:: float quality

-        The split quality, a positive number. It is used to choose the best primary split, then to choose and sort the surrogate splits. After the tree is constructed, it is also used to compute variable importance.
-
-    .. ocv:member:: CvDTreeSplit* next
-
-        Pointer to the next split in the node list of splits.
-
-    .. ocv:member:: int[] subset
+        The split quality, a positive number. It is used to choose the best split.

-        Bit array indicating the value subset in case of split on a categorical variable. The rule is: ::
+    .. ocv:member:: int next

-            if var_value in subset
-                then next_node <- left
-                else next_node <- right
+        Index of the next split in the list of splits for the node.

-    .. ocv:member:: float ord::c
+    .. ocv:member:: float c

         The threshold value in case of split on an ordered variable. The rule is: ::

-            if var_value < ord.c
-                then next_node<-left
-                else next_node<-right
+            if var_value < c
+                then next_node<-left
+                else next_node<-right

-    .. ocv:member:: int ord::split_point
+    .. ocv:member:: int subsetOfs

-        Used internally by the training algorithm.
-
-CvDTreeNode
------------
-.. ocv:struct:: CvDTreeNode
+        Offset of the bitset used by the split on a categorical variable. The rule is: ::

+            if bitset[var_value] == 1
+                then next_node <- left
+                else next_node <- right

-    The structure represents a node in a decision tree. It has public members:
-
-    .. ocv:member:: int class_idx
-
-        Class index normalized to 0..class_count-1 range and assigned to the node. It is used internally in classification trees and tree ensembles.
-
-    .. ocv:member:: int Tn
+DTrees::Node
+------------
+.. ocv:class:: DTrees::Node

-        Tree index in a ordered sequence of pruned trees. The indices are used during and after the pruning procedure. The root node has the maximum value ``Tn`` of the whole tree, child nodes have ``Tn`` less than or equal to the parent's ``Tn``, and nodes with :math:`Tn \leq CvDTree::pruned\_tree\_idx` are not used at prediction stage (the corresponding branches are considered as cut-off), even if they have not been physically deleted from the tree at the pruning stage.
+    The class represents a decision tree node. It has public members:

     .. ocv:member:: double value

+        Value at the node: a class label in case of classification or estimated function value in case of regression.

-    .. ocv:member:: CvDTreeNode* parent
-
-        Pointer to the parent node.
+    .. ocv:member:: int classIdx

-    .. ocv:member:: CvDTreeNode* left
+        Class index normalized to 0..class_count-1 range and assigned to the node. It is used internally in classification trees and tree ensembles.

-        Pointer to the left child node.
+    .. ocv:member:: int parent

-    .. ocv:member:: CvDTreeNode* right
+        Index of the parent node.

-        Pointer to the right child node.
+    .. ocv:member:: int left

-    .. ocv:member:: CvDTreeSplit* split
+        Index of the left child node.

-        Pointer to the first (primary) split in the node list of splits.
+    .. ocv:member:: int right

-    .. ocv:member:: int sample_count
+        Index of the right child node.

-        The number of samples that fall into the node at the training stage. It is used to resolve the difficult cases - when the variable for the primary split is missing and all the variables for other surrogate splits are missing too. In this case the sample is directed to the left if ``left->sample_count > right->sample_count`` and to the right otherwise.
+    .. ocv:member:: int defaultDir

-    .. ocv:member:: int depth
+        Default direction where to go (-1: left or +1: right). It helps in the case of missing values.

-        Depth of the node. The root node depth is 0, the child nodes depth is the parent's depth + 1.
+    .. ocv:member:: int split

-Other numerous fields of ``CvDTreeNode`` are used internally at the training stage.
+        Index of the first split.

-CvDTreeParams
--------------
-.. ocv:struct:: CvDTreeParams
+DTrees::Params
+---------------
+.. ocv:class:: DTrees::Params

 The structure contains all the decision tree training parameters. You can initialize it by default constructor and then override any parameters directly before training, or the structure may be fully initialized using the advanced variant of the constructor.

-CvDTreeParams::CvDTreeParams
+DTrees::Params::Params
 ----------------------------
 The constructors.

-.. ocv:function:: CvDTreeParams::CvDTreeParams()
+.. ocv:function:: DTrees::Params::Params()

-.. ocv:function:: CvDTreeParams::CvDTreeParams( int max_depth, int min_sample_count, float regression_accuracy, bool use_surrogates, int max_categories, int cv_folds, bool use_1se_rule, bool truncate_pruned_tree, const float* priors )
+.. ocv:function:: DTrees::Params::Params( int maxDepth, int minSampleCount, double regressionAccuracy, bool useSurrogates, int maxCategories, int CVFolds, bool use1SERule, bool truncatePrunedTree, const Mat& priors )

-    :param max_depth: The maximum possible depth of the tree. That is the training algorithms attempts to split a node while its depth is less than ``max_depth``. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.
+    :param maxDepth: The maximum possible depth of the tree. That is, the training algorithm attempts to split a node while its depth is less than ``maxDepth``. The root node has zero depth. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure in the beginning of the section), and/or if the tree is pruned.

-    :param min_sample_count: If the number of samples in a node is less than this parameter then the node will not be split.
+    :param minSampleCount: If the number of samples in a node is less than this parameter then the node will not be split.

-    :param regression_accuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split.
+    :param regressionAccuracy: Termination criteria for regression trees. If all absolute differences between an estimated value in a node and values of train samples in this node are less than this parameter then the node will not be split further.

-    :param use_surrogates: If true then surrogate splits will be built. These splits allow to work with missing data and compute variable importance correctly.
+    :param useSurrogates: If true then surrogate splits will be built. These splits make it possible to work with missing data and to compute variable importance correctly.

+        .. note:: Currently it's not implemented.

-    :param max_categories: Cluster possible values of a categorical variable into ``K`` :math:`\leq` ``max_categories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``max_categories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision trees engines (including ML) try to find sub-optimal split in this case by clustering all the samples into ``max_categories`` clusters that is some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > max_categories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
+    :param maxCategories: Cluster possible values of a categorical variable into ``K <= maxCategories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``maxCategories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision tree engines (including our implementation) try to find a sub-optimal split in this case by clustering all the samples into ``maxCategories`` clusters, that is, some categories are merged together. The clustering is applied only in ``n > 2``-class classification problems for categorical variables with ``N > maxCategories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.

-    :param cv_folds: If ``cv_folds > 1`` then prune a tree with ``K``-fold cross-validation where ``K`` is equal to ``cv_folds``.
+    :param CVFolds: If ``CVFolds > 1`` then the algorithm prunes the built decision tree using a ``K``-fold cross-validation procedure where ``K`` is equal to ``CVFolds``.

-    :param use_1se_rule: If true then a pruning will be harsher. This will make a tree more compact and more resistant to the training data noise but a bit less accurate.
+    :param use1SERule: If true then pruning will be harsher. This will make a tree more compact and more resistant to the training data noise but a bit less accurate.

-    :param truncate_pruned_tree: If true then pruned branches are physically removed from the tree. Otherwise they are retained and it is possible to get results from the original unpruned (or pruned less aggressively) tree by decreasing ``CvDTree::pruned_tree_idx`` parameter.
+    :param truncatePrunedTree: If true then pruned branches are physically removed from the tree. Otherwise they are retained and it is possible to get results from the original unpruned (or pruned less aggressively) tree.

     :param priors: The array of a priori class probabilities, sorted by the class label value. The parameter can be used to tune the decision tree preferences toward a certain class. For example, if you want to detect some rare anomaly occurrence, the training base will likely contain much more normal cases than anomalies, so a very good classification performance will be achieved just by considering every case as normal. To avoid this, the priors can be specified, where the anomaly probability is artificially increased (up to 0.5 or even greater), so the weight of the misclassified anomalies becomes much bigger, and the tree is adjusted properly.

        You can also think about this parameter as weights of prediction categories which determine relative weights that you give to misclassification. That is, if the weight of the first category is 1 and the weight of the second category is 10, then each mistake in predicting the second category is equivalent to making 10 mistakes in predicting the first category.

@@ -179,142 +160,82 @@ The default constructor initializes all the parameters with the default values t

 ::

-    CvDTreeParams() : max_categories(10), max_depth(INT_MAX), min_sample_count(10),
-        cv_folds(10), use_surrogates(true), use_1se_rule(true),
-        truncate_pruned_tree(true), regression_accuracy(0.01f), priors(0)
-    {}
-
-
-CvDTreeTrainData
-----------------
-.. ocv:struct:: CvDTreeTrainData
-
-Decision tree training data and shared data for tree ensembles. The structure is mostly used internally for storing both standalone trees and tree ensembles efficiently. Basically, it contains the following types of information:
-
-#. Training parameters, an instance of :ocv:class:`CvDTreeParams`.
-
-#. Training data preprocessed to find the best splits more efficiently. For tree ensembles, this preprocessed data is reused by all trees. Additionally, the training data characteristics shared by all trees in the ensemble are stored here: variable types, the number of classes, a class label compression map, and so on.
-
-#. Buffers, memory storages for tree nodes, splits, and other elements of the constructed trees.
-
-There are two ways of using this structure. In simple cases (for example, a standalone tree or the ready-to-use "black box" tree ensemble from machine learning, like
-:ref:`Random Trees` or
-:ref:`Boosting` ), there is no need to care or even to know about the structure. You just construct the needed statistical model, train it, and use it. The ``CvDTreeTrainData`` structure is constructed and used internally. However, for custom tree algorithms or another sophisticated cases, the structure may be constructed and used explicitly. The scheme is the following:
-
-#.
-    The structure is initialized using the default constructor, followed by ``set_data``, or it is built using the full form of constructor. The parameter ``_shared`` must be set to ``true``.
-
-#.
-    One or more trees are trained using this data (see the special form of the method :ocv:func:`CvDTree::train`).
-
-#.
-    The structure is released as soon as all the trees using it are released.
+    DTrees::Params::Params()
+    {
+        maxDepth = INT_MAX;
+        minSampleCount = 10;
+        regressionAccuracy = 0.01f;
+        useSurrogates = false;
+        maxCategories = 10;
+        CVFolds = 10;
+        use1SERule = true;
+        truncatePrunedTree = true;
+        priors = Mat();
+    }

-CvDTree
--------
-.. ocv:class:: CvDTree : public CvStatModel

-The class implements a decision tree as described in the beginning of this section.
+DTrees
+------
+.. ocv:class:: DTrees : public StatModel

-CvDTree::train
---------------
-Trains a decision tree.
+The class represents a single decision tree or a collection of decision trees. The current public interface of the class allows the user to train only a single decision tree; however, the class is capable of storing multiple decision trees and using them for prediction (by summing responses or using a voting scheme), and the classes derived from DTrees (such as ``RTrees`` and ``Boost``) use this capability to implement decision tree ensembles.

-.. ocv:function:: bool CvDTree::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( CvMLData* trainData, CvDTreeParams params=CvDTreeParams() )
-
-.. ocv:function:: bool CvDTree::train( CvDTreeTrainData* trainData, const CvMat* subsampleIdx )
-
-.. ocv:pyfunction:: cv2.DTree.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]) -> retval
-
-There are four ``train`` methods in :ocv:class:`CvDTree`:
-
-* The **first two** methods follow the generic :ocv:func:`CvStatModel::train` conventions. It is the most complete form. Both data layouts (``tflag=CV_ROW_SAMPLE`` and ``tflag=CV_COL_SAMPLE``) are supported, as well as sample and variable subsets, missing measurements, arbitrary combinations of input and output variable types, and so on. The last parameter contains all of the necessary training parameters (see the :ocv:class:`CvDTreeParams` description).
-
-* The **third** method uses :ocv:class:`CvMLData` to pass training data to a decision tree.
-
-* The **last** method ``train`` is mostly used for building tree ensembles. It takes the pre-constructed :ocv:class:`CvDTreeTrainData` instance and an optional subset of the training set. The indices in ``subsampleIdx`` are counted relatively to the ``_sample_idx``, passed to the ``CvDTreeTrainData`` constructor. For example, if ``_sample_idx=[1, 5, 7, 100]``, then ``subsampleIdx=[0,3]`` means that the samples ``[1, 100]`` of the original training set are used.
-
-The function is parallelized with the TBB library.
-
-
-
-CvDTree::predict
+DTrees::create
 ----------------
-Returns the leaf node of a decision tree corresponding to the input vector.
-
-.. ocv:function:: CvDTreeNode* CvDTree::predict( const Mat& sample, const Mat& missingDataMask=Mat(), bool preprocessedInput=false ) const
-
-.. ocv:function:: CvDTreeNode* CvDTree::predict( const CvMat* sample, const CvMat* missingDataMask=0, bool preprocessedInput=false ) const
+Creates the empty model

-.. ocv:pyfunction:: cv2.DTree.predict(sample[, missingDataMask[, preprocessedInput]]) -> retval
+.. ocv:function:: Ptr<DTrees> DTrees::create(const Params& params=Params())

-    :param sample: Sample for prediction.
+The static method creates an empty decision tree with the specified parameters. It should then be trained using the ``train`` method (see ``StatModel::train``). Alternatively, you can load the model from a file using ``StatModel::load(filename)``.

-    :param missingDataMask: Optional input missing measurement mask.
+DTrees::getDParams
+------------------
+Returns the training parameters

-    :param preprocessedInput: This parameter is normally set to ``false``, implying a regular input. If it is ``true``, the method assumes that all the values of the discrete input variables have been already normalized to :math:`0` to :math:`num\_of\_categories_i-1` ranges since the decision tree uses such normalized representation internally. It is useful for faster prediction with tree ensembles. For ordered input variables, the flag is not used.
+.. ocv:function:: Params DTrees::getDParams() const

-The method traverses the decision tree and returns the reached leaf node as output. The prediction result, either the class label or the estimated function value, may be retrieved as the ``value`` field of the :ocv:class:`CvDTreeNode` structure, for example: ``dtree->predict(sample,mask)->value``.
+The method returns the training parameters.

-
-
-CvDTree::calc_error
+DTrees::setDParams
 -------------------
-Returns error of the decision tree.
-
-.. ocv:function:: float CvDTree::calc_error( CvMLData* trainData, int type, std::vector<float> *resp = 0 )
+Sets the training parameters

-    :param trainData: Data for the decision tree.
-
-    :param type: Type of error. Possible values are:
-
-        * **CV_TRAIN_ERROR** Error on train samples.
-
-        * **CV_TEST_ERROR** Error on test samples.
+.. ocv:function:: void DTrees::setDParams( const Params& p )

-    :param resp: If it is not null then size of this vector will be set to the number of samples and each element will be set to result of prediction on the corresponding sample.
+    :param p: Training parameters of type DTrees::Params.

-The method calculates error of the decision tree. In case of classification it is the percentage of incorrectly classified samples and in case of regression it is the mean of squared errors on samples.
+The method sets the training parameters.

-CvDTree::getVarImportance
--------------------------
-Returns the variable importance array.

-.. ocv:function:: Mat CvDTree::getVarImportance()
-
-.. ocv:function:: const CvMat* CvDTree::get_var_importance()
+DTrees::getRoots
+-------------------
+Returns indices of root nodes

-.. ocv:pyfunction:: cv2.DTree.getVarImportance() -> retval
+.. ocv:function:: std::vector<int>& DTrees::getRoots() const

-CvDTree::get_root
------------------
-Returns the root of the decision tree.
+DTrees::getNodes
+-------------------
+Returns all the nodes

-.. ocv:function:: const CvDTreeNode* CvDTree::get_root() const
+.. ocv:function:: std::vector<DTrees::Node>& DTrees::getNodes() const

+All the node indices, mentioned above (left, right, parent, root indices), are indices in the returned vector.

-CvDTree::get_pruned_tree_idx
-----------------------------
-Returns the ``CvDTree::pruned_tree_idx`` parameter.
+DTrees::getSplits
+-------------------
+Returns all the splits

-.. ocv:function:: int CvDTree::get_pruned_tree_idx() const
+.. ocv:function:: std::vector<DTrees::Split>& DTrees::getSplits() const

-The parameter ``DTree::pruned_tree_idx`` is used to prune a decision tree. See the ``CvDTreeNode::Tn`` parameter.
+All the split indices, mentioned above (split, next, etc.), are indices in the returned vector.

-CvDTree::get_data
------------------
-Returns used train data of the decision tree.
+DTrees::getSubsets
+-------------------
+Returns all the bitsets for categorical splits

-.. ocv:function:: CvDTreeTrainData* CvDTree::get_data() const
+.. ocv:function:: std::vector<int>& DTrees::getSubsets() const

-Example: building a tree for classifying mushrooms. See the ``mushroom.cpp`` sample that demonstrates how to build and use the
-decision tree.
+``Split::subsetOfs`` is an offset in the returned vector.

 .. [Breiman84] Breiman, L., Friedman, J. Olshen, R. and Stone, C. (1984), *Classification and Regression Trees*, Wadsworth.
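+
+For illustration, here is a minimal usage sketch (assuming the ``DTrees::create``/``StatModel::train`` interface described above and the ``TrainData`` helper from the data section; the toy numbers are made up): ::
+
+    using namespace cv;
+    using namespace cv::ml;
+
+    // four 2D samples from two classes
+    float samplesData[] = { 1, 1,  2, 2,  10, 10,  11, 11 };
+    int responsesData[] = { 0, 0, 1, 1 };
+    Mat samples(4, 2, CV_32F, samplesData);
+    Mat responses(4, 1, CV_32S, responsesData);
+
+    // shallow tree, no surrogates, no cross-validation pruning (CVFolds=0)
+    DTrees::Params params(5, 1, 0.01, false, 10, 0, false, false, Mat());
+    Ptr<DTrees> dtree = DTrees::create(params);
+    dtree->train(TrainData::create(samples, ROW_SAMPLE, responses));
+
+    Mat testSample = (Mat_<float>(1, 2) << 10.5f, 10.5f);
+    float label = dtree->predict(testSample); // expected: 1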
diff --git a/modules/ml/doc/ertrees.rst b/modules/ml/doc/ertrees.rst
deleted file mode 100644
index 7e6d03e..0000000
--- a/modules/ml/doc/ertrees.rst
+++ /dev/null
@@ -1,15 +0,0 @@
-Extremely randomized trees
-==========================
-
-Extremely randomized trees have been introduced by Pierre Geurts, Damien Ernst and Louis Wehenkel in the article "Extremely randomized trees", 2006 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.65.7485&rep=rep1&type=pdf]. The algorithm of growing Extremely randomized trees is similar to :ref:`Random Trees` (Random Forest), but there are two differences:
-
-#. Extremely randomized trees don't apply the bagging procedure to construct a set of the training samples for each tree. The same input training set is used to train all trees.
-
-#. Extremely randomized trees pick a node split very extremely (both a variable index and variable splitting value are chosen randomly), whereas Random Forest finds the best split (optimal one by variable index and variable splitting value) among random subset of variables.
-
-
-CvERTrees
-----------
-.. ocv:class:: CvERTrees : public CvRTrees
-
-    The class implements the Extremely randomized trees algorithm. ``CvERTrees`` is inherited from :ocv:class:`CvRTrees` and has the same interface, so see description of :ocv:class:`CvRTrees` class to get details. To set the training parameters of Extremely randomized trees the same class :ocv:struct:`CvRTParams` is used.
diff --git a/modules/ml/doc/expectation_maximization.rst b/modules/ml/doc/expectation_maximization.rst
index b79dea8..82450be 100644
--- a/modules/ml/doc/expectation_maximization.rst
+++ b/modules/ml/doc/expectation_maximization.rst
@@ -91,22 +91,23 @@ already a good enough approximation).

 EM
 --
-.. ocv:class:: EM : public Algorithm
+.. ocv:class:: EM : public StatModel

-The class implements the EM algorithm as described in the beginning of this section. It is inherited from :ocv:class:`Algorithm`.
+The class implements the EM algorithm as described in the beginning of this section.

+EM::Params
+----------
+.. ocv:class:: EM::Params

-EM::EM
-------
-The constructor of the class
+The class describes EM training parameters. It includes:

-.. ocv:function:: EM::EM(int nclusters=EM::DEFAULT_NCLUSTERS, int covMatType=EM::COV_MAT_DIAGONAL, const TermCriteria& termCrit=TermCriteria(TermCriteria::COUNT+TermCriteria::EPS, EM::DEFAULT_MAX_ITERS, FLT_EPSILON) )
+    .. ocv:member:: int clusters
+
+        The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some EM implementations could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.

-.. ocv:pyfunction:: cv2.EM([nclusters[, covMatType[, termCrit]]]) -> <EM object>
+    .. ocv:member:: int covMatType

-    :param nclusters: The number of mixture components in the Gaussian mixture model. Default value of the parameter is ``EM::DEFAULT_NCLUSTERS=5``. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet.
-
-    :param covMatType: Constraint on covariance matrices which defines type of matrices. Possible values are:
+        Constraint on covariance matrices which defines type of matrices. Possible values are:

         * **EM::COV_MAT_SPHERICAL** A scaled identity matrix :math:`\mu_k * I`. There is the only parameter :math:`\mu_k` to be estimated for each matrix. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (for example in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time with ``covMatType=EM::COV_MAT_DIAGONAL``.
@@ -114,23 +115,30 @@ The constructor of the class

         * **EM::COV_MAT_DIAGONAL** A diagonal matrix with positive diagonal elements. The number of free parameters is ``d`` for each matrix. This is most commonly used option yielding good estimation results.

         * **EM::COV_MAT_GENERIC** A symmetric positively defined matrix. The number of free parameters in each matrix is about :math:`d^2/2`. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.

-    :param termCrit: The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.
+    .. ocv:member:: TermCriteria termCrit
+
+        The termination criteria of the EM algorithm. The EM algorithm can be terminated by the number of iterations ``termCrit.maxCount`` (number of M-steps) or when relative change of likelihood logarithm is less than ``termCrit.epsilon``. Default maximum number of iterations is ``EM::DEFAULT_MAX_ITERS=100``.

-EM::train
----------
-Estimates the Gaussian mixture parameters from a samples set.
+EM::create
+----------
+Creates an empty EM model
+
+.. ocv:function:: Ptr<EM> EM::create(const Params& params=Params())

-.. ocv:function:: bool EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
+    :param params: EM parameters

-.. ocv:function:: bool EM::trainE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
+The model should then be trained using the ``StatModel::train(traindata, flags)`` method. Alternatively, you can use one of the ``EM::train*`` methods or load it from a file using ``StatModel::load(filename)``.
+
+EM::train
+---------
+Static methods that estimate the Gaussian mixture parameters from a samples set

-.. ocv:function:: bool EM::trainM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray())
+.. ocv:function:: Ptr<EM> EM::train(InputArray samples, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())

-.. ocv:pyfunction:: cv2.EM.train(samples[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs
+.. ocv:function:: bool EM::train_startWithE(InputArray samples, InputArray means0, InputArray covs0=noArray(), InputArray weights0=noArray(), OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())

-.. ocv:pyfunction:: cv2.EM.trainE(samples, means0[, covs0[, weights0[, logLikelihoods[, labels[, probs]]]]]) -> retval, logLikelihoods, labels, probs
-
-.. ocv:pyfunction:: cv2.EM.trainM(samples, probs0[, logLikelihoods[, labels[, probs]]]) -> retval, logLikelihoods, labels, probs
+.. ocv:function:: bool EM::train_startWithM(InputArray samples, InputArray probs0, OutputArray logLikelihoods=noArray(), OutputArray labels=noArray(), OutputArray probs=noArray(), const Params& params=Params())

     :param samples: Samples from which the Gaussian mixture model will be estimated. It should be a one-channel matrix, each row of which is a sample. If the matrix does not have ``CV_64F`` type it will be converted to the inner matrix of such type for the further computing.

@@ -147,6 +155,8 @@ Estimates the Gaussian mixture parameters from a samples set.

     :param labels: The optional output "class label" for each sample: :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample). It has :math:`nsamples \times 1` size and ``CV_32SC1`` type.

     :param probs: The optional output matrix that contains posterior probabilities of each Gaussian mixture component given the each sample. It has :math:`nsamples \times nclusters` size and ``CV_64FC1`` type.
+
+    :param params: The Gaussian mixture parameters, see the ``EM::Params`` description above.

 Three versions of training method differ in the initialization of Gaussian mixture model parameters and start step:

@@ -167,15 +177,13 @@ Unlike many of the ML models, EM is an unsupervised learning algorithm and it do
 :math:`\texttt{labels}_i=\texttt{arg max}_k(p_{i,k}), i=1..N` (indices of the most probable mixture component for each sample). The trained model can be used further for prediction, just like any other classifier. The trained model is similar to the
-:ocv:class:`CvNormalBayesClassifier`.
+``NormalBayesClassifier``.

-EM::predict
------------
+EM::predict2
+------------
 Returns a likelihood logarithm value and an index of the most probable mixture component for the given sample.

-.. ocv:function:: Vec2d EM::predict(InputArray sample, OutputArray probs=noArray()) const
-
-.. ocv:pyfunction:: cv2.EM.predict(sample[, probs]) -> retval, probs
+.. ocv:function:: Vec2d EM::predict2(InputArray sample, OutputArray probs=noArray()) const

     :param sample: A sample for classification. It should be a one-channel matrix of :math:`1 \times dims` or :math:`dims \times 1` size.

@@ -183,28 +191,29 @@ Returns a likelihood logarithm value and an index of the most probable mixture c

 The method returns a two-element ``double`` vector. Zero element is a likelihood logarithm value for the sample. First element is an index of the most probable mixture component for the given sample.

-CvEM::isTrained
----------------
-Returns ``true`` if the Gaussian mixture model was trained.

-.. ocv:function:: bool EM::isTrained() const
+EM::getMeans
+------------
+Returns the cluster centers (means of the Gaussian mixture)
+
+.. ocv:function:: Mat EM::getMeans() const
+
+Returns a matrix with the number of rows equal to the number of mixtures and the number of columns equal to the space dimensionality.
+
+
+EM::getWeights
+--------------
+Returns weights of the mixtures
+
+.. ocv:function:: Mat EM::getWeights() const

-.. ocv:pyfunction:: cv2.EM.isTrained() -> retval
+Returns a vector with the number of elements equal to the number of mixtures.

-EM::read, EM::write
--------------------
-See :ocv:func:`Algorithm::read` and :ocv:func:`Algorithm::write`.

-EM::get, EM::set
-----------------
-See :ocv:func:`Algorithm::get` and :ocv:func:`Algorithm::set`. The following parameters are available:
+EM::getCovs
+--------------
+Returns covariance matrices

-* ``"nclusters"``
-* ``"covMatType"``
-* ``"maxIters"``
-* ``"epsilon"``
-* ``"weights"`` *(read-only)*
-* ``"means"`` *(read-only)*
-* ``"covs"`` *(read-only)*
+.. ocv:function:: void EM::getCovs(std::vector<Mat>& covs) const

-..
+Returns a vector of covariance matrices. The number of matrices is the number of Gaussian mixtures; each matrix is a square floating-point NxN matrix, where N is the space dimensionality.
diff --git a/modules/ml/doc/gradient_boosted_trees.rst b/modules/ml/doc/gradient_boosted_trees.rst
deleted file mode 100644
index b83c47e..0000000
--- a/modules/ml/doc/gradient_boosted_trees.rst
+++ /dev/null
@@ -1,272 +0,0 @@
-.. _Gradient Boosted Trees:
-
-Gradient Boosted Trees
-======================
-
-.. highlight:: cpp
-
-Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by
-Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .
-In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass
-classification and regression problems. Moreover, it can use any
-differential loss function, some popular ones are implemented.
-Decision trees (:ocv:class:`CvDTree`) usage as base learners allows to process ordered
-and categorical variables.
-
-.. _Training GBT:
-
-Training the GBT model
-----------------------
-
-Gradient Boosted Trees model represents an ensemble of single regression trees
-built in a greedy fashion. Training procedure is an iterative process
-similar to the numerical optimization via the gradient descent method. Summary loss
-on the training set depends only on the current model predictions for the
-training samples, in other words
-:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N)) \equiv \mathcal{L}(F)`. And the :math:`\mathcal{L}(F)`
-gradient can be computed as follows:
-
-.. math::
-    grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}}, \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... , \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .
-
-At every training step, a single regression tree is built to predict an
-antigradient vector components. Step length is computed corresponding to the
-loss function and separately for every region determined by the tree leaf. It
-can be eliminated by changing values of the leaves directly.
-
-See below the main scheme of the training process:
-
-#.
-    Find the best constant model.
-#.
-    For :math:`i` in :math:`[1,M]`:
-
-    #.
-        Compute the antigradient.
-    #.
-        Grow a regression tree to predict antigradient components.
-    #.
-        Change values in the tree leaves.
-    #.
-        Add the tree to the model.
-
-
-The following loss functions are implemented for regression problems:
-
-*
-    Squared loss (``CvGBTrees::SQUARED_LOSS``):
-    :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`
-*
-    Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):
-    :math:`L(y,f(x))=|y-f(x)|`
-*
-    Huber loss (``CvGBTrees::HUBER_LOSS``):
-    :math:`L(y,f(x)) = \left\{ \begin{array}{lr} \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\ \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,
-
-    where :math:`\delta` is the :math:`\alpha`-quantile estimation of the
-    :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`.
-
-
-The following loss functions are implemented for classification problems:
-
-*
-    Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):
-    :math:`K` functions are built, one function for each output class, and
-    :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=0}1(y=k)\ln{p_k(x)}`,
-    where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`
-    is the estimation of the probability of :math:`y=k`.
-
-As a result, you get the following model:
-
-.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,
-
-where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`
-is a regularization parameter from the interval :math:`(0,1]`, further called
-*shrinkage*.
-
-.. _Predicting with GBT:
-
-Predicting with the GBT Model
------------------------------
-
-To get the GBT model prediction, you need to compute the sum of responses of
-all the trees in the ensemble. For regression problems, it is the answer.
-For classification problems, the result is :math:`\arg\max_{i=1..K}(f_i(x))`.
-
-
-.. highlight:: cpp
-
-
-CvGBTreesParams
----------------
-.. ocv:struct:: CvGBTreesParams : public CvDTreeParams
-
-GBT training parameters.
-
-The structure contains parameters for each single decision tree in the ensemble,
-as well as the whole model characteristics. The structure is derived from
-:ocv:class:`CvDTreeParams` but not all of the decision tree parameters are supported:
-cross-validation, pruning, and class priorities are not used.
-
-CvGBTreesParams::CvGBTreesParams
---------------------------------
-.. ocv:function:: CvGBTreesParams::CvGBTreesParams()
-
-.. ocv:function:: CvGBTreesParams::CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage, float subsample_portion, int max_depth, bool use_surrogates )
-
-    :param loss_function_type: Type of the loss function used for training (see :ref:`Training GBT`). It must be one of the following types: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``, ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three types are used for regression problems, and the last one for classification.
-
-    :param weak_count: Count of boosting algorithm iterations. ``weak_count*K`` is the total count of trees in the GBT model, where ``K`` is the output classes count (equal to one in case of a regression).
-
-    :param shrinkage: Regularization parameter (see :ref:`Training GBT`).
-
-    :param subsample_portion: Portion of the whole training set used for each algorithm iteration. Subset is generated randomly. For more information see http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf.
-
-    :param max_depth: Maximal depth of each decision tree in the ensemble (see :ocv:class:`CvDTree`).
-
-    :param use_surrogates: If ``true``, surrogate splits are built (see :ocv:class:`CvDTree`).
-
-By default the following constructor is used:
-
-.. code-block:: cpp
-
-    CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)
-        : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )
-
-CvGBTrees
----------
-.. ocv:class:: CvGBTrees : public CvStatModel
-
-The class implements the Gradient boosted tree model as described in the beginning of this section.
-
-CvGBTrees::CvGBTrees
---------------------
-Default and training constructors.
-
-.. ocv:function:: CvGBTrees::CvGBTrees()
-
-.. ocv:function:: CvGBTrees::CvGBTrees( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams() )
-
-.. ocv:function:: CvGBTrees::CvGBTrees( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams() )
-
-.. ocv:pyfunction:: cv2.GBTrees([trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]]) -> <GBTrees object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvGBTrees::train
-----------------
-Trains a Gradient boosted tree model.
-
-.. ocv:function:: bool CvGBTrees::train(const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)
-
-.. ocv:function:: bool CvGBTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvGBTreesParams params=CvGBTreesParams(), bool update=false )
-
-.. ocv:function:: bool CvGBTrees::train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)
-
-.. ocv:pyfunction:: cv2.GBTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]]) -> retval
-
-The first train method follows the common template (see :ocv:func:`CvStatModel::train`).
-Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.
-``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of type
-``CV_32S`` or ``CV_32F``. In both cases it is converted into the ``CV_32F``
-matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must be a
-list of indices (``CV_32S``) or a mask (``CV_8U`` or ``CV_8S``). ``update`` is
-a dummy parameter.
-
-The second form of :ocv:func:`CvGBTrees::train` function uses :ocv:class:`CvMLData` as a
-data set container. ``update`` is still a dummy parameter.
-
-All parameters specific to the GBT model are passed into the training function
-as a :ocv:class:`CvGBTreesParams` structure.
-
-
-CvGBTrees::predict
-------------------
-Predicts a response for an input sample.
-
-.. ocv:function:: float CvGBTrees::predict(const Mat& sample, const Mat& missing=Mat(), const Range& slice = Range::all(), int k=-1) const
-
-.. ocv:function:: float CvGBTrees::predict( const CvMat* sample, const CvMat* missing=0, CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ, int k=-1 ) const
-
-.. ocv:pyfunction:: cv2.GBTrees.predict(sample[, missing[, slice[, k]]]) -> retval
-
-    :param sample: Input feature vector that has the same format as every training set element. If not all the variables were actually used during training, ``sample`` contains forged values at the appropriate places.
-
-    :param missing: Missing values mask, which is a dimensional matrix of the same size as ``sample`` having the ``CV_8U`` type. ``1`` corresponds to the missing value in the same position in the ``sample`` vector. If there are no missing values in the feature vector, an empty matrix can be passed instead of the missing mask.
-
-    :param weakResponses: Matrix used to obtain predictions of all the trees. The matrix has :math:`K` rows, where :math:`K` is the count of output classes (1 for the regression case). The matrix has as many columns as the ``slice`` length.
-
-    :param slice: Parameter defining the part of the ensemble used for prediction. If ``slice = Range::all()``, all trees are used. Use this parameter to get predictions of the GBT models with different ensemble sizes learning only one model.
-
-    :param k: Number of tree ensembles built in case of the classification problem (see :ref:`Training GBT`). Use this parameter to change the output to sum of the trees' predictions in the ``k``-th ensemble only. To get the total GBT model prediction, ``k`` value must be -1. For regression problems, ``k`` is also equal to -1.
-
-The method predicts the response corresponding to the given sample (see :ref:`Predicting with GBT`).
-The result is either the class label or the estimated function value. The
-:ocv:func:`CvGBTrees::predict` method enables using the parallel version of the GBT model
-prediction if the OpenCV is built with the TBB library. In this case, predictions
-of single trees are computed in a parallel fashion.
-
-
-CvGBTrees::clear
-----------------
-Clears the model.
-
-.. ocv:function:: void CvGBTrees::clear()
-
-.. ocv:pyfunction:: cv2.GBTrees.clear() -> None
-
-The function deletes the data set information and all the weak models and sets all internal
-variables to the initial state. The function is called in :ocv:func:`CvGBTrees::train` and in the
-destructor.
-
-
-CvGBTrees::calc_error
----------------------
-Calculates a training or testing error.
-
-.. ocv:function:: float CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )
-
-    :param _data: Data set.
-
-    :param type: Parameter defining the error that should be computed: train (``CV_TRAIN_ERROR``) or test (``CV_TEST_ERROR``).
-
-    :param resp: If non-zero, a vector of predictions on the corresponding data set is returned.
-
-If the :ocv:class:`CvMLData` data is used to store the data set, :ocv:func:`CvGBTrees::calc_error` can be
-used to get a training/testing error easily and (optionally) all predictions
-on the training/testing set. If the Intel* TBB* library is used, the error is computed in a
-parallel way, namely, predictions for different samples are computed at the same time.
-In case of a regression problem, a mean squared error is returned. For
-classifications, the result is a misclassification error in percent.
diff --git a/modules/ml/doc/k_nearest_neighbors.rst b/modules/ml/doc/k_nearest_neighbors.rst
index 05413c7..6e16641 100644
--- a/modules/ml/doc/k_nearest_neighbors.rst
+++ b/modules/ml/doc/k_nearest_neighbors.rst
@@ -5,9 +5,9 @@ K-Nearest Neighbors

 The algorithm caches all training samples and predicts the response for a new sample by analyzing a certain number (**K**) of the nearest neighbors of the sample using voting, calculating weighted sum, and so on. The method is sometimes referred to as "learning by example" because for prediction it looks for the feature vector with a known response that is closest to the given vector.

-CvKNearest
-----------
-.. ocv:class:: CvKNearest : public CvStatModel
+KNearest
+----------
+.. ocv:class:: KNearest : public StatModel

 The class implements K-Nearest Neighbors model as described in the beginning of this section.
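+
+For illustration, here is a minimal usage sketch (assuming the ``KNearest::create``/``StatModel::train`` interface and the ``findNearest`` method described below; the toy numbers are made up): ::
+
+    using namespace cv;
+    using namespace cv::ml;
+
+    // four 2D samples from two classes
+    float samplesData[] = { 1, 1,  2, 2,  10, 10,  11, 11 };
+    float responsesData[] = { 0, 0, 1, 1 };
+    Mat samples(4, 2, CV_32F, samplesData);
+    Mat responses(4, 1, CV_32F, responsesData);
+
+    Ptr<KNearest> knn = KNearest::create();
+    knn->train(TrainData::create(samples, ROW_SAMPLE, responses));
+
+    // classify one sample by its 3 nearest neighbors
+    Mat testSample = (Mat_<float>(1, 2) << 10.5f, 10.5f);
+    Mat results;
+    knn->findNearest(testSample, 3, results); // results.at<float>(0) is the predicted label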
@@ -17,65 +17,32 @@ The class implements K-Nearest Neighbors model as described in the beginning of
 * (Python) An example of grid search digit recognition using KNearest can be found at opencv_source/samples/python2/digits_adjust.py
 * (Python) An example of video digit recognition using KNearest can be found at opencv_source/samples/python2/digits_video.py
-CvKNearest::CvKNearest
+KNearest::create
----------------------
-Default and training constructors.
+Creates the empty model
-.. ocv:function:: CvKNearest::CvKNearest()
+.. ocv:function:: Ptr<KNearest> KNearest::create(const Params& params=Params())
-.. ocv:function:: CvKNearest::CvKNearest( const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int max_k=32 )
+    :param params: The model parameters: the default number of neighbors to use in the ``predict`` method (in ``KNearest::findNearest`` this number must be passed explicitly) and the flag specifying whether a classification or a regression model should be trained.
-.. ocv:function:: CvKNearest::CvKNearest( const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool isRegression=false, int max_k=32 )
+The static method creates an empty KNearest classifier. It should then be trained using the ``train`` method (see ``StatModel::train``). Alternatively, you can load a pre-trained KNearest model from a file using ``StatModel::load<KNearest>(filename)``.
-See :ocv:func:`CvKNearest::train` for additional parameters descriptions.
-CvKNearest::train
-----------------
-Trains the model.
-
-.. ocv:function:: bool CvKNearest::train( const Mat& trainData, const Mat& responses, const Mat& sampleIdx=Mat(), bool isRegression=false, int maxK=32, bool updateBase=false )
-
-.. ocv:function:: bool CvKNearest::train( const CvMat* trainData, const CvMat* responses, const CvMat* sampleIdx=0, bool is_regression=false, int maxK=32, bool updateBase=false )
-
-.. ocv:pyfunction:: cv2.KNearest.train(trainData, responses[, sampleIdx[, isRegression[, maxK[, updateBase]]]]) -> retval
-
- :param isRegression: Type of the problem: ``true`` for regression and ``false`` for classification.
-
- :param maxK: Number of maximum neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.
-
- :param updateBase: Specifies whether the model is trained from scratch (``update_base=false``), or it is updated using the new training data (``update_base=true``). In the latter case, the parameter ``maxK`` must not be larger than the original value.
-
-The method trains the K-Nearest model. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only ``CV_ROW_SAMPLE`` data layout is supported.
-* Input variables are all ordered.
-* Output variables can be either categorical ( ``is_regression=false`` ) or ordered ( ``is_regression=true`` ).
-* Variable subsets (``var_idx``) and missing measurements are not supported.
-
-CvKNearest::find_nearest
+KNearest::findNearest
------------------------
Finds the neighbors and predicts responses for input vectors.
-.. ocv:function:: float CvKNearest::find_nearest( const Mat& samples, int k, Mat* results=0, const float** neighbors=0, Mat* neighborResponses=0, Mat* dist=0 ) const
-
-.. ocv:function:: float CvKNearest::find_nearest( const Mat& samples, int k, Mat& results, Mat& neighborResponses, Mat& dists) const
+.. ocv:function:: float KNearest::findNearest( InputArray samples, int k, OutputArray results, OutputArray neighborResponses=noArray(), OutputArray dist=noArray() ) const
-.. 
ocv:function:: float CvKNearest::find_nearest( const CvMat* samples, int k, CvMat* results=0, const float** neighbors=0, CvMat* neighborResponses=0, CvMat* dist=0 ) const
+    :param samples: Input samples stored by rows. It is a single-precision floating-point matrix of ``<number_of_samples> * <number_of_features>`` size.
-.. ocv:pyfunction:: cv2.KNearest.find_nearest(samples, k[, results[, neighborResponses[, dists]]]) -> retval, results, neighborResponses, dists
+    :param k: Number of used nearest neighbors. Should be greater than 1.
+    :param results: Vector with results of prediction (regression or classification) for each input sample. It is a single-precision floating-point vector with ``<number_of_samples>`` elements.
- :param samples: Input samples stored by rows. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times number\_of\_features` size.
+    :param neighborResponses: Optional output values for corresponding neighbors. It is a single-precision floating-point matrix of ``<number_of_samples> * k`` size.
- :param k: Number of used nearest neighbors. It must satisfy constraint: :math:`k \le` :ocv:func:`CvKNearest::get_max_k`.
-
- :param results: Vector with results of prediction (regression or classification) for each input sample. It is a single-precision floating-point vector with ``number_of_samples`` elements.
-
- :param neighbors: Optional output pointers to the neighbor vectors themselves. It is an array of ``k*samples->rows`` pointers.
-
- :param neighborResponses: Optional output values for corresponding ``neighbors``. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times k` size.
-
- :param dist: Optional output distances from the input vectors to the corresponding ``neighbors``. It is a single-precision floating-point matrix of :math:`number\_of\_samples \times k` size.
+    :param dist: Optional output distances from the input vectors to the corresponding neighbors. It is a single-precision floating-point matrix of ``<number_of_samples> * k`` size.
For each input vector (a row of the matrix ``samples``), the method finds the ``k`` nearest neighbors. In case of regression, the predicted result is a mean value of the particular vector's neighbor responses. In case of classification, the class is determined by voting.
@@ -87,110 +54,18 @@ If only a single input vector is passed, all output matrices are optional and th
The function is parallelized with the TBB library.
-CvKNearest::get_max_k
+KNearest::getDefaultK
+---------------------
+Returns the default number of neighbors
+
+.. ocv:function:: int KNearest::getDefaultK() const
+
+The function returns the default number of neighbors that is used in a simpler ``predict`` method, not ``findNearest``.
+
+KNearest::setDefaultK
---------------------
-Returns the number of maximum neighbors that may be passed to the method :ocv:func:`CvKNearest::find_nearest`.
-
-.. ocv:function:: int CvKNearest::get_max_k() const
-
-CvKNearest::get_var_count
-------------------------
-Returns the number of used features (variables count).
-
-.. ocv:function:: int CvKNearest::get_var_count() const
-
-CvKNearest::get_sample_count
----------------------------
-Returns the total number of train samples.
-
-.. ocv:function:: int CvKNearest::get_sample_count() const
-
-CvKNearest::is_regression
-------------------------
-Returns type of the problem: ``true`` for regression and ``false`` for classification.
-
-.. 
ocv:function:: bool CvKNearest::is_regression() const
-
-
-
-The sample below (currently using the obsolete ``CvMat`` structures) demonstrates the use of the k-nearest classifier for 2D point classification: ::
-
-    #include "ml.h"
-    #include "highgui.h"
-
-    int main( int argc, char** argv )
-    {
-        const int K = 10;
-        int i, j, k, accuracy;
-        float response;
-        int train_sample_count = 100;
-        CvRNG rng_state = cvRNG(-1);
-        CvMat* trainData = cvCreateMat( train_sample_count, 2, CV_32FC1 );
-        CvMat* trainClasses = cvCreateMat( train_sample_count, 1, CV_32FC1 );
-        IplImage* img = cvCreateImage( cvSize( 500, 500 ), 8, 3 );
-        float _sample[2];
-        CvMat sample = cvMat( 1, 2, CV_32FC1, _sample );
-        cvZero( img );
-
-        CvMat trainData1, trainData2, trainClasses1, trainClasses2;
-
-        // form the training samples
-        cvGetRows( trainData, &trainData1, 0, train_sample_count/2 );
-        cvRandArr( &rng_state, &trainData1, CV_RAND_NORMAL, cvScalar(200,200), cvScalar(50,50) );
-
-        cvGetRows( trainData, &trainData2, train_sample_count/2, train_sample_count );
-        cvRandArr( &rng_state, &trainData2, CV_RAND_NORMAL, cvScalar(300,300), cvScalar(50,50) );
-
-        cvGetRows( trainClasses, &trainClasses1, 0, train_sample_count/2 );
-        cvSet( &trainClasses1, cvScalar(1) );
-
-        cvGetRows( trainClasses, &trainClasses2, train_sample_count/2, train_sample_count );
-        cvSet( &trainClasses2, cvScalar(2) );
-
-        // learn classifier
-        CvKNearest knn( trainData, trainClasses, 0, false, K );
-        CvMat* nearests = cvCreateMat( 1, K, CV_32FC1);
-
-        for( i = 0; i < img->height; i++ )
-        {
-            for( j = 0; j < img->width; j++ )
-            {
-                sample.data.fl[0] = (float)j;
-                sample.data.fl[1] = (float)i;
-
-                // estimate the response and get the neighbors' labels
-                response = knn.find_nearest(&sample,K,0,0,nearests,0);
-
-                // compute the number of neighbors representing the majority
-                for( k = 0, accuracy = 0; k < K; k++ )
-                {
-                    if( nearests->data.fl[k] == response)
-                        accuracy++;
-                }
-                // highlight the pixel depending on the accuracy (or confidence)
-                cvSet2D( img, i, j, response == 1 ?
-                    (accuracy > 5 ? CV_RGB(180,0,0) : CV_RGB(180,120,0)) :
-                    (accuracy > 5 ? CV_RGB(0,180,0) : CV_RGB(120,120,0)) );
-            }
-        }
-
-        // display the original training samples
-        for( i = 0; i < train_sample_count/2; i++ )
-        {
-            CvPoint pt;
-            pt.x = cvRound(trainData1.data.fl[i*2]);
-            pt.y = cvRound(trainData1.data.fl[i*2+1]);
-            cvCircle( img, pt, 2, CV_RGB(255,0,0), CV_FILLED );
-            pt.x = cvRound(trainData2.data.fl[i*2]);
-            pt.y = cvRound(trainData2.data.fl[i*2+1]);
-            cvCircle( img, pt, 2, CV_RGB(0,255,0), CV_FILLED );
-        }
-
-        cvNamedWindow( "classifier result", 1 );
-        cvShowImage( "classifier result", img );
-        cvWaitKey(0);
-
-        cvReleaseMat( &trainClasses );
-        cvReleaseMat( &trainData );
-        return 0;
-    }
+Sets the default number of neighbors
+
+.. ocv:function:: void KNearest::setDefaultK(int k)
+
+The function sets the default number of neighbors that is used in a simpler ``predict`` method, not ``findNearest``.
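+As an illustrative sketch of the new interface (this example is not part of the original documentation; the data and variable names are made up, and only the calls documented above are assumed), the same kind of 2D point classification can be expressed as: ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    // two Gaussian clouds of 50 points each, labeled 1 and 2
+    Mat samples(100, 2, CV_32F), labels(100, 1, CV_32S);
+    randn(samples.rowRange(0, 50),   Scalar(200), Scalar(50));
+    randn(samples.rowRange(50, 100), Scalar(300), Scalar(50));
+    labels.rowRange(0, 50)   = Scalar(1);
+    labels.rowRange(50, 100) = Scalar(2);
+
+    Ptr<KNearest> knn = KNearest::create();
+    knn->train(TrainData::create(samples, ROW_SAMPLE, labels));
+
+    // classify one query point by voting among its 10 nearest neighbors
+    Mat query = (Mat_<float>(1, 2) << 250, 250);
+    Mat results, neighborResponses;
+    float response = knn->findNearest(query, 10, results, neighborResponses);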
diff --git a/modules/ml/doc/ml.rst b/modules/ml/doc/ml.rst
index b83e7de..86da3ac 100644
--- a/modules/ml/doc/ml.rst
+++ b/modules/ml/doc/ml.rst
@@ -15,9 +15,7 @@ Most of the classification and regression algorithms are implemented as C++ clas
     support_vector_machines
     decision_trees
     boosting
-    gradient_boosted_trees
     random_trees
-    ertrees
     expectation_maximization
     neural_networks
     mldata
diff --git a/modules/ml/doc/mldata.rst b/modules/ml/doc/mldata.rst
index c3092d1..8a3b796 100644
--- a/modules/ml/doc/mldata.rst
+++ b/modules/ml/doc/mldata.rst
@@ -1,279 +1,126 @@
-MLData
+Training Data
===================
.. highlight:: cpp
-For the machine learning algorithms, the data set is often stored in a file of the ``.csv``-like format. The file contains a table of predictor and response values where each row of the table corresponds to a sample. Missing values are supported. The UC Irvine Machine Learning Repository (http://archive.ics.uci.edu/ml/) provides many data sets stored in such a format to the machine learning community. The class ``MLData`` is implemented to easily load the data for training one of the OpenCV machine learning algorithms. For float values, only the ``'.'`` separator is supported. The table can have a header and in such case the user have to set the number of the header lines to skip them duaring the file reading.
+In machine learning algorithms there is a notion of training data. Training data includes several components:
-CvMLData
--------
-.. ocv:class:: CvMLData
-
-Class for loading the data from a ``.csv`` file.
-::
-
-    class CV_EXPORTS CvMLData
-    {
-    public:
-        CvMLData();
-        virtual ~CvMLData();
-
-        int read_csv(const char* filename);
-
-        const CvMat* get_values() const;
-        const CvMat* get_responses();
-        const CvMat* get_missing() const;
+* A set of training samples. Each training sample is a vector of values (in Computer Vision it is sometimes referred to as a feature vector). Usually all the vectors have the same number of components (features); the OpenCV ml module assumes that. Each feature can be ordered (i.e. its values are floating-point numbers that can be compared with each other and strictly ordered, i.e. sorted) or categorical (i.e. its value belongs to a fixed set of values that can be integers, strings etc.).
-        void set_response_idx( int idx );
-        int get_response_idx() const;
+* Optional set of responses corresponding to the samples. Training data with no responses is used in unsupervised learning algorithms that learn the structure of the supplied data based on distances between different samples. Training data with responses is used in supervised learning algorithms, which learn the function mapping samples to responses. Usually the responses are scalar values, ordered (when we deal with a regression problem) or categorical (when we deal with a classification problem; in this case the responses are often called "labels"). Some algorithms, most noticeably neural networks, can handle not only scalar, but also multi-dimensional or vector responses.
+* Another optional component is the mask of missing measurements. Most algorithms require all the components in all the training samples to be valid, but some other algorithms, such as decision trees, can handle the cases of missing measurements.
-        void set_train_test_split( const CvTrainTestSplit * spl);
-        const CvMat* get_train_sample_idx() const;
-        const CvMat* get_test_sample_idx() const;
-        void mix_train_and_test_idx();
+* In the case of a classification problem the user may want to give different weights to different classes.
This is useful, for example, when
+    * the user wants to shift prediction accuracy towards a lower false-alarm rate or a higher hit-rate.
+    * the user wants to compensate for significantly different amounts of training samples from different classes.
-        const CvMat* get_var_idx();
-        void change_var_idx( int vi, bool state );
+* In addition to that, each training sample may be given a weight, if the user wants the algorithm to pay special attention to certain training samples and adjust the training model accordingly.
-        const CvMat* get_var_types();
-        void set_var_types( const char* str );
+* Also, the user may wish not to use the whole training data at once, but rather use parts of it, e.g. to do parameter optimization via the cross-validation procedure.
-        int get_var_type( int var_idx ) const;
-        void change_var_type( int var_idx, int type);
+As you can see, training data can have a rather complex structure; besides, it may be very big and/or not entirely available, so there is a need to make an abstraction for this concept. In OpenCV ml this is done by the ``cv::ml::TrainData`` class.
-        void set_delimiter( char ch );
-        char get_delimiter() const;
-
-        void set_miss_ch( char ch );
-        char get_miss_ch() const;
-
-        const std::map<String, int>& get_class_labels_map() const;
+TrainData
+---------
.. ocv:class:: TrainData
-    protected:
-        ...
-    };
+Class encapsulating training data. Please note that the class only specifies the interface of training data, but not its implementation. All the statistical model classes in ml take ``Ptr<TrainData>``. In other words, you can create your own class derived from ``TrainData`` and supply a smart pointer to the instance of this class into ``StatModel::train``.
-CvMLData::read_csv
------------------
-Reads the data set from a ``.csv``-like ``filename`` file and stores all read values in a matrix.
+TrainData::loadFromCSV
----------------------
Reads the dataset from a .csv file and returns the ready-to-use training data.
-.. ocv:function:: int CvMLData::read_csv(const char* filename)
+.. ocv:function:: Ptr<TrainData> TrainData::loadFromCSV(const String& filename, int headerLineCount, int responseStartIdx=-1, int responseEndIdx=-1, const String& varTypeSpec=String(), char delimiter=',', char missch='?')
    :param filename: The input file name
-While reading the data, the method tries to define the type of variables (predictors and responses): ordered or categorical. If a value of the variable is not numerical (except for the label for a missing value), the type of the variable is set to ``CV_VAR_CATEGORICAL``. If all existing values of the variable are numerical, the type of the variable is set to ``CV_VAR_ORDERED``. So, the default definition of variables types works correctly for all cases except the case of a categorical variable with numerical class labels. In this case, the type ``CV_VAR_ORDERED`` is set. You should change the type to ``CV_VAR_CATEGORICAL`` using the method :ocv:func:`CvMLData::change_var_type`. For categorical variables, a common map is built to convert a string class label to the numerical class label. Use :ocv:func:`CvMLData::get_class_labels_map` to obtain this map.
-
-Also, when reading the data, the method constructs the mask of missing values. For example, values are equal to `'?'`.
-
-CvMLData::get_values
--------------------
-Returns a pointer to the matrix of predictors and response values
-
-.. ocv:function:: const CvMat* CvMLData::get_values() const
-
-The method returns a pointer to the matrix of predictor and response ``values`` or ``0`` if the data has not been loaded from the file yet.
-
-The row count of this matrix equals the sample count. The column count equals predictors ``+ 1`` for the response (if exists) count. This means that each row of the matrix contains values of one sample predictor and response. The matrix type is ``CV_32FC1``.
-
-CvMLData::get_responses
-----------------------
-Returns a pointer to the matrix of response values
-
-.. ocv:function:: const CvMat* CvMLData::get_responses()
-
-The method returns a pointer to the matrix of response values or throws an exception if the data has not been loaded from the file yet.
-
-This is a single-column matrix of the type ``CV_32FC1``. Its row count is equal to the sample count, one column and .
-
-CvMLData::get_missing
---------------------
-Returns a pointer to the mask matrix of missing values
-
-.. ocv:function:: const CvMat* CvMLData::get_missing() const
-
-The method returns a pointer to the mask matrix of missing values or throws an exception if the data has not been loaded from the file yet.
-
-This matrix has the same size as the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) and the type ``CV_8UC1``.
-
-CvMLData::set_response_idx
--------------------------
-Specifies index of response column in the data matrix
-
-.. ocv:function:: void CvMLData::set_response_idx( int idx )
-
-The method sets the index of a response column in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) or throws an exception if the data has not been loaded from the file yet.
-
-The old response columns become predictors. If ``idx < 0``, there is no response.
-
-CvMLData::get_response_idx
+    :param headerLineCount: The number of lines in the beginning to skip; besides the header, the function also skips empty lines and lines starting with '#'.
+
+    :param responseStartIdx: Index of the first output variable. If -1, the function considers the last variable as the response.
+
+    :param responseEndIdx: Index of the last output variable + 1. If -1, then there is a single response variable at ``responseStartIdx``.
+
+    :param varTypeSpec: The optional text string that specifies the variables' types. It has the format ``ord[n1-n2,n3,n4-n5,...]cat[n6,n7-n8,...]``. That is, variables from n1 to n2 (inclusive range), n3, n4 to n5 ... are considered ordered and n6, n7 to n8 ... are considered as categorical. The range [n1..n2] + [n3] + [n4..n5] + ... + [n6] + [n7..n8] should cover all the variables. If varTypeSpec is not specified, then the algorithm uses the following rules:
+        1. all input variables are considered ordered by default. If some column contains non-numerical values, e.g. 'apple', 'pear', 'apple', 'apple', 'mango', the corresponding variable is considered categorical.
+        2. if there are several output variables, they are all considered ordered. An error is reported when non-numerical values are used.
+        3. if there is a single output variable, then if its values are non-numerical or are all integers, it's considered categorical. Otherwise, it's considered ordered.
+
+    :param delimiter: The character used to separate values in each line.
+
+    :param missch: The character used to specify missing measurements. It should not be a digit. Although it's a non-numerical value, it surely does not affect the decision of whether the variable is ordered or categorical.
+
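+For illustration only (the file name and column layout below are hypothetical), a UCI-style table could be loaded as: ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv::ml;
+
+    // 20-column file: skip 1 header line; the last column is the response;
+    // columns 0-17 are ordered, 18-19 categorical; '?' marks missing values
+    Ptr<TrainData> data = TrainData::loadFromCSV("agaricus.csv", 1, -1, -1,
+                                                 "ord[0-17]cat[18,19]", ',', '?');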
+TrainData::create
+-----------------
+Creates training data from in-memory arrays.
+
+.. ocv:function:: Ptr<TrainData> TrainData::create(InputArray samples, int layout, InputArray responses, InputArray varIdx=noArray(), InputArray sampleIdx=noArray(), InputArray sampleWeights=noArray(), InputArray varType=noArray())
+
+    :param samples: matrix of samples. It should have ``CV_32F`` type.
+
+    :param layout: it's either ``ROW_SAMPLE``, which means that each training sample is a row of ``samples``, or ``COL_SAMPLE``, which means that each training sample occupies a column of ``samples``.
+
+    :param responses: matrix of responses. If the responses are scalar, they should be stored as a single row or as a single column. The matrix should have type ``CV_32F`` or ``CV_32S`` (in the former case the responses are considered as ordered by default; in the latter case, as categorical).
+
+    :param varIdx: vector specifying which variables to use for training. It can be an integer vector (``CV_32S``) containing 0-based variable indices or a byte vector (``CV_8U``) containing a mask of active variables.
+
+    :param sampleIdx: vector specifying which samples to use for training. It can be an integer vector (``CV_32S``) containing 0-based sample indices or a byte vector (``CV_8U``) containing a mask of training samples.
+
+    :param sampleWeights: optional vector with weights for each sample. It should have ``CV_32F`` type.
+
+    :param varType: optional vector of type ``CV_8U`` and size ``<number_of_variables_in_samples> + <number_of_variables_in_responses>``, containing types of each input and output variable. The ordered variables are denoted by the value ``VAR_ORDERED``, and the categorical ones by ``VAR_CATEGORICAL``.
+
+
+TrainData::getTrainSamples
--------------------------
-Returns index of the response column in the loaded data matrix
-
-.. ocv:function:: int CvMLData::get_response_idx() const
-
-The method returns the index of a response column in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) or throws an exception if the data has not been loaded from the file yet.
-
-If ``idx < 0``, there is no response.
-
-
-CvMLData::set_train_test_split
------------------------------
-Divides the read data set into two disjoint training and test subsets.
-
-.. ocv:function:: void CvMLData::set_train_test_split( const CvTrainTestSplit * spl )
-
-This method sets parameters for such a split using ``spl`` (see :ocv:class:`CvTrainTestSplit`) or throws an exception if the data has not been loaded from the file yet.
-
-CvMLData::get_train_sample_idx
------------------------------
-Returns the matrix of sample indices for a training subset
-
-.. ocv:function:: const CvMat* CvMLData::get_train_sample_idx() const
-
-The method returns the matrix of sample indices for a training subset. This is a single-row matrix of the type ``CV_32SC1``. If data split is not set, the method returns ``0``. If the data has not been loaded from the file yet, an exception is thrown.
-
-CvMLData::get_test_sample_idx
-----------------------------
-Returns the matrix of sample indices for a testing subset
-
-.. ocv:function:: const CvMat* CvMLData::get_test_sample_idx() const
-
-
-CvMLData::mix_train_and_test_idx
--------------------------------
-Mixes the indices of training and test samples
-
-.. ocv:function:: void CvMLData::mix_train_and_test_idx()
-
-The method shuffles the indices of training and test samples preserving sizes of training and test subsets if the data split is set by :ocv:func:`CvMLData::get_values`. If the data has not been loaded from the file yet, an exception is thrown.
-
-CvMLData::get_var_idx
---------------------
-Returns the indices of the active variables in the data matrix
-
-.. 
ocv:function:: const CvMat* CvMLData::get_var_idx() - -The method returns the indices of variables (columns) used in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`). - -It returns ``0`` if the used subset is not set. It throws an exception if the data has not been loaded from the file yet. Returned matrix is a single-row matrix of the type ``CV_32SC1``. Its column count is equal to the size of the used variable subset. - -CvMLData::change_var_idx ------------------------- -Enables or disables particular variable in the loaded data - -.. ocv:function:: void CvMLData::change_var_idx( int vi, bool state ) - -By default, after reading the data set all variables in the ``values`` matrix (see :ocv:func:`CvMLData::get_values`) are used. But you may want to use only a subset of variables and include/exclude (depending on ``state`` value) a variable with the ``vi`` index from the used subset. If the data has not been loaded from the file yet, an exception is thrown. - -CvMLData::get_var_types ------------------------ -Returns a matrix of the variable types. - -.. ocv:function:: const CvMat* CvMLData::get_var_types() - -The function returns a single-row matrix of the type ``CV_8UC1``, where each element is set to either ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``. The number of columns is equal to the number of variables. If data has not been loaded from file yet an exception is thrown. - -CvMLData::set_var_types ------------------------ -Sets the variables types in the loaded data. - -.. ocv:function:: void CvMLData::set_var_types( const char* str ) - -In the string, a variable type is followed by a list of variables indices. For example: ``"ord[0-17],cat[18]"``, ``"ord[0,2,4,10-12], cat[1,3,5-9,13,14]"``, ``"cat"`` (all variables are categorical), ``"ord"`` (all variables are ordered). - -CvMLData::get_header_lines_number ---------------------------------- -Returns a number of the table header lines. - -.. ocv:function:: int CvMLData::get_header_lines_number() const - -CvMLData::set_header_lines_number ---------------------------------- -Sets a number of the table header lines. - -.. ocv:function:: void CvMLData::set_header_lines_number( int n ) - -By default it is supposed that the table does not have a header, i.e. it contains only the data. - -CvMLData::get_var_type ----------------------- -Returns type of the specified variable - -.. ocv:function:: int CvMLData::get_var_type( int var_idx ) const - -The method returns the type of a variable by the index ``var_idx`` ( ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``). - -CvMLData::change_var_type -------------------------- -Changes type of the specified variable - -.. ocv:function:: void CvMLData::change_var_type( int var_idx, int type) - -The method changes type of variable with index ``var_idx`` from existing type to ``type`` ( ``CV_VAR_ORDERED`` or ``CV_VAR_CATEGORICAL``). - -CvMLData::set_delimiter ------------------------ -Sets the delimiter in the file used to separate input numbers - -.. ocv:function:: void CvMLData::set_delimiter( char ch ) - -The method sets the delimiter for variables in a file. For example: ``','`` (default), ``';'``, ``' '`` (space), or other characters. The floating-point separator ``'.'`` is not allowed. +Returns matrix of train samples -CvMLData::get_delimiter ------------------------ -Returns the currently used delimiter character. +.. ocv:function:: Mat TrainData::getTrainSamples(int layout=ROW_SAMPLE, bool compressSamples=true, bool compressVars=true) const -.. 
ocv:function:: char CvMLData::get_delimiter() const
+    :param layout: The requested layout. If it's different from the initial one, the matrix is transposed.
+
+    :param compressSamples: if true, the function returns only the training samples (specified by sampleIdx).
+
+    :param compressVars: if true, the function returns the shorter training samples, containing only the active variables.
+
+In the current implementation the function tries to avoid physical data copying and returns the matrix stored inside ``TrainData`` (unless the transposition or compression is needed).
-CvMLData::set_miss_ch
---------------------
-Sets the character used to specify missing values
+TrainData::getTrainResponses
+----------------------------
+Returns the vector of responses
-.. ocv:function:: void CvMLData::set_miss_ch( char ch )
+.. ocv:function:: Mat TrainData::getTrainResponses() const
-The method sets the character used to specify missing values. For example: ``'?'`` (default), ``'-'``. The floating-point separator ``'.'`` is not allowed.
+The function returns the ordered or the original categorical responses. It is usually used in regression algorithms.
-CvMLData::get_miss_ch
---------------------
-Returns the currently used missing value character.
-.. ocv:function:: char CvMLData::get_miss_ch() const
+TrainData::getClassLabels
+----------------------------
+Returns the vector of class labels
-CvMLData::get_class_labels_map
-------------------------------
-Returns a map that converts strings to labels.
+.. ocv:function:: Mat TrainData::getClassLabels() const
-.. ocv:function:: const std::map<String, int>& CvMLData::get_class_labels_map() const
+The function returns the vector of unique labels that occur in the responses.
-The method returns a map that converts string class labels to the numerical class labels. It can be used to get an original class label as in a file.
-CvTrainTestSplit
-----------------
-.. ocv:struct:: CvTrainTestSplit
+TrainData::getTrainNormCatResponses
+-----------------------------------
+Returns the vector of normalized categorical responses
-Structure setting the split of a data set read by :ocv:class:`CvMLData`.
-::
-
-    struct CvTrainTestSplit
-    {
-        CvTrainTestSplit();
-        CvTrainTestSplit( int train_sample_count, bool mix = true);
-        CvTrainTestSplit( float train_sample_portion, bool mix = true);
+.. ocv:function:: Mat TrainData::getTrainNormCatResponses() const
-        union
-        {
-            int count;
-            float portion;
-        } train_sample_part;
-        int train_sample_part_mode;
+The function returns the vector of responses. Each response is an integer from 0 to ``<number of classes> - 1``. The actual label value can then be retrieved from the class label vector, see ``TrainData::getClassLabels``.
-        bool mix;
-    };
+TrainData::setTrainTestSplitRatio
+-----------------------------------
+Splits the training data into the training and test parts
-There are two ways to construct a split:
+.. ocv:function:: void TrainData::setTrainTestSplitRatio(double ratio, bool shuffle=true)
+The function selects a subset of specified relative size and then uses it as the training set. If the function is not called, all the data is used for training. Please note that for each of ``TrainData::getTrain*`` there is a corresponding ``TrainData::getTest*``, so that the test subset can be retrieved and processed as well.
-* Set the training sample count (subset size) ``train_sample_count``. Other existing samples are located in a test subset.
-* Set a training sample portion in ``[0,..1]``. The flag ``mix`` is used to mix training and test samples indices when the split is set. Otherwise, the data set is split in the storing order: the first part of samples of a given size is a training subset, the second part is a test subset.
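+A minimal sketch of the in-memory path (not part of the original text; the data here is random and purely illustrative, and only the calls documented above are assumed): ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    Mat samples(100, 5, CV_32F), responses(100, 1, CV_32F);
+    randu(samples, Scalar(0), Scalar(1));
+    randu(responses, Scalar(0), Scalar(1));
+
+    Ptr<TrainData> data = TrainData::create(samples, ROW_SAMPLE, responses);
+    data->setTrainTestSplitRatio(0.8);               // 80% training, 20% test, shuffled
+
+    Mat trainSamples   = data->getTrainSamples();    // 80 x 5 matrix
+    Mat trainResponses = data->getTrainResponses();  // 80 x 1 vector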
+Other methods
+-------------
+The class includes many other methods that can be used to access normalized categorical input variables, or to access training data by parts, so that the whole data does not have to fit into memory, etc.
diff --git a/modules/ml/doc/neural_networks.rst b/modules/ml/doc/neural_networks.rst
index 776bf24..166e2e2 100644
--- a/modules/ml/doc/neural_networks.rst
+++ b/modules/ml/doc/neural_networks.rst
@@ -29,17 +29,17 @@ In other words, given the outputs
 Different activation functions may be used. ML implements three standard functions:
 *
-    Identity function ( ``CvANN_MLP::IDENTITY`` ):
+    Identity function ( ``ANN_MLP::IDENTITY`` ):
     :math:`f(x)=x`
 *
-    Symmetrical sigmoid ( ``CvANN_MLP::SIGMOID_SYM`` ):
+    Symmetrical sigmoid ( ``ANN_MLP::SIGMOID_SYM`` ):
     :math:`f(x)=\beta(1-e^{-\alpha x})/(1+e^{-\alpha x})`, which is the default choice for MLP. The standard sigmoid with :math:`\beta =1, \alpha =1` is shown below:
     .. image:: pics/sigmoid_bipolar.png
 *
-    Gaussian function ( ``CvANN_MLP::GAUSSIAN`` ):
+    Gaussian function ( ``ANN_MLP::GAUSSIAN`` ):
     :math:`f(x)=\beta e^{-\alpha x^2}`, which is not completely supported at the moment.
 In ML, all the neurons have the same activation functions, with the same free parameters (
@@ -95,60 +95,90 @@ The second (default) one is a batch RPROP algorithm.
 .. [RPROP93] M. Riedmiller and H. Braun, *A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm*, Proc. ICNN, San Francisco (1993).
-CvANN_MLP_TrainParams
+ANN_MLP::Params
---------------------
-.. ocv:struct:: CvANN_MLP_TrainParams
-
-  Parameters of the MLP training algorithm. You can initialize the structure by a constructor or the individual parameters can be adjusted after the structure is created.
+.. ocv:class:: ANN_MLP::Params
+
+  Parameters of the MLP and of the training algorithm. You can initialize the structure by a constructor or the individual parameters can be adjusted after the structure is created.
+
+  The network structure:
+
+  .. ocv:member:: Mat layerSizes
+
+     The number of elements in each layer of the network. The very first element specifies the number of elements in the input layer. The last element specifies the number of elements in the output layer.
+
+  .. ocv:member:: int activateFunc
+
+     The activation function. Currently the only fully supported activation function is ``ANN_MLP::SIGMOID_SYM``.
+
+  .. ocv:member:: double fparam1
+
+     The first parameter of the activation function, 0 by default.
+
+  .. ocv:member:: double fparam2
+
+     The second parameter of the activation function, 0 by default.
+
+  .. note::
+
+     If you are using the default ``ANN_MLP::SIGMOID_SYM`` activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range over [-1.7159, 1.7159] instead of [0,1].
  The back-propagation algorithm parameters:
-  .. ocv:member:: double bp_dw_scale
+  .. ocv:member:: double bpDWScale
     Strength of the weight gradient term. The recommended value is about 0.1.
-  .. ocv:member:: double bp_moment_scale
+  .. ocv:member:: double bpMomentScale
     Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond.
The value 0.1 or so is good enough.
  The RPROP algorithm parameters (see [RPROP93]_ for details):
-  .. ocv:member:: double rp_dw0
+  .. ocv:member:: double rpDW0
     Initial value :math:`\Delta_0` of update-values :math:`\Delta_{ij}`.
-  .. ocv:member:: double rp_dw_plus
+  .. ocv:member:: double rpDWPlus
     Increase factor :math:`\eta^+`. It must be >1.
-  .. ocv:member:: double rp_dw_minus
+  .. ocv:member:: double rpDWMinus
     Decrease factor :math:`\eta^-`. It must be <1.
-  .. ocv:member:: double rp_dw_min
+  .. ocv:member:: double rpDWMin
     Update-values lower limit :math:`\Delta_{min}`. It must be positive.
-  .. ocv:member:: double rp_dw_max
+  .. ocv:member:: double rpDWMax
     Update-values upper limit :math:`\Delta_{max}`. It must be >1.
-CvANN_MLP_TrainParams::CvANN_MLP_TrainParams
+ANN_MLP::Params::Params
--------------------------------------------
-The constructors.
+Constructs the parameter structure
+
+.. ocv:function:: ANN_MLP::Params::Params()
+
+.. ocv:function:: ANN_MLP::Params::Params( const Mat& layerSizes, int activateFunc, double fparam1, double fparam2, TermCriteria termCrit, int trainMethod, double param1, double param2=0 )
+
+    :param layerSizes: Integer vector specifying the number of neurons in each layer including the input and output layers.
-.. ocv:function:: CvANN_MLP_TrainParams::CvANN_MLP_TrainParams()
+    :param activateFunc: Parameter specifying the activation function for each neuron: one of ``ANN_MLP::IDENTITY``, ``ANN_MLP::SIGMOID_SYM``, and ``ANN_MLP::GAUSSIAN``.
-.. ocv:function:: CvANN_MLP_TrainParams::CvANN_MLP_TrainParams( CvTermCriteria term_crit, int train_method, double param1, double param2=0 )
+    :param fparam1: The first parameter of the activation function, :math:`\alpha`. See the formulas in the introduction section.
-    :param term_crit: Termination criteria of the training algorithm. You can specify the maximum number of iterations (``max_iter``) and/or how much the error could change between the iterations to make the algorithm continue (``epsilon``).
+    :param fparam2: The second parameter of the activation function, :math:`\beta`. See the formulas in the introduction section.
+
+    :param termCrit: Termination criteria of the training algorithm. You can specify the maximum number of iterations (``maxCount``) and/or how much the error could change between the iterations to make the algorithm continue (``epsilon``).
    :param trainMethod: Training method of the MLP. Possible values are:
-        * **CvANN_MLP_TrainParams::BACKPROP** The back-propagation algorithm.
+        * **ANN_MLP::Params::BACKPROP** The back-propagation algorithm.
-        * **CvANN_MLP_TrainParams::RPROP** The RPROP algorithm.
+        * **ANN_MLP::Params::RPROP** The RPROP algorithm.
    :param param1: Parameter of the training method. It is ``rpDW0`` for ``RPROP`` and ``bpDWScale`` for ``BACKPROP``.
@@ -158,126 +188,54 @@
By default the RPROP algorithm is used:
::
-    CvANN_MLP_TrainParams::CvANN_MLP_TrainParams()
+    ANN_MLP::Params::Params()
    {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 1000, 0.01 );
+        layerSizes = Mat();
+        activateFunc = SIGMOID_SYM;
+        fparam1 = fparam2 = 0;
+        termCrit = TermCriteria( TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 0.01 );
        trainMethod = RPROP;
-        bp_dw_scale = bp_moment_scale = 0.1;
-        rp_dw0 = 0.1; rp_dw_plus = 1.2; rp_dw_minus = 0.5;
-        rp_dw_min = FLT_EPSILON; rp_dw_max = 50.;
+        bpDWScale = bpMomentScale = 0.1;
+        rpDW0 = 0.1; rpDWPlus = 1.2; rpDWMinus = 0.5;
+        rpDWMin = FLT_EPSILON; rpDWMax = 50.;
    }
-CvANN_MLP
+ANN_MLP
---------
-.. 
ocv:class:: CvANN_MLP : public CvStatModel
+.. ocv:class:: ANN_MLP : public StatModel
MLP model.
-Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the non-default constructor or the method :ocv:func:`CvANN_MLP::create`. All the weights are set to zeros. Then, the network is trained using a set of input and output vectors. The training procedure can be repeated more than once, that is, the weights can be adjusted based on the new training data.
+Unlike many other models in ML that are constructed and trained at once, in the MLP model these steps are separated. First, a network with the specified topology is created using the method :ocv:func:`ANN_MLP::create`. All the weights are set to zero. Then, the network is trained using a set of input and output vectors. The training procedure can be repeated more than once, that is, the weights can be adjusted based on the new training data.
-CvANN_MLP::CvANN_MLP
+ANN_MLP::create
--------------------
-The constructors.
-
-.. ocv:function:: CvANN_MLP::CvANN_MLP()
-
-.. ocv:function:: CvANN_MLP::CvANN_MLP( const CvMat* layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
-
-.. ocv:pyfunction:: cv2.ANN_MLP([layerSizes[, activateFunc[, fparam1[, fparam2]]]]) ->
-
-The advanced constructor allows to create MLP with the specified topology. See :ocv:func:`CvANN_MLP::create` for details.
-
-CvANN_MLP::create
------------------
-Constructs MLP with the specified topology.
-
-.. ocv:function:: void CvANN_MLP::create( const Mat& layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
-
-.. ocv:function:: void CvANN_MLP::create( const CvMat* layerSizes, int activateFunc=CvANN_MLP::SIGMOID_SYM, double fparam1=0, double fparam2=0 )
-
-.. ocv:pyfunction:: cv2.ANN_MLP.create(layerSizes[, activateFunc[, fparam1[, fparam2]]]) -> None
-
-    :param layerSizes: Integer vector specifying the number of neurons in each layer including the input and output layers.
-
-    :param activateFunc: Parameter specifying the activation function for each neuron: one of ``CvANN_MLP::IDENTITY``, ``CvANN_MLP::SIGMOID_SYM``, and ``CvANN_MLP::GAUSSIAN``.
-
-    :param fparam1: Free parameter of the activation function, :math:`\alpha`. See the formulas in the introduction section.
-
-    :param fparam2: Free parameter of the activation function, :math:`\beta`. See the formulas in the introduction section.
-
-The method creates an MLP network with the specified topology and assigns the same activation function to all the neurons.
-
-CvANN_MLP::train
-----------------
-Trains/updates MLP.
-
-.. ocv:function:: int CvANN_MLP::train( const Mat& inputs, const Mat& outputs, const Mat& sampleWeights, const Mat& sampleIdx=Mat(), CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(), int flags=0 )
-
-.. ocv:function:: int CvANN_MLP::train( const CvMat* inputs, const CvMat* outputs, const CvMat* sampleWeights, const CvMat* sampleIdx=0, CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(), int flags=0 )
-
-.. ocv:pyfunction:: cv2.ANN_MLP.train(inputs, outputs, sampleWeights[, sampleIdx[, params[, flags]]]) -> retval
-
-    :param inputs: Floating-point matrix of input vectors, one vector per row.
-
-    :param outputs: Floating-point matrix of the corresponding output vectors, one vector per row.
-
-    :param sampleWeights: (RPROP only) Optional floating-point vector of weights for each sample. Some samples may be more important than others for training. You may want to raise the weight of certain classes to find the right balance between hit-rate and false-alarm rate, and so on.
-
-    :param sampleIdx: Optional integer vector indicating the samples (rows of ``inputs`` and ``outputs``) that are taken into account.
-
-    :param params: Training parameters. See the :ocv:class:`CvANN_MLP_TrainParams` description.
-
-    :param flags: Various parameters to control the training algorithm. A combination of the following parameters is possible:
-
-        * **UPDATE_WEIGHTS** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
-
-        * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from original one. In this case, you should take care of proper normalization.
-
-        * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to the certain range depending on the used activation function.
-
-This method applies the specified training algorithm to computing/adjusting the network weights. It returns the number of done iterations.
-
-The RPROP training algorithm is parallelized with the TBB library.
-
-If you are using the default ``cvANN_MLP::SIGMOID_SYM`` activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results.
-
-CvANN_MLP::predict
-------------------
-Predicts responses for input samples.
-
-.. ocv:function:: float CvANN_MLP::predict( const Mat& inputs, Mat& outputs ) const
-
-.. ocv:function:: float CvANN_MLP::predict( const CvMat* inputs, CvMat* outputs ) const
-
-.. ocv:pyfunction:: cv2.ANN_MLP.predict(inputs[, outputs]) -> retval, outputs
+Creates the empty model
-    :param inputs: Input samples.
+.. ocv:function:: Ptr<ANN_MLP> ANN_MLP::create(const Params& params=Params())
-    :param outputs: Predicted responses for corresponding samples.
+Use ``StatModel::train`` to train the model, ``StatModel::train<ANN_MLP>(traindata, params)`` to create and train the model, ``StatModel::load<ANN_MLP>(filename)`` to load the pre-trained model. Note that the train method has optional flags, and the following flags are handled by ``ANN_MLP``:
-The method returns a dummy value which should be ignored.
 * **UPDATE_WEIGHTS** Algorithm updates the network weights, rather than computes them from scratch. In the latter case the weights are initialized using the Nguyen-Widrow algorithm.
-If you are using the default ``cvANN_MLP::SIGMOID_SYM`` activation function with the default parameter values fparam1=0 and fparam2=0 then the function used is y = 1.7159*tanh(2/3 * x), so the output will range from [-1.7159, 1.7159], instead of [0,1].
 * **NO_INPUT_SCALE** Algorithm does not normalize the input vectors. If this flag is not set, the training algorithm normalizes each input feature independently, shifting its mean value to 0 and making the standard deviation equal to 1. If the network is assumed to be updated frequently, the new training data could be much different from the original one.
In this case, you should take care of proper normalization.
 * **NO_OUTPUT_SCALE** Algorithm does not normalize the output vectors. If the flag is not set, the training algorithm normalizes each output feature independently, by transforming it to a certain range depending on the used activation function.
-CvANN_MLP::get_layer_count
---------------------------
-Returns the number of layers in the MLP.
-.. ocv:function:: int CvANN_MLP::get_layer_count()
-CvANN_MLP::get_layer_sizes
---------------------------
-Returns numbers of neurons in each layer of the MLP.
+ANN_MLP::setParams
+-------------------
+Sets the new network parameters
-.. ocv:function:: const CvMat* CvANN_MLP::get_layer_sizes()
+.. ocv:function:: void ANN_MLP::setParams(const Params& params)
-The method returns the integer vector specifying the number of neurons in each layer including the input and output layers of the MLP.
+    :param params: The new parameters
-CvANN_MLP::get_weights
-----------------------
-Returns neurons weights of the particular layer.
+The existing network, if any, will be destroyed and a new empty one will be created. It should be re-trained after that.
-.. ocv:function:: double* CvANN_MLP::get_weights(int layer)
+ANN_MLP::getParams
+-------------------
+Retrieves the current network parameters
-    :param layer: Index of the particular layer.
+.. ocv:function:: Params ANN_MLP::getParams() const
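+To make the workflow concrete, here is a hedged sketch (the topology, data and termination settings are invented for illustration; the ``Params`` constructor and the other calls are assumed as documented above): ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    // XOR-like toy data: 4 samples, 2 inputs, 1 output in [-1, 1]
+    Mat inputs  = (Mat_<float>(4, 2) << 0, 0,  0, 1,  1, 0,  1, 1);
+    Mat outputs = (Mat_<float>(4, 1) << -1, 1, 1, -1);
+
+    Mat layerSizes = (Mat_<int>(1, 3) << 2, 4, 1);   // 2 inputs, 4 hidden, 1 output
+    ANN_MLP::Params params(layerSizes, ANN_MLP::SIGMOID_SYM, 0, 0,
+                           TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 0.01),
+                           ANN_MLP::Params::RPROP, 0.1);
+
+    Ptr<ANN_MLP> mlp = ANN_MLP::create(params);
+    mlp->train(TrainData::create(inputs, ROW_SAMPLE, outputs));
+
+    Mat predictions;
+    mlp->predict(inputs, predictions);               // one response row per input row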
diff --git a/modules/ml/doc/normal_bayes_classifier.rst b/modules/ml/doc/normal_bayes_classifier.rst
index dbd6ae2..e3aba21 100644
--- a/modules/ml/doc/normal_bayes_classifier.rst
+++ b/modules/ml/doc/normal_bayes_classifier.rst
@@ -9,55 +9,26 @@ This simple classification model assumes that feature vectors from each class ar
 .. [Fukunaga90] K. Fukunaga. *Introduction to Statistical Pattern Recognition*. second ed., New York: Academic Press, 1990.
-CvNormalBayesClassifier
+NormalBayesClassifier
-----------------------
-.. ocv:class:: CvNormalBayesClassifier : public CvStatModel
+.. ocv:class:: NormalBayesClassifier : public StatModel
Bayes classifier for normally distributed data.
-CvNormalBayesClassifier::CvNormalBayesClassifier
------------------------------------------------
-Default and training constructors.
+NormalBayesClassifier::create
+-----------------------------
+Creates the empty model
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier()
+.. ocv:function:: Ptr<NormalBayesClassifier> NormalBayesClassifier::create(const NormalBayesClassifier::Params& params=Params())
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat() )
+    :param params: The model parameters. There are no parameters so far; the structure is used as a placeholder for possible extensions.
-.. ocv:function:: CvNormalBayesClassifier::CvNormalBayesClassifier( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0 )
+Use ``StatModel::train`` to train the model, ``StatModel::train<NormalBayesClassifier>(traindata, params)`` to create and train the model, ``StatModel::load<NormalBayesClassifier>(filename)`` to load the pre-trained model.
-.. ocv:pyfunction:: cv2.NormalBayesClassifier([trainData, responses[, varIdx[, sampleIdx]]]) ->
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvNormalBayesClassifier::train
-------------------------------
-Trains the model.
-
-.. ocv:function:: bool CvNormalBayesClassifier::train( const Mat& trainData, const Mat& responses, const Mat& varIdx = Mat(), const Mat& sampleIdx=Mat(), bool update=false )
-
-.. ocv:function:: bool CvNormalBayesClassifier::train( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx = 0, const CvMat* sampleIdx=0, bool update=false )
-
-.. ocv:pyfunction:: cv2.NormalBayesClassifier.train(trainData, responses[, varIdx[, sampleIdx[, update]]]) -> retval
-
-    :param update: Identifies whether the model should be trained from scratch (``update=false``) or should be updated using the new training data (``update=true``).
-
-The method trains the Normal Bayes classifier. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only ``CV_ROW_SAMPLE`` data layout is supported.
-* Input variables are all ordered.
-* Output variable is categorical , which means that elements of ``responses`` must be integer numbers, though the vector may have the ``CV_32FC1`` type.
-* Missing measurements are not supported.
-
-CvNormalBayesClassifier::predict
---------------------------------
+NormalBayesClassifier::predictProb
+----------------------------------
Predicts the response for sample(s).
-.. ocv:function:: float CvNormalBayesClassifier::predict( const Mat& samples, Mat* results=0, Mat* results_prob=0 ) const
-
-.. ocv:function:: float CvNormalBayesClassifier::predict( const CvMat* samples, CvMat* results=0, CvMat* results_prob=0 ) const
-
-.. ocv:pyfunction:: cv2.NormalBayesClassifier.predict(samples) -> retval, results
-
-The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``samples``. In case of multiple input vectors, there should be one output vector ``results``. The predicted class for a single input vector is returned by the method. The vector ``results_prob`` contains the output probabilities coresponding to each element of ``result``.
+.. ocv:function:: float NormalBayesClassifier::predictProb( InputArray inputs, OutputArray outputs, OutputArray outputProbs, int flags=0 ) const
-The function is parallelized with the TBB library.
+The method estimates the most probable classes for input vectors. Input vectors (one or more) are stored as rows of the matrix ``inputs``. In case of multiple input vectors, ``outputs`` contains one prediction per input vector. The predicted class for a single input vector is returned by the method. The vector ``outputProbs`` contains the output probabilities corresponding to each element of ``outputs``.
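+A short usage sketch (illustrative only; the data is synthetic and the calls are assumed to follow the interface described above): ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    // two normally distributed classes, labeled 0 and 1
+    Mat samples(60, 2, CV_32F), labels(60, 1, CV_32S);
+    randn(samples.rowRange(0, 30),  Scalar(0), Scalar(1));
+    randn(samples.rowRange(30, 60), Scalar(3), Scalar(1));
+    labels.rowRange(0, 30)  = Scalar(0);
+    labels.rowRange(30, 60) = Scalar(1);
+
+    Ptr<NormalBayesClassifier> nb = NormalBayesClassifier::create();
+    nb->train(TrainData::create(samples, ROW_SAMPLE, labels));
+
+    Mat outputs, outputProbs;
+    nb->predictProb(samples, outputs, outputProbs);  // labels and per-class probabilities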
diff --git a/modules/ml/doc/random_trees.rst b/modules/ml/doc/random_trees.rst
index 8d7911d..3b85126 100644
--- a/modules/ml/doc/random_trees.rst
+++ b/modules/ml/doc/random_trees.rst
@@ -40,179 +40,65 @@ For the random trees usage example, please, see letter_recog.cpp sample in OpenC
 * And other articles from the web site http://www.stat.berkeley.edu/users/breiman/RandomForests/cc_home.htm
-CvRTParams
-----------
-.. ocv:struct:: CvRTParams : public CvDTreeParams
+RTrees::Params
+--------------
+.. ocv:struct:: RTrees::Params : public DTrees::Params
    Training parameters of random trees.
The set of training parameters for the forest is a superset of the training parameters for a single tree. However, random trees do not need all the functionality/features of decision trees. Most noticeably, the trees are not pruned, so the cross-validation parameters are not used.
-CvRTParams::CvRTParams:
+RTrees::Params::Params
-----------------------
-The constructors.
+The constructors.
-.. ocv:function:: CvRTParams::CvRTParams()
+.. ocv:function:: RTrees::Params::Params()
-.. ocv:function:: CvRTParams::CvRTParams( int max_depth, int min_sample_count, float regression_accuracy, bool use_surrogates, int max_categories, const float* priors, bool calc_var_importance, int nactive_vars, int max_num_of_trees_in_the_forest, float forest_accuracy, int termcrit_type )
+.. ocv:function:: RTrees::Params::Params( int maxDepth, int minSampleCount, double regressionAccuracy, bool useSurrogates, int maxCategories, const Mat& priors, bool calcVarImportance, int nactiveVars, TermCriteria termCrit )
-    :param max_depth: the depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.
+    :param maxDepth: the depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.
-    :param min_sample_count: minimum samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data e.g. 1%.
+    :param minSampleCount: minimum samples required at a leaf node for it to be split. A reasonable value is a small percentage of the total data e.g. 1%.
-    :param max_categories: Cluster possible values of a categorical variable into ``K`` :math:`\leq` ``max_categories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``max_categories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision trees engines (including ML) try to find sub-optimal split in this case by clustering all the samples into ``max_categories`` clusters that is some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > max_categories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
+    :param maxCategories: Cluster possible values of a categorical variable into ``K <= maxCategories`` clusters to find a suboptimal split. If a discrete variable, on which the training procedure tries to make a split, takes more than ``maxCategories`` values, the precise best subset estimation may take a very long time because the algorithm is exponential. Instead, many decision trees engines (including ML) try to find a sub-optimal split in this case by clustering all the samples into ``maxCategories`` clusters, that is, some categories are merged together. The clustering is applied only in ``n``>2-class classification problems for categorical variables with ``N > maxCategories`` possible values. In case of regression and 2-class classification the optimal split can be found efficiently without employing clustering, thus the parameter is not used in these cases.
-    :param calc_var_importance: If true then variable importance will be calculated and then it can be retrieved by :ocv:func:`CvRTrees::get_var_importance`.
+    :param calcVarImportance: If true then variable importance will be calculated and then it can be retrieved by ``RTrees::getVarImportance``.
-
-    :param nactive_vars: The size of the randomly selected subset of features at each tree node and that are used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.
+    :param nactiveVars: The size of the randomly selected subset of features at each tree node and that are used to find the best split(s). If you set it to 0 then the size will be set to the square root of the total number of features.
-    :param max_num_of_trees_in_the_forest: The maximum number of trees in the forest (surprise, surprise). Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes pass a certain number of trees. Also to keep in mind, the number of tree increases the prediction time linearly.
-
-    :param forest_accuracy: Sufficient accuracy (OOB error).
-
-    :param termcrit_type: The type of the termination criteria:
-
-        * **CV_TERMCRIT_ITER** Terminate learning by the ``max_num_of_trees_in_the_forest``;
-
-        * **CV_TERMCRIT_EPS** Terminate learning by the ``forest_accuracy``;
-
-        * **CV_TERMCRIT_ITER | CV_TERMCRIT_EPS** Use both termination criteria.
-
-For meaning of other parameters see :ocv:func:`CvDTreeParams::CvDTreeParams`.
+    :param termCrit: The termination criteria that specifies when the training algorithm stops, either when the specified number of trees is trained and added to the ensemble or when sufficient accuracy (measured as OOB error) is achieved. Typically the more trees you have the better the accuracy. However, the improvement in accuracy generally diminishes and asymptotes past a certain number of trees. Also to keep in mind, the number of trees increases the prediction time linearly.
The default constructor sets all parameters to default values which are different from default values of ``DTrees::Params``:
::
-    CvRTParams::CvRTParams() : CvDTreeParams( 5, 10, 0, false, 10, 0, false, false, 0 ),
-        calc_var_importance(false), nactive_vars(0)
+    RTrees::Params::Params() : DTrees::Params( 5, 10, 0, false, 10, 0, false, false, Mat() ),
+        calcVarImportance(false), nactiveVars(0)
    {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 50, 0.1 );
+        termCrit = TermCriteria( TermCriteria::MAX_ITER + TermCriteria::EPS, 50, 0.1 );
    }
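+For example, a forest limited to 100 trees with variable importance enabled could be configured as follows (an illustrative sketch; the specific values are arbitrary): ::
+
+    RTrees::Params params(10,      // maxDepth
+                          10,      // minSampleCount
+                          0,       // regressionAccuracy
+                          false,   // useSurrogates
+                          15,      // maxCategories
+                          Mat(),   // priors
+                          true,    // calcVarImportance
+                          4,       // nactiveVars
+                          TermCriteria(TermCriteria::MAX_ITER, 100, 0));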
-CvRTrees
+RTrees
--------
-.. ocv:class:: CvRTrees : public CvStatModel
+.. ocv:class:: RTrees : public DTrees

The class implements the random forest predictor as described in the beginning of this section.

-CvRTrees::train
+RTrees::create
---------------
-Trains the Random Trees model.
-
-.. ocv:function:: bool CvRTrees::train( const Mat& trainData, int tflag, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), const Mat& varType=Mat(), const Mat& missingDataMask=Mat(), CvRTParams params=CvRTParams() )
-
-.. ocv:function:: bool CvRTrees::train( const CvMat* trainData, int tflag, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, const CvMat* varType=0, const CvMat* missingDataMask=0, CvRTParams params=CvRTParams() )
-
-.. ocv:function:: bool CvRTrees::train( CvMLData* data, CvRTParams params=CvRTParams() )
-
-.. ocv:pyfunction:: cv2.RTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]]) -> retval
-
-The method :ocv:func:`CvRTrees::train` is very similar to the method :ocv:func:`CvDTree::train` and follows the generic method :ocv:func:`CvStatModel::train` conventions. All the parameters specific to the algorithm training are passed as a :ocv:class:`CvRTParams` instance. The estimate of the training error (``oob-error``) is stored in the protected class member ``oob_error``.
-
-The function is parallelized with the TBB library.
-
-CvRTrees::predict
------------------
-Predicts the output for an input sample.
-
-.. ocv:function:: float CvRTrees::predict( const Mat& sample, const Mat& missing=Mat() ) const
+Creates an empty model.

-.. ocv:function:: float CvRTrees::predict( const CvMat* sample, const CvMat* missing = 0 ) const
+.. ocv:function:: Ptr<RTrees> RTrees::create(const RTrees::Params& params=Params())

-.. ocv:pyfunction:: cv2.RTrees.predict(sample[, missing]) -> retval
+Use ``StatModel::train`` to train a created model, ``StatModel::train(traindata, params)`` to create and train the model in one call, or ``StatModel::load(filename)`` to load a pre-trained model.
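+For example, a hedged sketch of the typical workflow (the CSV file name and layout are hypothetical, and the templated ``StatModel::train<RTrees>`` helper is assumed to behave as described above): ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    // hypothetical file: one header line, responses in the default (last) column
+    Ptr<TrainData> data = TrainData::loadFromCSV("forest_data.csv", 1);
+
+    // create and train a forest with the default parameters
+    Ptr<RTrees> rtrees = StatModel::train<RTrees>(data, RTrees::Params());
+
+    float prediction = rtrees->predict( data->getSamples().row(0) );
+
+Note that ``RTrees::getVarImportance`` (described below) returns a non-empty result only if ``calcVarImportance`` was set to true before training.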
- :param sample: Sample for classification.
-
- :param missing: Optional missing measurement mask of the sample.
-
-The input parameters of the prediction method are the same as in :ocv:func:`CvDTree::predict` but the return value type is different. This method returns the cumulative result from all the trees in the forest (the class that receives the majority of voices, or the mean of the regression function estimates).
-
-
-CvRTrees::predict_prob
-----------------------
-Returns a fuzzy-predicted class label.
-
-.. ocv:function:: float CvRTrees::predict_prob( const cv::Mat& sample, const cv::Mat& missing = cv::Mat() ) const
-
-.. ocv:function:: float CvRTrees::predict_prob( const CvMat* sample, const CvMat* missing = 0 ) const
-
-.. ocv:pyfunction:: cv2.RTrees.predict_prob(sample[, missing]) -> retval
-
- :param sample: Sample for classification.
-
- :param missing: Optional missing measurement mask of the sample.
-
-The function works for binary classification problems only. It returns the number between 0 and 1. This number represents probability or confidence of the sample belonging to the second class. It is calculated as the proportion of decision trees that classified the sample to the second class.
-
-
-CvRTrees::getVarImportance
+RTrees::getVarImportance
----------------------------
Returns the variable importance array.

-.. ocv:function:: Mat CvRTrees::getVarImportance()
-
-.. ocv:function:: const CvMat* CvRTrees::get_var_importance()
-
-.. ocv:pyfunction:: cv2.RTrees.getVarImportance() -> retval
-
-The method returns the variable importance vector, computed at the training stage when ``CvRTParams::calc_var_importance`` is set to true. If this flag was set to false, the ``NULL`` pointer is returned. This differs from the decision trees where variable importance can be computed anytime after the training.
-
-
-CvRTrees::get_proximity
------------------------ 
-Retrieves the proximity measure between two training samples.
-
-.. ocv:function:: float CvRTrees::get_proximity( const CvMat* sample1, const CvMat* sample2, const CvMat* missing1 = 0, const CvMat* missing2 = 0 ) const
-
- :param sample1: The first sample.
-
- :param sample2: The second sample.
-
- :param missing1: Optional missing measurement mask of the first sample.
-
- :param missing2: Optional missing measurement mask of the second sample.
-
-The method returns proximity measure between any two samples. This is a ratio of those trees in the ensemble, in which the samples fall into the same leaf node, to the total number of the trees.
-
-CvRTrees::calc_error
--------------------- 
-Returns error of the random forest.
-
-.. ocv:function:: float CvRTrees::calc_error( CvMLData* data, int type, std::vector<float>* resp=0 )
-
-The method is identical to :ocv:func:`CvDTree::calc_error` but uses the random forest as predictor.
-
-
-CvRTrees::get_train_error
--------------------------
-Returns the train error.
-
-.. ocv:function:: float CvRTrees::get_train_error()
-
-The method works for classification problems only. It returns the proportion of incorrectly classified train samples.
-
-
-CvRTrees::get_rng
------------------ 
-Returns the state of the used random number generator.
-
-.. ocv:function:: CvRNG* CvRTrees::get_rng()
-
-
-CvRTrees::get_tree_count
------------------------- 
-Returns the number of trees in the constructed random forest.
-
-.. ocv:function:: int CvRTrees::get_tree_count() const
-
-
-CvRTrees::get_tree
------------------- 
-Returns the specific decision tree in the constructed random forest.
+.. ocv:function:: Mat RTrees::getVarImportance() const

-.. ocv:function:: CvForestTree* CvRTrees::get_tree(int i) const
+The method returns the variable importance vector, computed at the training stage when ``RTrees::Params::calcVarImportance`` is set to true. If this flag was set to false, the empty matrix is returned.

- :param i: Index of the decision tree.
diff --git a/modules/ml/doc/support_vector_machines.rst b/modules/ml/doc/support_vector_machines.rst
index 9793bd6..003ec4d 100644
--- a/modules/ml/doc/support_vector_machines.rst
+++ b/modules/ml/doc/support_vector_machines.rst
@@ -14,21 +14,21 @@ SVM implementation in OpenCV is based on [LibSVM]_.
 .. [LibSVM] C.-C. Chang and C.-J. Lin. *LIBSVM: a library for support vector machines*, ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. (http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf)

-CvParamGrid
+ParamGrid
-----------
-.. ocv:struct:: CvParamGrid
+.. ocv:class:: ParamGrid

The structure represents the logarithmic grid range of statmodel parameters. It is used for optimizing statmodel accuracy by varying model parameters, the accuracy estimate being computed by cross-validation.

-    .. ocv:member:: double CvParamGrid::min_val
+    .. ocv:member:: double ParamGrid::minVal

    Minimum value of the statmodel parameter.

-    .. ocv:member:: double CvParamGrid::max_val
+    .. ocv:member:: double ParamGrid::maxVal

    Maximum value of the statmodel parameter.

-    .. ocv:member:: double CvParamGrid::step
+    .. ocv:member:: double ParamGrid::logStep

    Logarithmic step for iterating the statmodel parameter.

@@ -36,88 +36,78 @@ The grid determines the following iteration sequence of the statmodel parameter

.. math::

-    (min\_val, min\_val*step, min\_val*{step}^2, \dots, min\_val*{step}^n),
+    (minVal, minVal*logStep, minVal*{logStep}^2, \dots, minVal*{logStep}^n),

where :math:`n` is the maximal index satisfying

.. math::

-    \texttt{min\_val} * \texttt{step} ^n < \texttt{max\_val}
+    \texttt{minVal} * \texttt{logStep} ^n < \texttt{maxVal}

-The grid is logarithmic, so ``step`` must always be greater then 1.
+The grid is logarithmic, so ``logStep`` must always be greater than 1.
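+For example, a grid with ``minVal = 1``, ``maxVal = 100`` and ``logStep = 10`` produces the values :math:`(1, 10)`: the sequence stops at :math:`n = 1` because :math:`1 \cdot 10^2 = 100` is not strictly less than ``maxVal``.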
-CvParamGrid::CvParamGrid
+ParamGrid::ParamGrid
------------------------
The constructors.

-.. ocv:function:: CvParamGrid::CvParamGrid()
+.. ocv:function:: ParamGrid::ParamGrid()

-.. ocv:function:: CvParamGrid::CvParamGrid( double min_val, double max_val, double log_step )
+.. ocv:function:: ParamGrid::ParamGrid( double minVal, double maxVal, double logStep )

The full constructor initializes corresponding members. The default constructor creates a dummy grid: ::

-    CvParamGrid::CvParamGrid()
+    ParamGrid::ParamGrid()
    {
-        min_val = max_val = step = 0;
+        minVal = maxVal = 0;
+        logStep = 1;
    }

-CvParamGrid::check
------------------- 
-Checks validness of the grid.
-.. ocv:function:: bool CvParamGrid::check()
-
-Returns ``true`` if the grid is valid and ``false`` otherwise. The grid is valid if and only if:
-
-* Lower bound of the grid is less then the upper one.
-* Lower bound of the grid is positive.
-* Grid step is greater then 1.
-
-CvSVMParams
+SVM::Params
-----------
-.. ocv:struct:: CvSVMParams
+.. ocv:class:: SVM::Params

SVM training parameters.

-The structure must be initialized and passed to the training method of :ocv:class:`CvSVM`.
+The structure must be initialized and passed to the training method of :ocv:class:`SVM`.

-CvSVMParams::CvSVMParams
+SVM::Params::Params
------------------------
-The constructors.
+The constructors.

-.. ocv:function:: CvSVMParams::CvSVMParams()
+.. ocv:function:: SVM::Params::Params()

-.. ocv:function:: CvSVMParams::CvSVMParams( int svm_type, int kernel_type, double degree, double gamma, double coef0, double Cvalue, double nu, double p, CvMat* class_weights, CvTermCriteria term_crit )
+.. ocv:function:: SVM::Params::Params( int svmType, int kernelType, double degree, double gamma, double coef0, double Cvalue, double nu, double p, const Mat& classWeights, TermCriteria termCrit )

- :param svm_type: Type of a SVM formulation. Possible values are:
+ :param svmType: Type of a SVM formulation. Possible values are:

- * **CvSVM::C_SVC** C-Support Vector Classification. ``n``-class classification (``n`` :math:`\geq` 2), allows imperfect separation of classes with penalty multiplier ``C`` for outliers.
+ * **SVM::C_SVC** C-Support Vector Classification. ``n``-class classification (``n`` :math:`\geq` 2), allows imperfect separation of classes with penalty multiplier ``C`` for outliers.

- * **CvSVM::NU_SVC** :math:`\nu`-Support Vector Classification. ``n``-class classification with possible imperfect separation. Parameter :math:`\nu` (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of ``C``.
+ * **SVM::NU_SVC** :math:`\nu`-Support Vector Classification. ``n``-class classification with possible imperfect separation. Parameter :math:`\nu` (in the range 0..1, the larger the value, the smoother the decision boundary) is used instead of ``C``.

- * **CvSVM::ONE_CLASS** Distribution Estimation (One-class SVM). All the training data are from the same class, SVM builds a boundary that separates the class from the rest of the feature space.
+ * **SVM::ONE_CLASS** Distribution Estimation (One-class SVM). All the training data are from the same class, SVM builds a boundary that separates the class from the rest of the feature space.

- * **CvSVM::EPS_SVR** :math:`\epsilon`-Support Vector Regression. The distance between feature vectors from the training set and the fitting hyper-plane must be less than ``p``. For outliers the penalty multiplier ``C`` is used.
+ * **SVM::EPS_SVR** :math:`\epsilon`-Support Vector Regression. The distance between feature vectors from the training set and the fitting hyper-plane must be less than ``p``. For outliers the penalty multiplier ``C`` is used.

- * **CvSVM::NU_SVR** :math:`\nu`-Support Vector Regression. :math:`\nu` is used instead of ``p``.
+ * **SVM::NU_SVR** :math:`\nu`-Support Vector Regression. :math:`\nu` is used instead of ``p``.

See [LibSVM]_ for details.

- :param kernel_type: Type of a SVM kernel. Possible values are:
+ :param kernelType: Type of a SVM kernel. Possible values are:

- * **CvSVM::LINEAR** Linear kernel. No mapping is done, linear discrimination (or regression) is done in the original feature space. It is the fastest option. :math:`K(x_i, x_j) = x_i^T x_j`.
+ * **SVM::LINEAR** Linear kernel. No mapping is done, linear discrimination (or regression) is done in the original feature space. It is the fastest option. :math:`K(x_i, x_j) = x_i^T x_j`.

- * **CvSVM::POLY** Polynomial kernel: :math:`K(x_i, x_j) = (\gamma x_i^T x_j + coef0)^{degree}, \gamma > 0`.
+ * **SVM::POLY** Polynomial kernel: :math:`K(x_i, x_j) = (\gamma x_i^T x_j + coef0)^{degree}, \gamma > 0`.

- * **CvSVM::RBF** Radial basis function (RBF), a good choice in most cases. :math:`K(x_i, x_j) = e^{-\gamma ||x_i - x_j||^2}, \gamma > 0`.
+ * **SVM::RBF** Radial basis function (RBF), a good choice in most cases. :math:`K(x_i, x_j) = e^{-\gamma ||x_i - x_j||^2}, \gamma > 0`.

- * **CvSVM::SIGMOID** Sigmoid kernel: :math:`K(x_i, x_j) = \tanh(\gamma x_i^T x_j + coef0)`.
+ * **SVM::SIGMOID** Sigmoid kernel: :math:`K(x_i, x_j) = \tanh(\gamma x_i^T x_j + coef0)`.

- * **CvSVM::CHI2** Exponential Chi2 kernel, similar to the RBF kernel: :math:`K(x_i, x_j) = e^{-\gamma \chi^2(x_i,x_j)}, \chi^2(x_i,x_j) = (x_i-x_j)^2/(x_i+x_j), \gamma > 0`.
+ * **SVM::CHI2** Exponential Chi2 kernel, similar to the RBF kernel: :math:`K(x_i, x_j) = e^{-\gamma \chi^2(x_i,x_j)}, \chi^2(x_i,x_j) = (x_i-x_j)^2/(x_i+x_j), \gamma > 0`.

- * **CvSVM::INTER** Histogram intersection kernel. A fast kernel. :math:`K(x_i, x_j) = min(x_i,x_j)`.
+ * **SVM::INTER** Histogram intersection kernel. A fast kernel. :math:`K(x_i, x_j) = min(x_i,x_j)`.

 :param degree: Parameter ``degree`` of a kernel function (POLY).

@@ -131,19 +121,19 @@ The constructors.

 :param p: Parameter :math:`\epsilon` of a SVM optimization problem (EPS_SVR).

- :param class_weights: Optional weights in the C_SVC problem , assigned to particular classes. They are multiplied by ``C`` so the parameter ``C`` of class ``#i`` becomes :math:`class\_weights_i * C`. Thus these weights affect the misclassification penalty for different classes. The larger weight, the larger penalty on misclassification of data from the corresponding class.
+ :param classWeights: Optional weights in the C_SVC problem, assigned to particular classes. They are multiplied by ``C`` so the parameter ``C`` of class ``#i`` becomes ``classWeights(i) * C``. Thus these weights affect the misclassification penalty for different classes. The larger the weight, the larger the penalty on misclassification of data from the corresponding class.

- :param term_crit: Termination criteria of the iterative SVM training procedure which solves a partial case of constrained quadratic optimization problem. You can specify tolerance and/or the maximum number of iterations.
+ :param termCrit: Termination criteria of the iterative SVM training procedure, which solves a partial case of a constrained quadratic optimization problem. You can specify tolerance and/or the maximum number of iterations.
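+For illustration, a hedged sketch of filling the structure member by member for a :math:`\nu`-SVC problem with an RBF kernel (the concrete values are arbitrary, not recommendations): ::
+
+    SVM::Params params;
+    params.svmType = SVM::NU_SVC;
+    params.kernelType = SVM::RBF;
+    params.nu = 0.5;     // in (0, 1]; larger values give a smoother decision boundary
+    params.gamma = 0.1;  // RBF kernel parameter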
The default constructor initializes the structure with the following values: ::

-    CvSVMParams::CvSVMParams() :
-        svm_type(CvSVM::C_SVC), kernel_type(CvSVM::RBF), degree(0),
-        gamma(1), coef0(0), C(1), nu(0), p(0), class_weights(0)
+    SVM::Params::Params() :
+        svmType(SVM::C_SVC), kernelType(SVM::RBF), degree(0),
+        gamma(1), coef0(0), C(1), nu(0), p(0), classWeights(0)
     {
-        term_crit = cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 1000, FLT_EPSILON );
+        termCrit = TermCriteria( TermCriteria::MAX_ITER+TermCriteria::EPS, 1000, FLT_EPSILON );
     }

Below is a comparison of different kernels on a 2D test case with four classes. Four C_SVC SVMs have been trained (one against rest) with ``trainAuto``, then evaluated on three different kernels (CHI2, INTER, RBF). The color depicts the class with the max score. Bright means max-score > 0, dark means max-score < 0.

@@ -151,10 +141,9 @@ A comparison of different kernels on the following 2D test case with four classe

 .. image:: pics/SVM_Comparison.png

-
-CvSVM
+SVM
-----
-.. ocv:class:: CvSVM : public CvStatModel
+.. ocv:class:: SVM : public StatModel

Support Vector Machines.

@@ -164,55 +153,27 @@ Support Vector Machines.

 * (Python) An example of grid search digit recognition using SVM can be found at opencv_source/samples/python2/digits_adjust.py
 * (Python) An example of video digit recognition using SVM can be found at opencv_source/samples/python2/digits_video.py

-CvSVM::CvSVM
------------- 
-Default and training constructors.
-
-.. ocv:function:: CvSVM::CvSVM()
-
-.. ocv:function:: CvSVM::CvSVM( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), CvSVMParams params=CvSVMParams() )
-
-.. ocv:function:: CvSVM::CvSVM( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, CvSVMParams params=CvSVMParams() )
-
-.. ocv:pyfunction:: cv2.SVM([trainData, responses[, varIdx[, sampleIdx[, params]]]]) -> <SVM object>
-
-The constructors follow conventions of :ocv:func:`CvStatModel::CvStatModel`. See :ocv:func:`CvStatModel::train` for parameters descriptions.
-
-CvSVM::train
+SVM::create
------------
-Trains an SVM.
-
-.. ocv:function:: bool CvSVM::train( const Mat& trainData, const Mat& responses, const Mat& varIdx=Mat(), const Mat& sampleIdx=Mat(), CvSVMParams params=CvSVMParams() )
-
-.. ocv:function:: bool CvSVM::train( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx=0, const CvMat* sampleIdx=0, CvSVMParams params=CvSVMParams() )
-
-.. ocv:pyfunction:: cv2.SVM.train(trainData, responses[, varIdx[, sampleIdx[, params]]]) -> retval
-
-The method trains the SVM model. It follows the conventions of the generic :ocv:func:`CvStatModel::train` approach with the following limitations:
-
-* Only the ``CV_ROW_SAMPLE`` data layout is supported.
-
-* Input variables are all ordered.
+Creates an empty model.

-* Output variables can be either categorical (``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC``), or ordered (``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR``), or not required at all (``params.svm_type=CvSVM::ONE_CLASS``).
+.. ocv:function:: Ptr<SVM> SVM::create(const SVM::Params& p=SVM::Params(), const Ptr<SVM::Kernel>& customKernel=Ptr<SVM::Kernel>())

-* Missing measurements are not supported.
+    :param p: SVM parameters.
+    :param customKernel: the optional custom kernel to use. It must implement the ``SVM::Kernel`` interface.

-All the other parameters are gathered in the
-:ocv:class:`CvSVMParams` structure.
+Use ``StatModel::train`` to train the model, ``StatModel::train(traindata, params)`` to create and train the model, or ``StatModel::load(filename)`` to load a pre-trained model. Since SVM has several parameters, you may want to find the best parameters for your problem; this can be done with ``SVM::trainAuto``, as shown in the sketch below.
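+A minimal sketch of creating and training a 2-class classifier this way (the tiny dataset is made up for illustration, and the templated ``StatModel::train<SVM>`` helper is assumed to behave as described above): ::
+
+    #include "opencv2/ml.hpp"
+    using namespace cv;
+    using namespace cv::ml;
+
+    // four 2D points with class labels +1/-1 (made-up data)
+    float samplesData[] = { 501, 10,  255, 10,  501, 255,  10, 501 };
+    int labelsData[] = { 1, -1, -1, 1 };
+    Mat samples(4, 2, CV_32F, samplesData);
+    Mat labels(4, 1, CV_32S, labelsData);
+
+    SVM::Params params;
+    params.svmType = SVM::C_SVC;
+    params.kernelType = SVM::LINEAR;
+    params.C = 1;
+
+    Ptr<TrainData> data = TrainData::create(samples, ROW_SAMPLE, labels);
+    Ptr<SVM> svm = StatModel::train<SVM>(data, params);
+
+    Mat query = (Mat_<float>(1, 2) << 300, 100);
+    float response = svm->predict(query);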
-CvSVM::train_auto
+SVM::trainAuto
-----------------
Trains an SVM with optimal parameters.

-.. ocv:function:: bool CvSVM::train_auto( const Mat& trainData, const Mat& responses, const Mat& varIdx, const Mat& sampleIdx, CvSVMParams params, int k_fold = 10, CvParamGrid Cgrid = CvSVM::get_default_grid(CvSVM::C), CvParamGrid gammaGrid = CvSVM::get_default_grid(CvSVM::GAMMA), CvParamGrid pGrid = CvSVM::get_default_grid(CvSVM::P), CvParamGrid nuGrid = CvSVM::get_default_grid(CvSVM::NU), CvParamGrid coeffGrid = CvSVM::get_default_grid(CvSVM::COEF), CvParamGrid degreeGrid = CvSVM::get_default_grid(CvSVM::DEGREE), bool balanced=false)
+.. ocv:function:: bool SVM::trainAuto( const Ptr<TrainData>& data, int kFold = 10, ParamGrid Cgrid = SVM::getDefaultGrid(SVM::C), ParamGrid gammaGrid = SVM::getDefaultGrid(SVM::GAMMA), ParamGrid pGrid = SVM::getDefaultGrid(SVM::P), ParamGrid nuGrid = SVM::getDefaultGrid(SVM::NU), ParamGrid coeffGrid = SVM::getDefaultGrid(SVM::COEF), ParamGrid degreeGrid = SVM::getDefaultGrid(SVM::DEGREE), bool balanced=false)

-.. ocv:function:: bool CvSVM::train_auto( const CvMat* trainData, const CvMat* responses, const CvMat* varIdx, const CvMat* sampleIdx, CvSVMParams params, int kfold = 10, CvParamGrid Cgrid = get_default_grid(CvSVM::C), CvParamGrid gammaGrid = get_default_grid(CvSVM::GAMMA), CvParamGrid pGrid = get_default_grid(CvSVM::P), CvParamGrid nuGrid = get_default_grid(CvSVM::NU), CvParamGrid coeffGrid = get_default_grid(CvSVM::COEF), CvParamGrid degreeGrid = get_default_grid(CvSVM::DEGREE), bool balanced=false )
+    :param data: the training data that can be constructed using ``TrainData::create`` or ``TrainData::loadFromCSV``.

-.. ocv:pyfunction:: cv2.SVM.train_auto(trainData, responses, varIdx, sampleIdx, params[, k_fold[, Cgrid[, gammaGrid[, pGrid[, nuGrid[, coeffGrid[, degreeGrid[, balanced]]]]]]]]) -> retval
-
- :param k_fold: Cross-validation parameter. The training set is divided into ``k_fold`` subsets. One subset is used to test the model, the others form the train set. So, the SVM algorithm is executed ``k_fold`` times.
+    :param kFold: Cross-validation parameter. The training set is divided into ``kFold`` subsets. One subset is used to test the model, the others form the train set. So, the SVM algorithm is executed ``kFold`` times.

 :param \*Grid: Iteration grid for the corresponding SVM parameter.

@@ -220,97 +181,76 @@ Trains an SVM with optimal parameters.

The method trains the SVM model automatically by choosing the optimal parameters ``C``, ``gamma``, ``p``, ``nu``, ``coef0``, ``degree`` from
-:ocv:class:`CvSVMParams`. Parameters are considered optimal
+:ocv:class:`SVM::Params`. Parameters are considered optimal
when the cross-validation estimate of the test set error is minimal.

-If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gamma_grid.step = 0``, ``gamma_grid.min_val``, ``gamma_grid.max_val`` as arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.
+If there is no need to optimize a parameter, the corresponding grid step should be set to any value less than or equal to 1. For example, to avoid optimization in ``gamma``, set ``gammaGrid.logStep = 0``, and ``gammaGrid.minVal``, ``gammaGrid.maxVal`` as arbitrary numbers. In this case, the value ``params.gamma`` is taken for ``gamma``.

And, finally, if the optimization in a parameter is required but
-the corresponding grid is unknown, you may call the function :ocv:func:`CvSVM::get_default_grid`. To generate a grid, for example, for ``gamma``, call ``CvSVM::get_default_grid(CvSVM::GAMMA)``.
+the corresponding grid is unknown, you may call the function :ocv:func:`SVM::getDefaultGrid`. To generate a grid, for example, for ``gamma``, call ``SVM::getDefaultGrid(SVM::GAMMA)``.

This function works for the classification
-(``params.svm_type=CvSVM::C_SVC`` or ``params.svm_type=CvSVM::NU_SVC``)
+(``params.svmType=SVM::C_SVC`` or ``params.svmType=SVM::NU_SVC``)
as well as for the regression
-(``params.svm_type=CvSVM::EPS_SVR`` or ``params.svm_type=CvSVM::NU_SVR``). If ``params.svm_type=CvSVM::ONE_CLASS``, no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
+(``params.svmType=SVM::EPS_SVR`` or ``params.svmType=SVM::NU_SVR``). If ``params.svmType=SVM::ONE_CLASS``, no optimization is made and the usual SVM with parameters specified in ``params`` is executed.
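+For example, a hedged sketch of the automatic training (the CSV file name is hypothetical; all parameter grids are left at their defaults): ::
+
+    Ptr<TrainData> data = TrainData::loadFromCSV("svm_data.csv", 0);
+    Ptr<SVM> svm = SVM::create();
+    svm->trainAuto(data, 10);             // 10-fold cross-validation, default grids
+    SVM::Params best = svm->getParams();  // parameters selected by trainAuto (see below)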
-CvSVM::predict
--------------- 
-Predicts the response for input sample(s).
-
-.. ocv:function:: float CvSVM::predict( const Mat& sample, bool returnDFVal=false ) const
-
-.. ocv:function:: float CvSVM::predict( const CvMat* sample, bool returnDFVal=false ) const
-
-.. ocv:function:: float CvSVM::predict( const CvMat* samples, CvMat* results, bool returnDFVal=false ) const
-
-.. ocv:pyfunction:: cv2.SVM.predict(sample[, returnDFVal]) -> retval
-
-.. ocv:pyfunction:: cv2.SVM.predict_all(samples[, results]) -> results
-
- :param sample: Input sample for prediction.
-
- :param samples: Input samples for prediction.
-
- :param returnDFVal: Specifies a type of the return value. If ``true`` and the problem is 2-class classification then the method returns the decision function value that is signed distance to the margin, else the function returns a class label (classification) or estimated function value (regression).
-
- :param results: Output prediction responses for corresponding samples.
-
-If you pass one sample then prediction result is returned. If you want to get responses for several samples then you should pass the ``results`` matrix where prediction results will be stored.
-
-The function is parallelized with the TBB library.
-
-
-CvSVM::get_default_grid
+SVM::getDefaultGrid
-----------------------
Generates a grid for SVM parameters.

-.. ocv:function:: CvParamGrid CvSVM::get_default_grid( int param_id )
+.. ocv:function:: ParamGrid SVM::getDefaultGrid( int param_id )

 :param param_id: SVM parameter ID that must be one of the following:

- * **CvSVM::C**
+ * **SVM::C**

- * **CvSVM::GAMMA**
+ * **SVM::GAMMA**

- * **CvSVM::P**
+ * **SVM::P**

- * **CvSVM::NU**
+ * **SVM::NU**

- * **CvSVM::COEF**
+ * **SVM::COEF**

- * **CvSVM::DEGREE**
+ * **SVM::DEGREE**

The grid is generated for the parameter with this ID.

The function generates a grid for the specified parameter of the SVM algorithm. The grid may be passed to the function :ocv:func:`SVM::trainAuto`.

-CvSVM::get_params
+SVM::getParams
-----------------
Returns the current SVM parameters.

-.. ocv:function:: CvSVMParams CvSVM::get_params() const
+.. ocv:function:: SVM::Params SVM::getParams() const

-This function may be used to get the optimal parameters obtained while automatically training :ocv:func:`CvSVM::train_auto`.
+This function may be used to get the optimal parameters obtained by the automatic training with :ocv:func:`SVM::trainAuto`.

-CvSVM::get_support_vector
+SVM::getSupportVectors
--------------------------
-Retrieves a number of support vectors and the particular vector.
-
-.. ocv:function:: int CvSVM::get_support_vector_count() const
+Retrieves all the support vectors.

-.. ocv:function:: const float* CvSVM::get_support_vector(int i) const
+.. ocv:function:: Mat SVM::getSupportVectors() const

-.. ocv:pyfunction:: cv2.SVM.get_support_vector_count() -> retval
+The method returns all the support vectors as a floating-point matrix, where the support vectors are stored as matrix rows.

- :param i: Index of the particular support vector.
-
-The methods can be used to retrieve a set of support vectors.

-CvSVM::get_var_count
+SVM::getDecisionFunction
+--------------------------
+Retrieves the decision function.
+
+.. ocv:function:: double SVM::getDecisionFunction(int i, OutputArray alpha, OutputArray svidx) const
+
+    :param i: the index of the decision function. If the problem solved is regression, 1-class or 2-class classification, then there will be just one decision function and the index should always be 0. Otherwise, in the case of N-class classification, there will be N*(N-1)/2 decision functions.
+
+    :param alpha: the optional output vector for weights, corresponding to different support vectors. In the case of linear SVM all the alphas will be 1.
+
+    :param svidx: the optional output vector of indices of support vectors within the matrix of support vectors (which can be retrieved by ``SVM::getSupportVectors``). In the case of linear SVM each decision function consists of a single "compressed" support vector.
+
+The method returns the ``rho`` parameter of the decision function, a scalar subtracted from the weighted sum of kernel responses.
+
+Prediction with SVM
--------------------
-Returns the number of used features (variables count).
-
-.. ocv:function:: int CvSVM::get_var_count() const
-.. ocv:pyfunction:: cv2.SVM.get_var_count() -> retval
+``StatModel::predict(samples, results, flags)`` should be used. Pass ``flags=StatModel::RAW_OUTPUT`` to get the raw response from SVM (in the case of a regression, 1-class or 2-class classification problem).
-- 
2.7.4