--- /dev/null
+.. _Gradient Boosted Trees:\r
+\r
+Gradient Boosted Trees\r
+======================\r
+\r
+Gradient Boosted Trees (GBT) is a generalized boosting algorithm introduced by\r
+Jerome Friedman: http://www.salfordsystems.com/doc/GreedyFuncApproxSS.pdf .\r
+In contrast to the AdaBoost.M1 algorithm, GBT can deal with both multiclass\r
+classification and regression problems. Moreover, it can use any\r
+differentiable loss function, and some popular ones are implemented.\r
+Using decision trees (:ref:`CvDTree`) as base learners allows processing of\r
+both ordered and categorical variables.\r
+\r
+\r
+.. _Training the GBT model:\r
+\r
+Training the GBT model\r
+----------------------\r
+\r
+The Gradient Boosted Trees model represents an ensemble of single regression\r
+trees built in a greedy fashion. The training procedure is an iterative process\r
+similar to numerical optimization via the gradient descent method. The summary\r
+loss on the training set depends only on the current model predictions for the\r
+training samples, in other words\r
+:math:`\sum^N_{i=1}L(y_i, F(x_i)) \equiv \mathcal{L}(F(x_1), F(x_2), ... , F(x_N))\r
+\equiv \mathcal{L}(F)`. And the :math:`\mathcal{L}(F)`\r
+gradient can be computed as follows:\r
+\r
+.. math::\r
+ grad(\mathcal{L}(F)) = \left( \dfrac{\partial{L(y_1, F(x_1))}}{\partial{F(x_1)}},\r
+ \dfrac{\partial{L(y_2, F(x_2))}}{\partial{F(x_2)}}, ... ,\r
+ \dfrac{\partial{L(y_N, F(x_N))}}{\partial{F(x_N)}} \right) .\r
+\r
+On every training step, a single regression tree is built to predict the\r
+antigradient vector components. The step length is computed corresponding to\r
+the loss function, separately for every region determined by a tree leaf; it\r
+can be eliminated by changing the values of the leaves directly.\r
+\r
+The main scheme of the training process is shown below.\r
+\r
+#.\r
+ Find the best constant model.\r
+#.\r
+ For :math:`i` in :math:`[1,M]`:\r
+\r
+ #.\r
+ Compute the antigradient.\r
+ #.\r
+ Grow a regression tree to predict antigradient components.\r
+ #.\r
+ Change values in the tree leaves.\r
+ #.\r
+ Add the tree to the model.\r
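+\r
+Schematically, steps 2.2--2.4 amount to the following update (a sketch in the\r
+notation above; here :math:`T_i` is the :math:`i`-th tree, :math:`R_{ij}` are\r
+the regions determined by its leaves, and :math:`\gamma_{ij}` are the leaf\r
+values with the step length already folded in):\r
+\r
+.. math:: F_i(x) = F_{i-1}(x) + \nu\cdot T_i(x), \qquad\r
+    T_i(x) = \sum_j \gamma_{ij}\cdot 1(x \in R_{ij}) ,\r
+\r
+where :math:`\nu` is the *shrinkage* parameter described below.\r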
+\r
+\r
+The following loss functions are implemented:\r
+\r
+*for regression problems:*\r
+\r
+#.\r
+ Squared loss (``CvGBTrees::SQUARED_LOSS``):\r
+ :math:`L(y,f(x))=\dfrac{1}{2}(y-f(x))^2`\r
+#.\r
+ Absolute loss (``CvGBTrees::ABSOLUTE_LOSS``):\r
+ :math:`L(y,f(x))=|y-f(x)|`\r
+#.\r
+ Huber loss (``CvGBTrees::HUBER_LOSS``):\r
+ :math:`L(y,f(x)) = \left\{ \begin{array}{lr}\r
+ \delta\cdot\left(|y-f(x)|-\dfrac{\delta}{2}\right) & : |y-f(x)|>\delta\\\r
+ \dfrac{1}{2}\cdot(y-f(x))^2 & : |y-f(x)|\leq\delta \end{array} \right.`,\r
+ where :math:`\delta` is the :math:`\alpha`-quantile estimation of\r
+ :math:`|y-f(x)|`. In the current implementation :math:`\alpha=0.2`\r
+ (see the sketch after this list).\r
+\r
+*for classification problems:*\r
+\r
+4.\r
+ Deviance or cross-entropy loss (``CvGBTrees::DEVIANCE_LOSS``):\r
+ :math:`K` functions are built, one function for each output class, and\r
+ :math:`L(y,f_1(x),...,f_K(x)) = -\sum^K_{k=1}1(y=k)\ln{p_k(x)}`,\r
+ where :math:`p_k(x)=\dfrac{\exp{f_k(x)}}{\sum^K_{i=1}\exp{f_i(x)}}`\r
+ is the estimation of the probability that :math:`y=k`.\r
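+\r
+As an illustration, a direct transcription of the Huber loss formula above into\r
+C++ (``delta`` stands for the :math:`\alpha`-quantile of :math:`|y-f(x)|`; the\r
+function name is ours, not part of the library API):\r
+\r
+.. code-block:: cpp\r
+\r
+    #include <cmath>\r
+\r
+    static float huber_loss( float y, float fx, float delta )\r
+    {\r
+        float r = std::fabs( y - fx );\r
+        // quadratic near zero, linear in the tails\r
+        return r > delta ? delta * ( r - delta / 2.0f )\r
+                         : 0.5f * r * r;\r
+    }\r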
+\r
+In the end we get a model of the following form:\r
+\r
+.. math:: f(x) = f_0 + \nu\cdot\sum^M_{i=1}T_i(x) ,\r
+\r
+where :math:`f_0` is the initial guess (the best constant model) and :math:`\nu`\r
+is a regularization parameter from the interval :math:`(0,1]`, further called\r
+*shrinkage*.\r
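+\r
+The following minimal sketch shows how such a model can be trained and used\r
+through the C++ interface (the data here is random and serves only to make the\r
+example self-contained):\r
+\r
+.. code-block:: cpp\r
+\r
+    #include "opencv2/core/core.hpp"\r
+    #include "opencv2/ml/ml.hpp"\r
+\r
+    int main()\r
+    {\r
+        // 100 training samples, 5 ordered features each\r
+        cv::Mat trainData( 100, 5, CV_32F );\r
+        cv::Mat responses( 100, 1, CV_32F );\r
+        cv::randu( trainData, 0.0f, 1.0f );\r
+        cv::randu( responses, 0.0f, 1.0f );\r
+\r
+        // squared loss, M = 100 iterations, shrinkage 0.1,\r
+        // 80% random subsample, depth-5 trees, no surrogate splits\r
+        CvGBTreesParams params( CvGBTrees::SQUARED_LOSS, 100,\r
+                                0.1f, 0.8f, 5, false );\r
+\r
+        CvGBTrees gbt;\r
+        gbt.train( trainData, CV_ROW_SAMPLE, responses,\r
+                   cv::Mat(), cv::Mat(), cv::Mat(), cv::Mat(), params );\r
+\r
+        // predict the response for the first training sample\r
+        float prediction = gbt.predict( trainData.row(0) );\r
+        (void)prediction;\r
+        return 0;\r
+    }\r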
+\r
+\r
+.. _Predicting with GBT model:\r
+\r
+Predicting with GBT model\r
+-------------------------\r
+\r
+To get the GBT model prediction, you need to compute the sum of responses of\r
+all the trees in the ensemble. For regression problems this sum is the answer;\r
+for classification problems the result is :math:`\arg\max_{i=1..K}(f_i(x))`.\r
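+\r
+For classification this arg max can be observed directly through the ``k``\r
+parameter of :ref:`CvGBTrees::predict` (a sketch; ``gbt`` is assumed to be a\r
+model trained with ``CvGBTrees::DEVIANCE_LOSS`` on three classes, and\r
+``sample`` a prepared ``CV_32F`` feature vector):\r
+\r
+.. code-block:: cpp\r
+\r
+    // per-class ensemble sums f_1(x), f_2(x), f_3(x)\r
+    float f1 = gbt.predict( sample, cv::Mat(), cv::Range::all(), 0 );\r
+    float f2 = gbt.predict( sample, cv::Mat(), cv::Range::all(), 1 );\r
+    float f3 = gbt.predict( sample, cv::Mat(), cv::Range::all(), 2 );\r
+\r
+    // the label of the class with the largest of f1, f2, f3\r
+    float label = gbt.predict( sample );\r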
+\r
+\r
+.. highlight:: cpp\r
+\r
+\r
+.. index:: CvGBTreesParams\r
+.. _CvGBTreesParams:\r
+\r
+CvGBTreesParams\r
+---------------\r
+.. c:type:: CvGBTreesParams\r
+\r
+GBT training parameters ::\r
+\r
+ struct CvGBTreesParams : public CvDTreeParams\r
+ {\r
+ int weak_count;\r
+ int loss_function_type;\r
+ float subsample_portion;\r
+ float shrinkage;\r
+\r
+ CvGBTreesParams();\r
+ CvGBTreesParams( int loss_function_type, int weak_count, float shrinkage,\r
+ float subsample_portion, int max_depth, bool use_surrogates );\r
+ };\r
+\r
+The structure contains parameters for each single decision tree in the\r
+ensemble, as well as characteristics of the whole model. The structure is\r
+derived from :ref:`CvDTreeParams`, but not all of the decision tree parameters\r
+are supported: cross-validation, pruning, and class priors are not used. The\r
+full list of parameters is given below:\r
+\r
+``weak_count``\r
+\r
+ The count of boosting algorithm iterations. ``weak_count*K`` is the total\r
+ count of trees in the GBT model, where ``K`` is the output classes count\r
+ (equal to one in the case of regression).\r
+ \r
+``loss_function_type``\r
+\r
+ The type of the loss function used for training\r
+ (see :ref:`Training the GBT model`). It must be one of the\r
+ following: ``CvGBTrees::SQUARED_LOSS``, ``CvGBTrees::ABSOLUTE_LOSS``,\r
+ ``CvGBTrees::HUBER_LOSS``, ``CvGBTrees::DEVIANCE_LOSS``. The first three\r
+ are used for regression problems, and the last one for classification.\r
+ \r
+``shrinkage``\r
+\r
+ Regularization parameter (see :ref:`Training the GBT model`).\r
+ \r
+``subsample_portion``\r
+\r
+ The portion of the whole training set used on each algorithm iteration.\r
+ The subset is generated randomly (for more information see\r
+ http://www.salfordsystems.com/doc/StochasticBoostingSS.pdf).\r
+\r
+``max_depth``\r
+\r
+ The maximal depth of each decision tree in the ensemble (see :ref:`CvDTree`).\r
+\r
+``use_surrogates``\r
+\r
+ If ``true`` surrogate splits are built (see :ref:`CvDTree`).\r
+ \r
+By default the following constructor is used:\r
+\r
+.. code-block:: cpp\r
+\r
+ CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 200, 0.01f, 0.8f, 3, false)\r
+ : CvDTreeParams( 3, 10, 0, false, 10, 0, false, false, 0 )\r
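+\r
+A custom configuration is created by passing the values in the declaration\r
+order shown above; for instance (the concrete values here are only an\r
+illustration):\r
+\r
+.. code-block:: cpp\r
+\r
+    // absolute loss, 500 iterations, shrinkage 0.05,\r
+    // 70% random subsample, depth-2 trees, no surrogate splits\r
+    CvGBTreesParams params( CvGBTrees::ABSOLUTE_LOSS, 500,\r
+                            0.05f, 0.7f, 2, false );\r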
+\r
+\r
+\r
+.. index:: CvGBTrees\r
+.. _CvGBTrees:\r
+\r
+CvGBTrees\r
+---------\r
+.. c:type:: CvGBTrees\r
+\r
+GBT model ::\r
+\r
+ class CvGBTrees : public CvStatModel\r
+ {\r
+ public:\r
+\r
+ enum {SQUARED_LOSS=0, ABSOLUTE_LOSS, HUBER_LOSS=3, DEVIANCE_LOSS};\r
+\r
+ CvGBTrees();\r
+ CvGBTrees( const Mat& trainData, int tflag,\r
+ const Mat& responses, const Mat& varIdx=Mat(),\r
+ const Mat& sampleIdx=Mat(), const Mat& varType=Mat(),\r
+ const Mat& missingDataMask=Mat(),\r
+ CvGBTreesParams params=CvGBTreesParams() );\r
+\r
+ virtual ~CvGBTrees();\r
+ virtual bool train( const Mat& trainData, int tflag,\r
+ const Mat& responses, const Mat& varIdx=Mat(),\r
+ const Mat& sampleIdx=Mat(), const Mat& varType=Mat(),\r
+ const Mat& missingDataMask=Mat(),\r
+ CvGBTreesParams params=CvGBTreesParams(),\r
+ bool update=false );\r
+ \r
+ virtual bool train( CvMLData* data,\r
+ CvGBTreesParams params=CvGBTreesParams(),\r
+ bool update=false );\r
+\r
+ virtual float predict( const Mat& sample, const Mat& missing=Mat(),\r
+ const Range& slice = Range::all(),\r
+ int k=-1 ) const;\r
+\r
+ virtual void clear();\r
+\r
+ virtual float calc_error( CvMLData* _data, int type,\r
+ std::vector<float> *resp = 0 );\r
+\r
+ virtual void write( CvFileStorage* fs, const char* name ) const;\r
+\r
+ virtual void read( CvFileStorage* fs, CvFileNode* node );\r
+\r
+ protected:\r
+ \r
+ CvDTreeTrainData* data;\r
+ CvGBTreesParams params;\r
+ CvSeq** weak;\r
+ CvMat* orig_response;\r
+ CvMat* sum_response;\r
+ CvMat* sum_response_tmp;\r
+ CvMat* sample_idx;\r
+ CvMat* subsample_train;\r
+ CvMat* subsample_test;\r
+ CvMat* missing;\r
+ CvMat* class_labels;\r
+ RNG* rng;\r
+ int class_count;\r
+ float delta;\r
+ float base_value;\r
+ \r
+ ...\r
+\r
+ };\r
+\r
+\r
+ \r
+.. index:: CvGBTrees::train\r
+\r
+.. _CvGBTrees::train:\r
+\r
+CvGBTrees::train\r
+----------------\r
+.. c:function:: bool train(const Mat & trainData, int tflag, const Mat & responses, const Mat & varIdx=Mat(), const Mat & sampleIdx=Mat(), const Mat & varType=Mat(), const Mat & missingDataMask=Mat(), CvGBTreesParams params=CvGBTreesParams(), bool update=false)\r
+\r
+.. c:function:: bool train(CvMLData* data, CvGBTreesParams params=CvGBTreesParams(), bool update=false)\r
+ \r
+ Trains a Gradient Boosted Trees model.\r
+ \r
+The first train method follows the common template (see :ref:`CvStatModel::train`).\r
+Both ``tflag`` values (``CV_ROW_SAMPLE``, ``CV_COL_SAMPLE``) are supported.\r
+``trainData`` must be of the ``CV_32F`` type. ``responses`` must be a matrix of\r
+type ``CV_32S`` or ``CV_32F``; in both cases it is converted into a ``CV_32F``\r
+matrix inside the training procedure. ``varIdx`` and ``sampleIdx`` must be\r
+lists of indices (``CV_32S``) or masks (``CV_8U`` or ``CV_8S``). ``update`` is\r
+a dummy parameter.\r
+\r
+The second form of the :ref:`CvGBTrees::train` function uses :ref:`CvMLData` as\r
+a data set container. ``update`` is still a dummy parameter.\r
+\r
+All parameters specific to the GBT model are passed into the training function\r
+as a :ref:`CvGBTreesParams` structure.\r
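+\r
+A sketch of the second form (the file name and the response column are\r
+assumptions for the sake of the example):\r
+\r
+.. code-block:: cpp\r
+\r
+    CvMLData data;\r
+    if( data.read_csv( "data.csv" ) == 0 ) // 0 is returned on success\r
+    {\r
+        data.set_response_idx( 0 );          // responses in column 0\r
+        CvTrainTestSplit split( 0.8f );      // keep 20% of samples for testing\r
+        data.set_train_test_split( &split );\r
+\r
+        CvGBTrees gbt;\r
+        gbt.train( &data, CvGBTreesParams() );\r
+    }\r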
+\r
+\r
+.. index:: CvGBTrees::predict\r
+\r
+.. _CvGBTrees::predict:\r
+\r
+CvGBTrees::predict\r
+------------------\r
+.. c:function:: float predict(const Mat & sample, const Mat & missing=Mat(), const Range & slice = Range::all(), int k=-1) const\r
+\r
+ Predicts a response for an input sample.\r
+ \r
+The method predicts the response corresponding to the given sample\r
+(see :ref:`Predicting with GBT model`).\r
+The result is either the class label or the estimated function value. The\r
+:c:func:`predict` method can use the parallel version of the GBT model\r
+prediction if OpenCV is built with the TBB library; in this case, predictions\r
+of single trees are computed in parallel.\r
+\r
+``sample``\r
+\r
+ An input feature vector that has the same format as every training set\r
+ element. If not all the variables were actually used during training,\r
+ ``sample`` has to contain fictive values at the appropriate places.\r
+ \r
+``missing``\r
+\r
+ The missing values mask: a one-dimensional matrix of the same size as\r
+ ``sample``, of the ``CV_8U`` type. A value of ``1`` corresponds to a missing\r
+ value at the same position in the ``sample`` vector. If there are no missing\r
+ values in the feature vector, an empty matrix can be passed instead of the\r
+ missing mask.\r
+ \r
+``weak_responses``\r
+\r
+ In addition to the prediction of the whole model, all the trees' predictions\r
+ can be obtained by passing a ``weak_responses`` matrix with :math:`K` rows,\r
+ where :math:`K` is the output classes count (1 in the case of regression),\r
+ and having as many columns as the ``slice`` length.\r
+ \r
+``slice``\r
+\r
+ Defines the part of the ensemble used for prediction.\r
+ All trees are used when ``slice = Range::all()``. This parameter is useful\r
+ for getting predictions of GBT models with different ensemble sizes while\r
+ training only one model (see the sketch after this list).\r
+ \r
+``k``\r
+\r
+ In the case of a classification problem, not one but :math:`K` tree\r
+ ensembles are built (see :ref:`Training the GBT model`). By passing this\r
+ parameter, the output can be changed to the sum of the trees' predictions in\r
+ the ``k``-th ensemble only. To get the total GBT model prediction, the ``k``\r
+ value must be -1. For regression problems, ``k`` also has to be -1.\r
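+\r
+For instance, a sketch of comparing predictions made by ensembles of different\r
+sizes without retraining (``gbt`` and ``sample`` are assumed to be a trained\r
+model and a prepared feature vector):\r
+\r
+.. code-block:: cpp\r
+\r
+    // response of the first 50 trees (per class) only\r
+    float partial = gbt.predict( sample, cv::Mat(), cv::Range( 0, 50 ) );\r
+\r
+    // response of the full ensemble\r
+    float full = gbt.predict( sample );\r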
+ \r
+\r
+ \r
+.. index:: CvGBTrees::clear\r
+\r
+.. _CvGBTrees::clear:\r
+\r
+CvGBTrees::clear\r
+----------------\r
+.. c:function:: void clear()\r
+\r
+ Clears the model.\r
+ \r
+Deletes the data set information and all of the weak models, and sets all\r
+internal variables to the initial state. The method is called in\r
+:ref:`CvGBTrees::train` and in the destructor.\r
+\r
+\r
+.. index:: CvGBTrees::calc_error\r
+\r
+.. _CvGBTrees::calc_error:\r
+\r
+CvGBTrees::calc_error\r
+---------------------\r
+.. c:function:: float calc_error( CvMLData* _data, int type, std::vector<float> *resp = 0 )\r
+\r
+ Calculates training or testing error.\r
+ \r
+If :ref:`CvMLData` is used to store the data set, :c:func:`calc_error` can be\r
+used to easily get the training or testing error and (optionally) all\r
+predictions on the training/testing set. If the TBB library is used, the error\r
+is computed in parallel: predictions for different samples are computed at the\r
+same time. In the case of a regression problem, the mean squared error is\r
+returned. For classification problems, the result is the misclassification\r
+error in percent.\r
+\r
+``_data``\r
+\r
+ Data set.\r
+ \r
+``type``\r
+ \r
+ Defines what error should be computed: train (``CV_TRAIN_ERROR``) or test\r
+ (``CV_TEST_ERROR``).\r
+\r
+``resp``\r
+\r
+ If not ``0``, the predictions on the corresponding data set are additionally\r
+ written to this vector.\r
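+\r
+A sketch of a typical use (``gbt`` is assumed to be trained on a ``CvMLData``\r
+object ``data`` with a train/test split set, as in the training example above):\r
+\r
+.. code-block:: cpp\r
+\r
+    std::vector<float> test_predictions;\r
+    float train_err = gbt.calc_error( &data, CV_TRAIN_ERROR );\r
+    float test_err  = gbt.calc_error( &data, CV_TEST_ERROR, &test_predictions );\r
+    printf( "train error: %.2f, test error: %.2f\n", train_err, test_err );\r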
+\r
support_vector_machines
decision_trees
boosting
+ gradient_boosted_trees
random_trees
expectation_maximization
neural_networks
-
// Response value prediction
//
// API
- // virtual float predict( const CvMat* sample, const CvMat* missing=0,
+ // virtual float predict_serial( const CvMat* sample, const CvMat* missing=0,
CvMat* weak_responses=0, CvSlice slice = CV_WHOLE_SEQ,
int k=-1 ) const;
// RESULT
// Predicted value.
*/
+ virtual float predict_serial( const CvMat* sample, const CvMat* missing=0,
+ CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ,
+ int k=-1 ) const;
+
+ /*
+ // Response value prediction.
+ // Parallel version (in the case of TBB existence)
+ //
+ // API
+ // virtual float predict( const CvMat* sample, const CvMat* missing=0,
+ CvMat* weak_responses=0, CvSlice slice = CV_WHOLE_SEQ,
+ int k=-1 ) const;
+
+ // INPUT
+ // sample - input sample of the same type as in the training set.
+ // missing - missing values mask. missing=0 if there are no
+ // missing values in sample vector.
+ // weak_responses - predictions of all of the trees.
+ // not implemented (!)
+ // slice - part of the ensemble used for prediction.
+ // slice = CV_WHOLE_SEQ when all trees are used.
+ // k - number of ensemble used.
+ // k is in {-1,0,1,..,<count of output classes-1>}.
+ // in the case of classification problem
+ // <count of output classes-1> ensembles are built.
+ // If k = -1 ordinary prediction is the result,
+ // otherwise function gives the prediction of the
+ // k-th ensemble only.
+ // OUTPUT
+ // RESULT
+ // Predicted value.
+ */
virtual float predict( const CvMat* sample, const CvMat* missing=0,
CvMat* weakResponses=0, CvSlice slice = CV_WHOLE_SEQ,
int k=-1 ) const;
/*
- // Delete all temporary data.
+ // Deletes all the data.
//
// API
// virtual void clear();
// INPUT
// OUTPUT
// delete data, weak, orig_response, sum_response,
- // weak_eval, ubsample_train, subsample_test,
+ // weak_eval, subsample_train, subsample_test,
// sample_idx, missing, lass_labels
// delta = 0.0
// RESULT
//
// INPUT
// data - dataset
- // type - defines which error is to compute^ train (CV_TRAIN_ERROR) or
+ // type - defines which error is to compute: train (CV_TRAIN_ERROR) or
// test (CV_TEST_ERROR).
// OUTPUT
// resp - vector of predicitons
virtual float calc_error( CvMLData* _data, int type,
std::vector<float> *resp = 0 );
-
/*
//
// Write parameters of the gtb model and data. Write learned model.
CvMat* orig_response;
CvMat* sum_response;
CvMat* sum_response_tmp;
- CvMat* weak_eval;
CvMat* sample_idx;
CvMat* subsample_train;
CvMat* subsample_test;
weak = 0;\r
default_model_name = "my_boost_tree";\r
orig_response = sum_response = sum_response_tmp = 0;\r
- weak_eval = subsample_train = subsample_test = 0;\r
+ subsample_train = subsample_test = 0;\r
missing = sample_idx = 0;\r
class_labels = 0;\r
class_count = 1;\r
cvReleaseMat( &orig_response );\r
cvReleaseMat( &sum_response );\r
cvReleaseMat( &sum_response_tmp );\r
- cvReleaseMat( &weak_eval );\r
cvReleaseMat( &subsample_train );\r
cvReleaseMat( &subsample_test );\r
cvReleaseMat( &sample_idx );\r
data = 0;\r
default_model_name = "my_boost_tree";\r
orig_response = sum_response = sum_response_tmp = 0;\r
- weak_eval = subsample_train = subsample_test = 0;\r
+ subsample_train = subsample_test = 0;\r
missing = sample_idx = 0;\r
class_labels = 0;\r
class_count = 1;\r
{\r
int sample_idx_len = get_len(_sample_idx);\r
\r
- switch (CV_ELEM_SIZE(_sample_idx->type))\r
+ switch (CV_MAT_TYPE(_sample_idx->type))\r
{\r
case CV_32SC1:\r
{\r
\r
//===========================================================================\r
\r
-float CvGBTrees::predict( const CvMat* _sample, const CvMat* _missing,\r
- CvMat* /*weak_responses*/, CvSlice slice, int k) const \r
+float CvGBTrees::predict_serial( const CvMat* _sample, const CvMat* _missing,\r
+ CvMat* weak_responses, CvSlice slice, int k) const \r
{\r
float result = 0.0f;\r
\r
if (!weak) return 0.0f;\r
\r
- float* sum = new float[class_count];\r
- for (int i=0; i<class_count; ++i)\r
- sum[i] = base_value;\r
-\r
CvSeqReader reader;\r
int weak_count = cvSliceLength( slice, weak[class_count-1] );\r
CvDTree* tree;\r
+ \r
+ if (weak_responses)\r
+ {\r
+ if (CV_MAT_TYPE(weak_responses->type) != CV_32F)\r
+ return 0.0f;\r
+ if ((k >= 0) && (k<class_count) && (weak_responses->rows != 1))\r
+ return 0.0f;\r
+ if ((k == -1) && (weak_responses->rows != class_count))\r
+ return 0.0f;\r
+ if (weak_responses->cols != weak_count)\r
+ return 0.0f;\r
+ }\r
+ \r
+ float* sum = new float[class_count];\r
+ memset(sum, 0, class_count*sizeof(float));\r
\r
for (int i=0; i<class_count; ++i)\r
{\r
for (int j=0; j<weak_count; ++j)\r
{\r
CV_READ_SEQ_ELEM( tree, reader );\r
- sum[i] += params.shrinkage *\r
- (float)(tree->predict(_sample, _missing)->value);\r
+ float p = (float)(tree->predict(_sample, _missing)->value);\r
+ sum[i] += params.shrinkage * p;\r
+ if (weak_responses)\r
+ weak_responses->data.fl[i*weak_count+j] = p;\r
}\r
}\r
}\r
+ \r
+ for (int i=0; i<class_count; ++i)\r
+ sum[i] += base_value;\r
\r
if (class_count == 1)\r
{\r
return float(orig_class_label);\r
}\r
\r
+\r
+class Tree_predictor\r
+{\r
+private:\r
+ pCvSeq* weak;\r
+ float* sum;\r
+ const int k;\r
+ const CvMat* sample;\r
+ const CvMat* missing;\r
+ const float shrinkage;\r
+ \r
+#ifdef HAVE_TBB\r
+ static tbb::spin_mutex SumMutex;\r
+#endif\r
+\r
+\r
+public:\r
+ Tree_predictor() : weak(0), sum(0), k(0), sample(0), missing(0), shrinkage(1.0f) {}\r
+ Tree_predictor(pCvSeq* _weak, const int _k, const float _shrinkage,\r
+ const CvMat* _sample, const CvMat* _missing, float* _sum ) :\r
+ weak(_weak), sum(_sum), k(_k), sample(_sample),\r
+ missing(_missing), shrinkage(_shrinkage)\r
+ {}\r
+ \r
+ Tree_predictor( const Tree_predictor& p, cv::Split ) :\r
+ weak(p.weak), sum(p.sum), k(p.k), sample(p.sample),\r
+ missing(p.missing), shrinkage(p.shrinkage)\r
+ {}\r
+\r
+ Tree_predictor& operator=( const Tree_predictor& )\r
+ { return *this; }\r
+ \r
+ virtual void operator()(const cv::BlockedRange& range) const\r
+ {\r
+#ifdef HAVE_TBB\r
+ tbb::spin_mutex::scoped_lock lock;\r
+#endif\r
+ CvSeqReader reader;\r
+ int begin = range.begin();\r
+ int end = range.end();\r
+ \r
+ int weak_count = end - begin;\r
+ CvDTree* tree;\r
+\r
+ for (int i=0; i<k; ++i)\r
+ {\r
+ float tmp_sum = 0.0f;\r
+ if ((weak[i]) && (weak_count))\r
+ {\r
+ cvStartReadSeq( weak[i], &reader ); \r
+ cvSetSeqReaderPos( &reader, begin );\r
+ for (int j=0; j<weak_count; ++j)\r
+ {\r
+ CV_READ_SEQ_ELEM( tree, reader );\r
+ tmp_sum += shrinkage*(float)(tree->predict(sample, missing)->value);\r
+ }\r
+ }\r
+#ifdef HAVE_TBB\r
+ lock.acquire(SumMutex);\r
+ sum[i] += tmp_sum;\r
+ lock.release();\r
+#else\r
+ sum[i] += tmp_sum;\r
+#endif\r
+ }\r
+ } // Tree_predictor::operator()\r
+ \r
+}; // class Tree_predictor\r
+\r
+\r
+#ifdef HAVE_TBB\r
+tbb::spin_mutex Tree_predictor::SumMutex;\r
+#endif\r
+\r
+\r
+\r
+float CvGBTrees::predict( const CvMat* _sample, const CvMat* _missing,\r
+ CvMat* /*weak_responses*/, CvSlice slice, int k) const \r
+ {\r
+ float result = 0.0f;\r
+ if (!weak) return 0.0f;\r
+ float* sum = new float[class_count];\r
+ for (int i=0; i<class_count; ++i)\r
+ sum[i] = 0.0f;\r
+ int begin = slice.start_index;\r
+ int end = begin + cvSliceLength( slice, weak[0] );\r
+ \r
+ pCvSeq* weak_seq = weak;\r
+ Tree_predictor predictor = Tree_predictor(weak_seq, class_count,\r
+ params.shrinkage, _sample, _missing, sum);\r
+ \r
+//#ifdef HAVE_TBB\r
+// tbb::parallel_for(cv::BlockedRange(begin, end), predictor,\r
+// tbb::auto_partitioner());\r
+//#else\r
+ cv::parallel_for(cv::BlockedRange(begin, end), predictor);\r
+//#endif\r
+\r
+ for (int i=0; i<class_count; ++i)\r
+ sum[i] = sum[i] /** params.shrinkage*/ + base_value;\r
+\r
+ if (class_count == 1)\r
+ {\r
+ result = sum[0];\r
+ delete[] sum;\r
+ return result;\r
+ }\r
+\r
+ if ((k>=0) && (k<class_count))\r
+ {\r
+ result = sum[k];\r
+ delete[] sum;\r
+ return result;\r
+ }\r
+\r
+ float max = sum[0];\r
+ int class_label = 0;\r
+ for (int i=1; i<class_count; ++i)\r
+ if (sum[i] > max)\r
+ {\r
+ max = sum[i];\r
+ class_label = i;\r
+ }\r
+\r
+ delete[] sum;\r
+ int orig_class_label = class_labels->data.i[class_label];\r
+\r
+ return float(orig_class_label);\r
+ }\r
+\r
+\r
//===========================================================================\r
\r
void CvGBTrees::write_params( CvFileStorage* fs ) const\r
\r
//===========================================================================\r
\r
+class Sample_predictor\r
+{\r
+private:\r
+ const CvGBTrees* gbt;\r
+ float* predictions;\r
+ const CvMat* samples;\r
+ const CvMat* missing;\r
+ const CvMat* idx;\r
+ CvSlice slice;\r
+\r
+public:\r
+ Sample_predictor() : gbt(0), predictions(0), samples(0), missing(0),\r
+ idx(0), slice(CV_WHOLE_SEQ)\r
+ {}\r
+\r
+ Sample_predictor(const CvGBTrees* _gbt, float* _predictions,\r
+ const CvMat* _samples, const CvMat* _missing,\r
+ const CvMat* _idx, CvSlice _slice=CV_WHOLE_SEQ) :\r
+ gbt(_gbt), predictions(_predictions), samples(_samples),\r
+ missing(_missing), idx(_idx), slice(_slice)\r
+ {}\r
+ \r
+\r
+ Sample_predictor( const Sample_predictor& p, cv::Split ) :\r
+ gbt(p.gbt), predictions(p.predictions),\r
+ samples(p.samples), missing(p.missing), idx(p.idx),\r
+ slice(p.slice)\r
+ {}\r
+\r
+\r
+ virtual void operator()(const cv::BlockedRange& range) const\r
+ {\r
+ int begin = range.begin();\r
+ int end = range.end();\r
+\r
+ CvMat x;\r
+ CvMat miss;\r
+\r
+ for (int i=begin; i<end; ++i)\r
+ {\r
+ int j = idx ? idx->data.i[i] : i;\r
+ cvGetRow(samples, &x, j);\r
+ if (!missing)\r
+ {\r
+ predictions[i] = gbt->predict_serial(&x,0,0,slice);\r
+ }\r
+ else\r
+ {\r
+ cvGetRow(missing, &miss, j);\r
+ predictions[i] = gbt->predict_serial(&x,&miss,0,slice);\r
+ }\r
+ }\r
+ } // Sample_predictor::operator()\r
+\r
+}; // class Sample_predictor\r
+\r
+\r
+\r
// type in {CV_TRAIN_ERROR, CV_TEST_ERROR}\r
float \r
CvGBTrees::calc_error( CvMLData* _data, int type, std::vector<float> *resp )\r
{\r
- float err = 0;\r
- const CvMat* values = _data->get_values();\r
+\r
+ float err = 0.0f;\r
+ const CvMat* sample_idx = (type == CV_TRAIN_ERROR) ?\r
+ _data->get_train_sample_idx() :\r
+ _data->get_test_sample_idx();\r
const CvMat* response = _data->get_responses();\r
- const CvMat* missing = _data->get_missing();\r
- const CvMat* sample_idx = (type == CV_TEST_ERROR) ?\r
- _data->get_test_sample_idx() :\r
- _data->get_train_sample_idx();\r
- //const CvMat* var_types = _data->get_var_types();\r
- int* sidx = sample_idx ? sample_idx->data.i : 0;\r
- int r_step = CV_IS_MAT_CONT(response->type) ?\r
- 1 : response->step / CV_ELEM_SIZE(response->type);\r
- //bool is_classifier = \r
- // var_types->data.ptr[var_types->cols-1] == CV_VAR_CATEGORICAL;\r
- int sample_count = sample_idx ? sample_idx->cols : 0;\r
- sample_count = (type == CV_TRAIN_ERROR && sample_count == 0) ?\r
- values->rows :\r
- sample_count;\r
- float* pred_resp = 0;\r
- if( resp && (sample_count > 0) )\r
+ \r
+ int n = sample_idx ? get_len(sample_idx) : 0;\r
+ n = (type == CV_TRAIN_ERROR && n == 0) ? _data->get_values()->rows : n;\r
+ \r
+ if (!n)\r
+ return -FLT_MAX;\r
+ \r
+ float* pred_resp = 0; \r
+ if (resp)\r
{\r
- resp->resize( sample_count );\r
+ resp->resize(n);\r
pred_resp = &((*resp)[0]);\r
}\r
+ else\r
+ pred_resp = new float[n];\r
+\r
+ Sample_predictor predictor = Sample_predictor(this, pred_resp, _data->get_values(),\r
+ _data->get_missing(), sample_idx);\r
+ \r
+//#ifdef HAVE_TBB\r
+// tbb::parallel_for(cv::BlockedRange(0,n), predictor, tbb::auto_partitioner());\r
+//#else\r
+ cv::parallel_for(cv::BlockedRange(0,n), predictor);\r
+//#endif\r
+ \r
+ int* sidx = sample_idx ? sample_idx->data.i : 0;\r
+ int r_step = CV_IS_MAT_CONT(response->type) ?\r
+ 1 : response->step / CV_ELEM_SIZE(response->type);\r
+ \r
+\r
if ( !problem_type() )\r
{\r
- for( int i = 0; i < sample_count; i++ )\r
+ for( int i = 0; i < n; i++ )\r
{\r
- CvMat sample, miss;\r
int si = sidx ? sidx[i] : i;\r
- cvGetRow( values, &sample, si ); \r
- if( missing ) \r
- cvGetRow( missing, &miss, si ); \r
- float r = (float)predict( &sample, missing ? &miss : 0 );\r
- if( pred_resp )\r
- pred_resp[i] = r;\r
- int d = fabs((double)r - response->data.fl[si*r_step]) <= FLT_EPSILON ? 0 : 1;\r
+ int d = fabs((double)pred_resp[i] - response->data.fl[si*r_step]) <= FLT_EPSILON ? 0 : 1;\r
err += d;\r
}\r
- err = sample_count ? err / (float)sample_count * 100 : -FLT_MAX;\r
+ err = err / (float)n * 100.0f;\r
}\r
else\r
{\r
- for( int i = 0; i < sample_count; i++ )\r
+ for( int i = 0; i < n; i++ )\r
{\r
- CvMat sample, miss;\r
int si = sidx ? sidx[i] : i;\r
- cvGetRow( values, &sample, si );\r
- if( missing ) \r
- cvGetRow( missing, &miss, si ); \r
- float r = (float)predict( &sample, missing ? &miss : 0 );\r
- if( pred_resp )\r
- pred_resp[i] = r;\r
- float d = r - response->data.fl[si*r_step];\r
+ float d = pred_resp[i] - response->data.fl[si*r_step];\r
err += d*d;\r
}\r
- err = sample_count ? err / (float)sample_count : -FLT_MAX; \r
+ err = err / (float)n; \r
}\r
+ \r
return err;\r
-\r
}\r
\r
\r
weak = 0;\r
default_model_name = "my_boost_tree";\r
orig_response = sum_response = sum_response_tmp = 0;\r
- weak_eval = subsample_train = subsample_test = 0;\r
+ subsample_train = subsample_test = 0;\r
missing = sample_idx = 0;\r
class_labels = 0;\r
class_count = 1;\r
print_result( ertrees.calc_error( &data, CV_TRAIN_ERROR), ertrees.calc_error( &data, CV_TEST_ERROR ), ertrees.get_var_importance() );
printf("======GBTREES=====\n");
- gbtrees.train( &data, CvGBTreesParams(CvGBTrees::DEVIANCE_LOSS, 100, 0.05f, 0.6f, 10, true));
+ if (categorical_response)
+ gbtrees.train( &data, CvGBTreesParams(CvGBTrees::DEVIANCE_LOSS, 100, 0.1f, 0.8f, 5, false));
+ else
+ gbtrees.train( &data, CvGBTreesParams(CvGBTrees::SQUARED_LOSS, 100, 0.1f, 0.8f, 5, false));
print_result( gbtrees.calc_error( &data, CV_TRAIN_ERROR), gbtrees.calc_error( &data, CV_TEST_ERROR ), 0 ); //doesn't compute importance
}
else