doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown

   1 Real Time pose estimation of a textured object {#tutorial_real_time_pose}
   2 ==============================================
   3
   4 Nowadays, augmented reality is one of the top research topic in computer vision and robotics fields.
   5 The most elemental problem in augmented reality is the estimation of the camera pose respect of an
   6 object in the case of computer vision area to do later some 3D rendering or in the case of robotics
   7 obtain an object pose in order to grasp it and do some manipulation. However, this is not a trivial
   8 problem to solve due to the fact that the most common issue in image processing is the computational
   9 cost of applying a lot of algorithms or mathematical operations for solving a problem which is basic
  10 and immediately for humans.
  11
  12 Goal
  13 ----
  14
  15 In this tutorial is explained how to build a real time application to estimate the camera pose in
  16 order to track a textured object with six degrees of freedom given a 2D image and its 3D textured
  17 model.
  18
  19 The application will have the following parts:
  20
  21 -   Read 3D textured object model and object mesh.
  22 -   Take input from Camera or Video.
  23 -   Extract ORB features and descriptors from the scene.
  24 -   Match scene descriptors with model descriptors using Flann matcher.
  25 -   Pose estimation using PnP + Ransac.
  26 -   Linear Kalman Filter for bad poses rejection.
  27
  28 Theory
  29 ------
  30
  31 In computer vision estimate the camera pose from *n* 3D-to-2D point correspondences is a fundamental
  32 and well understood problem. The most general version of the problem requires estimating the six
  33 degrees of freedom of the pose and five calibration parameters: focal length, principal point,
  34 aspect ratio and skew. It could be established with a minimum of 6 correspondences, using the well
  35 known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications to the
  36 problem which turn into an extensive list of different algorithms that improve the accuracy of the
  37 DLT.
  38
  39 The most common simplification is to assume known calibration parameters which is the so-called
  40 Perspective-*n*-Point problem:
  41
  42 ![](images/pnp.jpg)
  43
  44 **Problem Formulation:** Given a set of correspondences between 3D points \f$p_i\f$ expressed in a world
  45 reference frame, and their 2D projections \f$u_i\f$ onto the image, we seek to retrieve the pose (\f$R\f$
  46 and \f$t\f$) of the camera w.r.t. the world and the focal length \f$f\f$.
  47
  48 OpenCV provides four different approaches to solve the Perspective-*n*-Point problem which return
  49 \f$R\f$ and \f$t\f$. Then, using the following formula it's possible to project 3D points into the image
  50 plane:
  51
  52 \f[s\ \left [ \begin{matrix}   u \\   v \\  1 \end{matrix} \right ] = \left [ \begin{matrix}   f_x & 0 & c_x \\  0 & f_y & c_y \\   0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix}  r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\  r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix}  X \\  Y \\   Z\\ 1 \end{matrix} \right ]\f]
  53
  54 The complete documentation of how to manage with this equations is in @ref calib3d .
  55
  56 Source code
  57 -----------
  58
  59 You can find the source code of this tutorial in the
  60 `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.
  61
  62 The tutorial consists of two main programs:
  63
  64 -#  **Model registration**
  65
  66     This application is exclusive to whom don't have a 3D textured model of the object to be detected.
  67     You can use this program to create your own textured 3D model. This program only works for planar
  68     objects, then if you want to model an object with complex shape you should use a sophisticated
  69     software to create it.
  70
  71     The application needs an input image of the object to be registered and its 3D mesh. We have also
  72     to provide the intrinsic parameters of the camera with which the input image was taken. All the
  73     files need to be specified using the absolute path or the relative one from your application’s
  74     working directory. If none files are specified the program will try to open the provided default
  75     parameters.
  76
  77     The application starts up extracting the ORB features and descriptors from the input image and
  78     then uses the mesh along with the [Möller–Trumbore intersection
  79     algorithm](http://http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
  80     to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
  81     are stored in different lists in a file with YAML format which each row is a different point. The
  82     technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
  83     tutorial.
  84
  85     ![](images/registration.png)
  86
  87 -#  **Model detection**
  88
  89     The aim of this application is estimate in real time the object pose given its 3D textured model.
  90
  91     The application starts up loading the 3D textured model in YAML file format with the same
  92     structure explained in the model registration program. From the scene, the ORB features and
  93     descriptors are detected and extracted. Then, is used @ref cv::FlannBasedMatcher with
  94     @ref cv::flann::GenericIndex to do the matching between the scene descriptors and the model descriptors.
  95     Using the found matches along with @ref cv::solvePnPRansac function the `R` and `t` of
  96     the camera are computed. Finally, a KalmanFilter is applied in order to reject bad poses.
  97
  98     In the case that you compiled OpenCV with the samples, you can find it in opencv/build/bin/cpp-tutorial-pnp_detection\`.
  99     Then you can run the application and change some parameters:
 100     @code{.cpp}
 101     This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
 102     Usage:
 103       ./cpp-tutorial-pnp_detection -help
 104     Keys:
 105       'esc' - to quit.
 106     --------------------------------------------------------------------------
 107
 108     Usage: cpp-tutorial-pnp_detection [params]
 109
 110       -c, --confidence (value:0.95)
 111           RANSAC confidence
 112       -e, --error (value:2.0)
 113           RANSAC reprojection error
 114       -f, --fast (value:true)
 115           use of robust fast match
 116       -h, --help (value:true)
 117           print this message
 118       --in, --inliers (value:30)
 119           minimum inliers for Kalman update
 120       --it, --iterations (value:500)
 121           RANSAC maximum iterations count
 122       -k, --keypoints (value:2000)
 123           number of keypoints to detect
 124       --mesh
 125           path to ply mesh
 126       --method, --pnp (value:0)
 127           PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
 128       --model
 129           path to yml model
 130       -r, --ratio (value:0.7)
 131           threshold for ratio test
 132       -v, --video
 133           path to recorded video
 134     @endcode
 135     For example, you can run the application changing the pnp method:
 136     @code{.cpp}
 137     ./cpp-tutorial-pnp_detection --method=2
 138     @endcode
 139
 140 Explanation
 141 -----------
 142
 143 Here is explained in detail the code for the real time application:
 144
 145 -#  **Read 3D textured object model and object mesh.**
 146
 147     In order to load the textured model I implemented the *class* **Model** which has the function
 148     *load()* that opens a YAML file and take the stored 3D points with its corresponding descriptors.
 149     You can find an example of a 3D textured model in
 150     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.
 151     @code{.cpp}
 152     /* Load a YAML file using OpenCV */
 153     void Model::load(const std::string path)
 154     {
 155         cv::Mat points3d_mat;
 156
 157         cv::FileStorage storage(path, cv::FileStorage::READ);
 158         storage["points_3d"] >> points3d_mat;
 159         storage["descriptors"] >> descriptors_;
 160
 161         points3d_mat.copyTo(list_points3d_in_);
 162
 163         storage.release();
 164
 165     }
 166     @endcode
 167     In the main program the model is loaded as follows:
 168     @code{.cpp}
 169     Model model;               // instantiate Model object
 170     model.load(yml_read_path); // load a 3D textured object model
 171     @endcode
 172     In order to read the model mesh I implemented a *class* **Mesh** which has a function *load()*
 173     that opens a \f$*\f$.ply file and store the 3D points of the object and also the composed triangles.
 174     You can find an example of a model mesh in
 175     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.
 176     @code{.cpp}
 177     /* Load a CSV with *.ply format */
 178     void Mesh::load(const std::string path)
 179     {
 180
 181         // Create the reader
 182         CsvReader csvReader(path);
 183
 184         // Clear previous data
 185         list_vertex_.clear();
 186         list_triangles_.clear();
 187
 188         // Read from .ply file
 189         csvReader.readPLY(list_vertex_, list_triangles_);
 190
 191         // Update mesh attributes
 192         num_vertexs_ = list_vertex_.size();
 193         num_triangles_ = list_triangles_.size();
 194
 195     }
 196     @endcode
 197     In the main program the mesh is loaded as follows:
 198     @code{.cpp}
 199     Mesh mesh;                // instantiate Mesh object
 200     mesh.load(ply_read_path); // load an object mesh
 201     @endcode
 202     You can also load different model and mesh:
 203     @code{.cpp}
 204     ./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
 205     @endcode
 206
 207 -#  **Take input from Camera or Video**
 208
 209     To detect is necessary capture video. It's done loading a recorded video by passing the absolute
 210     path where it is located in your machine. In order to test the application you can find a recorded
 211     video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
 212     @code{.cpp}
 213     cv::VideoCapture cap;                // instantiate VideoCapture
 214     cap.open(video_read_path);           // open a recorded video
 215
 216     if(!cap.isOpened())                  // check if we succeeded
 217     {
 218        std::cout << "Could not open the camera device" << std::endl;
 219        return -1;
 220     }
 221     @endcode
 222     Then the algorithm is computed frame per frame:
 223     @code{.cpp}
 224     cv::Mat frame, frame_vis;
 225
 226     while(cap.read(frame) && cv::waitKey(30) != 27)    // capture frame until ESC is pressed
 227     {
 228
 229         frame_vis = frame.clone();                     // refresh visualisation frame
 230
 231         // MAIN ALGORITHM
 232
 233     }
 234     @endcode
 235     You can also load different recorded video:
 236     @code{.cpp}
 237     ./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
 238     @endcode
 239
 240 -#  **Extract ORB features and descriptors from the scene**
 241
 242     The next step is to detect the scene features and extract it descriptors. For this task I
 243     implemented a *class* **RobustMatcher** which has a function for keypoints detection and features
 244     extraction. You can find it in
 245     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobusMatcher.cpp`. In your
 246     *RobusMatch* object you can use any of the 2D features detectors of OpenCV. In this case I used
 247     @ref cv::ORB features because is based on @ref cv::FAST to detect the keypoints and cv::xfeatures2d::BriefDescriptorExtractor
 248     to extract the descriptors which means that is fast and robust to rotations. You can find more
 249     detailed information about *ORB* in the documentation.
 250
 251     The following code is how to instantiate and set the features detector and the descriptors
 252     extractor:
 253     @code{.cpp}
 254     RobustMatcher rmatcher;                                                          // instantiate RobustMatcher
 255
 256     cv::FeatureDetector * detector = new cv::OrbFeatureDetector(numKeyPoints);       // instantiate ORB feature detector
 257     cv::DescriptorExtractor * extractor = new cv::OrbDescriptorExtractor();          // instantiate ORB descriptor extractor
 258
 259     rmatcher.setFeatureDetector(detector);                                           // set feature detector
 260     rmatcher.setDescriptorExtractor(extractor);                                      // set descriptor extractor
 261     @endcode
 262     The features and descriptors will be computed by the *RobustMatcher* inside the matching function.
 263
 264 -#  **Match scene descriptors with model descriptors using Flann matcher**
 265
 266     It is the first step in our detection algorithm. The main idea is to match the scene descriptors
 267     with our model descriptors in order to know the 3D coordinates of the found features into the
 268     current scene.
 269
 270     Firstly, we have to set which matcher we want to use. In this case is used
 271     @ref cv::FlannBasedMatcher matcher which in terms of computational cost is faster than the
 272     @ref cv::BFMatcher matcher as we increase the trained collection of features. Then, for
 273     FlannBased matcher the index created is *Multi-Probe LSH: Efficient Indexing for High-Dimensional
 274     Similarity Search* due to *ORB* descriptors are binary.
 275
 276     You can tune the *LSH* and search parameters to improve the matching efficiency:
 277     @code{.cpp}
 278     cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
 279     cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50);       // instantiate flann search parameters
 280
 281     cv::DescriptorMatcher * matcher = new cv::FlannBasedMatcher(indexParams, searchParams);         // instantiate FlannBased matcher
 282     rmatcher.setDescriptorMatcher(matcher);                                                         // set matcher
 283     @endcode
 284     Secondly, we have to call the matcher by using *robustMatch()* or *fastRobustMatch()* function.
 285     The difference of using this two functions is its computational cost. The first method is slower
 286     but more robust at filtering good matches because uses two ratio test and a symmetry test. In
 287     contrast, the second method is faster but less robust because only applies a single ratio test to
 288     the matches.
 289
 290     The following code is to get the model 3D points and its descriptors and then call the matcher in
 291     the main program:
 292     @code{.cpp}
 293     // Get the MODEL INFO
 294
 295     std::vector<cv::Point3f> list_points3d_model = model.get_points3d();  // list with model 3D coordinates
 296     cv::Mat descriptors_model = model.get_descriptors();                  // list with descriptors of each 3D coordinate
 297     @endcode
 298     @code{.cpp}
 299     // -- Step 1: Robust matching between model descriptors and scene descriptors
 300
 301     std::vector<cv::DMatch> good_matches;       // to obtain the model 3D points  in the scene
 302     std::vector<cv::KeyPoint> keypoints_scene;  // to obtain the 2D points of the scene
 303
 304     if(fast_match)
 305     {
 306         rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
 307     }
 308     else
 309     {
 310         rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
 311     }
 312     @endcode
 313     The following code corresponds to the *robustMatch()* function which belongs to the
 314     *RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
 315     descriptors, match using *two Nearest Neighbour* the extracted descriptors with the given model
 316     descriptors and vice versa. Then, a ratio test is applied to the two direction matches in order to
 317     remove these matches which its distance ratio between the first and second best match is larger
 318     than a given threshold. Finally, a symmetry test is applied in order the remove non symmetrical
 319     matches.
 320     @code{.cpp}
 321     void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
 322                                      std::vector<cv::KeyPoint>& keypoints_frame,
 323                                      const std::vector<cv::KeyPoint>& keypoints_model, const cv::Mat& descriptors_model )
 324     {
 325
 326         // 1a. Detection of the ORB features
 327         this->computeKeyPoints(frame, keypoints_frame);
 328
 329         // 1b. Extraction of the ORB descriptors
 330         cv::Mat descriptors_frame;
 331         this->computeDescriptors(frame, keypoints_frame, descriptors_frame);
 332
 333         // 2. Match the two image descriptors
 334         std::vector<std::vector<cv::DMatch> > matches12, matches21;
 335
 336         // 2a. From image 1 to image 2
 337         matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours
 338
 339         // 2b. From image 2 to image 1
 340         matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours
 341
 342         // 3. Remove matches for which NN ratio is > than threshold
 343         // clean image 1 -> image 2 matches
 344         int removed1 = ratioTest(matches12);
 345         // clean image 2 -> image 1 matches
 346         int removed2 = ratioTest(matches21);
 347
 348         // 4. Remove non-symmetrical matches
 349         symmetryTest(matches12, matches21, good_matches);
 350
 351     }
 352     @endcode
 353     After the matches filtering we have to subtract the 2D and 3D correspondences from the found scene
 354     keypoints and our 3D model using the obtained *DMatches* vector. For more information about
 355     @ref cv::DMatch check the documentation.
 356     @code{.cpp}
 357     // -- Step 2: Find out the 2D/3D correspondences
 358
 359     std::vector<cv::Point3f> list_points3d_model_match;    // container for the model 3D coordinates found in the scene
 360     std::vector<cv::Point2f> list_points2d_scene_match;    // container for the model 2D coordinates found in the scene
 361
 362     for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
 363     {
 364         cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ];   // 3D point from model
 365         cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt;    // 2D point from the scene
 366         list_points3d_model_match.push_back(point3d_model);                                      // add 3D point
 367         list_points2d_scene_match.push_back(point2d_scene);                                      // add 2D point
 368     }
 369     @endcode
 370     You can also change the ratio test threshold, the number of keypoints to detect as well as use or
 371     not the robust matcher:
 372     @code{.cpp}
 373     ./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
 374     @endcode
 375
 376 -#  **Pose estimation using PnP + Ransac**
 377
 378     Once with the 2D and 3D correspondences we have to apply a PnP algorithm in order to estimate the
 379     camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
 380     due to the fact that after the matching not all the found correspondences are correct and, as like
 381     as not, there are false correspondences or also called *outliers*. The [Random Sample
 382     Consensus](http://en.wikipedia.org/wiki/RANSAC) or *Ransac* is a non-deterministic iterative
 383     method which estimate parameters of a mathematical model from observed data producing an
 384     approximate result as the number of iterations increase. After applying *Ransac* all the *outliers*
 385     will be eliminated to then estimate the camera pose with a certain probability to obtain a good
 386     solution.
 387
 388     For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
 389     attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
 390     rotation-translation matrix. The intrinsic calibration parameters of the camera which you are
 391     using to estimate the pose are necessary. In order to obtain the parameters you can check
 392     @ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.
 393
 394     The following code is how to declare the *PnPProblem class* in the main program:
 395     @code{.cpp}
 396     // Intrinsic camera parameters: UVC WEBCAM
 397
 398     double f = 55;                           // focal length in mm
 399     double sx = 22.3, sy = 14.9;             // sensor size
 400     double width = 640, height = 480;        // image size
 401
 402     double params_WEBCAM[] = { width*f/sx,   // fx
 403                                height*f/sy,  // fy
 404                                width/2,      // cx
 405                                height/2};    // cy
 406
 407     PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
 408     @endcode
 409     The following code is how the *PnPProblem class* initialises its attributes:
 410     @code{.cpp}
 411     // Custom constructor given the intrinsic camera parameters
 412
 413     PnPProblem::PnPProblem(const double params[])
 414     {
 415       _A_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // intrinsic camera parameters
 416       _A_matrix.at<double>(0, 0) = params[0];       //      [ fx   0  cx ]
 417       _A_matrix.at<double>(1, 1) = params[1];       //      [  0  fy  cy ]
 418       _A_matrix.at<double>(0, 2) = params[2];       //      [  0   0   1 ]
 419       _A_matrix.at<double>(1, 2) = params[3];
 420       _A_matrix.at<double>(2, 2) = 1;
 421       _R_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // rotation matrix
 422       _t_matrix = cv::Mat::zeros(3, 1, CV_64FC1);   // translation matrix
 423       _P_matrix = cv::Mat::zeros(3, 4, CV_64FC1);   // rotation-translation matrix
 424
 425     }
 426     @endcode
 427     OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Depending on the application type,
 428     the estimation method will be different. In the case that we want to make a real time application,
 429     the more suitable methods are EPNP and P3P since they are faster than ITERATIVE and DLS at
 430     finding an optimal solution. However, EPNP and P3P are not especially robust in front of planar
 431     surfaces and sometimes the pose estimation seems to have a mirror effect. Therefore, in this
 432     tutorial an ITERATIVE method is used due to the object to be detected has planar surfaces.
 433
 434     The OpenCV RANSAC implementation wants you to provide three parameters: 1) the maximum number of
 435     iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
 436     computed point projections to consider it an inlier and 3) the confidence to obtain a good result.
 437     You can tune these parameters in order to improve your algorithm performance. Increasing the
 438     number of iterations will have a more accurate solution, but will take more time to find a
 439     solution. Increasing the reprojection error will reduce the computation time, but your solution
 440     will be unaccurate. Decreasing the confidence your algorithm will be faster, but the obtained
 441     solution will be unaccurate.
 442
 443     The following parameters work for this application:
 444     @code{.cpp}
 445     // RANSAC parameters
 446
 447     int iterationsCount = 500;        // number of Ransac iterations.
 448     float reprojectionError = 2.0;    // maximum allowed distance to consider it an inlier.
 449     float confidence = 0.95;          // RANSAC successful confidence.
 450     @endcode
 451     The following code corresponds to the *estimatePoseRANSAC()* function which belongs to the
 452     *PnPProblem class*. This function estimates the rotation and translation matrix given a set of
 453     2D/3D correspondences, the desired PnP method to use, the output inliers container and the Ransac
 454     parameters:
 455     @code{.cpp}
 456     // Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use
 457
 458     void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d,        // list with model 3D coordinates
 459                                          const std::vector<cv::Point2f> &list_points2d,        // list with scene 2D coordinates
 460                                          int flags, cv::Mat &inliers, int iterationsCount,     // PnP method; inliers container
 461                                          float reprojectionError, float confidence )           // RANSAC parameters
 462     {
 463         cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1);    // vector of distortion coefficients
 464         cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output rotation vector
 465         cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output translation vector
 466
 467         bool useExtrinsicGuess = false;   // if true the function uses the provided rvec and tvec values as
 468                                           // initial approximations of the rotation and translation vectors
 469
 470         cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
 471                             useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
 472                             inliers, flags );
 473
 474         Rodrigues(rvec,_R_matrix);                   // converts Rotation Vector to Matrix
 475         _t_matrix = tvec;                            // set translation matrix
 476
 477         this->set_P_matrix(_R_matrix, _t_matrix);    // set rotation-translation matrix
 478
 479     }
 480     @endcode
 481     In the following code are the 3th and 4th steps of the main algorithm. The first, calling the
 482     above function and the second taking the output inliers vector from RANSAC to get the 2D scene
 483     points for drawing purpose. As seen in the code we must be sure to apply RANSAC if we have
 484     matches, in the other case, the function @ref cv::solvePnPRansac crashes due to any OpenCV *bug*.
 485     @code{.cpp}
 486     if(good_matches.size() > 0) // None matches, then RANSAC crashes
 487     {
 488
 489         // -- Step 3: Estimate the pose using RANSAC approach
 490         pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
 491                                           pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );
 492
 493
 494         // -- Step 4: Catch the inliers keypoints to draw
 495         for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
 496         {
 497         int n = inliers_idx.at<int>(inliers_index);         // i-inlier
 498         cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
 499         list_points2d_inliers.push_back(point2d);           // add i-inlier to list
 500     }
 501     @endcode
 502     Finally, once the camera pose has been estimated we can use the \f$R\f$ and \f$t\f$ in order to compute
 503     the 2D projection onto the image of a given 3D point expressed in a world reference frame using
 504     the showed formula on *Theory*.
 505
 506     The following code corresponds to the *backproject3DPoint()* function which belongs to the
 507     *PnPProblem class*. The function backproject a given 3D point expressed in a world reference frame
 508     onto a 2D image:
 509     @code{.cpp}
 510     // Backproject a 3D point to 2D using the estimated pose parameters
 511
 512     cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
 513     {
 514         // 3D point vector [x y z 1]'
 515         cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
 516         point3d_vec.at<double>(0) = point3d.x;
 517         point3d_vec.at<double>(1) = point3d.y;
 518         point3d_vec.at<double>(2) = point3d.z;
 519         point3d_vec.at<double>(3) = 1;
 520
 521         // 2D point vector [u v 1]'
 522         cv::Mat point2d_vec = cv::Mat(4, 1, CV_64FC1);
 523         point2d_vec = _A_matrix * _P_matrix * point3d_vec;
 524
 525         // Normalization of [u v]'
 526         cv::Point2f point2d;
 527         point2d.x = point2d_vec.at<double>(0) / point2d_vec.at<double>(2);
 528         point2d.y = point2d_vec.at<double>(1) / point2d_vec.at<double>(2);
 529
 530         return point2d;
 531     }
 532     @endcode
 533     The above function is used to compute all the 3D points of the object *Mesh* to show the pose of
 534     the object.
 535
 536     You can also change RANSAC parameters and PnP method:
 537     @code{.cpp}
 538     ./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
 539     @endcode
 540
 541 -#  **Linear Kalman Filter for bad poses rejection**
 542
 543     Is it common in computer vision or robotics fields that after applying detection or tracking
 544     techniques, bad results are obtained due to some sensor errors. In order to avoid these bad
 545     detections in this tutorial is explained how to implement a Linear Kalman Filter. The Kalman
 546     Filter will be applied after detected a given number of inliers.
 547
 548     You can find more information about what [Kalman
 549     Filter](http://en.wikipedia.org/wiki/Kalman_filter) is. In this tutorial it's used the OpenCV
 550     implementation of the @ref cv::KalmanFilter based on
 551     [Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter)
 552     to set the dynamics and measurement models.
 553
 554     Firstly, we have to define our state vector which will have 18 states: the positional data (x,y,z)
 555     with its first and second derivatives (velocity and acceleration), then rotation is added in form
 556     of three euler angles (roll, pitch, jaw) together with their first and second derivatives (angular
 557     velocity and acceleration)
 558
 559     \f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]
 560
 561     Secondly, we have to define the number of measurements which will be 6: from \f$R\f$ and \f$t\f$ we can
 562     extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
 563     actions to apply to the system which in this case will be *zero*. Finally, we have to define the
 564     differential time between measurements which in this case is \f$1/T\f$, where *T* is the frame rate of
 565     the video.
 566     @code{.cpp}
 567     cv::KalmanFilter KF;         // instantiate Kalman Filter
 568
 569     int nStates = 18;            // the number of states
 570     int nMeasurements = 6;       // the number of measured states
 571     int nInputs = 0;             // the number of action control
 572
 573     double dt = 0.125;           // time between measurements (1/FPS)
 574
 575     initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt);    // init function
 576     @endcode
 577     The following code corresponds to the *Kalman Filter* initialisation. Firstly, is set the process
 578     noise, the measurement noise and the error covariance matrix. Secondly, are set the transition
 579     matrix which is the dynamic model and finally the measurement matrix, which is the measurement
 580     model.
 581
 582     You can tune the process and measurement noise to improve the *Kalman Filter* performance. As the
 583     measurement noise is reduced the faster will converge doing the algorithm sensitive in front of
 584     bad measurements.
 585     @code{.cpp}
 586     void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
 587     {
 588
 589       KF.init(nStates, nMeasurements, nInputs, CV_64F);                 // init Kalman Filter
 590
 591       cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5));       // set process noise
 592       cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4));   // set measurement noise
 593       cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));             // error covariance
 594
 595
 596                      /* DYNAMIC MODEL */
 597
 598       //  [1 0 0 dt  0  0 dt2   0   0 0 0 0  0  0  0   0   0   0]
 599       //  [0 1 0  0 dt  0   0 dt2   0 0 0 0  0  0  0   0   0   0]
 600       //  [0 0 1  0  0 dt   0   0 dt2 0 0 0  0  0  0   0   0   0]
 601       //  [0 0 0  1  0  0  dt   0   0 0 0 0  0  0  0   0   0   0]
 602       //  [0 0 0  0  1  0   0  dt   0 0 0 0  0  0  0   0   0   0]
 603       //  [0 0 0  0  0  1   0   0  dt 0 0 0  0  0  0   0   0   0]
 604       //  [0 0 0  0  0  0   1   0   0 0 0 0  0  0  0   0   0   0]
 605       //  [0 0 0  0  0  0   0   1   0 0 0 0  0  0  0   0   0   0]
 606       //  [0 0 0  0  0  0   0   0   1 0 0 0  0  0  0   0   0   0]
 607       //  [0 0 0  0  0  0   0   0   0 1 0 0 dt  0  0 dt2   0   0]
 608       //  [0 0 0  0  0  0   0   0   0 0 1 0  0 dt  0   0 dt2   0]
 609       //  [0 0 0  0  0  0   0   0   0 0 0 1  0  0 dt   0   0 dt2]
 610       //  [0 0 0  0  0  0   0   0   0 0 0 0  1  0  0  dt   0   0]
 611       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  1  0   0  dt   0]
 612       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  1   0   0  dt]
 613       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   1   0   0]
 614       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   1   0]
 615       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   0   1]
 616
 617       // position
 618       KF.transitionMatrix.at<double>(0,3) = dt;
 619       KF.transitionMatrix.at<double>(1,4) = dt;
 620       KF.transitionMatrix.at<double>(2,5) = dt;
 621       KF.transitionMatrix.at<double>(3,6) = dt;
 622       KF.transitionMatrix.at<double>(4,7) = dt;
 623       KF.transitionMatrix.at<double>(5,8) = dt;
 624       KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
 625       KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
 626       KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);
 627
 628       // orientation
 629       KF.transitionMatrix.at<double>(9,12) = dt;
 630       KF.transitionMatrix.at<double>(10,13) = dt;
 631       KF.transitionMatrix.at<double>(11,14) = dt;
 632       KF.transitionMatrix.at<double>(12,15) = dt;
 633       KF.transitionMatrix.at<double>(13,16) = dt;
 634       KF.transitionMatrix.at<double>(14,17) = dt;
 635       KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
 636       KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
 637       KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);
 638
 639
 640            /* MEASUREMENT MODEL */
 641
 642       //  [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 643       //  [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 644       //  [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 645       //  [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 646       //  [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 647       //  [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
 648
 649       KF.measurementMatrix.at<double>(0,0) = 1;  // x
 650       KF.measurementMatrix.at<double>(1,1) = 1;  // y
 651       KF.measurementMatrix.at<double>(2,2) = 1;  // z
 652       KF.measurementMatrix.at<double>(3,9) = 1;  // roll
 653       KF.measurementMatrix.at<double>(4,10) = 1; // pitch
 654       KF.measurementMatrix.at<double>(5,11) = 1; // yaw
 655
 656     }
 657     @endcode
 658     In the following code is the 5th step of the main algorithm. When the obtained number of inliers
 659     after *Ransac* is over the threshold, the measurements matrix is filled and then the *Kalman
 660     Filter* is updated:
 661     @code{.cpp}
 662     // -- Step 5: Kalman Filter
 663
 664     // GOOD MEASUREMENT
 665     if( inliers_idx.rows >= minInliersKalman )
 666     {
 667
 668         // Get the measured translation
 669         cv::Mat translation_measured(3, 1, CV_64F);
 670         translation_measured = pnp_detection.get_t_matrix();
 671
 672         // Get the measured rotation
 673         cv::Mat rotation_measured(3, 3, CV_64F);
 674         rotation_measured = pnp_detection.get_R_matrix();
 675
 676         // fill the measurements vector
 677         fillMeasurements(measurements, translation_measured, rotation_measured);
 678
 679     }
 680
 681     // Instantiate estimated translation and rotation
 682     cv::Mat translation_estimated(3, 1, CV_64F);
 683     cv::Mat rotation_estimated(3, 3, CV_64F);
 684
 685     // update the Kalman filter with good measurements
 686     updateKalmanFilter( KF, measurements,
 687                   translation_estimated, rotation_estimated);
 688     @endcode
 689     The following code corresponds to the *fillMeasurements()* function which converts the measured
 690     [Rotation Matrix to Eulers
 691     angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
 692     and fill the measurements matrix along with the measured translation vector:
 693     @code{.cpp}
 694     void fillMeasurements( cv::Mat &measurements,
 695                        const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
 696     {
 697         // Convert rotation matrix to euler angles
 698         cv::Mat measured_eulers(3, 1, CV_64F);
 699         measured_eulers = rot2euler(rotation_measured);
 700
 701         // Set measurement to predict
 702         measurements.at<double>(0) = translation_measured.at<double>(0); // x
 703         measurements.at<double>(1) = translation_measured.at<double>(1); // y
 704         measurements.at<double>(2) = translation_measured.at<double>(2); // z
 705         measurements.at<double>(3) = measured_eulers.at<double>(0);      // roll
 706         measurements.at<double>(4) = measured_eulers.at<double>(1);      // pitch
 707         measurements.at<double>(5) = measured_eulers.at<double>(2);      // yaw
 708     }
 709     @endcode
 710     The following code corresponds to the *updateKalmanFilter()* function which update the Kalman
 711     Filter and set the estimated Rotation Matrix and translation vector. The estimated Rotation Matrix
 712     comes from the estimated [Euler angles to Rotation
 713     Matrix](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm).
 714     @code{.cpp}
 715     void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
 716                          cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
 717     {
 718
 719         // First predict, to update the internal statePre variable
 720         cv::Mat prediction = KF.predict();
 721
 722         // The "correct" phase that is going to use the predicted value and our measurement
 723         cv::Mat estimated = KF.correct(measurement);
 724
 725         // Estimated translation
 726         translation_estimated.at<double>(0) = estimated.at<double>(0);
 727         translation_estimated.at<double>(1) = estimated.at<double>(1);
 728         translation_estimated.at<double>(2) = estimated.at<double>(2);
 729
 730         // Estimated euler angles
 731         cv::Mat eulers_estimated(3, 1, CV_64F);
 732         eulers_estimated.at<double>(0) = estimated.at<double>(9);
 733         eulers_estimated.at<double>(1) = estimated.at<double>(10);
 734         eulers_estimated.at<double>(2) = estimated.at<double>(11);
 735
 736         // Convert estimated quaternion to rotation matrix
 737         rotation_estimated = euler2rot(eulers_estimated);
 738
 739     }
 740     @endcode
 741     The 6th step is set the estimated rotation-translation matrix:
 742     @code{.cpp}
 743     // -- Step 6: Set estimated projection matrix
 744     pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
 745     @endcode
 746     The last and optional step is draw the found pose. To do it I implemented a function to draw all
 747     the mesh 3D points and an extra reference axis:
 748     @code{.cpp}
 749     // -- Step X: Draw pose
 750
 751     drawObjectMesh(frame_vis, &mesh, &pnp_detection, green);                // draw current pose
 752     drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow);           // draw estimated pose
 753
 754     double l = 5;
 755     std::vector<cv::Point2f> pose_points2d;
 756     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0)));    // axis center
 757     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0)));    // axis x
 758     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0)));    // axis y
 759     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l)));    // axis z
 760     draw3DCoordinateAxes(frame_vis, pose_points2d);                                       // draw axes
 761     @endcode
 762     You can also modify the minimum inliers to update Kalman Filter:
 763     @code{.cpp}
 764     ./cpp-tutorial-pnp_detection --inliers=20
 765     @endcode
 766
 767 Results
 768 -------
 769
 770 The following videos are the results of pose estimation in real time using the explained detection
 771 algorithm using the following parameters:
 772 @code{.cpp}
 773 // Robust Matcher parameters
 774
 775 int numKeyPoints = 2000;      // number of detected keypoints
 776 float ratio = 0.70f;          // ratio test
 777 bool fast_match = true;       // fastRobustMatch() or robustMatch()
 778
 779
 780 // RANSAC parameters
 781
 782 int iterationsCount = 500;    // number of Ransac iterations.
 783 int reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
 784 float confidence = 0.95;      // ransac successful confidence.
 785
 786
 787 // Kalman Filter parameters
 788
 789 int minInliersKalman = 30;    // Kalman threshold updating
 790 @endcode
 791 You can watch the real time pose estimation on the [YouTube
 792 here](http://www.youtube.com/user/opencvdev/videos).
 793
 794 @youtube{XNATklaJlSQ}
 795 @youtube{YLS9bWek78k}