doc/tutorials/calib3d/real_time_pose/real_time_pose.markdown

   1 Real Time pose estimation of a textured object {#tutorial_real_time_pose}
   2 ==============================================
   3
   4 @prev_tutorial{tutorial_camera_calibration}
   5 @next_tutorial{tutorial_interactive_calibration}
   6
   7
   8 Nowadays, augmented reality is one of the top research topic in computer vision and robotics fields.
   9 The most elemental problem in augmented reality is the estimation of the camera pose respect of an
  10 object in the case of computer vision area to do later some 3D rendering or in the case of robotics
  11 obtain an object pose in order to grasp it and do some manipulation. However, this is not a trivial
  12 problem to solve due to the fact that the most common issue in image processing is the computational
  13 cost of applying a lot of algorithms or mathematical operations for solving a problem which is basic
  14 and immediately for humans.
  15
  16 Goal
  17 ----
  18
  19 In this tutorial is explained how to build a real time application to estimate the camera pose in
  20 order to track a textured object with six degrees of freedom given a 2D image and its 3D textured
  21 model.
  22
  23 The application will have the following parts:
  24
  25 -   Read 3D textured object model and object mesh.
  26 -   Take input from Camera or Video.
  27 -   Extract ORB features and descriptors from the scene.
  28 -   Match scene descriptors with model descriptors using Flann matcher.
  29 -   Pose estimation using PnP + Ransac.
  30 -   Linear Kalman Filter for bad poses rejection.
  31
  32 Theory
  33 ------
  34
  35 In computer vision estimate the camera pose from *n* 3D-to-2D point correspondences is a fundamental
  36 and well understood problem. The most general version of the problem requires estimating the six
  37 degrees of freedom of the pose and five calibration parameters: focal length, principal point,
  38 aspect ratio and skew. It could be established with a minimum of 6 correspondences, using the well
  39 known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications to the
  40 problem which turn into an extensive list of different algorithms that improve the accuracy of the
  41 DLT.
  42
  43 The most common simplification is to assume known calibration parameters which is the so-called
  44 Perspective-*n*-Point problem:
  45
  46 ![](images/pnp.jpg)
  47
  48 **Problem Formulation:** Given a set of correspondences between 3D points \f$p_i\f$ expressed in a world
  49 reference frame, and their 2D projections \f$u_i\f$ onto the image, we seek to retrieve the pose (\f$R\f$
  50 and \f$t\f$) of the camera w.r.t. the world and the focal length \f$f\f$.
  51
  52 OpenCV provides four different approaches to solve the Perspective-*n*-Point problem which return
  53 \f$R\f$ and \f$t\f$. Then, using the following formula it's possible to project 3D points into the image
  54 plane:
  55
  56 \f[s\ \left [ \begin{matrix}   u \\   v \\  1 \end{matrix} \right ] = \left [ \begin{matrix}   f_x & 0 & c_x \\  0 & f_y & c_y \\   0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix}  r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\  r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix}  X \\  Y \\   Z\\ 1 \end{matrix} \right ]\f]
  57
  58 The complete documentation of how to manage with this equations is in @ref calib3d .
  59
  60 Source code
  61 -----------
  62
  63 You can find the source code of this tutorial in the
  64 `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.
  65
  66 The tutorial consists of two main programs:
  67
  68 -#  **Model registration**
  69
  70     This application is exclusive to whom don't have a 3D textured model of the object to be detected.
  71     You can use this program to create your own textured 3D model. This program only works for planar
  72     objects, then if you want to model an object with complex shape you should use a sophisticated
  73     software to create it.
  74
  75     The application needs an input image of the object to be registered and its 3D mesh. We have also
  76     to provide the intrinsic parameters of the camera with which the input image was taken. All the
  77     files need to be specified using the absolute path or the relative one from your application’s
  78     working directory. If none files are specified the program will try to open the provided default
  79     parameters.
  80
  81     The application starts up extracting the ORB features and descriptors from the input image and
  82     then uses the mesh along with the [Möller–Trumbore intersection
  83     algorithm](http://http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm/)
  84     to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
  85     are stored in different lists in a file with YAML format which each row is a different point. The
  86     technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
  87     tutorial.
  88
  89     ![](images/registration.png)
  90
  91 -#  **Model detection**
  92
  93     The aim of this application is estimate in real time the object pose given its 3D textured model.
  94
  95     The application starts up loading the 3D textured model in YAML file format with the same
  96     structure explained in the model registration program. From the scene, the ORB features and
  97     descriptors are detected and extracted. Then, is used @ref cv::FlannBasedMatcher with
  98     @ref cv::flann::GenericIndex to do the matching between the scene descriptors and the model descriptors.
  99     Using the found matches along with @ref cv::solvePnPRansac function the `R` and `t` of
 100     the camera are computed. Finally, a KalmanFilter is applied in order to reject bad poses.
 101
 102     In the case that you compiled OpenCV with the samples, you can find it in opencv/build/bin/cpp-tutorial-pnp_detection\`.
 103     Then you can run the application and change some parameters:
 104     @code{.cpp}
 105     This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
 106     Usage:
 107       ./cpp-tutorial-pnp_detection -help
 108     Keys:
 109       'esc' - to quit.
 110     --------------------------------------------------------------------------
 111
 112     Usage: cpp-tutorial-pnp_detection [params]
 113
 114       -c, --confidence (value:0.95)
 115           RANSAC confidence
 116       -e, --error (value:2.0)
 117           RANSAC reprojection error
 118       -f, --fast (value:true)
 119           use of robust fast match
 120       -h, --help (value:true)
 121           print this message
 122       --in, --inliers (value:30)
 123           minimum inliers for Kalman update
 124       --it, --iterations (value:500)
 125           RANSAC maximum iterations count
 126       -k, --keypoints (value:2000)
 127           number of keypoints to detect
 128       --mesh
 129           path to ply mesh
 130       --method, --pnp (value:0)
 131           PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
 132       --model
 133           path to yml model
 134       -r, --ratio (value:0.7)
 135           threshold for ratio test
 136       -v, --video
 137           path to recorded video
 138     @endcode
 139     For example, you can run the application changing the pnp method:
 140     @code{.cpp}
 141     ./cpp-tutorial-pnp_detection --method=2
 142     @endcode
 143
 144 Explanation
 145 -----------
 146
 147 Here is explained in detail the code for the real time application:
 148
 149 -#  **Read 3D textured object model and object mesh.**
 150
 151     In order to load the textured model I implemented the *class* **Model** which has the function
 152     *load()* that opens a YAML file and take the stored 3D points with its corresponding descriptors.
 153     You can find an example of a 3D textured model in
 154     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.
 155     @code{.cpp}
 156     /* Load a YAML file using OpenCV */
 157     void Model::load(const std::string path)
 158     {
 159         cv::Mat points3d_mat;
 160
 161         cv::FileStorage storage(path, cv::FileStorage::READ);
 162         storage["points_3d"] >> points3d_mat;
 163         storage["descriptors"] >> descriptors_;
 164
 165         points3d_mat.copyTo(list_points3d_in_);
 166
 167         storage.release();
 168
 169     }
 170     @endcode
 171     In the main program the model is loaded as follows:
 172     @code{.cpp}
 173     Model model;               // instantiate Model object
 174     model.load(yml_read_path); // load a 3D textured object model
 175     @endcode
 176     In order to read the model mesh I implemented a *class* **Mesh** which has a function *load()*
 177     that opens a \f$*\f$.ply file and store the 3D points of the object and also the composed triangles.
 178     You can find an example of a model mesh in
 179     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.
 180     @code{.cpp}
 181     /* Load a CSV with *.ply format */
 182     void Mesh::load(const std::string path)
 183     {
 184
 185         // Create the reader
 186         CsvReader csvReader(path);
 187
 188         // Clear previous data
 189         list_vertex_.clear();
 190         list_triangles_.clear();
 191
 192         // Read from .ply file
 193         csvReader.readPLY(list_vertex_, list_triangles_);
 194
 195         // Update mesh attributes
 196         num_vertexs_ = list_vertex_.size();
 197         num_triangles_ = list_triangles_.size();
 198
 199     }
 200     @endcode
 201     In the main program the mesh is loaded as follows:
 202     @code{.cpp}
 203     Mesh mesh;                // instantiate Mesh object
 204     mesh.load(ply_read_path); // load an object mesh
 205     @endcode
 206     You can also load different model and mesh:
 207     @code{.cpp}
 208     ./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
 209     @endcode
 210
 211 -#  **Take input from Camera or Video**
 212
 213     To detect is necessary capture video. It's done loading a recorded video by passing the absolute
 214     path where it is located in your machine. In order to test the application you can find a recorded
 215     video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
 216     @code{.cpp}
 217     cv::VideoCapture cap;                // instantiate VideoCapture
 218     cap.open(video_read_path);           // open a recorded video
 219
 220     if(!cap.isOpened())                  // check if we succeeded
 221     {
 222        std::cout << "Could not open the camera device" << std::endl;
 223        return -1;
 224     }
 225     @endcode
 226     Then the algorithm is computed frame per frame:
 227     @code{.cpp}
 228     cv::Mat frame, frame_vis;
 229
 230     while(cap.read(frame) && cv::waitKey(30) != 27)    // capture frame until ESC is pressed
 231     {
 232
 233         frame_vis = frame.clone();                     // refresh visualisation frame
 234
 235         // MAIN ALGORITHM
 236
 237     }
 238     @endcode
 239     You can also load different recorded video:
 240     @code{.cpp}
 241     ./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
 242     @endcode
 243
 244 -#  **Extract ORB features and descriptors from the scene**
 245
 246     The next step is to detect the scene features and extract it descriptors. For this task I
 247     implemented a *class* **RobustMatcher** which has a function for keypoints detection and features
 248     extraction. You can find it in
 249     `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobusMatcher.cpp`. In your
 250     *RobusMatch* object you can use any of the 2D features detectors of OpenCV. In this case I used
 251     @ref cv::ORB features because is based on @ref cv::FAST to detect the keypoints and cv::xfeatures2d::BriefDescriptorExtractor
 252     to extract the descriptors which means that is fast and robust to rotations. You can find more
 253     detailed information about *ORB* in the documentation.
 254
 255     The following code is how to instantiate and set the features detector and the descriptors
 256     extractor:
 257     @code{.cpp}
 258     RobustMatcher rmatcher;                                                          // instantiate RobustMatcher
 259
 260     cv::FeatureDetector * detector = new cv::OrbFeatureDetector(numKeyPoints);       // instantiate ORB feature detector
 261     cv::DescriptorExtractor * extractor = new cv::OrbDescriptorExtractor();          // instantiate ORB descriptor extractor
 262
 263     rmatcher.setFeatureDetector(detector);                                           // set feature detector
 264     rmatcher.setDescriptorExtractor(extractor);                                      // set descriptor extractor
 265     @endcode
 266     The features and descriptors will be computed by the *RobustMatcher* inside the matching function.
 267
 268 -#  **Match scene descriptors with model descriptors using Flann matcher**
 269
 270     It is the first step in our detection algorithm. The main idea is to match the scene descriptors
 271     with our model descriptors in order to know the 3D coordinates of the found features into the
 272     current scene.
 273
 274     Firstly, we have to set which matcher we want to use. In this case is used
 275     @ref cv::FlannBasedMatcher matcher which in terms of computational cost is faster than the
 276     @ref cv::BFMatcher matcher as we increase the trained collection of features. Then, for
 277     FlannBased matcher the index created is *Multi-Probe LSH: Efficient Indexing for High-Dimensional
 278     Similarity Search* due to *ORB* descriptors are binary.
 279
 280     You can tune the *LSH* and search parameters to improve the matching efficiency:
 281     @code{.cpp}
 282     cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
 283     cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50);       // instantiate flann search parameters
 284
 285     cv::DescriptorMatcher * matcher = new cv::FlannBasedMatcher(indexParams, searchParams);         // instantiate FlannBased matcher
 286     rmatcher.setDescriptorMatcher(matcher);                                                         // set matcher
 287     @endcode
 288     Secondly, we have to call the matcher by using *robustMatch()* or *fastRobustMatch()* function.
 289     The difference of using this two functions is its computational cost. The first method is slower
 290     but more robust at filtering good matches because uses two ratio test and a symmetry test. In
 291     contrast, the second method is faster but less robust because only applies a single ratio test to
 292     the matches.
 293
 294     The following code is to get the model 3D points and its descriptors and then call the matcher in
 295     the main program:
 296     @code{.cpp}
 297     // Get the MODEL INFO
 298
 299     std::vector<cv::Point3f> list_points3d_model = model.get_points3d();  // list with model 3D coordinates
 300     cv::Mat descriptors_model = model.get_descriptors();                  // list with descriptors of each 3D coordinate
 301     @endcode
 302     @code{.cpp}
 303     // -- Step 1: Robust matching between model descriptors and scene descriptors
 304
 305     std::vector<cv::DMatch> good_matches;       // to obtain the model 3D points  in the scene
 306     std::vector<cv::KeyPoint> keypoints_scene;  // to obtain the 2D points of the scene
 307
 308     if(fast_match)
 309     {
 310         rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
 311     }
 312     else
 313     {
 314         rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
 315     }
 316     @endcode
 317     The following code corresponds to the *robustMatch()* function which belongs to the
 318     *RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
 319     descriptors, match using *two Nearest Neighbour* the extracted descriptors with the given model
 320     descriptors and vice versa. Then, a ratio test is applied to the two direction matches in order to
 321     remove these matches which its distance ratio between the first and second best match is larger
 322     than a given threshold. Finally, a symmetry test is applied in order the remove non symmetrical
 323     matches.
 324     @code{.cpp}
 325     void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
 326                                      std::vector<cv::KeyPoint>& keypoints_frame,
 327                                      const std::vector<cv::KeyPoint>& keypoints_model, const cv::Mat& descriptors_model )
 328     {
 329
 330         // 1a. Detection of the ORB features
 331         this->computeKeyPoints(frame, keypoints_frame);
 332
 333         // 1b. Extraction of the ORB descriptors
 334         cv::Mat descriptors_frame;
 335         this->computeDescriptors(frame, keypoints_frame, descriptors_frame);
 336
 337         // 2. Match the two image descriptors
 338         std::vector<std::vector<cv::DMatch> > matches12, matches21;
 339
 340         // 2a. From image 1 to image 2
 341         matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours
 342
 343         // 2b. From image 2 to image 1
 344         matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours
 345
 346         // 3. Remove matches for which NN ratio is > than threshold
 347         // clean image 1 -> image 2 matches
 348         int removed1 = ratioTest(matches12);
 349         // clean image 2 -> image 1 matches
 350         int removed2 = ratioTest(matches21);
 351
 352         // 4. Remove non-symmetrical matches
 353         symmetryTest(matches12, matches21, good_matches);
 354
 355     }
 356     @endcode
 357     After the matches filtering we have to subtract the 2D and 3D correspondences from the found scene
 358     keypoints and our 3D model using the obtained *DMatches* vector. For more information about
 359     @ref cv::DMatch check the documentation.
 360     @code{.cpp}
 361     // -- Step 2: Find out the 2D/3D correspondences
 362
 363     std::vector<cv::Point3f> list_points3d_model_match;    // container for the model 3D coordinates found in the scene
 364     std::vector<cv::Point2f> list_points2d_scene_match;    // container for the model 2D coordinates found in the scene
 365
 366     for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
 367     {
 368         cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ];   // 3D point from model
 369         cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt;    // 2D point from the scene
 370         list_points3d_model_match.push_back(point3d_model);                                      // add 3D point
 371         list_points2d_scene_match.push_back(point2d_scene);                                      // add 2D point
 372     }
 373     @endcode
 374     You can also change the ratio test threshold, the number of keypoints to detect as well as use or
 375     not the robust matcher:
 376     @code{.cpp}
 377     ./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
 378     @endcode
 379
 380 -#  **Pose estimation using PnP + Ransac**
 381
 382     Once with the 2D and 3D correspondences we have to apply a PnP algorithm in order to estimate the
 383     camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
 384     due to the fact that after the matching not all the found correspondences are correct and, as like
 385     as not, there are false correspondences or also called *outliers*. The [Random Sample
 386     Consensus](http://en.wikipedia.org/wiki/RANSAC) or *Ransac* is a non-deterministic iterative
 387     method which estimate parameters of a mathematical model from observed data producing an
 388     approximate result as the number of iterations increase. After applying *Ransac* all the *outliers*
 389     will be eliminated to then estimate the camera pose with a certain probability to obtain a good
 390     solution.
 391
 392     For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
 393     attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
 394     rotation-translation matrix. The intrinsic calibration parameters of the camera which you are
 395     using to estimate the pose are necessary. In order to obtain the parameters you can check
 396     @ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.
 397
 398     The following code is how to declare the *PnPProblem class* in the main program:
 399     @code{.cpp}
 400     // Intrinsic camera parameters: UVC WEBCAM
 401
 402     double f = 55;                           // focal length in mm
 403     double sx = 22.3, sy = 14.9;             // sensor size
 404     double width = 640, height = 480;        // image size
 405
 406     double params_WEBCAM[] = { width*f/sx,   // fx
 407                                height*f/sy,  // fy
 408                                width/2,      // cx
 409                                height/2};    // cy
 410
 411     PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
 412     @endcode
 413     The following code is how the *PnPProblem class* initialises its attributes:
 414     @code{.cpp}
 415     // Custom constructor given the intrinsic camera parameters
 416
 417     PnPProblem::PnPProblem(const double params[])
 418     {
 419       _A_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // intrinsic camera parameters
 420       _A_matrix.at<double>(0, 0) = params[0];       //      [ fx   0  cx ]
 421       _A_matrix.at<double>(1, 1) = params[1];       //      [  0  fy  cy ]
 422       _A_matrix.at<double>(0, 2) = params[2];       //      [  0   0   1 ]
 423       _A_matrix.at<double>(1, 2) = params[3];
 424       _A_matrix.at<double>(2, 2) = 1;
 425       _R_matrix = cv::Mat::zeros(3, 3, CV_64FC1);   // rotation matrix
 426       _t_matrix = cv::Mat::zeros(3, 1, CV_64FC1);   // translation matrix
 427       _P_matrix = cv::Mat::zeros(3, 4, CV_64FC1);   // rotation-translation matrix
 428
 429     }
 430     @endcode
 431     OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Depending on the application type,
 432     the estimation method will be different. In the case that we want to make a real time application,
 433     the more suitable methods are EPNP and P3P since they are faster than ITERATIVE and DLS at
 434     finding an optimal solution. However, EPNP and P3P are not especially robust in front of planar
 435     surfaces and sometimes the pose estimation seems to have a mirror effect. Therefore, in this
 436     tutorial an ITERATIVE method is used due to the object to be detected has planar surfaces.
 437
 438     The OpenCV RANSAC implementation wants you to provide three parameters: 1) the maximum number of
 439     iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
 440     computed point projections to consider it an inlier and 3) the confidence to obtain a good result.
 441     You can tune these parameters in order to improve your algorithm performance. Increasing the
 442     number of iterations will have a more accurate solution, but will take more time to find a
 443     solution. Increasing the reprojection error will reduce the computation time, but your solution
 444     will be unaccurate. Decreasing the confidence your algorithm will be faster, but the obtained
 445     solution will be unaccurate.
 446
 447     The following parameters work for this application:
 448     @code{.cpp}
 449     // RANSAC parameters
 450
 451     int iterationsCount = 500;        // number of Ransac iterations.
 452     float reprojectionError = 2.0;    // maximum allowed distance to consider it an inlier.
 453     float confidence = 0.95;          // RANSAC successful confidence.
 454     @endcode
 455     The following code corresponds to the *estimatePoseRANSAC()* function which belongs to the
 456     *PnPProblem class*. This function estimates the rotation and translation matrix given a set of
 457     2D/3D correspondences, the desired PnP method to use, the output inliers container and the Ransac
 458     parameters:
 459     @code{.cpp}
 460     // Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use
 461
 462     void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d,        // list with model 3D coordinates
 463                                          const std::vector<cv::Point2f> &list_points2d,        // list with scene 2D coordinates
 464                                          int flags, cv::Mat &inliers, int iterationsCount,     // PnP method; inliers container
 465                                          float reprojectionError, float confidence )           // RANSAC parameters
 466     {
 467         cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1);    // vector of distortion coefficients
 468         cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output rotation vector
 469         cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1);          // output translation vector
 470
 471         bool useExtrinsicGuess = false;   // if true the function uses the provided rvec and tvec values as
 472                                           // initial approximations of the rotation and translation vectors
 473
 474         cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
 475                             useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
 476                             inliers, flags );
 477
 478         Rodrigues(rvec,_R_matrix);                   // converts Rotation Vector to Matrix
 479         _t_matrix = tvec;                            // set translation matrix
 480
 481         this->set_P_matrix(_R_matrix, _t_matrix);    // set rotation-translation matrix
 482
 483     }
 484     @endcode
 485     In the following code are the 3th and 4th steps of the main algorithm. The first, calling the
 486     above function and the second taking the output inliers vector from RANSAC to get the 2D scene
 487     points for drawing purpose. As seen in the code we must be sure to apply RANSAC if we have
 488     matches, in the other case, the function @ref cv::solvePnPRansac crashes due to any OpenCV *bug*.
 489     @code{.cpp}
 490     if(good_matches.size() > 0) // None matches, then RANSAC crashes
 491     {
 492
 493         // -- Step 3: Estimate the pose using RANSAC approach
 494         pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
 495                                           pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );
 496
 497
 498         // -- Step 4: Catch the inliers keypoints to draw
 499         for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
 500         {
 501         int n = inliers_idx.at<int>(inliers_index);         // i-inlier
 502         cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
 503         list_points2d_inliers.push_back(point2d);           // add i-inlier to list
 504     }
 505     @endcode
 506     Finally, once the camera pose has been estimated we can use the \f$R\f$ and \f$t\f$ in order to compute
 507     the 2D projection onto the image of a given 3D point expressed in a world reference frame using
 508     the showed formula on *Theory*.
 509
 510     The following code corresponds to the *backproject3DPoint()* function which belongs to the
 511     *PnPProblem class*. The function backproject a given 3D point expressed in a world reference frame
 512     onto a 2D image:
 513     @code{.cpp}
 514     // Backproject a 3D point to 2D using the estimated pose parameters
 515
 516     cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
 517     {
 518         // 3D point vector [x y z 1]'
 519         cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
 520         point3d_vec.at<double>(0) = point3d.x;
 521         point3d_vec.at<double>(1) = point3d.y;
 522         point3d_vec.at<double>(2) = point3d.z;
 523         point3d_vec.at<double>(3) = 1;
 524
 525         // 2D point vector [u v 1]'
 526         cv::Mat point2d_vec = cv::Mat(4, 1, CV_64FC1);
 527         point2d_vec = _A_matrix * _P_matrix * point3d_vec;
 528
 529         // Normalization of [u v]'
 530         cv::Point2f point2d;
 531         point2d.x = point2d_vec.at<double>(0) / point2d_vec.at<double>(2);
 532         point2d.y = point2d_vec.at<double>(1) / point2d_vec.at<double>(2);
 533
 534         return point2d;
 535     }
 536     @endcode
 537     The above function is used to compute all the 3D points of the object *Mesh* to show the pose of
 538     the object.
 539
 540     You can also change RANSAC parameters and PnP method:
 541     @code{.cpp}
 542     ./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
 543     @endcode
 544
 545 -#  **Linear Kalman Filter for bad poses rejection**
 546
 547     Is it common in computer vision or robotics fields that after applying detection or tracking
 548     techniques, bad results are obtained due to some sensor errors. In order to avoid these bad
 549     detections in this tutorial is explained how to implement a Linear Kalman Filter. The Kalman
 550     Filter will be applied after detected a given number of inliers.
 551
 552     You can find more information about what [Kalman
 553     Filter](http://en.wikipedia.org/wiki/Kalman_filter) is. In this tutorial it's used the OpenCV
 554     implementation of the @ref cv::KalmanFilter based on
 555     [Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter)
 556     to set the dynamics and measurement models.
 557
 558     Firstly, we have to define our state vector which will have 18 states: the positional data (x,y,z)
 559     with its first and second derivatives (velocity and acceleration), then rotation is added in form
 560     of three euler angles (roll, pitch, jaw) together with their first and second derivatives (angular
 561     velocity and acceleration)
 562
 563     \f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]
 564
 565     Secondly, we have to define the number of measurements which will be 6: from \f$R\f$ and \f$t\f$ we can
 566     extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
 567     actions to apply to the system which in this case will be *zero*. Finally, we have to define the
 568     differential time between measurements which in this case is \f$1/T\f$, where *T* is the frame rate of
 569     the video.
 570     @code{.cpp}
 571     cv::KalmanFilter KF;         // instantiate Kalman Filter
 572
 573     int nStates = 18;            // the number of states
 574     int nMeasurements = 6;       // the number of measured states
 575     int nInputs = 0;             // the number of action control
 576
 577     double dt = 0.125;           // time between measurements (1/FPS)
 578
 579     initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt);    // init function
 580     @endcode
 581     The following code corresponds to the *Kalman Filter* initialisation. Firstly, is set the process
 582     noise, the measurement noise and the error covariance matrix. Secondly, are set the transition
 583     matrix which is the dynamic model and finally the measurement matrix, which is the measurement
 584     model.
 585
 586     You can tune the process and measurement noise to improve the *Kalman Filter* performance. As the
 587     measurement noise is reduced the faster will converge doing the algorithm sensitive in front of
 588     bad measurements.
 589     @code{.cpp}
 590     void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
 591     {
 592
 593       KF.init(nStates, nMeasurements, nInputs, CV_64F);                 // init Kalman Filter
 594
 595       cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5));       // set process noise
 596       cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4));   // set measurement noise
 597       cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));             // error covariance
 598
 599
 600                      /* DYNAMIC MODEL */
 601
 602       //  [1 0 0 dt  0  0 dt2   0   0 0 0 0  0  0  0   0   0   0]
 603       //  [0 1 0  0 dt  0   0 dt2   0 0 0 0  0  0  0   0   0   0]
 604       //  [0 0 1  0  0 dt   0   0 dt2 0 0 0  0  0  0   0   0   0]
 605       //  [0 0 0  1  0  0  dt   0   0 0 0 0  0  0  0   0   0   0]
 606       //  [0 0 0  0  1  0   0  dt   0 0 0 0  0  0  0   0   0   0]
 607       //  [0 0 0  0  0  1   0   0  dt 0 0 0  0  0  0   0   0   0]
 608       //  [0 0 0  0  0  0   1   0   0 0 0 0  0  0  0   0   0   0]
 609       //  [0 0 0  0  0  0   0   1   0 0 0 0  0  0  0   0   0   0]
 610       //  [0 0 0  0  0  0   0   0   1 0 0 0  0  0  0   0   0   0]
 611       //  [0 0 0  0  0  0   0   0   0 1 0 0 dt  0  0 dt2   0   0]
 612       //  [0 0 0  0  0  0   0   0   0 0 1 0  0 dt  0   0 dt2   0]
 613       //  [0 0 0  0  0  0   0   0   0 0 0 1  0  0 dt   0   0 dt2]
 614       //  [0 0 0  0  0  0   0   0   0 0 0 0  1  0  0  dt   0   0]
 615       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  1  0   0  dt   0]
 616       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  1   0   0  dt]
 617       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   1   0   0]
 618       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   1   0]
 619       //  [0 0 0  0  0  0   0   0   0 0 0 0  0  0  0   0   0   1]
 620
 621       // position
 622       KF.transitionMatrix.at<double>(0,3) = dt;
 623       KF.transitionMatrix.at<double>(1,4) = dt;
 624       KF.transitionMatrix.at<double>(2,5) = dt;
 625       KF.transitionMatrix.at<double>(3,6) = dt;
 626       KF.transitionMatrix.at<double>(4,7) = dt;
 627       KF.transitionMatrix.at<double>(5,8) = dt;
 628       KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
 629       KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
 630       KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);
 631
 632       // orientation
 633       KF.transitionMatrix.at<double>(9,12) = dt;
 634       KF.transitionMatrix.at<double>(10,13) = dt;
 635       KF.transitionMatrix.at<double>(11,14) = dt;
 636       KF.transitionMatrix.at<double>(12,15) = dt;
 637       KF.transitionMatrix.at<double>(13,16) = dt;
 638       KF.transitionMatrix.at<double>(14,17) = dt;
 639       KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
 640       KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
 641       KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);
 642
 643
 644            /* MEASUREMENT MODEL */
 645
 646       //  [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 647       //  [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 648       //  [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 649       //  [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
 650       //  [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
 651       //  [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]
 652
 653       KF.measurementMatrix.at<double>(0,0) = 1;  // x
 654       KF.measurementMatrix.at<double>(1,1) = 1;  // y
 655       KF.measurementMatrix.at<double>(2,2) = 1;  // z
 656       KF.measurementMatrix.at<double>(3,9) = 1;  // roll
 657       KF.measurementMatrix.at<double>(4,10) = 1; // pitch
 658       KF.measurementMatrix.at<double>(5,11) = 1; // yaw
 659
 660     }
 661     @endcode
 662     In the following code is the 5th step of the main algorithm. When the obtained number of inliers
 663     after *Ransac* is over the threshold, the measurements matrix is filled and then the *Kalman
 664     Filter* is updated:
 665     @code{.cpp}
 666     // -- Step 5: Kalman Filter
 667
 668     // GOOD MEASUREMENT
 669     if( inliers_idx.rows >= minInliersKalman )
 670     {
 671
 672         // Get the measured translation
 673         cv::Mat translation_measured(3, 1, CV_64F);
 674         translation_measured = pnp_detection.get_t_matrix();
 675
 676         // Get the measured rotation
 677         cv::Mat rotation_measured(3, 3, CV_64F);
 678         rotation_measured = pnp_detection.get_R_matrix();
 679
 680         // fill the measurements vector
 681         fillMeasurements(measurements, translation_measured, rotation_measured);
 682
 683     }
 684
 685     // Instantiate estimated translation and rotation
 686     cv::Mat translation_estimated(3, 1, CV_64F);
 687     cv::Mat rotation_estimated(3, 3, CV_64F);
 688
 689     // update the Kalman filter with good measurements
 690     updateKalmanFilter( KF, measurements,
 691                   translation_estimated, rotation_estimated);
 692     @endcode
 693     The following code corresponds to the *fillMeasurements()* function which converts the measured
 694     [Rotation Matrix to Eulers
 695     angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
 696     and fill the measurements matrix along with the measured translation vector:
 697     @code{.cpp}
 698     void fillMeasurements( cv::Mat &measurements,
 699                        const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
 700     {
 701         // Convert rotation matrix to euler angles
 702         cv::Mat measured_eulers(3, 1, CV_64F);
 703         measured_eulers = rot2euler(rotation_measured);
 704
 705         // Set measurement to predict
 706         measurements.at<double>(0) = translation_measured.at<double>(0); // x
 707         measurements.at<double>(1) = translation_measured.at<double>(1); // y
 708         measurements.at<double>(2) = translation_measured.at<double>(2); // z
 709         measurements.at<double>(3) = measured_eulers.at<double>(0);      // roll
 710         measurements.at<double>(4) = measured_eulers.at<double>(1);      // pitch
 711         measurements.at<double>(5) = measured_eulers.at<double>(2);      // yaw
 712     }
 713     @endcode
 714     The following code corresponds to the *updateKalmanFilter()* function which update the Kalman
 715     Filter and set the estimated Rotation Matrix and translation vector. The estimated Rotation Matrix
 716     comes from the estimated [Euler angles to Rotation
 717     Matrix](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm).
 718     @code{.cpp}
 719     void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
 720                          cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
 721     {
 722
 723         // First predict, to update the internal statePre variable
 724         cv::Mat prediction = KF.predict();
 725
 726         // The "correct" phase that is going to use the predicted value and our measurement
 727         cv::Mat estimated = KF.correct(measurement);
 728
 729         // Estimated translation
 730         translation_estimated.at<double>(0) = estimated.at<double>(0);
 731         translation_estimated.at<double>(1) = estimated.at<double>(1);
 732         translation_estimated.at<double>(2) = estimated.at<double>(2);
 733
 734         // Estimated euler angles
 735         cv::Mat eulers_estimated(3, 1, CV_64F);
 736         eulers_estimated.at<double>(0) = estimated.at<double>(9);
 737         eulers_estimated.at<double>(1) = estimated.at<double>(10);
 738         eulers_estimated.at<double>(2) = estimated.at<double>(11);
 739
 740         // Convert estimated quaternion to rotation matrix
 741         rotation_estimated = euler2rot(eulers_estimated);
 742
 743     }
 744     @endcode
 745     The 6th step is set the estimated rotation-translation matrix:
 746     @code{.cpp}
 747     // -- Step 6: Set estimated projection matrix
 748     pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
 749     @endcode
 750     The last and optional step is draw the found pose. To do it I implemented a function to draw all
 751     the mesh 3D points and an extra reference axis:
 752     @code{.cpp}
 753     // -- Step X: Draw pose
 754
 755     drawObjectMesh(frame_vis, &mesh, &pnp_detection, green);                // draw current pose
 756     drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow);           // draw estimated pose
 757
 758     double l = 5;
 759     std::vector<cv::Point2f> pose_points2d;
 760     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0)));    // axis center
 761     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0)));    // axis x
 762     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0)));    // axis y
 763     pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l)));    // axis z
 764     draw3DCoordinateAxes(frame_vis, pose_points2d);                                       // draw axes
 765     @endcode
 766     You can also modify the minimum inliers to update Kalman Filter:
 767     @code{.cpp}
 768     ./cpp-tutorial-pnp_detection --inliers=20
 769     @endcode
 770
 771 Results
 772 -------
 773
 774 The following videos are the results of pose estimation in real time using the explained detection
 775 algorithm using the following parameters:
 776 @code{.cpp}
 777 // Robust Matcher parameters
 778
 779 int numKeyPoints = 2000;      // number of detected keypoints
 780 float ratio = 0.70f;          // ratio test
 781 bool fast_match = true;       // fastRobustMatch() or robustMatch()
 782
 783
 784 // RANSAC parameters
 785
 786 int iterationsCount = 500;    // number of Ransac iterations.
 787 int reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
 788 float confidence = 0.95;      // ransac successful confidence.
 789
 790
 791 // Kalman Filter parameters
 792
 793 int minInliersKalman = 30;    // Kalman threshold updating
 794 @endcode
 795 You can watch the real time pose estimation on the [YouTube
 796 here](http://www.youtube.com/user/opencvdev/videos).
 797
 798 @youtube{XNATklaJlSQ}
 799 @youtube{YLS9bWek78k}