1 Real Time pose estimation of a textured object {#tutorial_real_time_pose}
2 ==============================================
4 @prev_tutorial{tutorial_camera_calibration}
5 @next_tutorial{tutorial_interactive_calibration}
Nowadays, augmented reality is one of the top research topics in the computer vision and robotics
fields. The most elemental problem in augmented reality is the estimation of the camera pose with
respect to an object, either to do some 3D rendering later (in computer vision) or to obtain the
object pose in order to grasp and manipulate it (in robotics). However, this is not a trivial
problem to solve, since the most common issue in image processing is the computational cost of
applying many algorithms or mathematical operations to solve a problem which is basic and
immediate for humans.
This tutorial explains how to build a real-time application to estimate the camera pose in
order to track a textured object with six degrees of freedom, given a 2D image and its 3D textured
model.
23 The application will have the following parts:
25 - Read 3D textured object model and object mesh.
26 - Take input from Camera or Video.
27 - Extract ORB features and descriptors from the scene.
28 - Match scene descriptors with model descriptors using Flann matcher.
29 - Pose estimation using PnP + Ransac.
30 - Linear Kalman Filter for bad poses rejection.
Theory
------

In computer vision, estimating the camera pose from *n* 3D-to-2D point correspondences is a fundamental
and well understood problem. The most general version of the problem requires estimating the six
degrees of freedom of the pose and five calibration parameters: focal length, principal point,
aspect ratio and skew. It can be solved with a minimum of 6 correspondences, using the well
known Direct Linear Transform (DLT) algorithm. There are, though, several simplifications to the
problem which turn into an extensive list of different algorithms that improve the accuracy of the
DLT.

The most common simplification is to assume known calibration parameters, which leads to the so-called
Perspective-*n*-Point problem:
48 **Problem Formulation:** Given a set of correspondences between 3D points \f$p_i\f$ expressed in a world
49 reference frame, and their 2D projections \f$u_i\f$ onto the image, we seek to retrieve the pose (\f$R\f$
50 and \f$t\f$) of the camera w.r.t. the world and the focal length \f$f\f$.
OpenCV provides four different approaches to solve the Perspective-*n*-Point problem which return
\f$R\f$ and \f$t\f$. Then, using the following formula, it is possible to project 3D points into the image
plane:
56 \f[s\ \left [ \begin{matrix} u \\ v \\ 1 \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z\\ 1 \end{matrix} \right ]\f]
The complete documentation on how to work with these equations is in @ref calib3d .
Source code
-----------

You can find the source code of this tutorial in the
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/` folder of the OpenCV source library.
66 The tutorial consists of two main programs:
68 -# **Model registration**
70 This application is exclusive to whom don't have a 3D textured model of the object to be detected.
71 You can use this program to create your own textured 3D model. This program only works for planar
72 objects, then if you want to model an object with complex shape you should use a sophisticated
73 software to create it.
The application needs an input image of the object to be registered and its 3D mesh. We also have
to provide the intrinsic parameters of the camera with which the input image was taken. All the
files need to be specified using an absolute path or a relative one from your application's
working directory. If no files are specified, the program will try to open the provided default
parameters.
The application starts by extracting the ORB features and descriptors from the input image and
then uses the mesh along with the [Möller–Trumbore intersection
algorithm](http://en.wikipedia.org/wiki/M%C3%B6ller%E2%80%93Trumbore_intersection_algorithm)
to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors
are stored in different lists in a file with YAML format, in which each row is a different point. The
technical background on how to store the files can be found in the @ref tutorial_file_input_output_with_xml_yml
tutorial.
89 ![](images/registration.png)
91 -# **Model detection**
The aim of this application is to estimate in real time the object pose given its 3D textured model.
The application starts by loading the 3D textured model in the YAML file format with the same
structure explained in the model registration program. From the scene, the ORB features and
descriptors are detected and extracted. Then, @ref cv::FlannBasedMatcher is used together with
@ref cv::flann::GenericIndex to do the matching between the scene descriptors and the model descriptors.
Using the found matches along with the @ref cv::solvePnPRansac function, the `R` and `t` of
the camera are computed. Finally, a Kalman filter is applied in order to reject bad poses.
If you compiled OpenCV with the samples, you can find it in `opencv/build/bin/cpp-tutorial-pnp_detection`.
103 Then you can run the application and change some parameters:
105 This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.
107 ./cpp-tutorial-pnp_detection -help
110 --------------------------------------------------------------------------
112 Usage: cpp-tutorial-pnp_detection [params]
-c, --confidence (value:0.95)
RANSAC confidence
116 -e, --error (value:2.0)
117 RANSAC reprojection error
118 -f, --fast (value:true)
119 use of robust fast match
-h, --help (value:true)
print this message
122 --in, --inliers (value:30)
123 minimum inliers for Kalman update
124 --it, --iterations (value:500)
125 RANSAC maximum iterations count
126 -k, --keypoints (value:2000)
127 number of keypoints to detect
130 --method, --pnp (value:0)
131 PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS
134 -r, --ratio (value:0.7)
135 threshold for ratio test
-v, --video
path to recorded video
For example, you can run the application changing the PnP method:
141 ./cpp-tutorial-pnp_detection --method=2
Explanation
-----------

Here the code of the real-time application is explained in detail:
149 -# **Read 3D textured object model and object mesh.**
In order to load the textured model, I implemented the *class* **Model** which has the function
*load()* that opens a YAML file and takes the stored 3D points with their corresponding descriptors.
153 You can find an example of a 3D textured model in
154 `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml`.
/* Load a YAML file using OpenCV */
void Model::load(const std::string path)
{
    cv::Mat points3d_mat;

    cv::FileStorage storage(path, cv::FileStorage::READ);
    storage["points_3d"] >> points3d_mat;
    storage["descriptors"] >> descriptors_;

    points3d_mat.copyTo(list_points3d_in_);

    storage.release();
}
171 In the main program the model is loaded as follows:
173 Model model; // instantiate Model object
174 model.load(yml_read_path); // load a 3D textured object model
In order to read the model mesh, I implemented a *class* **Mesh** which has a function *load()*
that opens a \f$*\f$.ply file and stores the 3D points of the object and also the composing triangles.
178 You can find an example of a model mesh in
179 `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply`.
/* Load a CSV with *.ply format */
void Mesh::load(const std::string path)
{
    // Create the reader
    CsvReader csvReader(path);

    // Clear previous data
    list_vertex_.clear();
    list_triangles_.clear();

    // Read from .ply file
    csvReader.readPLY(list_vertex_, list_triangles_);

    // Update mesh attributes
    num_vertexs_ = list_vertex_.size();
    num_triangles_ = list_triangles_.size();
}
201 In the main program the mesh is loaded as follows:
203 Mesh mesh; // instantiate Mesh object
204 mesh.load(ply_read_path); // load an object mesh
You can also load a different model and mesh:
208 ./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
211 -# **Take input from Camera or Video**
In order to detect, it is necessary to capture video. This is done by loading a recorded video,
passing the absolute path to where it is located on your machine. In order to test the application,
you can find a recorded
215 video in `samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4`.
cv::VideoCapture cap;      // instantiate VideoCapture
cap.open(video_read_path); // open a recorded video

if(!cap.isOpened())        // check if we succeeded
{
    std::cout << "Could not open the camera device" << std::endl;
    return -1;
}
Then the algorithm is computed frame by frame:
cv::Mat frame, frame_vis;

while(cap.read(frame) && cv::waitKey(30) != 27) // capture frame until ESC is pressed
{
    frame_vis = frame.clone(); // refresh visualisation frame

    // ... main detection algorithm (the steps explained below)
}
You can also load a different recorded video:
241 ./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
244 -# **Extract ORB features and descriptors from the scene**
The next step is to detect the scene features and extract their descriptors. For this task I
implemented a *class* **RobustMatcher** which has functions for keypoint detection and feature
extraction. You can find it in
`samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobustMatcher.cpp`. In your
*RobustMatcher* object you can use any of the 2D feature detectors of OpenCV. In this case I used
@ref cv::ORB features, because it is based on @ref cv::FAST to detect the keypoints and on
cv::xfeatures2d::BriefDescriptorExtractor to extract the descriptors, which means that it is fast
and robust to rotations. You can find more detailed information about *ORB* in the documentation.
The following code shows how to instantiate and set the feature detector and the descriptor
extractor:
258 RobustMatcher rmatcher; // instantiate RobustMatcher
cv::Ptr<cv::FeatureDetector> detector = cv::ORB::create(numKeyPoints); // instantiate ORB feature detector
cv::Ptr<cv::DescriptorExtractor> extractor = cv::ORB::create();        // instantiate ORB descriptor extractor
263 rmatcher.setFeatureDetector(detector); // set feature detector
264 rmatcher.setDescriptorExtractor(extractor); // set descriptor extractor
266 The features and descriptors will be computed by the *RobustMatcher* inside the matching function.
268 -# **Match scene descriptors with model descriptors using Flann matcher**
This is the first step in our detection algorithm. The main idea is to match the scene descriptors
with our model descriptors in order to know the 3D coordinates of the found features in the
scene.

Firstly, we have to set which matcher we want to use. In this case @ref cv::FlannBasedMatcher is
used, which in terms of computational cost becomes faster than the @ref cv::BFMatcher as the
trained collection of features grows. Then, for the FLANN-based matcher the index created is
*Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search*, because *ORB*
descriptors are binary.
You can tune the *LSH* and search parameters to improve the matching efficiency (the three
`LshIndexParams` arguments are the number of hash tables, the hash key size in bits and the
multi-probe level):
282 cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters
283 cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50); // instantiate flann search parameters
cv::Ptr<cv::DescriptorMatcher> matcher = cv::makePtr<cv::FlannBasedMatcher>(indexParams, searchParams); // instantiate FlannBased matcher
286 rmatcher.setDescriptorMatcher(matcher); // set matcher
Secondly, we have to call the matcher by using the *robustMatch()* or *fastRobustMatch()* function.
The difference between these two functions is their computational cost. The first method is slower
but more robust at filtering good matches because it uses two ratio tests and a symmetry test. In
contrast, the second method is faster but less robust because it only applies a single ratio test to
the matches.

The following code gets the model 3D points with their descriptors and then calls the matcher in
the main program:
297 // Get the MODEL INFO
299 std::vector<cv::Point3f> list_points3d_model = model.get_points3d(); // list with model 3D coordinates
300 cv::Mat descriptors_model = model.get_descriptors(); // list with descriptors of each 3D coordinate
// -- Step 1: Robust matching between model descriptors and scene descriptors

std::vector<cv::DMatch> good_matches;      // to obtain the model 3D points in the scene
std::vector<cv::KeyPoint> keypoints_scene; // to obtain the 2D points of the scene

if(fast_match)
{
    rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
else
{
    rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model);
}
The following code corresponds to the *robustMatch()* function, which belongs to the
*RobustMatcher* class. This function uses the given image to detect the keypoints and extract the
descriptors. It then matches the extracted descriptors with the given model descriptors using
*two nearest neighbours*, and vice versa. Next, a ratio test is applied to the matches in both
directions in order to remove those matches whose distance ratio between the first and second best
match is larger than a given threshold. Finally, a symmetry test is applied in order to remove
non-symmetrical matches:
void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,
                                 std::vector<cv::KeyPoint>& keypoints_frame,
                                 const cv::Mat& descriptors_model )
{
    // 1a. Detection of the ORB features
    this->computeKeyPoints(frame, keypoints_frame);

    // 1b. Extraction of the ORB descriptors
    cv::Mat descriptors_frame;
    this->computeDescriptors(frame, keypoints_frame, descriptors_frame);

    // 2. Match the two image descriptors
    std::vector<std::vector<cv::DMatch> > matches12, matches21;

    // 2a. From image 1 to image 2
    matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours

    // 2b. From image 2 to image 1
    matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours

    // 3. Remove matches for which NN ratio is > than threshold
    // clean image 1 -> image 2 matches
    int removed1 = ratioTest(matches12);
    // clean image 2 -> image 1 matches
    int removed2 = ratioTest(matches21);

    // 4. Remove non-symmetrical matches
    symmetryTest(matches12, matches21, good_matches);
}
After the match filtering, we have to extract the 2D and 3D correspondences from the found scene
keypoints and our 3D model using the obtained *DMatches* vector. For more information about
@ref cv::DMatch check the documentation.
361 // -- Step 2: Find out the 2D/3D correspondences
363 std::vector<cv::Point3f> list_points3d_model_match; // container for the model 3D coordinates found in the scene
364 std::vector<cv::Point2f> list_points2d_scene_match; // container for the model 2D coordinates found in the scene
for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)
{
    cv::Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ]; // 3D point from model
    cv::Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt;  // 2D point from the scene
    list_points3d_model_match.push_back(point3d_model); // add 3D point
    list_points2d_scene_match.push_back(point2d_scene); // add 2D point
}
You can also change the ratio test threshold, the number of keypoints to detect, and whether to
use the fast robust matcher:
377 ./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
380 -# **Pose estimation using PnP + Ransac**
Once we have the 2D and 3D correspondences, we have to apply a PnP algorithm in order to estimate the
camera pose. The reason why we have to use @ref cv::solvePnPRansac instead of @ref cv::solvePnP is
that after the matching not all the found correspondences are correct: as likely as not, there are
false correspondences, also called *outliers*. [Random Sample
Consensus](http://en.wikipedia.org/wiki/RANSAC), or *Ransac*, is a non-deterministic iterative
method which estimates the parameters of a mathematical model from observed data, producing an
increasingly accurate result as the number of iterations grows. After applying *Ransac*, all the
*outliers* will be eliminated, and the camera pose can then be estimated with a certain probability
of obtaining a good solution.
392 For the camera pose estimation I have implemented a *class* **PnPProblem**. This *class* has 4
393 attributes: a given calibration matrix, the rotation matrix, the translation matrix and the
rotation-translation matrix. The intrinsic calibration parameters of the camera you are using to
estimate the pose are necessary. In order to obtain them, you can check the
@ref tutorial_camera_calibration_square_chess and @ref tutorial_camera_calibration tutorials.
The following code shows how to declare the *PnPProblem class* in the main program:
400 // Intrinsic camera parameters: UVC WEBCAM
402 double f = 55; // focal length in mm
403 double sx = 22.3, sy = 14.9; // sensor size
404 double width = 640, height = 480; // image size
double params_WEBCAM[] = { width*f/sx,   // fx
                           height*f/sy,  // fy
                           width/2,      // cx
                           height/2 };   // cy

PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
The following code shows how the *PnPProblem class* initialises its attributes:
415 // Custom constructor given the intrinsic camera parameters
PnPProblem::PnPProblem(const double params[])
{
    _A_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // intrinsic camera parameters
    _A_matrix.at<double>(0, 0) = params[0];     // [ fx  0 cx ]
    _A_matrix.at<double>(1, 1) = params[1];     // [  0 fy cy ]
    _A_matrix.at<double>(0, 2) = params[2];     // [  0  0  1 ]
    _A_matrix.at<double>(1, 2) = params[3];
    _A_matrix.at<double>(2, 2) = 1;
    _R_matrix = cv::Mat::zeros(3, 3, CV_64FC1); // rotation matrix
    _t_matrix = cv::Mat::zeros(3, 1, CV_64FC1); // translation matrix
    _P_matrix = cv::Mat::zeros(3, 4, CV_64FC1); // rotation-translation matrix
}
OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Depending on the application, the
most suitable estimation method differs. If we want to make a real-time application, the more
suitable methods are EPNP and P3P, since they are faster than ITERATIVE and DLS at finding an
optimal solution. However, EPNP and P3P are not especially robust on planar surfaces, and sometimes
the pose estimation shows a mirror effect. Therefore, in this tutorial the ITERATIVE method is
used, because the object to be detected has planar surfaces.
The OpenCV RANSAC implementation requires you to provide three parameters: 1) the maximum number of
iterations until the algorithm stops, 2) the maximum allowed distance between the observed and
computed point projections to consider a correspondence an inlier, and 3) the confidence of
obtaining a good result. You can tune these parameters to trade accuracy against speed. Increasing
the number of iterations yields a more accurate solution, but takes more time. Increasing the
reprojection error reduces the computation time, but the solution becomes less accurate. Decreasing
the confidence makes the algorithm faster, but again at the cost of accuracy.
447 The following parameters work for this application:
451 int iterationsCount = 500; // number of Ransac iterations.
452 float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.
453 float confidence = 0.95; // RANSAC successful confidence.
The following code corresponds to the *estimatePoseRANSAC()* function, which belongs to the
*PnPProblem class*. This function estimates the rotation and translation matrices given a set of
2D/3D correspondences, the desired PnP method to use, the output inliers container and the RANSAC
parameters:
460 // Estimate the pose given a list of 2D/3D correspondences with RANSAC and the method to use
void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d,    // list with model 3D coordinates
                                     const std::vector<cv::Point2f> &list_points2d,    // list with scene 2D coordinates
                                     int flags, cv::Mat &inliers, int iterationsCount, // PnP method; inliers container
                                     float reprojectionError, float confidence )       // RANSAC parameters
{
    cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1); // vector of distortion coefficients
    cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1);       // output rotation vector
    cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1);       // output translation vector

    bool useExtrinsicGuess = false; // if true the function uses the provided rvec and tvec values as
                                    // initial approximations of the rotation and translation vectors

    cv::solvePnPRansac( list_points3d, list_points2d, _A_matrix, distCoeffs, rvec, tvec,
                        useExtrinsicGuess, iterationsCount, reprojectionError, confidence,
                        inliers, flags );

    cv::Rodrigues(rvec, _R_matrix); // converts Rotation Vector to Matrix
    _t_matrix = tvec;               // set translation matrix

    this->set_P_matrix(_R_matrix, _t_matrix); // set rotation-translation matrix
}
The following code contains the 3rd and 4th steps of the main algorithm: first, calling the
above function and, second, taking the output inliers vector from RANSAC to get the 2D scene
points for drawing purposes. As seen in the code, we must make sure to apply RANSAC only if we
have matches; otherwise, the function @ref cv::solvePnPRansac crashes due to an OpenCV *bug*.
if(good_matches.size() > 0) // with no matches, RANSAC crashes
{
    // -- Step 3: Estimate the pose using RANSAC approach
    pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,
                                      pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );

    // -- Step 4: Catch the inliers keypoints to draw
    for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)
    {
        int n = inliers_idx.at<int>(inliers_index);         // i-inlier
        cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D
        list_points2d_inliers.push_back(point2d);           // add i-inlier to list
    }
}
Finally, once the camera pose has been estimated, we can use \f$R\f$ and \f$t\f$ to compute
the 2D projection onto the image of a given 3D point expressed in a world reference frame, using
the formula shown in the *Theory* section.
The following code corresponds to the *backproject3DPoint()* function, which belongs to the
*PnPProblem class*. The function backprojects a given 3D point, expressed in a world reference
frame, onto the image:
// Backproject a 3D point to 2D using the estimated pose parameters

cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)
{
    // 3D point vector [x y z 1]'
    cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);
    point3d_vec.at<double>(0) = point3d.x;
    point3d_vec.at<double>(1) = point3d.y;
    point3d_vec.at<double>(2) = point3d.z;
    point3d_vec.at<double>(3) = 1;

    // 2D point vector [u v 1]'
    cv::Mat point2d_vec = cv::Mat(3, 1, CV_64FC1);
    point2d_vec = _A_matrix * _P_matrix * point3d_vec;

    // Normalization of [u v]'
    cv::Point2f point2d;
    point2d.x = (float)(point2d_vec.at<double>(0) / point2d_vec.at<double>(2));
    point2d.y = (float)(point2d_vec.at<double>(1) / point2d_vec.at<double>(2));

    return point2d;
}
The above function is used to compute all the 3D points of the object *Mesh* to show the pose of
the object.
540 You can also change RANSAC parameters and PnP method:
542 ./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
545 -# **Linear Kalman Filter for bad poses rejection**
It is common in the computer vision and robotics fields that, after applying detection or tracking
techniques, bad results are obtained due to sensor errors. In order to avoid these bad detections,
this tutorial explains how to implement a Linear Kalman Filter. The Kalman Filter will be applied
after a given number of inliers is detected.
You can find more information about the [Kalman
Filter](http://en.wikipedia.org/wiki/Kalman_filter) at the link. This tutorial uses the OpenCV
implementation, @ref cv::KalmanFilter, with the dynamics and measurement models set following
[Linear Kalman Filter for position and orientation tracking](http://campar.in.tum.de/Chair/KalmanFilter).
Firstly, we have to define our state vector, which will have 18 states: the positional data (x,y,z)
with its first and second derivatives (velocity and acceleration), followed by the rotation in the
form of three Euler angles (roll, pitch, yaw) together with their first and second derivatives
(angular velocity and acceleration):
563 \f[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\f]
Secondly, we have to define the number of measurements, which will be 6: from \f$R\f$ and \f$t\f$ we can
extract \f$(x,y,z)\f$ and \f$(\psi,\theta,\phi)\f$. In addition, we have to define the number of control
actions to apply to the system, which in this case will be *zero*. Finally, we have to define the
differential time between measurements, which in this case is \f$1/T\f$, where *T* is the frame rate of
the video:
571 cv::KalmanFilter KF; // instantiate Kalman Filter
573 int nStates = 18; // the number of states
574 int nMeasurements = 6; // the number of measured states
int nInputs = 0;       // the number of control actions
577 double dt = 0.125; // time between measurements (1/FPS)
579 initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt); // init function
The following code corresponds to the *Kalman Filter* initialisation. Firstly, the process noise,
the measurement noise and the error covariance matrix are set. Secondly, the transition matrix,
which is the dynamic model, is set, and finally the measurement matrix, which is the measurement
model.

You can tune the process and measurement noise to improve the *Kalman Filter* performance. As the
measurement noise is reduced, the filter converges faster, but it also becomes more sensitive to
bad measurements:
void initKalmanFilter(cv::KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)
{
    KF.init(nStates, nMeasurements, nInputs, CV_64F);               // init Kalman Filter

    cv::setIdentity(KF.processNoiseCov, cv::Scalar::all(1e-5));     // set process noise
    cv::setIdentity(KF.measurementNoiseCov, cv::Scalar::all(1e-4)); // set measurement noise
    cv::setIdentity(KF.errorCovPost, cv::Scalar::all(1));           // error covariance

    /* DYNAMIC MODEL */
    // [1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0 0]
    // [0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0]
    // [0 0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0]
    // [0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0 0]
    // [0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0]
    // [0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2]
    // [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0]
    // [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0]
    // [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt]
    // [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]
    // [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]
    // [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]

    // position
    KF.transitionMatrix.at<double>(0,3) = dt;
    KF.transitionMatrix.at<double>(1,4) = dt;
    KF.transitionMatrix.at<double>(2,5) = dt;
    KF.transitionMatrix.at<double>(3,6) = dt;
    KF.transitionMatrix.at<double>(4,7) = dt;
    KF.transitionMatrix.at<double>(5,8) = dt;
    KF.transitionMatrix.at<double>(0,6) = 0.5*pow(dt,2);
    KF.transitionMatrix.at<double>(1,7) = 0.5*pow(dt,2);
    KF.transitionMatrix.at<double>(2,8) = 0.5*pow(dt,2);

    // orientation
    KF.transitionMatrix.at<double>(9,12) = dt;
    KF.transitionMatrix.at<double>(10,13) = dt;
    KF.transitionMatrix.at<double>(11,14) = dt;
    KF.transitionMatrix.at<double>(12,15) = dt;
    KF.transitionMatrix.at<double>(13,16) = dt;
    KF.transitionMatrix.at<double>(14,17) = dt;
    KF.transitionMatrix.at<double>(9,15) = 0.5*pow(dt,2);
    KF.transitionMatrix.at<double>(10,16) = 0.5*pow(dt,2);
    KF.transitionMatrix.at<double>(11,17) = 0.5*pow(dt,2);
    /* MEASUREMENT MODEL */

    // [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    // [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    // [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
    // [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]

    KF.measurementMatrix.at<double>(0,0) = 1;  // x
    KF.measurementMatrix.at<double>(1,1) = 1;  // y
    KF.measurementMatrix.at<double>(2,2) = 1;  // z
    KF.measurementMatrix.at<double>(3,9) = 1;  // roll
    KF.measurementMatrix.at<double>(4,10) = 1; // pitch
    KF.measurementMatrix.at<double>(5,11) = 1; // yaw
}
The following code contains the 5th step of the main algorithm. When the number of inliers obtained
after *Ransac* is over the threshold, the measurements matrix is filled and the *Kalman Filter* is
updated:
// -- Step 5: Kalman Filter

// GOOD MEASUREMENT
if( inliers_idx.rows >= minInliersKalman )
{
    // Get the measured translation
    cv::Mat translation_measured(3, 1, CV_64F);
    translation_measured = pnp_detection.get_t_matrix();

    // Get the measured rotation
    cv::Mat rotation_measured(3, 3, CV_64F);
    rotation_measured = pnp_detection.get_R_matrix();

    // fill the measurements vector
    fillMeasurements(measurements, translation_measured, rotation_measured);

    good_measurement = true;
}

// Instantiate estimated translation and rotation
cv::Mat translation_estimated(3, 1, CV_64F);
cv::Mat rotation_estimated(3, 3, CV_64F);

// update the Kalman filter with good measurements
updateKalmanFilter( KF, measurements,
                    translation_estimated, rotation_estimated);
The following code corresponds to the *fillMeasurements()* function, which converts the measured
[rotation matrix to Euler
angles](http://euclideanspace.com/maths/geometry/rotations/conversions/matrixToEuler/index.htm)
and fills the measurements matrix along with the measured translation vector:
void fillMeasurements( cv::Mat &measurements,
                       const cv::Mat &translation_measured, const cv::Mat &rotation_measured)
{
    // Convert rotation matrix to euler angles
    cv::Mat measured_eulers(3, 1, CV_64F);
    measured_eulers = rot2euler(rotation_measured);

    // Set measurement to predict
    measurements.at<double>(0) = translation_measured.at<double>(0); // x
    measurements.at<double>(1) = translation_measured.at<double>(1); // y
    measurements.at<double>(2) = translation_measured.at<double>(2); // z
    measurements.at<double>(3) = measured_eulers.at<double>(0);      // roll
    measurements.at<double>(4) = measured_eulers.at<double>(1);      // pitch
    measurements.at<double>(5) = measured_eulers.at<double>(2);      // yaw
}
The following code corresponds to the *updateKalmanFilter()* function, which updates the Kalman
Filter and sets the estimated rotation matrix and translation vector. The estimated rotation matrix
is computed from the estimated Euler angles ([Euler angles to Rotation
Matrix](http://euclideanspace.com/maths/geometry/rotations/conversions/eulerToMatrix/index.htm)).
void updateKalmanFilter( cv::KalmanFilter &KF, cv::Mat &measurement,
                         cv::Mat &translation_estimated, cv::Mat &rotation_estimated )
{
    // First predict, to update the internal statePre variable
    cv::Mat prediction = KF.predict();

    // The "correct" phase that is going to use the predicted value and our measurement
    cv::Mat estimated = KF.correct(measurement);

    // Estimated translation
    translation_estimated.at<double>(0) = estimated.at<double>(0);
    translation_estimated.at<double>(1) = estimated.at<double>(1);
    translation_estimated.at<double>(2) = estimated.at<double>(2);

    // Estimated euler angles
    cv::Mat eulers_estimated(3, 1, CV_64F);
    eulers_estimated.at<double>(0) = estimated.at<double>(9);
    eulers_estimated.at<double>(1) = estimated.at<double>(10);
    eulers_estimated.at<double>(2) = estimated.at<double>(11);

    // Convert estimated Euler angles to rotation matrix
    rotation_estimated = euler2rot(eulers_estimated);
}
The 6th step is to set the estimated rotation-translation matrix:
747 // -- Step 6: Set estimated projection matrix
748 pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);
The last and optional step is to draw the found pose. To do it, I implemented a function to draw all
the mesh 3D points and an extra reference axis:
753 // -- Step X: Draw pose
755 drawObjectMesh(frame_vis, &mesh, &pnp_detection, green); // draw current pose
756 drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow); // draw estimated pose
float l = 5; // axis length (illustrative value)
std::vector<cv::Point2f> pose_points2d;
760 pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,0))); // axis center
761 pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(l,0,0))); // axis x
762 pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,l,0))); // axis y
763 pose_points2d.push_back(pnp_detection_est.backproject3DPoint(cv::Point3f(0,0,l))); // axis z
764 draw3DCoordinateAxes(frame_vis, pose_points2d); // draw axes
You can also modify the minimum number of inliers needed to update the Kalman Filter:
768 ./cpp-tutorial-pnp_detection --inliers=20
Results
-------

The following videos show the results of real-time pose estimation using the detection algorithm
explained above with the following parameters:
777 // Robust Matcher parameters
779 int numKeyPoints = 2000; // number of detected keypoints
780 float ratio = 0.70f; // ratio test
781 bool fast_match = true; // fastRobustMatch() or robustMatch()
// RANSAC parameters

int iterationsCount = 500;     // number of Ransac iterations.
float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.
float confidence = 0.95;       // RANSAC successful confidence.
791 // Kalman Filter parameters
int minInliersKalman = 30; // Kalman update threshold
You can watch the real-time pose estimation on [YouTube](http://www.youtube.com/user/opencvdev/videos).
798 @youtube{XNATklaJlSQ}
799 @youtube{YLS9bWek78k}