Merge pull request #20422 from fengyuentau:dnn_face
authorYuantao Feng <yuantao.feng@outlook.com>
Fri, 8 Oct 2021 19:13:49 +0000 (03:13 +0800)
committerGitHub <noreply@github.com>
Fri, 8 Oct 2021 19:13:49 +0000 (19:13 +0000)
Add DNN-based face detection and face recognition into modules/objdetect

* Add DNN-based face detector impl and interface

* Add a sample for DNN-based face detector

* add recog

* add notes

* move samples from samples/cpp to samples/dnn

* add documentation for dnn_face

* add set/get methods for input size, nms & score threshold and topk

* remove the DNN prefix from the face detector and face recognizer

* remove default values in the constructor of impl

* regenerate priors after setting input size

* two filenames for readnet

* Update face.hpp

* Update face_recognize.cpp

* Update face_match.cpp

* Update face.hpp

* Update face_recognize.cpp

* Update face_match.cpp

* Update face_recognize.cpp

* Update dnn_face.markdown

* Update dnn_face.markdown

* Update face.hpp

* Update dnn_face.markdown

* add regression test for face detection

* remove underscore prefix; fix warnings

* add reference & acknowledgement for face detection

* Update dnn_face.markdown

* Update dnn_face.markdown

* Update ts.hpp

* Update test_face.cpp

* Update face_match.cpp

* fix a compile error for python interface; add python examples for face detection and recognition

* Major changes for Vadim's comments:

* Replace class name FaceDetector with FaceDetectorYN in related files

* Declare local mat before loop in modules/objdetect/src/face_detect.cpp

* Make input image and save flag optional in samples/dnn/face_detect(.cpp, .py)

* Add camera support in samples/dnn/face_detect(.cpp, .py)

* correct file paths for regression test

* fix conversion warnings; remove extra spaces

* update face_recog

* Update dnn_face.markdown

* Fix warnings and errors for the default CI reports:

* Remove trailing white spaces and extra new lines.

* Fix conversion warnings for Windows and iOS.

* Add braces around initialization of subobjects.

* Fix warnings and errors for the default CI systems:

* Add prefix 'FR_' for each value name in enum DisType to solve the
redefinition error for iOS compilation; Modify other code accordingly

* Add bookmark '#tutorial_dnn_face' to solve warnings from doxygen

* Correct documentations to solve warnings from doxygen

* update FaceRecognizerSF

* Fix the error for CI to find ONNX models correctly

* add suffix f to float assignments

* add backend & target options for initializing face recognizer

* add checkeq for checking input size and preset size

* update test and threshold

* changes in response to alalek's comments:

* fix typos in samples/dnn/face_match.py

* import numpy before importing cv2

* add documentation to .setInputSize()

* remove extra include in face_recognize.cpp

* fix some bugs

* Update dnn_face.markdown

* update thresholds; remove useless code

* add time suffix to YuNet filename in test

* objdetect: update test code

17 files changed:
doc/tutorials/dnn/dnn_face/dnn_face.markdown [new file with mode: 0644]
doc/tutorials/dnn/dnn_text_spotting/dnn_text_spotting.markdown
doc/tutorials/dnn/table_of_content_dnn.markdown
modules/objdetect/CMakeLists.txt
modules/objdetect/include/opencv2/objdetect.hpp
modules/objdetect/include/opencv2/objdetect/face.hpp [new file with mode: 0644]
modules/objdetect/src/face_detect.cpp [new file with mode: 0644]
modules/objdetect/src/face_recognize.cpp [new file with mode: 0644]
modules/objdetect/test/test_face.cpp [new file with mode: 0644]
modules/objdetect/test/test_main.cpp
modules/ts/include/opencv2/ts.hpp
samples/dnn/CMakeLists.txt
samples/dnn/face_detect.cpp [new file with mode: 0644]
samples/dnn/face_detect.py [new file with mode: 0644]
samples/dnn/face_match.cpp [new file with mode: 0644]
samples/dnn/face_match.py [new file with mode: 0644]
samples/dnn/results/audrybt1.jpg [new file with mode: 0644]

diff --git a/doc/tutorials/dnn/dnn_face/dnn_face.markdown b/doc/tutorials/dnn/dnn_face/dnn_face.markdown
new file mode 100644 (file)
index 0000000..e5092b8
--- /dev/null
@@ -0,0 +1,95 @@
+# DNN-based Face Detection And Recognition {#tutorial_dnn_face}
+
+@tableofcontents
+
+@prev_tutorial{tutorial_dnn_text_spotting}
+@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
+
+| | |
+| -: | :- |
+| Original Author | Chengrui Wang, Yuantao Feng |
+| Compatibility | OpenCV >= 4.5.1 |
+
+## Introduction
+
+In this section, we introduce the DNN-based module for face detection and face recognition. The models can be obtained in [Models](#Models). The usage of `FaceDetectorYN` and `FaceRecognizerSF` is presented in [Usage](#Usage).
+
+## Models
+
+This module requires two pre-trained models (ONNX format):
+- [Face Detection](https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx):
+    - Size: 337KB
+    - Results on WIDER Face Val set: 0.830(easy), 0.824(medium), 0.708(hard)
+- [Face Recognition](https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view?usp=sharing)
+    - Size: 36.9MB
+    - Results:
+
+    | Database | Accuracy | Threshold (normL2) | Threshold (cosine) |
+    | -------- | -------- | ------------------ | ------------------ |
+    | LFW      | 99.60%   | 1.128              | 0.363              |
+    | CALFW    | 93.95%   | 1.149              | 0.340              |
+    | CPLFW    | 91.05%   | 1.204              | 0.275              |
+    | AgeDB-30 | 94.90%   | 1.202              | 0.277              |
+    | CFP-FP   | 94.80%   | 1.253              | 0.212              |
+
+## Usage
+
+### FaceDetectorYN
+
+```cpp
+// Initialize FaceDetectorYN
+Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(onnx_path, "", image.size(), score_thresh, nms_thresh, top_k);
+
+// Forward
+Mat faces;
+faceDetector->detect(image, faces);
+```
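+
+Note that `detect` requires the input image size to match the size the detector was created with (or the size last set). If they differ, update the network input size first (a minimal sketch):
+
+```cpp
+// The detector checks that the input image size matches the preset size,
+// so update it before calling detect() on a differently-sized image
+faceDetector->setInputSize(image.size());
+faceDetector->detect(image, faces);
+```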
+
+The detection output `faces` is a two-dimensional array of type CV_32F, where each row is a detected face instance. Each row contains the location of the face bounding box, 5 facial landmarks and the detection score, in the following format:
+
+```
+x1, y1, w, h, x_re, y_re, x_le, y_le, x_nt, y_nt, x_rcm, y_rcm, x_lcm, y_lcm, score
+```
+where `x1, y1, w, h` are the top-left coordinates, width and height of the face bounding box, `{x, y}_{re, le, nt, rcm, lcm}` stands for the coordinates of the right eye, left eye, nose tip, right mouth corner and left mouth corner respectively, and `score` is the detection confidence.
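+
+For instance, the fields of a detected face can be read back as follows (a minimal sketch; `i` is a hypothetical row index assumed to be valid):
+
+```cpp
+int i = 0; // index of the detected face to inspect
+Rect2f faceBox(faces.at<float>(i, 0), faces.at<float>(i, 1),  // x1, y1
+               faces.at<float>(i, 2), faces.at<float>(i, 3)); // w, h
+Point2f rightEye(faces.at<float>(i, 4), faces.at<float>(i, 5));
+float score = faces.at<float>(i, 14);
+```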
+
+
+### Face Recognition
+
+Following face detection, run the code below to extract a face feature from the facial image.
+
+```cpp
+// Initialize FaceRecognizer with model path (cv::String)
+Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(model_path, "");
+
+// Align and crop the facial image using the first of the faces detected by FaceDetectorYN
+Mat aligned_face;
+faceRecognizer->alignCrop(image, faces.row(0), aligned_face);
+
+// Run feature extraction with given aligned_face (cv::Mat)
+Mat feature;
+faceRecognizer->feature(aligned_face, feature);
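+// feature() output may alias the network's internal memory (assumption); clone to keep an independent copy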
+feature = feature.clone();
+```
+
+After obtaining the face features *feature1* and *feature2* of two facial images, run the code below to calculate the identity discrepancy between the two faces.
+
+```cpp
+// Calculate the discrepancy between two face features using cosine similarity.
+double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+// Calculate the discrepancy between two face features using normL2 distance.
+double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+```
+
+For example, two faces can be judged to have the same identity if the cosine similarity is greater than or equal to 0.363, or if the normL2 distance is less than or equal to 1.128.
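+
+Expressed in code (mirroring the thresholds above and the logic of samples/dnn/face_match.cpp):
+
+```cpp
+// Decide identity using the LFW thresholds from the table above
+bool sameIdentityCosine = (cos_score >= 0.363); // higher cosine score means more similar
+bool sameIdentityL2     = (L2_score  <= 1.128); // lower normL2 distance means more similar
+```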
+
+## References
+
+- https://github.com/ShiqiYu/libfacedetection
+- https://github.com/ShiqiYu/libfacedetection.train
+- https://github.com/zhongyy/SFace
+
+## Acknowledgement
+
+Thanks [Professor Shiqi Yu](https://github.com/ShiqiYu/) and [Yuantao Feng](https://github.com/fengyuentau) for training and providing the face detection model.
+
+Thanks [Professor Deng](http://www.whdeng.cn/), [PhD Candidate Zhong](https://github.com/zhongyy/) and [Master Candidate Wang](https://github.com/crywang/) for training and providing the face recognition model.
index b0be262..5c46594 100644 (file)
@@ -3,7 +3,7 @@
 @tableofcontents
 
 @prev_tutorial{tutorial_dnn_OCR}
-@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
+@next_tutorial{tutorial_dnn_face}
 
 |    |    |
 | -: | :- |
index 0d5e43e..3f74826 100644 (file)
@@ -10,6 +10,7 @@ Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
 -   @subpage tutorial_dnn_custom_layers
 -   @subpage tutorial_dnn_OCR
 -   @subpage tutorial_dnn_text_spotting
+-   @subpage tutorial_dnn_face
 
 #### PyTorch models with OpenCV
 In this section you will find the guides, which describe how to run classification, segmentation and detection PyTorch DNN models with OpenCV.
index 3fa0c5d..f4d5b22 100644 (file)
@@ -1,5 +1,5 @@
 set(the_description "Object Detection")
-ocv_define_module(objdetect opencv_core opencv_imgproc opencv_calib3d WRAP java objc python js)
+ocv_define_module(objdetect opencv_core opencv_imgproc opencv_calib3d opencv_dnn WRAP java objc python js)
 
 if(HAVE_QUIRC)
     get_property(QUIRC_INCLUDE GLOBAL PROPERTY QUIRC_INCLUDE_DIR)
index eaee129..59dca73 100644 (file)
@@ -768,5 +768,6 @@ protected:
 }
 
 #include "opencv2/objdetect/detection_based_tracker.hpp"
+#include "opencv2/objdetect/face.hpp"
 
 #endif
diff --git a/modules/objdetect/include/opencv2/objdetect/face.hpp b/modules/objdetect/include/opencv2/objdetect/face.hpp
new file mode 100644 (file)
index 0000000..f2429c5
--- /dev/null
@@ -0,0 +1,125 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#ifndef OPENCV_OBJDETECT_FACE_HPP
+#define OPENCV_OBJDETECT_FACE_HPP
+
+#include <opencv2/core.hpp>
+
+/** @defgroup dnn_face DNN-based face detection and recognition
+ */
+
+namespace cv
+{
+
+/** @brief DNN-based face detector, model download link: https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.
+ */
+class CV_EXPORTS_W FaceDetectorYN
+{
+public:
+    virtual ~FaceDetectorYN() {};
+
+    /** @brief Set the size for the network input, which overwrites the input size specified when creating the model. Call this method when the size of the input image does not match the size specified at creation
+     *
+     * @param input_size the size of the input image
+     */
+    CV_WRAP virtual void setInputSize(const Size& input_size) = 0;
+
+    CV_WRAP virtual Size getInputSize() = 0;
+
+    /** @brief Set the score threshold used to filter out bounding boxes with scores lower than the given value
+     *
+     * @param score_threshold threshold for filtering out bounding boxes
+     */
+    CV_WRAP virtual void setScoreThreshold(float score_threshold) = 0;
+
+    CV_WRAP virtual float getScoreThreshold() = 0;
+
+    /** @brief Set the non-maximum suppression threshold used to suppress bounding boxes that have IoU greater than the given value
+     *
+     * @param nms_threshold threshold for NMS operation
+     */
+    CV_WRAP virtual void setNMSThreshold(float nms_threshold) = 0;
+
+    CV_WRAP virtual float getNMSThreshold() = 0;
+
+    /** @brief Set the number of bounding boxes preserved before NMS
+     *
+     * @param top_k the number of bounding boxes to preserve from top rank based on score
+     */
+    CV_WRAP virtual void setTopK(int top_k) = 0;
+
+    CV_WRAP virtual int getTopK() = 0;
+
+    /** @brief A simple interface to detect faces in a given image
+     *
+     *  @param image an image in which to detect faces
+     *  @param faces detection results stored in a cv::Mat
+     */
+    CV_WRAP virtual int detect(InputArray image, OutputArray faces) = 0;
+
+    /** @brief Creates an instance of this class with given parameters
+     *
+     *  @param model the path to the requested model
+     *  @param config the path to the config file for compatibility, which is not required for ONNX models
+     *  @param input_size the size of the input image
+     *  @param score_threshold the threshold to filter out bounding boxes of score smaller than the given value
+     *  @param nms_threshold the threshold to suppress bounding boxes of IoU bigger than the given value
+     *  @param top_k keep top K bboxes before NMS
+     *  @param backend_id the id of backend
+     *  @param target_id the id of target device
+     */
+    CV_WRAP static Ptr<FaceDetectorYN> create(const String& model,
+                                              const String& config,
+                                              const Size& input_size,
+                                              float score_threshold = 0.9f,
+                                              float nms_threshold = 0.3f,
+                                              int top_k = 5000,
+                                              int backend_id = 0,
+                                              int target_id = 0);
+};
+
+/** @brief DNN-based face recognizer, model download link: https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view.
+ */
+class CV_EXPORTS_W FaceRecognizerSF
+{
+public:
+    virtual ~FaceRecognizerSF() {};
+
+    /** @brief Definition of distance used for calculating the distance between two face features
+     */
+    enum DisType { FR_COSINE=0, FR_NORM_L2=1 };
+
+    /** @brief Aligns the image to put the face in a standard position
+     *  @param src_img input image
+     *  @param face_box the detection result used to indicate the face in the input image
+     *  @param aligned_img output aligned image
+     */
+    CV_WRAP virtual void alignCrop(InputArray src_img, InputArray face_box, OutputArray aligned_img) const = 0;
+
+    /** @brief Extracts a face feature from an aligned image
+     *  @param aligned_img input aligned image
+     *  @param face_feature output face feature
+     */
+    CV_WRAP virtual void feature(InputArray aligned_img, OutputArray face_feature) = 0;
+
+    /** @brief Calculates the distance between two face features
+     *  @param _face_feature1 the first input feature
+     *  @param _face_feature2 the second input feature of the same size and the same type as _face_feature1
+     *  @param dis_type defining the similarity with optional values "FR_COSINE" or "FR_NORM_L2"
+     */
+    CV_WRAP virtual double match(InputArray _face_feature1, InputArray _face_feature2, int dis_type = FaceRecognizerSF::FR_COSINE) const = 0;
+
+    /** @brief Creates an instance of this class with given parameters
+     *  @param model the path of the onnx model used for face recognition
+     *  @param config the path to the config file for compatibility, which is not required for ONNX models
+     *  @param backend_id the id of backend
+     *  @param target_id the id of target device
+     */
+    CV_WRAP static Ptr<FaceRecognizerSF> create(const String& model, const String& config, int backend_id = 0, int target_id = 0);
+};
+
+} // namespace cv
+
+#endif
diff --git a/modules/objdetect/src/face_detect.cpp b/modules/objdetect/src/face_detect.cpp
new file mode 100644 (file)
index 0000000..4095745
--- /dev/null
@@ -0,0 +1,288 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "precomp.hpp"
+
+#include "opencv2/imgproc.hpp"
+#include "opencv2/core.hpp"
+#include "opencv2/dnn.hpp"
+
+#include <algorithm>
+
+namespace cv
+{
+
+class FaceDetectorYNImpl : public FaceDetectorYN
+{
+public:
+    FaceDetectorYNImpl(const String& model,
+                       const String& config,
+                       const Size& input_size,
+                       float score_threshold,
+                       float nms_threshold,
+                       int top_k,
+                       int backend_id,
+                       int target_id)
+    {
+        net = dnn::readNet(model, config);
+        CV_Assert(!net.empty());
+
+        net.setPreferableBackend(backend_id);
+        net.setPreferableTarget(target_id);
+
+        inputW = input_size.width;
+        inputH = input_size.height;
+
+        scoreThreshold = score_threshold;
+        nmsThreshold = nms_threshold;
+        topK = top_k;
+
+        generatePriors();
+    }
+
+    void setInputSize(const Size& input_size) override
+    {
+        inputW = input_size.width;
+        inputH = input_size.height;
+        generatePriors();
+    }
+
+    Size getInputSize() override
+    {
+        Size input_size;
+        input_size.width = inputW;
+        input_size.height = inputH;
+        return input_size;
+    }
+
+    void setScoreThreshold(float score_threshold) override
+    {
+        scoreThreshold = score_threshold;
+    }
+
+    float getScoreThreshold() override
+    {
+        return scoreThreshold;
+    }
+
+    void setNMSThreshold(float nms_threshold) override
+    {
+        nmsThreshold = nms_threshold;
+    }
+
+    float getNMSThreshold() override
+    {
+        return nmsThreshold;
+    }
+
+    void setTopK(int top_k) override
+    {
+        topK = top_k;
+    }
+
+    int getTopK() override
+    {
+        return topK;
+    }
+
+    int detect(InputArray input_image, OutputArray faces) override
+    {
+        // TODO: more input checks may be needed
+        if (input_image.empty())
+        {
+            return 0;
+        }
+        CV_CheckEQ(input_image.size(), Size(inputW, inputH), "Size does not match. Call setInputSize(size) if input size does not match the preset size");
+
+        // Build blob from input image
+        Mat input_blob = dnn::blobFromImage(input_image);
+
+        // Forward
+        std::vector<String> output_names = { "loc", "conf", "iou" };
+        std::vector<Mat> output_blobs;
+        net.setInput(input_blob);
+        net.forward(output_blobs, output_names);
+
+        // Post process
+        Mat results = postProcess(output_blobs);
+        results.convertTo(faces, CV_32FC1);
+        return 1;
+    }
+private:
+    void generatePriors()
+    {
+        // Calculate shapes of different scales according to the shape of input image
+        Size feature_map_2nd = {
+            int(int((inputW+1)/2)/2), int(int((inputH+1)/2)/2)
+        };
+        Size feature_map_3rd = {
+            int(feature_map_2nd.width/2), int(feature_map_2nd.height/2)
+        };
+        Size feature_map_4th = {
+            int(feature_map_3rd.width/2), int(feature_map_3rd.height/2)
+        };
+        Size feature_map_5th = {
+            int(feature_map_4th.width/2), int(feature_map_4th.height/2)
+        };
+        Size feature_map_6th = {
+            int(feature_map_5th.width/2), int(feature_map_5th.height/2)
+        };
+
+        std::vector<Size> feature_map_sizes;
+        feature_map_sizes.push_back(feature_map_3rd);
+        feature_map_sizes.push_back(feature_map_4th);
+        feature_map_sizes.push_back(feature_map_5th);
+        feature_map_sizes.push_back(feature_map_6th);
+
+        // Fixed params for generating priors
+        const std::vector<std::vector<float>> min_sizes = {
+            {10.0f,  16.0f,  24.0f},
+            {32.0f,  48.0f},
+            {64.0f,  96.0f},
+            {128.0f, 192.0f, 256.0f}
+        };
+        const std::vector<int> steps = { 8, 16, 32, 64 };
+
+        // Generate priors
+        priors.clear();
+        for (size_t i = 0; i < feature_map_sizes.size(); ++i)
+        {
+            Size feature_map_size = feature_map_sizes[i];
+            std::vector<float> min_size = min_sizes[i];
+
+            for (int _h = 0; _h < feature_map_size.height; ++_h)
+            {
+                for (int _w = 0; _w < feature_map_size.width; ++_w)
+                {
+                    for (size_t j = 0; j < min_size.size(); ++j)
+                    {
+                        float s_kx = min_size[j] / inputW;
+                        float s_ky = min_size[j] / inputH;
+
+                        float cx = (_w + 0.5f) * steps[i] / inputW;
+                        float cy = (_h + 0.5f) * steps[i] / inputH;
+
+                        Rect2f prior = { cx, cy, s_kx, s_ky };
+                        priors.push_back(prior);
+                    }
+                }
+            }
+        }
+    }
+
+    Mat postProcess(const std::vector<Mat>& output_blobs)
+    {
+        // Extract from output_blobs
+        Mat loc = output_blobs[0];
+        Mat conf = output_blobs[1];
+        Mat iou = output_blobs[2];
+
+        // Decode from deltas and priors
+        const std::vector<float> variance = {0.1f, 0.2f};
+        float* loc_v = (float*)(loc.data);
+        float* conf_v = (float*)(conf.data);
+        float* iou_v = (float*)(iou.data);
+        Mat faces;
+        // (tl_x, tl_y, w, h, re_x, re_y, le_x, le_y, nt_x, nt_y, rcm_x, rcm_y, lcm_x, lcm_y, score)
+        // 'tl': top left point of the bounding box
+        // 're': right eye, 'le': left eye
+        // 'nt':  nose tip
+        // 'rcm': right corner of mouth, 'lcm': left corner of mouth
+        Mat face(1, 15, CV_32FC1);
+        for (size_t i = 0; i < priors.size(); ++i) {
+            // Get score
+            float clsScore = conf_v[i*2+1];
+            float iouScore = iou_v[i];
+            // Clamp
+            if (iouScore < 0.f) {
+                iouScore = 0.f;
+            }
+            else if (iouScore > 1.f) {
+                iouScore = 1.f;
+            }
+            float score = std::sqrt(clsScore * iouScore);
+            face.at<float>(0, 14) = score;
+
+            // Get bounding box
+            float cx = (priors[i].x + loc_v[i*14+0] * variance[0] * priors[i].width)  * inputW;
+            float cy = (priors[i].y + loc_v[i*14+1] * variance[0] * priors[i].height) * inputH;
+            float w  = priors[i].width  * exp(loc_v[i*14+2] * variance[0]) * inputW;
+            float h  = priors[i].height * exp(loc_v[i*14+3] * variance[1]) * inputH;
+            float x1 = cx - w / 2;
+            float y1 = cy - h / 2;
+            face.at<float>(0, 0) = x1;
+            face.at<float>(0, 1) = y1;
+            face.at<float>(0, 2) = w;
+            face.at<float>(0, 3) = h;
+
+            // Get landmarks
+            face.at<float>(0, 4) = (priors[i].x + loc_v[i*14+ 4] * variance[0] * priors[i].width)  * inputW;  // right eye, x
+            face.at<float>(0, 5) = (priors[i].y + loc_v[i*14+ 5] * variance[0] * priors[i].height) * inputH;  // right eye, y
+            face.at<float>(0, 6) = (priors[i].x + loc_v[i*14+ 6] * variance[0] * priors[i].width)  * inputW;  // left eye, x
+            face.at<float>(0, 7) = (priors[i].y + loc_v[i*14+ 7] * variance[0] * priors[i].height) * inputH;  // left eye, y
+            face.at<float>(0, 8) = (priors[i].x + loc_v[i*14+ 8] * variance[0] * priors[i].width)  * inputW;  // nose tip, x
+            face.at<float>(0, 9) = (priors[i].y + loc_v[i*14+ 9] * variance[0] * priors[i].height) * inputH;  // nose tip, y
+            face.at<float>(0, 10) = (priors[i].x + loc_v[i*14+10] * variance[0] * priors[i].width)  * inputW; // right corner of mouth, x
+            face.at<float>(0, 11) = (priors[i].y + loc_v[i*14+11] * variance[0] * priors[i].height) * inputH; // right corner of mouth, y
+            face.at<float>(0, 12) = (priors[i].x + loc_v[i*14+12] * variance[0] * priors[i].width)  * inputW; // left corner of mouth, x
+            face.at<float>(0, 13) = (priors[i].y + loc_v[i*14+13] * variance[0] * priors[i].height) * inputH; // left corner of mouth, y
+
+            faces.push_back(face);
+        }
+
+        if (faces.rows > 1)
+        {
+            // Retrieve boxes and scores
+            std::vector<Rect2i> faceBoxes;
+            std::vector<float> faceScores;
+            for (int rIdx = 0; rIdx < faces.rows; rIdx++)
+            {
+                faceBoxes.push_back(Rect2i(int(faces.at<float>(rIdx, 0)),
+                                           int(faces.at<float>(rIdx, 1)),
+                                           int(faces.at<float>(rIdx, 2)),
+                                           int(faces.at<float>(rIdx, 3))));
+                faceScores.push_back(faces.at<float>(rIdx, 14));
+            }
+
+            std::vector<int> keepIdx;
+            dnn::NMSBoxes(faceBoxes, faceScores, scoreThreshold, nmsThreshold, keepIdx, 1.f, topK);
+
+            // Get NMS results
+            Mat nms_faces;
+            for (int idx: keepIdx)
+            {
+                nms_faces.push_back(faces.row(idx));
+            }
+            return nms_faces;
+        }
+        else
+        {
+            return faces;
+        }
+    }
+private:
+    dnn::Net net;
+
+    int inputW;
+    int inputH;
+    float scoreThreshold;
+    float nmsThreshold;
+    int topK;
+
+    std::vector<Rect2f> priors;
+};
+
+Ptr<FaceDetectorYN> FaceDetectorYN::create(const String& model,
+                                           const String& config,
+                                           const Size& input_size,
+                                           const float score_threshold,
+                                           const float nms_threshold,
+                                           const int top_k,
+                                           const int backend_id,
+                                           const int target_id)
+{
+    return makePtr<FaceDetectorYNImpl>(model, config, input_size, score_threshold, nms_threshold, top_k, backend_id, target_id);
+}
+
+} // namespace cv
diff --git a/modules/objdetect/src/face_recognize.cpp b/modules/objdetect/src/face_recognize.cpp
new file mode 100644 (file)
index 0000000..6550a13
--- /dev/null
@@ -0,0 +1,182 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "precomp.hpp"
+
+#include "opencv2/dnn.hpp"
+
+#include <algorithm>
+
+namespace cv
+{
+
+class FaceRecognizerSFImpl : public FaceRecognizerSF
+{
+public:
+    FaceRecognizerSFImpl(const String& model, const String& config, int backend_id, int target_id)
+    {
+        net = dnn::readNet(model, config);
+        CV_Assert(!net.empty());
+
+        net.setPreferableBackend(backend_id);
+        net.setPreferableTarget(target_id);
+    };
+    void alignCrop(InputArray _src_img, InputArray _face_mat, OutputArray _aligned_img) const override
+    {
+        Mat face_mat = _face_mat.getMat();
+        float src_point[5][2];
+        for (int row = 0; row < 5; ++row)
+        {
+            for(int col = 0; col < 2; ++col)
+            {
+                src_point[row][col] = face_mat.at<float>(0, row*2+col+4);
+            }
+        }
+        Mat warp_mat = getSimilarityTransformMatrix(src_point);
+        warpAffine(_src_img, _aligned_img, warp_mat, Size(112, 112), INTER_LINEAR);
+    };
+    void feature(InputArray _aligned_img, OutputArray _face_feature) override
+    {
+        Mat inputBlob = dnn::blobFromImage(_aligned_img, 1, Size(112, 112), Scalar(0, 0, 0), true, false);
+        net.setInput(inputBlob);
+        net.forward(_face_feature);
+    };
+    double match(InputArray _face_feature1, InputArray _face_feature2, int dis_type) const override
+    {
+        Mat face_feature1 = _face_feature1.getMat(), face_feature2 = _face_feature2.getMat();
+        face_feature1 /= norm(face_feature1);
+        face_feature2 /= norm(face_feature2);
+
+        if(dis_type == DisType::FR_COSINE){
+            return sum(face_feature1.mul(face_feature2))[0];
+        }else if(dis_type == DisType::FR_NORM_L2){
+            return norm(face_feature1, face_feature2);
+        }else{
+            throw std::invalid_argument("invalid parameter " + std::to_string(dis_type));
+        }
+
+    };
+
+private:
+    Mat getSimilarityTransformMatrix(float src[5][2]) const {
+        float dst[5][2] = { {38.2946f, 51.6963f}, {73.5318f, 51.5014f}, {56.0252f, 71.7366f}, {41.5493f, 92.3655f}, {70.7299f, 92.2041f} };
+        float avg0 = (src[0][0] + src[1][0] + src[2][0] + src[3][0] + src[4][0]) / 5;
+        float avg1 = (src[0][1] + src[1][1] + src[2][1] + src[3][1] + src[4][1]) / 5;
+        //Compute mean of src and dst.
+        float src_mean[2] = { avg0, avg1 };
+        float dst_mean[2] = { 56.0262f, 71.9008f };
+        //Subtract mean from src and dst.
+        float src_demean[5][2];
+        for (int i = 0; i < 2; i++)
+        {
+            for (int j = 0; j < 5; j++)
+            {
+                src_demean[j][i] = src[j][i] - src_mean[i];
+            }
+        }
+        float dst_demean[5][2];
+        for (int i = 0; i < 2; i++)
+        {
+            for (int j = 0; j < 5; j++)
+            {
+                dst_demean[j][i] = dst[j][i] - dst_mean[i];
+            }
+        }
+        double A00 = 0.0, A01 = 0.0, A10 = 0.0, A11 = 0.0;
+        for (int i = 0; i < 5; i++)
+            A00 += dst_demean[i][0] * src_demean[i][0];
+        A00 = A00 / 5;
+        for (int i = 0; i < 5; i++)
+            A01 += dst_demean[i][0] * src_demean[i][1];
+        A01 = A01 / 5;
+        for (int i = 0; i < 5; i++)
+            A10 += dst_demean[i][1] * src_demean[i][0];
+        A10 = A10 / 5;
+        for (int i = 0; i < 5; i++)
+            A11 += dst_demean[i][1] * src_demean[i][1];
+        A11 = A11 / 5;
+        Mat A = (Mat_<double>(2, 2) << A00, A01, A10, A11);
+        double d[2] = { 1.0, 1.0 };
+        double detA = A00 * A11 - A01 * A10;
+        if (detA < 0)
+            d[1] = -1;
+        double T[3][3] = { {1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0} };
+        Mat s, u, vt, v;
+        SVD::compute(A, s, u, vt);
+        double smax = s.ptr<double>(0)[0]>s.ptr<double>(1)[0] ? s.ptr<double>(0)[0] : s.ptr<double>(1)[0];
+        double tol = smax * 2 * FLT_MIN;
+        int rank = 0;
+        if (s.ptr<double>(0)[0]>tol)
+            rank += 1;
+        if (s.ptr<double>(1)[0]>tol)
+            rank += 1;
+        double arr_u[2][2] = { {u.ptr<double>(0)[0], u.ptr<double>(0)[1]}, {u.ptr<double>(1)[0], u.ptr<double>(1)[1]} };
+        double arr_vt[2][2] = { {vt.ptr<double>(0)[0], vt.ptr<double>(0)[1]}, {vt.ptr<double>(1)[0], vt.ptr<double>(1)[1]} };
+        double det_u = arr_u[0][0] * arr_u[1][1] - arr_u[0][1] * arr_u[1][0];
+        double det_vt = arr_vt[0][0] * arr_vt[1][1] - arr_vt[0][1] * arr_vt[1][0];
+        if (rank == 1)
+        {
+            if ((det_u*det_vt) > 0)
+            {
+                Mat uvt = u*vt;
+                T[0][0] = uvt.ptr<double>(0)[0];
+                T[0][1] = uvt.ptr<double>(0)[1];
+                T[1][0] = uvt.ptr<double>(1)[0];
+                T[1][1] = uvt.ptr<double>(1)[1];
+            }
+            else
+            {
+                double temp = d[1];
+                d[1] = -1;
+                Mat D = (Mat_<double>(2, 2) << d[0], 0.0, 0.0, d[1]);
+                Mat Dvt = D*vt;
+                Mat uDvt = u*Dvt;
+                T[0][0] = uDvt.ptr<double>(0)[0];
+                T[0][1] = uDvt.ptr<double>(0)[1];
+                T[1][0] = uDvt.ptr<double>(1)[0];
+                T[1][1] = uDvt.ptr<double>(1)[1];
+                d[1] = temp;
+            }
+        }
+        else
+        {
+            Mat D = (Mat_<double>(2, 2) << d[0], 0.0, 0.0, d[1]);
+            Mat Dvt = D*vt;
+            Mat uDvt = u*Dvt;
+            T[0][0] = uDvt.ptr<double>(0)[0];
+            T[0][1] = uDvt.ptr<double>(0)[1];
+            T[1][0] = uDvt.ptr<double>(1)[0];
+            T[1][1] = uDvt.ptr<double>(1)[1];
+        }
+        double var1 = 0.0;
+        for (int i = 0; i < 5; i++)
+            var1 += src_demean[i][0] * src_demean[i][0];
+        var1 = var1 / 5;
+        double var2 = 0.0;
+        for (int i = 0; i < 5; i++)
+            var2 += src_demean[i][1] * src_demean[i][1];
+        var2 = var2 / 5;
+        double scale = 1.0 / (var1 + var2)* (s.ptr<double>(0)[0] * d[0] + s.ptr<double>(1)[0] * d[1]);
+        double TS[2];
+        TS[0] = T[0][0] * src_mean[0] + T[0][1] * src_mean[1];
+        TS[1] = T[1][0] * src_mean[0] + T[1][1] * src_mean[1];
+        T[0][2] = dst_mean[0] - scale*TS[0];
+        T[1][2] = dst_mean[1] - scale*TS[1];
+        T[0][0] *= scale;
+        T[0][1] *= scale;
+        T[1][0] *= scale;
+        T[1][1] *= scale;
+        Mat transform_mat = (Mat_<double>(2, 3) << T[0][0], T[0][1], T[0][2], T[1][0], T[1][1], T[1][2]);
+        return transform_mat;
+    }
+private:
+    dnn::Net net;
+};
+
+Ptr<FaceRecognizerSF> FaceRecognizerSF::create(const String& model, const String& config, int backend_id, int target_id)
+{
+    return makePtr<FaceRecognizerSFImpl>(model, config, backend_id, target_id);
+}
+
+} // namespace cv
diff --git a/modules/objdetect/test/test_face.cpp b/modules/objdetect/test/test_face.cpp
new file mode 100644 (file)
index 0000000..2e944c5
--- /dev/null
@@ -0,0 +1,219 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "test_precomp.hpp"
+
+namespace opencv_test { namespace {
+
+// label format:
+//   image_name
+//   num_face
+//   face_1
+//   face_..
+//   face_num
+std::map<std::string, Mat> blobFromTXT(const std::string& path, int numCoords)
+{
+    std::ifstream ifs(path.c_str());
+    CV_Assert(ifs.is_open());
+
+    std::map<std::string, Mat> gt;
+
+    Mat faces;
+    int faceNum = -1;
+    int faceCount = 0;
+    for (std::string line, key; getline(ifs, line); )
+    {
+        std::istringstream iss(line);
+        if (line.find(".png") != std::string::npos)
+        {
+            // Get filename
+            iss >> key;
+        }
+        else if (line.find(" ") == std::string::npos)
+        {
+            // Get the number of faces
+            iss >> faceNum;
+        }
+        else
+        {
+            // Get faces
+            Mat face(1, numCoords, CV_32FC1);
+            for (int j = 0; j < numCoords; j++)
+            {
+                iss >> face.at<float>(0, j);
+            }
+            faces.push_back(face);
+            faceCount++;
+        }
+
+        if (faceCount == faceNum)
+        {
+            // Store faces
+            gt[key] = faces;
+
+            faces.release();
+            faceNum = -1;
+            faceCount = 0;
+        }
+    }
+
+    return gt;
+}
+
+TEST(Objdetect_face_detection, regression)
+{
+    // Pre-set params
+    float scoreThreshold = 0.7f;
+    float matchThreshold = 0.9f;
+    float l2disThreshold = 5.0f;
+    int numLM = 5;
+    int numCoords = 4 + 2 * numLM;
+
+    // Load ground truth labels
+    std::map<std::string, Mat> gt = blobFromTXT(findDataFile("dnn_face/detection/cascades_labels.txt"), numCoords);
+    // for (auto item: gt)
+    // {
+    //     std::cout << item.first << " " << item.second.size() << std::endl;
+    // }
+
+    // Initialize detector
+    std::string model = findDataFile("dnn/onnx/models/yunet-202109.onnx", false);
+    Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(model, "", Size(300, 300));
+    faceDetector->setScoreThreshold(0.7f);
+
+    // Detect and match
+    for (auto item: gt)
+    {
+        std::string imagePath = findDataFile("cascadeandhog/images/" + item.first);
+        Mat image = imread(imagePath);
+
+        // Set input size
+        faceDetector->setInputSize(image.size());
+
+        // Run detection
+        Mat faces;
+        faceDetector->detect(image, faces);
+        // std::cout << item.first << " " << item.second.rows << " " << faces.rows << std::endl;
+
+        // Match bboxes and landmarks
+        std::vector<bool> matchedItem(item.second.rows, false);
+        for (int i = 0; i < faces.rows; i++)
+        {
+            if (faces.at<float>(i, numCoords) < scoreThreshold)
+                continue;
+
+            bool boxMatched = false;
+            std::vector<bool> lmMatched(numLM, false);
+            cv::Rect2f resBox(faces.at<float>(i, 0), faces.at<float>(i, 1), faces.at<float>(i, 2), faces.at<float>(i, 3));
+            for (int j = 0; j < item.second.rows && !boxMatched; j++)
+            {
+                if (matchedItem[j])
+                    continue;
+
+                // Retrieve bbox and compare IoU
+                cv::Rect2f gtBox(item.second.at<float>(j, 0), item.second.at<float>(j, 1), item.second.at<float>(j, 2), item.second.at<float>(j, 3));
+                double interArea = (resBox & gtBox).area();
+                double iou = interArea / (resBox.area() + gtBox.area() - interArea);
+                if (iou >= matchThreshold)
+                {
+                    boxMatched = true;
+                    matchedItem[j] = true;
+                }
+
+                // Match landmarks if bbox is matched
+                if (!boxMatched)
+                    continue;
+                for (int lmIdx = 0; lmIdx < numLM; lmIdx++)
+                {
+                    float gtX = item.second.at<float>(j, 4 + 2 * lmIdx);
+                    float gtY = item.second.at<float>(j, 4 + 2 * lmIdx + 1);
+                    float resX = faces.at<float>(i, 4 + 2 * lmIdx);
+                    float resY = faces.at<float>(i, 4 + 2 * lmIdx + 1);
+                    float l2dis = cv::sqrt((gtX - resX) * (gtX - resX) + (gtY - resY) * (gtY - resY));
+
+                    if (l2dis <= l2disThreshold)
+                    {
+                        lmMatched[lmIdx] = true;
+                    }
+                }
+            }
+            EXPECT_TRUE(boxMatched) << "In image " << item.first << ", cannot match resBox " << resBox << " with any ground truth.";
+            if (boxMatched)
+            {
+                EXPECT_TRUE(std::all_of(lmMatched.begin(), lmMatched.end(), [](bool v) { return v; })) << "In image " << item.first << ", resBox " << resBox << " matched but its landmarks failed to match.";
+            }
+        }
+    }
+}
+
+TEST(Objdetect_face_recognition, regression)
+{
+    // Pre-set params
+    float score_thresh = 0.9f;
+    float nms_thresh = 0.3f;
+    double cosine_similar_thresh = 0.363;
+    double l2norm_similar_thresh = 1.128;
+
+    // Load ground truth labels
+    std::ifstream ifs(findDataFile("dnn_face/recognition/cascades_label.txt").c_str());
+    CV_Assert(ifs.is_open());
+
+    std::set<std::string> fSet;
+    std::map<std::string, Mat> featureMap;
+    std::map<std::pair<std::string, std::string>, int> gtMap;
+
+
+    for (std::string line, key; getline(ifs, line);)
+    {
+        std::string fname1, fname2;
+        int label;
+        std::istringstream iss(line);
+        iss>>fname1>>fname2>>label;
+        // std::cout<<fname1<<" "<<fname2<<" "<<label<<std::endl;
+
+        fSet.insert(fname1);
+        fSet.insert(fname2);
+        gtMap[std::make_pair(fname1, fname2)] = label;
+    }
+
+    // Initialize detector
+    std::string detect_model = findDataFile("dnn/onnx/models/yunet-202109.onnx", false);
+    Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(detect_model, "", Size(150, 150), score_thresh, nms_thresh);
+
+    std::string recog_model = findDataFile("dnn/onnx/models/face_recognizer_fast.onnx", false);
+    Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(recog_model, "");
+
+    // Detect and match
+    for (auto fname: fSet)
+    {
+        std::string imagePath = findDataFile("dnn_face/recognition/" + fname);
+        Mat image = imread(imagePath);
+
+        Mat faces;
+        faceDetector->detect(image, faces);
+
+        Mat aligned_face;
+        faceRecognizer->alignCrop(image, faces.row(0), aligned_face);
+
+        Mat feature;
+        faceRecognizer->feature(aligned_face, feature);
+
+        featureMap[fname] = feature.clone();
+    }
+
+    for (auto item: gtMap)
+    {
+        Mat feature1 = featureMap[item.first.first];
+        Mat feature2 = featureMap[item.first.second];
+        int label = item.second;
+
+        double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+        double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+
+        EXPECT_TRUE(label == 0 ? cos_score <= cosine_similar_thresh : cos_score > cosine_similar_thresh) << "Cosine match result of images " << item.first.first << " and " << item.first.second << " is different from ground truth (score: " << cos_score << "; threshold: " << cosine_similar_thresh << ").";
+        EXPECT_TRUE(label == 0 ? L2_score > l2norm_similar_thresh : L2_score <= l2norm_similar_thresh) << "L2norm match result of images " << item.first.first << " and " << item.first.second << " is different from ground truth (score: " << L2_score << "; threshold: " << l2norm_similar_thresh << ").";
+    }
+}
+
+}} // namespace
index 93e4d28..4031f05 100644 (file)
@@ -7,4 +7,19 @@
     #include <hpx/hpx_main.hpp>
 #endif
 
-CV_TEST_MAIN("cv")
+static
+void initTests()
+{
+#ifdef HAVE_OPENCV_DNN
+    const char* extraTestDataPath =
+#ifdef WINRT
+        NULL;
+#else
+        getenv("OPENCV_DNN_TEST_DATA_PATH");
+#endif
+    if (extraTestDataPath)
+        cvtest::addDataSearchPath(extraTestDataPath);
+#endif  // HAVE_OPENCV_DNN
+}
+
+CV_TEST_MAIN("cv", initTests())
index 394bc6e..2e7a241 100644 (file)
@@ -37,6 +37,7 @@
 #include <iterator>
 #include <limits>
 #include <algorithm>
+#include <set>
 
 
 #ifndef OPENCV_32BIT_CONFIGURATION
index 209fbb5..9a1aeed 100644 (file)
@@ -4,6 +4,7 @@ set(OPENCV_DNN_SAMPLES_REQUIRED_DEPS
   opencv_core
   opencv_imgproc
   opencv_dnn
+  opencv_objdetect
   opencv_video
   opencv_imgcodecs
   opencv_videoio
diff --git a/samples/dnn/face_detect.cpp b/samples/dnn/face_detect.cpp
new file mode 100644 (file)
index 0000000..8d91a10
--- /dev/null
@@ -0,0 +1,132 @@
+#include <opencv2/dnn.hpp>
+#include <opencv2/imgproc.hpp>
+#include <opencv2/highgui.hpp>
+#include <opencv2/objdetect.hpp>
+
+#include <iostream>
+
+using namespace cv;
+using namespace std;
+
+static Mat visualize(Mat input, Mat faces, int thickness=2)
+{
+    Mat output = input.clone();
+    for (int i = 0; i < faces.rows; i++)
+    {
+        // Print results
+        cout << "Face " << i
+             << ", top-left coordinates: (" << faces.at<float>(i, 0) << ", " << faces.at<float>(i, 1) << "), "
+             << "box width: " << faces.at<float>(i, 2)  << ", box height: " << faces.at<float>(i, 3) << ", "
+             << "score: " << faces.at<float>(i, 14) << "\n";
+
+        // Draw bounding box
+        rectangle(output, Rect2i(int(faces.at<float>(i, 0)), int(faces.at<float>(i, 1)), int(faces.at<float>(i, 2)), int(faces.at<float>(i, 3))), Scalar(0, 255, 0), thickness);
+        // Draw landmarks
+        circle(output, Point2i(int(faces.at<float>(i, 4)),  int(faces.at<float>(i, 5))),  2, Scalar(255,   0,   0), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 6)),  int(faces.at<float>(i, 7))),  2, Scalar(  0,   0, 255), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 8)),  int(faces.at<float>(i, 9))),  2, Scalar(  0, 255,   0), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 10)), int(faces.at<float>(i, 11))), 2, Scalar(255,   0, 255), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 12)), int(faces.at<float>(i, 13))), 2, Scalar(  0, 255, 255), thickness);
+    }
+    return output;
+}
+
+int main(int argc, char ** argv)
+{
+    CommandLineParser parser(argc, argv,
+        "{help  h           |            | Print this message.}"
+        "{input i           |            | Path to the input image. Omit for detecting on default camera.}"
+        "{model m           | yunet.onnx | Path to the model. Download yunet.onnx in https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.}"
+        "{score_threshold   | 0.9        | Filter out faces of score < score_threshold.}"
+        "{nms_threshold     | 0.3        | Suppress bounding boxes of iou >= nms_threshold.}"
+        "{top_k             | 5000       | Keep top_k bounding boxes before NMS.}"
+        "{save  s           | false      | Set true to save results. This flag is invalid when using camera.}"
+        "{vis   v           | true       | Set true to open a window for result visualization. This flag is invalid when using camera.}"
+    );
+    if (argc == 1 || parser.has("help"))
+    {
+        parser.printMessage();
+        return -1;
+    }
+
+    String modelPath = parser.get<String>("model");
+
+    float scoreThreshold = parser.get<float>("score_threshold");
+    float nmsThreshold = parser.get<float>("nms_threshold");
+    int topK = parser.get<int>("top_k");
+
+    bool save = parser.get<bool>("save");
+    bool vis = parser.get<bool>("vis");
+
+    // Initialize FaceDetectorYN
+    Ptr<FaceDetectorYN> detector = FaceDetectorYN::create(modelPath, "", Size(320, 320), scoreThreshold, nmsThreshold, topK);
+
+    // If input is an image
+    if (parser.has("input"))
+    {
+        String input = parser.get<String>("input");
+        Mat image = imread(input);
+
+        // Set input size before inference
+        detector->setInputSize(image.size());
+
+        // Inference
+        Mat faces;
+        detector->detect(image, faces);
+
+        // Draw results on the input image
+        Mat result = visualize(image, faces);
+
+        // Save results if save is true
+        if(save)
+        {
+            cout << "Results saved to result.jpg\n";
+            imwrite("result.jpg", result);
+        }
+
+        // Visualize results
+        if (vis)
+        {
+            namedWindow(input, WINDOW_AUTOSIZE);
+            imshow(input, result);
+            waitKey(0);
+        }
+    }
+    else
+    {
+        int deviceId = 0;
+        VideoCapture cap;
+        cap.open(deviceId, CAP_ANY);
+        int frameWidth = int(cap.get(CAP_PROP_FRAME_WIDTH));
+        int frameHeight = int(cap.get(CAP_PROP_FRAME_HEIGHT));
+        detector->setInputSize(Size(frameWidth, frameHeight));
+
+        Mat frame;
+        TickMeter tm;
+        String msg = "FPS: ";
+        while(waitKey(1) < 0) // Press any key to exit
+        {
+            // Get frame
+            if (!cap.read(frame))
+            {
+                cerr << "No frames grabbed!\n";
+                break;
+            }
+
+            // Inference
+            Mat faces;
+            tm.start();
+            detector->detect(frame, faces);
+            tm.stop();
+
+            // Draw results on the input image
+            Mat result = visualize(frame, faces);
+            putText(result, msg + to_string(tm.getFPS()), Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
+
+            // Visualize results
+            imshow("Live", result);
+
+            tm.reset();
+        }
+    }
+}
\ No newline at end of file
diff --git a/samples/dnn/face_detect.py b/samples/dnn/face_detect.py
new file mode 100644 (file)
index 0000000..65069d6
--- /dev/null
@@ -0,0 +1,101 @@
+import argparse
+
+import numpy as np
+import cv2 as cv
+
+def str2bool(v):
+    if v.lower() in ['on', 'yes', 'true', 'y', 't']:
+        return True
+    elif v.lower() in ['off', 'no', 'false', 'n', 'f']:
+        return False
+    else:
+        raise NotImplementedError
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--input', '-i', type=str, help='Path to the input image.')
+parser.add_argument('--model', '-m', type=str, default='yunet.onnx', help='Path to the model. Download the model at https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.')
+parser.add_argument('--score_threshold', type=float, default=0.9, help='Filtering out faces of score < score_threshold.')
+parser.add_argument('--nms_threshold', type=float, default=0.3, help='Suppress bounding boxes of iou >= nms_threshold.')
+parser.add_argument('--top_k', type=int, default=5000, help='Keep top_k bounding boxes before NMS.')
+parser.add_argument('--save', '-s', type=str2bool, default=False, help='Set true to save results. This flag is invalid when using camera.')
+parser.add_argument('--vis', '-v', type=str2bool, default=True, help='Set true to open a window for result visualization. This flag is invalid when using camera.')
+args = parser.parse_args()
+
+def visualize(input, faces, thickness=2):
+    output = input.copy()
+    if faces[1] is not None:
+        for idx, face in enumerate(faces[1]):
+            print('Face {}, top-left coordinates: ({:.0f}, {:.0f}), box width: {:.0f}, box height {:.0f}, score: {:.2f}'.format(idx, face[0], face[1], face[2], face[3], face[-1]))
+
+            coords = face[:-1].astype(np.int32)
+            cv.rectangle(output, (coords[0], coords[1]), (coords[0]+coords[2], coords[1]+coords[3]), (0, 255, 0), 2)
+            cv.circle(output, (coords[4], coords[5]), 2, (255, 0, 0), 2)
+            cv.circle(output, (coords[6], coords[7]), 2, (0, 0, 255), 2)
+            cv.circle(output, (coords[8], coords[9]), 2, (0, 255, 0), 2)
+            cv.circle(output, (coords[10], coords[11]), 2, (255, 0, 255), 2)
+            cv.circle(output, (coords[12], coords[13]), 2, (0, 255, 255), 2)
+    return output
+
+if __name__ == '__main__':
+
+    # Instantiate FaceDetectorYN
+    detector = cv.FaceDetectorYN.create(
+        args.model,
+        "",
+        (320, 320),
+        args.score_threshold,
+        args.nms_threshold,
+        args.top_k
+    )
+
+    # If input is an image
+    if args.input is not None:
+        image = cv.imread(args.input)
+
+        # Set input size before inference
+        detector.setInputSize((image.shape[1], image.shape[0]))
+
+        # Inference
+        faces = detector.detect(image)
+
+        # Draw results on the input image
+        result = visualize(image, faces)
+
+        # Save results if save is true
+        if args.save:
+            print('Results saved to result.jpg\n')
+            cv.imwrite('result.jpg', result)
+
+        # Visualize results in a new window
+        if args.vis:
+            cv.namedWindow(args.input, cv.WINDOW_AUTOSIZE)
+            cv.imshow(args.input, result)
+            cv.waitKey(0)
+    else: # Omit input to call default camera
+        deviceId = 0
+        cap = cv.VideoCapture(deviceId)
+        frameWidth = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
+        frameHeight = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
+        detector.setInputSize([frameWidth, frameHeight])
+
+        tm = cv.TickMeter()
+        while cv.waitKey(1) < 0:
+            hasFrame, frame = cap.read()
+            if not hasFrame:
+                print('No frames grabbed!')
+                break
+
+            # Inference
+            tm.start()
+            faces = detector.detect(frame) # faces is a tuple
+            tm.stop()
+
+            # Draw results on the input image
+            frame = visualize(frame, faces)
+
+            cv.putText(frame, 'FPS: {}'.format(tm.getFPS()), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
+
+            # Visualize results in a new Window
+            cv.imshow('Live', frame)
+
+            tm.reset()
\ No newline at end of file
diff --git a/samples/dnn/face_match.cpp b/samples/dnn/face_match.cpp
new file mode 100644 (file)
index 0000000..f24134b
--- /dev/null
@@ -0,0 +1,103 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "opencv2/dnn.hpp"
+#include "opencv2/imgproc.hpp"
+#include "opencv2/highgui.hpp"
+
+#include <iostream>
+
+#include "opencv2/objdetect.hpp"
+
+
+using namespace cv;
+using namespace std;
+
+
+int main(int argc, char ** argv)
+{
+    if (argc != 5)
+    {
+        std::cerr << "Usage " << argv[0] << ": "
+                  << "<det_onnx_path> "
+                  << "<reg_onnx_path> "
+                  << "<image1>"
+                  << "<image2>\n";
+        return -1;
+    }
+
+    String det_onnx_path = argv[1];
+    String reg_onnx_path = argv[2];
+    String image1_path = argv[3];
+    String image2_path = argv[4];
+    std::cout<<image1_path<<" "<<image2_path<<std::endl;
+    Mat image1 = imread(image1_path);
+    Mat image2 = imread(image2_path);
+
+    float score_thresh = 0.9f;
+    float nms_thresh = 0.3f;
+    double cosine_similar_thresh = 0.363;
+    double l2norm_similar_thresh = 1.128;
+    int top_k = 5000;
+
+    // Initialize FaceDetector
+    Ptr<FaceDetectorYN> faceDetector;
+
+    faceDetector = FaceDetectorYN::create(det_onnx_path, "", image1.size(), score_thresh, nms_thresh, top_k);
+    Mat faces_1;
+    faceDetector->detect(image1, faces_1);
+    if (faces_1.rows < 1)
+    {
+        std::cerr << "Cannot find a face in " << image1_path << "\n";
+        return -1;
+    }
+
+    faceDetector = FaceDetectorYN::create(det_onnx_path, "", image2.size(), score_thresh, nms_thresh, top_k);
+    Mat faces_2;
+    faceDetector->detect(image2, faces_2);
+    if (faces_2.rows < 1)
+    {
+        std::cerr << "Cannot find a face in " << image2_path << "\n";
+        return -1;
+    }
+
+    // Initialize FaceRecognizerSF
+    Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(reg_onnx_path, "");
+
+
+    Mat aligned_face1, aligned_face2;
+    faceRecognizer->alignCrop(image1, faces_1.row(0), aligned_face1);
+    faceRecognizer->alignCrop(image2, faces_2.row(0), aligned_face2);
+
+    Mat feature1, feature2;
+    faceRecognizer->feature(aligned_face1, feature1);
+    feature1 = feature1.clone();
+    faceRecognizer->feature(aligned_face2, feature2);
+    feature2 = feature2.clone();
+
+    double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+    double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+
+    if(cos_score >= cosine_similar_thresh)
+    {
+        std::cout << "They have the same identity;";
+    }
+    else
+    {
+        std::cout << "They have different identities;";
+    }
+    std::cout << " Cosine Similarity: " << cos_score << ", threshold: " << cosine_similar_thresh << ". (higher value means higher similarity, max 1.0)\n";
+
+    if(L2_score <= l2norm_similar_thresh)
+    {
+        std::cout << "They have the same identity;";
+    }
+    else
+    {
+        std::cout << "They have different identities.";
+    }
+    std::cout << " NormL2 Distance: " << L2_score << ", threshold: " << l2norm_similar_thresh << ". (lower value means higher similarity, min 0.0)\n";
+
+    return 0;
+}
diff --git a/samples/dnn/face_match.py b/samples/dnn/face_match.py
new file mode 100644 (file)
index 0000000..b36c9f6
--- /dev/null
@@ -0,0 +1,57 @@
+import argparse
+
+import numpy as np
+import cv2 as cv
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--input1', '-i1', type=str, help='Path to the input image1.')
+parser.add_argument('--input2', '-i2', type=str, help='Path to the input image2.')
+parser.add_argument('--face_detection_model', '-fd', type=str, help='Path to the face detection model. Download the model at https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.')
+parser.add_argument('--face_recognition_model', '-fr', type=str, help='Path to the face recognition model. Download the model at https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view.')
+args = parser.parse_args()
+
+# Read the input image
+img1 = cv.imread(args.input1)
+img2 = cv.imread(args.input2)
+
+# Instantiate face detector and recognizer
+detector = cv.FaceDetectorYN.create(
+    args.face_detection_model,
+    "",
+    (img1.shape[1], img1.shape[0])
+)
+recognizer = cv.FaceRecognizerSF.create(
+    args.face_recognition_model,
+    ""
+)
+
+# Detect face
+detector.setInputSize((img1.shape[1], img1.shape[0]))
+face1 = detector.detect(img1)
+detector.setInputSize((img2.shape[1], img2.shape[0]))
+face2 = detector.detect(img2)
+assert face1[1].shape[0] > 0, 'Cannot find a face in {}'.format(args.input1)
+assert face2[1].shape[0] > 0, 'Cannot find a face in {}'.format(args.input2)
+
+# Align faces
+face1_align = recognizer.alignCrop(img1, face1[1][0])
+face2_align = recognizer.alignCrop(img2, face2[1][0])
+
+# Extract features (the CV_WRAP method defined in face.hpp is named `feature`)
+face1_feature = recognizer.feature(face1_align)
+face2_feature = recognizer.feature(face2_align)
+
+# Calculate distance (0: cosine, 1: L2)
+cosine_similarity_threshold = 0.363
+cosine_score = recognizer.match(face1_feature, face2_feature, 0)
+msg = 'different identities'
+if cosine_score >= cosine_similarity_threshold:
+    msg = 'the same identity'
+print('They have {}. Cosine Similarity: {}, threshold: {} (higher value means higher similarity, max 1.0).'.format(msg, cosine_score, cosine_similarity_threshold))
+
+l2_similarity_threshold = 1.128
+l2_score = recognizer.match(face1_feature, face2_feature, 1)
+msg = 'different identities'
+if l2_score <= l2_similarity_threshold:
+    msg = 'the same identity'
+print('They have {}. NormL2 Distance: {}, threshold: {} (lower value means higher similarity, min 0.0).'.format(msg, l2_score, l2_similarity_threshold))
\ No newline at end of file
diff --git a/samples/dnn/results/audrybt1.jpg b/samples/dnn/results/audrybt1.jpg
new file mode 100644 (file)
index 0000000..5829f82
Binary files /dev/null and b/samples/dnn/results/audrybt1.jpg differ