Merge pull request #20422 from fengyuentau:dnn_face
authorYuantao Feng <yuantao.feng@outlook.com>
Fri, 8 Oct 2021 19:13:49 +0000 (03:13 +0800)
committerGitHub <noreply@github.com>
Fri, 8 Oct 2021 19:13:49 +0000 (19:13 +0000)
Add DNN-based face detection and face recognition into modules/objdetect

* Add DNN-based face detector impl and interface

* Add a sample for DNN-based face detector

* add recog

* add notes

* move samples from samples/cpp to samples/dnn

* add documentation for dnn_face

* add set/get methods for input size, nms & score threshold and topk

* remove the DNN prefix from the face detector and face recognizer

* remove default values in the constructor of impl

* regenerate priors after setting input size

* two filenames for readnet

* Update face.hpp

* Update face_recognize.cpp

* Update face_match.cpp

* Update face.hpp

* Update face_recognize.cpp

* Update face_match.cpp

* Update face_recognize.cpp

* Update dnn_face.markdown

* Update dnn_face.markdown

* Update face.hpp

* Update dnn_face.markdown

* add regression test for face detection

* remove underscore prefix; fix warnings

* add reference & acknowledgement for face detection

* Update dnn_face.markdown

* Update dnn_face.markdown

* Update ts.hpp

* Update test_face.cpp

* Update face_match.cpp

* fix a compile error for python interface; add python examples for face detection and recognition

* Major changes for Vadim's comments:

* Replace class name FaceDetector with FaceDetectorYN in related files

* Declare local mat before loop in modules/objdetect/src/face_detect.cpp

* Make input image and save flag optional in samples/dnn/face_detect(.cpp, .py)

* Add camera support in samples/dnn/face_detect(.cpp, .py)

* correct file paths for regression test

* fix conversion warnings; remove extra spaces

* update face_recog

* Update dnn_face.markdown

* Fix warnings and errors for the default CI reports:

* Remove trailing white spaces and extra new lines.

* Fix conversion warnings for Windows and iOS.

* Add braces around initialization of subobjects.

* Fix warnings and errors for the default CI systems:

* Add prefix 'FR_' for each value name in enum DisType to solve the
redefinition error for iOS compilation; Modify other code accordingly

* Add bookmark '#tutorial_dnn_face' to solve warnings from doxygen

* Correct documentations to solve warnings from doxygen

* update FaceRecognizerSF

* Fix the error for CI to find ONNX models correctly

* add suffix f to float assignments

* add backend & target options for initializing face recognizer

* add checkeq for checking input size and preset size

* update test and threshold

* changes in response to alalek's comments:

* fix typos in samples/dnn/face_match.py

* import numpy before importing cv2

* add documentation to .setInputSize()

* remove extra include in face_recognize.cpp

* fix some bugs

* Update dnn_face.markdown

* update thresholds; remove useless code

* add time suffix to YuNet filename in test

* objdetect: update test code

17 files changed:
doc/tutorials/dnn/dnn_face/dnn_face.markdown [new file with mode: 0644]
doc/tutorials/dnn/dnn_text_spotting/dnn_text_spotting.markdown
doc/tutorials/dnn/table_of_content_dnn.markdown
modules/objdetect/CMakeLists.txt
modules/objdetect/include/opencv2/objdetect.hpp
modules/objdetect/include/opencv2/objdetect/face.hpp [new file with mode: 0644]
modules/objdetect/src/face_detect.cpp [new file with mode: 0644]
modules/objdetect/src/face_recognize.cpp [new file with mode: 0644]
modules/objdetect/test/test_face.cpp [new file with mode: 0644]
modules/objdetect/test/test_main.cpp
modules/ts/include/opencv2/ts.hpp
samples/dnn/CMakeLists.txt
samples/dnn/face_detect.cpp [new file with mode: 0644]
samples/dnn/face_detect.py [new file with mode: 0644]
samples/dnn/face_match.cpp [new file with mode: 0644]
samples/dnn/face_match.py [new file with mode: 0644]
samples/dnn/results/audrybt1.jpg [new file with mode: 0644]

diff --git a/doc/tutorials/dnn/dnn_face/dnn_face.markdown b/doc/tutorials/dnn/dnn_face/dnn_face.markdown
new file mode 100644 (file)
index 0000000..e5092b8
--- /dev/null
@@ -0,0 +1,95 @@
+# DNN-based Face Detection And Recognition {#tutorial_dnn_face}
+
+@tableofcontents
+
+@prev_tutorial{tutorial_dnn_text_spotting}
+@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
+
+| | |
+| -: | :- |
+| Original Author | Chengrui Wang, Yuantao Feng |
+| Compatibility | OpenCV >= 4.5.1 |
+
+## Introduction
+
+In this section, we introduce the DNN-based module for face detection and face recognition. The models can be obtained in [Models](#Models). The usage of `FaceDetectorYN` and `FaceRecognizerSF` is presented in [Usage](#Usage).
+
+## Models
+
+This module requires two pre-trained models (ONNX format):
+- [Face Detection](https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx):
+    - Size: 337KB
+    - Results on WIDER Face Val set: 0.830(easy), 0.824(medium), 0.708(hard)
+- [Face Recognition](https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view?usp=sharing)
+    - Size: 36.9MB
+    - Results:
+
+    | Database | Accuracy | Threshold (normL2) | Threshold (cosine) |
+    | -------- | -------- | ------------------ | ------------------ |
+    | LFW      | 99.60%   | 1.128              | 0.363              |
+    | CALFW    | 93.95%   | 1.149              | 0.340              |
+    | CPLFW    | 91.05%   | 1.204              | 0.275              |
+    | AgeDB-30 | 94.90%   | 1.202              | 0.277              |
+    | CFP-FP   | 94.80%   | 1.253              | 0.212              |
+
+## Usage
+
+### FaceDetectorYN
+
+```cpp
+// Initialize FaceDetectorYN
+Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(onnx_path, "", image.size(), score_thresh, nms_thresh, top_k);
+
+// Forward
+Mat faces;
+faceDetector->detect(image, faces);
+```
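+
+Note that `detect` requires the input image size to match the size the detector was created with (or the size last set). If they differ, update the network input size first (a minimal sketch):
+
+```cpp
+// The detector checks that the input image size matches the preset size,
+// so update it before calling detect() on a differently-sized image
+faceDetector->setInputSize(image.size());
+faceDetector->detect(image, faces);
+```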
+
+The detection output `faces` is a two-dimensional array of type CV_32F, where each row is a detected face instance. Each row contains the location of the face bounding box, 5 facial landmarks and the detection score, in the following format:
+
+```
+x1, y1, w, h, x_re, y_re, x_le, y_le, x_nt, y_nt, x_rcm, y_rcm, x_lcm, y_lcm, score
+```
+where `x1, y1, w, h` are the top-left coordinates, width and height of the face bounding box, `{x, y}_{re, le, nt, rcm, lcm}` stands for the coordinates of the right eye, left eye, nose tip, right mouth corner and left mouth corner respectively, and `score` is the detection confidence.
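+
+For instance, the fields of a detected face can be read back as follows (a minimal sketch; `i` is a hypothetical row index assumed to be valid):
+
+```cpp
+int i = 0; // index of the detected face to inspect
+Rect2f faceBox(faces.at<float>(i, 0), faces.at<float>(i, 1),  // x1, y1
+               faces.at<float>(i, 2), faces.at<float>(i, 3)); // w, h
+Point2f rightEye(faces.at<float>(i, 4), faces.at<float>(i, 5));
+float score = faces.at<float>(i, 14);
+```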
+
+
+### Face Recognition
+
+Following face detection, run the code below to extract a face feature from the facial image.
+
+```cpp
+// Initialize FaceRecognizer with model path (cv::String)
+Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(model_path, "");
+
+// Align and crop the facial image using the first of the faces detected by FaceDetectorYN
+Mat aligned_face;
+faceRecognizer->alignCrop(image, faces.row(0), aligned_face);
+
+// Run feature extraction with given aligned_face (cv::Mat)
+Mat feature;
+faceRecognizer->feature(aligned_face, feature);
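+// feature() output may alias the network's internal memory (assumption); clone to keep an independent copy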
+feature = feature.clone();
+```
+
+After obtaining the face features *feature1* and *feature2* of two facial images, run the code below to calculate the identity discrepancy between the two faces.
+
+```cpp
+// Calculate the discrepancy between two face features using cosine similarity.
+double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+// Calculate the discrepancy between two face features using normL2 distance.
+double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+```
+
+For example, two faces can be judged to have the same identity if the cosine similarity is greater than or equal to 0.363, or if the normL2 distance is less than or equal to 1.128.
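+
+Expressed in code (mirroring the thresholds above and the logic of samples/dnn/face_match.cpp):
+
+```cpp
+// Decide identity using the LFW thresholds from the table above
+bool sameIdentityCosine = (cos_score >= 0.363); // higher cosine score means more similar
+bool sameIdentityL2     = (L2_score  <= 1.128); // lower normL2 distance means more similar
+```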
+
+## References
+
+- https://github.com/ShiqiYu/libfacedetection
+- https://github.com/ShiqiYu/libfacedetection.train
+- https://github.com/zhongyy/SFace
+
+## Acknowledgement
+
+Thanks [Professor Shiqi Yu](https://github.com/ShiqiYu/) and [Yuantao Feng](https://github.com/fengyuentau) for training and providing the face detection model.
+
+Thanks [Professor Deng](http://www.whdeng.cn/), [PhD Candidate Zhong](https://github.com/zhongyy/) and [Master Candidate Wang](https://github.com/crywang/) for training and providing the face recognition model.
index b0be262..5c46594 100644 (file)
@@ -3,7 +3,7 @@
 @tableofcontents
 
 @prev_tutorial{tutorial_dnn_OCR}
-@next_tutorial{pytorch_cls_tutorial_dnn_conversion}
+@next_tutorial{tutorial_dnn_face}
 
 |    |    |
 | -: | :- |
index 0d5e43e..3f74826 100644 (file)
@@ -10,6 +10,7 @@ Deep Neural Networks (dnn module) {#tutorial_table_of_content_dnn}
 -   @subpage tutorial_dnn_custom_layers
 -   @subpage tutorial_dnn_OCR
 -   @subpage tutorial_dnn_text_spotting
+-   @subpage tutorial_dnn_face
 
 #### PyTorch models with OpenCV
 In this section you will find the guides, which describe how to run classification, segmentation and detection PyTorch DNN models with OpenCV.
index 3fa0c5d..f4d5b22 100644 (file)
@@ -1,5 +1,5 @@
 set(the_description "Object Detection")
-ocv_define_module(objdetect opencv_core opencv_imgproc opencv_calib3d WRAP java objc python js)
+ocv_define_module(objdetect opencv_core opencv_imgproc opencv_calib3d opencv_dnn WRAP java objc python js)
 
 if(HAVE_QUIRC)
     get_property(QUIRC_INCLUDE GLOBAL PROPERTY QUIRC_INCLUDE_DIR)
index eaee129..59dca73 100644 (file)
@@ -768,5 +768,6 @@ protected:
 }
 
 #include "opencv2/objdetect/detection_based_tracker.hpp"
+#include "opencv2/objdetect/face.hpp"
 
 #endif
diff --git a/modules/objdetect/include/opencv2/objdetect/face.hpp b/modules/objdetect/include/opencv2/objdetect/face.hpp
new file mode 100644 (file)
index 0000000..f2429c5
--- /dev/null
@@ -0,0 +1,125 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#ifndef OPENCV_OBJDETECT_FACE_HPP
+#define OPENCV_OBJDETECT_FACE_HPP
+
+#include <opencv2/core.hpp>
+
+/** @defgroup dnn_face DNN-based face detection and recognition
+ */
+
+namespace cv
+{
+
+/** @brief DNN-based face detector, model download link: https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.
+ */
+class CV_EXPORTS_W FaceDetectorYN
+{
+public:
+    virtual ~FaceDetectorYN() {};
+
+    /** @brief Set the size for the network input, which overwrites the input size specified when creating the model. Call this method when the size of the input image does not match the size specified at creation
+     *
+     * @param input_size the size of the input image
+     */
+    CV_WRAP virtual void setInputSize(const Size& input_size) = 0;
+
+    CV_WRAP virtual Size getInputSize() = 0;
+
+    /** @brief Set the score threshold used to filter out bounding boxes with scores lower than the given value
+     *
+     * @param score_threshold threshold for filtering out bounding boxes
+     */
+    CV_WRAP virtual void setScoreThreshold(float score_threshold) = 0;
+
+    CV_WRAP virtual float getScoreThreshold() = 0;
+
+    /** @brief Set the non-maximum suppression threshold used to suppress bounding boxes that have IoU greater than the given value
+     *
+     * @param nms_threshold threshold for NMS operation
+     */
+    CV_WRAP virtual void setNMSThreshold(float nms_threshold) = 0;
+
+    CV_WRAP virtual float getNMSThreshold() = 0;
+
+    /** @brief Set the number of bounding boxes preserved before NMS
+     *
+     * @param top_k the number of bounding boxes to preserve from top rank based on score
+     */
+    CV_WRAP virtual void setTopK(int top_k) = 0;
+
+    CV_WRAP virtual int getTopK() = 0;
+
+    /** @brief A simple interface to detect faces in a given image
+     *
+     *  @param image an image in which to detect faces
+     *  @param faces detection results stored in a cv::Mat
+     */
+    CV_WRAP virtual int detect(InputArray image, OutputArray faces) = 0;
+
+    /** @brief Creates an instance of this class with given parameters
+     *
+     *  @param model the path to the requested model
+     *  @param config the path to the config file for compatibility, which is not required for ONNX models
+     *  @param input_size the size of the input image
+     *  @param score_threshold the threshold to filter out bounding boxes of score smaller than the given value
+     *  @param nms_threshold the threshold to suppress bounding boxes of IoU bigger than the given value
+     *  @param top_k keep top K bboxes before NMS
+     *  @param backend_id the id of backend
+     *  @param target_id the id of target device
+     */
+    CV_WRAP static Ptr<FaceDetectorYN> create(const String& model,
+                                              const String& config,
+                                              const Size& input_size,
+                                              float score_threshold = 0.9f,
+                                              float nms_threshold = 0.3f,
+                                              int top_k = 5000,
+                                              int backend_id = 0,
+                                              int target_id = 0);
+};
+
+/** @brief DNN-based face recognizer, model download link: https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view.
+ */
+class CV_EXPORTS_W FaceRecognizerSF
+{
+public:
+    virtual ~FaceRecognizerSF() {};
+
+    /** @brief Definition of distance used for calculating the distance between two face features
+     */
+    enum DisType { FR_COSINE=0, FR_NORM_L2=1 };
+
+    /** @brief Aligns the image to put the face in a standard position
+     *  @param src_img input image
+     *  @param face_box the detection result used to indicate the face in the input image
+     *  @param aligned_img output aligned image
+     */
+    CV_WRAP virtual void alignCrop(InputArray src_img, InputArray face_box, OutputArray aligned_img) const = 0;
+
+    /** @brief Extracts a face feature from an aligned image
+     *  @param aligned_img input aligned image
+     *  @param face_feature output face feature
+     */
+    CV_WRAP virtual void feature(InputArray aligned_img, OutputArray face_feature) = 0;
+
+    /** @brief Calculates the distance between two face features
+     *  @param _face_feature1 the first input feature
+     *  @param _face_feature2 the second input feature of the same size and the same type as _face_feature1
+     *  @param dis_type defining the similarity with optional values "FR_COSINE" or "FR_NORM_L2"
+     */
+    CV_WRAP virtual double match(InputArray _face_feature1, InputArray _face_feature2, int dis_type = FaceRecognizerSF::FR_COSINE) const = 0;
+
+    /** @brief Creates an instance of this class with given parameters
+     *  @param model the path of the onnx model used for face recognition
+     *  @param config the path to the config file for compatibility, which is not required for ONNX models
+     *  @param backend_id the id of backend
+     *  @param target_id the id of target device
+     */
+    CV_WRAP static Ptr<FaceRecognizerSF> create(const String& model, const String& config, int backend_id = 0, int target_id = 0);
+};
+
+} // namespace cv
+
+#endif
diff --git a/modules/objdetect/src/face_detect.cpp b/modules/objdetect/src/face_detect.cpp
new file mode 100644 (file)
index 0000000..4095745
--- /dev/null
@@ -0,0 +1,288 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "precomp.hpp"
+
+#include "opencv2/imgproc.hpp"
+#include "opencv2/core.hpp"
+#include "opencv2/dnn.hpp"
+
+#include <algorithm>
+
+namespace cv
+{
+
+class FaceDetectorYNImpl : public FaceDetectorYN
+{
+public:
+    FaceDetectorYNImpl(const String& model,
+                       const String& config,
+                       const Size& input_size,
+                       float score_threshold,
+                       float nms_threshold,
+                       int top_k,
+                       int backend_id,
+                       int target_id)
+    {
+        net = dnn::readNet(model, config);
+        CV_Assert(!net.empty());
+
+        net.setPreferableBackend(backend_id);
+        net.setPreferableTarget(target_id);
+
+        inputW = input_size.width;
+        inputH = input_size.height;
+
+        scoreThreshold = score_threshold;
+        nmsThreshold = nms_threshold;
+        topK = top_k;
+
+        generatePriors();
+    }
+
+    void setInputSize(const Size& input_size) override
+    {
+        inputW = input_size.width;
+        inputH = input_size.height;
+        generatePriors();
+    }
+
+    Size getInputSize() override
+    {
+        Size input_size;
+        input_size.width = inputW;
+        input_size.height = inputH;
+        return input_size;
+    }
+
+    void setScoreThreshold(float score_threshold) override
+    {
+        scoreThreshold = score_threshold;
+    }
+
+    float getScoreThreshold() override
+    {
+        return scoreThreshold;
+    }
+
+    void setNMSThreshold(float nms_threshold) override
+    {
+        nmsThreshold = nms_threshold;
+    }
+
+    float getNMSThreshold() override
+    {
+        return nmsThreshold;
+    }
+
+    void setTopK(int top_k) override
+    {
+        topK = top_k;
+    }
+
+    int getTopK() override
+    {
+        return topK;
+    }
+
+    int detect(InputArray input_image, OutputArray faces) override
+    {
+        // TODO: more input checks may be needed
+        if (input_image.empty())
+        {
+            return 0;
+        }
+        CV_CheckEQ(input_image.size(), Size(inputW, inputH), "Size does not match. Call setInputSize(size) if input size does not match the preset size");
+
+        // Build blob from input image
+        Mat input_blob = dnn::blobFromImage(input_image);
+
+        // Forward
+        std::vector<String> output_names = { "loc", "conf", "iou" };
+        std::vector<Mat> output_blobs;
+        net.setInput(input_blob);
+        net.forward(output_blobs, output_names);
+
+        // Post process
+        Mat results = postProcess(output_blobs);
+        results.convertTo(faces, CV_32FC1);
+        return 1;
+    }
+private:
+    void generatePriors()
+    {
+        // Calculate shapes of different scales according to the shape of input image
+        Size feature_map_2nd = {
+            int(int((inputW+1)/2)/2), int(int((inputH+1)/2)/2)
+        };
+        Size feature_map_3rd = {
+            int(feature_map_2nd.width/2), int(feature_map_2nd.height/2)
+        };
+        Size feature_map_4th = {
+            int(feature_map_3rd.width/2), int(feature_map_3rd.height/2)
+        };
+        Size feature_map_5th = {
+            int(feature_map_4th.width/2), int(feature_map_4th.height/2)
+        };
+        Size feature_map_6th = {
+            int(feature_map_5th.width/2), int(feature_map_5th.height/2)
+        };
+
+        std::vector<Size> feature_map_sizes;
+        feature_map_sizes.push_back(feature_map_3rd);
+        feature_map_sizes.push_back(feature_map_4th);
+        feature_map_sizes.push_back(feature_map_5th);
+        feature_map_sizes.push_back(feature_map_6th);
+
+        // Fixed params for generating priors
+        const std::vector<std::vector<float>> min_sizes = {
+            {10.0f,  16.0f,  24.0f},
+            {32.0f,  48.0f},
+            {64.0f,  96.0f},
+            {128.0f, 192.0f, 256.0f}
+        };
+        const std::vector<int> steps = { 8, 16, 32, 64 };
+
+        // Generate priors
+        priors.clear();
+        for (size_t i = 0; i < feature_map_sizes.size(); ++i)
+        {
+            Size feature_map_size = feature_map_sizes[i];
+            std::vector<float> min_size = min_sizes[i];
+
+            for (int _h = 0; _h < feature_map_size.height; ++_h)
+            {
+                for (int _w = 0; _w < feature_map_size.width; ++_w)
+                {
+                    for (size_t j = 0; j < min_size.size(); ++j)
+                    {
+                        float s_kx = min_size[j] / inputW;
+                        float s_ky = min_size[j] / inputH;
+
+                        float cx = (_w + 0.5f) * steps[i] / inputW;
+                        float cy = (_h + 0.5f) * steps[i] / inputH;
+
+                        Rect2f prior = { cx, cy, s_kx, s_ky };
+                        priors.push_back(prior);
+                    }
+                }
+            }
+        }
+    }
+
+    Mat postProcess(const std::vector<Mat>& output_blobs)
+    {
+        // Extract from output_blobs
+        Mat loc = output_blobs[0];
+        Mat conf = output_blobs[1];
+        Mat iou = output_blobs[2];
+
+        // Decode from deltas and priors
+        const std::vector<float> variance = {0.1f, 0.2f};
+        float* loc_v = (float*)(loc.data);
+        float* conf_v = (float*)(conf.data);
+        float* iou_v = (float*)(iou.data);
+        Mat faces;
+        // (tl_x, tl_y, w, h, re_x, re_y, le_x, le_y, nt_x, nt_y, rcm_x, rcm_y, lcm_x, lcm_y, score)
+        // 'tl': top left point of the bounding box
+        // 're': right eye, 'le': left eye
+        // 'nt':  nose tip
+        // 'rcm': right corner of mouth, 'lcm': left corner of mouth
+        Mat face(1, 15, CV_32FC1);
+        for (size_t i = 0; i < priors.size(); ++i) {
+            // Get score
+            float clsScore = conf_v[i*2+1];
+            float iouScore = iou_v[i];
+            // Clamp
+            if (iouScore < 0.f) {
+                iouScore = 0.f;
+            }
+            else if (iouScore > 1.f) {
+                iouScore = 1.f;
+            }
+            float score = std::sqrt(clsScore * iouScore);
+            face.at<float>(0, 14) = score;
+
+            // Get bounding box
+            float cx = (priors[i].x + loc_v[i*14+0] * variance[0] * priors[i].width)  * inputW;
+            float cy = (priors[i].y + loc_v[i*14+1] * variance[0] * priors[i].height) * inputH;
+            float w  = priors[i].width  * exp(loc_v[i*14+2] * variance[0]) * inputW;
+            float h  = priors[i].height * exp(loc_v[i*14+3] * variance[1]) * inputH;
+            float x1 = cx - w / 2;
+            float y1 = cy - h / 2;
+            face.at<float>(0, 0) = x1;
+            face.at<float>(0, 1) = y1;
+            face.at<float>(0, 2) = w;
+            face.at<float>(0, 3) = h;
+
+            // Get landmarks
+            face.at<float>(0, 4) = (priors[i].x + loc_v[i*14+ 4] * variance[0] * priors[i].width)  * inputW;  // right eye, x
+            face.at<float>(0, 5) = (priors[i].y + loc_v[i*14+ 5] * variance[0] * priors[i].height) * inputH;  // right eye, y
+            face.at<float>(0, 6) = (priors[i].x + loc_v[i*14+ 6] * variance[0] * priors[i].width)  * inputW;  // left eye, x
+            face.at<float>(0, 7) = (priors[i].y + loc_v[i*14+ 7] * variance[0] * priors[i].height) * inputH;  // left eye, y
+            face.at<float>(0, 8) = (priors[i].x + loc_v[i*14+ 8] * variance[0] * priors[i].width)  * inputW;  // nose tip, x
+            face.at<float>(0, 9) = (priors[i].y + loc_v[i*14+ 9] * variance[0] * priors[i].height) * inputH;  // nose tip, y
+            face.at<float>(0, 10) = (priors[i].x + loc_v[i*14+10] * variance[0] * priors[i].width)  * inputW; // right corner of mouth, x
+            face.at<float>(0, 11) = (priors[i].y + loc_v[i*14+11] * variance[0] * priors[i].height) * inputH; // right corner of mouth, y
+            face.at<float>(0, 12) = (priors[i].x + loc_v[i*14+12] * variance[0] * priors[i].width)  * inputW; // left corner of mouth, x
+            face.at<float>(0, 13) = (priors[i].y + loc_v[i*14+13] * variance[0] * priors[i].height) * inputH; // left corner of mouth, y
+
+            faces.push_back(face);
+        }
+
+        if (faces.rows > 1)
+        {
+            // Retrieve boxes and scores
+            std::vector<Rect2i> faceBoxes;
+            std::vector<float> faceScores;
+            for (int rIdx = 0; rIdx < faces.rows; rIdx++)
+            {
+                faceBoxes.push_back(Rect2i(int(faces.at<float>(rIdx, 0)),
+                                           int(faces.at<float>(rIdx, 1)),
+                                           int(faces.at<float>(rIdx, 2)),
+                                           int(faces.at<float>(rIdx, 3))));
+                faceScores.push_back(faces.at<float>(rIdx, 14));
+            }
+
+            std::vector<int> keepIdx;
+            dnn::NMSBoxes(faceBoxes, faceScores, scoreThreshold, nmsThreshold, keepIdx, 1.f, topK);
+
+            // Get NMS results
+            Mat nms_faces;
+            for (int idx: keepIdx)
+            {
+                nms_faces.push_back(faces.row(idx));
+            }
+            return nms_faces;
+        }
+        else
+        {
+            return faces;
+        }
+    }
+private:
+    dnn::Net net;
+
+    int inputW;
+    int inputH;
+    float scoreThreshold;
+    float nmsThreshold;
+    int topK;
+
+    std::vector<Rect2f> priors;
+};
+
+Ptr<FaceDetectorYN> FaceDetectorYN::create(const String& model,
+                                           const String& config,
+                                           const Size& input_size,
+                                           const float score_threshold,
+                                           const float nms_threshold,
+                                           const int top_k,
+                                           const int backend_id,
+                                           const int target_id)
+{
+    return makePtr<FaceDetectorYNImpl>(model, config, input_size, score_threshold, nms_threshold, top_k, backend_id, target_id);
+}
+
+} // namespace cv
diff --git a/modules/objdetect/src/face_recognize.cpp b/modules/objdetect/src/face_recognize.cpp
new file mode 100644 (file)
index 0000000..6550a13
--- /dev/null
@@ -0,0 +1,182 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "precomp.hpp"
+
+#include "opencv2/dnn.hpp"
+
+#include <algorithm>
+
+namespace cv
+{
+
+class FaceRecognizerSFImpl : public FaceRecognizerSF
+{
+public:
+    FaceRecognizerSFImpl(const String& model, const String& config, int backend_id, int target_id)
+    {
+        net = dnn::readNet(model, config);
+        CV_Assert(!net.empty());
+
+        net.setPreferableBackend(backend_id);
+        net.setPreferableTarget(target_id);
+    };
+    void alignCrop(InputArray _src_img, InputArray _face_mat, OutputArray _aligned_img) const override
+    {
+        Mat face_mat = _face_mat.getMat();
+        float src_point[5][2];
+        for (int row = 0; row < 5; ++row)
+        {
+            for(int col = 0; col < 2; ++col)
+            {
+                src_point[row][col] = face_mat.at<float>(0, row*2+col+4);
+            }
+        }
+        Mat warp_mat = getSimilarityTransformMatrix(src_point);
+        warpAffine(_src_img, _aligned_img, warp_mat, Size(112, 112), INTER_LINEAR);
+    };
+    void feature(InputArray _aligned_img, OutputArray _face_feature) override
+    {
+        Mat inputBlob = dnn::blobFromImage(_aligned_img, 1, Size(112, 112), Scalar(0, 0, 0), true, false);
+        net.setInput(inputBlob);
+        net.forward(_face_feature);
+    };
+    double match(InputArray _face_feature1, InputArray _face_feature2, int dis_type) const override
+    {
+        Mat face_feature1 = _face_feature1.getMat(), face_feature2 = _face_feature2.getMat();
+        face_feature1 /= norm(face_feature1);
+        face_feature2 /= norm(face_feature2);
+
+        if(dis_type == DisType::FR_COSINE){
+            return sum(face_feature1.mul(face_feature2))[0];
+        }else if(dis_type == DisType::FR_NORM_L2){
+            return norm(face_feature1, face_feature2);
+        }else{
+            throw std::invalid_argument("invalid parameter " + std::to_string(dis_type));
+        }
+
+    };
+
+private:
+    Mat getSimilarityTransformMatrix(float src[5][2]) const {
+        float dst[5][2] = { {38.2946f, 51.6963f}, {73.5318f, 51.5014f}, {56.0252f, 71.7366f}, {41.5493f, 92.3655f}, {70.7299f, 92.2041f} };
+        float avg0 = (src[0][0] + src[1][0] + src[2][0] + src[3][0] + src[4][0]) / 5;
+        float avg1 = (src[0][1] + src[1][1] + src[2][1] + src[3][1] + src[4][1]) / 5;
+        //Compute mean of src and dst.
+        float src_mean[2] = { avg0, avg1 };
+        float dst_mean[2] = { 56.0262f, 71.9008f };
+        //Subtract mean from src and dst.
+        float src_demean[5][2];
+        for (int i = 0; i < 2; i++)
+        {
+            for (int j = 0; j < 5; j++)
+            {
+                src_demean[j][i] = src[j][i] - src_mean[i];
+            }
+        }
+        float dst_demean[5][2];
+        for (int i = 0; i < 2; i++)
+        {
+            for (int j = 0; j < 5; j++)
+            {
+                dst_demean[j][i] = dst[j][i] - dst_mean[i];
+            }
+        }
+        double A00 = 0.0, A01 = 0.0, A10 = 0.0, A11 = 0.0;
+        for (int i = 0; i < 5; i++)
+            A00 += dst_demean[i][0] * src_demean[i][0];
+        A00 = A00 / 5;
+        for (int i = 0; i < 5; i++)
+            A01 += dst_demean[i][0] * src_demean[i][1];
+        A01 = A01 / 5;
+        for (int i = 0; i < 5; i++)
+            A10 += dst_demean[i][1] * src_demean[i][0];
+        A10 = A10 / 5;
+        for (int i = 0; i < 5; i++)
+            A11 += dst_demean[i][1] * src_demean[i][1];
+        A11 = A11 / 5;
+        Mat A = (Mat_<double>(2, 2) << A00, A01, A10, A11);
+        double d[2] = { 1.0, 1.0 };
+        double detA = A00 * A11 - A01 * A10;
+        if (detA < 0)
+            d[1] = -1;
+        double T[3][3] = { {1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0} };
+        Mat s, u, vt, v;
+        SVD::compute(A, s, u, vt);
+        double smax = s.ptr<double>(0)[0]>s.ptr<double>(1)[0] ? s.ptr<double>(0)[0] : s.ptr<double>(1)[0];
+        double tol = smax * 2 * FLT_MIN;
+        int rank = 0;
+        if (s.ptr<double>(0)[0]>tol)
+            rank += 1;
+        if (s.ptr<double>(1)[0]>tol)
+            rank += 1;
+        double arr_u[2][2] = { {u.ptr<double>(0)[0], u.ptr<double>(0)[1]}, {u.ptr<double>(1)[0], u.ptr<double>(1)[1]} };
+        double arr_vt[2][2] = { {vt.ptr<double>(0)[0], vt.ptr<double>(0)[1]}, {vt.ptr<double>(1)[0], vt.ptr<double>(1)[1]} };
+        double det_u = arr_u[0][0] * arr_u[1][1] - arr_u[0][1] * arr_u[1][0];
+        double det_vt = arr_vt[0][0] * arr_vt[1][1] - arr_vt[0][1] * arr_vt[1][0];
+        if (rank == 1)
+        {
+            if ((det_u*det_vt) > 0)
+            {
+                Mat uvt = u*vt;
+                T[0][0] = uvt.ptr<double>(0)[0];
+                T[0][1] = uvt.ptr<double>(0)[1];
+                T[1][0] = uvt.ptr<double>(1)[0];
+                T[1][1] = uvt.ptr<double>(1)[1];
+            }
+            else
+            {
+                double temp = d[1];
+                d[1] = -1;
+                Mat D = (Mat_<double>(2, 2) << d[0], 0.0, 0.0, d[1]);
+                Mat Dvt = D*vt;
+                Mat uDvt = u*Dvt;
+                T[0][0] = uDvt.ptr<double>(0)[0];
+                T[0][1] = uDvt.ptr<double>(0)[1];
+                T[1][0] = uDvt.ptr<double>(1)[0];
+                T[1][1] = uDvt.ptr<double>(1)[1];
+                d[1] = temp;
+            }
+        }
+        else
+        {
+            Mat D = (Mat_<double>(2, 2) << d[0], 0.0, 0.0, d[1]);
+            Mat Dvt = D*vt;
+            Mat uDvt = u*Dvt;
+            T[0][0] = uDvt.ptr<double>(0)[0];
+            T[0][1] = uDvt.ptr<double>(0)[1];
+            T[1][0] = uDvt.ptr<double>(1)[0];
+            T[1][1] = uDvt.ptr<double>(1)[1];
+        }
+        double var1 = 0.0;
+        for (int i = 0; i < 5; i++)
+            var1 += src_demean[i][0] * src_demean[i][0];
+        var1 = var1 / 5;
+        double var2 = 0.0;
+        for (int i = 0; i < 5; i++)
+            var2 += src_demean[i][1] * src_demean[i][1];
+        var2 = var2 / 5;
+        double scale = 1.0 / (var1 + var2)* (s.ptr<double>(0)[0] * d[0] + s.ptr<double>(1)[0] * d[1]);
+        double TS[2];
+        TS[0] = T[0][0] * src_mean[0] + T[0][1] * src_mean[1];
+        TS[1] = T[1][0] * src_mean[0] + T[1][1] * src_mean[1];
+        T[0][2] = dst_mean[0] - scale*TS[0];
+        T[1][2] = dst_mean[1] - scale*TS[1];
+        T[0][0] *= scale;
+        T[0][1] *= scale;
+        T[1][0] *= scale;
+        T[1][1] *= scale;
+        Mat transform_mat = (Mat_<double>(2, 3) << T[0][0], T[0][1], T[0][2], T[1][0], T[1][1], T[1][2]);
+        return transform_mat;
+    }
+private:
+    dnn::Net net;
+};
+
+Ptr<FaceRecognizerSF> FaceRecognizerSF::create(const String& model, const String& config, int backend_id, int target_id)
+{
+    return makePtr<FaceRecognizerSFImpl>(model, config, backend_id, target_id);
+}
+
+} // namespace cv
diff --git a/modules/objdetect/test/test_face.cpp b/modules/objdetect/test/test_face.cpp
new file mode 100644 (file)
index 0000000..2e944c5
--- /dev/null
@@ -0,0 +1,219 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "test_precomp.hpp"
+
+namespace opencv_test { namespace {
+
+// label format:
+//   image_name
+//   num_face
+//   face_1
+//   face_..
+//   face_num
+std::map<std::string, Mat> blobFromTXT(const std::string& path, int numCoords)
+{
+    std::ifstream ifs(path.c_str());
+    CV_Assert(ifs.is_open());
+
+    std::map<std::string, Mat> gt;
+
+    Mat faces;
+    int faceNum = -1;
+    int faceCount = 0;
+    for (std::string line, key; getline(ifs, line); )
+    {
+        std::istringstream iss(line);
+        if (line.find(".png") != std::string::npos)
+        {
+            // Get filename
+            iss >> key;
+        }
+        else if (line.find(" ") == std::string::npos)
+        {
+            // Get the number of faces
+            iss >> faceNum;
+        }
+        else
+        {
+            // Get faces
+            Mat face(1, numCoords, CV_32FC1);
+            for (int j = 0; j < numCoords; j++)
+            {
+                iss >> face.at<float>(0, j);
+            }
+            faces.push_back(face);
+            faceCount++;
+        }
+
+        if (faceCount == faceNum)
+        {
+            // Store faces
+            gt[key] = faces;
+
+            faces.release();
+            faceNum = -1;
+            faceCount = 0;
+        }
+    }
+
+    return gt;
+}
+
+TEST(Objdetect_face_detection, regression)
+{
+    // Pre-set params
+    float scoreThreshold = 0.7f;
+    float matchThreshold = 0.9f;
+    float l2disThreshold = 5.0f;
+    int numLM = 5;
+    int numCoords = 4 + 2 * numLM;
+
+    // Load ground truth labels
+    std::map<std::string, Mat> gt = blobFromTXT(findDataFile("dnn_face/detection/cascades_labels.txt"), numCoords);
+    // for (auto item: gt)
+    // {
+    //     std::cout << item.first << " " << item.second.size() << std::endl;
+    // }
+
+    // Initialize detector
+    std::string model = findDataFile("dnn/onnx/models/yunet-202109.onnx", false);
+    Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(model, "", Size(300, 300));
+    faceDetector->setScoreThreshold(0.7f);
+
+    // Detect and match
+    for (auto item: gt)
+    {
+        std::string imagePath = findDataFile("cascadeandhog/images/" + item.first);
+        Mat image = imread(imagePath);
+
+        // Set input size
+        faceDetector->setInputSize(image.size());
+
+        // Run detection
+        Mat faces;
+        faceDetector->detect(image, faces);
+        // std::cout << item.first << " " << item.second.rows << " " << faces.rows << std::endl;
+
+        // Match bboxes and landmarks
+        std::vector<bool> matchedItem(item.second.rows, false);
+        for (int i = 0; i < faces.rows; i++)
+        {
+            if (faces.at<float>(i, numCoords) < scoreThreshold)
+                continue;
+
+            bool boxMatched = false;
+            std::vector<bool> lmMatched(numLM, false);
+            cv::Rect2f resBox(faces.at<float>(i, 0), faces.at<float>(i, 1), faces.at<float>(i, 2), faces.at<float>(i, 3));
+            for (int j = 0; j < item.second.rows && !boxMatched; j++)
+            {
+                if (matchedItem[j])
+                    continue;
+
+                // Retrieve bbox and compare IoU
+                cv::Rect2f gtBox(item.second.at<float>(j, 0), item.second.at<float>(j, 1), item.second.at<float>(j, 2), item.second.at<float>(j, 3));
+                double interArea = (resBox & gtBox).area();
+                double iou = interArea / (resBox.area() + gtBox.area() - interArea);
+                if (iou >= matchThreshold)
+                {
+                    boxMatched = true;
+                    matchedItem[j] = true;
+                }
+
+                // Match landmarks if bbox is matched
+                if (!boxMatched)
+                    continue;
+                for (int lmIdx = 0; lmIdx < numLM; lmIdx++)
+                {
+                    float gtX = item.second.at<float>(j, 4 + 2 * lmIdx);
+                    float gtY = item.second.at<float>(j, 4 + 2 * lmIdx + 1);
+                    float resX = faces.at<float>(i, 4 + 2 * lmIdx);
+                    float resY = faces.at<float>(i, 4 + 2 * lmIdx + 1);
+                    float l2dis = cv::sqrt((gtX - resX) * (gtX - resX) + (gtY - resY) * (gtY - resY));
+
+                    if (l2dis <= l2disThreshold)
+                    {
+                        lmMatched[lmIdx] = true;
+                    }
+                }
+            }
+            EXPECT_TRUE(boxMatched) << "In image " << item.first << ", cannot match resBox " << resBox << " with any ground truth.";
+            if (boxMatched)
+            {
+                EXPECT_TRUE(std::all_of(lmMatched.begin(), lmMatched.end(), [](bool v) { return v; })) << "In image " << item.first << ", resBox " << resBox << " matched but its landmarks failed to match.";
+            }
+        }
+    }
+}
+
+TEST(Objdetect_face_recognition, regression)
+{
+    // Pre-set params
+    float score_thresh = 0.9f;
+    float nms_thresh = 0.3f;
+    double cosine_similar_thresh = 0.363;
+    double l2norm_similar_thresh = 1.128;
+
+    // Load ground truth labels
+    std::ifstream ifs(findDataFile("dnn_face/recognition/cascades_label.txt").c_str());
+    CV_Assert(ifs.is_open());
+
+    std::set<std::string> fSet;
+    std::map<std::string, Mat> featureMap;
+    std::map<std::pair<std::string, std::string>, int> gtMap;
+
+
+    for (std::string line, key; getline(ifs, line);)
+    {
+        std::string fname1, fname2;
+        int label;
+        std::istringstream iss(line);
+        iss>>fname1>>fname2>>label;
+        // std::cout<<fname1<<" "<<fname2<<" "<<label<<std::endl;
+
+        fSet.insert(fname1);
+        fSet.insert(fname2);
+        gtMap[std::make_pair(fname1, fname2)] = label;
+    }
+
+    // Initialize detector
+    std::string detect_model = findDataFile("dnn/onnx/models/yunet-202109.onnx", false);
+    Ptr<FaceDetectorYN> faceDetector = FaceDetectorYN::create(detect_model, "", Size(150, 150), score_thresh, nms_thresh);
+
+    std::string recog_model = findDataFile("dnn/onnx/models/face_recognizer_fast.onnx", false);
+    Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(recog_model, "");
+
+    // Detect and match
+    for (auto fname: fSet)
+    {
+        std::string imagePath = findDataFile("dnn_face/recognition/" + fname);
+        Mat image = imread(imagePath);
+
+        Mat faces;
+        faceDetector->detect(image, faces);
+
+        Mat aligned_face;
+        faceRecognizer->alignCrop(image, faces.row(0), aligned_face);
+
+        Mat feature;
+        faceRecognizer->feature(aligned_face, feature);
+
+        featureMap[fname] = feature.clone();
+    }
+
+    for (auto item: gtMap)
+    {
+        Mat feature1 = featureMap[item.first.first];
+        Mat feature2 = featureMap[item.first.second];
+        int label = item.second;
+
+        double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+        double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+
+        EXPECT_TRUE(label == 0 ? cos_score <= cosine_similar_thresh : cos_score > cosine_similar_thresh) << "Cosine match result of images " << item.first.first << " and " << item.first.second << " is different from ground truth (score: " << cos_score << "; threshold: " << cosine_similar_thresh << ").";
+        EXPECT_TRUE(label == 0 ? L2_score > l2norm_similar_thresh : L2_score <= l2norm_similar_thresh) << "L2norm match result of images " << item.first.first << " and " << item.first.second << " is different from ground truth (score: " << L2_score << "; threshold: " << l2norm_similar_thresh << ").";
+    }
+}
+
+}} // namespace
index 93e4d28..4031f05 100644 (file)
@@ -7,4 +7,19 @@
     #include <hpx/hpx_main.hpp>
 #endif
 
-CV_TEST_MAIN("cv")
+static
+void initTests()
+{
+#ifdef HAVE_OPENCV_DNN
+    const char* extraTestDataPath =
+#ifdef WINRT
+        NULL;
+#else
+        getenv("OPENCV_DNN_TEST_DATA_PATH");
+#endif
+    if (extraTestDataPath)
+        cvtest::addDataSearchPath(extraTestDataPath);
+#endif  // HAVE_OPENCV_DNN
+}
+
+CV_TEST_MAIN("cv", initTests())
index 394bc6e..2e7a241 100644 (file)
@@ -37,6 +37,7 @@
 #include <iterator>
 #include <limits>
 #include <algorithm>
+#include <set>
 
 
 #ifndef OPENCV_32BIT_CONFIGURATION
index 209fbb5..9a1aeed 100644 (file)
@@ -4,6 +4,7 @@ set(OPENCV_DNN_SAMPLES_REQUIRED_DEPS
   opencv_core
   opencv_imgproc
   opencv_dnn
+  opencv_objdetect
   opencv_video
   opencv_imgcodecs
   opencv_videoio
diff --git a/samples/dnn/face_detect.cpp b/samples/dnn/face_detect.cpp
new file mode 100644 (file)
index 0000000..8d91a10
--- /dev/null
@@ -0,0 +1,132 @@
+#include <opencv2/dnn.hpp>
+#include <opencv2/imgproc.hpp>
+#include <opencv2/highgui.hpp>
+#include <opencv2/objdetect.hpp>
+
+#include <iostream>
+
+using namespace cv;
+using namespace std;
+
+static Mat visualize(Mat input, Mat faces, int thickness=2)
+{
+    Mat output = input.clone();
+    for (int i = 0; i < faces.rows; i++)
+    {
+        // Print results
+        cout << "Face " << i
+             << ", top-left coordinates: (" << faces.at<float>(i, 0) << ", " << faces.at<float>(i, 1) << "), "
+             << "box width: " << faces.at<float>(i, 2)  << ", box height: " << faces.at<float>(i, 3) << ", "
+             << "score: " << faces.at<float>(i, 14) << "\n";
+
+        // Draw bounding box
+        rectangle(output, Rect2i(int(faces.at<float>(i, 0)), int(faces.at<float>(i, 1)), int(faces.at<float>(i, 2)), int(faces.at<float>(i, 3))), Scalar(0, 255, 0), thickness);
+        // Draw landmarks
+        circle(output, Point2i(int(faces.at<float>(i, 4)),  int(faces.at<float>(i, 5))),  2, Scalar(255,   0,   0), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 6)),  int(faces.at<float>(i, 7))),  2, Scalar(  0,   0, 255), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 8)),  int(faces.at<float>(i, 9))),  2, Scalar(  0, 255,   0), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 10)), int(faces.at<float>(i, 11))), 2, Scalar(255,   0, 255), thickness);
+        circle(output, Point2i(int(faces.at<float>(i, 12)), int(faces.at<float>(i, 13))), 2, Scalar(  0, 255, 255), thickness);
+    }
+    return output;
+}
+
+int main(int argc, char ** argv)
+{
+    CommandLineParser parser(argc, argv,
+        "{help  h           |            | Print this message.}"
+        "{input i           |            | Path to the input image. Omit for detecting on default camera.}"
+        "{model m           | yunet.onnx | Path to the model. Download yunet.onnx in https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.}"
+        "{score_threshold   | 0.9        | Filter out faces of score < score_threshold.}"
+        "{nms_threshold     | 0.3        | Suppress bounding boxes of iou >= nms_threshold.}"
+        "{top_k             | 5000       | Keep top_k bounding boxes before NMS.}"
+        "{save  s           | false      | Set true to save results. This flag is invalid when using camera.}"
+        "{vis   v           | true       | Set true to open a window for result visualization. This flag is invalid when using camera.}"
+    );
+    if (argc == 1 || parser.has("help"))
+    {
+        parser.printMessage();
+        return -1;
+    }
+
+    String modelPath = parser.get<String>("model");
+
+    float scoreThreshold = parser.get<float>("score_threshold");
+    float nmsThreshold = parser.get<float>("nms_threshold");
+    int topK = parser.get<int>("top_k");
+
+    bool save = parser.get<bool>("save");
+    bool vis = parser.get<bool>("vis");
+
+    // Initialize FaceDetectorYN
+    Ptr<FaceDetectorYN> detector = FaceDetectorYN::create(modelPath, "", Size(320, 320), scoreThreshold, nmsThreshold, topK);
+
+    // If input is an image
+    if (parser.has("input"))
+    {
+        String input = parser.get<String>("input");
+        Mat image = imread(input);
+
+        // Set input size before inference
+        detector->setInputSize(image.size());
+
+        // Inference
+        Mat faces;
+        detector->detect(image, faces);
+
+        // Draw results on the input image
+        Mat result = visualize(image, faces);
+
+        // Save results if save is true
+        if(save)
+        {
+            cout << "Results saved to result.jpg\n";
+            imwrite("result.jpg", result);
+        }
+
+        // Visualize results
+        if (vis)
+        {
+            namedWindow(input, WINDOW_AUTOSIZE);
+            imshow(input, result);
+            waitKey(0);
+        }
+    }
+    else
+    {
+        int deviceId = 0;
+        VideoCapture cap;
+        cap.open(deviceId, CAP_ANY);
+        int frameWidth = int(cap.get(CAP_PROP_FRAME_WIDTH));
+        int frameHeight = int(cap.get(CAP_PROP_FRAME_HEIGHT));
+        detector->setInputSize(Size(frameWidth, frameHeight));
+
+        Mat frame;
+        TickMeter tm;
+        String msg = "FPS: ";
+        while(waitKey(1) < 0) // Press any key to exit
+        {
+            // Get frame
+            if (!cap.read(frame))
+            {
+                cerr << "No frames grabbed!\n";
+                break;
+            }
+
+            // Inference
+            Mat faces;
+            tm.start();
+            detector->detect(frame, faces);
+            tm.stop();
+
+            // Draw results on the input image
+            Mat result = visualize(frame, faces);
+            putText(result, msg + to_string(tm.getFPS()), Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0));
+
+            // Visualize results
+            imshow("Live", result);
+
+            tm.reset();
+        }
+    }
+}
\ No newline at end of file
diff --git a/samples/dnn/face_detect.py b/samples/dnn/face_detect.py
new file mode 100644 (file)
index 0000000..65069d6
--- /dev/null
@@ -0,0 +1,101 @@
+import argparse
+
+import numpy as np
+import cv2 as cv
+
+def str2bool(v):
+    if v.lower() in ['on', 'yes', 'true', 'y', 't']:
+        return True
+    elif v.lower() in ['off', 'no', 'false', 'n', 'f']:
+        return False
+    else:
+        raise NotImplementedError
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--input', '-i', type=str, help='Path to the input image.')
+parser.add_argument('--model', '-m', type=str, default='yunet.onnx', help='Path to the model. Download the model at https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.')
+parser.add_argument('--score_threshold', type=float, default=0.9, help='Filtering out faces of score < score_threshold.')
+parser.add_argument('--nms_threshold', type=float, default=0.3, help='Suppress bounding boxes of iou >= nms_threshold.')
+parser.add_argument('--top_k', type=int, default=5000, help='Keep top_k bounding boxes before NMS.')
+parser.add_argument('--save', '-s', type=str2bool, default=False, help='Set true to save results. This flag is invalid when using camera.')
+parser.add_argument('--vis', '-v', type=str2bool, default=True, help='Set true to open a window for result visualization. This flag is invalid when using camera.')
+args = parser.parse_args()
+
+def visualize(input, faces, thickness=2):
+    output = input.copy()
+    if faces[1] is not None:
+        for idx, face in enumerate(faces[1]):
+            print('Face {}, top-left coordinates: ({:.0f}, {:.0f}), box width: {:.0f}, box height {:.0f}, score: {:.2f}'.format(idx, face[0], face[1], face[2], face[3], face[-1]))
+
+            coords = face[:-1].astype(np.int32)
+            cv.rectangle(output, (coords[0], coords[1]), (coords[0]+coords[2], coords[1]+coords[3]), (0, 255, 0), 2)
+            cv.circle(output, (coords[4], coords[5]), 2, (255, 0, 0), 2)
+            cv.circle(output, (coords[6], coords[7]), 2, (0, 0, 255), 2)
+            cv.circle(output, (coords[8], coords[9]), 2, (0, 255, 0), 2)
+            cv.circle(output, (coords[10], coords[11]), 2, (255, 0, 255), 2)
+            cv.circle(output, (coords[12], coords[13]), 2, (0, 255, 255), 2)
+    return output
+
+if __name__ == '__main__':
+
+    # Instantiate FaceDetectorYN
+    detector = cv.FaceDetectorYN.create(
+        args.model,
+        "",
+        (320, 320),
+        args.score_threshold,
+        args.nms_threshold,
+        args.top_k
+    )
+
+    # If input is an image
+    if args.input is not None:
+        image = cv.imread(args.input)
+
+        # Set input size before inference
+        detector.setInputSize((image.shape[1], image.shape[0]))
+
+        # Inference
+        faces = detector.detect(image)
+
+        # Draw results on the input image
+        result = visualize(image, faces)
+
+        # Save results if save is true
+        if args.save:
+            print('Results saved to result.jpg\n')
+            cv.imwrite('result.jpg', result)
+
+        # Visualize results in a new window
+        if args.vis:
+            cv.namedWindow(args.input, cv.WINDOW_AUTOSIZE)
+            cv.imshow(args.input, result)
+            cv.waitKey(0)
+    else: # Omit input to call default camera
+        deviceId = 0
+        cap = cv.VideoCapture(deviceId)
+        frameWidth = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
+        frameHeight = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
+        detector.setInputSize([frameWidth, frameHeight])
+
+        tm = cv.TickMeter()
+        while cv.waitKey(1) < 0:
+            hasFrame, frame = cap.read()
+            if not hasFrame:
+                print('No frames grabbed!')
+                break
+
+            # Inference
+            tm.start()
+            faces = detector.detect(frame) # faces is a tuple
+            tm.stop()
+
+            # Draw results on the input image
+            frame = visualize(frame, faces)
+
+            cv.putText(frame, 'FPS: {}'.format(tm.getFPS()), (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0))
+
+            # Visualize results in a new Window
+            cv.imshow('Live', frame)
+
+            tm.reset()
\ No newline at end of file
diff --git a/samples/dnn/face_match.cpp b/samples/dnn/face_match.cpp
new file mode 100644 (file)
index 0000000..f24134b
--- /dev/null
@@ -0,0 +1,103 @@
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html.
+
+#include "opencv2/dnn.hpp"
+#include "opencv2/imgproc.hpp"
+#include "opencv2/highgui.hpp"
+
+#include <iostream>
+
+#include "opencv2/objdetect.hpp"
+
+
+using namespace cv;
+using namespace std;
+
+
+int main(int argc, char ** argv)
+{
+    if (argc != 5)
+    {
+        std::cerr << "Usage " << argv[0] << ": "
+                  << "<det_onnx_path> "
+                  << "<reg_onnx_path> "
+                  << "<image1>"
+                  << "<image2>\n";
+        return -1;
+    }
+
+    String det_onnx_path = argv[1];
+    String reg_onnx_path = argv[2];
+    String image1_path = argv[3];
+    String image2_path = argv[4];
+    std::cout<<image1_path<<" "<<image2_path<<std::endl;
+    Mat image1 = imread(image1_path);
+    Mat image2 = imread(image2_path);
+
+    float score_thresh = 0.9f;
+    float nms_thresh = 0.3f;
+    double cosine_similar_thresh = 0.363;
+    double l2norm_similar_thresh = 1.128;
+    int top_k = 5000;
+
+    // Initialize FaceDetector
+    Ptr<FaceDetectorYN> faceDetector;
+
+    faceDetector = FaceDetectorYN::create(det_onnx_path, "", image1.size(), score_thresh, nms_thresh, top_k);
+    Mat faces_1;
+    faceDetector->detect(image1, faces_1);
+    if (faces_1.rows < 1)
+    {
+        std::cerr << "Cannot find a face in " << image1_path << "\n";
+        return -1;
+    }
+
+    faceDetector = FaceDetectorYN::create(det_onnx_path, "", image2.size(), score_thresh, nms_thresh, top_k);
+    Mat faces_2;
+    faceDetector->detect(image2, faces_2);
+    if (faces_2.rows < 1)
+    {
+        std::cerr << "Cannot find a face in " << image2_path << "\n";
+        return -1;
+    }
+
+    // Initialize FaceRecognizerSF
+    Ptr<FaceRecognizerSF> faceRecognizer = FaceRecognizerSF::create(reg_onnx_path, "");
+
+
+    Mat aligned_face1, aligned_face2;
+    faceRecognizer->alignCrop(image1, faces_1.row(0), aligned_face1);
+    faceRecognizer->alignCrop(image2, faces_2.row(0), aligned_face2);
+
+    Mat feature1, feature2;
+    faceRecognizer->feature(aligned_face1, feature1);
+    feature1 = feature1.clone();
+    faceRecognizer->feature(aligned_face2, feature2);
+    feature2 = feature2.clone();
+
+    double cos_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_COSINE);
+    double L2_score = faceRecognizer->match(feature1, feature2, FaceRecognizerSF::DisType::FR_NORM_L2);
+
+    if(cos_score >= cosine_similar_thresh)
+    {
+        std::cout << "They have the same identity;";
+    }
+    else
+    {
+        std::cout << "They have different identities;";
+    }
+    std::cout << " Cosine Similarity: " << cos_score << ", threshold: " << cosine_similar_thresh << ". (higher value means higher similarity, max 1.0)\n";
+
+    if(L2_score <= l2norm_similar_thresh)
+    {
+        std::cout << "They have the same identity;";
+    }
+    else
+    {
+        std::cout << "They have different identities.";
+    }
+    std::cout << " NormL2 Distance: " << L2_score << ", threshold: " << l2norm_similar_thresh << ". (lower value means higher similarity, min 0.0)\n";
+
+    return 0;
+}
diff --git a/samples/dnn/face_match.py b/samples/dnn/face_match.py
new file mode 100644 (file)
index 0000000..b36c9f6
--- /dev/null
@@ -0,0 +1,57 @@
+import argparse
+
+import numpy as np
+import cv2 as cv
+
+parser = argparse.ArgumentParser()
+parser.add_argument('--input1', '-i1', type=str, help='Path to the input image1.')
+parser.add_argument('--input2', '-i2', type=str, help='Path to the input image2.')
+parser.add_argument('--face_detection_model', '-fd', type=str, help='Path to the face detection model. Download the model at https://github.com/ShiqiYu/libfacedetection.train/tree/master/tasks/task1/onnx.')
+parser.add_argument('--face_recognition_model', '-fr', type=str, help='Path to the face recognition model. Download the model at https://drive.google.com/file/d/1ClK9WiB492c5OZFKveF3XiHCejoOxINW/view.')
+args = parser.parse_args()
+
+# Read the input image
+img1 = cv.imread(args.input1)
+img2 = cv.imread(args.input2)
+
+# Instantiate face detector and recognizer
+detector = cv.FaceDetectorYN.create(
+    args.face_detection_model,
+    "",
+    (img1.shape[1], img1.shape[0])
+)
+recognizer = cv.FaceRecognizerSF.create(
+    args.face_recognition_model,
+    ""
+)
+
+# Detect face
+detector.setInputSize((img1.shape[1], img1.shape[0]))
+face1 = detector.detect(img1)
+detector.setInputSize((img2.shape[1], img2.shape[0]))
+face2 = detector.detect(img2)
+assert face1[1].shape[0] > 0, 'Cannot find a face in {}'.format(args.input1)
+assert face2[1].shape[0] > 0, 'Cannot find a face in {}'.format(args.input2)
+
+# Align faces
+face1_align = recognizer.alignCrop(img1, face1[1][0])
+face2_align = recognizer.alignCrop(img2, face2[1][0])
+
+# Extract features (the CV_WRAP method defined in face.hpp is named `feature`)
+face1_feature = recognizer.feature(face1_align)
+face2_feature = recognizer.feature(face2_align)
+
+# Calculate distance (0: cosine, 1: L2)
+cosine_similarity_threshold = 0.363
+cosine_score = recognizer.match(face1_feature, face2_feature, 0)
+msg = 'different identities'
+if cosine_score >= cosine_similarity_threshold:
+    msg = 'the same identity'
+print('They have {}. Cosine Similarity: {}, threshold: {} (higher value means higher similarity, max 1.0).'.format(msg, cosine_score, cosine_similarity_threshold))
+
+l2_similarity_threshold = 1.128
+l2_score = recognizer.match(face1_feature, face2_feature, 1)
+msg = 'different identities'
+if l2_score <= l2_similarity_threshold:
+    msg = 'the same identity'
+print('They have {}. NormL2 Distance: {}, threshold: {} (lower value means higher similarity, min 0.0).'.format(msg, l2_score, l2_similarity_threshold))
\ No newline at end of file
diff --git a/samples/dnn/results/audrybt1.jpg b/samples/dnn/results/audrybt1.jpg
new file mode 100644 (file)
index 0000000..5829f82
Binary files /dev/null and b/samples/dnn/results/audrybt1.jpg differ