doc/tutorials/gapi/face_beautification/face_beautification.markdown

   1 # Implementing a face beautification algorithm with G-API {#tutorial_gapi_face_beautification}
   2
   3 [TOC]
   4
   5 # Introduction {#gapi_fb_intro}
   6
   7 In this tutorial you will learn:
   8 * Basics of a sample face beautification algorithm;
   9 * How to infer different networks inside a pipeline with G-API;
  10 * How to run a G-API pipeline on a video stream.
  11
  12 ## Prerequisites {#gapi_fb_prerec}
  13
  14 This sample requires:
  15 - PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but
  16   was not tested);
  17 - OpenCV 4.2 or later built with Intel® Distribution of [OpenVINO™
  18   Toolkit](https://docs.openvinotoolkit.org/) (building with [Intel®
  19   TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is
  20   a plus);
  21 - The following topologies from OpenVINO™ Toolkit [Open Model
  22   Zoo](https://github.com/opencv/open_model_zoo):
  23   - `face-detection-adas-0001`;
  24   - `facial-landmarks-35-adas-0002`.
  25
  26 ## Face beautification algorithm {#gapi_fb_algorithm}
  27
  28 We will implement a simple face beautification algorithm using a
  29 combination of modern Deep Learning techniques and traditional
  30 Computer Vision. The general idea behind the algorithm is to smooth
  31 facial skin while preserving the contrast of facial features such as
  32 the eyes and the mouth. The algorithm identifies parts of the face using
  33 DNN inference, applies different filters to the parts found, and then
  34 combines them into the final result using basic image arithmetic:
  35
  36 \dot
  37 strict digraph Pipeline {
  38   node [shape=record fontname=Helvetica fontsize=10 style=filled color="#4c7aa4" fillcolor="#5b9bd5" fontcolor="white"];
  39   edge [color="#62a8e7"];
  40   ordering="out";
  41   splines=ortho;
  42   rankdir=LR;
  43
  44   input [label="Input"];
  45   fd [label="Face\ndetector"];
  46   bgMask [label="Generate\nBG mask"];
  47   unshMask [label="Unsharp\nmask"];
  48   bilFil [label="Bilateral\nfilter"];
  49   shMask [label="Generate\nsharp mask"];
  50   blMask [label="Generate\nblur mask"];
  51   mul_1 [label="*" fontsize=24 shape=circle labelloc=b];
  52   mul_2 [label="*" fontsize=24 shape=circle labelloc=b];
  53   mul_3 [label="*" fontsize=24 shape=circle labelloc=b];
  54
  55   subgraph cluster_0 {
  56     style=dashed
  57     fontsize=10
  58     ld [label="Landmarks\ndetector"];
  59     label="for each face"
  60   }
  61
  62   sum_1 [label="+" fontsize=24 shape=circle];
  63   out [label="Output"];
  64
  65   temp_1 [style=invis shape=point width=0];
  66   temp_2 [style=invis shape=point width=0];
  67   temp_3 [style=invis shape=point width=0];
  68   temp_4 [style=invis shape=point width=0];
  69   temp_5 [style=invis shape=point width=0];
  70   temp_6 [style=invis shape=point width=0];
  71   temp_7 [style=invis shape=point width=0];
  72   temp_8 [style=invis shape=point width=0];
  73   temp_9 [style=invis shape=point width=0];
  74
  75   input -> temp_1 [arrowhead=none]
  76   temp_1 -> fd -> ld
  77   ld -> temp_4 [arrowhead=none]
  78   temp_4 -> bgMask
  79   bgMask -> mul_1 -> sum_1 -> out
  80
  81   temp_4 -> temp_5 -> temp_6 [arrowhead=none constraint=none]
  82   ld -> temp_2 -> temp_3 [style=invis constraint=none]
  83
  84   temp_1 -> {unshMask, bilFil}
  85   fd -> unshMask [style=invis constraint=none]
  86   unshMask -> bilFil [style=invis constraint=none]
  87
  88   bgMask -> shMask [style=invis constraint=none]
  89   shMask -> blMask [style=invis constraint=none]
  90   mul_1 -> mul_2 [style=invis constraint=none]
  91   temp_5 -> shMask -> mul_2
  92   temp_6 -> blMask -> mul_3
  93
  94   unshMask -> temp_2 -> temp_5 [style=invis]
  95   bilFil -> temp_3 -> temp_6 [style=invis]
  96
  97   mul_2 -> temp_7 [arrowhead=none]
  98   mul_3 -> temp_8 [arrowhead=none]
  99
 100   temp_8 -> temp_7 [arrowhead=none constraint=none]
 101   temp_7 -> sum_1 [constraint=none]
 102
 103   unshMask -> mul_2 [constraint=none]
 104   bilFil -> mul_3 [constraint=none]
 105   temp_1 -> mul_1 [constraint=none]
 106 }
 107 \enddot
 108
 109 Briefly, the algorithm works as follows:
 110 - Input image \f$I\f$ is passed to unsharp mask and bilateral filters
 111   (\f$U\f$ and \f$L\f$ respectively);
 112 - Input image \f$I\f$ is passed to an SSD-based face detector;
 113 - SSD result (a \f$[1 \times 1 \times 200 \times 7]\f$ blob) is parsed
 114   and converted to an array of faces;
 115 - Every face is passed to a landmarks detector;
 116 - Based on landmarks found for every face, three image masks are
 117   generated:
 118   - A background mask \f$b\f$ -- indicating which areas from the
 119     original image to keep as-is;
 120   - A face part mask \f$p\f$ -- identifying regions to preserve
 121     (sharpen);
 122   - A face skin mask \f$s\f$ -- identifying regions to blur;
 123 - The final result \f$O\f$ is a composition of the features above,
 124   calculated as \f$O = b*I + p*U + s*L\f$.
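
The composition formula can be illustrated per pixel in plain C++. This is only a sketch, not the sample's G-API code: the helper name `composePixelwise` is hypothetical, the image is assumed to be single-channel 8-bit, and the masks are assumed to be 8-bit weights (255 meaning full weight) that do not overlap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: O = b*I + p*U + s*L for a single-channel 8-bit image.
// Assumption: masks are 8-bit weights, where 255 stands for 1.0.
std::vector<std::uint8_t> composePixelwise(const std::vector<std::uint8_t>& I,
                                           const std::vector<std::uint8_t>& U,
                                           const std::vector<std::uint8_t>& L,
                                           const std::vector<std::uint8_t>& b,
                                           const std::vector<std::uint8_t>& p,
                                           const std::vector<std::uint8_t>& s)
{
    const std::size_t n = I.size();
    assert(U.size() == n && L.size() == n);
    assert(b.size() == n && p.size() == n && s.size() == n);

    std::vector<std::uint8_t> O(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Weighted sum in integer arithmetic, rescaled by 255 with rounding.
        const unsigned sum = static_cast<unsigned>(b[i]) * I[i]
                           + static_cast<unsigned>(p[i]) * U[i]
                           + static_cast<unsigned>(s[i]) * L[i];
        // Saturate in case the masks overlap after all.
        O[i] = static_cast<std::uint8_t>(std::min(255u, (sum + 127u) / 255u));
    }
    return O;
}
```

With non-overlapping masks every output pixel comes from exactly one of the three images, which is why a plain sum is enough to merge them.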
 125
 126 Generating face element masks based on a limited set of features (just
 127 35 per face, including all its parts) is not trivial and is
 128 described in the sections below.
 129
 130 # Constructing a G-API pipeline {#gapi_fb_pipeline}
 131
 132 ## Declaring Deep Learning topologies {#gapi_fb_decl_nets}
 133
 134 This sample uses two DNN detectors. Every network takes one input
 135 and produces one output. In G-API, networks are defined with the
 136 `G_API_NET()` macro:
 137
 138 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp net_decl
 139
 140 To get more information, see
 141 [Declaring Deep Learning topologies](@ref gapi_ifd_declaring_nets)
 142 described in the "Face Analytics pipeline" tutorial.
 143
 144 ## Describing the processing graph {#gapi_fb_ppline}
 145
 146 The code below generates a graph for the algorithm above:
 147
 148 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ppl
 149
 150 The resulting graph is a mixture of G-API's standard operations,
 151 user-defined operations (namespace `custom::`), and DNN inference.
 152 The generic function `cv::gapi::infer<>()` triggers inference
 153 within the pipeline; networks to infer are specified as template
 154 parameters. The sample code uses two versions of `cv::gapi::infer<>()`:
 155 - A frame-oriented one is used to detect faces on the input frame.
 156 - An ROI-list-oriented one is used to run landmarks inference on a
 157   list of faces -- this version produces an array of landmarks for
 158   every face.
 159
 160 More on this in "Face Analytics pipeline"
 161 ([Building a GComputation](@ref gapi_ifd_gcomputation) section).
 162
 163 ## Unsharp mask in G-API {#gapi_fb_unsh}
 164
 165 The unsharp mask \f$U\f$ for image \f$I\f$ is defined as:
 166
 167 \f[U = I - s * L(M(I)),\f]
 168
 169 where \f$M()\f$ is a median filter, \f$L()\f$ is the Laplace operator,
 170 and \f$s\f$ is a strength coefficient. While G-API doesn't provide
 171 this function out-of-the-box, it is expressed naturally with the
 172 existing G-API operations:
 173
 174 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp unsh
 175
 176 Note that the code snippet above is a regular C++ function defined
 177 with G-API types. Users can write functions like this to simplify
 178 graph construction; when called, this function just adds the relevant
 179 nodes to the pipeline it is used in.
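
To make the formula \f$U = I - s * L(M(I))\f$ concrete, here is a minimal 1-D sketch in plain C++. It is not the sample's G-API function: the names `median3`, `laplace3` and `unsharpMask1D` are hypothetical, the median window is assumed to be 3 samples, the Laplacian is assumed to be the discrete kernel `[1, -2, 1]`, and border samples are replicated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Read in[i], replicating the border samples outside [0, n-1].
static float sampleAt(const std::vector<float>& in, long i)
{
    const long last = static_cast<long>(in.size()) - 1;
    return in[static_cast<std::size_t>(std::max(0L, std::min(i, last)))];
}

// M(): 3-sample median filter (window size is an assumption).
std::vector<float> median3(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (long i = 0; i < static_cast<long>(in.size()); ++i) {
        float w[3] = { sampleAt(in, i - 1), sampleAt(in, i), sampleAt(in, i + 1) };
        std::sort(w, w + 3);
        out[static_cast<std::size_t>(i)] = w[1];
    }
    return out;
}

// L(): discrete 1-D Laplacian with the kernel [1, -2, 1].
std::vector<float> laplace3(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (long i = 0; i < static_cast<long>(in.size()); ++i) {
        out[static_cast<std::size_t>(i)] =
            sampleAt(in, i - 1) - 2.0f * sampleAt(in, i) + sampleAt(in, i + 1);
    }
    return out;
}

// U = I - s * L(M(I))
std::vector<float> unsharpMask1D(const std::vector<float>& I, float s)
{
    assert(!I.empty());
    const std::vector<float> lap = laplace3(median3(I));
    std::vector<float> U(I.size());
    for (std::size_t i = 0; i < I.size(); ++i) {
        U[i] = I[i] - s * lap[i];
    }
    return U;
}
```

On a step edge such as `{0, 0, 10, 10}` the result undershoots before the edge and overshoots after it, which is the contrast boost the unsharp mask is meant to produce; a constant signal is left unchanged.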
 180
 181 # Custom operations {#gapi_fb_proc}
 182
 183 The face beautification graph uses custom operations
 184 extensively. This chapter focuses on the most interesting kernels;
 185 refer to [G-API Kernel API](@ref gapi_kernel_api) for general
 186 information on defining operations and implementing kernels in G-API.
 187
 188 ## Face detector post-processing {#gapi_fb_face_detect}
 189
 190 A face detector output is converted to an array of faces with the
 191 following kernel:
 192
 193 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp vec_ROI
 194 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp fd_pp
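
The parsing logic can be sketched without OpenCV types. The code below is an illustration under stated assumptions, not the kernel from the sample: `FaceRect` and `parseSSD` are hypothetical names, each detection is assumed to be 7 floats laid out as `[image_id, label, confidence, x_min, y_min, x_max, y_max]` with coordinates normalized to the frame size, and a negative `image_id` is assumed to mark the end of valid detections. Rectangles are clipped to the frame so that no ROI reaches outside the image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an integer rectangle type.
struct FaceRect { int x; int y; int width; int height; };

// Parse a flattened [1 x 1 x N x 7] SSD blob into face rectangles.
std::vector<FaceRect> parseSSD(const std::vector<float>& blob,
                               int frameW, int frameH, float confThreshold)
{
    const std::size_t kObjectSize = 7;  // floats per detection (assumed layout)
    assert(blob.size() % kObjectSize == 0);
    assert(frameW > 0 && frameH > 0);

    std::vector<FaceRect> faces;
    for (std::size_t i = 0; i + kObjectSize <= blob.size(); i += kObjectSize) {
        const float imageId    = blob[i + 0];
        const float confidence = blob[i + 2];
        if (imageId < 0.0f) break;                 // assumed end-of-detections marker
        if (confidence < confThreshold) continue;  // skip weak detections

        // Denormalize to pixel coordinates, then clamp to the frame borders.
        int x1 = static_cast<int>(blob[i + 3] * frameW);
        int y1 = static_cast<int>(blob[i + 4] * frameH);
        int x2 = static_cast<int>(blob[i + 5] * frameW);
        int y2 = static_cast<int>(blob[i + 6] * frameH);
        x1 = std::max(0, std::min(x1, frameW));
        x2 = std::max(0, std::min(x2, frameW));
        y1 = std::max(0, std::min(y1, frameH));
        y2 = std::max(0, std::min(y2, frameH));

        // Keep only rectangles that still have a positive area.
        if (x2 > x1 && y2 > y1) {
            faces.push_back(FaceRect{ x1, y1, x2 - x1, y2 - y1 });
        }
    }
    return faces;
}
```

Clipping matters because an ROI that crosses the frame border cannot be passed safely to the landmarks detector.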
 195
 196 ## Facial landmarks post-processing {#gapi_fb_landm_detect}
 197
 198 The algorithm infers locations of face elements (like the eyes, the mouth
 199 and the head contour itself) using a generic facial landmarks detector
 200 (<a href="https://github.com/opencv/open_model_zoo/blob/master/models/intel/facial-landmarks-35-adas-0002/description/facial-landmarks-35-adas-0002.md">details</a>)
 201 from OpenVINO™ Open Model Zoo. However, the detected landmarks as-is are not
 202 enough to generate masks: this operation requires regions of interest on
 203 the face represented by closed contours, so some interpolation is applied to
 204 get them. Landmark processing and interpolation
 205 are performed by the following kernel:
 206
 207 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_cnts
 208
 209 The kernel takes two arrays of denormalized landmark coordinates and
 210 returns two arrays of closed contours: the first holds the contours of
 211 face elements, that is, the image areas to be sharpened, and the
 212 second holds the contours of whole faces, that is, the areas to be
 213 smoothed.
 214
 215 Here and below `Contour` is a vector of points.
 216
 217 ### Getting an eye contour {#gapi_fb_ld_eye}
 218
 219 Eye contours are estimated with the following function:
 220
 221 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_incl
 222 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_eye
 223
 224 Briefly, this function restores the bottom side of an eye as a
 225 half-ellipse based on two points: the left and the right eye
 226 corners. In fact, `cv::ellipse2Poly()` is used to approximate the eye region, and
 227 the function only defines the ellipse parameters based on these two points:
 228 - The ellipse center and the \f$X\f$ half-axis, calculated from the two eye points;
 229 - The \f$Y\f$ half-axis, calculated under the assumption that an average
 230   eye width is \f$1/3\f$ of its length;
 231 - The start and the end angles, which are 0 and 180 (refer to
 232   `cv::ellipse()` documentation);
 233 - The angle delta, which determines how many points the contour contains;
 234 - The inclination angle of the axes.
 235
 236 Using `atan2()` instead of just `atan()` in the function
 237 `custom::getLineInclinationAngleDegrees()` is essential: it takes the
 238 signs of both `x` and `y` into account and can return a negative value, so we
 239 get the right angle even if the face is upside down
 240 (provided the points are passed in the right order, of course).
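
The ellipse parameters described above can be sketched in plain C++. This is an illustration, not the sample's code: `Point`, `EllipseParams`, `lineInclinationAngleDegrees` and `eyeEllipse` are hypothetical stand-ins, and the \f$Y\f$ half-axis is assumed to be one third of the \f$X\f$ half-axis.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an integer 2-D point type.
struct Point { int x; int y; };

struct EllipseParams {
    Point  center;    // middle of the segment between the eye corners
    int    axisX;     // half-axis along the eye line
    int    axisY;     // half-axis across the eye line
    double angleDeg;  // inclination of the eye line, in degrees
};

// Inclination of the line from ptLeft to ptRight, in (-180, 180] degrees.
// atan2() keeps the quadrant; atan(dy / dx) would fold opposite directions together.
double lineInclinationAngleDegrees(const Point& ptLeft, const Point& ptRight)
{
    const double kPi = 3.14159265358979323846;
    const double dx = ptRight.x - ptLeft.x;
    const double dy = ptRight.y - ptLeft.y;
    assert(dx != 0.0 || dy != 0.0);  // the two points must differ
    return std::atan2(dy, dx) * 180.0 / kPi;
}

EllipseParams eyeEllipse(const Point& ptLeft, const Point& ptRight)
{
    const double dx = ptRight.x - ptLeft.x;
    const double dy = ptRight.y - ptLeft.y;

    EllipseParams e;
    e.center   = Point{ (ptLeft.x + ptRight.x) / 2, (ptLeft.y + ptRight.y) / 2 };
    e.axisX    = static_cast<int>(std::lround(std::hypot(dx, dy) / 2.0));
    e.axisY    = e.axisX / 3;  // assumption: eye width is 1/3 of its length
    e.angleDeg = lineInclinationAngleDegrees(ptLeft, ptRight);
    return e;
}
```

For a face rotated by 180 degrees the "left" corner has the larger `x`, so `dx` is negative; `atan2()` then reports 180 degrees, while `atan()` would report 0 and the half-ellipse would be drawn on the wrong side.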
 241
 242 ### Getting a forehead contour {#gapi_fb_ld_fhd}
 243
 244 The following function approximates the forehead contour:
 245
 246 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_fhd
 247
 248 As we have only jaw points in our detected landmarks, we have to get a
 249 half-ellipse based on three points of a jaw: the leftmost, the
 250 rightmost and the lowest one. The jaw width is assumed to be equal to the
 251 forehead width and the latter is calculated using the left and the
 252 right points. As for the \f$Y\f$ axis, we have no points to get
 253 it directly, and instead assume that the forehead height is about \f$2/3\f$
 254 of the jaw height, which can be figured out from the face center (the
 255 middle between the left and right points) and the lowest jaw point.
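
The forehead estimate can be sketched the same way. Again this is only an illustration with hypothetical names (`Point`, `HalfEllipse`, `foreheadEllipse`); it computes the half-axes from the three jaw points using the \f$2/3\f$ assumption stated above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an integer 2-D point type.
struct Point { int x; int y; };

struct HalfEllipse {
    Point center;  // face center: middle between the left and right jaw points
    int   axisX;   // half of the jaw width (assumed equal to the forehead width)
    int   axisY;   // estimated forehead height
};

HalfEllipse foreheadEllipse(const Point& jawLeft, const Point& jawRight,
                            const Point& jawLower)
{
    assert(jawLeft.x != jawRight.x || jawLeft.y != jawRight.y);

    HalfEllipse e;
    e.center = Point{ (jawLeft.x + jawRight.x) / 2, (jawLeft.y + jawRight.y) / 2 };

    const double jawWidth  = std::hypot(double(jawRight.x - jawLeft.x),
                                        double(jawRight.y - jawLeft.y));
    const double jawHeight = std::hypot(double(jawLower.x - e.center.x),
                                        double(jawLower.y - e.center.y));

    e.axisX = static_cast<int>(std::lround(jawWidth / 2.0));
    // Assumption from the text: forehead height is about 2/3 of the jaw height.
    e.axisY = static_cast<int>(std::lround(jawHeight * 2.0 / 3.0));
    return e;
}
```

For jaw points `(0, 100)`, `(120, 100)` and `(60, 190)` this gives a center of `(60, 100)` and half-axes of 60 and 60 pixels.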
 256
 257 ## Drawing masks {#gapi_fb_masks_drw}
 258
 259 When we have all the contours needed, we are able to draw masks:
 260
 261 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp msk_ppline
 262
 263 The steps to get the masks are:
 264 * the "sharp" mask calculation:
 265     * fill the contours that should be sharpened;
 266     * blur the result to get the "sharp" mask (`mskSharpG`);
 267 * the "bilateral" mask calculation:
 268     * fill all the face contours fully;
 269     * blur the result;
 270     * subtract the areas which intersect with the "sharp" mask to get the
 271       "bilateral" mask (`mskBlurFinal`);
 272 * the background mask calculation:
 273     * add the two previous masks;
 274     * set all non-zero pixels of the result to 255 (by `cv::gapi::threshold()`);
 275     * invert the output (by `cv::gapi::bitwise_not()`) to get the background
 276       mask (`mskNoFaces`).
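
The mask arithmetic in these steps can be sketched per pixel in plain C++. The sketch assumes the contours are already filled and blurred, so it starts from two 8-bit single-channel masks; `Masks` and `combineMasks` are hypothetical names, and the code only mirrors the steps listed above rather than reproducing the sample's G-API calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Masks {
    std::vector<std::uint8_t> sharp;       // areas to sharpen
    std::vector<std::uint8_t> bilateral;   // areas to smooth
    std::vector<std::uint8_t> background;  // areas to keep as-is
};

// sharpBlurred: filled and blurred contours of face elements.
// faceBlurred:  filled and blurred contours of whole faces.
Masks combineMasks(const std::vector<std::uint8_t>& sharpBlurred,
                   const std::vector<std::uint8_t>& faceBlurred)
{
    assert(sharpBlurred.size() == faceBlurred.size());
    const std::size_t n = sharpBlurred.size();

    Masks m;
    m.sharp = sharpBlurred;
    m.bilateral.resize(n);
    m.background.resize(n);

    for (std::size_t i = 0; i < n; ++i) {
        // Drop face pixels that intersect with the "sharp" mask.
        m.bilateral[i] = (sharpBlurred[i] != 0) ? std::uint8_t(0) : faceBlurred[i];

        // Add the two masks (with saturation), binarize, then invert.
        const int sum   = std::min(255, int(m.sharp[i]) + int(m.bilateral[i]));
        const int white = (sum > 0) ? 255 : 0;
        m.background[i] = static_cast<std::uint8_t>(255 - white);
    }
    return m;
}
```

After this step every pixel is non-zero in at most one of the three masks, so the masked images can simply be added together.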
 277
 278 # Configuring and running the pipeline {#gapi_fb_comp_args}
 279
 280 Once the graph is fully expressed, we can finally compile it and run
 281 on real data. G-API graph compilation is the stage where the G-API
 282 framework actually understands which kernels and networks to use. This
 283 configuration happens via G-API compilation arguments.
 284
 285 ## DNN parameters {#gapi_fb_comp_args_net}
 286
 287 This sample uses the OpenVINO™ Toolkit Inference Engine backend for DL
 288 inference, which is configured in the following way:
 289
 290 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp net_param
 291
 292 Every `cv::gapi::ie::Params<>` object is related to the network
 293 specified in its template argument. This argument must be the network
 294 type we defined with `G_API_NET()` at the beginning of the
 295 tutorial.
 296
 297 Network parameters are then wrapped in `cv::gapi::GNetPackage`:
 298
 299 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp netw
 300
 301 More details in "Face Analytics Pipeline"
 302 ([Configuring the pipeline](@ref gapi_ifd_configuration) section).
 303
 304 ## Kernel packages  {#gapi_fb_comp_args_kernels}
 305
 306 In this example we use a lot of custom kernels. In addition, we
 307 use the Fluid backend to optimize memory usage for G-API's standard kernels
 308 where applicable. The resulting kernel package is formed like this:
 309
 310 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp kern_pass_1
 311
 312 ## Compiling the streaming pipeline  {#gapi_fb_compiling}
 313
 314 G-API optimizes execution for video streams when compiled in the
 315 "Streaming" mode.
 316
 317 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_comp
 318
 319 More on this in "Face Analytics Pipeline"
 320 ([Configuring the pipeline](@ref gapi_ifd_configuration) section).
 321
 322 ## Running the streaming pipeline {#gapi_fb_running}
 323
 324 In order to run the G-API streaming pipeline, all we need is to
 325 specify the input video source, call
 326 `cv::GStreamingCompiled::start()`, and then fetch the pipeline
 327 processing results:
 328
 329 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_src
 330 @snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_loop
 331
 332 Once results are ready and can be pulled from the pipeline, we display
 333 them on the screen and handle GUI events.
 334
 335 See [Running the pipeline](@ref gapi_ifd_running) section
 336 in the "Face Analytics Pipeline" tutorial for more details.
 337
 338 # Conclusion {#gapi_fb_cncl}
 339
 340 The tutorial has two goals: to show the use of brand new features of
 341 G-API introduced in OpenCV 4.2, and to give a basic understanding of a
 342 sample face beautification algorithm.
 343
 344 The result of the algorithm application:
 345
 346 ![Face Beautification example](pics/example.jpg)
 347
 348 On the test machine (Intel® Core™ i7-8700) the G-API-optimized video
 349 pipeline outperforms its serial (non-pipelined) version by a factor of
 350 **2.7** -- meaning that for such a non-trivial graph, the proper
 351 pipelining can bring almost 3x increase in performance.
 352
 353 <!---
 354 The idea in general is to implement a real-time video stream processing that
 355 detects faces and applies some filters to make them look beautiful (more or
 356 less). The pipeline is the following:
 357
 358 Two topologies from OMZ have been used in this sample: the
 359 <a href="https://github.com/opencv/open_model_zoo/tree/master/models/intel
 360 /face-detection-adas-0001">face-detection-adas-0001</a>
 361 and the
 362 <a href="https://github.com/opencv/open_model_zoo/blob/master/models/intel
 363 /facial-landmarks-35-adas-0002/description/facial-landmarks-35-adas-0002.md">
 364 facial-landmarks-35-adas-0002</a>.
 365
 366 The face detector takes the input image and returns a blob with the shape
 367 [1,1,200,7] after the inference (200 is the maximum number of
 368 faces which can be detected).
 369 In order to process every face individually, we need to convert this output to a
 370 list of regions on the image.
 371
 372 The masks for different filters are built based on facial landmarks, which are
 373 inferred for every face. The result of the inference
 374 is a blob with 35 landmarks: the first 18 of them are facial elements
 375 (eyes, eyebrows, a nose, a mouth) and the last 17 --- a jaw contour. Landmarks
 376 are floating point values of coordinates normalized relatively to an input ROI
 377 (not the original frame). In addition, for the further goals we need contours of
 378 eyes, mouths, faces, etc., not the landmarks. So, post-processing of the Mat is
 379 also required here. The process is split into two parts --- landmarks'
 380 coordinates denormalization to the real pixel coordinates of the source frame
 381 and getting necessary closed contours based on these coordinates.
 382
 383 The last step of processing the inference data is drawing masks using the
 384 calculated contours. In this demo the contours don't need to be pixel accurate,
 385 since masks are blurred with Gaussian filter anyway. Another point that should
 386 be mentioned here is getting
 387 three masks (for areas to be smoothed, for ones to be sharpened and for the
 388 background) which have no intersections with each other; this approach allows to
 389 apply the calculated masks to the corresponding images prepared beforehand and
 390 then just to summarize them to get the output image without any other actions.
 391
 392 As we can see, this algorithm is appropriate to illustrate G-API usage
 393 convenience and efficiency in the context of solving a real CV/DL problem.
 394
 395 (On detector post-proc)
 396 Some points to be mentioned about this kernel implementation:
 397
 398 - It takes a `cv::Mat` from the detector and a `cv::Mat` from the input; it
 399 returns an array of ROI's where faces have been detected.
 400
 401 - `cv::Mat` data parsing by the pointer on a float is used here.
 402
 403 - By far the most important thing here is solving an issue that sometimes
 404 detector returns coordinates located outside of the image; if we pass such an
 405 ROI to be processed, errors in the landmarks detection will occur. The frame box
 406 `borders` is created and then intersected with the face rectangle
 407 (by `operator&()`) to handle such cases and save the ROI which is for sure
 408 inside the frame.
 409
 410 Data parsing after the facial landmarks detector happens according to the same
 411 scheme with inconsiderable adjustments.
 412
 413
 414 ## Possible further improvements
 415
 416 There are some points in the algorithm to be improved.
 417
 418 ### Correct ROI reshaping for meeting conditions required by the facial landmarks detector
 419
 420 The input of the facial landmarks detector is a square ROI, but the face
 421 detector gives non-square rectangles in general. If we let the backend within
 422 Inference-API compress the rectangle to a square by itself, the lack of
 423 inference accuracy can be noticed in some cases.
 424 There is a solution: we can give a describing square ROI instead of the
 425 rectangular one to the landmarks detector, so there will be no need to compress
 426 the ROI, which will lead to accuracy improvement.
 427 Unfortunately, another problem occurs if we do that:
 428 if the rectangular ROI is near the border, a describing square will probably go
 429 out of the frame --- that leads to errors of the landmarks detector.
 430 To avoid such a mistake, we have to implement an algorithm that, firstly,
 431 describes every rectangle by a square, then counts the farthest coordinates
 432 turned up to be outside of the frame and, finally, pads the source image by
 433 borders (e.g. single-colored) with the size counted. It will be safe to take
 434 square ROIs for the facial landmarks detector after that frame adjustment.
 435
 436 ### Research for the best parameters (used in GaussianBlur() or unsharpMask(), etc.)
 437
 438 ### Parameters autoscaling
 439
 440 -->