Video Input with OpenCV and similarity measurement {#tutorial_video_input_psnr_ssim}
==================================================

@next_tutorial{tutorial_video_write}

Goal
----

Today it is common to have a digital video recording system at your disposal. Therefore, you will
eventually reach the point where you no longer process a batch of images, but video streams.
These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files
stored on a hard disk drive. Luckily, OpenCV treats these two in the same manner, with the same C++
class. So here's what you'll learn in this tutorial:

-   How to open and read video streams
-   Two ways of checking image similarity: PSNR and SSIM

The source code
---------------

To show these off using OpenCV, I've created a small program that reads in two video files and
performs a similarity check between them. This is something you could use to check how well a new
video compression algorithm works. Let there be a reference (original) video like
[this small Megamind clip](https://github.com/opencv/opencv/tree/3.4/samples/data/Megamind.avi) and
[a compressed version of it](https://github.com/opencv/opencv/tree/3.4/samples/data/Megamind_bugy.avi).
You may also find the source code and these video files in the `samples` folder of the OpenCV source
library (the videos are in `samples/data`).

@add_toggle_cpp
@include cpp/tutorial_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp
@end_toggle

@add_toggle_python
@include samples/python/tutorial_code/videoio/video-input-psnr-ssim.py
@end_toggle

How to read a video stream (online-camera or offline-file)?
-----------------------------------------------------------

Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture
C++ class. This class in turn builds on video I/O backends, most commonly the FFmpeg open source
library. FFmpeg is a basic dependency of OpenCV, so you usually shouldn't need to worry about it. A
video is composed of a succession of images, which the literature calls frames. In the case of a
video file there is a *frame rate* specifying how much time passes between two frames. Video cameras
usually have a limit on how many frames they can digitize per second, but for them this property is
less important, because at any time the camera sees the current snapshot of the world.

The first thing you need to do is assign a source to a @ref cv::VideoCapture object. You can do
this either via the @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open
function. If the argument is an integer, you bind the object to a camera device. The number passed
here is the ID of the device, assigned by the operating system. If you have a single camera attached
to your system, its ID will probably be zero, with further ones increasing from there. If the
argument is a string, it refers to a video file, and the string gives the location and name of the
file. For example, a valid command line for the source code above is:
@code{.bash}
video/Megamind.avi video/Megamind_bug.avi 35 10
@endcode
We perform a similarity check, which requires a reference video file and a test case video file;
the first two arguments specify these. Here we use relative paths, so the application will look
into its current working directory, open the *video* folder, and try to find *Megamind.avi* and
*Megamind_bug.avi* inside it.
@code{.cpp}
// the first two command-line arguments are the paths of the two videos
const string sourceReference = argv[1], sourceCompareWith = argv[2];

VideoCapture captRefrnc(sourceReference);   // open the source in the constructor
// or
VideoCapture captUndTst;                    // create an empty object first ...
captUndTst.open(sourceCompareWith);         // ... and open the source later
@endcode
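The command line above also passes two numbers, 35 and 10. The sketch below reads and checks all four arguments in plain standard C++. It assumes, as in the full sample program, that the third argument is the PSNR trigger value (below which SSIM is computed) and the fourth is the delay in milliseconds to wait between frames. The `Options` struct and the `parseArguments` function are hypothetical names used only for illustration.

```cpp
#include <exception>
#include <optional>
#include <string>

// Hypothetical container for the four command-line arguments.
struct Options {
    std::string sourceReference;    // argv[1]: reference (original) video
    std::string sourceCompareWith;  // argv[2]: video under test
    int psnrTriggerValue = 0;       // argv[3]: assumed PSNR threshold for running SSIM
    int delay = 0;                  // argv[4]: assumed wait between frames, in milliseconds
};

// Returns std::nullopt when the argument count is wrong or a number cannot be parsed.
std::optional<Options> parseArguments(int argc, const char* const argv[]) {
    if (argc != 5) {
        return std::nullopt;  // program name + four arguments expected
    }
    Options opts;
    opts.sourceReference = argv[1];
    opts.sourceCompareWith = argv[2];
    try {
        opts.psnrTriggerValue = std::stoi(argv[3]);
        opts.delay = std::stoi(argv[4]);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return std::nullopt;
    }
    return opts;
}
```

In `main`, this could be called as `parseArguments(argc, argv)` before opening the two `VideoCapture` objects.
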
To check whether binding the object to a video source was successful, use the @ref cv::VideoCapture::isOpened
function:
@code{.cpp}
if (!captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
@endcode
Closing the video happens automatically when the object's destructor is called. However, if you want
to close it before that, you need to call its @ref cv::VideoCapture::release function. The frames of
the video are just simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture
object and put them inside a *Mat* object. Video streams are sequential: you get the frames one after
another with the @ref cv::VideoCapture::read function or the overloaded \>\> operator:
@code{.cpp}
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;        // read with the overloaded operator
captUndTst.read(frameUnderTest);     // or with the read() member function
@endcode
The read operations above leave the *Mat* objects empty if no frame could be acquired (either because
the video stream was closed or because the end of the video file was reached; in this case
@ref cv::VideoCapture::read also returns false). We can check this with a simple if:
@code{.cpp}
if (frameReference.empty() || frameUnderTest.empty())
{
    // no more frames: exit the program
}
@endcode
A read operation consists of grabbing a frame and then decoding it. You may call these two steps
explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.

Videos have a lot of information attached to them besides the content of the frames. This is
usually numeric, but in some cases it may be a short character sequence (4 bytes or less). To
acquire this information there is a general function named @ref cv::VideoCapture::get, whose single
argument is the ID of the queried property and which returns the property as a double value. Use
bitwise operations to decode characters from the double value, and type conversions for properties
whose valid values are only integers. For example, here we get the size of the frames in the
reference video file, plus the number of frames inside it:
@code{.cpp}
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << "  Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
@endcode
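A typical short character sequence is the codec's FourCC code, which is queried with the `CAP_PROP_FOURCC` property. The plain C++ sketch below shows the bitwise decoding. It assumes that the four characters are packed least significant byte first, which is the packing `cv::VideoWriter::fourcc` produces. The helper names `packFourcc` and `decodeFourcc` are hypothetical. With a real capture, you would pass the value of `captRefrnc.get(CAP_PROP_FOURCC)` to the decoder.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: packs four characters like cv::VideoWriter::fourcc does,
// i.e. the first character ends up in the least significant byte.
std::uint32_t packFourcc(char c1, char c2, char c3, char c4) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(c1))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c4)) << 24);
}

// Hypothetical helper: decodes the double returned by get(CAP_PROP_FOURCC)
// back into its four characters using shifts and masks.
std::string decodeFourcc(double value) {
    // The double holds an integer code; going through int64 keeps negative codes well defined.
    const auto code = static_cast<std::uint32_t>(static_cast<std::int64_t>(value));
    std::string text(4, ' ');
    for (int i = 0; i < 4; ++i) {
        text[i] = static_cast<char>((code >> (8 * i)) & 0xFFu);  // byte i holds character i
    }
    return text;
}
```
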
When you are working with videos you may often want to control these values yourself. To do this
there is a @ref cv::VideoCapture::set function. Its first argument is again the ID of the property
you want to change, and the second, of double type, contains the value to be set. It returns true if
it succeeds and false otherwise. Good examples of this are seeking in a video file to a given time or
frame:
@code{.cpp}
captRefrnc.set(CAP_PROP_POS_MSEC, 1200); // go to 1.2 seconds (1200 ms) into the video
captRefrnc.set(CAP_PROP_POS_FRAMES, 10); // go to frame index 10 (0-based, i.e. the 11th frame)
// now a read operation would read the frame at the set position
@endcode
For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and
@ref cv::VideoCapture::set functions.

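`CAP_PROP_POS_MSEC` works in milliseconds, while `CAP_PROP_POS_FRAMES` works with 0-based frame indices. The frame rate, which `get` reports through `CAP_PROP_FPS`, relates the two. The plain C++ sketch below shows the conversion. It assumes a constant frame rate, which not every video file has, and the function names are hypothetical.

```cpp
#include <cmath>

// Hypothetical helper: time between two consecutive frames, in milliseconds.
double frameIntervalMs(double fps) {
    return 1000.0 / fps;
}

// Hypothetical helper: 0-based index of the frame shown at a given time (in ms),
// assuming a constant frame rate.
long frameIndexAtMs(double positionMs, double fps) {
    return static_cast<long>(std::floor(positionMs * fps / 1000.0));
}

// Hypothetical helper: start time (in ms) of a 0-based frame index.
double startTimeMsOfFrame(long frameIndex, double fps) {
    return static_cast<double>(frameIndex) * frameIntervalMs(fps);
}
```

For example, at 25 frames per second the frame interval is 40 ms, and a position of 1200 ms corresponds to frame index 30.
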
Image similarity - PSNR and SSIM
--------------------------------

We want to check how imperceptible our video conversion operation was, so we need a way to check
the similarity or differences frame by frame. The most common algorithm used for this is the PSNR
(aka **Peak signal-to-noise ratio**). The simplest definition of it starts out from the *mean
squared error*. Let there be two images, I1 and I2, with a two-dimensional size i and j, composed of
c channels.

\f[MSE = \frac{1}{c \cdot i \cdot j} \sum{(I_1-I_2)^2}\f]

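To make the formula concrete, the plain C++ sketch below computes the MSE of two 8-bit images. It assumes each image is stored as a flat, interleaved byte buffer of equal size, so the buffer length is c·i·j. This layout is only an illustration; the tutorial's own code works on `cv::Mat` objects instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean squared error of two images given as flat buffers of 8-bit samples.
// Each buffer holds c * i * j values (all channels of all pixels).
double meanSquaredError(const std::vector<std::uint8_t>& img1,
                        const std::vector<std::uint8_t>& img2) {
    assert(img1.size() == img2.size() && !img1.empty());
    double sum = 0.0;
    for (std::size_t k = 0; k < img1.size(); ++k) {
        const double diff = static_cast<double>(img1[k]) - static_cast<double>(img2[k]);
        sum += diff * diff;  // squared difference, accumulated in double to avoid overflow
    }
    return sum / static_cast<double>(img1.size());  // divide by c * i * j
}
```

For identical buffers the result is exactly zero. This is the case that the PSNR formula has to treat separately.
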
Then the PSNR is expressed as:

\f[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\f]

Here \f$MAX_I\f$ is the maximum valid value for a pixel. In the case of a simple image with one byte
per pixel per channel, this is 255. When two images are the same, the MSE is zero, resulting in an
invalid division by zero in the PSNR formula. In this case the PSNR is undefined, and we'll need to
handle this case separately. The transition to a logarithmic scale is made because the pixel values
have a very wide dynamic range. All this, translated to OpenCV code, looks like:

@add_toggle_cpp
@include cpp/tutorial_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp get-psnr
@end_toggle

@add_toggle_python
@include samples/python/tutorial_code/videoio/video-input-psnr-ssim.py get-psnr
@end_toggle

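If you want to experiment without OpenCV, the plain C++ sketch below applies the same formula to flat 8-bit buffers (an assumed layout, as in a raw interleaved image). How the identical-image case is reported is a design choice. This sketch returns positive infinity when the MSE is zero, while other implementations may return a fixed sentinel value instead. The function name `psnr8bit` is hypothetical. If every sample differs by 10, the MSE is 100, which gives 10·log10(65025/100) ≈ 28.13 dB.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in decibels of two 8-bit images stored as flat buffers of equal size.
// Returns +infinity for identical images, where the MSE is zero and the formula is undefined.
double psnr8bit(const std::vector<std::uint8_t>& img1,
                const std::vector<std::uint8_t>& img2) {
    assert(img1.size() == img2.size() && !img1.empty());
    double sse = 0.0;  // sum of squared errors
    for (std::size_t k = 0; k < img1.size(); ++k) {
        const double diff = static_cast<double>(img1[k]) - static_cast<double>(img2[k]);
        sse += diff * diff;
    }
    if (sse == 0.0) {
        return std::numeric_limits<double>::infinity();  // identical images: handled separately
    }
    const double mse = sse / static_cast<double>(img1.size());
    const double maxI = 255.0;  // maximum value of a single-byte sample
    return 10.0 * std::log10((maxI * maxI) / mse);
}
```
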
Typical values for video compression lie anywhere between 30 and 50 dB, where higher is better. If
the images differ significantly, you'll get much lower values, such as 15 or so. This similarity
check is easy and fast to calculate; however, in practice it may turn out somewhat inconsistent with
human visual perception. The **structural similarity** algorithm aims to correct this.

Describing the method goes well beyond the scope of this tutorial. For that, I invite you to read the
article introducing it. Nevertheless, you can get a good idea of it by looking at the OpenCV
implementation below.

@note
    SSIM is described in more depth in the article: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P.
    Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE
    Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

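As an aid to reading the code, here is how the article defines the SSIM of two image windows x and y. It uses their means \f$\mu_x, \mu_y\f$, their variances \f$\sigma_x^2, \sigma_y^2\f$ and their covariance \f$\sigma_{xy}\f$:

\f[SSIM(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}\f]

For 8-bit images the constants are \f$C_1 = (0.01 \cdot 255)^2 = 6.5025\f$ and \f$C_2 = (0.03 \cdot 255)^2 = 58.5225\f$. The implementation below estimates these statistics over local Gaussian-weighted windows and averages the resulting SSIM map (the mean SSIM, or MSSIM). The plain C++ sketch that follows is a deliberately simplified variant. It treats a whole single-channel buffer as one window, with no Gaussian weighting, so its results will generally differ from the windowed MSSIM. The name `globalSsim` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified SSIM: the whole single-channel 8-bit buffer is treated as ONE window,
// without the Gaussian weighting used by the windowed MSSIM.
double globalSsim(const std::vector<std::uint8_t>& x, const std::vector<std::uint8_t>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double C1 = 6.5025;   // (0.01 * 255)^2
    const double C2 = 58.5225;  // (0.03 * 255)^2
    const double n = static_cast<double>(x.size());

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        meanX += x[k];
        meanY += y[k];
    }
    meanX /= n;
    meanY /= n;

    double varX = 0.0, varY = 0.0, covXY = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        const double dx = x[k] - meanX;
        const double dy = y[k] - meanY;
        varX += dx * dx;
        varY += dy * dy;
        covXY += dx * dy;
    }
    varX /= n;  // population (biased) estimates
    varY /= n;
    covXY /= n;

    const double numerator = (2.0 * meanX * meanY + C1) * (2.0 * covXY + C2);
    const double denominator = (meanX * meanX + meanY * meanY + C1) * (varX + varY + C2);
    return numerator / denominator;
}
```

Identical inputs give 1. A copy with every sample shifted by a small constant gives a value slightly below 1, because only the mean term changes.
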
@add_toggle_cpp
@include cpp/tutorial_code/videoio/video-input-psnr-ssim/video-input-psnr-ssim.cpp get-mssim
@end_toggle

@add_toggle_python
@include samples/python/tutorial_code/videoio/video-input-psnr-ssim.py get-mssim
@end_toggle

This returns a similarity index for each channel of the image. This value typically lies between
zero and one, where one corresponds to a perfect fit. Unfortunately, the many Gaussian blurring
operations are quite costly. So while the PSNR may work in a real-time environment (24 frames per
second), the SSIM needs significantly more time per frame, which makes similar performance much
harder to reach.

Therefore, the source code presented at the start of the tutorial performs the PSNR measurement for
each frame, and the SSIM only for the frames where the PSNR falls below an input value. For
visualization purposes we show both images in an OpenCV window and print the PSNR and MSSIM values
to the console. Expect to see something like:

![](images/outputVideoInput.png)

You can watch a runtime instance of this [on YouTube](https://www.youtube.com/watch?v=iOcNljutOgg).

@youtube{iOcNljutOgg}