inference-engine/samples/benchmark_app/README.md

# Benchmark Application C++ Demo

This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on
supported devices. Performance can be measured for two inference modes: synchronous and asynchronous.

> **NOTE:** This topic describes the usage of the C++ implementation of the Benchmark Application. For the Python* implementation, refer to [Benchmark Application (Python*)](./inference-engine/ie_bridges/python/sample/benchmark_app/README.md).


## How It Works

> **NOTE:** To achieve benchmark results similar to the official published results, set CPU frequency to 2.9 GHz and GPU frequency to 1 GHz.

Upon start-up, the application reads command-line parameters and loads a network and images into the Inference Engine
plugin, which is chosen based on the specified device. The number of infer requests and the execution approach depend
on the mode defined with the `-api` command-line parameter.

> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channel order. If you trained your model to work with RGB order, you need to either manually rearrange the default channel order in the sample or demo application, or reconvert your model using the Model Optimizer tool with the `--reverse_input_channels` argument specified. For more information about the argument, refer to the **When to Specify Input Shapes** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).

If you run the application in the synchronous mode, it creates one infer request and executes the `Infer` method.
If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq`
command-line parameter and executes the `StartAsync` method for each of them.

The `Wait` method is used to wait for a previous execution of an infer request to complete. The number of execution steps
is defined by one of two values:
* The number of iterations specified with the `-niter` command-line argument
* A predefined duration if `-niter` is not specified. The predefined duration depends on the device.
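
The stopping rule above can be sketched as a small loop. This is a minimal illustration, not the application's actual code: the function `runSteps`, its parameters, and the convention that `niter == 0` means "not specified" are all assumptions made for this example.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: runs `step` until the stop condition is met and
// returns the number of executed steps.
//   niter > 0  : run exactly `niter` steps (the duration is ignored).
//   niter == 0 : assumed to mean "-niter not specified"; run until
//                `duration` has elapsed.
std::size_t runSteps(std::size_t niter,
                     std::chrono::milliseconds duration,
                     const std::function<void()>& step) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t executed = 0;
    while (true) {
        if (niter > 0) {
            if (executed >= niter) break;  // iteration limit reached
        } else if (std::chrono::steady_clock::now() - start >= duration) {
            break;                         // time limit reached
        }
        step();  // one execution step, e.g. one inference
        ++executed;
    }
    return executed;
}
```

For instance, `runSteps(100, std::chrono::milliseconds(0), step)` calls `step` exactly 100 times, because the duration is only consulted when no iteration count is given.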

During the execution, the application collects the latency of each executed infer request.

The reported latency is the median of all collected latencies. The reported throughput is expressed
in frames per second (FPS) and is derived from:
* The reported latency in the Sync mode
* The total execution time in the Async mode

The throughput value also depends on the batch size.
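
As a rough illustration of these calculations, the sketch below computes a median latency and derives a throughput figure for each mode. The function names are hypothetical, and the exact formulas (in particular, how the batch size and the iteration count enter the throughput) are assumptions rather than the application's confirmed implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of the collected per-request latencies, in milliseconds.
// Takes the vector by value so the caller's data is not reordered.
double medianLatencyMs(std::vector<double> latencies) {
    assert(!latencies.empty());
    std::sort(latencies.begin(), latencies.end());
    const std::size_t n = latencies.size();
    return (n % 2 == 1) ? latencies[n / 2]
                        : (latencies[n / 2 - 1] + latencies[n / 2]) / 2.0;
}

// Sync mode (assumed formula): FPS derived from the reported latency.
double syncThroughputFps(double latencyMs, std::size_t batchSize) {
    return batchSize * 1000.0 / latencyMs;
}

// Async mode (assumed formula): FPS derived from the total execution time.
double asyncThroughputFps(double totalTimeMs, std::size_t batchSize,
                          std::size_t iterations) {
    return batchSize * iterations * 1000.0 / totalTimeMs;
}
```

Under these assumptions, a median latency of 20 ms with a batch size of 1 corresponds to 50 FPS in the Sync mode, and doubling the batch size at the same latency doubles the reported throughput.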

The application also collects per-layer Performance Measurement (PM) counters for each executed infer request if you
enable statistics dumping by setting the `-report_type` parameter to one of the following values:
* `no_counters`: the report includes the specified configuration options and the resulting FPS and latency.
* `median_counters`: the report extends the `no_counters` report and additionally includes median PM counter values for each layer of the network.
* `detailed_counters`: the report extends the `median_counters` report and additionally includes per-layer PM counters and latency for each executed infer request.

Depending on the type, the report is stored in the `benchmark_no_counters_report.csv`, `benchmark_median_counters_report.csv`,
or `benchmark_detailed_counters_report.csv` file, located in the folder specified with `-report_folder`.
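
The three file names follow one pattern, `benchmark_<report_type>_report.csv`. The sketch below builds the full path from that pattern. The helper name `reportFilePath` and the use of `/` as the path separator are assumptions for this example, not part of the application.

```cpp
#include <string>

// Hypothetical helper: builds the report file path from the values passed
// to -report_folder and -report_type. Assumes '/' as the path separator.
std::string reportFilePath(const std::string& reportFolder,
                           const std::string& reportType) {
    std::string path = reportFolder;
    if (!path.empty() && path.back() != '/') {
        path += '/';  // add a separator only when one is missing
    }
    return path + "benchmark_" + reportType + "_report.csv";
}
```

With these assumptions, a folder of `/tmp/reports` and a report type of `detailed_counters` yield `/tmp/reports/benchmark_detailed_counters_report.csv`.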

The application also saves executable graph information, serialized to an XML file, if you specify a path to it with the
`-exec_graph_path` parameter.


## Running

Running the application with the `-h` option yields the following usage message:
```sh
./benchmark_app -h
InferenceEngine:
        API version ............ <version>
        Build .................. <number>
[ INFO ] Parsing input parameters

benchmark_app [OPTION]
Options:

    -h                        Print a usage message
    -i "<path>"               Required. Path to a folder with images or to image files.
    -m "<path>"               Required. Path to an .xml file with a trained model.
    -pp "<path>"              Optional. Path to a plugin folder.
    -d "<device>"             Optional. Specify a target device to infer on: CPU, GPU, FPGA, HDDL or MYRIAD. Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. The application looks for a suitable plugin for the specified device.
    -l "<absolute_path>"      Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
          Or
    -c "<absolute_path>"      Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
    -api "<sync/async>"       Optional. Enable Sync/Async API. Default value is "async".
    -niter "<integer>"        Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
    -nireq "<integer>"        Optional. Number of infer requests. Default value is 2.
    -b "<integer>"            Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation.
    -stream_output            Optional. Print progress as a plain text. When specified, an interactive progress bar is replaced with a multiline output.

  CPU-specific performance options:
    -nthreads "<integer>"     Optional. Number of threads to use for inference on the CPU (including HETERO cases).
    -pin "YES"/"NO"           Optional. Enable ("YES" is default value) or disable ("NO") CPU threads pinning for CPU-involved inference.

  Statistics dumping options:
    -report_type "<type>"     Optional. Enable collecting statistics report. "no_counters" report contains configuration options specified, resulting FPS and latency. "median_counters" report extends "no_counters" report and additionally includes median PM counters values for each layer from the network. "detailed_counters" report extends "median_counters" report and additionally includes per-layer PM counters and latency for each executed infer request.
    -report_folder            Optional. Path to a folder where statistics report is stored.
    -exec_graph_path          Optional. Path to a file where to store executable graph information serialized.
```

Running the application with an empty list of options yields the usage message given above and an error message.

You can run the application on models that have a single four-dimensional input layer and accept images as input, for example, the public
AlexNet and GoogLeNet models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).

> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).

For example, to perform inference on CPU in the synchronous mode and get estimated performance metrics for the AlexNet model,
run the following command:

```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api sync
```

For the asynchronous mode:
```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api async
```


## Demo Output

The application outputs latency and throughput. If you set the `-report_type` parameter, the application
also outputs a statistics report. If you set `-exec_graph_path`, it also outputs the serialized executable graph information.
A progress bar shows the progress of each execution step:

```
[Step 7/8] Start inference asynchronously (100 async inference executions, 4 inference requests in parallel)
Progress: [....................] 100.00% done

[Step 8/8] Dump statistics report
[ INFO ] statistics report is stored to benchmark_detailed_counters_report.csv
Progress: [....................] 100.00% done

Latency: 73.33 ms
Throughput: 53.28 FPS
```

All time measurements, including per-layer PM counters, are reported in milliseconds.


## See Also
* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md)
* [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader)