inference-engine/thirdparty/clDNN/README.md

# Compute Library for Deep Neural Networks (clDNN)
[![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE)
![v1.0](https://img.shields.io/badge/1.0-RC1-green.svg)

*Compute Library for Deep Neural Networks* (*clDNN*) is an open source performance
library for Deep Learning (DL) applications intended for acceleration of
DL Inference on Intel® Processor Graphics – including HD Graphics and
Iris® Graphics.
*clDNN* includes highly optimized building blocks for implementation of
convolutional neural networks (CNN) with C and C++ interfaces. We created
this project to enable the DL community to innovate on Intel® processors.

**Usages supported:** Image recognition, image detection, and image segmentation.

**Validated Topologies:** AlexNet\*, VGG(16,19)\*, GoogleNet(v1,v2,v3)\*, ResNet(50,101,152)\*, Faster R-CNN\*, Squeezenet\*, SSD_googlenet\*, SSD_VGG\*, PVANET\*, PVANET_REID\*, age_gender\*, FCN\* and yolo\*.

As with any technical preview, APIs may change in future updates.

## License
clDNN is licensed under the
[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

### Attached licenses
clDNN uses 3<sup>rd</sup>-party components licensed under the following licenses:
- *googletest* under [Google\* License](https://github.com/google/googletest/blob/master/googletest/LICENSE)
- *OpenCL™ ICD and C++ Wrapper* under [Khronos™ License](https://github.com/KhronosGroup/OpenCL-CLHPP/blob/master/LICENSE.txt)
- *RapidJSON* under [Tencent\* License](https://github.com/Tencent/rapidjson/blob/master/license.txt)

## Documentation
The latest clDNN documentation is at [GitHub pages](https://intel.github.io/clDNN/index.html).

There is also inline documentation available that can be [generated with Doxygen](#generating-documentation).

See also the whitepaper [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).

## Intel® OpenVINO™ Toolkit and clDNN

clDNN is also released as part of the Intel® OpenVINO™ Toolkit, which contains:
- *Model Optimizer*: a Python\*-based command-line tool that imports trained models from popular deep learning frameworks such as Caffe\*, TensorFlow\*, and Apache MXNet\*.
- *Inference Engine*: an execution engine that uses a common API to deliver inference solutions on the platform of your choice (for example, a GPU with the clDNN library).

You can find more information [here](https://software.intel.com/en-us/openvino-toolkit/deep-learning-cv).

## OpenVINO specific changes
    New features:
    - added `not` activation type
    - added `depth_to_space` layer
    - new clip options in `detection_output` (cpu impl) and `proposal` layers
    - added eltwise `xor` and `squared_diff` operations
    - added `gather` layer
    - added `bilinear` mode for position sensitive `roi_pooling` layer
    - added `shuffle_channels` layer
    - added `strided_slice` layer
    - added IE gates ordering for lstm layer
    - added `reverse_sequence` layer
    Bug fixes:
    - fixed unknown bool type error in C API
    - fixed non-relu activation fusing with conv_eltwise node
    - fixed infinite performance regression on several topologies
    - minor internal fixes
    - unified the permute order with cldnn's tensor order
    Other:
    - removed boost
    - supported compilation with c++11 only

## Changelog

### Drop 13.1
    New features:
    - added max mode for contract primitive
    - added one_hot primitive
    - optional explicit output data type support for all primitives
    Bug fixes:
    - fix for graph optimizer (crop primitive)
    - fix for processing order (deconvolution primitive)
    - fix for convolution-eltwise primitive
    UX:
    - cache.json is searched for in the library directory
    Performance:
    - optimizations for lstm_gemm primitive

### Drop 13.0
    New features:
    - events pool
    - group support in convolution and deconvolution primitives
    - broadcastable inputs support for eltwise primitive
    - asymmetric padding for convolution primitive
    - fused convolution-eltwise primitive (API extension)
    - auto-calculated output shape support for reshape primitive
    - crop support for i8/s8/i32/i64 types
    - broadcast axis support for broadcast primitive
    - logic and comparison operations support for eltwise primitive
    Bug fixes:
    - added required alignment checks for some fc implementations
    - added lstm support for f16 (half) type
    - reorders for fc moved to graph compiler
    - primitive fusing and reorder fixes
    UX:
    - added internal core tests project
    - refactored optimizations pass manager and passes
    Performance:
    - optimized concatenation during upsampling (unpool)
    - IMAD-based optimizations for convolution, fc, eltwise and pooling primitives (i8/s8)
    - convolution-eltwise fusing optimizations
    - partial writes optimizations for block-based kernels

### Drop 12.1
    - gtests code refactor
    - buildbreak fix

### Drop 12.0
    New features:
    - pyramidRoiAlign primitive
    - multiple axes support for reverse mode in index_select
    - eltwise min/max/mod support for i8/i32/i64
    - broadcast support for i32/i64
    Bug fixes:
    - memory leak fixes
    - in-place reshape
    - no padding for output primitives
    UX:
    - RapidJSON library for auto-tune cache
    - fewer dependencies in program.cpp
    - do not throw an error when the device is not validated
    - global pooling in C API
    - optimized padding for convolution

### Drop 11.0
    New features:
    - throttle hints
    - extended border and tile
    - GPU implementation of Detection Output
    - more cases for BatchNorm primitive
    Bug fixes:
    - GEMM fix (align with ONNX)
    - memory leak fix in memory pool
    - increase FC precision for fp16 (fp32 accu)
    Performance:
    - cache for new topologies and devices
    - conv1x1 with stride >1 into eltwise optimization

### Drop 10.0
    New features:
    - condition primitive
    - fused convolution with bn and scale (backprop)
    - scale/shift and mean/var as an output in batch norm
    - add LSTM output selection
    Bug fixes:
    - memory pool fixes
    UX:
    - downgrade to cxx11
    - add support for u8 data type in custom primitive
    - library size optimizations
    Performance:
    - in place concatenation optimization
    - conv1x1 with stride >1 into eltwise optimization

### Drop 9.2
    New features:
    - local convolution
    - eltwise with stride

### Drop 9.1
    New features:
    - select index primitive
    - gemm primitive
    Bug fixes:
    - fix for output format in fully connected primitive

### Drop 9.0
    New features:
    - log2 activation function
    - support for i32 and i64 types
    - select primitive
    - border primitive
    - tile primitive
    Bug fixes:
    - dilation > input size fix

### Drop 8.0
    New features:
    - lstm primitive
    - average unpooling primitive
    - serialization - dump weights, biases and kernels
    - scale grad for input and weights primitive
    Bug fixes:
    - wrong gws in concatenation
    - int8 layers
    - convolution depthwise bias concatenation
    - params in engine_info
    - mutable_data filler
    - momentum calculation
    UX:
    - kernel selector renaming
    - bfyx_yxfb batched reorder
    - code cleanups
    - primitives allocation order

### Drop 7.0
    New features:
    - support for img_info=4 in proposal_gpu
    - support images format in winograd
    - support for 2 or more inputs in eltwise
    - priority and throttle hints
    - deconvolution_grad_input primitive
    - fc_grad_input and fc_grad_weights primitives
    Bug fixes:
    - tensor fixes (i.e. less operator fix)
    - cascade concat fixes
    - winograd fixes for bfyx format
    - auto-tuning fixes for weights calculation
    UX:
    - memory pool (reusing memory buffers)
    - added chosen kernel name in graph dump
    - flush memory functionality
    Performance:
    - graph optimizations
    - depth-concatenation with fused relu optimization
    - winograd optimizations
    - deconvolution optimizations (i.e. bfyx opt)

### Drop 6.0
    New features:
    - fused winograd
    - image support for weights
    - yolo_region primitive support
    - yolo_reorg primitive support
    Bug fixes:
    - winograd bias fix
    - mean subtract fix
    UX:
    - extend graph dumps
    Performance:
    - update offline caches for newer drivers
    - conv1x1 byxf optimization
    - conv1x1 with images
    - cascade depth concatenation fuse optimization

### Drop 5.0
    New features:
    - split primitive
    - upsampling primitive
    - add preliminary Coffee Lake support
    - uint8 weights support
    - versioning
    - offline autotuner cache
    - Winograd phase 1 - not used yet
    Bug fixes:
    - in-place crop optimization bug fix
    - output spatial padding in yxfb kernels fix
    - local work sizes fix in softmax
    - underflow fix in batch normalization
    - average pooling corner case fix
    UX:
    - graph logger, dumps Graphviz format files
    - extended documentation with API diagram and graph compilation steps
    Performance:
    - softmax optimization
    - lrn within channel optimization
    - priorbox optimization
    - constant propagation

### Drop 4.0
    New features:
    - OOOQ execution model implementation
    - depthwise separable convolution implementation
    - kernel auto-tuner implementation
    Bug fixes:
    - dump hidden layer fix
    - run single layer fix
    - reshape fix
    UX:
    - enable RTTI
    - better error handling/reporting
    Performance:
    - lrn optimization
    - dynamic pruning for sparse fc layers
    - reorder optimization
    - concatenation optimization
    - eltwise optimization
    - activation fusing

### Drop 3.0
    Added:
    - kernel selector
    - custom layer
    Changed:
    - performance improvements
    - bug fixes (deconvolution, softmax, reshape)
    - apply fixes from community reported issues

### Drop 2.0
    Added:
    - step by step tutorial
    Changed:
    - performance optimization for: softmax, fully connected, eltwise, reshape
    - bug fixes (conformance)

### Drop 1.0
    - initial drop of clDNN

## Support
Please report issues and suggestions via
[GitHub issues](https://github.com/01org/cldnn/issues).

## How to Contribute
We welcome community contributions to clDNN. If you have an idea for improving the library:

- Share your proposal via
 [GitHub issues](https://github.com/01org/cldnn/issues)
- Ensure you can build the product and run all the examples with your patch
- In the case of a larger feature, create a test
- Submit a [pull request](https://github.com/01org/cldnn/pulls)

We will review your contribution and, if any additional fixes or modifications
are necessary, may provide feedback to guide you. When accepted, your pull
request will be merged into our internal and GitHub repositories.

## System Requirements
clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for:
- Codename *Skylake*:
    * Intel® HD Graphics 510 (GT1, *client* market)
    * Intel® HD Graphics 515 (GT2, *client* market)
    * Intel® HD Graphics 520 (GT2, *client* market)
    * Intel® HD Graphics 530 (GT2, *client* market)
    * Intel® Iris® Graphics 540 (GT3e, *client* market)
    * Intel® Iris® Graphics 550 (GT3e, *client* market)
    * Intel® Iris® Pro Graphics 580 (GT4e, *client* market)
    * Intel® HD Graphics P530 (GT2, *server* market)
    * Intel® Iris® Pro Graphics P555 (GT3e, *server* market)
    * Intel® Iris® Pro Graphics P580 (GT4e, *server* market)
- Codename *Apollolake*:
    * Intel® HD Graphics 500
    * Intel® HD Graphics 505
- Codename *Kabylake*:
    * Intel® HD Graphics 610 (GT1, *client* market)
    * Intel® HD Graphics 615 (GT2, *client* market)
    * Intel® HD Graphics 620 (GT2, *client* market)
    * Intel® HD Graphics 630 (GT2, *client* market)
    * Intel® Iris® Graphics 640 (GT3e, *client* market)
    * Intel® Iris® Graphics 650 (GT3e, *client* market)
    * Intel® HD Graphics P630 (GT2, *server* market)
    * Intel® Iris® Pro Graphics 630 (GT2, *server* market)

clDNN currently uses OpenCL™ with multiple Intel® OpenCL™ extensions and requires the Intel® Graphics Driver to run.

clDNN requires a CPU with Intel® SSE/Intel® AVX support.

---

The software dependencies are:
- [CMake\*](https://cmake.org/download/) 3.5 or later
- C++ compiler with C++11 standard support compatible with:
    * GNU\* Compiler Collection 4.8 or later
    * clang 3.5 or later
    * [Intel® C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) 17.0 or later
    * Visual C++ 2015 (MSVC++ 19.0) or later

> Intel® CPU intrinsics header (`<immintrin.h>`) must be available during compilation.

- [python™](https://www.python.org/downloads/) 2.7 or later (scripts are compatible with both python™ 2.7.x and python™ 3.x)
- *(optional)* [Doxygen\*](http://www.stack.nl/~dimitri/doxygen/download.html) 1.8.13 or later
    Needed for manual generation of documentation from inline comments or for running the `docs` custom target, which generates it automatically.

> [GraphViz\*](http://www.graphviz.org/Download..php) (2.38 or later) is also recommended to generate documentation with all embedded diagrams.
(Make sure that the `dot` application is visible in the `PATH` environment variable.)
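
Before configuring, it can help to confirm that the tools listed above are visible on `PATH`. The following is a minimal sketch, not part of clDNN: the helper names `have_tool` and `report_tools` are hypothetical, and the executable names (`cmake`, `python`, `doxygen`, `dot`) are assumptions that may differ on your system (for example, `python3` instead of `python`). It only checks presence, not versions.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with clDNN): succeeds when the named
# executable can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints one status line per tool name given.
report_tools() {
    for tool in "$@"; do
        if have_tool "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

# Required tools first, then the optional documentation tools.
report_tools cmake python doxygen dot
```

A `missing` line for `doxygen` or `dot` only affects documentation generation; the library itself can still be built.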

---

- The software was validated on:
    * CentOS\* 7.2 with GNU\* Compiler Collection 5.2 (64-bit only), using [Intel® Graphics Compute Runtime for OpenCL™](https://software.intel.com/en-us/articles/opencl-drivers).
    * Windows® 10 and Windows® Server 2012 R2 with MSVC 14.0, using [Intel® Graphics Driver for Windows\* [24.20] driver package](https://downloadcenter.intel.com/download/27803/Graphics-Intel-Graphics-Driver-for-Windows-10?v=t).

        More information on Intel® OpenCL™ drivers can be found [here](https://software.intel.com/en-us/articles/opencl-drivers).

We recommend using the latest driver for Linux ([link](https://github.com/intel/compute-runtime/releases)) and the 24.20 driver for Windows ([link](https://downloadcenter.intel.com/download/27803/Graphics-Intel-Graphics-Driver-for-Windows-10?v=t)).

## Installation

### Building

Download [clDNN source code](https://github.com/01org/cldnn/archive/master.zip)
or clone the repository to your system:

```
    git clone https://github.com/intel/cldnn.git
```

Satisfy all software dependencies and ensure that the versions are correct before building.

clDNN uses multiple 3<sup>rd</sup>-party components. They are stored in binary form in the `common` subdirectory and are currently prepared for MSVC++ and GCC\*. They are cloned together with the repository.

---

clDNN uses a CMake-based build system. You can use the CMake command-line tool or the CMake GUI (`cmake-gui`) to generate the required solution.
On Windows, you can run the following in `cmd` (or `powershell`):
```shellscript
    @REM Generate 32-bit solution (solution contains multiple build configurations)...
    cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015" ..
    @REM Generate 64-bit solution (solution contains multiple build configurations)...
    cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015 Win64" ..
```
The created solution can be opened in Visual Studio 2015 or built using the appropriate `msbuild` tool
(you can also use `cmake --build .` to select the build tool automatically).

For Unix and Linux systems:
```shellscript
    # Create GNU makefile for release clDNN and build it...
    cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
    # Create Ninja makefile for debug clDNN and build it...
    cmake -E make_directory build && cd build && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug .. && ninja -k 20
```

You can also call the scripts in the main directory of the project, which create solutions/makefiles for clDNN (they
generate solutions/makefiles in the `build` subdirectory, and binary outputs are written to the `build/out` subdirectory):
- `create_msvc_mscc.bat` (Windows\*, Visual Studio\* 2015)
- `create_unixmake_gcc.sh [Y|N] [<devtoolset-version>]` (Linux\*, GNU\* or Ninja\* makefiles, optional devtoolset support)
    * If you specify the first parameter as `Y`, Ninja makefiles will be generated.
    * If you specify the second parameter (a number), CMake will be called via `scl` with the selected `devtoolset` version.

The CMake solution offers multiple options, which you can specify using normal CMake syntax (`-D<option-name>=<value>`):

| CMake option                              | Type     | Description                                                                  |
|:------------------------------------------|:---------|:-----------------------------------------------------------------------------|
| CMAKE\_BUILD\_TYPE                        | STRING   | Build configuration that will be used by generated makefiles (it does not affect multi-configuration generators like generators for Visual Studio solutions). Currently supported: `Debug` (default), `Release` |
| CMAKE\_INSTALL\_PREFIX                    | PATH     | Install directory prefix.                                                    |
| CLDNN\_\_ARCHITECTURE\_TARGET             | STRING   | Architecture of target system (where binary output will be deployed). CMake will try to detect it automatically (based on selected generator type, host OS and compiler properties). Specify this option only if CMake has a problem with detection. Currently supported: `Windows32`, `Windows64`, `Linux64` |
| CLDNN\_\_OUTPUT\_DIR (CLDNN\_\_OUTPUT\_BIN\_DIR, CLDNN\_\_OUTPUT\_LIB\_DIR) | PATH | Location where built artifacts will be written to. It is set automatically to roughly `build/out/<arch-target>/<build-type>` subdirectory. For more control use: `CLDNN__OUTPUT_LIB_DIR` (specifies output path for static libraries) or `CLDNN__OUTPUT_BIN_DIR` (for shared libs and executables). |
|                                           |          |                                                                              |
| **CMake advanced option**                 | **Type** | **Description**                                                              |
| PYTHON\_EXECUTABLE                        | FILEPATH | Path to Python interpreter. CMake will try to detect Python. Specify this option only if CMake has a problem with locating Python. |
| CLDNN\_\_IOCL\_ICD\_USE\_EXTERNAL         | BOOL     | Use this option to enable use of external Intel® OpenCL™ SDK as a source for ICD binaries and headers (based on `INTELOCLSDKROOT` environment variable). Default: `OFF` |
| CLDNN\_\_IOCL\_ICD\_VERSION               | STRING   | Version of Intel® OpenCL™ ICD binaries and headers to use (from `common` subdirectory). It is automatically selected by CMake (highest version). Specify it if you have multiple versions and want to use a different one than the automatically selected version. |
|                                           |          |                                                                              |
| CLDNN__COMPILE_LINK_ALLOW_UNSAFE_SIZE_OPT | BOOL     | Allow unsafe optimizations during linking (like aggressive dead code elimination, etc.). Default: `ON` |
| CLDNN__COMPILE_LINK_USE_STATIC_RUNTIME    | BOOL     | Link with static C++ runtime. Default: `OFF` (shared C++ runtime is used)    |
|                                           |          |                                                                              |
| CLDNN__INCLUDE_CORE                       | BOOL     | Include core clDNN library project in generated makefiles/solutions. Default: `ON` |
| CLDNN__INCLUDE_TESTS                      | BOOL     | Include tests application project (based on googletest framework) in generated makefiles/solutions. Default: `ON` |
|                                           |          |                                                                              |
| CLDNN__RUN_TESTS                          | BOOL     | Run tests after building `tests` project. This option requires `CLDNN__INCLUDE_TESTS` option to be `ON`. Default: `OFF` |
|                                           |          |                                                                              |
| CLDNN__CMAKE_DEBUG                        | BOOL     | Enable extended debug messages in CMake. Default: `OFF`                      |
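
As an illustration of the `-D<option-name>=<value>` syntax, the sketch below assembles a configure command from a few of the options in the table and prints it instead of running it. The function name `cldnn_configure_cmd` is hypothetical, and the option values are example choices rather than recommended settings.

```shell
#!/bin/sh
# Hypothetical helper (not part of clDNN): prints a CMake configure command
# built from options in the table above. Nothing is executed.
#   $1: build type, Debug (the documented default) or Release
#   $2: ON or OFF for CLDNN__INCLUDE_TESTS
cldnn_configure_cmd() {
    build_type=${1:-Debug}
    with_tests=${2:-ON}
    echo "cmake -DCMAKE_BUILD_TYPE=${build_type} -DCLDNN__INCLUDE_TESTS=${with_tests} -DCLDNN__RUN_TESTS=OFF .."
}

# Example: a release configuration that still includes the tests project.
cldnn_configure_cmd Release ON
```

The printed command is meant to be run from inside the `build` subdirectory, since it refers to the source tree as `..`.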

---

clDNN includes unit tests implemented using the googletest framework. To validate your build, run the `tests` target, e.g.:

```
    make tests
```

(Make sure that both `CLDNN__INCLUDE_TESTS` and `CLDNN__RUN_TESTS` were set to `ON` when invoking CMake.)

### Generating documentation

Documentation is provided inline and can be generated in HTML format with Doxygen. We recommend using the latest
[Doxygen\*](http://www.stack.nl/~dimitri/doxygen/download.html) and [GraphViz\*](http://www.graphviz.org/Download..php).

Documentation templates and configuration files are stored in the `docs` subdirectory. You can simply call:

```shellscript
    cd docs && doxygen
```
to generate HTML documentation in the `docs/html` subdirectory.

There is also a custom CMake target named `docs`, which generates documentation in the `CLDNN__OUTPUT_BIN_DIR/html` directory. For example, when using Unix makefiles, you can run:
```
    make docs
```
in order to create it.

### Deployment

The special `install` target places the API header files and libraries in `/usr/local`
(`C:/Program Files/clDNN` or `C:/Program Files (x86)/clDNN` on Windows). To change
the installation path, use the option `-DCMAKE_INSTALL_PREFIX=<prefix>` when invoking CMake.
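
A minimal sketch of installing into a custom prefix follows. It prints the two commands rather than running them; the function name `cldnn_install_cmds` and the example prefix are assumptions for illustration. `cmake --build . --target install` is the generator-independent way to invoke the `install` target.

```shell
#!/bin/sh
# Hypothetical helper (not part of clDNN): prints the configure and install
# commands for a given install prefix. Nothing is executed.
#   $1: install prefix (falls back to /usr/local, the documented default)
cldnn_install_cmds() {
    prefix=${1:-/usr/local}
    echo "cmake -DCMAKE_INSTALL_PREFIX=${prefix} .."
    echo "cmake --build . --target install"
}

# Example: install under the current user's home directory (example path).
cldnn_install_cmds "$HOME/cldnn"
```

Run the printed commands from the `build` subdirectory. Installing into a prefix you own avoids the elevated privileges that `/usr/local` usually requires.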

---


\* Other names and brands may be claimed as the property of others.

Copyright © 2017, Intel® Corporation