2 # Compute Library for Deep Neural Networks (clDNN)
3 [![Apache License Version 2.0](https://img.shields.io/badge/license-Apache_2.0-green.svg)](LICENSE)
4 ![v1.0](https://img.shields.io/badge/1.0-RC1-green.svg)
*Compute Library for Deep Neural Networks* (*clDNN*) is an open source performance
library for Deep Learning (DL) applications intended for acceleration of
DL inference on Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics.
10 *clDNN* includes highly optimized building blocks for implementation of
11 convolutional neural networks (CNN) with C and C++ interfaces. We created
12 this project to enable the DL community to innovate on Intel® processors.
14 **Usages supported:** Image recognition, image detection, and image segmentation.
**Validated Topologies:** AlexNet\*, VGG(16,19)\*, GoogleNet(v1,v2,v3)\*, ResNet(50,101,152)\*, Faster R-CNN\*, Squeezenet\*, SSD_googlenet\*, SSD_VGG\*, PVANET\*, PVANET_REID\*, age_gender\*, FCN\* and yolo\*.
18 As with any technical preview, APIs may change in future updates.
clDNN is licensed under
[Apache License Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
clDNN uses 3<sup>rd</sup>-party components licensed under the following licenses:
26 - *googletest* under [Google\* License](https://github.com/google/googletest/blob/master/googletest/LICENSE)
27 - *OpenCL™ ICD and C++ Wrapper* under [Khronos™ License](https://github.com/KhronosGroup/OpenCL-CLHPP/blob/master/LICENSE.txt)
28 - *RapidJSON* under [Tencent\* License](https://github.com/Tencent/rapidjson/blob/master/license.txt)
31 The latest clDNN documentation is at [GitHub pages](https://intel.github.io/clDNN/index.html).
33 There is also inline documentation available that can be [generated with Doxygen](#generating-documentation).
See also the whitepaper: [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).
## Intel® OpenVINO™ Toolkit and clDNN

clDNN is also released as part of the Intel® OpenVINO™ Toolkit, which contains:
- *Model Optimizer*, a Python\*-based command-line tool that imports trained models from popular deep learning frameworks such as Caffe\*, TensorFlow\*, and Apache MXNet\*.
- *Inference Engine*, an execution engine that uses a common API to deliver inference solutions on the platform of your choice (for example, a GPU with the clDNN library).
43 You can find more information [here](https://software.intel.com/en-us/openvino-toolkit/deep-learning-cv).
## OpenVINO-specific changes
47 - added `not` activation type
48 - added `depth_to_space` layer
49 - new clip options in `detection_output` (cpu impl) and `proposal` layers
50 - added eltwise `xor` and `squared_diff` operations
51 - added `gather` layer
52 - added `bilinear` mode for position sensitive `roi_pooling` layer
53 - added `shuffle_channels` layer
54 - added `strided_slice` layer
55 - added IE gates ordering for lstm layer
56 - added `reverse_sequence` layer
58 - fixed unknown bool type error in C API
59 - fixed non-relu activation fusing with conv_eltwise node
60 - fixed infinite performance regression on several topologies
61 - minor internal fixes
62 - unified the permute order with cldnn's tensor order
- supported compilation with C++11 only
72 - added max mode for contract primitive
73 - added one_hot primitive
74 - optional explicit output data type support for all primitives
76 - fix for graph optimizer (crop primitive)
77 - fix for processing order (deconvolution primitive)
78 - fix for convolution-eltwise primitive
- cache.json is now searched for in the library directory
82 - optimizations for lstm_gemm primitive
87 - group support in convolution and deconvolution primitives
88 - broadcastable inputs support for eltwise primitive
89 - asymmetric padding for convolution primitive
90 - fused convolution-eltwise primitive (API extension)
91 - auto-calculated output shape support for reshape primitive
92 - crop support for i8/s8/i32/i64 types
93 - broadcast axis support for broadcast primitive
94 - logic and comparison operations support for eltwise primitive
96 - added required alignment checks for some fc implementations
97 - added lstm support for f16 (half) type
98 - reorders for fc moved to graph compiler
99 - primitive fusing and reorder fixes
101 - added internal core tests project
102 - refactored optimizations pass manager and passes
104 - optimized concatenation during upsampling (unpool)
105 - IMAD-based optimizations for convolution, fc, eltwise and pooling primitives (i8/s8)
106 - convolution-eltwise fusing optimizations
107 - partial writes optimizations for block-based kernels
110 - gtests code refactor
115 - pyramidRoiAlign primitive
116 - multiple axes support for reverse mode in index_select
117 - eltwise min/max/mod support for i8/i32/i64
118 - broadcast support for i32/i64
122 - no padding for output primitives
124 - RapidJSON library for auto-tune cache
125 - less dependencies in program.cpp
- do not throw an error when the device is not validated
127 - global pooling in c API
128 - optimized padding for convolution
133 - extended border and tile
134 - GPU implementation of Detection Output
135 - More cases for BatchNorm primitive
137 - GEMM fix (align with ONNX)
138 - memory leak fix in memory pool
- increase FC precision for fp16 (fp32 accumulation)
141 - cache for new topologies and devices
142 - conv1x1 with stride >1 into eltwise optimization
146 - condition primitive
147 - fused convolution with bn and scale (backprop)
- scale/shift and mean/var as an output in batch norm
149 - add LSTM output selection
154 - add support for u8 data type in custom primitive
155 - library size optimizations
157 - in place concatenation optimization
158 - conv1x1 with stride >1 into eltwise optimization
167 - select index primitive
170 - fix for output format in fully connected primitive
174 - log2 activation function
175 - support for i32 and i64 types
180 - dilation > input size fix
185 - average unpooling primitive
186 - serialization - dump weights, biases and kernels
187 - scale grad for input and weights primitive
189 - wrong gws in concatenation
191 - convolution depthwise bias concatenation
192 - params in engine_info
193 - mutable_data filler
194 - momentum calculation
196 - kernel selector renaming
197 - bfyx_yxfb batched reorder
199 - primitives allocation order
203 - support for img_info=4 in proposal_gpu
204 - support images format in winograd
205 - support for 2 or more inputs in eltwise
206 - priority and throttle hints
207 - deconvolution_grad_input primitive
208 - fc_grad_input and fc_grad_weights primitives
210 - tensor fixes (i.e. less operator fix)
211 - cascade concat fixes
212 - winograd fixes for bfyx format
213 - auto-tuning fixes for weights calculation
215 - memory pool (reusing memory buffers)
- added chosen kernel name in graph dump
217 - flush memory functionality
219 - graph optimizations
220 - depth-concatenation with fused relu optimization
221 - winograd optimizations
- deconvolution optimizations (i.e. bfyx opt)
227 - image support for weights
228 - yolo_region primitive support
229 - yolo_reorg primitive support
236 - update offline caches for newer drivers
237 - conv1x1 byxf optimization
238 - conv1x1 with images
239 - cascade depth concatenation fuse optimization
244 - upsampling primitive
- add preliminary Coffee Lake support
246 - uint8 weights support
248 - offline autotuner cache
249 - Winograd phase 1 - not used yet
251 - in-place crop optimization bug fix
252 - output spatial padding in yxfb kernels fix
253 - local work sizes fix in softmax
254 - underflow fix in batch normalization
255 - average pooling corner case fix
- graph logger, dumps graphviz-format files
258 - extended documentation with API diagram and graph compilation steps
260 - softmax optimization
261 - lrn within channel optimization
262 - priorbox optimization
263 - constant propagation
267 - OOOQ execution model implementation
268 - depthwise separable convolution implementation
269 - kernel auto-tuner implementation
271 - dump hidden layer fix
272 - run single layer fix
276 - better error handling/reporting
279 - dynamic pruning for sparse fc layers
280 - reorder optimization
281 - concatenation optimization
282 - eltwise optimization
- performance improvements
291 - bug fixes (deconvolution, softmax, reshape)
292 - apply fixes from community reported issues
296 - step by step tutorial
- performance optimizations for: softmax, fully connected, eltwise, reshape
299 - bug fixes (conformance)
302 - initial drop of clDNN
Please report issues and suggestions via
[GitHub issues](https://github.com/01org/cldnn/issues).
309 We welcome community contributions to clDNN. If you have an idea how to improve the library:
311 - Share your proposal via
312 [GitHub issues](https://github.com/01org/cldnn/issues)
313 - Ensure you can build the product and run all the examples with your patch
314 - In the case of a larger feature, create a test
315 - Submit a [pull request](https://github.com/01org/cldnn/pulls)
317 We will review your contribution and, if any additional fixes or modifications
318 are necessary, may provide feedback to guide you. When accepted, your pull
319 request will be merged into our internal and GitHub repositories.
321 ## System Requirements
clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for:
323 - Codename *Skylake*:
324 * Intel® HD Graphics 510 (GT1, *client* market)
325 * Intel® HD Graphics 515 (GT2, *client* market)
326 * Intel® HD Graphics 520 (GT2, *client* market)
327 * Intel® HD Graphics 530 (GT2, *client* market)
328 * Intel® Iris® Graphics 540 (GT3e, *client* market)
329 * Intel® Iris® Graphics 550 (GT3e, *client* market)
330 * Intel® Iris® Pro Graphics 580 (GT4e, *client* market)
331 * Intel® HD Graphics P530 (GT2, *server* market)
332 * Intel® Iris® Pro Graphics P555 (GT3e, *server* market)
333 * Intel® Iris® Pro Graphics P580 (GT4e, *server* market)
- Codename *Apollo Lake*:
335 * Intel® HD Graphics 500
336 * Intel® HD Graphics 505
- Codename *Kaby Lake*:
338 * Intel® HD Graphics 610 (GT1, *client* market)
339 * Intel® HD Graphics 615 (GT2, *client* market)
340 * Intel® HD Graphics 620 (GT2, *client* market)
341 * Intel® HD Graphics 630 (GT2, *client* market)
342 * Intel® Iris® Graphics 640 (GT3e, *client* market)
343 * Intel® Iris® Graphics 650 (GT3e, *client* market)
344 * Intel® HD Graphics P630 (GT2, *server* market)
345 * Intel® Iris® Pro Graphics 630 (GT2, *server* market)
clDNN currently uses OpenCL™ with multiple Intel® OpenCL™ extensions and requires the Intel® Graphics Driver to run.

clDNN requires a CPU with Intel® SSE/Intel® AVX support.
353 The software dependencies are:
354 - [CMake\*](https://cmake.org/download/) 3.5 or later
355 - C++ compiler with C++11 standard support compatible with:
356 * GNU\* Compiler Collection 4.8 or later
358 * [Intel® C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) 17.0 or later
359 * Visual C++ 2015 (MSVC++ 19.0) or later
361 > Intel® CPU intrinsics header (`<immintrin.h>`) must be available during compilation.
- [Python\*](https://www.python.org/downloads/) 2.7 or later (the scripts are compatible with both Python 2.7.x and Python 3.x)
- *(optional)* [Doxygen\*](http://www.stack.nl/~dimitri/doxygen/download.html) 1.8.13 or later
    Needed for manual generation of documentation from inline comments or for running the `docs` custom target, which will generate it automatically.
367 > [GraphViz\*](http://www.graphviz.org/Download..php) (2.38 or later) is also recommended to generate documentation with all embedded diagrams.
(Make sure that the `dot` application is visible in the `PATH` environment variable.)
372 - The software was validated on:
* CentOS* 7.2 with GNU* Compiler Collection 5.2 (64-bit only), using [Intel® Graphics Compute Runtime for OpenCL™](https://software.intel.com/en-us/articles/opencl-drivers).
374 * Windows® 10 and Windows® Server 2012 R2 with MSVC 14.0, using [Intel® Graphics Driver for Windows* [24.20] driver package](https://downloadcenter.intel.com/download/27803/Graphics-Intel-Graphics-Driver-for-Windows-10?v=t).
376 More information on Intel® OpenCL™ drivers can be found [here](https://software.intel.com/en-us/articles/opencl-drivers).
We recommend using the latest driver for Linux ([link](https://github.com/intel/compute-runtime/releases)) and the 24.20 driver for Windows ([link](https://downloadcenter.intel.com/download/27803/Graphics-Intel-Graphics-Driver-for-Windows-10?v=t)).
384 Download [clDNN source code](https://github.com/01org/cldnn/archive/master.zip)
385 or clone the repository to your system:
```shell
git clone https://github.com/intel/cldnn.git
```
391 Satisfy all software dependencies and ensure that the versions are correct before building.
clDNN uses multiple 3<sup>rd</sup>-party components. They are stored in binary form in the `common` subdirectory. Currently they are prepared for MSVC++ and GCC\*. They will be cloned with the repository.
clDNN uses a CMake-based build system. You can use the CMake command-line tool or the CMake GUI (`cmake-gui`) to generate the required solution.
On Windows, you can run the following in `cmd` (or `powershell`):
```shell
@REM Generate 32-bit solution (solution contains multiple build configurations)...
cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015" ..
@REM Generate 64-bit solution (solution contains multiple build configurations)...
cmake -E make_directory build && cd build && cmake -G "Visual Studio 14 2015 Win64" ..
```
The created solution can be opened in Visual Studio 2015 or built using the appropriate `msbuild` tool
(you can also use `cmake --build .` to select the build tool automatically).
408 For Unix and Linux systems:
```shell
# Create GNU makefiles for release clDNN and build it...
cmake -E make_directory build && cd build && cmake -DCMAKE_BUILD_TYPE=Release .. && make
# Create Ninja build files for debug clDNN and build it...
cmake -E make_directory build && cd build && cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug .. && ninja -k 20
```
You can also call the scripts in the main directory of the project, which will create solutions/makefiles for clDNN (they
will generate solutions/makefiles in the `build` subdirectory; binary outputs will be written to the `build/out` subdirectory):
- `create_msvc_mscc.bat` (Windows\*, Visual Studio\* 2015)
- `create_unixmake_gcc.sh [Y|N] [<devtoolset-version>]` (Linux\*, GNU\* or Ninja\* makefiles, optional devtoolset support)
    * If you specify the first parameter as `Y`, Ninja build files will be generated.
    * If you specify the second parameter (a number), CMake will be called via `scl` with the selected `devtoolset` version.
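For example, on Linux the script might be invoked as sketched below; the devtoolset version `7` is purely illustrative — pass whichever version is installed on your system, or omit the parameter entirely:

```shell
# Generate Ninja build files in the build/ subdirectory,
# calling CMake through scl with devtoolset-7 (version is illustrative)
./create_unixmake_gcc.sh Y 7

# Build from the generated files
cd build && ninja
```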
The CMake solution offers multiple options which you can specify using normal CMake syntax (`-D<option-name>=<value>`):
425 | CMake option | Type | Description |
426 |:------------------------------------------|:---------|:-----------------------------------------------------------------------------|
427 | CMAKE\_BUILD\_TYPE | STRING | Build configuration that will be used by generated makefiles (it does not affect multi-configuration generators like generators for Visual Studio solutions). Currently supported: `Debug` (default), `Release` |
428 | CMAKE\_INSTALL\_PREFIX | PATH | Install directory prefix. |
| CLDNN\_\_ARCHITECTURE\_TARGET | STRING | Architecture of the target system (where the binary output will be deployed). CMake will try to detect it automatically (based on the selected generator type, host OS and compiler properties). Specify this option only if CMake has problems with detection. Currently supported: `Windows32`, `Windows64`, `Linux64` |
430 | CLDNN\_\_OUTPUT\_DIR (CLDNN\_\_OUTPUT\_BIN\_DIR, CLDNN\_\_OUTPUT\_LIB\_DIR) | PATH | Location where built artifacts will be written to. It is set automatically to roughly `build/out/<arch-target>/<build-type>` subdirectory. For more control use: `CLDNN__OUTPUT_LIB_DIR` (specifies output path for static libraries) or `CLDNN__OUTPUT_BIN_DIR` (for shared libs and executables). |
432 | **CMake advanced option** | **Type** | **Description** |
| PYTHON\_EXECUTABLE | FILEPATH | Path to the Python interpreter. CMake will try to detect Python automatically. Specify this option only if CMake has problems locating Python. |
434 | CLDNN\_\_IOCL\_ICD\_USE\_EXTERNAL | BOOL | Use this option to enable use of external Intel® OpenCL™ SDK as a source for ICD binaries and headers (based on `INTELOCLSDKROOT` environment variable). Default: `OFF` |
| CLDNN\_\_IOCL\_ICD\_VERSION | STRING | Version of the Intel® OpenCL™ ICD binaries and headers to use (from the `common` subdirectory). It is automatically selected by CMake (highest version). Specify it if you have multiple versions and want to use one other than the automatically selected. |
437 | CLDNN__COMPILE_LINK_ALLOW_UNSAFE_SIZE_OPT | BOOL | Allow unsafe optimizations during linking (like aggressive dead code elimination, etc.). Default: `ON` |
438 | CLDNN__COMPILE_LINK_USE_STATIC_RUNTIME | BOOL | Link with static C++ runtime. Default: `OFF` (shared C++ runtime is used) |
440 | CLDNN__INCLUDE_CORE | BOOL | Include core clDNN library project in generated makefiles/solutions. Default: `ON` |
| CLDNN__INCLUDE_TESTS | BOOL | Include tests application project (based on googletest framework) in generated makefiles/solutions. Default: `ON` |
443 | CLDNN__RUN_TESTS | BOOL | Run tests after building `tests` project. This option requires `CLDNN__INCLUDE_TESTS` option to be `ON`. Default: `OFF` |
445 | CLDNN__CMAKE_DEBUG | BOOL | Enable extended debug messages in CMake. Default: `OFF` |
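Putting several of these options together, a release configuration might be generated as sketched below; the install prefix `/opt/cldnn` is only an example:

```shell
# Configure a Release build with tests included and a custom install prefix
cmake -E make_directory build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=/opt/cldnn \
      -DCLDNN__INCLUDE_TESTS=ON \
      ..
```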
clDNN includes unit tests implemented using the googletest framework. To validate your build, build and run the `tests` target.
455 (Make sure that both `CLDNN__INCLUDE_TESTS` and `CLDNN__RUN_TESTS` were set to `ON` when invoking CMake.)
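A minimal sketch of that flow with a Unix makefile generator (the `build` directory name is illustrative):

```shell
# Configure with tests enabled and set to run right after building
cd build
cmake -DCLDNN__INCLUDE_TESTS=ON -DCLDNN__RUN_TESTS=ON ..
# Build the tests project; with CLDNN__RUN_TESTS=ON the tests
# are executed as part of this step
cmake --build . --target tests
```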
457 ### Generating documentation
Documentation is provided inline and can be generated in HTML format with Doxygen. We recommend using the latest
[Doxygen\*](http://www.stack.nl/~dimitri/doxygen/download.html) and [GraphViz\*](http://www.graphviz.org/Download..php).
Documentation templates and configuration files are stored in the `docs` subdirectory. You can simply run Doxygen from that directory to generate HTML documentation in the `docs/html` subdirectory.
There is also a custom CMake target named `docs` which will generate the documentation in the `CLDNN__OUTPUT_BIN_DIR/html` directory. For example, when using Unix makefiles, you can create it by building the `docs` target (`make docs`).
The special `install` target will place the API header files and libraries in `/usr/local`
(`C:/Program Files/clDNN` or `C:/Program Files (x86)/clDNN` on Windows). To change
the installation path, use the option `-DCMAKE_INSTALL_PREFIX=<prefix>` when invoking CMake.
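For example, installing into a user-local prefix instead of `/usr/local` might look like this (the path below is illustrative):

```shell
# Re-configure with a custom install prefix, then build and install
cd build
cmake -DCMAKE_INSTALL_PREFIX=$HOME/cldnn-install ..
cmake --build . --target install
```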
484 \* Other names and brands may be claimed as the property of others.
486 Copyright © 2017, Intel® Corporation