[IE CLDNN] Improve performance of fc block fp16 implementation (#1993)
author Konrad Dobros <konrad.dobros@intel.com>
Mon, 31 Aug 2020 06:52:47 +0000 (08:52 +0200)
committer GitHub <noreply@github.com>
Mon, 31 Aug 2020 06:52:47 +0000 (09:52 +0300)
commit 2b8249fc9f899dea2270b2dbfda6840ac1151990
tree a121c12269ad2d16d34cc584de8caa6c3a06946d
parent c7b3bd0195a4c8fd1c1d64ef6b47c60b1bb1eda0

The main purpose of this change is to fix anomalous behaviour of the
fully_connected_gpu_fb_io_block_fp16 implementation, which shows a
severe performance drop when run without bias.
Additionally, the assembly for the case with bias is improved.
inference-engine/thirdparty/clDNN/kernel_selector/core/cl_kernels/fully_connected_gpu_fb_io_block_fp16.cl