Fix potential issue with number of blocks launched for depthwise kernels: the number...
author A. Unique TensorFlower <gardener@tensorflow.org>
Fri, 16 Feb 2018 21:20:13 +0000 (13:20 -0800)
committer TensorFlower Gardener <gardener@tensorflow.org>
Fri, 16 Feb 2018 21:24:00 +0000 (13:24 -0800)
commit 428d034227c9e7b637de0194d80cac3976a37eef
tree 09e7948680f12f13238254deb488473edaab8aa7
parent 96c2a846609d3a68f9a88c60c4c68a243f74ee44
Fix potential issue with the number of blocks launched for depthwise kernels: the number of work_elements was too small, which could return a block_count that is too small to cover all elements.

We have also been ignoring the suggested thread_per_block, so we were potentially launching more blocks than necessary to fill the GPU (which is inefficient, but functionally correct).
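
To illustrate why an undercounted number of work elements matters, here is a minimal host-side sketch of how a block count can be derived from a work element count and a threads-per-block value. It is not the code from cuda_launch_config.h: LaunchConfigSketch, DivUp, MakeLaunchConfig, ElementsCoveredPerPass and the max_resident_blocks parameter are hypothetical names used only for this example, and the cap on resident blocks is an assumed input rather than a value queried from a device.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a launch configuration. The field names follow
// the terms used in the commit message; this is not the TensorFlow struct.
struct LaunchConfigSketch {
  int64_t work_element_count;
  int thread_per_block;
  int block_count;
};

// Ceiling division for positive operands.
inline int64_t DivUp(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Picks enough blocks to give every work element a thread, capped at the
// number of blocks assumed to fit on the device at once (max_resident_blocks
// is an illustrative parameter, not a real API value).
inline LaunchConfigSketch MakeLaunchConfig(int64_t work_element_count,
                                           int thread_per_block,
                                           int max_resident_blocks) {
  LaunchConfigSketch config;
  config.work_element_count = work_element_count;
  config.thread_per_block = thread_per_block;
  const int64_t needed = DivUp(work_element_count, thread_per_block);
  config.block_count =
      static_cast<int>(std::min<int64_t>(needed, max_resident_blocks));
  return config;
}

// Elements touched in one pass of the grid when each thread handles exactly
// one element (that is, without a grid-stride loop).
inline int64_t ElementsCoveredPerPass(const LaunchConfigSketch& config) {
  return static_cast<int64_t>(config.block_count) * config.thread_per_block;
}
```

With 1000 real elements and 256 threads per block, MakeLaunchConfig(1000, 256, 64) yields 4 blocks, which cover 1024 elements. If the caller passes an undercounted value such as 500, the result is 2 blocks covering only 512 elements, so a kernel that assigns one element per thread would leave part of the work undone. When the cap applies (for example 1000000 elements with a cap of 64 blocks), the kernel has to loop over the remaining elements itself.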

Change 'assert(false && ...' to LOG(FATAL), because the check shouldn't be debug-only.

PiperOrigin-RevId: 186037306
tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
tensorflow/core/util/cuda_launch_config.h