Added launch bounds in VolumetricConvolution.cu (#14564)
author Michael Carilli <mcarilli@nvidia.com>
Thu, 29 Nov 2018 22:47:32 +0000 (14:47 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Thu, 29 Nov 2018 22:49:29 +0000 (14:49 -0800)
commit a2d8e84594cda4576729790239db97f922fc4554
tree 0f695e8befaa44535abceadd3a81b70dad6a543e
parent 0d663cec3068f8d22064fb5d691a1b90775e42ea

Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.

This PR adds launch bounds so that those kernels are always compiled such that at least one block can be resident on an SM. Once the code has been compiled for a given architecture, these errors will no longer be encountered at runtime on that architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564
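
For illustration, a minimal sketch of how a launch bound is declared on a CUDA kernel. This is not the code from the PR: the kernel `scale_kernel`, the constant `kMaxThreads`, and the value 256 are hypothetical and chosen only to show the `__launch_bounds__` qualifier. The qualifier tells the compiler the largest block size the kernel will be launched with, so it can keep resource usage low enough for one such block to fit on an SM.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical maximum block size, for illustration only
// (not the value used in VolumetricConvolution.cu).
constexpr int kMaxThreads = 256;

// __launch_bounds__(kMaxThreads) promises that this kernel is never launched
// with more than kMaxThreads threads per block. The compiler then limits
// per-thread register usage so that a block of that size can be resident
// on an SM of the target architecture.
__global__ void __launch_bounds__(kMaxThreads)
scale_kernel(float* data, float alpha, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= alpha;
  }
}

int main() {
  const int n = 1 << 20;
  float* d = nullptr;
  if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
    std::printf("cudaMalloc failed\n");
    return 1;
  }
  cudaMemset(d, 0, n * sizeof(float));

  // The launch must respect the declared bound: blockDim.x <= kMaxThreads.
  const int blocks = (n + kMaxThreads - 1) / kMaxThreads;
  scale_kernel<<<blocks, kMaxThreads>>>(d, 2.0f, n);

  // Launch-time errors, including invalid launch configurations,
  // are reported by cudaGetLastError().
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    std::printf("launch failed: %s\n", cudaGetErrorString(err));
    cudaFree(d);
    return 1;
  }

  cudaDeviceSynchronize();
  cudaFree(d);
  return 0;
}
```

Launching this kernel with a block size larger than the declared bound is itself an error, so the bound and the launch configuration have to be kept in sync.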

Differential Revision: D13266136

Pulled By: soumith

fbshipit-source-id: 35464b20848bb0a1168e8f3b233172331c50b35b
aten/src/THCUNN/VolumetricConvolution.cu