Added launch bounds in VolumetricConvolution.cu (#14564)
Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.
This PR adds launch bounds to those kernels so that they are always compiled such that at least one block can be resident on an SM. Once the code is compiled for a given architecture, these errors will not be encountered at runtime on that architecture.
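For illustration only, a minimal sketch of the technique, not the code from this PR: the kernel name `scale_kernel`, the helpers `fits_launch_bounds` and `launch_scale`, and the cap of 256 threads are all hypothetical. The `__launch_bounds__` qualifier tells the compiler the largest block size the kernel will be launched with, so it can limit per-thread register usage to keep a block of that size within the SM's resources.

```cuda
#include <cuda_runtime.h>

// Hypothetical cap; not the value used in VolumetricConvolution.cu.
constexpr int kMaxThreadsPerBlock = 256;

// __launch_bounds__(N) promises the compiler that this kernel is never
// launched with more than N threads per block, so it can limit register
// usage to let a block of N threads fit on an SM.
__global__ void __launch_bounds__(kMaxThreadsPerBlock)
scale_kernel(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    data[i] *= factor;
  }
}

// Host-side guard: true if the block shape respects the declared bound.
bool fits_launch_bounds(dim3 block) {
  return block.x * block.y * block.z <=
         static_cast<unsigned>(kMaxThreadsPerBlock);
}

// Launch helper that refuses block shapes larger than the declared bound.
cudaError_t launch_scale(float* d_data, float factor, int n) {
  if (n <= 0) {
    return cudaSuccess;  // nothing to do; avoids a zero-sized grid
  }
  dim3 block(kMaxThreadsPerBlock);
  if (!fits_launch_bounds(block)) {
    return cudaErrorInvalidConfiguration;
  }
  dim3 grid((n + block.x - 1) / block.x);
  scale_kernel<<<grid, block>>>(d_data, factor, n);
  return cudaGetLastError();  // reports launch-configuration errors
}
```

Every launch must stay within the declared bound, since launching with more threads per block than the bound fails. That is why the sketch derives the block size from the same constant.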
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564
Differential Revision: D13266136
Pulled By: soumith
fbshipit-source-id: 35464b20848bb0a1168e8f3b233172331c50b35b