Speed-up adaptive average pooling for the common case of size=1 output (#17011)

Summary:
When adaptive average pooling has to produce a single-pixel feature map, it is faster to do so by calling .mean(). Backward otherwise calls a fairly inefficient CUDA kernel that uses atomics, which becomes extremely slow for half precision. For half, this PR provides an approximately 30x speed-up for adaptive average pooling, which results in a 30% end-to-end speed-up on SENet. Improvements are smaller for float, but still significant (approximately 5x).
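
The equivalence that the fast path relies on can be checked with a small pure-Python sketch. It is not the ATen code from this commit: the helper names `adaptive_avg_pool2d_plane` and `plane_mean` are hypothetical, and the bin boundaries (floor for the start index, ceiling for the end index) follow the convention commonly used for adaptive pooling, assumed here for illustration.

```python
def _ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)

def adaptive_avg_pool2d_plane(plane, out_h, out_w):
    """Reference adaptive average pooling over one H x W plane (list of rows)."""
    in_h, in_w = len(plane), len(plane[0])
    out = []
    for i in range(out_h):
        # Assumed bin convention: start = floor(i*H/out), end = ceil((i+1)*H/out).
        h0 = (i * in_h) // out_h
        h1 = _ceil_div((i + 1) * in_h, out_h)
        row = []
        for j in range(out_w):
            w0 = (j * in_w) // out_w
            w1 = _ceil_div((j + 1) * in_w, out_w)
            window = [plane[h][w] for h in range(h0, h1) for w in range(w0, w1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def plane_mean(plane):
    """Plain mean over every element of the plane."""
    values = [v for row in plane for v in row]
    return sum(values) / len(values)

plane = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]

# With a 1 x 1 output the single bin covers the whole plane,
# so the pooled value is just the mean of all elements.
pooled = adaptive_avg_pool2d_plane(plane, 1, 1)
assert pooled[0][0] == plane_mean(plane)
```

Because the single bin spans the full plane, the general windowed loop does no useful extra work in this case, which is why a plain mean reduction can replace it.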
Also this PR unifies handling of 3d (no batch dimension) and 4d tensors, using negative dimension indices.
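
As a rough illustration of why negative indices remove the need for separate 3d and 4d code paths, the sketch below works on shape tuples only. The function name `pooled_shape` is hypothetical and the code is an assumption about the idea, not a copy of the change itself.

```python
def pooled_shape(shape):
    """Output shape of size-1 adaptive average pooling for a 3d or 4d input shape."""
    if len(shape) not in (3, 4):
        raise ValueError("expected a 3d (C, H, W) or 4d (N, C, H, W) shape")
    # shape[-2] and shape[-1] are H and W in both layouts, so no branch
    # on the presence of a batch dimension is needed.
    h, w = shape[-2], shape[-1]
    if h == 0 or w == 0:
        raise ValueError("spatial dimensions must be non-empty")
    # Everything before the last two dimensions is passed through unchanged.
    return tuple(shape[:-2]) + (1, 1)

print(pooled_shape((64, 7, 7)))      # no batch dimension
print(pooled_shape((32, 64, 7, 7)))  # with batch dimension
```

With positive indices, H and W would sit at positions 1 and 2 for a 3d input but 2 and 3 for a 4d input; counting from the end gives the same positions in both cases.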
cc ezyang for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17011

Reviewed By: ailzhang

Differential Revision: D14078747

Pulled By: soumith

fbshipit-source-id: 0eb9255da2351190a6bcaf68c30e2ae2402a2dd9

author	ngimel <ngimelshein@nvidia.com>
	Fri, 15 Feb 2019 05:11:30 +0000 (21:11 -0800)
committer	Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
	Fri, 15 Feb 2019 05:15:16 +0000 (21:15 -0800)
commit	91c50aeec6eccb9e23b8c08b161dbae63de9a0b0
tree	960392698734b84189667f491e0cc53a7f506b0a
parent	7cff803d0a09e36622f4e72e2ca2820cf9b97c52

aten/src/ATen/native/AdaptiveAveragePooling.cpp
aten/src/ATen/native/cuda/AdaptiveAveragePooling.cu
aten/src/ATen/native/native_functions.yaml
test/common_nn.py
tools/autograd/derivatives.yaml
torch/csrc/jit/symbolic_script.cpp