Improve performance of index_select by avoiding item (#63008)
author
Mon, 30 Aug 2021 16:43:25 +0000 (09:43 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Mon, 30 Aug 2021 16:50:41 +0000 (09:50 -0700)
commit 93d2e5090f9823102debab3845117c8e8208995b
tree e7f42d988ef2a825adf1d9d6bd4a4513f4c1eb8c
parent e24c3644d87acfb0359cb14bde4afcd62a9255ba

Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: calling item() on each element already pulls all of the Tensor's content onto the host, just one element at a time, and each of those small device-to-host copies is very expensive. With this change we copy everything in a single transfer.
From a CPU perspective: item() carries a lot of overhead as a native function compared with simply reading values through a data pointer.
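To make the transfer argument concrete, here is a toy C++ model. It is not ATen code: `DeviceBuffer`, `copy_to_host`, `item_at`, `gather_per_item`, and `gather_bulk` are hypothetical names, and the "device" is just a vector that counts how many host transfers are requested. The per-element pattern mirrors calling item() once per index; the bulk pattern copies the indices once and then reads them through a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index tensor living in device memory.
// Every host read goes through copy_to_host, which counts as one transfer.
struct DeviceBuffer {
  std::vector<std::int64_t> data;
  mutable std::size_t transfers = 0;

  explicit DeviceBuffer(std::vector<std::int64_t> d) : data(std::move(d)) {}

  std::size_t size() const { return data.size(); }

  // Copy `count` elements starting at `offset` into `dst` as ONE transfer.
  void copy_to_host(std::size_t offset, std::size_t count, std::int64_t* dst) const {
    assert(offset + count <= data.size());
    ++transfers;
    for (std::size_t i = 0; i < count; ++i) dst[i] = data[offset + i];
  }

  // item()-style access: a separate transfer for every single element.
  std::int64_t item_at(std::size_t i) const {
    std::int64_t v = 0;
    copy_to_host(i, 1, &v);
    return v;
  }
};

// Per-element pattern: one transfer per index (N transfers in total).
std::vector<std::int64_t> gather_per_item(const std::vector<std::int64_t>& src,
                                          const DeviceBuffer& index) {
  std::vector<std::int64_t> out;
  out.reserve(index.size());
  for (std::size_t i = 0; i < index.size(); ++i) {
    out.push_back(src[static_cast<std::size_t>(index.item_at(i))]);
  }
  return out;
}

// Bulk pattern: copy all indices once, then read them through a plain pointer.
std::vector<std::int64_t> gather_bulk(const std::vector<std::int64_t>& src,
                                      const DeviceBuffer& index) {
  std::vector<std::int64_t> host(index.size());
  if (!host.empty()) index.copy_to_host(0, host.size(), host.data());
  const std::int64_t* idx = host.data();  // no per-element function calls below
  std::vector<std::int64_t> out;
  out.reserve(host.size());
  for (std::size_t i = 0; i < host.size(); ++i) {
    out.push_back(src[static_cast<std::size_t>(idx[i])]);
  }
  return out;
}
```

With four indices, the per-element version records four transfers and the bulk version records one, while both return the same values. On a real GPU each item() call also blocks the host until its copy completes, so the per-element cost is more than the raw count suggests.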

Overall there are still substantial performance gains to be had, but this small change should bring index_select into a more usable range. This PR does not add a dedicated benchmark. I don't think one is necessary to judge the benefit of this change (we may also see it show up indirectly), but adding one would still be a good follow-up item.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
aten/src/ATen/native/TensorShape.cpp