From: = <=>
Date: Mon, 30 Aug 2021 16:43:25 +0000 (-0700)
Subject: Improve performance of index_select by avoiding item (#63008)
X-Git-Tag: accepted/tizen/8.0/unified/20231005.095509~597
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=93d2e5090f9823102debab3845117c8e8208995b;p=platform%2Fupstream%2Fpytorch.git

Improve performance of index_select by avoiding item (#63008)

Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/61788

From a CUDA perspective: item() already pulls the Tensor content onto the host (albeit one element at a time), which incurs very expensive memory transfers. This way we do it all at once.

From a CPU perspective: item() has a lot of overhead as a native function compared to simply reading through a raw pointer.

Overall there are still lots of performance gains to be had, but this is a small change that should take us into a more usable landscape. This doesn't land a separate benchmark, but I postulate that one isn't necessary to decide on the benefit of this change (we'll also see it show up indirectly); a benchmark is still a good follow-up item.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63008

Reviewed By: zou3519

Differential Revision: D30211160

Pulled By: cpuhrsch

fbshipit-source-id: 70b752be5df51afc66b5aa1c77135d1205520cdd
---

diff --git a/aten/src/ATen/native/TensorShape.cpp b/aten/src/ATen/native/TensorShape.cpp
index 2545ec4..1dc2a27 100644
--- a/aten/src/ATen/native/TensorShape.cpp
+++ b/aten/src/ATen/native/TensorShape.cpp
@@ -1209,12 +1209,15 @@ Tensor index_select_sparse(const Tensor& self, int64_t dim, const Tensor& index)
   if (dim < sparse_dim) {
-    auto dim_indices = indices[dim];
+    auto cpu_dim_indices = indices[dim].to(c10::kCPU).contiguous();
+    int64_t* cpu_dim_indices_ptr = cpu_dim_indices.data_ptr<int64_t>();
+    auto cpu_index = index.to(c10::kCPU).contiguous();
+    int64_t* cpu_index_ptr = cpu_index.data_ptr<int64_t>();
     std::vector<int64_t> zindices;
     std::vector<int64_t> iindices;
     int64_t new_nnz = 0;
-    for (const auto i : c10::irange(new_sizes[dim])) {
-      auto idx = index[i].item<int64_t>();
+    for (int64_t i = 0; i < new_sizes[dim]; i++) {
+      int64_t idx = cpu_index_ptr[i];
       if (idx < -size || idx >= size) {
         TORCH_CHECK_INDEX(false, "index_select(): index contains ", idx, " that is out of range for tensor of size ", self.sizes(), " at dimension ", dim);
@@ -1222,8 +1225,8 @@ Tensor index_select_sparse(const Tensor& self, int64_t dim, const Tensor& index)
       if (idx < 0) {
         idx += size;
       }
-      for (const auto j : c10::irange(nnz)) {
-        auto jdx = dim_indices[j].item<int64_t>();
+      for (int64_t j = 0; j < nnz; j++) {
+        int64_t jdx = cpu_dim_indices_ptr[j];
         if (idx == jdx) {
           new_nnz++;
           iindices.push_back(i);
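
For illustration, below is a minimal standalone sketch (not part of the patch) of the pattern the change adopts: instead of calling Tensor::item<int64_t>() once per element inside a loop, the index tensor is copied to the CPU once, made contiguous, and then read through a raw int64_t pointer. It assumes a libtorch build with ATen available; the helper name read_indices is hypothetical and only for demonstration.

#include <ATen/ATen.h>
#include <cstdint>
#include <vector>

// Hypothetical helper: gather the values of a 1-D int64 index tensor on the
// host with a single device-to-host copy, rather than one item() call per
// element.
std::vector<int64_t> read_indices(const at::Tensor& index) {
  auto cpu_index = index.to(c10::kCPU).contiguous();
  const int64_t* ptr = cpu_index.data_ptr<int64_t>();
  return std::vector<int64_t>(ptr, ptr + cpu_index.numel());
}

For a CUDA index tensor this incurs one device-to-host transfer for the whole tensor instead of one per element, which is the trade-off the commit message describes; the patch applies the same idea to both the sparse dim indices and the user-supplied index before the nested loops run.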