In loop_wrapper, do not copy the passed-in functor (capture it by reference instead)
author    Brennan Vincent <btv@fb.com>
          Wed, 9 Jan 2019 03:51:41 +0000 (19:51 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
          Wed, 9 Jan 2019 03:59:39 +0000 (19:59 -0800)
Summary:
The overhead of copying the functor makes an appreciable difference when performing many small reductions (i.e., when the reduced dimension is significantly smaller than the non-reduced dimensions).

```
import torch

x = torch.randn((1024, 10, 1024), dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)  # IPython timing magic; reduces over the small middle dimension
```

Before: 813.0 ms

After: 708.25 ms
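
For context, here is a minimal standalone sketch of the pattern being changed (not the actual ATen code; the names are illustrative). Capturing the `std::function` by value copies it each time `loop_wrapper` builds the wrapper, while capturing by reference stores only a reference; the latter is safe as long as the returned wrapper is invoked before the referenced functor goes out of scope, which holds here because TensorIterator consumes the wrapper within the same call that created it.

```cpp
#include <cstdint>
#include <cstdio>
#include <functional>

// Illustrative stand-in for ATen's loop_t.
using loop_t = std::function<void(int64_t)>;

// Before: `loop` is copied into the returned lambda, so every call to
// wrap_by_value pays for a std::function copy.
auto wrap_by_value(const loop_t& loop) {
  return [loop](int64_t n) { loop(n); };
}

// After: only a reference is captured; no copy is made. The caller must
// guarantee `loop` outlives every invocation of the wrapper.
auto wrap_by_ref(const loop_t& loop) {
  return [&loop](int64_t n) { loop(n); };
}

int main() {
  loop_t body = [](int64_t n) { std::printf("inner size = %lld\n", (long long)n); };
  wrap_by_ref(body)(10);  // safe: `body` is alive for the whole call
}
```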
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15845

Differential Revision: D13603246

Pulled By: umanwizard

fbshipit-source-id: 020d224d76fcb8a0b55b75b0f2937e9508891beb

aten/src/ATen/native/TensorIterator.cpp

index c4cf591..c76dfc8 100644
@@ -332,7 +332,7 @@ int TensorIterator::num_reduce_dims() const {
   return count;
 }
 static loop2d_t loop_wrapper(const loop_t& loop) {
-  return [loop](int ntensor, char** base, const int64_t* strides, int64_t size0, int64_t size1) {
+  return [&loop](int ntensor, char** base, const int64_t* strides, int64_t size0, int64_t size1) {
     auto data = PtrVector(base, base + ntensor);
     const int64_t* outer_strides = &strides[ntensor];