author Brennan Vincent <btv@fb.com>
Wed, 9 Jan 2019 03:51:41 +0000 (19:51 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Wed, 9 Jan 2019 03:59:39 +0000 (19:59 -0800)
commit 8a07cbe5e1c2c7b56e9b46ebed0d192dc9551612
tree 079764816ab4737c09ee256e24e6dad9081bc082
parent 2b226122899c4fdb66432aa740044fcca9d25b2d
In loop_wrapper, do not copy the passed-in functor (capture it by reference instead). (#15845)

Summary:
The overhead of the copy makes an appreciable difference when doing many small reductions (i.e., when the reduced dimension is significantly smaller than the non-reduced dimensions).
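
To illustrate the difference, here is a minimal, self-contained sketch, not the actual ATen code. The names `loop_t`, `loop2d_t`, `CountingLoop`, `wrap_by_value`, `wrap_by_reference`, and `copies_when_wrapping` are hypothetical, and the signatures are simplified. It assumes the wrapper is a lambda stored in a `std::function`, and counts how many times the wrapped functor is copied under each capture mode.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, simplified stand-ins for the 1-D and 2-D loop callables.
using loop_t = std::function<void(char* data, int64_t n)>;
using loop2d_t = std::function<void(char* data, int64_t n0, int64_t n1)>;

// Functor that records how often it is copied and how often it is called.
struct CountingLoop {
  int* copies;
  int* calls;
  CountingLoop(int* copies_, int* calls_) : copies(copies_), calls(calls_) {}
  CountingLoop(const CountingLoop& other)
      : copies(other.copies), calls(other.calls) {
    ++*copies;
  }
  void operator()(char*, int64_t) const { ++*calls; }
};

// Capture by value: building the wrapper copies `loop` and its target.
loop2d_t wrap_by_value(const loop_t& loop) {
  return [loop](char* data, int64_t n0, int64_t n1) {
    for (int64_t i = 0; i < n1; ++i) {
      loop(data, n0);
    }
  };
}

// Capture by reference: no copy is made, but `loop` must outlive the wrapper.
loop2d_t wrap_by_reference(const loop_t& loop) {
  return [&loop](char* data, int64_t n0, int64_t n1) {
    for (int64_t i = 0; i < n1; ++i) {
      loop(data, n0);
    }
  };
}

// Returns the number of functor copies caused by wrapping and calling once.
int copies_when_wrapping(bool by_reference) {
  int copies = 0;
  int calls = 0;
  loop_t loop = CountingLoop(&copies, &calls);  // named, so it outlives the wrapper
  const int before = copies;  // ignore copies made while constructing `loop`
  loop2d_t wrapped =
      by_reference ? wrap_by_reference(loop) : wrap_by_value(loop);
  char buf[1] = {0};
  wrapped(buf, 1, 4);  // the inner loop runs once per outer iteration
  assert(calls == 4);
  return copies - before;
}
```

In this sketch, `copies_when_wrapping(true)` reports zero copies, while `copies_when_wrapping(false)` reports at least one; the exact number for the by-value case depends on the standard library implementation. The trade-off is lifetime: a wrapper that captures by reference is only valid while the original callable is still alive.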

```
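import torch
x = torch.randn((1024, 10, 1024), dtype=torch.float64)
torch.set_num_threads(1)  # run single-threaded
# %timeit is an IPython magic; dim 1 (size 10) is the small reduced dimension
%timeit x.std(1)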
```

Before: 813.0 ms

After: 708.25 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15845

Differential Revision: D13603246

Pulled By: umanwizard

fbshipit-source-id: 020d224d76fcb8a0b55b75b0f2937e9508891beb
aten/src/ATen/native/TensorIterator.cpp