Don't initialize a new `std::vector` in a loop. (#15850)
author: Brennan Vincent <btv@fb.com>
author date: Mon, 28 Jan 2019 16:47:35 +0000 (08:47 -0800)
committer: Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
committer date: Mon, 28 Jan 2019 16:50:27 +0000 (08:50 -0800)
commit: 14138f4605bf4ddd24808c9eb4890a11c0b789dd
tree: 7a9174c7889a3bd0fab9c680ce2dd8318affcd25
parent: d2cdffaf37a732ce14964e23c5fb139c636c2934

Summary:
Before this diff, we execute `std::vector<optional<acc_t>> buffer((unsigned)max_threads, optional<acc_t> {});` in every iteration of `foreach_reduced_elt`. This diff changes the code to execute that line only when it is needed, i.e., when we are actually about to parallelize.
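The change amounts to moving an allocation out of the common path. Below is a minimal sketch of that pattern, not the actual `Reduce.h` code: `reduce_each_output` and `grain_size` are hypothetical names, each "output" is just a row that gets summed, and the per-thread chunks are processed serially to keep the example self-contained. The scratch buffer of `std::optional` partial results is constructed only inside the branch that splits work across threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a per-output reduction loop (not the real ATen code).
// Each row of `rows` is summed into one output value.
template <typename acc_t>
std::vector<acc_t> reduce_each_output(const std::vector<std::vector<acc_t>>& rows,
                                      unsigned max_threads,
                                      std::size_t grain_size) {
  std::vector<acc_t> out;
  out.reserve(rows.size());
  for (const auto& row : rows) {
    if (max_threads <= 1 || row.size() < grain_size) {
      // Serial path: no scratch buffer is constructed at all.
      acc_t acc{};
      for (acc_t v : row) acc += v;
      out.push_back(acc);
      continue;
    }
    // Parallel path: only here do we pay for the per-thread buffer.
    std::vector<std::optional<acc_t>> buffer(max_threads, std::optional<acc_t>{});
    std::size_t chunk = (row.size() + max_threads - 1) / max_threads;
    // Chunks run serially in this sketch; a real implementation would
    // hand each chunk to a different thread.
    for (unsigned t = 0; t < max_threads; ++t) {
      std::size_t begin = t * chunk;
      std::size_t end = std::min(row.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i) {
        buffer[t] = buffer[t].value_or(acc_t{}) + row[i];
      }
    }
    // Combine only the partial results that were actually produced.
    acc_t acc{};
    for (const auto& partial : buffer) {
      if (partial) acc += *partial;
    }
    out.push_back(acc);
  }
  return out;
}
```

Both paths produce the same sums; the difference is that the serial path never constructs the scratch buffer.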

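This overhead is significant when doing many small reductions in single-threaded code, because every iteration heap-allocates and initializes a buffer of `max_threads` elements that is only needed when parallelizing.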
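To make the single-threaded cost concrete, the sketch below counts heap allocations with a hypothetical `CountingAllocator`. The two loops are simplified stand-ins for the old and new placements of the buffer, assumed for illustration rather than taken from `Reduce.h`: the "before" loop builds the buffer on every iteration, while the "after" loop builds it only when it is about to parallelize.

```cpp
#include <cstddef>
#include <new>
#include <optional>
#include <vector>

// Hypothetical global counter incremented by every allocation below.
std::size_t g_allocations = 0;

// Hypothetical minimal allocator that counts how often memory is requested.
template <typename T>
struct CountingAllocator {
  using value_type = T;
  CountingAllocator() = default;
  template <typename U>
  CountingAllocator(const CountingAllocator<U>&) {}
  T* allocate(std::size_t n) {
    ++g_allocations;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using Buffer = std::vector<std::optional<double>, CountingAllocator<std::optional<double>>>;

// "Before": the buffer is constructed on every iteration, even when unused.
std::size_t allocations_before(std::size_t n_outputs, unsigned max_threads) {
  g_allocations = 0;
  for (std::size_t i = 0; i < n_outputs; ++i) {
    Buffer buffer(max_threads, std::optional<double>{});
    if (max_threads > 1) {
      buffer[0] = 0.0;  // placeholder for per-thread partial results
    }
  }
  return g_allocations;
}

// "After": the buffer is constructed only when we are about to parallelize.
std::size_t allocations_after(std::size_t n_outputs, unsigned max_threads) {
  g_allocations = 0;
  for (std::size_t i = 0; i < n_outputs; ++i) {
    if (max_threads > 1) {
      Buffer buffer(max_threads, std::optional<double>{});
      buffer[0] = 0.0;  // placeholder for per-thread partial results
    }
  }
  return g_allocations;
}
```

With `max_threads == 1`, the "before" loop performs one heap allocation per output and the "after" loop performs none, which is the single-threaded situation the `x.std(1)` timing exercises.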

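```python
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
import torch

x = torch.randn((1024, 10, 1024), dtype=torch.float64)
torch.set_num_threads(1)
# Reducing over dim 1 computes 1024 * 1024 small reductions of 10 elements each.
%timeit x.std(1)
```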

Before (with #15845 applied): 708.25 ms
After: 508 ms (about 1.39x faster, or roughly 28% less time)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15850

Differential Revision: D13612960

Pulled By: umanwizard

fbshipit-source-id: f5e61abfe0027775c97ed81ac09c997fbee741df
Changed file: aten/src/ATen/native/cpu/Reduce.h