Fix logic errors when accumulating reductions in output (CUDA) (#16023)
author Brennan Vincent <btv@fb.com>
Wed, 16 Jan 2019 03:55:13 +0000 (19:55 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Wed, 16 Jan 2019 03:57:57 +0000 (19:57 -0800)
commit fb68d813be3ccbab3cbf74008a11071b9a646c75
tree c45efb56589285e46f2756d40e0bee4c38b32ee7
parent 5353847b191757995b24c0f6412c4290faff76fc
Fix logic errors when accumulating reductions in output (CUDA) (#16023)

Summary:
The correct logic is as follows:

* If there is an earlier split, we need to combine with its result.
* If there is *not* a later split, we need to project before saving into the output (see the sketch below).

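As a rough illustration of that decision, here is a minimal, self-contained C++ sketch (not ATen's actual code; `MeanOp`, `reduce_chunk`, and the flag names are all made up for exposition) of a mean reduction split into chunks, where each chunk combines with an earlier partial result and only the final chunk projects:

```
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for a reduction op: accumulate via combine(),
// finalize via project() (for mean, divide the sum by the element count).
struct MeanOp {
    double combine(double a, double b) const { return a + b; }
    double project(double sum, size_t n) const { return sum / n; }
};

// Reduce data[begin, end) into *out, following the rules above:
//  * if an earlier split already wrote a partial result, combine with it;
//  * if no later split will add more, project before storing; otherwise
//    store the raw accumulator so the next split can keep combining.
void reduce_chunk(const float* data, size_t begin, size_t end, size_t total,
                  bool has_earlier_split, bool has_later_split, double* out) {
    MeanOp op;
    double value = 0.0;
    for (size_t i = begin; i < end; ++i) {
        value = op.combine(value, data[i]);
    }
    if (has_earlier_split) {
        value = op.combine(*out, value);
    }
    *out = has_later_split ? value : op.project(value, total);
}

int main() {
    float data[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    double out = 0.0;
    // Two splits over the same output: only the second one projects.
    reduce_chunk(data, 0, 4, 8, /*has_earlier_split=*/false, /*has_later_split=*/true,  &out);
    reduce_chunk(data, 4, 8, 8, /*has_earlier_split=*/true,  /*has_later_split=*/false, &out);
    std::printf("mean = %f\n", out);  // prints 1.000000
    return 0;
}
```

The bug this PR fixes is what happens when those two conditions are mishandled: projecting an intermediate split (or failing to combine with an earlier one) corrupts the accumulated value, which is why large reductions such as the `mean()` below previously returned wrong results.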
This should partially fix #15837. For example:
```
In [7]: a=torch.ones([1838860800], dtype=torch.float, device="cuda:1")

In [8]: a.mean()
Out[8]: tensor(1., device='cuda:1')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16023

Differential Revision: D13678449

Pulled By: umanwizard

fbshipit-source-id: ab5078484c88e96bb30121b5cf24a0e8b0a8c2f8
aten/src/ATen/native/TensorIterator.cpp
aten/src/ATen/native/TensorIterator.h
aten/src/ATen/native/cuda/Reduce.cuh
test/test_cuda.py