[TensorIterator fixing mean to output correct result for half precision](#12115) (#14878)
author Jie <jiej@nvidia.com>
Tue, 18 Dec 2018 04:08:15 +0000 (20:08 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Tue, 18 Dec 2018 04:13:30 +0000 (20:13 -0800)
commit bd958cde685c2de67ecf691934470ef3c289e00d
tree 228d6d40e6ee57d5b0b1342848286d49faa5e10b
parent 71ee882157ea06b0e8facb510c44b5a3a55e5d91
[TensorIterator fixing mean to output correct result for half precision](#12115) (#14878)

Summary:

mean is calculated in two steps: sum() / numel(). For half precision, the data gets
cast back to half after sum(), so the intermediate sum can overflow (the largest
finite half value is 65504) before the division ever runs.
We fused the division into the reduction kernel by adding a pre_op/post_op, so the
divide happens while the accumulator is still in float; a sketch of the two orderings
follows.
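
A minimal NumPy sketch of the before/after ordering (illustration only, not the
actual kernel code; `mean_unfused` and `mean_fused` are hypothetical helpers):

```python
import numpy as np

def mean_unfused(x_half):
    # Old behavior: reduce in fp32, cast the sum back to half, then divide.
    s = x_half.astype(np.float32).sum()        # fp32 accumulator: 65536.0
    s_half = np.float16(s)                     # cast overflows to inf (max finite half is 65504)
    return np.float16(s_half / np.float32(x_half.size))

def mean_fused(x_half):
    # Fixed behavior: apply the divide-by-numel post_op while the accumulator
    # is still fp32, and cast to half only once, at the very end.
    s = x_half.astype(np.float32).sum()        # fp32 accumulator: 65536.0
    return np.float16(s / np.float32(x_half.size))

x = np.ones(65536, dtype=np.float16)
print(mean_unfused(x))  # inf (NumPy also warns about the overflow on the cast)
print(mean_fused(x))    # 1.0
```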

This allows torch.ones(65536).cuda().half().mean() to return the correct result.
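
For reference, the repro from the summary on a CUDA build that includes this
change (the exact tensor repr varies across PyTorch versions):

```python
import torch

x = torch.ones(65536).cuda().half()
print(x.mean())  # tensor(1., device='cuda:0', dtype=torch.float16)
                 # before this fix: inf, because sum() overflowed when cast back to half
```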
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14878

Differential Revision: D13491159

Pulled By: soumith

fbshipit-source-id: e83802e1628b6d2615c45e18d7acf991d143a09e
aten/src/ATen/native/ReduceOps.cpp
aten/src/ATen/native/ReduceOps.h
aten/src/ATen/native/TensorIterator.cpp
aten/src/ATen/native/cpu/Reduce.h
aten/src/ATen/native/cpu/ReduceOpsKernel.cpp
aten/src/ATen/native/cuda/Reduce.cuh
aten/src/ATen/native/cuda/ReduceOpsKernel.cu
test/test_cuda.py
tools/autograd/derivatives.yaml