multi-dim standard deviation for CUDA. (#14990)
author    Brennan Vincent <btv@fb.com>
          Thu, 20 Dec 2018 16:53:44 +0000 (08:53 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
          Thu, 20 Dec 2018 16:56:32 +0000 (08:56 -0800)
commit    7a764fe270ef06f364e6e504db2ce5959660bd8f
tree      a80c571fe50512c4dfd030404bb10f150f35cfb1
parent    5e624948b65ff32f927eed7e4fa1002b4113f8c1
multi-dim standard deviation for CUDA. (#14990)

Summary:
This is the CUDA version of #14535.
It refactors Reduce.cuh to allow more general classes of reductions to be performed: we no longer assume that the temporary data produced during the reduction is a single scalar, and instead allow an arbitrary accumulate type.
We also allow 64-bit indexing when necessary, since in general we will no longer be able to accumulate directly in the output. (In the cases where we can, we continue to split the tensors until they can be addressed with 32 bits, as before.)
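
To make the refactor concrete, here is a minimal Python sketch (not the actual CUDA kernel) of a Welford-style accumulation of the kind the std reduction relies on: each partial result is a (mean, m2, n) triple rather than a scalar of the output dtype, and pairs of partials are merged with the parallel-variance combine step. All names here are illustrative:

```python
# Minimal sketch of a Welford-style reduction state: the accumulator
# is a (mean, m2, n) triple, not a single output-dtype scalar.

def welford_combine(a, b):
    # Merge two partial states (Chan et al. parallel variance update).
    mean_a, m2_a, n_a = a
    mean_b, m2_b, n_b = b
    n = n_a + n_b
    if n == 0:
        return (0.0, 0.0, 0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (mean, m2, n)

def welford_std(xs, unbiased=True):
    # Reduce a sequence to its standard deviation; the projection to a
    # single output scalar happens only in this final step.
    state = (0.0, 0.0, 0)
    for x in xs:
        state = welford_combine(state, (float(x), 0.0, 1))
    mean, m2, n = state
    return (m2 / (n - 1 if unbiased else n)) ** 0.5
```

Because the partial state is wider than an output element, the kernel can no longer stash partials in the output tensor itself, which is what motivates the 64-bit indexing fallback above.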
As an initial use case, we implement `std` over multiple dimensions.
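
For instance, after this change something like the following works on CUDA tensors (shapes and dims are arbitrary, chosen for illustration):

```python
import torch

# Reduce std over several dimensions at once on a CUDA tensor.
x = torch.randn(4, 5, 6, device='cuda')
out = x.std(dim=(0, 2))  # reduces dims 0 and 2; result shape: (5,)

# Cross-check by flattening the reduced dims and using the 1-D path.
ref = x.permute(1, 0, 2).reshape(5, -1).std(dim=1)
print(torch.allclose(out, ref))  # expected: True
```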
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14990

Differential Revision: D13405097

Pulled By: umanwizard

fbshipit-source-id: a56c24dc2fd5326d417632089bd3f5c4f9f0d2cb
12 files changed:
aten/src/ATen/cuda/detail/OffsetCalculator.cuh
aten/src/ATen/detail/FunctionTraits.h
aten/src/ATen/native/ReduceOps.cpp
aten/src/ATen/native/SharedReduceOps.h [new file with mode: 0644]
aten/src/ATen/native/cpu/Reduce.h
aten/src/ATen/native/cpu/ReduceOpsKernel.cpp
aten/src/ATen/native/cuda/DeviceSqrt.cuh [new file with mode: 0644]
aten/src/ATen/native/cuda/Normalization.cuh
aten/src/ATen/native/cuda/Reduce.cuh
aten/src/ATen/native/cuda/ReduceOpsKernel.cu
test/test_torch.py
torch/_torch_docs.py