review.tizen.org Git - platform/upstream/pytorch.git/commit

int32 indexing for Tensor Iterator Reduction (#17428)

Summary:
1. Enabling int32 indexing for cases where TI cannot accumulate in output due to
incompatible data types (e.g. Welford).
2. Updating Welford kernel to use int32 instead of int64 indexing on GPU.
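
As a rough illustration of point 1, the sketch below splits a large reduction into sub-ranges that can each be indexed with int32_t, stores one partial state per sub-range in an extra buffer, and combines the partials at the end. It assumes a simple mean reduction, where the running state (sum and count) cannot live in the float output. The names PartialMean, split_ranges, reduce_chunk and chunked_mean are hypothetical and are not taken from the PyTorch sources.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical partial state: it differs from the output type (float),
// so it cannot be accumulated directly in the output.
struct PartialMean {
  double sum;
  int64_t count;
};

// Split [0, numel) into consecutive [begin, end) ranges, each at most
// max_chunk elements long. Assumes max_chunk > 0.
std::vector<std::pair<int64_t, int64_t>> split_ranges(int64_t numel,
                                                      int64_t max_chunk) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  for (int64_t begin = 0; begin < numel;) {
    int64_t len = std::min(max_chunk, numel - begin);
    ranges.emplace_back(begin, begin + len);
    begin += len;
  }
  return ranges;
}

// Reduce one sub-range using a 32-bit loop index.
PartialMean reduce_chunk(const float* data, int32_t n) {
  PartialMean p{0.0, 0};
  for (int32_t i = 0; i < n; ++i) {
    p.sum += data[i];
  }
  p.count = n;
  return p;
}

float chunked_mean(const std::vector<float>& v,
                   int64_t max_chunk = std::numeric_limits<int32_t>::max()) {
  // Extra buffer holding one partial state per sub-range.
  std::vector<PartialMean> buffer;
  const auto ranges = split_ranges(static_cast<int64_t>(v.size()), max_chunk);
  for (const auto& r : ranges) {
    buffer.push_back(reduce_chunk(v.data() + r.first,
                                  static_cast<int32_t>(r.second - r.first)));
  }
  // Combine the partial states, then project to the output type.
  PartialMean total{0.0, 0};
  for (const PartialMean& p : buffer) {
    total.sum += p.sum;
    total.count += p.count;
  }
  if (total.count == 0) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  return static_cast<float>(total.sum / static_cast<double>(total.count));
}
```

Calling chunked_mean(v, 3) on a ten-element vector forces four sub-ranges, which makes it easy to exercise the combine step without allocating more than 2^31 elements.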

This change improves performance for torch.var / torch.std.

Implementation:
1. Allocated an extra buffer to handle accumulation between sub Tensor Iterators.
2. Removed int64 indexing in gpu_reduce_kernel.
3. WelfordOps now takes the index type and combination type as template parameters.
While the GPU implementation uses int32_t and float, the CPU implementation uses int64_t and double.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17428

Differential Revision: D14264608

Pulled By: umanwizard

fbshipit-source-id: 3eb54451de925b469dbc1127e5ea7443c4431036

author	Jie <jiej@nvidia.com>
	Mon, 4 Mar 2019 21:02:40 +0000 (13:02 -0800)
committer	Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
	Mon, 4 Mar 2019 21:11:47 +0000 (13:11 -0800)
commit	a87eeec9bff6d6f283e752cd2d41e6521b55555a
tree	8df899853d609d7ebdf9c0f072ec86f459c09c10
parent	32576082765011ba65a306acd4d28d3d8c4f0142

aten/src/ATen/native/SharedReduceOps.h
aten/src/ATen/native/cpu/ReduceOpsKernel.cpp
aten/src/ATen/native/cuda/Reduce.cuh
aten/src/ATen/native/cuda/ReduceOpsKernel.cu