C++ handler for gradient reduction (#18251)
author Pieter Noordhuis <pcnoordhuis@gmail.com>
Mon, 1 Apr 2019 21:27:03 +0000 (14:27 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 1 Apr 2019 21:30:02 +0000 (14:30 -0700)
commit bdfdf6c2b936bc5ae34c6fe52dfbb92847ae4205
tree cf6c21e9e9e043b437e418dcb2e6590d557d6ffb
parent a0285dd0f48175315e8523d69bc2143da6dddabc
C++ handler for gradient reduction (#18251)

Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.
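
Conceptually, the reduction proceeds roughly as in the Python sketch
below: register a hook per parameter, collect gradients into
predefined buckets, and launch an asynchronous allreduce as soon as
every gradient in a bucket is ready. The `ToyReducer` name, its
constructor arguments, and the explicit `finalize()` step are
hypothetical and for illustration only; the actual implementation is
the C++ `c10d::Reducer` added by this commit.

```python
import torch
import torch.distributed as dist


class ToyReducer:
    """Illustrative only: bucket gradients as autograd produces them and
    all-reduce each full bucket asynchronously, so communication can
    overlap with the rest of the backward pass."""

    def __init__(self, params, bucket_indices):
        self.params = list(params)
        self.bucket_indices = bucket_indices   # e.g. [[0, 1], [2, 3]]
        self.grads = {}                        # param index -> grad tensor
        self.pending = [set(b) for b in bucket_indices]
        self.bucket_of = {i: b for b, idxs in enumerate(bucket_indices) for i in idxs}
        self.work = []                         # (bucket_id, flat buffer, handle)
        for i, p in enumerate(self.params):
            # The hook fires when the gradient for p has been computed.
            p.register_hook(self._hook_for(i))

    def _hook_for(self, index):
        def hook(grad):
            self.grads[index] = grad
            bucket_id = self.bucket_of[index]
            self.pending[bucket_id].discard(index)
            if not self.pending[bucket_id]:
                self._launch(bucket_id)
            return grad
        return hook

    def _launch(self, bucket_id):
        # Flatten the bucket and kick off an asynchronous all-reduce.
        idxs = self.bucket_indices[bucket_id]
        flat = torch.cat([self.grads[i].reshape(-1) for i in idxs])
        handle = dist.all_reduce(flat, op=dist.ReduceOp.SUM, async_op=True)
        self.work.append((bucket_id, flat, handle))

    def finalize(self):
        # Wait for outstanding reductions and write averaged gradients
        # back into param.grad; call this after loss.backward().
        world_size = dist.get_world_size()
        for bucket_id, flat, handle in self.work:
            handle.wait()
            flat.div_(world_size)
            offset = 0
            for i in self.bucket_indices[bucket_id]:
                numel = self.grads[i].numel()
                self.params[i].grad = flat[offset:offset + numel].view_as(self.grads[i])
                offset += numel
        self.work.clear()
        self.pending = [set(b) for b in self.bucket_indices]
```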

This should enable the following:

* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters (a usage sketch follows this list).
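
For instance, with the Python wrapper this could look roughly like the
following, assuming a Gloo process group and a CPU-only model. The
`find_unused_parameters` flag shown here is an assumption and may not
match the wrapper's exact signature at this commit.

```python
import torch
import torch.distributed as dist
import torch.nn as nn


def main():
    # Assumes the usual env:// rendezvous variables (MASTER_ADDR,
    # MASTER_PORT, RANK, WORLD_SIZE) have been set by the launcher.
    dist.init_process_group(backend="gloo")

    # A plain CPU model; gradient reduction can now overlap with the
    # backward pass even without CUDA.
    model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
    ddp = nn.parallel.DistributedDataParallel(
        model,
        # Assumption: flag for models whose forward pass does not touch
        # every registered parameter.
        find_unused_parameters=True,
    )

    optimizer = torch.optim.SGD(ddp.parameters(), lr=0.1)
    inputs, targets = torch.randn(8, 10), torch.randn(8, 1)
    loss = nn.functional.mse_loss(ddp(inputs), targets)
    loss.backward()  # gradients are bucketed and all-reduced as they become ready
    optimizer.step()


if __name__ == "__main__":
    main()
```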

This does not include any logic for computing bucket assignment, which
can be done separately: either by observing autograd execution order
(this is what Apex does), by assigning buckets based on some maximum
byte size, or both.
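
For example, a purely size-based assignment could look like the
hypothetical helper below; it is not part of this commit, just an
illustration of the "maximum byte size" strategy mentioned above.

```python
def assign_buckets_by_size(params, bucket_cap_bytes=1 << 20):
    """Greedily pack parameter indices into buckets of at most roughly
    bucket_cap_bytes of gradient data. Hypothetical helper, not part of
    this commit."""
    buckets, current, current_bytes = [], [], 0
    for index, param in enumerate(params):
        size = param.numel() * param.element_size()
        if current and current_bytes + size > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets
```

In practice the parameters would typically be walked in roughly the
order their gradients become ready (often close to the reverse of the
forward order), which is why observing autograd execution order is
listed as an alternative.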

Also see #17757 and #13273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251

Reviewed By: mrshenli

Differential Revision: D14571899

Pulled By: pietern

fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
tools/build_variables.py
torch/CMakeLists.txt
torch/csrc/distributed/c10d/init.cpp
torch/csrc/distributed/c10d/reducer.cpp [new file with mode: 0644]
torch/csrc/distributed/c10d/reducer.h [new file with mode: 0644]
torch/distributed/distributed_c10d.py