C++ handler for gradient reduction (#18251)
author Pieter Noordhuis <pcnoordhuis@gmail.com>
Mon, 1 Apr 2019 21:27:03 +0000 (14:27 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 1 Apr 2019 21:30:02 +0000 (14:30 -0700)
commit bdfdf6c2b936bc5ae34c6fe52dfbb92847ae4205
tree cf6c21e9e9e043b437e418dcb2e6590d557d6ffb
parent a0285dd0f48175315e8523d69bc2143da6dddabc
C++ handler for gradient reduction (#18251)

Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.
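
Conceptually, the reduction proceeds roughly as in the Python sketch
below: register a hook per parameter, collect gradients into
predefined buckets, and launch an asynchronous allreduce as soon as
every gradient in a bucket is ready. The `ToyReducer` name, its
constructor arguments, and the explicit `finalize()` step are
hypothetical and for illustration only; the actual implementation is
the C++ `c10d::Reducer` added by this commit.

```python
import torch
import torch.distributed as dist


class ToyReducer:
    """Illustrative only: bucket gradients as autograd produces them and
    all-reduce each full bucket asynchronously, so communication can
    overlap with the rest of the backward pass."""

    def __init__(self, params, bucket_indices):
        self.params = list(params)
        self.bucket_indices = bucket_indices   # e.g. [[0, 1], [2, 3]]
        self.grads = {}                        # param index -> grad tensor
        self.pending = [set(b) for b in bucket_indices]
        self.bucket_of = {i: b for b, idxs in enumerate(bucket_indices) for i in idxs}
        self.work = []                         # (bucket_id, flat buffer, handle)
        for i, p in enumerate(self.params):
            # The hook fires when the gradient for p has been computed.
            p.register_hook(self._hook_for(i))

    def _hook_for(self, index):
        def hook(grad):
            self.grads[index] = grad
            bucket_id = self.bucket_of[index]
            self.pending[bucket_id].discard(index)
            if not self.pending[bucket_id]:
                self._launch(bucket_id)
            return grad
        return hook

    def _launch(self, bucket_id):
        # Flatten the bucket and kick off an asynchronous all-reduce.
        idxs = self.bucket_indices[bucket_id]
        flat = torch.cat([self.grads[i].reshape(-1) for i in idxs])
        handle = dist.all_reduce(flat, op=dist.ReduceOp.SUM, async_op=True)
        self.work.append((bucket_id, flat, handle))

    def finalize(self):
        # Wait for outstanding reductions and write averaged gradients
        # back into param.grad; call this after loss.backward().
        world_size = dist.get_world_size()
        for bucket_id, flat, handle in self.work:
            handle.wait()
            flat.div_(world_size)
            offset = 0
            for i in self.bucket_indices[bucket_id]:
                numel = self.grads[i].numel()
                self.params[i].grad = flat[offset:offset + numel].view_as(self.grads[i])
                offset += numel
        self.work.clear()
        self.pending = [set(b) for b in self.bucket_indices]
```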

This should enable the following:

* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters (a usage sketch follows this list).
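
For instance, with the Python wrapper this could look roughly like the
following, assuming a Gloo process group and a CPU-only model. The
`find_unused_parameters` flag shown here is an assumption and may not
match the wrapper's exact signature at this commit.

```python
import torch
import torch.distributed as dist
import torch.nn as nn


def main():
    # Assumes the usual env:// rendezvous variables (MASTER_ADDR,
    # MASTER_PORT, RANK, WORLD_SIZE) have been set by the launcher.
    dist.init_process_group(backend="gloo")

    # A plain CPU model; gradient reduction can now overlap with the
    # backward pass even without CUDA.
    model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
    ddp = nn.parallel.DistributedDataParallel(
        model,
        # Assumption: flag for models whose forward pass does not touch
        # every registered parameter.
        find_unused_parameters=True,
    )

    optimizer = torch.optim.SGD(ddp.parameters(), lr=0.1)
    inputs, targets = torch.randn(8, 10), torch.randn(8, 1)
    loss = nn.functional.mse_loss(ddp(inputs), targets)
    loss.backward()  # gradients are bucketed and all-reduced as they become ready
    optimizer.step()


if __name__ == "__main__":
    main()
```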

This does not include any logic for computing bucket assignment, which
can be done separately: either by observing autograd execution order
(this is what Apex does), by assigning buckets based on some maximum
byte size, or both.
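
For example, a purely size-based assignment could look like the
hypothetical helper below; it is not part of this commit, just an
illustration of the "maximum byte size" strategy mentioned above.

```python
def assign_buckets_by_size(params, bucket_cap_bytes=1 << 20):
    """Greedily pack parameter indices into buckets of at most roughly
    bucket_cap_bytes of gradient data. Hypothetical helper, not part of
    this commit."""
    buckets, current, current_bytes = [], [], 0
    for index, param in enumerate(params):
        size = param.numel() * param.element_size()
        if current and current_bytes + size > bucket_cap_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets
```

In practice the parameters would typically be walked in roughly the
order their gradients become ready (often close to the reverse of the
forward order), which is why observing autograd execution order is
listed as an alternative.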

Also see #17757 and #13273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251

Reviewed By: mrshenli

Differential Revision: D14571899

Pulled By: pietern

fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c
tools/build_variables.py
torch/CMakeLists.txt
torch/csrc/distributed/c10d/init.cpp
torch/csrc/distributed/c10d/reducer.cpp [new file with mode: 0644]
torch/csrc/distributed/c10d/reducer.h [new file with mode: 0644]
torch/distributed/distributed_c10d.py