Make DistributedDataParallel use new reducer (#18953)
authorPieter Noordhuis <pietern@fb.com>
Mon, 15 Apr 2019 19:24:43 +0000 (12:24 -0700)
committerFacebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 15 Apr 2019 19:44:38 +0000 (12:44 -0700)
commita0263ec04765fb9b20decc178241d2668ea58cee
tree8d4f54a03fd487547de9ee8baea4106cde1e91ce
parent6ed57e052dbf603cc8c5cdb31f5404b67ab7c768
Make DistributedDataParallel use new reducer (#18953)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18953

This removes the Python-side bucketing code from DistributedDataParallel
and replaces it with calls to the new C++-based bucketing and reducing
code. To confirm this works correctly, we ran a test with both the
previous implementation and the new implementation and confirmed they
are numerically equivalent.

Performance improves by a couple of percent or more, including in
single-machine, multi-GPU runs.

Closes #13273.
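For context, the core idea behind gradient bucketing is to group parameters into fixed-capacity buckets so each bucket can be reduced in a single collective call instead of one call per tensor. A minimal, framework-free sketch of that partitioning follows; the `compute_buckets` helper, its signature, and the byte sizes are illustrative assumptions, not the actual reducer API in `reducer.cpp`:

```python
# Illustrative sketch of gradient bucketing: group tensors into
# fixed-capacity buckets so each bucket can be reduced in one call.
# This mirrors the idea behind the C++ reducer, not its real interface.

def compute_buckets(sizes, bucket_cap):
    """Partition tensor sizes (in bytes) into buckets holding at most
    bucket_cap bytes each; an oversized tensor gets its own bucket."""
    buckets = []
    current, current_bytes = [], 0
    for idx, nbytes in enumerate(sizes):
        # Start a new bucket when the next tensor would overflow the cap.
        if current and current_bytes + nbytes > bucket_cap:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets

# Example: tensors of 4, 10, 10, 3, and 20 bytes with a 16-byte cap.
print(compute_buckets([4, 10, 10, 3, 20], 16))  # [[0, 1], [2, 3], [4]]
```

Doing this partitioning (and the subsequent reduction) in C++ avoids per-tensor Python overhead in the hot path of the backward pass.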

Reviewed By: mrshenli

Differential Revision: D14580911

fbshipit-source-id: 44e76f8b0b7e58dd6c91644e3df4660ca2ee4ae2
test/test_c10d.py
torch/csrc/distributed/c10d/init.cpp
torch/csrc/distributed/c10d/reducer.cpp
torch/csrc/distributed/c10d/reducer.h
torch/nn/parallel/distributed.py