[Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
author: Yi Wang <wayi@fb.com>
Mon, 16 Aug 2021 17:05:47 +0000 (10:05 -0700)
committer: Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Mon, 16 Aug 2021 17:07:41 +0000 (10:07 -0700)
commit: 979180cd0118f9379680df9e39fc6acb80d7b80a
tree: 005fd05828634faefed530392f2694e8bc2157bf
parent: d5d5f42ea977d61fe21f5986e664efa6609881ff

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277

`PostLocalSGDState` requires a subgroup, and creating that subgroup requires the global process group to be initialized first. As a result, the hook state can only be constructed after the distributed environment has been initialized. This is incompatible with the Lightning DDP plugin setup, where the hook state must be provided before the distributed environment is initialized. Allowing `subgroup` to be `None` lifts this ordering restriction.
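
The ordering problem can be illustrated with a minimal, torch-free sketch. It is not the code from this commit: `HookState`, `resolve_subgroup`, and `fake_new_subgroup` are hypothetical names invented for illustration, and the sketch only assumes that a `None` subgroup can be replaced by a default one the first time the hook needs it.

```python
from typing import Callable, Optional


class HookState:
    """Hypothetical stand-in for a comm-hook state; not the PyTorch class."""

    def __init__(self, subgroup: Optional[object], start_iter: int) -> None:
        # The subgroup may stay None until the distributed environment exists.
        self.subgroup = subgroup
        self.start_iter = start_iter


def resolve_subgroup(state: HookState, make_default: Callable[[], object]) -> object:
    """Create the default subgroup on first use if none was supplied."""
    if state.subgroup is None:
        state.subgroup = make_default()
    return state.subgroup


# The state is built before any "distributed environment" exists.
state = HookState(subgroup=None, start_iter=100)

# Hypothetical factory standing in for whatever creates the real subgroup;
# it records each call so the caching behaviour is observable.
calls = []


def fake_new_subgroup() -> str:
    calls.append(1)
    return "intra-node-group"


# Later, after initialization, the hook resolves the subgroup on demand.
first = resolve_subgroup(state, fake_new_subgroup)
second = resolve_subgroup(state, fake_new_subgroup)
```

Deferring the lookup to first use means the factory runs once, and only after the caller has had a chance to set up the environment it depends on.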

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 135848575

Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD

Reviewed By: cbalioglu

Differential Revision: D30325041

fbshipit-source-id: 7b870166d096d306c3f2f7c69816a705cec0bebd
torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py
torch/testing/_internal/distributed/distributed_test.py