Make NCCL backend support barrier op (#14142)
Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573
As the title says, this PR makes NCCL backend support barrier op.
There are a couple of scenarios that need to be addressed:
(1) When an NCCL op has already run, we record which GPU device(s) the previous op used and queue the barrier's allreduce op on the same device(s).
(2) When no NCCL op has run yet, we make a best effort to assign each process its own single GPU.
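Scenario (2)'s best-effort device pick can be sketched as a simple round-robin of ranks over the visible GPUs. This is a minimal illustrative helper, not the actual C++ logic in ProcessGroupNCCL; the function name is hypothetical:

```python
def barrier_device_for_rank(rank: int, num_gpus: int) -> int:
    """Hypothetical sketch: spread processes across GPUs so each
    process lands on a distinct device whenever there are at least
    as many GPUs as processes."""
    if num_gpus <= 0:
        raise RuntimeError("NCCL barrier requires at least one CUDA device")
    # Round-robin assignment: rank 0 -> GPU 0, rank 1 -> GPU 1, ...
    return rank % num_gpus
```

For example, with 4 GPUs, ranks 0..3 would each get their own device (0, 1, 2, 3), while with 2 GPUs rank 2 would wrap around to device 0.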
As for the async work, wait() not only waits for the NCCL kernel to complete, but also blocks the calling thread until both the current stream and the NCCL stream have finished.
`test_distributed` covers this; I also manually tested both scenarios.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142
Differential Revision: D13113391
Pulled By: teng-li
fbshipit-source-id: 96c33d4d129e2977e6892d85d0fc449424c35499