Highlight NCCL all_reduce and all_gather requirements (#18741)
author Shen Li <shenli@fb.com>
Wed, 3 Apr 2019 16:06:09 +0000 (09:06 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Wed, 3 Apr 2019 16:50:29 +0000 (09:50 -0700)
Summary:
See #18689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18741

Differential Revision: D14726874

Pulled By: mrshenli

fbshipit-source-id: a92404c653e3c62fc23fa3ccacfb3b2959b2e307

torch/distributed/distributed_c10d.py

index 1fea9d5..ffa989b 100644
@@ -300,7 +300,10 @@ def init_process_group(backend,
             build-time configurations, valid values include ``mpi``, ``gloo``,
             and ``nccl``. This field should be given as a lowercase string
             (e.g., ``"gloo"``), which can also be accessed via
-            :class:`Backend` attributes (e.g., ``Backend.GLOO``).
+            :class:`Backend` attributes (e.g., ``Backend.GLOO``). If using
+            multiple processes per machine with ``nccl`` backend, each process
+            must have exclusive access to every GPU it uses, as sharing GPUs
+            between processes can result in deadlocks.
         init_method (str, optional): URL specifying how to initialize the
                                      process group.
         world_size (int, optional): Number of processes participating in
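The docstring's exclusive-GPU requirement for the ``nccl`` backend can be illustrated with a minimal sketch. It assumes one process per local rank and GPUs numbered ``0`` to ``num_gpus - 1``. ``assign_exclusive_gpus`` and ``check_no_sharing`` are hypothetical helpers and are not part of ``torch.distributed``. They only show one way to partition devices so that no GPU on a machine is used by more than one process.

```python
from typing import Dict, List


def assign_exclusive_gpus(local_world_size: int, num_gpus: int) -> Dict[int, List[int]]:
    """Hypothetical helper: split GPUs 0..num_gpus-1 into disjoint groups,
    one group per local rank, so no GPU is shared between processes."""
    if local_world_size <= 0:
        raise ValueError("local_world_size must be positive")
    if num_gpus < local_world_size:
        # Fewer GPUs than processes means at least one GPU would be shared,
        # which the nccl docstring warns can deadlock.
        raise ValueError(
            f"{local_world_size} processes but only {num_gpus} GPUs: "
            "at least one GPU would be shared between processes"
        )
    per_rank = num_gpus // local_world_size
    assignment: Dict[int, List[int]] = {}
    for local_rank in range(local_world_size):
        start = local_rank * per_rank
        # Contiguous, non-overlapping slice; leftover GPUs
        # (num_gpus % local_world_size) are simply left unused.
        assignment[local_rank] = list(range(start, start + per_rank))
    return assignment


def check_no_sharing(assignment: Dict[int, List[int]]) -> bool:
    """Return True if no GPU index appears in more than one rank's group."""
    seen = set()
    for gpus in assignment.values():
        for gpu in gpus:
            if gpu in seen:
                return False
            seen.add(gpu)
    return True


if __name__ == "__main__":
    # Two processes on a 4-GPU machine: each gets two GPUs to itself.
    print(assign_exclusive_gpus(2, 4))
```

On a real node, each process would typically pin itself to one of its assigned devices (for example with ``torch.cuda.set_device``) before calling ``init_process_group(backend="nccl", ...)``. Raising an error rather than silently doubling up ranks on a GPU is a deliberate choice here, since shared GPUs are exactly the configuration the docstring says can deadlock.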