Remind users to set map_location properly when using DDP

author Shen Li <shenli@fb.com>

Tue, 9 Apr 2019 23:11:05 +0000 (16:11 -0700)

committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>

Tue, 9 Apr 2019 23:29:38 +0000 (16:29 -0700)
diff --git a/torch/nn/parallel/distributed.py b/torch/nn/parallel/distributed.py

index 35a9695..e4878f9 100644
--- a/torch/nn/parallel/distributed.py
+++ b/torch/nn/parallel/distributed.py
@@ -83,6 +83,12 @@ class DistributedDataParallel(Module):
          Also note that ``nccl`` backend is currently the fastest and highly
          recommended backend for fp16/fp32 mixed-precision training.
  
+    .. note:: If you use ``torch.save`` on one process to checkpoint the module,
+        and ``torch.load`` on some other processes to recover it, make sure that
+        ``map_location`` is configured properly for every process. Without
+        ``map_location``, ``torch.load`` would recover the module to devices
+        where the module was saved from.
+
      .. warning::
          This module works only with the ``gloo`` and ``nccl`` backends.
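
For illustration only (this is not part of the commit), the note added above can be sketched as follows. The helper names `ddp_map_location` and `load_checkpoint` are hypothetical, and the sketch assumes one GPU per process with the CUDA device index equal to the process rank. It relies on the dict form of `map_location` that `torch.load` accepts, which remaps the device a tensor was saved from to another device.

```python
def ddp_map_location(saved_rank: int, local_rank: int) -> dict:
    """Build a map_location dict for torch.load.

    Assumes (hypothetically) one GPU per process, with the CUDA device
    index equal to the process rank.
    """
    # Remap the device the checkpoint was saved from to this process's device.
    return {f"cuda:{saved_rank}": f"cuda:{local_rank}"}


def load_checkpoint(path: str, saved_rank: int, local_rank: int):
    """Load a checkpoint written by `saved_rank` onto `local_rank`'s GPU."""
    # Imported here so ddp_map_location stays usable without torch installed.
    import torch

    # Without map_location, the tensors would be restored to cuda:<saved_rank>
    # in every process instead of to each process's own device.
    return torch.load(path, map_location=ddp_map_location(saved_rank, local_rank))
```

A process with rank 3 that loads a checkpoint saved by rank 0 would call `load_checkpoint(path, 0, 3)`. Passing a single device string such as `"cuda:3"` as `map_location` is a possible alternative when every tensor should land on the same device.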