Increase test barrier timeout for barrier test (#14689)
author Pieter Noordhuis <pietern@fb.com>
Mon, 3 Dec 2018 01:41:36 +0000 (17:41 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Mon, 3 Dec 2018 01:46:17 +0000 (17:46 -0800)
Summary:
The CUDA initialization for the participating processes can
take long enough for the barrier timeout to trigger on the
process that doesn't participate in the group.
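
The race described above can be reproduced in miniature. The sketch below is
an illustration only, not the code in test/test_distributed.py: it uses two
Python threads and the standard library's `threading.Barrier` as a stand-in
for the test's barrier. The names `run_barrier`, `init_delay`, and `arrive`
are hypothetical, and the `time.sleep` call merely mimics slow CUDA
initialization on the participating process.

```python
import threading
import time


def run_barrier(init_delay, timeout):
    """Return True if both workers pass the barrier, False if it times out.

    init_delay: seconds the participating worker spends "initializing"
                (a stand-in for CUDA initialization; an assumption).
    timeout:    seconds each worker is willing to wait at the barrier.
    """
    barrier = threading.Barrier(2)
    results = []

    def arrive(delay):
        # The participating worker is delayed; the other arrives at once.
        time.sleep(delay)
        try:
            barrier.wait(timeout=timeout)
            results.append(True)
        except threading.BrokenBarrierError:
            # Raised on the worker whose wait timed out, and on any
            # worker that reaches the barrier after it was broken.
            results.append(False)

    threads = [
        threading.Thread(target=arrive, args=(init_delay,)),  # participant
        threading.Thread(target=arrive, args=(0.0,)),         # non-participant
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)


if __name__ == "__main__":
    # Timeout shorter than the initialization delay: the barrier breaks.
    print(run_barrier(init_delay=0.5, timeout=0.05))
    # Timeout comfortably longer than the delay: both workers pass.
    print(run_barrier(init_delay=0.5, timeout=5.0))
```

With a timeout shorter than the initialization delay, the worker that arrives
early gives up before the slow one reaches the barrier. Raising the timeout,
as this change does for the test, lets the early worker keep waiting.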

See #14676.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14689

Reviewed By: teng-li

Differential Revision: D13293695

Pulled By: pietern

fbshipit-source-id: 6268dc9acfdb22f70c027e5e4be082f7127c0db4

test/test_distributed.py

index 072c2bc..9a1c96d 100644
@@ -1016,7 +1016,12 @@ class _DistTestBase(object):
                 dist.barrier(group_id)
                 self.assertGreaterEqual(time.time(), expected_time[0])
 
-        self._barrier()
+        # Use higher timeout for the instance where the test runs
+        # against a subgroup and uses a CUDA tensor for expected time.
+        # The CUDA initialization for the participating processes can
+        # take long enough for the barrier timeout to trigger on the
+        # process that doesn't participate in the group.
+        self._barrier(timeout=20)
 
     @skip_if_no_gpu
     @unittest.skipIf(BACKEND == "mpi", "MPI doesn't supports GPU barrier")