Summary:
The CUDA initialization for the participating processes can
take long enough for the barrier timeout to trigger on the
process that doesn't participate in the group.
See #14676.
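The failure mode can be sketched without GPUs. The snippet below is only an analogy, assuming two threads in place of processes and `threading.Barrier` in place of the test's `_barrier` helper. The names `barrier_outcome`, `init_delay`, and `barrier_timeout` are hypothetical and do not appear in the test suite.

```python
import threading
import time


def barrier_outcome(init_delay, barrier_timeout):
    """Return True if the non-participating worker passes the barrier."""
    barrier = threading.Barrier(2)
    result = {}

    def participant():
        # Stands in for slow CUDA initialization inside the subgroup.
        time.sleep(init_delay)
        try:
            barrier.wait(timeout=barrier_timeout)
        except threading.BrokenBarrierError:
            pass  # the other side already gave up

    def bystander():
        # Not in the subgroup, so it reaches the barrier right away.
        try:
            barrier.wait(timeout=barrier_timeout)
            result["ok"] = True
        except threading.BrokenBarrierError:
            result["ok"] = False

    threads = [
        threading.Thread(target=participant),
        threading.Thread(target=bystander),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["ok"]
```

With a timeout shorter than the simulated initialization, the bystander gives up before the participant arrives. A longer timeout lets both pass, which is the effect the change below aims for.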
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14689
Reviewed By: teng-li
Differential Revision: D13293695
Pulled By: pietern
fbshipit-source-id: 6268dc9acfdb22f70c027e5e4be082f7127c0db4
dist.barrier(group_id)
self.assertGreaterEqual(time.time(), expected_time[0])
- self._barrier()
+ # Use a higher timeout for the case where the test runs
+ # against a subgroup and uses a CUDA tensor for the expected time.
+ # The CUDA initialization for the participating processes can
+ # take long enough for the barrier timeout to trigger on the
+ # process that doesn't participate in the group.
+ self._barrier(timeout=20)
@skip_if_no_gpu
@unittest.skipIf(BACKEND == "mpi", "MPI doesn't support GPU barrier")