Summary:
It is possible that some sort of contention causes process scheduling
delays which in turn cause the timeout to *not* be hit.
Increased sleep here will decrease the probability of this happening.
Fixes #14555.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14814
Differential Revision:
D13351924
Pulled By: pietern
fbshipit-source-id:
1222cf0855408dfcb79f30f94694c790ee998cf9
# Sleep on one of the processes to trigger barrier timeout
if self.rank == 0:
- time.sleep(0.6)
+ time.sleep(1.0)
# The barrier will now time out
with self.assertRaisesRegex(RuntimeError, " (Timed out|closed) "):