From: Howard Huang
Date: Fri, 17 Sep 2021 14:55:01 +0000 (-0700)
Subject: Fix port allocation race condition for elastic test (#65149)
X-Git-Tag: accepted/tizen/8.0/unified/20231005.095509~125
X-Git-Url: http://review.tizen.org/git/?a=commitdiff_plain;h=a95fabfecbab9f43f87dd13ae88ffd29cafd7c45;p=platform%2Fupstream%2Fpytorch.git

Fix port allocation race condition for elastic test (#65149)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149

Fixes #64789

There is a race condition between the moment the free port is acquired and the moment it is used to create the store: in that window, another process may have taken the port. Since this test only checks that the timeout is triggered for tcpstore, we can bind to any port on tcpstore creation. This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity.

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30993166

Pulled By: H-Huang

fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
---

diff --git a/test/distributed/elastic/utils/distributed_test.py b/test/distributed/elastic/utils/distributed_test.py
index 5a31ee0..e3c1de3 100644
--- a/test/distributed/elastic/utils/distributed_test.py
+++ b/test/distributed/elastic/utils/distributed_test.py
@@ -84,22 +84,22 @@ class DistributedUtilTest(unittest.TestCase):
 
     def test_create_store_timeout_on_server(self):
         with self.assertRaises(TimeoutError):
-            port = get_free_port()
+            # use any available port (port 0) since timeout is expected
             create_c10d_store(
                 is_server=True,
                 server_addr=socket.gethostname(),
-                server_port=port,
+                server_port=0,
                 world_size=2,
                 timeout=1,
             )
 
     def test_create_store_timeout_on_worker(self):
         with self.assertRaises(TimeoutError):
-            port = get_free_port()
+            # use any available port (port 0) since timeout is expected
             create_c10d_store(
                 is_server=False,
                 server_addr=socket.gethostname(),
-                server_port=port,
+                server_port=0,
                 world_size=2,
                 timeout=1,
             )
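
Note: the race described in the summary can be illustrated with plain sockets. The sketch below is not the code from this patch or from PyTorch. The helper names (get_free_port_racy, bind_any_port, port_is_taken) are hypothetical, and get_free_port_racy only assumes the usual "probe, then release" shape of such helpers. It shows why a port number handed back by a probe can be gone by the time it is used, while binding to port 0 and keeping the socket open cannot lose the port.

```python
import socket

LOOPBACK = "127.0.0.1"


def get_free_port_racy() -> int:
    """Hypothetical probe-then-release helper (assumed shape, not PyTorch's code).

    The OS picks a free port, but the socket is closed before returning,
    so nothing reserves the port for the caller.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind((LOOPBACK, 0))  # port 0 asks the OS for any free port
        return probe.getsockname()[1]


def bind_any_port() -> socket.socket:
    """Bind to port 0 and keep the listening socket, so the port stays owned."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((LOOPBACK, 0))
    server.listen()
    return server


def port_is_taken(port: int) -> bool:
    """Return True if a fresh socket cannot bind to the given loopback port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as attempt:
        try:
            attempt.bind((LOOPBACK, port))
        except OSError:
            return True
        return False


if __name__ == "__main__":
    # Racy pattern: the port is released before it is used.
    port = get_free_port_racy()
    squatter = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    squatter.bind((LOOPBACK, port))  # stands in for another process winning the race
    squatter.listen()
    print("probed port lost to another socket:", port_is_taken(port))
    squatter.close()

    # Race-free pattern: bind to port 0 and hold on to the socket.
    server = bind_any_port()
    print("held port unavailable to others:", port_is_taken(server.getsockname()[1]))
    server.close()
```

In the patched tests the same idea appears as server_port=0: the port is chosen at store creation, so there is no gap between choosing it and using it.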