Fix flaky store timeout test (#19114)
author Shen Li <cs.shenli@gmail.com>
Thu, 11 Apr 2019 03:30:46 +0000 (20:30 -0700)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Thu, 11 Apr 2019 03:35:36 +0000 (20:35 -0700)
commit 6b0ca8eae5d663ad3db560b428abcef465f09dbb
tree 314a2269b3d7c6adf74ce2565dd6f4f8a257f46d
parent 821b5f138a987807032a2fd908fe10a5be5439d9

Summary:
~Sometimes, `init_process_group()`, `store.get()`, and `destroy_process_group()` can take more than a few seconds. Hence, removing thread join timeout.~

The flakiness was caused by an `Address already in use` error raised when starting the TCP backend. The solution is to catch the error and report it to the `retry_on_address_already_in_use_error` decorator, so that the test is retried instead of failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19114
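
The retry idea described in the summary can be illustrated with a minimal sketch. This is not the code from `test/test_c10d.py`: the decorator name `retry_on_address_in_use`, the `max_attempts` parameter, and the `start_backend` function are hypothetical, and matching on the message text is an assumption about how the error surfaces.

```python
import errno
import functools

# Message text quoted in the summary above.
ADDRESS_IN_USE = "Address already in use"


def retry_on_address_in_use(max_attempts=3):
    """Hypothetical decorator: re-run func when a port is already taken."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except (OSError, RuntimeError) as exc:
                    # Treat both the OS-level errno and the message text
                    # as signs of an address conflict (an assumption).
                    in_use = (
                        getattr(exc, "errno", None) == errno.EADDRINUSE
                        or ADDRESS_IN_USE in str(exc)
                    )
                    # Re-raise unrelated errors and the final failure.
                    if not in_use or attempt == max_attempts:
                        raise
        return wrapper
    return decorator


# Usage: a stand-in for backend startup that fails twice, then succeeds.
calls = {"count": 0}


@retry_on_address_in_use(max_attempts=3)
def start_backend():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError(ADDRESS_IN_USE)
    return "started"


result = start_backend()
```

In this sketch the error is raised directly inside the decorated function. If the failure happens in a worker thread, as the summary suggests, it would first have to be caught there and passed back to the decorated caller before a retry can occur.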

Reviewed By: ezyang

Differential Revision: D14872680

Pulled By: mrshenli

fbshipit-source-id: fc504d02853ca73f76288c0ade564ab20bc01f7e
test/test_c10d.py