Add timeout kwarg to init_process_group (#14435)
Summary:
This applies to the gloo backend only. Timeout support for the NCCL and
MPI backends is tracked in issues #14371 and #14372 respectively.
When creating a new process group (either the global one or any subgroup
created through `new_group`), you can specify a `timeout` keyword
argument (of type `datetime.timedelta`). This timeout applies to all
collective operations executed against that process group: any
operation that takes longer than the timeout throws a runtime error.
Using a different, better catchable error type is tracked in #14433.
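A minimal usage sketch of the timeout argument follows. The
single-process setup (world_size=1), the file:// rendezvous path, the
`run_demo` helper name, and the specific timeout values are assumptions
made only so the example runs on one machine; they are not part of this
change.

```python
import os
import tempfile
from datetime import timedelta

import torch
import torch.distributed as dist


def run_demo():
    # Hypothetical rendezvous file in a fresh temp dir (illustration only).
    store_path = os.path.join(tempfile.mkdtemp(), "store")

    # Global process group: the timeout covers every collective run on it.
    dist.init_process_group(
        backend="gloo",
        init_method="file://" + store_path,
        rank=0,
        world_size=1,
        timeout=timedelta(seconds=30),
    )

    # A subgroup can carry its own timeout.
    subgroup = dist.new_group(ranks=[0], timeout=timedelta(seconds=5))

    tensor = torch.ones(2)
    try:
        dist.all_reduce(tensor)                  # uses the 30 s timeout
        dist.all_reduce(tensor, group=subgroup)  # uses the 5 s timeout
    except RuntimeError as exc:
        # A collective that exceeds its group's timeout surfaces here.
        print("collective failed or timed out:", exc)
        raise
    finally:
        dist.destroy_process_group()
    return tensor.tolist()


if __name__ == "__main__":
    print(run_demo())
```

With more than one rank, each process would pass its own rank and the
shared world size; a peer that never joins the collective is what would
trigger the timeout.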
This fixes #14376.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14435
Differential Revision: D13234317
Pulled By: pietern
fbshipit-source-id: 973993b67994dc64861c0977cbb6f051ec9d87f6