Fix multi-argument allreduce in ProcessGroupGloo (#14688)
Summary:
If multiple arguments are passed to c10d allreduce, each one is
treated as if it were contributed by an additional rank in the
process group. Every argument to allreduce is therefore not only an
input that must be reduced but also an output that must receive the
result. Before this commit, the arguments were not correctly treated
as outputs.
The upstream problem is tracked in facebookincubator/gloo#152. Once
it is fixed upstream, the copies that this commit adds can be removed.
This fixes #14676.
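
The multi-argument semantics described above can be modelled in a few
lines of plain Python. This is only an illustrative sketch, not the
ProcessGroupGloo implementation: it assumes a sum reduction, uses lists
in place of tensors, and the name allreduce_sum is hypothetical. The
final loop stands in for the idea that every argument must receive the
result.

```python
from typing import List


def allreduce_sum(per_rank_inputs: List[List[List[float]]]) -> None:
    """Toy model of a sum allreduce where each rank passes several buffers.

    per_rank_inputs[r][i] is the i-th buffer passed by rank r (hypothetical
    layout). Each buffer counts as one participant in the reduction, and
    each buffer is overwritten in place with the elementwise sum.
    """
    # Flatten: every argument behaves like an extra rank.
    buffers = [buf for rank in per_rank_inputs for buf in rank]
    if not buffers:
        return
    length = len(buffers[0])
    if any(len(buf) != length for buf in buffers):
        raise ValueError("all buffers must have the same length")

    # Every argument is an input to the reduction...
    total = [sum(buf[k] for buf in buffers) for k in range(length)]

    # ...and every argument is also an output.
    for buf in buffers:
        buf[:] = total


# Two ranks, two arguments each: behaves like a four-way reduction.
rank0 = [[1.0, 2.0], [3.0, 4.0]]
rank1 = [[5.0, 6.0], [7.0, 8.0]]
allreduce_sum([rank0, rank1])
```

After the call, all four buffers hold the same elementwise sum. If only
the first buffer on each rank were written, the remaining arguments
would still hold their stale inputs.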
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14688
Differential Revision: D13294405
Pulled By: pietern
fbshipit-source-id: 078a2a0a0ff12d051392461438f1496201ec3cb9