[CUDA graphs] Error if attempting to capture uncapturable nccl (#64440)
author Michael Carilli <mcarilli@gmail.com>
Fri, 3 Sep 2021 20:21:23 +0000 (13:21 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Fri, 3 Sep 2021 20:23:07 +0000 (13:23 -0700)
commit e4ff14ad5955f7c4d052aa44069c77654e8b5f2e
tree 930326d5f21cd8a6ae91465092f461fe5cdc9043
parent 0e3b45eaefbef29c36f0198195022a1e4088b3e0

Summary:
NCCL versions below 2.9.6 are not capturable in CUDA graphs. Attempting to capture them can cause nasty behavior (for example, I've seen capture succeed but replay silently hang). PyTorch should preempt this with a friendlier error.
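
A minimal sketch of the kind of guard described above, not the code from this PR. The names `packVersion`, `kMinCapturableNccl`, and `errorIfCapturingNonCapturableNccl`, the packed-integer encoding, and the error text are all hypothetical. The sketch takes the NCCL version and the stream's capture state as plain arguments, so it builds without CUDA or NCCL; a real check would query both at runtime.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: pack (major, minor, patch) into one integer so two
// versions can be ordered with a single `<`. Assumes minor and patch each
// fit in 16 bits.
constexpr std::uint64_t packVersion(std::uint64_t major,
                                    std::uint64_t minor,
                                    std::uint64_t patch) {
  return (major << 32) | (minor << 16) | patch;
}

// Minimum capturable NCCL version, taken from the summary above.
constexpr std::uint64_t kMinCapturableNccl = packVersion(2, 9, 6);

// Hypothetical guard: throw if a stream capture is in progress while the
// NCCL version is too old to be captured. Both inputs are supplied by the
// caller here purely for illustration.
void errorIfCapturingNonCapturableNccl(std::uint64_t ncclVersion,
                                       bool streamIsCapturing) {
  if (ncclVersion < kMinCapturableNccl && streamIsCapturing) {
    // Illustrative message, not the wording used by PyTorch.
    throw std::runtime_error(
        "Capturing NCCL collectives is only allowed with NCCL >= 2.9.6");
  }
}
```

Calling such a guard before each collective is enqueued would turn the silent replay hang into an immediate error at capture time, while leaving non-captured (eager) use of older NCCL versions unaffected.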

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64440

Reviewed By: mruberry

Differential Revision: D30733884

Pulled By: ngimel

fbshipit-source-id: 5f2df3cf5cc0e5e68f49bf22a80d9f58064dc7ec
torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp