Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294
In `ReinitializeTensor`, we compare `tensor->GetDevice()` against `options.device()`. At the callsite, however, we only provide an option constructed from a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor is also initially given a `device` with the default `device_id`, but once data is allocated, the tensor's `device` becomes the `device` of its `Storage`, i.e. the `device` of the underlying `DataPtr`, which matches the `device` of the operator's `Context` and therefore carries a non-default `device_id`.
Therefore, every time we call `ReinitializeTensor`, the devices fail to compare equal, and they still fail to compare equal after the call. As a result, every call allocates a new Tensor, which causes perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.
Reviewed By: BIT-silence
Differential Revision: D13795635
fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
at::TensorOptions options) {
CAFFE_ENFORCE(options.device_opt() != c10::nullopt);
if (*tensor) {
- if (tensor->GetDevice() == options.device()) {
+ // Note: we don't compare device_id here because of the purpose of
+ // ReinitializeTensor: https://github.com/pytorch/pytorch/pull/13147
+ // In the original code, `device_id` was not defined, so we should not
+ // include it in the comparison.
+ if (tensor->GetDeviceType() == options.device().type()) {
if (tensor->sizes() != dims) {
      // Resize when the dims don't match
tensor->Resize(dims);