Implement reference counting for shared IPC CUDA tensors (#16854)
author     Vitaly Fedyunin <vitalyf@fb.com>
           Mon, 25 Mar 2019 17:18:29 +0000 (10:18 -0700)
committer  Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
           Mon, 25 Mar 2019 17:24:38 +0000 (10:24 -0700)
commit     5653a914f757a032e27d74d44ad90c40149deadb
tree       b4c499b4efe8c4a6a8c54ed40e4e9b3caf865869
parent     f5ea52868777b63283200c2261e85001999913f5
Implement reference counting for shared IPC CUDA tensors (#16854)

Summary:
This fixes #16141 and similar issues.

The idea is to track a reference count for every shared CUDA Storage and to deallocate the memory only after the consumer process has deallocated the received Storage.
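
For example, this is the kind of cross-process sharing the change covers (a minimal sketch; the `consumer` function, queue setup, and tensor shape are illustrative, not part of this patch):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    t = q.get()              # receives a tensor backed by shared CUDA storage
    print(t.sum().item())
    # when this process drops its Storage, the shared reference count is
    # decremented, allowing the producer side to deallocate the memory

if __name__ == "__main__":
    mp.set_start_method("spawn")   # required for CUDA with multiprocessing
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put(torch.ones(4, device="cuda"))  # shares the storage over CUDA IPC
    p.join()
```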

ezyang: Done with cleanup. Performance is the same as (insignificantly better than) the file-per-share solution, but this handles millions of shared tensors easily. Note: [ ] documentation in progress.
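
The scaling presumably comes from amortizing the bookkeeping: many reference counters packed into each shared file instead of one file per shared tensor. A conceptual, single-process sketch of that scheme (all names, the slot count, and the `limbo` structure are hypothetical; the actual logic is the C++ in torch/csrc/CudaIPCTypes.cpp):

```python
class RefCounterFile:
    """Stands in for one shared-memory file holding counters for many
    shared storages, so millions of tensors need only a few files."""
    def __init__(self, slots=10000):
        self.counters = [0] * slots
        self.next_free = 0

    def acquire(self):
        slot = self.next_free
        self.next_free += 1
        return slot


class ProducerSide:
    def __init__(self):
        self.file = RefCounterFile()
        self.limbo = {}  # slot -> CUDA block awaiting release by consumers

    def share(self, block):
        slot = self.file.acquire()
        self.file.counters[slot] += 1  # consumer will hold one reference
        self.limbo[slot] = block       # keep the block alive meanwhile
        return slot                    # sent alongside the CUDA IPC handle

    def collect(self):
        # Deallocate only blocks whose consumers released every reference.
        for slot in [s for s in self.limbo if self.file.counters[s] == 0]:
            self.limbo.pop(slot)       # real code returns it to the allocator


def consumer_release(file, slot):
    file.counters[slot] -= 1           # consumer deallocated its Storage
```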
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16854

Differential Revision: D13994490

Pulled By: VitalyFedyunin

fbshipit-source-id: 565148ec3ac4fafb32d37fde0486b325bed6fbd1
15 files changed:
c10/core/StorageImpl.h
c10/cuda/CUDACachingAllocator.cpp
c10/cuda/CUDACachingAllocator.h
docs/source/multiprocessing.rst
test/test_multiprocessing.py
torch/CMakeLists.txt
torch/csrc/CudaIPCTypes.cpp [new file with mode: 0644]
torch/csrc/CudaIPCTypes.h [new file with mode: 0644]
torch/csrc/Storage.cpp
torch/csrc/cuda/Module.cpp
torch/csrc/cuda/Storage.cpp
torch/csrc/generic/StorageSharing.cpp
torch/cuda/__init__.py
torch/multiprocessing/cuda_multiprocessing.md [new file with mode: 0644]
torch/multiprocessing/reductions.py