Refactor dataloader.py (#15331)
author SsnL <tongzhou.wang.1994@gmail.com>
Wed, 19 Dec 2018 20:26:44 +0000 (12:26 -0800)
committer Facebook Github Bot <facebook-github-bot@users.noreply.github.com>
Wed, 19 Dec 2018 20:36:03 +0000 (12:36 -0800)
commit 9217bde807115bf8e161dc54faeed0851a247780
tree e0976a53264ecd4fd229f7da63ae8c0c1a0b2816
parent 41e7e1bc40ba8bb50b84afc3addf514549d23336

Summary:
Same as #14668, and was approved there.

ailzhang, please apply this patch to Horizon's `data_streamer.py`: https://gist.github.com/SsnL/020fdb3d6b7016d81b6ba1d04cc41459. Thank you!

Below is the original description at #14668:

As I am working on tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions that run in multiprocessing workers must be defined at the module (top) level. Adding more functionality to `dataloader.py` would only make things worse.
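To illustrate the top-level constraint: with the `spawn` start method (the default on Windows), `multiprocessing` pickles the worker's target function, and `pickle` stores a function by its qualified name. A module-level function therefore round-trips, while a function defined inside another function does not. This is a standard-library-only sketch; the function names are hypothetical, and it assumes the snippet is run as a script.

```python
import pickle


def top_level_worker(x):
    # Defined at module scope: pickle can store it by module + name.
    return x * 2


def make_nested_worker():
    def nested_worker(x):
        # Defined inside another function: pickle cannot look it up by name.
        return x * 2
    return nested_worker


def is_picklable(fn):
    # Depending on the Python version and pickle implementation, a local
    # function raises either PicklingError or AttributeError.
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


print(is_picklable(top_level_worker))      # True
print(is_picklable(make_nested_worker()))  # False
```

This is why helpers like the worker loop cannot simply be nested inside the `DataLoader` iterator class and end up as module-level functions instead.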

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes the code much clearer. I will base my future changes to DataLoader on top of this.
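For readers unfamiliar with what a worker loop does, here is a heavily simplified, hypothetical sketch of the pattern. It is not the real signature or behavior of `_worker_loop` in `data._utils.worker`, which also handles seeding, signals, and errors. The worker reads `(batch_id, indices)` messages from an index queue, fetches and collates the samples, and sends `(batch_id, batch)` back; a `None` message stops it. `collate_as_list` and `worker_loop_sketch` are illustrative names, and the demo uses in-process `queue.Queue` objects instead of worker processes.

```python
import queue


def collate_as_list(samples):
    # Hypothetical stand-in for a collate function: keep samples in a list.
    return list(samples)


def worker_loop_sketch(dataset, index_queue, data_queue, collate_fn):
    # Hypothetical, simplified worker loop (not PyTorch's _worker_loop).
    # Defined at module level so it could be a multiprocessing target.
    while True:
        msg = index_queue.get()
        if msg is None:
            # Sentinel: the main process asks the worker to stop.
            break
        batch_id, indices = msg
        batch = collate_fn([dataset[i] for i in indices])
        data_queue.put((batch_id, batch))


# In-process demo: thread-safe queues stand in for multiprocessing queues.
index_q, data_q = queue.Queue(), queue.Queue()
index_q.put((0, [0, 1]))
index_q.put((1, [2]))
index_q.put(None)
worker_loop_sketch([10, 20, 30], index_q, data_q, collate_as_list)
print(data_q.get())  # (0, [10, 20])
print(data_q.get())  # (1, [30])
```

Tagging each batch with its id lets the main process reorder results when several workers finish out of order.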

No functionality is changed, except that I added `torch._six.queue`.
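`torch._six` collects small Python 2/3 compatibility helpers. A minimal sketch of what a `queue` alias of this kind typically looks like, assuming it simply picks the module by interpreter version (the actual code in `torch/_six.py` may differ):

```python
import sys

# Sketch of a Python 2/3 compatibility alias; torch/_six.py may differ.
PY2 = sys.version_info[0] == 2

if PY2:
    import Queue as queue  # Python 2 name of the module
else:
    import queue  # Python 3 name of the module

# Callers can use queue.Queue / queue.Empty without their own version checks.
q = queue.Queue()
q.put("batch")
print(q.get())  # batch

try:
    q.get(timeout=0.01)
except queue.Empty:
    print("empty")  # reached: nothing left to read
```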
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15331

Reviewed By: yf225

Differential Revision: D13503120

Pulled By: ailzhang

fbshipit-source-id: 94df16b4d80ad1102c437cde0d5a2e62cffe1f8e
test/test_dataloader.py
torch/_six.py
torch/csrc/DataLoader.cpp
torch/utils/data/__init__.py
torch/utils/data/_utils/__init__.py [new file with mode: 0644]
torch/utils/data/_utils/collate.py [new file with mode: 0644]
torch/utils/data/_utils/pin_memory.py [new file with mode: 0644]
torch/utils/data/_utils/signal_handling.py [new file with mode: 0644]
torch/utils/data/_utils/worker.py [new file with mode: 0644]
torch/utils/data/dataloader.py