Fix TestDataLoader.test_proper_exit (#15665)
Summary:
Currently, in `test_proper_exit`,
1. we do not kill the correct input `pid` in the `kill_pid` function
https://github.com/pytorch/pytorch/blob/
fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L325-L329
2. the Windows command that detects process status doesn't actually work
https://github.com/pytorch/pytorch/blob/
fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L641-L646
3. `worker_error` and `worker_kill` cases (sometimes?) are not tested because the workers may exit naturally due to the pre-fetching mechanism and a too small `dataset size / batch size`.
In this PR, I, in separate commits:
1. Install `psutil` (a python package specifically built for process monitoring) on some CI builds. (Linux builds installation are done in https://github.com/pietern/pytorch-dockerfiles/pull/29 https://github.com/pietern/pytorch-dockerfiles/pull/30 https://github.com/pytorch/ossci-job-dsl/pull/36 and https://github.com/pytorch/pytorch/pull/15795).
2. Rewrite `test_proper_exit` with `psutil` so we
1. do not rely on the hacky `is_process_alive` https://github.com/pytorch/pytorch/blob/
fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L640-L653
2. increase the #task per worker so `worker_error` and `worker_kill` properly trigger
3. test error message content to ensure that the loader exits with correct message corresponding to each exiting scenario.
3. Fix Windows data loader not having any mechanism to detect worker failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15665
Differential Revision:
D13615527
Pulled By: soumith
fbshipit-source-id:
cfb2f67837d2d87928a53f00b4d20f09754b7949