[DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)
author Kevin Tse <ktse@fb.com>
Fri, 10 Sep 2021 21:22:36 +0000 (14:22 -0700)
committer Facebook GitHub Bot <facebook-github-bot@users.noreply.github.com>
Fri, 10 Sep 2021 23:49:17 +0000 (16:49 -0700)
commit f3f410880a71068bf2649efd977167972a85274b
tree 0e0cadd1576bead2bda09896194ee7ba3281d947
parent 717d267e191bcc1669acad21d87ffb70e6e89b90

Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/64788
* __->__ https://github.com/pytorch/pytorch/issues/64786

This PR removes ZipArchiveReader's dependency on the FileLoader DataPipe by allowing it to take an IterDataPipe of path names as input, rather than tuples of a path name and a stream.
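A minimal sketch of the new input contract, assuming plain Python `zipfile` in place of the real DataPipe machinery: the reader receives path names and opens each archive itself, so no upstream FileLoader stream is involved. `read_zip_archives` is a hypothetical name, and naming each yielded entry `archive path + member path` is an assumption, not necessarily what ZipArchiveReader emits.

```python
import os
import zipfile
from typing import IO, Iterable, Iterator, Tuple


def read_zip_archives(pathnames: Iterable[str]) -> Iterator[Tuple[str, IO[bytes]]]:
    """Yield (inner_path, stream) for every file member of each zip archive.

    Hypothetical stand-in for ZipArchiveReader after this change: the input
    is plain path names, and the reader opens each archive itself instead of
    receiving an already-open stream from FileLoader.
    """
    for pathname in pathnames:
        if not isinstance(pathname, str):
            raise TypeError(f"expected a path name (str), got {type(pathname).__name__}")
        # The reader owns the archive handle; it is closed when this block exits,
        # i.e. when the generator is exhausted or closed early.
        with zipfile.ZipFile(pathname) as archive:
            for info in archive.infolist():
                if info.is_dir():  # skip directory entries such as "sub/"
                    continue
                # Assumed naming scheme: <archive path>/<member path>, normalized.
                inner = os.path.normpath(os.path.join(pathname, info.filename))
                yield inner, archive.open(info)
```

Because the `with` block lives inside the generator, the archive is closed when iteration finishes or is abandoned, while a member stream the caller still holds keeps the underlying file open until that stream is itself closed.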

It also adds tests to ensure that the DataPipe functions properly when it is read multiple times or reset halfway through reading.
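The two scenarios these tests cover can be sketched as a generic check, assuming only that the DataPipe is re-iterable and yields `(name, stream)` pairs. `read_all` and `check_reread_and_reset` are hypothetical helpers, not the actual code in `test/test_datapipe.py`; "reset" is modeled here as abandoning one pass and starting a fresh one.

```python
from typing import IO, Dict, Iterable, Tuple


def read_all(dp: Iterable[Tuple[str, IO[bytes]]]) -> Dict[str, bytes]:
    """Consume one full pass over dp, closing every stream after reading it."""
    result = {}
    for name, stream in dp:
        with stream:
            result[name] = stream.read()
    return result


def check_reread_and_reset(dp: Iterable[Tuple[str, IO[bytes]]]) -> None:
    """Hypothetical helper: dp must be re-iterable and yield (name, stream) pairs."""
    first = read_all(dp)
    assert first, "expected at least one entry"
    # Reading the same DataPipe a second time must produce the same data.
    assert read_all(dp) == first
    # "Reset halfway": take one item, abandon that pass, then start over.
    partial = iter(dp)
    _, stream = next(partial)
    stream.close()
    close = getattr(partial, "close", None)
    if close is not None:
        close()  # for generator-based readers this runs their cleanup code
    assert read_all(dp) == first
```

Explicitly closing the abandoned iterator matters for readers that open files inside a generator: it triggers the generator's `finally`/`with` cleanup immediately instead of waiting for garbage collection.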

The whole stack fixes issues related to unclosed buffer streams (see https://github.com/pytorch/pytorch/issues/64281).

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64786

Reviewed By: ngimel

Differential Revision: D30870968

Pulled By: NivekT

fbshipit-source-id: 64b04d1697b99772f2fa20fc141668e6b8e18c41
test/test_datapipe.py
torch/utils/data/datapipes/iter/ziparchivereader.py