Fork/join parallelism for ensemble export modules (#310)
Summary:
Pull Request resolved: https://github.com/pytorch/translate/pull/310
This adds fork/join parallelism to the EncoderEnsemble and DecoderBatchedStepEnsemble models. When run in Python, the fork/join calls are no-ops, and we likewise remove them before exporting to ONNX. When the models run in the PyTorch native runtime, however, these sections can now execute in parallel.
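For illustration only, here is a minimal sketch of the fork/join pattern described above. It assumes the public `torch.jit.fork` / `torch.jit.wait` calls; the summary does not name the exact calls used. `TinyEnsemble` is a hypothetical stand-in for an ensemble wrapper, not the actual EncoderEnsemble or DecoderBatchedStepEnsemble code.

```python
import torch
import torch.nn as nn


class TinyEnsemble(nn.Module):
    """Hypothetical ensemble wrapper (illustration, not pytorch/translate code)."""

    def __init__(self, num_models: int = 3, dim: int = 4):
        super().__init__()
        # Each ensemble member is an independent sub-model.
        self.models = nn.ModuleList(
            [nn.Linear(dim, dim) for _ in range(num_models)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fork: launch one task per ensemble member. The members do not
        # depend on each other, so a runtime that supports it may run
        # them concurrently.
        futures = [torch.jit.fork(model, x) for model in self.models]
        # Join: collect every member's result before combining them.
        outputs = [torch.jit.wait(fut) for fut in futures]
        # Average the member outputs (an assumed combination rule).
        return torch.stack(outputs).mean(dim=0)


if __name__ == "__main__":
    torch.manual_seed(0)
    ensemble = TinyEnsemble()
    batch = torch.randn(2, 4)
    print(ensemble(batch).shape)
```

Called from plain Python as above, the result should match evaluating the members one after another; any speedup would only be expected under the native runtime.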
Benchmark validation is pending; I am still working through FBLearner Flow issues.
Reviewed By: jmp84
Differential Revision: D13827861
fbshipit-source-id: 0cb9df6e10c0ba64a6b81fa374e077bce90f1d5b