Reenable all forward-pass fusions that worked before the AD fix (#14558)
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558
Differential Revision:
D13321748
Pulled By: zou3519
fbshipit-source-id:
c04fc2f70d106d2bfb56206b5aec517a93b79d1f