Add support for some multi-store cases in affine fusion
This PR is a stepping stone towards supporting generic multi-store
source loop nests in affine loop fusion. It extends the algorithm to
support fusion of multi-store loop nests that:
1. have only one store that writes to a function-local live out, and
2. the remaining stores are involved in loop nest self dependences
or no dependences within the function.
Closes tensorflow/mlir#162
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/162 from dcaballe:dcaballe/multi-output-fusion
7fb7dec6fe8b45f5ce176f018bfe37b256420c45
PiperOrigin-RevId:
273773907