[mlir][scf][Transform] Refactor transform.fuse_into_containing_op so it is iterative and supports output fusion.
This revision revisits the implementation of `transform.fuse_into_containing_op` so that it iterates on
producers one use at a time.
Support is added to fuse a producer through a foreach_thread shared tensor argument, in which case we
tile and fuse the op inside the containing op and update the shared tensor argument to the unique destination operand.
If one cannot find such a unique destination operand the transform fails.
Differential Revision: https://reviews.llvm.org/D134051