[XLA:GPU] Fuse broadcasts into reduction fusions
author Benjamin Kramer <kramerb@google.com>
Wed, 7 Mar 2018 14:28:00 +0000 (06:28 -0800)
committer TensorFlower Gardener <gardener@tensorflow.org>
Wed, 7 Mar 2018 14:32:08 +0000 (06:32 -0800)
commit b2fcd7d80af4b7be7501135e043ef89ac9e65cb4
tree 9fa2b243aa214afdf01ff816c314193d472e778e
parent 358fd36d0f2c23b725bf952d7c919e7d704a45ec

Previously, broadcasts were not fused into reduction fusions because
reconstructing a layout was hard. With layout_assignment running before
fusion, this becomes much easier, so remove the limitations.
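
As a rough illustration of why fusing a broadcast into a reduction is
worthwhile, the sketch below contrasts the two strategies on the CPU. It
is not XLA code: the function names ReduceRowsUnfused and ReduceRowsFused
are hypothetical, and the shapes (a vector of length `cols` broadcast to
`rows` x `cols`, then summed along each row) are assumed for the example.

```cpp
#include <cstddef>
#include <vector>

// Unfused (hypothetical helper): materialize the broadcast of `v` into a
// rows x cols row-major buffer, then sum each row in a separate pass.
std::vector<float> ReduceRowsUnfused(const std::vector<float>& v,
                                     std::size_t rows) {
  const std::size_t cols = v.size();
  std::vector<float> broadcast(rows * cols);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      broadcast[i * cols + j] = v[j];  // Copy of v for every row.
    }
  }
  std::vector<float> out(rows, 0.0f);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < cols; ++j) {
      out[i] += broadcast[i * cols + j];
    }
  }
  return out;
}

// Fused (hypothetical helper): the reduction reads the broadcast operand
// directly by mapping output index (i, j) back to v[j], so the
// intermediate rows x cols buffer is never written.
std::vector<float> ReduceRowsFused(const std::vector<float>& v,
                                   std::size_t rows) {
  std::vector<float> out(rows, 0.0f);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t j = 0; j < v.size(); ++j) {
      out[i] += v[j];
    }
  }
  return out;
}
```

Both functions produce the same values; the fused form only skips the
intermediate buffer. Mapping an output index back to the operand index
depends on knowing the layouts involved, which is presumably why having
layouts assigned before fusion makes this easier.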

PiperOrigin-RevId: 188167436
tensorflow/compiler/xla/service/gpu/instruction_fusion.cc
tensorflow/compiler/xla/service/gpu/instruction_fusion_test.cc
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc