[XLA:GPU] Unroll multi-output loop fusions
author Benjamin Kramer <kramerb@google.com>
Thu, 17 May 2018 18:06:05 +0000 (11:06 -0700)
committer TensorFlower Gardener <gardener@tensorflow.org>
Thu, 17 May 2018 18:08:44 +0000 (11:08 -0700)
commit 0f8be44de22a344ce6aac1e2cee8595b7c89d9f8
tree 7ce2b045133ae1ab620d20912902fdcbae810940
parent 18cb26be8d14dbb46b79cbe2256857a3d14c51d1

This is easier than I thought because we can assume that all tuple members have
the same number of elements. LLVM doesn't do a great job of vectorizing the
resulting stores, but otherwise this is working fine.
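
The idea can be illustrated with a minimal CPU-side sketch. This is not the code from ir_emitter_unnested.cc, which emits LLVM IR for a GPU kernel; it only shows why equal element counts make unrolling simple: one linear index is valid for every tuple member, so each unrolled step can write all outputs at that index. The names FusionOutputs, RunUnrolledFusion, and kUnrollFactor, the choice of sum and product as the two outputs, and the unroll factor of 4 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the result of a multi-output fusion: a tuple of
// two arrays. Every tuple member has the same number of elements.
struct FusionOutputs {
  std::vector<float> sum;
  std::vector<float> product;
};

// Assumed unroll factor for illustration; a real backend may pick another.
constexpr std::size_t kUnrollFactor = 4;

FusionOutputs RunUnrolledFusion(const std::vector<float>& a,
                                const std::vector<float>& b) {
  assert(a.size() == b.size());
  const std::size_t n = a.size();
  FusionOutputs out{std::vector<float>(n), std::vector<float>(n)};

  std::size_t i = 0;
  // Main loop: each iteration covers kUnrollFactor consecutive elements.
  // Because all outputs share one element count, the same index addresses
  // every tuple member and no per-output bounds check is needed.
  for (; i + kUnrollFactor <= n; i += kUnrollFactor) {
    for (std::size_t u = 0; u < kUnrollFactor; ++u) {
      const std::size_t idx = i + u;
      out.sum[idx] = a[idx] + b[idx];
      out.product[idx] = a[idx] * b[idx];
    }
  }
  // Remainder loop for element counts that are not a multiple of the factor.
  for (; i < n; ++i) {
    out.sum[i] = a[i] + b[i];
    out.product[i] = a[i] * b[i];
  }
  return out;
}
```

In this sketch the stores to the two output buffers are interleaved within each unrolled step. That interleaving is one plausible reason a vectorizer would have trouble merging adjacent stores to the same buffer, though the commit message does not say what the cause is.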

PiperOrigin-RevId: 197019718
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc
tensorflow/compiler/xla/tests/multioutput_fusion_test.cc