author Justin Lebar <jlebar@google.com>
Wed, 14 Feb 2018 01:16:29 +0000 (17:16 -0800)
committer TensorFlower Gardener <gardener@tensorflow.org>
Wed, 14 Feb 2018 01:20:27 +0000 (17:20 -0800)
commit 7575f334ee0879825ceed23928f5e99d0f71b5f8
tree 52bade9cb1b2173ae8bb680ba6b8f477260827ff
parent cf04f92c340b6fb0207eb780959a12fa03356f77

[XLA:GPU] Don't crash when the root instruction of a computation is a multi-output fusion node, and avoid some pointer chasing with tuples.

Previously, the kernels we generated would have one argument per
*top-level* buffer of the input/output.  This was fine for inputs.  But
it doesn't work for outputs: imagine you're a node that returns a tuple
(e.g. a multi-output fusion node). If all you get is a pointer to the
top-level buffer of your output -- a buffer which should eventually
contain pointers to the lower-level buffers, but which at this point is
still empty -- how are you supposed to figure out where to write your
output?

(This usually worked because most of the time your output would live
inside of the big XLA temp buffer, and kernels always get a pointer to
that.)
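
To make the failure mode concrete, here is a minimal standalone C++
sketch of the tuple-buffer layout in question (illustrative only, not
XLA code; elem0/elem1 and the shapes are made up):

  #include <cstdio>

  int main() {
    // Element buffers for a (f32[4], f32[4]) tuple.
    float elem0[4] = {0};
    float elem1[4] = {0};

    // The top-level tuple buffer: an array of pointers to the elements.
    void* tuple[2] = {elem0, elem1};

    // A consumer handed `tuple` can chase the stored pointers to find
    // the elements.  But a producer handed only `tuple` -- before the
    // slots are filled in -- has no way to discover where elem0 and
    // elem1 live, hence the crash this change fixes.
    printf("element 0 lives at %p\n", tuple[0]);
    printf("element 1 lives at %p\n", tuple[1]);
    return 0;
  }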

Now we pass all the buffers, top-level and otherwise, to our kernel.  In
addition, we're now willing to statically dereference tuples that live
entirely in XLA's temp buffer.  Pointers in input tuples must still be
dereferenced dynamically, because the caller has the option of providing
these values or not when invoking XLA.
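
The static/dynamic distinction is sketched below in plain C++ (the real
implementation emits the equivalent LLVM IR in the GPU backend; the
function names here are hypothetical):

  #include <cstddef>

  // Dynamic dereference: the element's address must be loaded from the
  // tuple buffer at runtime, because only the caller knows where the
  // element actually lives (e.g. a tuple-shaped input parameter).
  void* GetElementDynamic(void** tuple_buffer, int index) {
    return tuple_buffer[index];  // runtime load of the stored pointer
  }

  // Static dereference: when the whole tuple lives in XLA's temp
  // buffer, buffer assignment knows every element's offset at compile
  // time, so the address is just base + constant -- no load needed.
  void* GetElementStatic(char* temp_buffer_base, std::size_t offset) {
    return temp_buffer_base + offset;
  }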

This change makes some parts of BufferAssignment/BufferAllocations more
truthful.  Previously, if you passed a tuple-shaped input to XLA, we'd
say in BufferAllocations that the pointer for some subshape of the param
was the *top-level tuple pointer*.  XLA then knew that this was a lie
and would dereference it accordingly.  Now we have an explicit notion of
a BufferAllocation pointing to a subshape of an input parameter.
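
Illustratively, that means the allocation bookkeeping must record
something like the following (hypothetical type and field names, not
XLA's actual API):

  #include <vector>

  // An allocation that refers to a subshape of an entry parameter must
  // record both which parameter it belongs to and the index path into
  // that parameter's tuple shape, rather than lying and pointing at
  // the top-level tuple buffer.
  struct AllocationInfo {
    bool is_entry_parameter = false;
    int parameter_number = -1;      // which input parameter
    std::vector<int> shape_index;   // path to the subshape, e.g. {1, 0}
  };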

PiperOrigin-RevId: 185614060
14 files changed:
tensorflow/compiler/xla/service/buffer_assignment.cc
tensorflow/compiler/xla/service/buffer_assignment.h
tensorflow/compiler/xla/service/gpu/buffer_allocations.cc
tensorflow/compiler/xla/service/gpu/gpu_executable.cc
tensorflow/compiler/xla/service/gpu/hlo_to_ir_bindings.h
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.h
tensorflow/compiler/xla/service/gpu/kernel_thunk.cc
tensorflow/compiler/xla/service/gpu/kernel_thunk.h
tensorflow/compiler/xla/service/hlo.proto
tensorflow/compiler/xla/shape_util.h
tensorflow/compiler/xla/tests/BUILD
tensorflow/compiler/xla/tests/multioutput_fusion_test.cc
tensorflow/compiler/xla/tests/tuple_test.cc