Always use the local worker name in CreateWorkerSession when not doing ClusterSpec...
author Derek Murray <mrry@google.com>
Fri, 20 Apr 2018 22:38:06 +0000 (15:38 -0700)
committer TensorFlower Gardener <gardener@tensorflow.org>
Fri, 20 Apr 2018 22:40:46 +0000 (15:40 -0700)
commit b2f786867dca85b6b848f09f2c1d40dd123fc0fc
tree ed52d548cf956a445cefcfdb456babe1b150cac6
parent cadbb0b70b9441388a04533433245ac85f2887a9
Always use the local worker name in CreateWorkerSession when not doing ClusterSpec propagation.

Previously, the master would send a job name and task index in an
otherwise-empty ServerDef, and the worker would unquestioningly use
those to build its worker name. However, this would lead to errors if
the worker had a local name like "/job:worker/replica:1/task:0",
because the ServerDef doesn't support non-zero replica IDs, and so the
local worker would end up with an inconsistent view of what its worker name
should be. In particular, `WorkerSession::worker_name` would disagree
with the device names added during graph partitioning by the master,
which would lead to runtime failures ("InvalidArgumentError: Invalid
rendezvous key").
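
The mismatch described above can be illustrated with a minimal sketch.
This is not the code in session_mgr.cc: `ServerDefSketch`,
`NameFromServerDef`, `ChooseWorkerName`, and the `has_cluster` flag are
hypothetical stand-ins, and the "/replica:0" literal is an assumption
that follows from the ServerDef having no replica field.

```cpp
#include <string>

// Hypothetical stand-in for the ServerDef fields relevant here.
// Note that there is no replica field, only a job name and a task index.
struct ServerDefSketch {
  std::string job_name;
  int task_index = 0;
  bool has_cluster = false;  // assumed marker for ClusterSpec propagation
};

// Sketch of the previous behaviour: build the worker name from the job name
// and task index alone, so the replica ID can only ever be 0.
std::string NameFromServerDef(const ServerDefSketch& def) {
  return "/job:" + def.job_name + "/replica:0/task:" +
         std::to_string(def.task_index);
}

// Sketch of the behaviour this change describes: trust the ServerDef only
// when ClusterSpec propagation is in use; otherwise keep the local name.
std::string ChooseWorkerName(const ServerDefSketch& def,
                             const std::string& local_worker_name) {
  return def.has_cluster ? NameFromServerDef(def) : local_worker_name;
}
```

With a local name of "/job:worker/replica:1/task:0" and an
otherwise-empty ServerDef carrying job "worker" and task 0,
`NameFromServerDef` yields "/job:worker/replica:0/task:0", which no
longer matches the device names the master used when partitioning the
graph, whereas `ChooseWorkerName` returns the local name unchanged.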

PiperOrigin-RevId: 193733855
tensorflow/core/distributed_runtime/BUILD
tensorflow/core/distributed_runtime/master_session.cc
tensorflow/core/distributed_runtime/session_mgr.cc
tensorflow/core/distributed_runtime/session_mgr_test.cc