author Nick Desaulniers <ndesaulniers@google.com>
Wed, 16 May 2018 17:03:12 +0000 (10:03 -0700)
committer TensorFlower Gardener <gardener@tensorflow.org>
Wed, 16 May 2018 17:08:13 +0000 (10:08 -0700)
commit 1cb3552c019d351bf740457e7d14da54324c5921
tree 7b2d041ffd17b22c3164602dbbb7ae72ada2ae1d
parent f48c4115438f764a5d08e155275fa21f581ff55e
[TF:XLA:INTERPRETER] speed up select and scatter by avoiding memory allocation in loops

HandleSelectAndScatter() has two IterateThroughWindow() blocks. Before this change, the following calls made from those blocks accounted for (as a percentage of total program time):
11.98% Literal::CreateR0() = 10.82% (block1) + 1.16% (block2)
 4.91% Literal::~Literal() =  4.44% (block1) + 0.51% (block2)
 1.52% operator delete     =  1.38% (block1) + 0.14% (block2)
=====
18.41% total

After:
 1.99% Literal::~Literal() =  1.83% (block1) + 0.16% (block2)
 0.68% operator delete     =  0.61% (block1) + 0.07% (block2)
=====
 2.67% total
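
The general pattern behind these numbers, hoisting a heap-backed temporary out of a hot loop and overwriting it in place, can be sketched as follows. This is an illustration only, not the code changed in hlo_evaluator_typed_visitor.h: BoxedScalar, GreaterEqual, MaxAllocatingPerIteration and MaxReusingBuffers are hypothetical names standing in for the scalar literals and the select computation, and the construction counter exists only to make the difference observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts how many boxed scalars were constructed (illustration only).
static std::size_t g_boxed_constructions = 0;

// Hypothetical stand-in for a heap-backed scalar value; NOT xla::Literal.
struct BoxedScalar {
  std::unique_ptr<float> storage;
  explicit BoxedScalar(float v) : storage(std::make_unique<float>(v)) {
    ++g_boxed_constructions;
  }
  void Set(float v) { *storage = v; }
  float Get() const { return *storage; }
};

// Hypothetical predicate over boxed operands, playing the role of the
// "select" computation evaluated for each window element.
inline bool GreaterEqual(const BoxedScalar& a, const BoxedScalar& b) {
  return a.Get() >= b.Get();
}

// Before: two boxes are allocated and destroyed on every iteration.
inline float MaxAllocatingPerIteration(const std::vector<float>& window) {
  assert(!window.empty());
  float best = window[0];
  for (std::size_t i = 1; i < window.size(); ++i) {
    BoxedScalar candidate(window[i]);
    BoxedScalar current(best);
    if (GreaterEqual(candidate, current)) best = window[i];
  }
  return best;
}

// After: the boxes are allocated once and overwritten inside the loop.
inline float MaxReusingBuffers(const std::vector<float>& window) {
  assert(!window.empty());
  float best = window[0];
  BoxedScalar candidate(0.0f);
  BoxedScalar current(0.0f);
  for (std::size_t i = 1; i < window.size(); ++i) {
    candidate.Set(window[i]);
    current.Set(best);
    if (GreaterEqual(candidate, current)) best = window[i];
  }
  return best;
}
```

In this sketch the per-iteration version constructs two boxes per window element visited, while the reusing version constructs two boxes in total regardless of window size; both return the same result.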

PiperOrigin-RevId: 196844177
tensorflow/compiler/xla/service/hlo_evaluator_typed_visitor.h