[TF:XLA:INTERPRETER] speed up select and scatter by avoiding memory allocation in loops
HandleSelectAndScatter() has two IterateThroughWindow() blocks. Before this change, they accounted for the following share of total program time:
11.98% Literal::CreateR0() = 10.82% (block1) + 1.16% (block2)
4.91% Literal::~Literal() = 4.44% (block1) + 0.51% (block2)
1.52% operator delete = 1.38% (block1) + 0.14% (block2)
=====
18.41% total
After:
1.99% Literal::~Literal() = 1.83% (block1) + 0.16% (block2)
0.68% operator delete = 0.61% (block1) + 0.07% (block2)
=====
2.67% total
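
The numbers above suggest the general pattern: Literal::CreateR0() no longer shows up inside the window loops, presumably because the scalar literals are now created once and reused. The sketch below illustrates that pattern in isolation. It is not the interpreter code: ScalarBox, g_allocations, MaxAllocatingPerElement() and MaxReusingScratch() are hypothetical names made up for this example, and ScalarBox is only a stand-in for a heap-backed scalar, not xla::Literal.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts heap-backed scalar constructions so the two variants can be compared.
std::size_t g_allocations = 0;

// Hypothetical stand-in for a heap-backed scalar value (NOT xla::Literal).
struct ScalarBox {
  std::unique_ptr<float> storage;
  explicit ScalarBox(float v) : storage(std::make_unique<float>(v)) {
    ++g_allocations;
  }
  void Set(float v) { *storage = v; }
  float Get() const { return *storage; }
};

// Slow pattern: two fresh heap-backed scalars are built and destroyed
// for every element visited in the window.
float MaxAllocatingPerElement(const std::vector<float>& window) {
  assert(!window.empty());
  float best = window[0];
  for (float v : window) {
    ScalarBox current(v);      // allocation + deallocation per iteration
    ScalarBox selected(best);  // allocation + deallocation per iteration
    if (current.Get() > selected.Get()) best = current.Get();
  }
  return best;
}

// Faster pattern: allocate the scratch scalars once, before the loop,
// and overwrite them in place on every iteration.
float MaxReusingScratch(const std::vector<float>& window) {
  assert(!window.empty());
  ScalarBox current(0.0f);
  ScalarBox selected(window[0]);
  for (float v : window) {
    current.Set(v);
    if (current.Get() > selected.Get()) selected.Set(v);
  }
  return selected.Get();
}
```

For a window of N elements, the first variant performs 2 * N allocations and the second performs 2, regardless of N. Both return the same result.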
PiperOrigin-RevId: 196844177