[XLA] Initialize arrays using cudaMemset when possible.
author Justin Lebar <jlebar@google.com>
Wed, 21 Mar 2018 14:33:03 +0000 (07:33 -0700)
committer TensorFlower Gardener <gardener@tensorflow.org>
Wed, 21 Mar 2018 14:35:27 +0000 (07:35 -0700)
commit 39dd4ee6a3727a0eb30a8d5b8f39390383a1e761
tree 58e4db7ae151fe0ccd093771bd2fb2eefd9c01ab
parent abd5b15ababbb5601f02691620d4d8e094cff64e
[XLA] Initialize arrays using cudaMemset when possible.

Previously we were using our own hand-rolled initializer thunk.  This
worked OK for reduces, because the amount of data we were initializing
was usually small.  But for e.g. select-and-scatter, it's quite slow.

This patch lets us use cudaMemset instead.

PiperOrigin-RevId: 189904720
tensorflow/compiler/xla/service/gpu/BUILD
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc
tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.h
tensorflow/compiler/xla/service/gpu/memset_thunk.cc [new file with mode: 0644]
tensorflow/compiler/xla/service/gpu/memset_thunk.h [new file with mode: 0644]
tensorflow/compiler/xla/service/gpu/thunk.h
tensorflow/compiler/xla/tests/reduce_test.cc