Change Softmax on CUDA to use fp32 for denominator when input/output are fp16.
author    James Qin <jamesqin@google.com>
          Wed, 21 Mar 2018 22:55:30 +0000 (15:55 -0700)
committer TensorFlower Gardener <gardener@tensorflow.org>
          Wed, 21 Mar 2018 22:58:16 +0000 (15:58 -0700)
commit    942a32bc71291994c14625b6311268319dd27808
tree      1ed34c04d06867fd34ef2dcba46351fb7fe6c5bc
parent    9cd65e9a9081640934b2b78cf84b6e51ddd69796
Change Softmax on CUDA to use fp32 for denominator when input/output are fp16.

This avoids potential overflow in the denominator and ensures that the accumulation
is done in high precision.

PiperOrigin-RevId: 189982655
tensorflow/core/kernels/softmax_op_gpu.cu.cc
tensorflow/python/framework/test_util.py
tensorflow/python/kernel_tests/BUILD
tensorflow/python/kernel_tests/softmax_op_test.py
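
Below is a minimal, self-contained CUDA sketch of the idea, not the actual TensorFlow
kernel in softmax_op_gpu.cu.cc: fp16 inputs are widened to fp32 before the max and
exp-sum reductions, so the softmax denominator is accumulated in high precision and
only the final result is rounded back to fp16. The kernel name, single-row layout,
and serial loops are assumptions made for brevity; the real kernel uses parallel
reductions.

// softmax_fp16_fp32_accum.cu -- illustrative sketch only, not the TF kernel.
#include <cuda_fp16.h>
#include <cstdio>

__global__ void SoftmaxFp16Fp32Accum(const __half* in, __half* out, int n) {
  // One thread handles one small row; a production kernel would reduce in parallel.
  if (threadIdx.x != 0 || blockIdx.x != 0) return;

  // Row max in fp32 for numerical stability.
  float max_val = __half2float(in[0]);
  for (int i = 1; i < n; ++i) {
    float v = __half2float(in[i]);
    if (v > max_val) max_val = v;
  }

  // Denominator accumulated in fp32: fp16 saturates near 65504 and loses
  // precision when many terms are summed.
  float denom = 0.0f;
  for (int i = 0; i < n; ++i) {
    denom += expf(__half2float(in[i]) - max_val);
  }

  // Only the final quotient is rounded back to fp16.
  for (int i = 0; i < n; ++i) {
    out[i] = __float2half(expf(__half2float(in[i]) - max_val) / denom);
  }
}

int main() {
  const int n = 8;
  __half h_in[n], h_out[n];
  for (int i = 0; i < n; ++i) h_in[i] = __float2half(static_cast<float>(i));

  __half *d_in, *d_out;
  cudaMalloc(&d_in, n * sizeof(__half));
  cudaMalloc(&d_out, n * sizeof(__half));
  cudaMemcpy(d_in, h_in, n * sizeof(__half), cudaMemcpyHostToDevice);

  SoftmaxFp16Fp32Accum<<<1, 1>>>(d_in, d_out, n);
  cudaMemcpy(h_out, d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);

  for (int i = 0; i < n; ++i) printf("%g ", __half2float(h_out[i]));
  printf("\n");

  cudaFree(d_in);
  cudaFree(d_out);
  return 0;
}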