While revisiting 18016 I noticed that Bahdanau attention has a similar
dtype mismatch issue when `normalize=True`. The issue comes from:
```python
g = variable_scope.get_variable(
    "attention_g", dtype=dtype,
    initializer=math.sqrt((1. / num_units)))
```
where the raw Python float initializer does not work well with non-default dtypes.
This fix changes the initializer to `init_ops.constant_initializer`
to address the issue, and adds additional test cases for it.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
```diff
     # Scalar used in weight normalization
     g = variable_scope.get_variable(
         "attention_g", dtype=dtype,
-        initializer=math.sqrt((1. / num_units)))
+        initializer=init_ops.constant_initializer(math.sqrt((1. / num_units))), shape=())
     # Bias added prior to the nonlinearity
     b = variable_scope.get_variable(
         "attention_b", [num_units], dtype=dtype,
```