Cudnn compatible GRU (from Cudnn library user guide):
```python
- r_t = sigma(x_t * W_r + h_t-1 * R_h + b_Wr + b_Rr) # reset gate
- u_t = sigma(x_t * W_u + h_t-1 * R_u + b_Wu + b_Ru) # update gate
- h'_t = tanh(x_t * W_h + r_t .* (h_t-1 * R_h + b_Rh) + b_Wh) # new memory gate
- h_t = (1 - u_t) .* h'_t + u_t .* h_t-1
+ # reset gate
+ $$r_t = \sigma(x_t * W_r + h_{t-1} * R_r + b_{Wr} + b_{Rr})$$
+ # update gate
+ $$u_t = \sigma(x_t * W_u + h_{t-1} * R_u + b_{Wu} + b_{Ru})$$
+ # new memory gate
+ $$h'_t = \tanh(x_t * W_h + r_t .* (h_{t-1} * R_h + b_{Rh}) + b_{Wh})$$
+ $$h_t = (1 - u_t) .* h'_t + u_t .* h_{t-1}$$
```
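For intuition, here is a minimal NumPy sketch of one step of the cuDNN-style cell following the equations above (the dict-of-matrices layout and the `x @ W` orientation are illustrative assumptions, not the library's parameter format):
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cudnn_gru_step(x, h_prev, W, R, b_W, b_R):
    """One cuDNN-style GRU step; W, R, b_W, b_R are dicts keyed by 'r', 'u', 'h'."""
    r = sigmoid(x @ W['r'] + h_prev @ R['r'] + b_W['r'] + b_R['r'])  # reset gate
    u = sigmoid(x @ W['u'] + h_prev @ R['u'] + b_W['u'] + b_R['u'])  # update gate
    # The reset gate multiplies the recurrent projection (h_prev @ R['h'] + b_R['h']),
    # which is the detail that distinguishes this cell from the standard GRUCell.
    h_new = np.tanh(x @ W['h'] + r * (h_prev @ R['h'] + b_R['h']) + b_W['h'])
    return (1 - u) * h_new + u * h_prev  # h_t
```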
Other GRU (see @{tf.nn.rnn_cell.GRUCell} and @{tf.contrib.rnn.GRUBlockCell}):
```python
- h'_t = tanh(x_t * W_h + (r_t .* h_t-1) * R_h + b_Wh) # new memory gate
+ # new memory gate
+ \\(h'_t = \tanh(x_t * W_h + (r_t .* h_{t-1}) * R_h + b_{Wh})\\)
```
which is not equivalent to the Cudnn GRU: in addition to the extra bias term b_Rh,
```python
- r .* (h * R) != (r .* h) * R
+ \\(r .* (h * R) != (r .* h) * R\\)
```
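The inequality is easy to check numerically; a small sketch (the shapes and random seed are arbitrary):
```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.random((1, 4))           # reset-gate activations in [0, 1)
h = rng.standard_normal((1, 4))  # previous hidden state
R = rng.standard_normal((4, 4))  # recurrent weight matrix

cudnn_style = r * (h @ R)   # r .* (h * R): gate applied after the matmul
gru_style = (r * h) @ R     # (r .* h) * R: gate applied before the matmul
print(np.allclose(cudnn_style, gru_style))  # False in general
```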
"""
_clustering_ops = loader.load_op_library(
resource_loader.get_path_to_datafile('_clustering_ops.so'))
-# Euclidean distance between vectors U and V is defined as ||U - V||_F which is
-# the square root of the sum of the absolute squares of the elements difference.
+# Euclidean distance between vectors U and V is defined as \\(||U - V||_F\\)
+# which is the square root of the sum of the absolute squares of the elements'
+# difference.
SQUARED_EUCLIDEAN_DISTANCE = 'squared_euclidean'
# Cosine distance between vectors U and V is defined as
-# 1 - (U \dot V) / (||U||_F ||V||_F)
+# \\(1 - (U \cdot V) / (||U||_F ||V||_F)\\)
COSINE_DISTANCE = 'cosine'
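As a reference for the two metrics, a minimal NumPy sketch (note that the first constant above selects the *squared* Euclidean distance, i.e. without the square root):
```python
import numpy as np

def squared_euclidean_distance(u, v):
    # Sum of squared elementwise differences, ||u - v||^2.
    d = u - v
    return float(np.dot(d, d))

def cosine_distance(u, v):
    # 1 - (u . v) / (||u|| ||v||)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```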
RANDOM_INIT = 'random'
# Locally compute the sum of inputs mapped to each id.
# For a cluster with old cluster value x, old count n, and with data
# d_1,...d_k newly assigned to it, we recompute the new value as
- # x += (sum_i(d_i) - k * x) / (n + k).
- # Compute sum_i(d_i), see comment above.
+ # \\(x += (\sum_i(d_i) - k * x) / (n + k)\\).
+ # Compute \\(\sum_i(d_i)\\), see comment above.
cluster_center_updates = math_ops.unsorted_segment_sum(
inp, unique_idx, num_unique_cluster_idx)
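A small NumPy sketch of the update rule in the comment above; `minibatch_center_update` is a hypothetical helper for illustration, not the actual op:
```python
import numpy as np

def minibatch_center_update(x, n, new_points):
    """Update one cluster center x with old count n and k newly assigned points."""
    k = len(new_points)
    # x += (sum_i(d_i) - k * x) / (n + k): pulls x toward the mean of the
    # new points, discounted by how many points the cluster already holds.
    x = x + (new_points.sum(axis=0) - k * x) / (n + k)
    return x, n + k
```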
# Shape to enable broadcasting count_updates and learning_rate to inp.
# Shape broadcasting.
probs = array_ops.expand_dims(self._probs[shard_id], 0)
# Membership weights are computed as:
- # w_{ik} = \frac{\alpha_k f(\mathbf{y_i}|\mathbf{\theta}_k)}
- # {\sum_{m=1}^{K}\alpha_mf(\mathbf{y_i}|\mathbf{\theta}_m)}
+ # $$w_{ik} = \frac{\alpha_k f(\mathbf{y_i}|\mathbf{\theta}_k)}{\sum_{m=1}^{K}\alpha_m f(\mathbf{y_i}|\mathbf{\theta}_m)}$$
# where "i" is the i-th example, "k" is the k-th mixture, theta are
# the model parameters and y_i the observations.
# These are defined for each shard.
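A minimal NumPy sketch of the membership-weight formula, assuming the per-component densities f(y_i|theta_k) have already been evaluated into an (N, K) array:
```python
import numpy as np

def membership_weights(alpha, densities):
    """w[i, k] = alpha_k f(y_i|theta_k) / sum_m alpha_m f(y_i|theta_m).

    alpha: (K,) mixture weights; densities: (N, K) component densities.
    """
    weighted = alpha * densities  # broadcasts alpha_k over the examples
    return weighted / weighted.sum(axis=1, keepdims=True)
```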
than `num_clusters`, a TensorFlow runtime error occurs.
distance_metric: The distance metric used for clustering. One of:
* `KMeansClustering.SQUARED_EUCLIDEAN_DISTANCE`: Euclidean distance
- between vectors `u` and `v` is defined as `||u - v||_2` which is
- the square root of the sum of the absolute squares of the elements'
- difference.
+ between vectors `u` and `v` is defined as \\(||u - v||_2\\)
+ which is the square root of the sum of the absolute squares of
+ the elements' difference.
* `KMeansClustering.COSINE_DISTANCE`: Cosine distance between vectors
- `u` and `v` is defined as `1 - (u . v) / (||u||_2 ||v||_2)`.
+ `u` and `v` is defined as \\(1 - (u . v) / (||u||_2 ||v||_2)\\).
random_seed: Python integer. Seed for PRNG used to initialize centers.
use_mini_batch: A boolean specifying whether to use the mini-batch k-means
algorithm. See explanation above.
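A usage sketch of the estimator these arguments belong to; the import path is an assumption (TF 1.x contrib), and the argument names come from the docstring above:
```python
# Assumed import path for the contrib estimator documented above.
from tensorflow.contrib.factorization import KMeansClustering

kmeans = KMeansClustering(
    num_clusters=5,
    distance_metric=KMeansClustering.COSINE_DISTANCE,
    use_mini_batch=True,
    random_seed=42)
```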
name=WALSMatrixFactorization.LOSS,
collections=[ops.GraphKeys.GLOBAL_VARIABLES])
# The root weighted squared error =
- # \sqrt( \sum_{i,j} w_ij * (a_ij - r_ij)^2 / \sum_{i,j} w_ij )
+ # \\(\sqrt{\sum_{i,j} w_{ij} * (a_{ij} - r_{ij})^2 / \sum_{i,j} w_{ij}}\\)
rwse_var = variable_scope.variable(
0.,
trainable=False,
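A direct NumPy transcription of the RWSE formula above (dense arrays for illustration; the real computation runs over the sparse observed entries):
```python
import numpy as np

def root_weighted_squared_error(w, a, r):
    # sqrt( sum_ij w_ij (a_ij - r_ij)^2 / sum_ij w_ij )
    return np.sqrt(np.sum(w * (a - r) ** 2) / np.sum(w))
```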
and the problem simplifies to ALS. Note that, in this case,
col_weights must also be set to "None".
- List of lists of non-negative scalars, of the form
- [[w_0, w_1, ...], [w_k, ... ], [...]],
+ \\([[w_0, w_1, ...], [w_k, ...], [...]]\\),
where the number of inner lists equals the number of row factor
shards and the elements in each inner list are the weights for the
rows of that shard. In this case,
- w_ij = unonbserved_weight + row_weights[i] * col_weights[j].
+ \\(w_{ij} = unobserved_weight + row_weights[i] * col_weights[j]\\).
- A non-negative scalar: This value is used for all row weights.
Note that it is allowed to have row_weights as a list and col_weights
as a scalar, or vice-versa.
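To make the weight formula concrete, a small NumPy sketch building the full weight matrix from per-row and per-column weights (the values are arbitrary):
```python
import numpy as np

unobserved_weight = 0.1
row_weights = np.array([1.0, 2.0, 0.5])        # one weight per row
col_weights = np.array([1.0, 1.0, 3.0, 0.2])   # one weight per column

# w_ij = unobserved_weight + row_weights[i] * col_weights[j]
w = unobserved_weight + np.outer(row_weights, col_weights)
```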