"Autoregressive models decompose the joint density as a product of
conditionals, and model each conditional in turn. Normalizing flows
transform a base density (e.g. a standard Gaussian) into the target density
- by an invertible transformation with tractable Jacobian." [1]
+ by an invertible transformation with tractable Jacobian."
+ [(Papamakarios et al., 2017)][1]
In other words, the "autoregressive property" is equivalent to the
decomposition, `p(x) = prod{ p(x[i] | x[0:i]) : i=0, ..., d }`. The provided
Practically speaking, the autoregressive property means that there exists a
permutation of the event coordinates such that each coordinate is a
- diffeomorphic function of only preceding coordinates. [2]
+ diffeomorphic function of only preceding coordinates
+ [(van den Oord et al., 2016)][2].
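+ For example, for a three-dimensional event this decomposition reads
+ `p(x) = p(x[0]) p(x[1] | x[0]) p(x[2] | x[0], x[1])`.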
#### Mathematical Details
- The probability function is,
+ The probability function is
```none
prob(x; fn, n) = fn(x).prob(x)
```
- And a sample is generated by,
+ And a sample is generated by
```none
x = fn(...fn(fn(x0).sample()).sample()).sample()
```
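+
+ The nested calls above unroll to a simple loop. As a minimal Python sketch
+ (assuming `fn` maps a sample to a distribution-like object with a `sample()`
+ method and `n` is the number of iterations; the names are illustrative, not
+ this class's API):
+
+ ```python
+ def sample_autoregressive(fn, x0, n):
+   x = x0
+   for _ in range(n):
+     x = fn(x).sample()
+   return x
+ ```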
- [1]: "Masked Autoregressive Flow for Density Estimation."
- George Papamakarios, Theo Pavlakou, Iain Murray. Arxiv. 2017.
- https://arxiv.org/abs/1705.07057
+ #### References
- [2]: "Conditional Image Generation with PixelCNN Decoders."
- Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex
- Graves, Koray Kavukcuoglu. Arxiv, 2016.
+ [1]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
+ Autoregressive Flow for Density Estimation. In _Neural Information
+ Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
+
+ [2]: Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt,
+ Alex Graves, and Koray Kavukcuoglu. Conditional Image Generation with
+ PixelCNN Decoders. In _Neural Information Processing Systems_, 2016.
https://arxiv.org/abs/1606.05328
"""
matrices, i.e., the matmul is [matrix-free](
https://en.wikipedia.org/wiki/Matrix-free_methods) when possible.
- Examples:
+ #### Examples
```python
# Y = X
class BatchNormalization(bijector.Bijector):
"""Compute `Y = g(X) s.t. X = g^-1(Y) = (Y - mean(Y)) / std(Y)`.
- Applies Batch Normalization [1] to samples from a data distribution. This can
- be used to stabilize training of normalizing flows [2, 3].
+ Applies Batch Normalization [(Ioffe and Szegedy, 2015)][1] to samples from a
+ data distribution. This can be used to stabilize training of normalizing
+ flows ([Dinh et al., 2017][2]; [Papamakarios et al., 2017][3]).
When training Deep Neural Networks (DNNs), it is common practice to
normalize or whiten features by shifting them to have zero mean and
scaling them to have unit variance.
- The `inverse()` method of the BatchNorm bijector, which is used in the
- log-likelihood computation of data samples, implements the normalization
+ The `inverse()` method of the `BatchNormalization` bijector, which is used in
+ the log-likelihood computation of data samples, implements the normalization
procedure (shift-and-scale) using the mean and standard deviation of the
current minibatch.
`X*std(Y) + mean(Y)` with the running-average mean and standard deviation
computed at training-time. De-normalization is useful for sampling.
-
```python
dist = tfd.TransformedDistribution(
`BatchNorm.forward(BatchNorm.inverse(...))` will be identical when
`training=False` but may be different when `training=True`.
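+
+ As a minimal numpy sketch of the shift-and-scale semantics above (with fixed
+ statistics, as when `training=False`; illustrative only, not the bijector's
+ implementation):
+
+ ```python
+ import numpy as np
+
+ y = np.array([1., 2., 3., 4.])
+ mean, std = y.mean(), y.std()
+ x = (y - mean) / std           # inverse(): normalize, used for log_prob
+ y_recovered = x * std + mean   # forward(): de-normalize, used for sampling
+ ```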
- [1]: "Batch Normalization: Accelerating Deep Network Training by Reducing
- Internal Covariate Shift."
- Sergey Ioffe, Christian Szegedy. Arxiv. 2015.
- https://arxiv.org/abs/1502.03167
+ #### References
- [2]: "Density Estimation using Real NVP."
- Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. ICLR. 2017.
- https://arxiv.org/abs/1605.08803
+ [1]: Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating
+ Deep Network Training by Reducing Internal Covariate Shift. In
+ _International Conference on Machine Learning_, 2015.
+ https://arxiv.org/abs/1502.03167
- [3]: "Masked Autoregressive Flow for Density Estimation."
- George Papamakarios, Theo Pavlakou, Iain Murray. Arxiv. 2017.
- https://arxiv.org/abs/1705.07057
+ [2]: Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation
+ using Real NVP. In _International Conference on Learning
+ Representations_, 2017. https://arxiv.org/abs/1605.08803
+ [3]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
+ Autoregressive Flow for Density Estimation. In _Neural Information
+ Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
"""
def __init__(self,
that, if `I = L_3 @ L_3.T`, with L_3 being lower-triangular with positive-
diagonal, then `L_3 = I`. Thus, `L_1 = L_2`, proving injectivity of g.
- Examples:
+ #### Examples
```python
bijector.CholeskyOuterProduct().forward(x=[[1., 0], [2, 1]])
class MaskedAutoregressiveFlow(bijector_lib.Bijector):
"""Affine MaskedAutoregressiveFlow bijector for vector-valued events.
- The affine autoregressive flow [1] provides a relatively simple framework for
- user-specified (deep) architectures to learn a distribution over vector-valued
- events. Regarding terminology,
+ The affine autoregressive flow [(Papamakarios et al., 2017)][3] provides a
+ relatively simple framework for user-specified (deep) architectures to learn
+ a distribution over vector-valued events. Regarding terminology,
"Autoregressive models decompose the joint density as a product of
conditionals, and model each conditional in turn. Normalizing flows
transform a base density (e.g. a standard Gaussian) into the target density
- by an invertible transformation with tractable Jacobian." [1]
+ by an invertible transformation with tractable Jacobian."
+ [(Papamakarios et al., 2017)][3]
In other words, the "autoregressive property" is equivalent to the
decomposition, `p(x) = prod{ p(x[i] | x[0:i]) : i=0, ..., d }`. The provided
Given a `shift_and_log_scale_fn`, the forward and inverse transformations are
(a sequence of) affine transformations. A "valid" `shift_and_log_scale_fn`
- must compute each `shift` (aka `loc` or "mu" [2]) and `log(scale)` (aka
- "alpha" [2]) such that each are broadcastable with the arguments to `forward`
- and `inverse`, i.e., such that the calculations in `forward`, `inverse`
- [below] are possible.
+ must compute each `shift` (aka `loc` or "mu" in [Germain et al. (2015)][1])
+ and `log(scale)` (aka "alpha" in [Germain et al. (2015)][1]) such that each
+ is broadcastable with the arguments to `forward` and `inverse`, i.e., such
+ that the calculations in `forward`, `inverse` [below] are possible.
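+
+ As an illustration of the required signature only, here is a toy
+ identity-transform sketch (a real `shift_and_log_scale_fn` would compute
+ these terms autoregressively from its input):
+
+ ```python
+ import tensorflow as tf
+
+ def toy_shift_and_log_scale_fn(x):
+   shift = tf.zeros_like(x)      # "mu" in Germain et al. (2015)
+   log_scale = tf.zeros_like(x)  # "alpha" in Germain et al. (2015)
+   return shift, log_scale
+ ```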
For convenience, `masked_autoregressive_default_template` is offered as a
possible `shift_and_log_scale_fn` function. It implements the MADE
- architecture [2]. MADE is a feed-forward network that computes a `shift` and
- `log(scale)` using `masked_dense` layers in a deep neural network. Weights are
- masked to ensure the autoregressive property. It is possible that this
- architecture is suboptimal for your task. To build alternative networks,
- either change the arguments to `masked_autoregressive_default_template`, use
- the `masked_dense` function to roll-out your own, or use some other
- architecture, e.g., using `tf.layers`.
+ architecture [(Germain et al., 2015)][1]. MADE is a feed-forward network that
+ computes a `shift` and `log(scale)` using `masked_dense` layers in a deep
+ neural network. Weights are masked to ensure the autoregressive property. It
+ is possible that this architecture is suboptimal for your task. To build
+ alternative networks, either change the arguments to
+ `masked_autoregressive_default_template`, use the `masked_dense` function to
+ roll out your own, or use some other architecture, e.g., using `tf.layers`.
Warning: no attempt is made to validate that the `shift_and_log_scale_fn`
enforces the "autoregressive property".
Assuming `shift_and_log_scale_fn` has valid shape and autoregressive
- semantics, the forward transformation is,
+ semantics, the forward transformation is
```python
def forward(x):
return y
```
- and the inverse transformation is,
+ and the inverse transformation is
```python
def inverse(y):
the "last" `y` used to compute `shift`, `log_scale`. (Roughly speaking, this
also proves the transform is bijective.)
- #### Example Use
+ #### Examples
```python
tfd = tf.contrib.distributions
maf.log_prob(x) # Almost free; uses Bijector caching.
maf.log_prob(0.) # Cheap; no `tf.while_loop` despite no Bijector caching.
- # [1] also describes an "Inverse Autoregressive Flow", e.g.,
+ # [Papamakarios et al. (2017)][3] also describe an Inverse Autoregressive
+ # Flow [(Kingma et al., 2016)][2]:
iaf = tfd.TransformedDistribution(
distribution=tfd.Normal(loc=0., scale=1.),
bijector=tfb.Invert(tfb.MaskedAutoregressiveFlow(
event_shape=[dims])
```
- [1]: "Masked Autoregressive Flow for Density Estimation."
- George Papamakarios, Theo Pavlakou, Iain Murray. Arxiv. 2017.
- https://arxiv.org/abs/1705.07057
+ #### References
- [2]: "MADE: Masked Autoencoder for Distribution Estimation."
- Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. ICML. 2015.
- https://arxiv.org/abs/1502.03509
+ [1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE:
+ Masked Autoencoder for Distribution Estimation. In _International
+ Conference on Machine Learning_, 2015. https://arxiv.org/abs/1502.03509
+ [2]: Diederik P. Kingma, Tim Salimans, Rafal Jozefowicz, Xi Chen, Ilya
+ Sutskever, and Max Welling. Improving Variational Inference with Inverse
+ Autoregressive Flow. In _Neural Information Processing Systems_, 2016.
+ https://arxiv.org/abs/1606.04934
+
+ [3]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
+ Autoregressive Flow for Density Estimation. In _Neural Information
+ Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
"""
def __init__(self,
**kwargs):
"""A autoregressively masked dense layer. Analogous to `tf.layers.dense`.
- See [1] for detailed explanation.
-
- [1]: "MADE: Masked Autoencoder for Distribution Estimation."
- Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. ICML. 2015.
- https://arxiv.org/abs/1502.03509
+ See [Germain et al. (2015)][1] for detailed explanation.
Arguments:
inputs: Tensor input.
Raises:
NotImplementedError: if rightmost dimension of `inputs` is unknown prior to
graph execution.
+
+ #### References
+
+ [1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE:
+ Masked Autoencoder for Distribution Estimation. In _International
+ Conference on Machine Learning_, 2015. https://arxiv.org/abs/1502.03509
"""
# TODO(b/67594795): Better support of dynamic shape.
input_depth = inputs.shape.with_rank_at_least(1)[-1].value
name=None,
*args,
**kwargs):
- """Build the MADE Model [1].
+ """Build the Masked Autoregressive Density Estimator (Germain et al., 2015).
This will be wrapped in a make_template to ensure the variables are only
- created once. It takes the input and returns the `loc` ("mu" [1]) and
- `log_scale` ("alpha" [1]) from the MADE network.
+ created once. It takes the input and returns the `loc` ("mu" in [Germain et
+ al. (2015)][1]) and `log_scale` ("alpha" in [Germain et al. (2015)][1]) from
+ the MADE network.
Warning: This function uses `masked_dense` to create randomly initialized
`tf.Variables`. It is presumed that these will be fit, just as you would any
other neural architecture which uses `tf.layers.dense`.
- #### About Hidden Layers:
+ #### About Hidden Layers
Each element of `hidden_layers` should be greater than the `input_depth`
(i.e., `input_depth = tf.shape(input)[-1]` where `input` is the input to the
neural network). This is necessary to ensure the autoregressive property.
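+ For example, with `input_depth = 8`, `hidden_layers=[16, 16]` satisfies this
+ constraint whereas `hidden_layers=[4]` does not.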
- #### About Clipping:
+ #### About Clipping
This function also optionally clips the `log_scale` (but possibly not its
gradient). This is useful because if `log_scale` is too small/large it might
`grad[exp(clip(x))] = grad[x] exp(clip(x))` rather than the usual
`grad[clip(x)] exp(clip(x))`.
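+
+ One common way to obtain this clip-the-value-but-not-the-gradient behavior
+ is the `stop_gradient` trick sketched below (a sketch of the idea, not
+ necessarily the exact implementation used here):
+
+ ```python
+ import tensorflow as tf
+
+ def clip_preserve_gradient(x, clip_min, clip_max):
+   clipped = tf.clip_by_value(x, clip_min, clip_max)
+   # Forward value equals `clipped`; the gradient flows through `x` unmodified.
+   return x + tf.stop_gradient(clipped - x)
+ ```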
- [1]: "MADE: Masked Autoencoder for Distribution Estimation."
- Mathieu Germain, Karol Gregor, Iain Murray, Hugo Larochelle. ICML. 2015.
- https://arxiv.org/abs/1502.03509
-
- Arguments:
+ Args:
hidden_layers: Python `list`-like of non-negative integer scalars
indicating the number of units in each hidden layer. Default: `[512, 512]`.
shift_only: Python `bool` indicating if only the `shift` term shall be
**kwargs: `tf.layers.dense` keyword arguments.
Returns:
- shift: `Float`-like `Tensor` of shift terms (the "mu" in [2]).
- log_scale: `Float`-like `Tensor` of log(scale) terms (the "alpha" in [2]).
+ shift: `Float`-like `Tensor` of shift terms (the "mu" in
+ [Germain et al. (2015)][1]).
+ log_scale: `Float`-like `Tensor` of log(scale) terms (the "alpha" in
+ [Germain et al. (2015)][1]).
Raises:
NotImplementedError: if rightmost dimension of `inputs` is unknown prior to
graph execution.
+
+ #### References
+
+ [1]: Mathieu Germain, Karol Gregor, Iain Murray, and Hugo Larochelle. MADE:
+ Masked Autoencoder for Distribution Estimation. In _International
+ Conference on Machine Learning_, 2015. https://arxiv.org/abs/1502.03509
"""
with ops.name_scope(name, "masked_autoregressive_default_template",
"""RealNVP "affine coupling layer" for vector-valued events.
Real NVP models a normalizing flow on a `D`-dimensional distribution via a
- single `D-d`-dimensional conditional distribution [1]:
+ single `D-d`-dimensional conditional distribution [(Dinh et al., 2017)][1]:
`y[d:D] = x[d:D] * math_ops.exp(log_scale_fn(x[0:d])) + shift_fn(x[0:d])`
`y[0:d] = x[0:d]`
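+
+ The inverse transformation follows by solving the affine update above for
+ `x[d:D]` (the first `d` units pass through unchanged):
+
+ `x[0:d] = y[0:d]`
+ `x[d:D] = (y[d:D] - shift_fn(y[0:d])) * math_ops.exp(-log_scale_fn(y[0:d]))`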
Masking is currently only supported for base distributions with
`event_ndims=1`. For more sophisticated masking schemes like checkerboard or
- channel-wise masking [2], use the `tfb.Permute` bijector to re-order desired
- masked units into the first `d` units. For base distributions with
- `event_ndims > 1`, use the `tfb.Reshape` bijector to flatten the event shape.
-
- Recall that the MAF bijector [2] implements a normalizing flow via an
- autoregressive transformation. MAF and IAF have opposite computational
- tradeoffs - MAF can train all units in parallel but must sample units
- sequentially, while IAF must train units sequentially but can sample in
- parallel. In contrast, Real NVP can compute both forward and inverse
- computations in parallel. However, the lack of an autoregressive
+ channel-wise masking [(Dinh et al., 2017)][1], use the `tfb.Permute`
+ bijector to re-order desired masked units into the first `d` units. For base
+ distributions with `event_ndims > 1`, use the `tfb.Reshape` bijector to
+ flatten the event shape.
+
+ Recall that the MAF bijector [(Papamakarios et al., 2017)][4] implements a
+ normalizing flow via an autoregressive transformation. MAF and IAF have
+ opposite computational tradeoffs - MAF can train all units in parallel but
+ must sample units sequentially, while IAF must train units sequentially but
+ can sample in parallel. In contrast, Real NVP can perform both forward and
+ inverse computations in parallel. However, the lack of autoregressive
transformations makes it less expressive on a per-bijector basis.
A "valid" `shift_and_log_scale_fn` must compute each `shift` (aka `loc` or
- "mu" [2]) and `log(scale)` (aka "alpha" [2]) such that each are broadcastable
- with the arguments to `forward` and `inverse`, i.e., such that the
- calculations in `forward`, `inverse` [below] are possible. For convenience,
+ "mu" in [Papamakarios et al. (2016)][4]) and `log(scale)` (aka "alpha" in
+ [Papamakarios et al. (2016)][4]) such that each are broadcastable with the
+ arguments to `forward` and `inverse`, i.e., such that the calculations in
+ `forward`, `inverse` [below] are possible. For convenience,
`real_nvp_default_template` is offered as a possible `shift_and_log_scale_fn`
function.
- NICE [3] is a special case of the Real NVP bijector which discards the scale
- transformation, resulting in a constant-time inverse-log-determinant-Jacobian.
- To use a NICE bijector instead of Real NVP, `shift_and_log_scale_fn` should
- return `(shift, None)`, and `is_constant_jacobian` should be set to `True` in
- the `RealNVP` constructor. Calling `real_nvp_default_template` with
- `shift_only=True` returns one such NICE-compatible `shift_and_log_scale_fn`.
+ NICE [(Dinh et al., 2014)][2] is a special case of the Real NVP bijector
+ which discards the scale transformation, resulting in a constant-time
+ inverse-log-determinant-Jacobian. To use a NICE bijector instead of Real
+ NVP, `shift_and_log_scale_fn` should return `(shift, None)`, and
+ `is_constant_jacobian` should be set to `True` in the `RealNVP` constructor.
+ Calling `real_nvp_default_template` with `shift_only=True` returns one such
+ NICE-compatible `shift_and_log_scale_fn`.
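+
+ For instance, a NICE-style bijector might be constructed along the following
+ lines (a sketch only; `num_masked` and the hidden-layer sizes are
+ illustrative choices):
+
+ ```python
+ tfb = tf.contrib.distributions.bijectors
+
+ nice = tfb.RealNVP(
+     num_masked=2,
+     shift_and_log_scale_fn=tfb.real_nvp_default_template(
+         hidden_layers=[256, 256], shift_only=True),
+     is_constant_jacobian=True)
+ ```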
Caching: the scalar input depth `D` of the base distribution is not known at
construction time. The first call to any of `forward(x)`, `inverse(x)`,
nvp.log_prob(0.)
```
- For more examples, see [4].
+ For more examples, see [Jang (2018)][3].
- [1]: "Density Estimation using Real NVP."
- Laurent Dinh, Jascha Sohl-Dickstein, Samy Bengio. ICLR. 2017.
- https://arxiv.org/abs/1605.08803
+ #### References
- [2]: "Masked Autoregressive Flow for Density Estimation."
- George Papamakarios, Theo Pavlakou, Iain Murray. Arxiv. 2017.
- https://arxiv.org/abs/1705.07057
+ [1]: Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density Estimation
+ using Real NVP. In _International Conference on Learning
+ Representations_, 2017. https://arxiv.org/abs/1605.08803
- [3]: "NICE: Non-linear Independent Components Estimation."
- Laurent Dinh, David Krueger, Yoshua Bengio. ICLR. 2015.
- https://arxiv.org/abs/1410.8516
+ [2]: Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear
+ Independent Components Estimation. _arXiv preprint arXiv:1410.8516_,
+ 2014. https://arxiv.org/abs/1410.8516
- [4]: "Normalizing Flows Tutorial, Part 2: Modern Normalizing Flows."
- Eric Jang. Blog post. January 2018.
- http://blog.evjang.com/2018/01/nf2.html
+ [3]: Eric Jang. Normalizing Flows Tutorial, Part 2: Modern Normalizing Flows.
+ _Technical Report_, 2018. http://blog.evjang.com/2018/01/nf2.html
+
+ [4]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
+ Autoregressive Flow for Density Estimation. In _Neural Information
+ Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
"""
def __init__(self,
**kwargs: `tf.layers.dense` keyword arguments.
Returns:
- shift: `Float`-like `Tensor` of shift terms (the "mu" in [2]).
- log_scale: `Float`-like `Tensor` of log(scale) terms (the "alpha" in [2]).
+ shift: `Float`-like `Tensor` of shift terms ("mu" in
+ [Papamakarios et al. (2017)][1]).
+ log_scale: `Float`-like `Tensor` of log(scale) terms ("alpha" in
+ [Papamakarios et al. (2017)][1]).
Raises:
NotImplementedError: if rightmost dimension of `inputs` is unknown prior to
graph execution.
+
+ #### References
+
+ [1]: George Papamakarios, Theo Pavlakou, and Iain Murray. Masked
+ Autoregressive Flow for Density Estimation. In _Neural Information
+ Processing Systems_, 2017. https://arxiv.org/abs/1705.07057
"""
with ops.name_scope(name, "real_nvp_default_template"):
g is a bijection between the non-negative real numbers (R_+) and the
non-negative real numbers.
- Examples:
+ #### Examples
```python
bijector.Square().forward(x=[[1., 0], [2, 1]])
def _harmonic_number(x):
"""Compute the harmonic number from its analytic continuation.
- Derivation from [1] and Euler's constant [2].
- [1] -
- https://en.wikipedia.org/wiki/Digamma_function#Relation_to_harmonic_numbers
- [2] - https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant
-
+ Derivation from [here](
+ https://en.wikipedia.org/wiki/Digamma_function#Relation_to_harmonic_numbers)
+ and [Euler's constant](
+ https://en.wikipedia.org/wiki/Euler%E2%80%93Mascheroni_constant).
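+ Concretely, `psi(x + 1) = H(x) - euler_gamma` and `psi(1) = -euler_gamma`, so
+ `H(x) = psi(x + 1) - psi(1)`, which is what this function computes.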
Args:
x: input float.
Returns:
z: The analytic continuation of the harmonic number for the input.
-
"""
one = array_ops.ones([], dtype=x.dtype)
return math_ops.digamma(x + one) - math_ops.digamma(one)
Note: `mean_var` is updated *after* `variance_var`, i.e., `variance_var` uses
the lag-1 mean.
- For derivation justification, see equation 143 of:
- T. Finch, Feb 2009. "Incremental calculation of weighted mean and variance".
- http://people.ds.cam.ac.uk/fanf2/hermes/doc/antiforgery/stats.pdf
+ For derivation justification, see [Finch (2009; Eq. 143)][1].
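+
+ In the notation of that report, with `alpha = 1 - decay`, the updates take
+ the form below (a sketch of the math; note that the variance update uses the
+ previous, lag-`1`, mean):
+
+ ```none
+ variance_var <- (1 - alpha) * (variance_var + alpha * (value - mean_var)**2)
+ mean_var     <- mean_var + alpha * (value - mean_var)
+ ```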
Args:
mean_var: `float`-like `Variable` representing the exponentially weighted
TypeError: if `mean_var` does not have float type `dtype`.
TypeError: if `mean_var`, `variance_var`, `value`, `decay` have different
`base_dtype`.
+
+ #### References
+
+ [1]: Tony Finch. Incremental calculation of weighted mean and variance.
+ _Technical Report_, 2009.
+ http://people.ds.cam.ac.uk/fanf2/hermes/doc/antiforgery/stats.pdf
"""
with ops.name_scope(name, "assign_moving_mean_variance",
[variance_var, mean_var, value, decay]):
Note: `mean_var` is updated *after* `variance_var`, i.e., `variance_var` uses
the lag-`1` mean.
- For derivation justification, see equation 143 of:
- T. Finch, Feb 2009. "Incremental calculation of weighted mean and variance".
- http://people.ds.cam.ac.uk/fanf2/hermes/doc/antiforgery/stats.pdf
+ For derivation justification, see [Finch (2009; Eq. 143)][1].
Unlike `assign_moving_mean_variance`, this function handles
variable creation.
Raises:
TypeError: if `value_var` does not have float type `dtype`.
TypeError: if `value`, `decay` have different `base_dtype`.
+
+ #### References
+
+ [1]: Tony Finch. Incremental calculation of weighted mean and variance.
+ _Technical Report_, 2009.
+ http://people.ds.cam.ac.uk/fanf2/hermes/doc/antiforgery/stats.pdf
"""
if collections is None:
collections = [ops.GraphKeys.GLOBAL_VARIABLES]
class _DistributionShape(object):
"""Manage and manipulate `Distribution` shape.
- Terminology:
- Recall that a `Tensor` has:
- - `shape`: size of `Tensor` dimensions,
- - `ndims`: size of `shape`; number of `Tensor` dimensions,
- - `dims`: indexes into `shape`; useful for transpose, reduce.
-
- `Tensor`s sampled from a `Distribution` can be partitioned by `sample_dims`,
- `batch_dims`, and `event_dims`. To understand the semantics of these
- dimensions, consider when two of the three are fixed and the remaining
- is varied:
- - `sample_dims`: indexes independent draws from identical
- parameterizations of the `Distribution`.
- - `batch_dims`: indexes independent draws from non-identical
- parameterizations of the `Distribution`.
- - `event_dims`: indexes event coordinates from one sample.
-
- The `sample`, `batch`, and `event` dimensions constitute the entirety of a
- `Distribution` `Tensor`'s shape.
-
- The dimensions are always in `sample`, `batch`, `event` order.
-
- Purpose:
- This class partitions `Tensor` notions of `shape`, `ndims`, and `dims` into
- `Distribution` notions of `sample,` `batch,` and `event` dimensions. That
- is, it computes any of:
+ #### Terminology
- ```
- sample_shape batch_shape event_shape
- sample_dims batch_dims event_dims
- sample_ndims batch_ndims event_ndims
- ```
+ Recall that a `Tensor` has:
+ - `shape`: size of `Tensor` dimensions,
+ - `ndims`: size of `shape`; number of `Tensor` dimensions,
+ - `dims`: indexes into `shape`; useful for transpose, reduce.
+
+ `Tensor`s sampled from a `Distribution` can be partitioned by `sample_dims`,
+ `batch_dims`, and `event_dims`. To understand the semantics of these
+ dimensions, consider when two of the three are fixed and the remaining
+ is varied:
+ - `sample_dims`: indexes independent draws from identical
+ parameterizations of the `Distribution`.
+ - `batch_dims`: indexes independent draws from non-identical
+ parameterizations of the `Distribution`.
+ - `event_dims`: indexes event coordinates from one sample.
+
+ The `sample`, `batch`, and `event` dimensions constitute the entirety of a
+ `Distribution` `Tensor`'s shape.
+
+ The dimensions are always in `sample`, `batch`, `event` order.
+
+ #### Purpose
+
+ This class partitions `Tensor` notions of `shape`, `ndims`, and `dims` into
+ `Distribution` notions of `sample,` `batch,` and `event` dimensions. That
+ is, it computes any of:
+
+ ```
+ sample_shape batch_shape event_shape
+ sample_dims batch_dims event_dims
+ sample_ndims batch_ndims event_ndims
+ ```
- for a given `Tensor`, e.g., the result of
- `Distribution.sample(sample_shape=...)`.
+ for a given `Tensor`, e.g., the result of
+ `Distribution.sample(sample_shape=...)`.
- For a given `Tensor`, this class computes the above table using minimal
- information: `batch_ndims` and `event_ndims`.
+ For a given `Tensor`, this class computes the above table using minimal
+ information: `batch_ndims` and `event_ndims`.
+
+ #### Examples
+
+ We show examples of distribution shape semantics.
- Examples of `Distribution` `shape` semantics:
- Sample dimensions:
Computing summary statistics, i.e., the average is a reduction over sample
dimensions.
tf.div(1., tf.reduce_prod(x, event_dims))
```
- Examples using this class:
- Write `S, B, E` for `sample_shape`, `batch_shape`, and `event_shape`.
-
- ```python
- # 150 iid samples from one multivariate Normal with two degrees of freedom.
- mu = [0., 0]
- sigma = [[1., 0],
- [0, 1]]
- mvn = MultivariateNormal(mu, sigma)
- rand_mvn = mvn.sample(sample_shape=[3, 50])
- shaper = DistributionShape(batch_ndims=0, event_ndims=1)
- S, B, E = shaper.get_shape(rand_mvn)
- # S = [3, 50]
- # B = []
- # E = [2]
-
- # 12 iid samples from one Wishart with 2x2 events.
- sigma = [[1., 0],
- [2, 1]]
- wishart = Wishart(df=5, scale=sigma)
- rand_wishart = wishart.sample(sample_shape=[3, 4])
- shaper = DistributionShape(batch_ndims=0, event_ndims=2)
- S, B, E = shaper.get_shape(rand_wishart)
- # S = [3, 4]
- # B = []
- # E = [2, 2]
-
- # 100 iid samples from two, non-identical trivariate Normal distributions.
- mu = ... # shape(2, 3)
- sigma = ... # shape(2, 3, 3)
- X = MultivariateNormal(mu, sigma).sample(shape=[4, 25])
- # S = [4, 25]
- # B = [2]
- # E = [3]
- ```
-
- Argument Validation:
- When `validate_args=False`, checks that cannot be done during
- graph construction are performed at graph execution. This may result in a
- performance degradation because data must be switched from GPU to CPU.
-
- For example, when `validate_args=False` and `event_ndims` is a
- non-constant `Tensor`, it is checked to be a non-negative integer at graph
- execution. (Same for `batch_ndims`). Constant `Tensor`s and non-`Tensor`
- arguments are always checked for correctness since this can be done for
- "free," i.e., during graph construction.
+ We show examples using this class.
+
+ Write `S, B, E` for `sample_shape`, `batch_shape`, and `event_shape`.
+
+ ```python
+ # 150 iid samples from one multivariate Normal with two degrees of freedom.
+ mu = [0., 0]
+ sigma = [[1., 0],
+ [0, 1]]
+ mvn = MultivariateNormal(mu, sigma)
+ rand_mvn = mvn.sample(sample_shape=[3, 50])
+ shaper = DistributionShape(batch_ndims=0, event_ndims=1)
+ S, B, E = shaper.get_shape(rand_mvn)
+ # S = [3, 50]
+ # B = []
+ # E = [2]
+
+ # 12 iid samples from one Wishart with 2x2 events.
+ sigma = [[1., 0],
+ [2, 1]]
+ wishart = Wishart(df=5, scale=sigma)
+ rand_wishart = wishart.sample(sample_shape=[3, 4])
+ shaper = DistributionShape(batch_ndims=0, event_ndims=2)
+ S, B, E = shaper.get_shape(rand_wishart)
+ # S = [3, 4]
+ # B = []
+ # E = [2, 2]
+
+ # 100 iid samples from two, non-identical trivariate Normal distributions.
+ mu = ... # shape(2, 3)
+ sigma = ... # shape(2, 3, 3)
+ X = MultivariateNormal(mu, sigma).sample(shape=[4, 25])
+ # S = [4, 25]
+ # B = [2]
+ # E = [3]
+ ```
+
+ #### Argument Validation
+
+ When `validate_args=False`, checks that cannot be done during
+ graph construction are performed at graph execution. This may result in a
+ performance degradation because data must be switched from GPU to CPU.
+
+ For example, when `validate_args=False` and `event_ndims` is a
+ non-constant `Tensor`, it is checked to be a non-negative integer at graph
+ execution. (Same for `batch_ndims`). Constant `Tensor`s and non-`Tensor`
+ arguments are always checked for correctness since this can be done for
+ "free," i.e., during graph construction.
"""
def __init__(self,
The default quadrature scheme chooses `z_{N, n}` as `N` midpoints of
the quantiles of `p(z)` (generalized quantiles if `K > 2`).
- See [1] for more details.
-
- [1]. "Quadrature Compound: An approximating family of distributions"
- Joshua Dillon, Ian Langmore, arXiv preprints
- https://arxiv.org/abs/1801.03080
+ See [Dillon and Langmore (2018)][1] for more details.
#### About `Vector` distributions in TensorFlow.
is_positive_definite=True),
],
validate_args=True)
+ ```
+
+ #### References
+
+ [1]: Joshua Dillon and Ian Langmore. Quadrature Compound: An approximating
+ family of distributions. _arXiv preprint arXiv:1801.03080_, 2018.
+ https://arxiv.org/abs/1801.03080
"""
def __init__(self,