--- /dev/null
+# Regression Examples
+
+This unit provides the following short examples demonstrating how
+to implement regression with Estimators:
+
+<table>
+ <tr> <th>Example</th> <th>Demonstrates How To...</th></tr>
+
+ <tr>
+ <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/linear_regression.py">linear_regression.py</a></td>
+ <td>Use the @{tf.estimator.LinearRegressor} Estimator to train a
+ regression model on numeric data.</td>
+ </tr>
+
+ <tr>
+ <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/linear_regression_categorical.py">linear_regression_categorical.py</a></td>
+ <td>Use the @{tf.estimator.LinearRegressor} Estimator to train a
+ regression model on categorical data.</td>
+ </tr>
+
+ <tr>
+ <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/dnn_regression.py">dnn_regression.py</a></td>
+ <td>Use the @{tf.estimator.DNNRegressor} Estimator to train a
+ regression model on discrete data with a deep neural network.</td>
+ </tr>
+
+ <tr>
+ <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/custom_regression.py">custom_regression.py</a></td>
+ <td>Use @{tf.estimator.Estimator} to train a customized DNN
+ regression model.</td>
+ </tr>
+
+</table>
+
+The preceding examples rely on the following data set utility:
+
+<table>
+ <tr> <th>Utility</th> <th>Description</th></tr>
+
+ <tr>
+ <td><a href="../../examples/get_started/regression/imports85.py">imports85.py</a></td>
+ <td>This program provides utility functions that load the
+ <tt>imports85</tt> data set into formats that other TensorFlow
+ programs (for example, <tt>linear_regression.py</tt> and
+ <tt>dnn_regression.py</tt>) can use.</td>
+ </tr>
+
+
+</table>
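+
+As a rough sketch of what such a loader does (a pandas-based
+illustration, not `imports85.py`'s actual code; the column positions
+follow the UCI imports85 documentation):
+
+```python
+import pandas as pd
+
+URL = ('https://archive.ics.uci.edu/ml/machine-learning-databases/'
+       'autos/imports-85.data')
+
+# The raw file has no header row; "?" marks missing values.
+df = pd.read_csv(URL, header=None, na_values='?')
+df = df[[2, 6, 13, 24, 25]]  # make, body-style, curb-weight, highway-mpg, price
+df.columns = ['make', 'body-style', 'curb-weight', 'highway-mpg', 'price']
+df = df.dropna()
+```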
+
+
+<!--
+## Linear regression concepts
+
+If you are new to machine learning and want to learn about regression,
+watch the following video:
+
+(todo:jbgordon) Video introduction goes here.
+-->
+
+<!--
+[When MLCC becomes available externally, add links to the relevant MLCC units.]
+-->
+
+
+<a name="running"></a>
+## Running the examples
+
+You must @{$install$install TensorFlow} prior to running these examples.
+Depending on the way you've installed TensorFlow, you might also
+need to activate your TensorFlow environment. Then, do the following:
+
+1. Clone the TensorFlow repository from GitHub.
+2. `cd` to the top of the downloaded tree.
+3. Check out the branch for your current TensorFlow version: `git checkout rX.X`
+4. `cd tensorflow/examples/get_started/regression`.
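+
+Taken together, those steps look like the following sketch (substitute
+your installed TensorFlow release for `rX.X`):
+
+```bash
+git clone https://github.com/tensorflow/tensorflow.git
+cd tensorflow
+git checkout rX.X
+cd tensorflow/examples/get_started/regression
+```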
+
+You can now run any of the example TensorFlow programs in the
+`tensorflow/examples/get_started/regression` directory as you
+would run any Python program:
+
+```bash
+python linear_regression.py
+```
+
+During training, these programs output the following information:
+
+* The name of the checkpoint directory, which is important for TensorBoard.
+* The training loss after every 100 iterations, which helps you
+ determine whether the model is converging.
+
+For example, here's some possible output for the `linear_regression.py`
+program:
+
+```none
+INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpAObiz9/model.ckpt.
+INFO:tensorflow:loss = 161.308, step = 1
+INFO:tensorflow:global_step/sec: 1557.24
+INFO:tensorflow:loss = 15.7937, step = 101 (0.065 sec)
+INFO:tensorflow:global_step/sec: 1529.17
+INFO:tensorflow:loss = 12.1988, step = 201 (0.065 sec)
+INFO:tensorflow:global_step/sec: 1663.86
+...
+INFO:tensorflow:loss = 6.99378, step = 901 (0.058 sec)
+INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmpAObiz9/model.ckpt.
+INFO:tensorflow:Loss for final step: 5.12413.
+```
+
+
+<a name="basic"></a>
+## linear_regression.py
+
+`linear_regression.py` trains a model that predicts the price of a car from
+two numerical features.
+
+<table>
+ <tr>
+ <td>Estimator</td>
+ <td><tt>LinearRegressor</tt>, which is a pre-made Estimator for linear
+ regression.</td>
+ </tr>
+
+ <tr>
+ <td>Features</td>
+ <td>Numerical: <tt>curb-weight</tt> and <tt>highway-mpg</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Label</td>
+ <td>Numerical: <tt>price</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Algorithm</td>
+ <td>Linear regression.</td>
+ </tr>
+</table>
+
+After training the model, the program concludes by outputting predicted
+car prices for two car models.
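+
+A minimal sketch of the idea (not the example's exact code;
+`train_input_fn` is assumed to yield the two features and the
+<tt>price</tt> label):
+
+```python
+import tensorflow as tf
+
+# Two numeric features from the imports85 data set.
+feature_columns = [
+    tf.feature_column.numeric_column('curb-weight'),
+    tf.feature_column.numeric_column('highway-mpg'),
+]
+
+model = tf.estimator.LinearRegressor(feature_columns=feature_columns)
+model.train(input_fn=train_input_fn, steps=1000)  # train_input_fn assumed defined
+```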
+
+
+
+<a name="categorical"></a>
+## linear_regression_categorical.py
+
+This program illustrates ways to represent categorical features. It
+also demonstrates how to train a linear model based on a mix of
+categorical and numerical features.
+
+<table>
+ <tr>
+ <td>Estimator</td>
+ <td><tt>LinearRegressor</tt>, which is a pre-made Estimator for linear
+ regression. </td>
+ </tr>
+
+ <tr>
+ <td>Features</td>
+ <td>Categorical: <tt>body-style</tt> and <tt>make</tt>.<br/>
+ Numerical: <tt>curb-weight</tt> and <tt>highway-mpg</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Label</td>
+ <td>Numerical: <tt>price</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Algorithm</td>
+ <td>Linear regression.</td>
+ </tr>
+</table>
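+
+As a hedged sketch (the vocabulary values here are illustrative, not the
+example's exact code), categorical features can be represented with
+`tf.feature_column`, and a `LinearRegressor` can consume the resulting
+sparse columns directly:
+
+```python
+import tensorflow as tf
+
+# One categorical column with an explicit vocabulary, one hashed.
+body_style = tf.feature_column.categorical_column_with_vocabulary_list(
+    'body-style', ['hardtop', 'wagon', 'sedan', 'hatchback', 'convertible'])
+make = tf.feature_column.categorical_column_with_hash_bucket(
+    'make', hash_bucket_size=50)
+
+model = tf.estimator.LinearRegressor(feature_columns=[
+    tf.feature_column.numeric_column('curb-weight'),
+    tf.feature_column.numeric_column('highway-mpg'),
+    body_style,  # linear models accept sparse categorical columns as-is
+    make,
+])
+```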
+
+
+<a name="dnn"></a>
+## dnn_regression.py
+
+Like `linear_regression_categorical.py`, the `dnn_regression.py` example
+trains a model that predicts the price of a car from two features.
+Unlike `linear_regression_categorical.py`, the `dnn_regression.py` example uses
+a deep neural network to train the model. Both examples rely on the same
+features; `dnn_regression.py` demonstrates how to treat categorical features
+in a deep neural network.
+
+<table>
+ <tr>
+ <td>Estimator</td>
+ <td><tt>DNNRegressor</tt>, which is a pre-made Estimator for
+ regression that relies on a deep neural network. The
+ <tt>hidden_units</tt> parameter defines the topology of the network.</td>
+ </tr>
+
+ <tr>
+ <td>Features</td>
+ <td>Categorical: <tt>body-style</tt> and <tt>make</tt>.<br/>
+ Numerical: <tt>curb-weight</tt> and <tt>highway-mpg</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Label</td>
+ <td>Numerical: <tt>price</tt>.</td>
+ </tr>
+
+ <tr>
+ <td>Algorithm</td>
+ <td>Regression through a deep neural network.</td>
+ </tr>
+</table>
+
+After printing loss values, the program outputs the mean squared error
+on a test set.
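+
+Unlike linear models, deep networks cannot consume sparse categorical
+columns directly. As a minimal sketch (the vocabulary and layer sizes
+are illustrative assumptions), each categorical column is wrapped in an
+indicator or embedding column before being handed to the `DNNRegressor`:
+
+```python
+import tensorflow as tf
+
+body_style = tf.feature_column.categorical_column_with_vocabulary_list(
+    'body-style', ['hardtop', 'wagon', 'sedan', 'hatchback', 'convertible'])
+make = tf.feature_column.categorical_column_with_hash_bucket(
+    'make', hash_bucket_size=50)
+
+feature_columns = [
+    tf.feature_column.numeric_column('curb-weight'),
+    tf.feature_column.numeric_column('highway-mpg'),
+    tf.feature_column.indicator_column(body_style),        # one-hot encoding
+    tf.feature_column.embedding_column(make, dimension=3),  # dense embedding
+]
+
+model = tf.estimator.DNNRegressor(
+    hidden_units=[20, 20], feature_columns=feature_columns)
+```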
+
+
+<a name="dnn"></a>
+## custom_regression.py
+
+The `custom_regression.py` example also trains a model that predicts the price
+of a car from a mix of numeric and categorical input features, described by
+`feature_columns`. Unlike `linear_regression_categorical.py` and
+`dnn_regression.py`, this example does not use a pre-made Estimator but instead
+defines a custom model using the base @{tf.estimator.Estimator$`Estimator`}
+class. The custom model is quite similar to the one defined by
+`dnn_regression.py`.
+
+The custom model is defined by the `model_fn` argument to the `Estimator`
+constructor. The customization is made more reusable through the `params`
+dictionary, which is passed to the `model_fn` each time the `model_fn` is
+called.
+
+The `model_fn` returns an
+@{tf.estimator.EstimatorSpec$`EstimatorSpec`}, which is a simple structure
+telling the `Estimator` which operations should be run to accomplish
+various tasks.
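+
+As a rough sketch of the pattern (assuming a `feature_columns` list like
+the one in the earlier examples; this is not `custom_regression.py`
+itself):
+
+```python
+import tensorflow as tf
+
+def my_model_fn(features, labels, mode, params):
+    # Build a small DNN from the feature columns passed in via params.
+    net = tf.feature_column.input_layer(features, params['feature_columns'])
+    for units in params['hidden_units']:
+        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)
+    predictions = tf.squeeze(tf.layers.dense(net, units=1), axis=1)
+
+    if mode == tf.estimator.ModeKeys.PREDICT:
+        return tf.estimator.EstimatorSpec(mode, predictions={'price': predictions})
+
+    loss = tf.losses.mean_squared_error(labels=labels, predictions=predictions)
+    if mode == tf.estimator.ModeKeys.EVAL:
+        return tf.estimator.EstimatorSpec(mode, loss=loss)
+
+    optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)
+    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
+    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
+
+model = tf.estimator.Estimator(
+    model_fn=my_model_fn,
+    params={'feature_columns': feature_columns, 'hidden_units': [20, 20]})
+```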
--- /dev/null
+# Checkpoints
+
+This document examines how to save and restore TensorFlow models built with
+Estimators. TensorFlow provides two model formats:
+
+* checkpoints, a format that depends on the code that created
+ the model.
+* SavedModel, a format that is independent of the code that created
+ the model.
+
+This document focuses on checkpoints. For details on SavedModel, see the
+@{$saved_model$Saving and Restoring} chapter of the
+*TensorFlow Programmer's Guide*.
+
+
+## Sample code
+
+This document relies on the same
+[Iris classification example](https://github.com/tensorflow/models/blob/master/samples/core/get_started/premade_estimator.py) detailed in @{$premade_estimators$Getting Started with TensorFlow}.
+To download and access the example, invoke the following two commands:
+
+```shell
+git clone https://github.com/tensorflow/models/
+cd models/samples/core/get_started
+```
+
+Most of the code snippets in this document are minor variations
+on `premade_estimator.py`.
+
+
+## Saving partially-trained models
+
+Estimators automatically write the following to disk:
+
+* **checkpoints**, which are versions of the model created during training.
+* **event files**, which contain information that
+ [TensorBoard](https://developers.google.com/machine-learning/glossary/#TensorBoard)
+ uses to create visualizations.
+
+To specify the top-level directory in which the Estimator stores its
+information, assign a value to the optional `model_dir` argument of any
+Estimator's constructor. For example, the following code sets the `model_dir`
+argument to the `models/iris` directory:
+
+```python
+classifier = tf.estimator.DNNClassifier(
+ feature_columns=my_feature_columns,
+ hidden_units=[10, 10],
+ n_classes=3,
+ model_dir='models/iris')
+```
+
+Suppose you call the Estimator's `train` method. For example:
+
+
+```python
+classifier.train(
+    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
+ steps=200)
+```
+
+As the following diagram suggests, the first call to `train`
+adds checkpoints and other files to the `model_dir` directory:
+
+<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/first_train_calls.png">
+</div>
+<div style="text-align: center">
+The first call to train().
+</div>
+
+
+To see the objects in the created `model_dir` directory on a
+UNIX-based system, just call `ls` as follows:
+
+```none
+$ ls -1 models/iris
+checkpoint
+events.out.tfevents.timestamp.hostname
+graph.pbtxt
+model.ckpt-1.data-00000-of-00001
+model.ckpt-1.index
+model.ckpt-1.meta
+model.ckpt-200.data-00000-of-00001
+model.ckpt-200.index
+model.ckpt-200.meta
+```
+
+The preceding `ls` command shows that the Estimator created checkpoints
+at steps 1 (the start of training) and 200 (the end of training).
+
+
+### Default checkpoint directory
+
+If you don't specify `model_dir` in an Estimator's constructor, the Estimator
+writes checkpoint files to a temporary directory chosen by Python's
+[tempfile.mkdtemp](https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp)
+function. For example, the following Estimator constructor does *not* specify
+the `model_dir` argument:
+
+```python
+classifier = tf.estimator.DNNClassifier(
+ feature_columns=my_feature_columns,
+ hidden_units=[10, 10],
+ n_classes=3)
+
+print(classifier.model_dir)
+```
+
+The `tempfile.mkdtemp` function picks a secure, temporary directory
+appropriate for your operating system. For example, a typical temporary
+directory on macOS might be something like the following:
+
+```none
+/var/folders/0s/5q9kfzfj3gx2knj0vj8p68yc00dhcr/T/tmpYm1Rwa
+```
+
+### Checkpointing frequency
+
+By default, the Estimator saves
+[checkpoints](https://developers.google.com/machine-learning/glossary/#checkpoint)
+in the `model_dir` according to the following schedule:
+
+* Writes a checkpoint every 10 minutes (600 seconds).
+* Writes a checkpoint when the `train` method starts (first iteration)
+ and completes (final iteration).
+* Retains only the 5 most recent checkpoints in the directory.
+
+You may alter the default schedule by taking the following steps:
+
+1. Create a @{tf.estimator.RunConfig$`RunConfig`} object that defines the
+ desired schedule.
+2. When instantiating the Estimator, pass that `RunConfig` object to the
+ Estimator's `config` argument.
+
+For example, the following code changes the checkpointing schedule to every
+20 minutes and retains the 10 most recent checkpoints:
+
+```python
+my_checkpointing_config = tf.estimator.RunConfig(
+ save_checkpoints_secs = 20*60, # Save checkpoints every 20 minutes.
+ keep_checkpoint_max = 10, # Retain the 10 most recent checkpoints.
+)
+
+classifier = tf.estimator.DNNClassifier(
+ feature_columns=my_feature_columns,
+ hidden_units=[10, 10],
+ n_classes=3,
+ model_dir='models/iris',
+ config=my_checkpointing_config)
+```
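+
+If you would rather checkpoint by step count than by wall-clock time,
+`RunConfig` also accepts a `save_checkpoints_steps` argument (set one of
+the two, not both). For example:
+
+```python
+my_checkpointing_config = tf.estimator.RunConfig(
+    save_checkpoints_steps=1000,  # Save a checkpoint every 1,000 steps.
+    keep_checkpoint_max=10,       # Retain the 10 most recent checkpoints.
+)
+```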
+
+## Restoring your model
+
+The first time you call an Estimator's `train` method, TensorFlow saves a
+checkpoint to the `model_dir`. Each subsequent call to the Estimator's
+`train`, `evaluate`, or `predict` method causes the following:
+
+1. The Estimator builds the model's
+ [graph](https://developers.google.com/machine-learning/glossary/#graph)
+ by running the `model_fn()`. (For details on the `model_fn()`, see
+ @{$custom_estimators$Creating Custom Estimators}.)
+2. The Estimator initializes the weights of the new model from the data
+ stored in the most recent checkpoint.
+
+In other words, as the following illustration suggests, once checkpoints
+exist, TensorFlow rebuilds the model each time you call `train()`,
+`evaluate()`, or `predict()`.
+
+<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
+<img style="width:100%" src="../images/subsequent_calls.png">
+</div>
+<div style="text-align: center">
+Subsequent calls to train(), evaluate(), or predict()
+</div>
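+
+For example, the following sketch (using the `eval_input_fn` helper that
+accompanies `premade_estimator.py`) rebuilds the model and restores the
+most recent weights before evaluating:
+
+```python
+# Checkpoints already exist in model_dir, so this call restores them.
+eval_result = classifier.evaluate(
+    input_fn=lambda: eval_input_fn(test_x, test_y, batch_size=100))
+print(eval_result)
+```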
+
+
+### Avoiding a bad restoration
+
+Restoring a model's state from a checkpoint only works if the model
+and checkpoint are compatible. For example, suppose you trained a
+`DNNClassifier` Estimator containing two hidden layers,
+each having 10 nodes:
+
+```python
+classifier = tf.estimator.DNNClassifier(
+ feature_columns=feature_columns,
+ hidden_units=[10, 10],
+ n_classes=3,
+ model_dir='models/iris')
+
+classifier.train(
+    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
+ steps=200)
+```
+
+After training (and, therefore, after creating checkpoints in `models/iris`),
+imagine that you changed the number of neurons in each hidden layer from 10 to
+20 and then attempted to retrain the model:
+
+```python
+classifier2 = tf.estimator.DNNClassifier(
+ feature_columns=my_feature_columns,
+ hidden_units=[20, 20], # Change the number of neurons in the model.
+ n_classes=3,
+ model_dir='models/iris')
+
+classifier2.train(
+    input_fn=lambda: train_input_fn(train_x, train_y, batch_size=100),
+ steps=200)
+```
+
+Since the state in the checkpoint is incompatible with the model described
+in `classifier2`, retraining fails with the following error:
+
+```None
+...
+InvalidArgumentError (see above for traceback): tensor_name =
+dnn/hiddenlayer_1/bias/t_0/Adagrad; shape in shape_and_slice spec [10]
+does not match the shape stored in checkpoint: [20]
+```
+
+To run experiments in which you train and compare slightly different
+versions of a model, save a copy of the code that created each
+`model_dir`, possibly by creating a separate git branch for each version.
+This separation will keep your checkpoints recoverable.
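+
+For example, a hypothetical layout that keeps two variants separate:
+
+```python
+# Each variant gets its own model_dir, so checkpoints never collide.
+narrow = tf.estimator.DNNClassifier(
+    feature_columns=my_feature_columns, hidden_units=[10, 10],
+    n_classes=3, model_dir='models/iris_10x10')
+wide = tf.estimator.DNNClassifier(
+    feature_columns=my_feature_columns, hidden_units=[20, 20],
+    n_classes=3, model_dir='models/iris_20x20')
+```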
+
+## Summary
+
+Checkpoints provide an easy automatic mechanism for saving and restoring
+models created by Estimators.
+
+See the @{$saved_model$Saving and Restoring}
+chapter of the *TensorFlow Programmer's Guide* for details on:
+
+* Saving and restoring models using low-level TensorFlow APIs.
+* Exporting and importing models in the SavedModel format, which is a
+ language-neutral, recoverable serialization format.
+++ /dev/null
-# Regression Examples
-
-This unit provides the following short examples demonstrating how
-to implement regression in Estimators:
-
-<table>
- <tr> <th>Example</th> <th>Demonstrates How To...</th></tr>
-
- <tr>
- <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/linear_regression.py">linear_regression.py</a></td>
- <td>Use the @{tf.estimator.LinearRegressor} Estimator to train a
- regression model on numeric data.</td>
- </tr>
-
- <tr>
- <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/linear_regression_categorical.py">linear_regression_categorical.py</a></td>
- <td>Use the @{tf.estimator.LinearRegressor} Estimator to train a
- regression model on categorical data.</td>
- </tr>
-
- <tr>
- <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/dnn_regression.py">dnn_regression.py</a></td>
- <td>Use the @{tf.estimator.DNNRegressor} Estimator to train a
- regression model on discrete data with a deep neural network.</td>
- </tr>
-
- <tr>
- <td><a href="https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/custom_regression.py">custom_regression.py</a></td>
- <td>Use @{tf.estimator.Estimator} to train a customized dnn
- regression model.</td>
- </tr>
-
-</table>
-
-The preceding examples rely on the following data set utility:
-
-<table>
- <tr> <th>Utility</th> <th>Description</th></tr>
-
- <tr>
- <td><a href="../../examples/get_started/regression/imports85.py">imports85.py</a></td>
- <td>This program provides utility functions that load the
- <tt>imports85</tt> data set into formats that other TensorFlow
- programs (for example, <tt>linear_regression.py</tt> and
- <tt>dnn_regression.py</tt>) can use.</td>
- </tr>
-
-
-</table>
-
-
-<!--
-## Linear regression concepts
-
-If you are new to machine learning and want to learn about regression,
-watch the following video:
-
-(todo:jbgordon) Video introduction goes here.
--->
-
-<!--
-[When MLCC becomes available externally, add links to the relevant MLCC units.]
--->
-
-
-<a name="running"></a>
-## Running the examples
-
-You must @{$install$install TensorFlow} prior to running these examples.
-Depending on the way you've installed TensorFlow, you might also
-need to activate your TensorFlow environment. Then, do the following:
-
-1. Clone the TensorFlow repository from github.
-2. `cd` to the top of the downloaded tree.
-3. Check out the branch for you current tensorflow version: `git checkout rX.X`
-4. `cd tensorflow/examples/get_started/regression`.
-
-You can now run any of the example TensorFlow programs in the
-`tensorflow/examples/get_started/regression` directory as you
-would run any Python program:
-
-```bsh
-python linear_regressor.py
-```
-
-During training, all three programs output the following information:
-
-* The name of the checkpoint directory, which is important for TensorBoard.
-* The training loss after every 100 iterations, which helps you
- determine whether the model is converging.
-
-For example, here's some possible output for the `linear_regressor.py`
-program:
-
-``` None
-INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmpAObiz9/model.ckpt.
-INFO:tensorflow:loss = 161.308, step = 1
-INFO:tensorflow:global_step/sec: 1557.24
-INFO:tensorflow:loss = 15.7937, step = 101 (0.065 sec)
-INFO:tensorflow:global_step/sec: 1529.17
-INFO:tensorflow:loss = 12.1988, step = 201 (0.065 sec)
-INFO:tensorflow:global_step/sec: 1663.86
-...
-INFO:tensorflow:loss = 6.99378, step = 901 (0.058 sec)
-INFO:tensorflow:Saving checkpoints for 1000 into /tmp/tmpAObiz9/model.ckpt.
-INFO:tensorflow:Loss for final step: 5.12413.
-```
-
-
-<a name="basic"></a>
-## linear_regressor.py
-
-`linear_regressor.py` trains a model that predicts the price of a car from
-two numerical features.
-
-<table>
- <tr>
- <td>Estimator</td>
- <td><tt>LinearRegressor</tt>, which is a pre-made Estimator for linear
- regression.</td>
- </tr>
-
- <tr>
- <td>Features</td>
- <td>Numerical: <tt>body-style</tt> and <tt>make</tt>.</td>
- </tr>
-
- <tr>
- <td>Label</td>
- <td>Numerical: <tt>price</tt>
- </tr>
-
- <tr>
- <td>Algorithm</td>
- <td>Linear regression.</td>
- </tr>
-</table>
-
-After training the model, the program concludes by outputting predicted
-car prices for two car models.
-
-
-
-<a name="categorical"></a>
-## linear_regression_categorical.py
-
-This program illustrates ways to represent categorical features. It
-also demonstrates how to train a linear model based on a mix of
-categorical and numerical features.
-
-<table>
- <tr>
- <td>Estimator</td>
- <td><tt>LinearRegressor</tt>, which is a pre-made Estimator for linear
- regression. </td>
- </tr>
-
- <tr>
- <td>Features</td>
- <td>Categorical: <tt>curb-weight</tt> and <tt>highway-mpg</tt>.<br/>
- Numerical: <tt>body-style</tt> and <tt>make</tt>.</td>
- </tr>
-
- <tr>
- <td>Label</td>
- <td>Numerical: <tt>price</tt>.</td>
- </tr>
-
- <tr>
- <td>Algorithm</td>
- <td>Linear regression.</td>
- </tr>
-</table>
-
-
-<a name="dnn"></a>
-## dnn_regression.py
-
-Like `linear_regression_categorical.py`, the `dnn_regression.py` example
-trains a model that predicts the price of a car from two features.
-Unlike `linear_regression_categorical.py`, the `dnn_regression.py` example uses
-a deep neural network to train the model. Both examples rely on the same
-features; `dnn_regression.py` demonstrates how to treat categorical features
-in a deep neural network.
-
-<table>
- <tr>
- <td>Estimator</td>
- <td><tt>DNNRegressor</tt>, which is a pre-made Estimator for
- regression that relies on a deep neural network. The
- `hidden_units` parameter defines the topography of the network.</td>
- </tr>
-
- <tr>
- <td>Features</td>
- <td>Categorical: <tt>curb-weight</tt> and <tt>highway-mpg</tt>.<br/>
- Numerical: <tt>body-style</tt> and <tt>make</tt>.</td>
- </tr>
-
- <tr>
- <td>Label</td>
- <td>Numerical: <tt>price</tt>.</td>
- </tr>
-
- <tr>
- <td>Algorithm</td>
- <td>Regression through a deep neural network.</td>
- </tr>
-</table>
-
-After printing loss values, the program outputs the Mean Square Error
-on a test set.
-
-
-<a name="dnn"></a>
-## custom_regression.py
-
-The `custom_regression.py` example also trains a model that predicts the price
-of a car based on mixed real-valued and categorical input features, described by
-feature_columns. Unlike `linear_regression_categorical.py`, and
-`dnn_regression.py` this example does not use a pre-made estimator, but defines
-a custom model using the base @{tf.estimator.Estimator$`Estimator`} class. The
-custom model is quite similar to the model defined by `dnn_regression.py`.
-
-The custom model is defined by the `model_fn` argument to the constructor. The
-customization is made more reusable through `params` dictionary, which is later
-passed through to the `model_fn` when the `model_fn` is called.
-
-The `model_fn` returns an
-@{tf.estimator.EstimatorSpec$`EstimatorSpec`} which is a simple structure
-indicating to the `Estimator` which operations should be run to accomplish
-varions tasks.
+++ /dev/null
-# Logging and Monitoring Basics with tf.contrib.learn
-
-When training a model, it’s often valuable to track and evaluate progress in
-real time. In this tutorial, you’ll learn how to use TensorFlow’s logging
-capabilities and the `Monitor` API to audit the in-progress training of a neural
-network classifier for categorizing irises. This tutorial builds on the code
-developed in @{$estimator$tf.estimator Quickstart} so if you
-haven't yet completed that tutorial, you may want to explore it first,
-especially if you're looking for an intro/refresher on tf.contrib.learn basics.
-
-## Setup {#setup}
-
-For this tutorial, you'll be building upon the following code from
-@{$estimator$tf.estimator Quickstart}:
-
-```python
-from __future__ import absolute_import
-from __future__ import division
-from __future__ import print_function
-
-import os
-
-import numpy as np
-import tensorflow as tf
-
-# Data sets
-IRIS_TRAINING = os.path.join(os.path.dirname(__file__), "iris_training.csv")
-IRIS_TEST = os.path.join(os.path.dirname(__file__), "iris_test.csv")
-
-def main(unused_argv):
- # Load datasets.
- training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
- filename=IRIS_TRAINING, target_dtype=np.int, features_dtype=np.float32)
- test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
- filename=IRIS_TEST, target_dtype=np.int, features_dtype=np.float32)
-
- # Specify that all features have real-value data
- feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
-
- # Build 3 layer DNN with 10, 20, 10 units respectively.
- classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
- hidden_units=[10, 20, 10],
- n_classes=3,
- model_dir="/tmp/iris_model")
-
- # Fit model.
- classifier.fit(x=training_set.data,
- y=training_set.target,
- steps=2000)
-
- # Evaluate accuracy.
- accuracy_score = classifier.evaluate(x=test_set.data,
- y=test_set.target)["accuracy"]
- print('Accuracy: {0:f}'.format(accuracy_score))
-
- # Classify two new flower samples.
- new_samples = np.array(
- [[6.4, 3.2, 4.5, 1.5], [5.8, 3.1, 5.0, 1.7]], dtype=float)
- y = list(classifier.predict(new_samples, as_iterable=True))
- print('Predictions: {}'.format(str(y)))
-
-if __name__ == "__main__":
- tf.app.run()
-```
-
-Copy the above code into a file, and download the corresponding
-[training](http://download.tensorflow.org/data/iris_training.csv) and
-[test](http://download.tensorflow.org/data/iris_test.csv) data sets to the same
-directory.
-
-In the following sections, you'll progressively make updates to the above code
-to add logging and monitoring capabilities. Final code incorporating all updates
-is [available for download
-here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/monitors/iris_monitors.py).
-
-## Overview
-
-The @{$estimator$tf.estimator Quickstart tutorial} walked through
-how to implement a neural net classifier to categorize iris examples into one of
-three species.
-
-But when [the code](#setup) from this tutorial is run, the output contains no
-logging tracking how model training is progressing—only the results of the
-`print` statements that were included:
-
-```none
-Accuracy: 0.933333
-Predictions: [1 2]
-```
-
-Without any logging, model training feels like a bit of a black box; you can't
-see what's happening as TensorFlow steps through gradient descent, get a sense
-of whether the model is converging appropriately, or audit to determine whether
-[early stopping](https://en.wikipedia.org/wiki/Early_stopping) might be
-appropriate.
-
-One way to address this problem would be to split model training into multiple
-`fit` calls with smaller numbers of steps in order to evaluate accuracy more
-progressively. However, this is not recommended practice, as it greatly slows
-down model training. Fortunately, tf.contrib.learn offers another solution: a
-@{tf.contrib.learn.monitors$Monitor API} designed to help
-you log metrics and evaluate your model while training is in progress. In the
-following sections, you'll learn how to enable logging in TensorFlow, set up a
-ValidationMonitor to do streaming evaluations, and visualize your metrics using
-TensorBoard.
-
-## Enabling Logging with TensorFlow
-
-TensorFlow uses five different levels for log messages. In order of ascending
-severity, they are `DEBUG`, `INFO`, `WARN`, `ERROR`, and `FATAL`. When you
-configure logging at any of these levels, TensorFlow will output all log
-messages corresponding to that level and all levels of higher severity. For
-example, if you set a logging level of `ERROR`, you'll get log output containing
-`ERROR` and `FATAL` messages, and if you set a level of `DEBUG`, you'll get log
-messages from all five levels.
-
-By default, TensorFlow is configured at a logging level of `WARN`, but when
-tracking model training, you'll want to adjust the level to `INFO`, which will
-provide additional feedback as `fit` operations are in progress.
-
-Add the following line to the beginning of your code (right after your
-`import`s):
-
-```python
-tf.logging.set_verbosity(tf.logging.INFO)
-```
-
-Now when you run the code, you'll see additional log output like the following:
-
-```none
-INFO:tensorflow:loss = 1.18812, step = 1
-INFO:tensorflow:loss = 0.210323, step = 101
-INFO:tensorflow:loss = 0.109025, step = 201
-```
-
-With `INFO`-level logging, tf.contrib.learn automatically outputs [training-loss
-metrics](https://en.wikipedia.org/wiki/Loss_function) to stderr after every 100
-steps.
-
-## Configuring a ValidationMonitor for Streaming Evaluation
-
-Logging training loss is helpful to get a sense whether your model is
-converging, but what if you want further insight into what's happening during
-training? tf.contrib.learn provides several high-level `Monitor`s you can attach
-to your `fit` operations to further track metrics and/or debug lower-level
-TensorFlow operations during model training, including:
-
-Monitor | Description
-------------------- | -----------
-`CaptureVariable` | Saves a specified variable's values into a collection at every _n_ steps of training
-`PrintTensor` | Logs a specified tensor's values at every _n_ steps of training
-`SummarySaver` | Saves @{tf.Summary} [protocol buffers](https://developers.google.com/protocol-buffers/) for a given tensor using a @{tf.summary.FileWriter} at every _n_ steps of training
-`ValidationMonitor` | Logs a specified set of evaluation metrics at every _n_ steps of training, and, if desired, implements early stopping under certain conditions
-
-### Evaluating Every *N* Steps
-
-For the iris neural network classifier, while logging training loss, you might
-also want to simultaneously evaluate against test data to see how well the model
-is generalizing. You can accomplish this by configuring a `ValidationMonitor`
-with the test data (`test_set.data` and `test_set.target`), and setting how
-often to evaluate with `every_n_steps`. The default value of `every_n_steps` is
-`100`; here, set `every_n_steps` to `50` to evaluate after every 50 steps of
-model training:
-
-```python
-validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
- test_set.data,
- test_set.target,
- every_n_steps=50)
-```
-
-Place this code right before the line instantiating the `classifier`.
-
-`ValidationMonitor`s rely on saved checkpoints to perform evaluation operations,
-so you'll want to modify instantiation of the `classifier` to add a
-@{tf.contrib.learn.RunConfig} that includes
-`save_checkpoints_secs`, which specifies how many seconds should elapse between
-checkpoint saves during training. Because the iris data set is quite small, and
-thus trains quickly, it makes sense to set `save_checkpoints_secs` to 1 (saving
-a checkpoint every second) to ensure a sufficient number of checkpoints:
-
-```python
-classifier = tf.contrib.learn.DNNClassifier(
- feature_columns=feature_columns,
- hidden_units=[10, 20, 10],
- n_classes=3,
- model_dir="/tmp/iris_model",
- config=tf.contrib.learn.RunConfig(save_checkpoints_secs=1))
-```
-
-NOTE: The `model_dir` parameter specifies an explicit directory
-(`/tmp/iris_model`) for model data to be stored; this directory path will be
-easier to reference later on than an autogenerated one. Each time you run the
-code, any existing data in `/tmp/iris_model` will be loaded, and model training
-will continue where it left off in the last run (e.g., running the script twice
-in succession will execute 4000 steps during training—2000 during each
-`fit` operation). To start over model training from scratch, delete
-`/tmp/iris_model` before running the code.
-
-Finally, to attach your `validation_monitor`, update the `fit` call to include a
-`monitors` param, which takes a list of all monitors to run during model
-training:
-
-```python
-classifier.fit(x=training_set.data,
- y=training_set.target,
- steps=2000,
- monitors=[validation_monitor])
-```
-
-Now, when you rerun the code, you should see validation metrics in your log
-output, e.g.:
-
-```none
-INFO:tensorflow:Validation (step 50): loss = 1.71139, global_step = 0, accuracy = 0.266667
-...
-INFO:tensorflow:Validation (step 300): loss = 0.0714158, global_step = 268, accuracy = 0.966667
-...
-INFO:tensorflow:Validation (step 1750): loss = 0.0574449, global_step = 1729, accuracy = 0.966667
-```
-
-### Customizing the Evaluation Metrics with MetricSpec
-
-By default, if no evaluation metrics are specified, `ValidationMonitor` will log
-both [loss](https://en.wikipedia.org/wiki/Loss_function) and accuracy, but you
-can customize the list of metrics that will be run every 50 steps. To specify
-the exact metrics you'd like to run in each evaluation pass, you can add a
-`metrics` param to the `ValidationMonitor` constructor. `metrics` takes a dict
-of key/value pairs, where each key is the name you'd like logged for the metric,
-and the corresponding value is a
-[`MetricSpec`](https://www.tensorflow.org/code/tensorflow/contrib/learn/python/learn/metric_spec.py)
-object.
-
-The `MetricSpec` constructor accepts four parameters:
-
-* `metric_fn`. The function that calculates and returns the value of a metric.
- This can be a predefined function available in the
- @{tf.contrib.metrics} module, such as
- @{tf.contrib.metrics.streaming_precision} or
- @{tf.contrib.metrics.streaming_recall}.
-
- Alternatively, you can define your own custom metric function, which must
- take `predictions` and `labels` tensors as arguments (a `weights` argument
- can also optionally be supplied). The function must return the value of the
- metric in one of two formats:
-
- * A single tensor
- * A pair of ops `(value_op, update_op)`, where `value_op` returns the
- metric value and `update_op` performs a corresponding operation to
- update internal model state.
-
-* `prediction_key`. The key of the tensor containing the predictions returned
- by the model. This argument may be omitted if the model returns either a
- single tensor or a dict with a single entry. For a `DNNClassifier` model,
- class predictions will be returned in a tensor with the key
- @{tf.contrib.learn.PredictionKey.CLASSES}.
-
-* `label_key`. The key of the tensor containing the labels returned by the
- model, as specified by the model's @{$input_fn$`input_fn`}. As
- with `prediction_key`, this argument may be omitted if the `input_fn`
- returns either a single tensor or a dict with a single entry. In the iris
- example in this tutorial, the `DNNClassifier` does not have an `input_fn`
- (`x`,`y` data is passed directly to `fit`), so it's not necessary to provide
- a `label_key`.
-
-* `weights_key`. *Optional*. The key of the tensor (returned by the
- @{$input_fn$`input_fn`}) containing weights inputs for the
- `metric_fn`.
-
-The following code creates a `validation_metrics` dict that defines three
-metrics to log during model evaluation:
-
-* `"accuracy"`, using @{tf.contrib.metrics.streaming_accuracy}
- as the `metric_fn`
-* `"precision"`, using @{tf.contrib.metrics.streaming_precision}
- as the `metric_fn`
-* `"recall"`, using @{tf.contrib.metrics.streaming_recall}
- as the `metric_fn`
-
-```python
-validation_metrics = {
- "accuracy":
- tf.contrib.learn.MetricSpec(
- metric_fn=tf.contrib.metrics.streaming_accuracy,
- prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
- "precision":
- tf.contrib.learn.MetricSpec(
- metric_fn=tf.contrib.metrics.streaming_precision,
- prediction_key=tf.contrib.learn.PredictionKey.CLASSES),
- "recall":
- tf.contrib.learn.MetricSpec(
- metric_fn=tf.contrib.metrics.streaming_recall,
- prediction_key=tf.contrib.learn.PredictionKey.CLASSES)
-}
-```
-
-Add the above code before the `ValidationMonitor` constructor. Then revise the
-`ValidationMonitor` constructor as follows to add a `metrics` parameter to log
-the accuracy, precision, and recall metrics specified in `validation_metrics`
-(loss is always logged, and doesn't need to be explicitly specified):
-
-```python
-validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
- test_set.data,
- test_set.target,
- every_n_steps=50,
- metrics=validation_metrics)
-```
-
-Rerun the code, and you should see precision and recall included in your log
-output, e.g.:
-
-```none
-INFO:tensorflow:Validation (step 50): recall = 0.0, loss = 1.20626, global_step = 1, precision = 0.0, accuracy = 0.266667
-...
-INFO:tensorflow:Validation (step 600): recall = 1.0, loss = 0.0530696, global_step = 571, precision = 1.0, accuracy = 0.966667
-...
-INFO:tensorflow:Validation (step 1500): recall = 1.0, loss = 0.0617403, global_step = 1452, precision = 1.0, accuracy = 0.966667
-```
-
-### Early Stopping with ValidationMonitor
-
-Note that in the above log output, by step 600, the model has already achieved
-precision and recall rates of 1.0. This raises the question as to whether model
-training could benefit from
-[early stopping](https://en.wikipedia.org/wiki/Early_stopping).
-
-In addition to logging eval metrics, `ValidationMonitor`s make it easy to
-implement early stopping when specified conditions are met, via three params:
-
-| Param | Description |
-| -------------------------------- | ----------------------------------------- |
-| `early_stopping_metric` | Metric that triggers early stopping |
-: : (e.g., loss or accuracy) under conditions :
-: : specified in `early_stopping_rounds` and :
-: : `early_stopping_metric_minimize`. Default :
-: : is `"loss"`. :
-| `early_stopping_metric_minimize` | `True` if desired model behavior is to |
-: : minimize the value of :
-: : `early_stopping_metric`; `False` if :
-: : desired model behavior is to maximize the :
-: : value of `early_stopping_metric`. Default :
-: : is `True`. :
-| `early_stopping_rounds` | Sets a number of steps during which if |
-: : the `early_stopping_metric` does not :
-: : decrease (if :
-: : `early_stopping_metric_minimize` is :
-: : `True`) or increase (if :
-: : `early_stopping_metric_minimize` is :
-: : `False`), training will be stopped. :
-: : Default is `None`, which means early :
-: : stopping will never occur. :
-
-Make the following revision to the `ValidationMonitor` constructor, which
-specifies that if loss (`early_stopping_metric="loss"`) does not decrease
-(`early_stopping_metric_minimize=True`) over a period of 200 steps
-(`early_stopping_rounds=200`), model training will stop immediately at that
-point, and not complete the full 2000 steps specified in `fit`:
-
-```python
-validation_monitor = tf.contrib.learn.monitors.ValidationMonitor(
- test_set.data,
- test_set.target,
- every_n_steps=50,
- metrics=validation_metrics,
- early_stopping_metric="loss",
- early_stopping_metric_minimize=True,
- early_stopping_rounds=200)
-```
-
-Rerun the code to see if model training stops early:
-
-```none
-...
-INFO:tensorflow:Validation (step 1150): recall = 1.0, loss = 0.056436, global_step = 1119, precision = 1.0, accuracy = 0.966667
-INFO:tensorflow:Stopping. Best step: 800 with loss = 0.048313818872.
-```
-
-Indeed, here training stops at step 1150, indicating that for the past 200
-steps, loss did not decrease, and that overall, step 800 produced the smallest
-loss value against the test data set. This suggests that additional calibration
-of hyperparameters by decreasing the step count might further improve the model.
-
-## Visualizing Log Data with TensorBoard
-
-Reading through the log produced by `ValidationMonitor` provides plenty of raw
-data on model performance during training, but it may also be helpful to see
-visualizations of this data to get further insight into trends—for
-example, how accuracy is changing over step count. You can use TensorBoard (a
-separate program packaged with TensorFlow) to plot graphs like this by setting
-the `logdir` command-line argument to the directory where you saved your model
-training data (here, `/tmp/iris_model`). Run the following on your command line:
-
-<pre><strong>$ tensorboard --logdir=/tmp/iris_model/</strong>
-Starting TensorBoard 39 on port 6006</pre>
-
-Then navigate to `http://0.0.0.0:`*`<port_number>`* in your browser, where
-*`<port_number>`* is the port specified in the command-line output (here,
-`6006`).
-
-If you click on the accuracy field, you'll see an image like the following,
-which shows accuracy plotted against step count:
-
-
-
-For more on using TensorBoard, see @{$summaries_and_tensorboard$TensorBoard: Visualizing Learning} and @{$graph_viz$TensorBoard: Graph Visualization}.
Now that you've gotten started writing TensorFlow programs, consider the
following material:
-* @{$get_started/saving_models$Checkpoints} to learn how to save and restore
+* @{$get_started/checkpoints$Checkpoints} to learn how to save and restore
models.
* @{$get_started/datasets_quickstart$Datasets} to learn more about importing
data into your
+++ /dev/null
-# Checkpoints
-
-This document examines how to save and restore TensorFlow models built with
-Estimators. TensorFlow provides two model formats:
-
-* checkpoints, which is a format dependent on the code that created
- the model.
-* SavedModel, which is a format independent of the code that created
- the model.
-
-This document focuses on checkpoints. For details on SavedModel, see the
-@{$saved_model$Saving and Restoring} chapter of the
-*TensorFlow Programmer's Guide*.
-
-
-## Sample code
-
-This document relies on the same
-[https://github.com/tensorflow/models/blob/master/samples/core/get_started/premade_estimator.py](Iris classification example) detailed in @{$premade_estimators$Getting Started with TensorFlow}.
-To download and access the example, invoke the following two commands:
-
-```shell
-git clone https://github.com/tensorflow/models/
-cd models/samples/core/get_started
-```
-
-Most of the code snippets in this document are minor variations
-on `premade_estimator.py`.
-
-
-## Saving partially-trained models
-
-Estimators automatically write the following to disk:
-
-* **checkpoints**, which are versions of the model created during training.
-* **event files**, which contain information that
- [TensorBoard](https://developers.google.com/machine-learning/glossary/#TensorBoard)
- uses to create visualizations.
-
-To specify the top-level directory in which the Estimator stores its
-information, assign a value to the optional `model_dir` argument of any
-Estimator's constructor. For example, the following code sets the `model_dir`
-argument to the `models/iris` directory:
-
-```python
-classifier = tf.estimator.DNNClassifier(
- feature_columns=my_feature_columns,
- hidden_units=[10, 10],
- n_classes=3,
- model_dir='models/iris')
-```
-
-Suppose you call the Estimator's `train` method. For example:
-
-
-```python
-classifier.train(
- input_fn=lambda:train_input_fn(train_x, train_y, batch_size=100),
- steps=200)
-```
-
-As suggested by the following diagrams, the first call to `train`
-adds checkpoints and other files to the `model_dir` directory:
-
-<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
-<img style="width:100%" src="../images/first_train_calls.png">
-</div>
-<div style="text-align: center">
-The first call to train().
-</div>
-
-
-To see the objects in the created `model_dir` directory on a
-UNIX-based system, just call `ls` as follows:
-
-```none
-$ ls -1 models/iris
-checkpoint
-events.out.tfevents.timestamp.hostname
-graph.pbtxt
-model.ckpt-1.data-00000-of-00001
-model.ckpt-1.index
-model.ckpt-1.meta
-model.ckpt-200.data-00000-of-00001
-model.ckpt-200.index
-model.ckpt-200.meta
-```
-
-The preceding `ls` command shows that the Estimator created checkpoints
-at steps 1 (the start of training) and 200 (the end of training).
-
-
-### Default checkpoint directory
-
-If you don't specify `model_dir` in an Estimator's constructor, the Estimator
-writes checkpoint files to a temporary directory chosen by Python's
-[tempfile.mkdtemp](https://docs.python.org/3/library/tempfile.html#tempfile.mkdtemp)
-function. For example, the following Estimator constructor does *not* specify
-the `model_dir` argument:
-
-```python
-classifier = tf.estimator.DNNClassifier(
- feature_columns=my_feature_columns,
- hidden_units=[10, 10],
- n_classes=3)
-
-print(classifier.model_dir)
-```
-
-The `tempfile.mkdtemp` function picks a secure, temporary directory
-appropriate for your operating system. For example, a typical temporary
-directory on macOS might be something like the following:
-
-```None
-/var/folders/0s/5q9kfzfj3gx2knj0vj8p68yc00dhcr/T/tmpYm1Rwa
-```
-
-### Checkpointing Frequency
-
-By default, the Estimator saves
-[checkpoints](https://developers.google.com/machine-learning/glossary/#checkpoint)
-in the `model_dir` according to the following schedule:
-
-* Writes a checkpoint every 10 minutes (600 seconds).
-* Writes a checkpoint when the `train` method starts (first iteration)
- and completes (final iteration).
-* Retains only the 5 most recent checkpoints in the directory.
-
-You may alter the default schedule by taking the following steps:
-
-1. Create a @{tf.estimator.RunConfig$`RunConfig`} object that defines the
- desired schedule.
-2. When instantiating the Estimator, pass that `RunConfig` object to the
- Estimator's `config` argument.
-
-For example, the following code changes the checkpointing schedule to every
-20 minutes and retains the 10 most recent checkpoints:
-
-```python
-my_checkpointing_config = tf.estimator.RunConfig(
- save_checkpoints_secs = 20*60, # Save checkpoints every 20 minutes.
- keep_checkpoint_max = 10, # Retain the 10 most recent checkpoints.
-)
-
-classifier = tf.estimator.DNNClassifier(
- feature_columns=my_feature_columns,
- hidden_units=[10, 10],
- n_classes=3,
- model_dir='models/iris',
- config=my_checkpointing_config)
-```
-
-## Restoring your model
-
-The first time you call an Estimator's `train` method, TensorFlow saves a
-checkpoint to the `model_dir`. Each subsequent call to the Estimator's
-`train`, `eval`, or `predict` method causes the following:
-
-1. The Estimator builds the model's
- [graph](https://developers.google.com/machine-learning/glossary/#graph)
- by running the `model_fn()`. (For details on the `model_fn()`, see
- @{$custom_estimators$Creating Custom Estimators.})
-2. The Estimator initializes the weights of the new model from the data
- stored in the most recent checkpoint.
-
-In other words, as the following illustration suggests, once checkpoints
-exist, TensorFlow rebuilds the model each time you call `train()`,
-`evaluate()`, or `predict()`.
-
-<div style="width:80%; margin:auto; margin-bottom:10px; margin-top:20px;">
-<img style="width:100%" src="../images/subsequent_calls.png">
-</div>
-<div style="text-align: center">
-Subsequent calls to train(), evaluate(), or predict()
-</div>
-
-
-### Avoiding a bad restoration
-
-Restoring a model's state from a checkpoint only works if the model
-and checkpoint are compatible. For example, suppose you trained a
-`DNNClassifier` Estimator containing two hidden layers,
-each having 10 nodes:
-
-```python
-classifier = tf.estimator.DNNClassifier(
- feature_columns=feature_columns,
- hidden_units=[10, 10],
- n_classes=3,
- model_dir='models/iris')
-
-classifier.train(
- input_fn=lambda:train_input_fn(train_x, train_y, batch_size=100),
- steps=200)
-```
-
-After training (and, therefore, after creating checkpoints in `models/iris`),
-imagine that you changed the number of neurons in each hidden layer from 10 to
-20 and then attempted to retrain the model:
-
-``` python
-classifier2 = tf.estimator.DNNClassifier(
- feature_columns=my_feature_columns,
- hidden_units=[20, 20], # Change the number of neurons in the model.
- n_classes=3,
- model_dir='models/iris')
-
-classifier.train(
- input_fn=lambda:train_input_fn(train_x, train_y, batch_size=100),
- steps=200)
-```
-
-Since the state in the checkpoint is incompatible with the model described
-in `classifier2`, retraining fails with the following error:
-
-```None
-...
-InvalidArgumentError (see above for traceback): tensor_name =
-dnn/hiddenlayer_1/bias/t_0/Adagrad; shape in shape_and_slice spec [10]
-does not match the shape stored in checkpoint: [20]
-```
-
-To run experiments in which you train and compare slightly different
-versions of a model, save a copy of the code that created each
-`model-dir`, possibly by creating a separate git branch for each version.
-This separation will keep your checkpoints recoverable.
-
-## Summary
-
-Checkpoints provide an easy automatic mechanism for saving and restoring
-models created by Estimators.
-
-See the @{$saved_model$Saving and Restoring}
-chapter of the *TensorFlow Programmer's Guide* for details on:
-
-* Saving and restoring models using low-level TensorFlow APIs.
-* Exporting and importing models in the SavedModel format, which is a
- language-neutral, recoverable, serialization format.
.filter(lambda line: tf.not_equal(tf.substr(line, 0, 1), "#"))))
```
-For a full example of parsing a CSV file using datasets, see [`imports85.py`](https://www.tensorflow.org/code/tensorflow/examples/get_started/regression/imports85.py)
-in @{$get_started/linear_regression}.
-
<!--
TODO(mrry): Add these sections.