If you open a GitHub issue, here is our policy:
-1. It must be a bug or a feature request.
+1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
2. The form below must be filled out.
3. It shouldn't be a TensorBoard issue. Those go [here](https://github.com/tensorflow/tensorboard/issues).
| **`Linux CPU`** | **`Linux GPU`** | **`Mac OS CPU`** | **`Windows CPU`** | **`Android`** |
|-----------------|---------------------|------------------|-------------------|---------------|
-| [](https://ci.tensorflow.org/job/tensorflow-master-cpu) | [](https://ci.tensorflow.org/job/tensorflow-master-linux-gpu) | [](https://ci.tensorflow.org/job/tensorflow-master-mac) | [](https://ci.tensorflow.org/job/tensorflow-master-win-cmake-py) | [](https://ci.tensorflow.org/job/tensorflow-master-android) |
+| [](https://ci.tensorflow.org/job/tensorflow-master-cpu) | [](https://ci.tensorflow.org/job/tensorflow-master-linux-gpu) | [](https://ci.tensorflow.org/job/tensorflow-master-mac) | [](https://ci.tensorflow.org/job/tensorflow-master-win-cmake-py) | [](https://ci.tensorflow.org/job/tensorflow-master-android) [  ](https://bintray.com/google/tensorflow/tensorflow/_latestVersion) |
**TensorFlow** is an open source software library for numerical computation using
data flow graphs. The graph nodes represent mathematical operations, while
uphold this code.**
**We use [GitHub issues](https://github.com/tensorflow/tensorflow/issues) for
-tracking requests and bugs. So please see
+tracking requests and bugs. Please see
[TensorFlow Discuss](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss) for general questions
and discussion, and please direct specific questions to [Stack Overflow](https://stackoverflow.com/questions/tagged/tensorflow).**
# Release 1.5.0
## Breaking Changes
-* Prebuilt binaries are now built against CUDA 9 and cuDNN 7.
+* Prebuilt binaries are now built against CUDA 9.0 and cuDNN 7.
* Our Linux binaries are built using Ubuntu 16 containers, potentially
  introducing glibc incompatibility issues with Ubuntu 14.
* Starting from the 1.6 release, our prebuilt binaries will use AVX instructions.
  This may break TF on older CPUs (see the check below).
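A quick way to check ahead of the 1.6 release whether a machine's CPU supports AVX (a minimal sketch, assuming a Linux host where `/proc/cpuinfo` is available):

```python
# Minimal sketch: report whether the local CPU advertises AVX support.
def cpu_has_avx(cpuinfo_path="/proc/cpuinfo"):
    with open(cpuinfo_path) as f:
        # Each "flags" line lists the CPU feature tokens, e.g. "sse2 avx avx2".
        return any("avx" in line.split()
                   for line in f if line.startswith("flags"))

print(cpu_has_avx())
```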
+## Known Bugs
+* Using XLA:GPU with CUDA 9 and CUDA 9.1 results in garbage results and/or
+  `CUDA_ERROR_ILLEGAL_ADDRESS` failures.
+
+ Google discovered in mid-December 2017 that the PTX-to-SASS compiler in CUDA 9
+ and CUDA 9.1 sometimes does not properly compute the carry bit when
+ decomposing 64-bit address calculations with large offsets (e.g. `load [x +
+ large_constant]`) into 32-bit arithmetic in SASS.
+
+ As a result, these versions of `ptxas` miscompile most XLA programs which use
+ more than 4GB of temp memory. This results in garbage results and/or
+ `CUDA_ERROR_ILLEGAL_ADDRESS` failures.
+
+ A fix in CUDA 9.1.121 is expected in late February 2018. We do not expect a
+ fix for CUDA 9.0.x. Until the fix is available, the only workaround is to
+ [downgrade](https://developer.nvidia.com/cuda-toolkit-archive) to CUDA 8.0.x
+ or disable XLA:GPU.
+
+ TensorFlow will print a warning if you use XLA:GPU with a known-bad version of
+ CUDA; see e00ba24c4038e7644da417ddc639169b6ea59122.
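In graph mode, XLA:GPU is opt-in through the session config, so disabling it amounts to leaving the global JIT level at its default. A minimal sketch with the standard TF 1.x API:

```python
import tensorflow as tf

config = tf.ConfigProto()
# XLA JIT is off by default; avoid setting ON_1/ON_2 on affected CUDA versions.
config.graph_options.optimizer_options.global_jit_level = (
    tf.OptimizerOptions.OFF)
sess = tf.Session(config=config)
```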
+
## Major Features And Improvements
* [Eager execution](https://github.com/tensorflow/tensorflow/tree/r1.5/tensorflow/contrib/eager)
preview version is now available.
* [TensorFlow Lite](https://github.com/tensorflow/tensorflow/tree/r1.5/tensorflow/contrib/lite)
dev preview is now available.
-* CUDA 9 and cuDNN 7 support.
+* CUDA 9.0 and cuDNN 7 support.
* Accelerated Linear Algebra (XLA):
* Add `complex64` support to XLA compiler.
* `bfloat` support is now added to XLA infrastructure.
* Fixed LIBXSMM integration.
* Make `decode_jpeg`/`decode_png`/`decode_gif` handle all formats, since users frequently try to decode an image as the wrong type (see the sketch after this list).
* Improve implicit broadcasting lowering.
-* Improving stability of GCS/Bigquery clients by a faster retrying of stale transmissions.
+* Improve stability of GCS/BigQuery clients by retrying stale transmissions faster.
* Remove OpKernelConstruction::op_def() as part of minimizing proto dependencies.
* VectorLaplaceDiag distribution added.
* Android demo no longer requires libtensorflow_demo.so to run (libtensorflow_inference.so still required).
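A minimal sketch of the decode-op behavior mentioned above (the file path is illustrative):

```python
import tensorflow as tf

# The image-decode ops now dispatch on the actual file contents, so a PNG
# handed to decode_jpeg still decodes correctly.
png_bytes = tf.read_file("example.png")  # illustrative path
image = tf.image.decode_jpeg(png_bytes, channels=3)
```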
tf_workspace()
new_http_archive(
- name = "inception5h",
+ name = "inception_v1",
build_file = "models.BUILD",
- sha256 = "d13569f6a98159de37e92e9c8ec4dae8f674fbf475f69fe6199b514f756d4364",
+ sha256 = "7efe12a8363f09bc24d7b7a450304a15655a57a7751929b2c1593a71183bb105",
urls = [
- "http://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip",
- "http://download.tensorflow.org/models/inception5h.zip",
+ "http://storage.googleapis.com/download.tensorflow.org/models/inception_v1.zip",
+ "http://download.tensorflow.org/models/inception_v1.zip",
],
)
System".
enabled_by_default: boolean for default behavior.
question: optional string for how to ask for user input.
- yes_reply: optionanl string for reply when feature is enabled.
+ yes_reply: optional string for reply when feature is enabled.
no_reply: optional string for reply when feature is disabled.
Returns:
System".
enabled_by_default: boolean for default behavior.
question: optional string for how to ask for user input.
- yes_reply: optionanl string for reply when feature is enabled.
+ yes_reply: optional string for reply when feature is enabled.
no_reply: optional string for reply when feature is disabled.
"""
var = int(
environ_cp['TF_NEED_GCP'] = '0'
environ_cp['TF_NEED_HDFS'] = '0'
environ_cp['TF_NEED_JEMALLOC'] = '0'
+ environ_cp['TF_NEED_KAFKA'] = '0'
environ_cp['TF_NEED_OPENCL_SYCL'] = '0'
environ_cp['TF_NEED_COMPUTECPP'] = '0'
environ_cp['TF_NEED_OPENCL'] = '0'
'with_hdfs_support', True, 'hdfs')
set_build_var(environ_cp, 'TF_NEED_S3', 'Amazon S3 File System',
'with_s3_support', True, 's3')
+ set_build_var(environ_cp, 'TF_NEED_KAFKA', 'Apache Kafka Platform',
+ 'with_kafka_support', False, 'kafka')
set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support',
False, 'xla')
set_build_var(environ_cp, 'TF_NEED_GDR', 'GDR', 'with_gdr_support',
visibility = ["//visibility:public"],
)
+config_setting(
+ name = "with_kafka_support",
+ define_values = {"with_kafka_support": "true"},
+ visibility = ["//visibility:public"],
+)
+
# Crosses between platforms and file system libraries not supported on those
# platforms due to limitations in nested select() statements.
config_setting(
"linalg_ops",
"logging_ops",
"lookup_ops",
+ "manip_ops",
"math_ops",
"nn_ops",
"no_op",
return Status::OK();
}
- // Adds `graph_def` to `saved_model_bundle` and intializes a session with
+ // Adds `graph_def` to `saved_model_bundle` and initializes a session with
// `init_node`.
Status AddGraphDefToSavedModelBundle(const GraphDef& graph_def,
const string& init_node,
config = "test_graph_tfadd.config.pbtxt",
cpp_class = "AddComp",
graph = "test_graph_tfadd.pbtxt",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
# A test of tf_library that includes a graph with an unknown op, but where
config = "test_graph_tfunknownop.config.pbtxt",
cpp_class = "UnknownOpAddComp",
graph = "test_graph_tfunknownop.pbtxt",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
# A test of tf_library that includes a graph with an unknown op, but where
config = "test_graph_tfunknownop2.config.pbtxt",
cpp_class = "UnknownOpAddComp",
graph = "test_graph_tfunknownop.pbtxt",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
# A test of tf_library that includes a graph with an unknown op, but where
config = "test_graph_tfunknownop3.config.pbtxt",
cpp_class = "UnknownOpAddComp",
graph = "test_graph_tfunknownop.pbtxt",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
# Utility library for benchmark binaries, used by the *_benchmark rules that are
# compile but the others in this directory succeed, you may need to
# expand the "required by all tf_library targets" list in tfcompile.bzl.
include_standard_runtime_deps = False,
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
cpp_class = "AddWithCkptComp",
freeze_checkpoint = "test_graph_tfadd_with_ckpt.ckpt",
graph = "test_graph_tfadd_with_ckpt.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
freeze_checkpoint = "test_graph_tfadd_with_ckpt_saver.ckpt",
freeze_saver = "test_graph_tfadd_with_ckpt_saver.saver",
graph = "test_graph_tfadd_with_ckpt_saver.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
config = "test_graph_tffunction.config.pbtxt",
cpp_class = "FunctionComp",
graph = "test_graph_tffunction.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
config = "test_graph_tfgather.config.pbtxt",
cpp_class = "GatherComp",
graph = "test_graph_tfgather.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
config = "test_graph_tfmatmul.config.pbtxt",
cpp_class = "foo::bar::MatMulComp",
graph = "test_graph_tfmatmul.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_library(
config = "test_graph_tfmatmulandadd.config.pbtxt",
cpp_class = "MatMulAndAddComp",
graph = "test_graph_tfmatmulandadd.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
tfcompile_flags = "--gen_name_to_index --gen_program_shape",
)
config = "test_graph_tfsplits.config.pbtxt",
cpp_class = "SplitsComp",
graph = "test_graph_tfsplits.pb",
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
)
tf_cc_test(
name = "tfcompile_test",
srcs = ["tfcompile_test.cc"],
- tags = ["manual"],
+ tags = [
+ "manual",
+ "notap",
+ ],
deps = [
":test_graph_tfadd",
":test_graph_tfadd_with_ckpt",
def DISABLED_testSparseMatMul(self):
# Binary wrappers for sparse_matmul with different hints
def SparseMatmulWrapperTF(a, b):
- return tf.sparse_matmul(a, b, a_is_sparse=True)
+ return math_ops.sparse_matmul(a, b, a_is_sparse=True)
def SparseMatmulWrapperFT(a, b):
- return tf.sparse_matmul(a, b, b_is_sparse=True)
+ return math_ops.sparse_matmul(a, b, b_is_sparse=True)
def SparseMatmulWrapperTT(a, b):
- return tf.sparse_matmul(a, b, a_is_sparse=True, b_is_sparse=True)
+ return math_ops.sparse_matmul(a, b, a_is_sparse=True, b_is_sparse=True)
- self._testMatMul(tf.sparse_matmul)
+ self._testMatMul(math_ops.sparse_matmul)
self._testMatMul(SparseMatmulWrapperTF)
self._testMatMul(SparseMatmulWrapperFT)
self._testMatMul(SparseMatmulWrapperTT)
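For reference, `sparse_matmul` takes ordinary dense tensors and treats the `*_is_sparse` flags purely as density hints; a minimal sketch using the same internal module as the test (shapes are illustrative):

```python
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import math_ops

a = tf.constant(np.random.rand(4, 3), dtype=tf.float32)
b = tf.constant(np.random.rand(3, 5), dtype=tf.float32)
# a_is_sparse is only a hint; both inputs remain dense tensors.
c = math_ops.sparse_matmul(a, b, a_is_sparse=True)
with tf.Session() as sess:
    print(sess.run(c).shape)  # (4, 5)
```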
PoolingOp(OpKernelConstruction* ctx, int num_spatial_dims)
: XlaOpKernel(ctx), num_spatial_dims_(num_spatial_dims) {
if (ctx->num_inputs() == 1) {
- OP_REQUIRES_OK(ctx, ctx->GetAttr("ksize", &ksize_));
- OP_REQUIRES_OK(ctx, ctx->GetAttr("strides", &stride_));
+ std::vector<int32> ksize_int;
+ std::vector<int32> stride_int;
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("ksize", &ksize_int));
+ OP_REQUIRES(ctx, ksize_int.size() == num_dims(),
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify ",
+ num_dims(), " dimensions"));
+ OP_REQUIRES_OK(ctx, ctx->GetAttr("strides", &stride_int));
+ OP_REQUIRES(ctx, stride_int.size() == num_dims(),
+ errors::InvalidArgument("Sliding window stride field must "
+ "specify ",
+ num_dims(), " dimensions"));
+ for (int i = 0; i < num_dims(); ++i) {
+ ksize_.push_back(ksize_int[i]);
+ stride_.push_back(stride_int[i]);
+ }
}
Padding padding;
OP_REQUIRES_OK(ctx, ctx->GetAttr("padding", &padding));
xla::ComputationDataHandle input = ctx->Input(0);
const TensorShape input_shape = ctx->InputShape(0);
+ std::vector<int64> ksize = ksize_;
+ std::vector<int64> stride = stride_;
if (ctx->num_inputs() != 1) {
const TensorShape ksize_shape = ctx->InputShape(1);
+ // Validate input sizes.
OP_REQUIRES(ctx, TensorShapeUtils::IsVector(ksize_shape),
errors::InvalidArgument("ksize must be a vector, not shape ",
ksize_shape.DebugString()));
- OP_REQUIRES_OK(ctx, ctx->ConstantInputAsIntVector(1, &ksize_));
+ OP_REQUIRES(ctx, ksize_shape.num_elements() == num_dims(),
+ errors::InvalidArgument("Sliding window ksize field must "
+ "specify ",
+ num_dims(), " dimensions"));
+ ksize.clear();
+ OP_REQUIRES_OK(ctx, ctx->ConstantInputAsIntVector(1, &ksize));
const TensorShape stride_shape = ctx->InputShape(2);
+ // Validate input sizes.
OP_REQUIRES(ctx, TensorShapeUtils::IsVector(stride_shape),
errors::InvalidArgument("stride must be a vector, not shape ",
stride_shape.DebugString()));
- OP_REQUIRES_OK(ctx, ctx->ConstantInputAsIntVector(2, &stride_));
+ OP_REQUIRES(ctx, stride_shape.num_elements() == num_dims(),
+ errors::InvalidArgument("Sliding window stride field must "
+ "specify ",
+ num_dims(), " dimensions"));
+ stride.clear();
+ OP_REQUIRES_OK(ctx, ctx->ConstantInputAsIntVector(2, &stride));
}
-
- OP_REQUIRES(ctx, ksize_.size() == num_dims(),
- errors::InvalidArgument("Sliding window ksize field must "
- "specify ",
- num_dims(), " dimensions"));
- OP_REQUIRES(ctx, stride_.size() == num_dims(),
- errors::InvalidArgument("Sliding window stride field must "
- "specify ",
- num_dims(), " dimensions"));
OP_REQUIRES(ctx, input_shape.dims() == num_dims(),
errors::InvalidArgument("Input to ", type_string(),
" operator must have ", num_dims(),
const DataType type = input_type(0);
xla::ComputationDataHandle pooled = ctx->builder()->ReduceWindow(
- input, InitValue(ctx->builder(), type), *Reduction(ctx, type), ksize_,
- stride_, padding_);
+ input, InitValue(ctx->builder(), type), *Reduction(ctx, type), ksize,
+ stride, padding_);
ctx->SetOutput(0, PostProcessOutput(ctx, pooled, type, input_shape));
}
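These checks mirror the Python-level contract that `ksize` and `strides` carry one entry per input dimension; a minimal sketch for 2-D max pooling on a 4-D NHWC input:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 8, 8, 3])  # NHWC: 4 dimensions
# ksize and strides must each specify all 4 dimensions.
y = tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                   padding="SAME")
```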
// OpMetadata is often applied to a series of XLA HLO instructions. As a
// result, OpMetadata is set on the Computation Builder. All subsequent
// instructions generated via this Computation Builder will have the same
- // OpMetadata attached until a call to ClearOpMetdata.
+ // OpMetadata attached until a call to ClearOpMetadata.
void SetOpMetadata(const OpMetadata& metadata) { metadata_ = metadata; }
// Clears the HloMetadata state.
//
// {[2:3:4], [5:6:7], [8:9]}
//
-// The the parsed result will be:
+// The parsed result will be:
//
// {/*starts=*/{2, 5, 8}, /*limits=*/{3, 6, 9}, /*strides=*/{4, 7, 1}}
//
"//tensorflow/contrib/image:single_image_random_dot_stereograms_py",
"//tensorflow/contrib/input_pipeline:input_pipeline_py",
"//tensorflow/contrib/integrate:integrate_py",
+ "//tensorflow/contrib/kafka",
"//tensorflow/contrib/keras",
"//tensorflow/contrib/kernel_methods",
"//tensorflow/contrib/kfac",
"//tensorflow/contrib/factorization:all_ops",
"//tensorflow/contrib/framework:all_ops",
"//tensorflow/contrib/input_pipeline:input_pipeline_ops_op_lib",
+ "//tensorflow/contrib/kafka:kafka_ops_op_lib",
"//tensorflow/contrib/layers:sparse_feature_cross_op_op_lib",
"//tensorflow/contrib/nccl:nccl_ops_op_lib",
"//tensorflow/contrib/nearest_neighbor:nearest_neighbor_ops_op_lib",
* @param outputNames A list of output nodes which should be filled by the inference pass.
*/
public void run(String[] outputNames, boolean enableStats) {
+ run(outputNames, enableStats, new String[] {});
+ }
+
+ /** An overloaded version of runInference that allows supplying targetNodeNames as well */
+ public void run(String[] outputNames, boolean enableStats, String[] targetNodeNames) {
// Release any Tensors from the previous run calls.
closeFetches();
runner.fetch(tid.name, tid.outputIndex);
}
+ // Add targets.
+ for (String t : targetNodeNames) {
+ runner.addTarget(t);
+ }
+
// Run the session.
try {
if (enableStats) {
tensorflow/core/framework
tensorflow/core/lib
tensorflow/core/lib/core
+tensorflow/core/profiler
tensorflow/core/protobuf
tensorflow/core/util
tensorflow/examples
tensorflow/contrib/integrate
tensorflow/contrib/integrate/python
tensorflow/contrib/integrate/python/ops
+tensorflow/contrib/kafka/python
+tensorflow/contrib/kafka/python/ops
tensorflow/contrib/keras
tensorflow/contrib/keras/api
tensorflow/contrib/keras/api/keras
"list_ops"
"lookup_ops"
"logging_ops"
+ "manip_ops"
"math_ops"
"nn_ops"
"no_op"
GENERATE_PYTHON_OP_LIB("logging_ops")
GENERATE_PYTHON_OP_LIB("lookup_ops")
GENERATE_PYTHON_OP_LIB("nn_ops")
+GENERATE_PYTHON_OP_LIB("manip_ops")
GENERATE_PYTHON_OP_LIB("parsing_ops")
GENERATE_PYTHON_OP_LIB("random_ops")
GENERATE_PYTHON_OP_LIB("remote_fused_graph_ops"
from __future__ import print_function
import argparse
-import io
+import codecs
import os
import re
import subprocess
for lib_path in args.input:
proc = subprocess.Popen([DUMPBIN, "/nologo", "/linkermember:1", lib_path],
stdout=subprocess.PIPE)
- for line in io.TextIOWrapper(proc.stdout, encoding="utf-8"):
+ for line in codecs.getreader("utf-8")(proc.stdout):
cols = line.split()
if len(cols) < 2:
continue
# We compare on undname but use the decorated name from candidates.
dupes = 0
proc = subprocess.Popen([UNDNAME, tmpfile.name], stdout=subprocess.PIPE)
- for idx, line in enumerate(io.TextIOWrapper(proc.stdout, encoding="utf-8")):
+ for idx, line in enumerate(codecs.getreader("utf-8")(proc.stdout)):
decorated = candidates[idx]
if decorated in taken:
# Symbol is already in output, done.
around,
- The number of CDF axes does not extend, i.e., `CDF.ndim == data.ndim + 1`.
-In the previous example where data has shape (10, 10), the followings are
+In the previous example where data has shape (10, 10), the following are
acceptable CDF shapes:
- (10, 10, 65)
}
} else if (base_ != 0) {
// If base == 0, then pick 0 from [base, base + size) and no zeros are
- // explcitly written.
+ // explicitly written.
//
// Otherwise, pick (base + (2^16 - base[16:0])), i.e., round up base to the
// next multiple of 2^16. As 2^16 < size, this value should be in the
import time
+from six.moves import xrange # pylint: disable=redefined-builtin
from tensorflow.contrib import rnn as contrib_rnn
from tensorflow.contrib.cudnn_rnn.python.ops import cudnn_rnn_ops
from tensorflow.contrib.rnn.python.ops import lstm_ops
call_op: An op that updates evaluation state on a mini-batch of examples.
Must generate a tf.errors.OutOfRangeError when done.
results_op: A dictionary of tensors that compute the final evaluation
- results from the evaulation state.
+ results from the evaluation state.
sess: The Session to run the evaluation in. Defaults to the default
Session.
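A minimal sketch of the loop this docstring describes (the loop body is an illustration, not the library implementation):

```python
import tensorflow as tf

def evaluate(call_op, results_op, sess):
    # Update evaluation state one mini-batch at a time until the input
    # pipeline is exhausted, then fetch the final results.
    while True:
        try:
            sess.run(call_op)
        except tf.errors.OutOfRangeError:
            break
    return sess.run(results_op)
```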
(Or remove the `--config=cuda` flag for running on CPU instead of GPU).
-On October 31, 2017, the benchmarks demostrated comparable performance
+On October 31, 2017, the benchmarks demonstrated comparable performance
for eager and graph execution of this particular model when using
a single NVIDIA Titan X (Pascal) GPU on a host with an
Intel Xeon E5-1650 CPU @ 3.50GHz and a batch size of 32.
Args:
kernel_size: the kernel size of middle conv layer at main path
- filters: list of integers, the filterss of 3 conv layer at main path
+ filters: list of integers, the number of filters in the 3 conv layers of the main path
stage: integer, current stage label, used for generating layer names
block: 'a','b'..., current block label, used for generating layer names
data_format: data_format for the input ('channels_first' or
import tempfile
import time
+from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
import tensorflow.contrib.eager as tfe
(Or remove the `--config=cuda` flag for running on CPU instead of GPU).
-On October 31, 2017, the benchmarks demostrated slightly better performance
+On October 31, 2017, the benchmarks demonstrated slightly better performance
(3-6%) for graph execution over eager execution for this particular model when
using a single NVIDIA Titan X (Pascal) GPU on a host with an Intel Xeon E5-1650
CPU @ 3.50GHz and a batch size of 32.
class PTBModel(tfe.Network):
- """LSTM for word language modelling.
+ """LSTM for word language modeling.
Model described in:
(Zaremba, et. al.) Recurrent Neural Network Regularization
"http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz")
parser.add_argument(
"--logdir", type=str, default="", help="Directory for checkpoint.")
- parser.add_argument(
- "--epoch", type=int, default=20, help="Number of epoches.")
+ parser.add_argument("--epoch", type=int, default=20, help="Number of epochs.")
parser.add_argument("--batch-size", type=int, default=20, help="Batch size.")
parser.add_argument(
"--seq-len", type=int, default=35, help="Sequence length.")
"""Get the non-parenthesis items from a SNLI parsed sentence.
Args:
- items: Data items from a parsed SNLI setence, with parentheses. E.g.,
+ items: Data items from a parsed SNLI sentence, with parentheses. E.g.,
["(", "Man", "(", "(", "(", "(", "(", "wearing", "pass", ")", ...
Returns:
- A list of non-parenthis word items, all converted to lower case. E.g.,
+ A list of non-parentheses word items, all converted to lower case. E.g.,
["man", "wearing", "pass", ...
"""
return [x.lower() for x in items if x not in PARENTHESES and x]
def calculate_bins(length2count, min_bin_size):
- """Cacluate bin boundaries given a histogram of lengths and mininum bin size.
+ """Calculate bin boundaries given a histogram of lengths and minimum bin size.
Args:
length2count: A `dict` mapping length to sentence count.
# The sorting above and the batching here makes sure that sentences of
# similar max lengths are batched together, minimizing the inefficiency
# due to uneven max lengths. The sentences are batched differently in
- # each call to get_generator() due to the shuffling before sotring
+ # each call to get_generator() due to the shuffling before sorting
# above. The pad_and_reverse_word_ids() and pad_transitions() functions
- # take care of any remaning unevenness of the max sentence lengths.
+ # take care of any remaining unevenness of the max sentence lengths.
end = min(begin + batch_size, len(labels))
# Transpose, because the SPINN model requires time-major, instead of
# batch-major.
import time
import numpy as np
+from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf
# pylint: disable=g-bad-import-order
# No issue here since the name is unique within its scope.
name_conflict3 = MyNetwork(name="name_conflict")
net2 = MyNetwork() # name=outside_scope/my_network_2 to avoid the
- # variable_scope my_network_1 below.
+ # variable_scope my_network_1 below.
vs_name_conflict = MyNetwork(name="vs_name_conflict") # conflict below
with variable_scope.variable_scope("intervening_scope"):
with variable_scope.variable_scope(captured_scope):
net2(one)
# Layer names typically are globally unique rather than being unique within
# the scope of their first use. However, within a Network they must be named
- # locally so that previous Layer consutrciton does not interfere with
+ # locally so that previous Layer construction does not interfere with
# variable naming (e.g. add a Layer construction before the Network,
# suddenly your previously saved checkpoint is incompatible).
self.assertEqual("dense", net1.l1.name)
map_func_wrapper = lambda self, x: x
else:
if not callable(map_func):
- raise ValueError("map_func must be callaled.")
+ raise ValueError("map_func must be callable.")
map_func_wrapper = lambda self, x: map_func(x)
ckpt_var_cache = dict()
return Status::OK();
})
.Doc(R"doc(
-Processes the contents of an audio file into a tensor using FFmpeg to decode
+Processes the contents of a video file into a tensor using FFmpeg to decode
the file.
-One row of the tensor is created for each channel in the audio file. Each
-channel contains audio samples starting at the beginning of the audio and
-having `1/samples_per_second` time between them. If the `channel_count` is
-different from the contents of the file, channels will be merged or created.
-
-contents: The binary audio file contents, as a string or rank-0 string
- tensor.
+contents: The binary contents of the video file to decode. This is a
+ scalar.
+output: A rank-4 `Tensor` that has `[frames, height, width, 3]` RGB as output.
)doc");
} // namespace ffmpeg
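Assuming the op is exposed through the contrib FFmpeg Python wrapper as `decode_video` (the wrapper name is an assumption here), usage might look like:

```python
import tensorflow as tf
from tensorflow.contrib import ffmpeg

video_bytes = tf.read_file("example.mp4")  # illustrative path
# Rank-4 output: [frames, height, width, 3] RGB.
video = ffmpeg.decode_video(video_bytes)  # assumed wrapper name
```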
from tensorflow.contrib.framework.python.ops import add_arg_scope as contrib_add_arg_scope
from tensorflow.contrib.framework.python.ops import gen_variable_ops
from tensorflow.contrib.util import loader
+from tensorflow.core.protobuf import saver_pb2
from tensorflow.python import pywrap_tensorflow
from tensorflow.python.framework import device as tf_device
from tensorflow.python.framework import dtypes
'Variable %s missing in checkpoint %s', var, model_path)
var_list = available_vars
if var_list:
- saver = tf_saver.Saver(var_list, reshape=reshape_variables)
+ saver = tf_saver.Saver(var_list, reshape=reshape_variables,
+ write_version=saver_pb2.SaverDef.V1)
def callback(session):
saver.restore(session, model_path)
return callback
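This callback pattern appears to be `tf.contrib.framework.assign_from_checkpoint_fn`; a minimal usage sketch (the checkpoint path is illustrative):

```python
import tensorflow as tf

var_list = tf.contrib.framework.get_variables_to_restore()
init_fn = tf.contrib.framework.assign_from_checkpoint_fn(
    "/tmp/model.ckpt", var_list, ignore_missing_vars=True)
with tf.Session() as sess:
    init_fn(sess)  # runs saver.restore(session, model_path) internally
```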
from __future__ import print_function
import functools
+import os
import sys
import tarfile
return graph_pb2.GraphDef.FromString(resource_loader.load_resource(filename))
-def get_graph_def_from_url_tarball(url, filename):
- """Get a GraphDef proto from a tarball on the web."""
- def _progress(count, block_size, total_size):
- sys.stdout.write('\r>> Downloading %s %.1f%%' % (
- url, float(count * block_size) / float(total_size) * 100.0))
- sys.stdout.flush()
- tar_filename, _ = urllib.request.urlretrieve(url, reporthook=_progress)
+def get_graph_def_from_url_tarball(url, filename, tar_filename=None):
+ """Get a GraphDef proto from a tarball on the web.
+
+ Args:
+ url: Web address of tarball
+ filename: Filename of graph definition within tarball
+ tar_filename: Temporary download filename (None = always download)
+
+ Returns:
+ A GraphDef loaded from a file in the downloaded tarball.
+ """
+ if not (tar_filename and os.path.exists(tar_filename)):
+
+ def _progress(count, block_size, total_size):
+ sys.stdout.write('\r>> Downloading %s %.1f%%' %
+ (url,
+ float(count * block_size) / float(total_size) * 100.0))
+ sys.stdout.flush()
+
+ tar_filename, _ = urllib.request.urlretrieve(url, tar_filename, _progress)
with tarfile.open(tar_filename, 'r:gz') as tar:
proto_str = tar.extractfile(filename).read()
return graph_pb2.GraphDef.FromString(proto_str)
def _default_graph_def_fn():
- return get_graph_def_from_url_tarball(INCEPTION_URL, INCEPTION_FROZEN_GRAPH)
+ return get_graph_def_from_url_tarball(INCEPTION_URL, INCEPTION_FROZEN_GRAPH,
+ os.path.basename(INCEPTION_URL))
def run_inception(images,
with self.test_session(use_gpu=True) as sess:
for _ in range(10): # spot check closeness on more than one sample.
gnorm_np, precond_gnorm_np = sess.run([gnorm, precond_gnorm])
- self.assertNear(gnorm_np, precond_gnorm_np, 1e-5)
+ self.assertNear(gnorm_np, precond_gnorm_np, 1e-4)
class CycleConsistencyLossTest(test.TestCase):
# TensorFlow Runtime with HVX Acceleration
-## Description
+This README explains how to build and use the TensorFlow runtime with HVX Acceleration. HVX is an extension of Hexagon, a DSP provided by Qualcomm, which can compute vector calculations faster using less energy than ARM processors.
-This README explain how to build and use the TensorFlow Runtime with HVX Acceleration. HVX is an extension of Hexagon which is a DSP provided by qualcomm which can compute vector calculations faster using lower energy than ARM processors.
+## Dependencies
+
+* [Android SDK](https://developer.android.com/studio/index.html).
+* [Android NDK](https://developer.android.com/ndk/index.html). Save the path in `${NDK_ROOT}`.
+* A rooted Qualcomm-based Android device connected to the computer (preferably a [Snapdragon Development Board](https://developer.qualcomm.com/hardware/additional-snapdragon), but it could be a rooted phone with a Qualcomm SoC, although this guide may not work with it). The device needs to be rooted for development and testing purposes, and shouldn't be needed in production. See [Behold, The Snapdragon MDP](https://developer.qualcomm.com/blog/behold-snapdragon-mdp) for more information.
+* [Hexagon SDK v3.0](https://developer.qualcomm.com/software/hexagon-dsp-sdk/tools). Save the path in `${QUALCOMM_SDK}`.
+* The current directory should be TensorFlow source code (`git clone https://github.com/tensorflow/tensorflow.git && cd tensorflow`), and saved into `${TF_ROOT_DIR}`.
+
+You may also need to add a test signature in the device to run HVX-based binaries. Follow the instructions in `${QUALCOMM_SDK}/docs/Tools_Signing.html`, using Python 2.
+
+Note that if the device is not rooted, you may not be able to get the serial number, push the test signature and/or run binary files that call HVX libraries.
## Quick Start Guide
-We provides several tools to build and run inference with this runtime quickly.
+We provide several tools to build and run inference with this runtime quickly.
-#### All-in-one script to run inception model with prebuild hexagon library
-If you don’t need to build your own implementation of hexagon HVX, we provide a shortcut to execute graphs by using pre-compiled binaries.
+### Run inception model with a prebuilt Hexagon library
+If you don’t need to build your own implementation of Hexagon HVX, we provide a shortcut to execute graphs by using pre-compiled binaries.
+
+```shell
+./tensorflow/contrib/makefile/samples/build_and_run_inception_hexagon.sh -p
```
-git clone https://github.com/tensorflow/tensorflow.git
-cd tensorflow
-NDK_ROOT="/path/to/ndk" ./tensorflow/contrib/makefile/build_all_android.sh -X
-```
-(-X downloads dependencies to hexagon HVX and graphs, and copy all dependencies to android and execute a test)
-#### All-in-one script to run inception model by building entire libraries from source code
- If you want to build your own implementation of hexagon HVX, we provide a sample all-in-one script to execute graphs which downloads source and build everything for hexagon.
+The `-p` option makes the script download dependencies (i.e., Hexagon HVX binaries and graph models), copy them to the Android device and execute a test.
-```
-git clone https://github.com/tensorflow/tensorflow.git
-cd tensorflow
-QUALCOMM_SDK="/path/to/qualcomm/sdk" NDK_ROOT="/path/to/ndk" ./tensorflow/contrib/makefile/samples/build_and_run_inception_hexagon.sh
+### Run inception model by building all from the source code
+
+If you want to build your own implementation of Hexagon HVX, we provide a sample all-in-one script that downloads the source, builds everything necessary and executes graphs.
+
+```shell
+./tensorflow/contrib/makefile/samples/build_and_run_inception_hexagon.sh
```
## Building libraries
If you've finished walking through the quick start guide, you may want to try building each binary manually.
-#### Build libhexagon_nn_skel.so
-Download hexagon nn library from codeaurora.org and build it.
+### Build libhexagon\_nn\_skel.so
-```
+Download the Hexagon NN library from codeaurora.org and build it.
+
+```shell
git clone https://source.codeaurora.org/quic/hexagon_nn/nnlib
cd nnlib
```
-(Just follow instructions in README.HOW_TO_BUILD. You can find libhexagon_nn_skel.so in hexagon_Release_dynamic_toolv72_v60/ship)
-Then copy the generated binary to GEN_LIBS_DIR
+Just follow the instructions in `README.HOW_TO_BUILD`. You can find the file `libhexagon_nn_skel.so` in `hexagon_Release_dynamic_toolv72_v60/ship`.
+Then copy the generated binary to `${GEN_LIBS_DIR}`.
-```
+```shell
GEN_LIBS_DIR="/path/to/a/dir/to/store/hexagon/libraries"
cp -v "hexagon_Release_dynamic_toolv72_v60/ship/libhexagon_nn_skel.so" "${GEN_LIBS_DIR}"
```
-#### Build libhexagon_controller.so
+### Build libhexagon\_controller.so
+
Download TensorFlow and build the Hexagon controller.
-```
-git clone https://github.com/tensorflow/tensorflow.git
-cd tensorflow
-TF_ROOT_DIR="$(pwd)"
-QUALCOMM_SDK="/path/to/qualcomm/sdk"
+```shell
GENERATED_NNLIB_DIRECTORY="/path/to/nnlib"
GENERATED_HEXAGON_CONTROLLER_DIRECTORY="${QUALCOMM_SDK}/examples/common/generated_hexagon_controller"
rm -rf "${GENERATED_HEXAGON_CONTROLLER_DIRECTORY}"
cp -v "${GENERATED_HEXAGON_CONTROLLER_DIRECTORY}/android_Release/ship/libhexagon_controller.so" "${GEN_LIBS_DIR}"
```
-#### Build tensorflow linking hexagon library
-Build tensorflow with the build_all_android.sh with specifying -x option.
+### Build TensorFlow linking Hexagon library
-```
+Build TensorFlow with `build_all_android.sh` specifying the `-x` option.
+
+```shell
BUILD_ALL_ANDROID_PATH="${TF_ROOT_DIR}/tensorflow/contrib/makefile/build_all_android.sh"
-NDK_ROOT="/path/to/ndk/root"
CC_PREFIX=${CC_PREFIX} NDK_ROOT=${NDK_ROOT} "${BUILD_ALL_ANDROID_PATH}" \
-x "${GEN_LIBS_DIR}" \
-t hexagon_graph_execution
```
-#### Push binaries to your Android device
+### Push binaries to your Android device
Before running tests on your Android device, you need to push several binaries to it.
-```
+```shell
adb push "${GEN_LIBS_DIR}/libhexagon_controller.so" "/data/local/tmp"
adb push "${GEN_LIBS_DIR}/libhexagon_nn_skel.so" "/vendor/lib/rfsa/adsp"
adb push -p \
adb wait-for-device
```
-#### Run tests on the device
+### Run tests on the device
Finally, you can run the inference tests on your device.
-```
+```shell
adb shell 'LD_LIBRARY_PATH=/data/local/tmp:$LD_LIBRARY_PATH' \
"/data/local/tmp/hexagon_graph_execution"
```
-#### Troubleshooting
-If you're using the Open-Q 820 Snapdragon development kit, you may run into an issue with running the executable due to a missing testsig library. From the Hexagon SDK documentation: *Dynamic shared objects are required to be digitally signed and then authenticated at runtime before they are allowed to be loaded and executed.* Generating a testsig library is necessary to run the unsigned sample library built from this project.
+### Troubleshooting
+
+#### Testsig issue
+
+If you're using the Open-Q 820 Snapdragon Development Kit, you may run into an issue with running the executable due to a missing `testsig` library. From the Hexagon SDK documentation: *Dynamic shared objects are required to be digitally signed and then authenticated at runtime before they are allowed to be loaded and executed.* Generating a testsig library is necessary to run the unsigned sample library built from this project.
-If the lack of a testsig library is your problem, you will see errors of the type:
+If the lack of a `testsig` library is your problem, you will see errors of the type:
`vendor/qcom/proprietary/adsprpc/src/fastrpc_apps_user.c:169::error: -1: 0 == (nErr = remotectl_open(name, (int*)ph, dlerrstr, sizeof(dlerrstr), &dlerr))`
-appearing in adb logcat.
-
-There are several ways to create the testsig library, the only prerequisite is Python and the correct version of the Hexagon-SDK. The following steps is one way to create this library:
-1. Run adb as root: `adb root`
-2. Run the command `adb shell cat /sys/devices/soc0/serial_number`
-3. Convert the decimal number you get as output to hex
-4. Run the python script: `python ${QUALCOMM_SDK}/tools/elfsigner/elfsigner.py -t $(SERIAL_NUMBER_HEX_VALUE)`
-5. The output of the python script is a shared library stored in ${QUALCOMM_SDK}/tools/elfsigner/output/testsig-$(SERIAL_NUMBER_HEX_VALUE).so
-6. Push the shared library to your device:
+appearing in `adb logcat` or ["Expected: (version) >= (1), actual: 0 vs 1" while running a binary from adb](https://github.com/tensorflow/tensorflow/issues/11210).
+
+You need to add a test signature, as described at the beginning of this README. After rebooting your device, you should be able to run the sample application.
+
+#### Qualcomm SDK Linux installation fails with "Malformed \uxxxx encoding"
+
+The installation file is based on LaunchAnywhere, which fails in Linux if the `PS1` env variable contains non-common Unicode chars:
+
```
-adb root
-adb wait-for-device
-adb remount
-adb wait-for-device
-adb shell mkdir /system/lib/rfsa
-adb shell mkdir /system/lib/rfsa/adsp
-adb push ${QUALCOMM_SDK}/tools/elfsigner/output/testsig-$(SERIAL_NUMBER_HEX_VALUE).so /system/lib/rfsa/adsp/
+Preparing to install...
+Extracting the JRE from the installer archive...
+Unpacking the JRE...
+Extracting the installation resources from the installer archive...
+Configuring the installer for this system's environment...
+
+Launching installer...
+
+An internal LaunchAnywhere application error has occurred and this application cannot proceed. (LAX)
+
+Stack Trace:
+java.lang.IllegalArgumentException: Malformed \uxxxx encoding.
+ at java.util.Properties.loadConvert(Properties.java:574)
+ at java.util.Properties.load0(Properties.java:391)
+ at java.util.Properties.load(Properties.java:317)
+ at com.zerog.common.java.util.PropertiesUtil.loadProperties(Unknown Source)
+ at com.zerog.lax.LAX.<init>(Unknown Source)
+ at com.zerog.lax.LAX.main(Unknown Source)
```
-After rebooting your device, you should be able to run the sample application.
+It can be solved by temporarily assigning the `PS1` environment variable to something simple, such as '$'.
+
+## Maintainers
-Maintainers:
-- Satoshi Kataoka (satok@google.com, github.com/satok16)
+* Satoshi Kataoka (satok@google.com, github.com/satok16)
--- /dev/null
+package(
+ default_visibility = ["//visibility:private"],
+)
+
+licenses(["notice"]) # Apache 2.0
+
+exports_files(["LICENSE"])
+
+load("//tensorflow:tensorflow.bzl", "tf_gen_op_libs")
+load("//tensorflow:tensorflow.bzl", "tf_gen_op_wrapper_py")
+load("//tensorflow:tensorflow.bzl", "tf_kernel_library")
+load("//tensorflow:tensorflow.bzl", "tf_py_test")
+
+tf_kernel_library(
+ name = "kafka_kernels",
+ srcs = ["kernels/kafka_dataset_ops.cc"],
+ visibility = ["//visibility:public"],
+ deps = [
+ "//tensorflow/core:framework",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:lib_internal",
+ "//tensorflow/core/kernels:bounds_check_lib",
+ "//tensorflow/core/kernels:dataset",
+ "//third_party/eigen3",
+ "@kafka",
+ ],
+)
+
+tf_gen_op_libs(
+ op_lib_names = ["kafka_ops"],
+ deps = [
+ "//tensorflow/core:lib",
+ ],
+)
+
+tf_gen_op_wrapper_py(
+ name = "gen_kafka_ops",
+ out = "python/ops/gen_kafka_ops.py",
+ require_shape_functions = True,
+ deps = [":kafka_ops_op_lib"],
+)
+
+py_library(
+ name = "kafka",
+ srcs = [
+ "__init__.py",
+ "python/ops/kafka_dataset_ops.py",
+ ],
+ srcs_version = "PY2AND3",
+ visibility = ["//visibility:public"],
+ deps = [
+ ":gen_kafka_ops",
+ "//tensorflow/contrib/util:util_py",
+ "//tensorflow/python:array_ops",
+ "//tensorflow/python:control_flow_ops",
+ "//tensorflow/python:framework",
+ "//tensorflow/python:framework_for_generated_wrappers",
+ "//tensorflow/python:platform",
+ "//tensorflow/python:state_ops",
+ "//tensorflow/python:training",
+ "//tensorflow/python/data/ops:dataset_ops",
+ "//tensorflow/python/data/ops:iterator_ops",
+ "//tensorflow/python/data/ops:readers",
+ ],
+)
+
+# The Kafka server has to be set up before running the test.
+# The Kafka server is set up through Docker, so the Docker engine
+# has to be installed.
+#
+# Once the Docker engine is ready:
+# To set up the Kafka server:
+# $ bash tensorflow/contrib/kafka/python/kernel_tests/kafka_test.sh start kafka
+#
+# After the test is complete:
+# To tear down the Kafka server:
+# $ bash tensorflow/contrib/kafka/python/kernel_tests/kafka_test.sh stop kafka
+tf_py_test(
+ name = "kafka_test",
+ srcs = ["python/kernel_tests/kafka_test.py"],
+ additional_deps = [
+ ":kafka",
+ "//third_party/py/numpy",
+ "//tensorflow/python:client_testlib",
+ "//tensorflow/python:framework",
+ "//tensorflow/python:framework_test_lib",
+ "//tensorflow/python:platform_test",
+ ],
+ tags = [
+ "manual",
+ "notap",
+ ],
+)
+
+filegroup(
+ name = "all_files",
+ srcs = glob(
+ ["**/*"],
+ exclude = [
+ "**/METADATA",
+ "**/OWNERS",
+ ],
+ ),
+ visibility = ["//tensorflow:__subpackages__"],
+)
--- /dev/null
+# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Kafka Dataset.
+
+@@KafkaDataset
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.contrib.kafka.python.ops.kafka_dataset_ops import KafkaDataset
+
+from tensorflow.python.util.all_util import remove_undocumented
+
+_allowed_symbols = [
+ "KafkaDataset",
+]
+
+remove_undocumented(__name__)
--- /dev/null
+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/kernels/dataset.h"
+
+#include "tensorflow/core/framework/tensor.h"
+
+#include "src-cpp/rdkafkacpp.h"
+
+namespace tensorflow {
+
+class KafkaDatasetOp : public DatasetOpKernel {
+ public:
+ using DatasetOpKernel::DatasetOpKernel;
+
+ void MakeDataset(OpKernelContext* ctx, DatasetBase** output) override {
+ const Tensor* topics_tensor;
+ OP_REQUIRES_OK(ctx, ctx->input("topics", &topics_tensor));
+ OP_REQUIRES(
+ ctx, topics_tensor->dims() <= 1,
+ errors::InvalidArgument("`topics` must be a scalar or a vector."));
+
+ std::vector<string> topics;
+ topics.reserve(topics_tensor->NumElements());
+ for (int i = 0; i < topics_tensor->NumElements(); ++i) {
+ topics.push_back(topics_tensor->flat<string>()(i));
+ }
+
+ std::string servers = "";
+ OP_REQUIRES_OK(ctx,
+ ParseScalarArgument<std::string>(ctx, "servers", &servers));
+ std::string group = "";
+ OP_REQUIRES_OK(ctx, ParseScalarArgument<std::string>(ctx, "group", &group));
+ bool eof = false;
+ OP_REQUIRES_OK(ctx, ParseScalarArgument<bool>(ctx, "eof", &eof));
+ int64 timeout = -1;
+ OP_REQUIRES_OK(ctx, ParseScalarArgument<int64>(ctx, "timeout", &timeout));
+ OP_REQUIRES(ctx, (timeout > 0),
+ errors::InvalidArgument(
+                    "Timeout value should be larger than 0, got ", timeout));
+ *output = new Dataset(ctx, std::move(topics), servers, group, eof, timeout);
+ }
+
+ private:
+ class Dataset : public GraphDatasetBase {
+ public:
+ Dataset(OpKernelContext* ctx, std::vector<string> topics,
+ const string& servers, const string& group, const bool eof,
+ const int64 timeout)
+ : GraphDatasetBase(ctx),
+ topics_(std::move(topics)),
+ servers_(servers),
+ group_(group),
+ eof_(eof),
+ timeout_(timeout) {}
+
+ std::unique_ptr<IteratorBase> MakeIterator(
+ const string& prefix) const override {
+ return std::unique_ptr<IteratorBase>(
+ new Iterator({this, strings::StrCat(prefix, "::Kafka")}));
+ }
+
+ const DataTypeVector& output_dtypes() const override {
+ static DataTypeVector* dtypes = new DataTypeVector({DT_STRING});
+ return *dtypes;
+ }
+
+ const std::vector<PartialTensorShape>& output_shapes() const override {
+ static std::vector<PartialTensorShape>* shapes =
+ new std::vector<PartialTensorShape>({{}});
+ return *shapes;
+ }
+
+ string DebugString() override { return "KafkaDatasetOp::Dataset"; }
+
+ protected:
+ Status AsGraphDefInternal(DatasetGraphDefBuilder* b,
+ Node** output) const override {
+ Node* topics = nullptr;
+ TF_RETURN_IF_ERROR(b->AddVector(topics_, &topics));
+ Node* servers = nullptr;
+ TF_RETURN_IF_ERROR(b->AddScalar(servers_, &servers));
+ Node* group = nullptr;
+ TF_RETURN_IF_ERROR(b->AddScalar(group_, &group));
+ Node* eof = nullptr;
+ TF_RETURN_IF_ERROR(b->AddScalar(eof_, &eof));
+ Node* timeout = nullptr;
+ TF_RETURN_IF_ERROR(b->AddScalar(timeout_, &timeout));
+ TF_RETURN_IF_ERROR(
+ b->AddDataset(this, {topics, servers, group, eof, timeout}, output));
+ return Status::OK();
+ }
+
+ private:
+ class Iterator : public DatasetIterator<Dataset> {
+ public:
+ explicit Iterator(const Params& params)
+ : DatasetIterator<Dataset>(params) {}
+
+ Status GetNextInternal(IteratorContext* ctx,
+ std::vector<Tensor>* out_tensors,
+ bool* end_of_sequence) override {
+ mutex_lock l(mu_);
+ do {
+          // We are currently processing a topic, so try to read the next message.
+ if (consumer_.get()) {
+ while (true) {
+ if (limit_ >= 0 &&
+ (topic_partition_->offset() >= limit_ || offset_ >= limit_)) {
+ // EOF current topic
+ break;
+ }
+ std::unique_ptr<RdKafka::Message> message(
+ consumer_->consume(dataset()->timeout_));
+ if (message->err() == RdKafka::ERR_NO_ERROR) {
+                // Produce the message as output.
+ Tensor line_tensor(cpu_allocator(), DT_STRING, {});
+ line_tensor.scalar<string>()() =
+ std::string(static_cast<const char*>(message->payload()),
+ message->len());
+ out_tensors->emplace_back(std::move(line_tensor));
+ *end_of_sequence = false;
+ // Sync offset
+ offset_ = message->offset();
+ return Status::OK();
+ }
+
+ if (message->err() == RdKafka::ERR__PARTITION_EOF &&
+ dataset()->eof_) {
+ // EOF current topic
+ break;
+ }
+ if (message->err() != RdKafka::ERR__TIMED_OUT) {
+ return errors::Internal("Failed to consume:",
+ message->errstr());
+ }
+ message.reset(nullptr);
+ consumer_->poll(0);
+ }
+
+ // We have reached the end of the current topic, so maybe
+ // move on to next topic.
+ ResetStreamsLocked();
+ ++current_topic_index_;
+ }
+
+          // Iteration ends when there are no more topics to process.
+ if (current_topic_index_ == dataset()->topics_.size()) {
+ *end_of_sequence = true;
+ return Status::OK();
+ }
+
+ TF_RETURN_IF_ERROR(SetupStreamsLocked(ctx->env()));
+ } while (true);
+ }
+
+ protected:
+ Status SaveInternal(IteratorStateWriter* writer) override {
+ mutex_lock l(mu_);
+ TF_RETURN_IF_ERROR(writer->WriteScalar(full_name("current_topic_index"),
+ current_topic_index_));
+
+ // `consumer_` is empty if
+ // 1. GetNext has not been called even once.
+ // 2. All topics have been read and iterator has been exhausted.
+ if (consumer_.get()) {
+ TF_RETURN_IF_ERROR(
+ writer->WriteScalar(full_name("current_pos"), offset_));
+ }
+ return Status::OK();
+ }
+
+ Status RestoreInternal(IteratorContext* ctx,
+ IteratorStateReader* reader) override {
+ mutex_lock l(mu_);
+ ResetStreamsLocked();
+ int64 current_topic_index;
+ TF_RETURN_IF_ERROR(reader->ReadScalar(full_name("current_topic_index"),
+                                              &current_topic_index));
+ current_topic_index_ = size_t(current_topic_index);
+ // The key "current_pos" is written only if the iterator was saved
+ // with an open topic.
+ if (reader->Contains(full_name("current_pos"))) {
+ int64 current_pos;
+ TF_RETURN_IF_ERROR(
+            reader->ReadScalar(full_name("current_pos"), &current_pos));
+
+ TF_RETURN_IF_ERROR(SetupStreamsLocked(ctx->env()));
+ topic_partition_->set_offset(current_pos);
+ if (topic_partition_->offset() != current_pos) {
+ return errors::Internal("Failed to restore to offset ",
+ current_pos);
+ }
+ offset_ = current_pos;
+ }
+ return Status::OK();
+ }
+
+ private:
+ // Sets up Kafka streams to read from the topic at
+ // `current_topic_index_`.
+ Status SetupStreamsLocked(Env* env) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ if (current_topic_index_ >= dataset()->topics_.size()) {
+ return errors::InvalidArgument(
+ "current_topic_index_:", current_topic_index_,
+ " >= topics_.size():", dataset()->topics_.size());
+ }
+
+ // Actually move on to next topic.
+ string entry = dataset()->topics_[current_topic_index_];
+
+ std::vector<string> parts = str_util::Split(entry, ":");
+ if (parts.size() < 1) {
+ return errors::InvalidArgument("Invalid parameters: ", entry);
+ }
+ string topic = parts[0];
+ int32 partition = 0;
+ if (parts.size() > 1) {
+ if (!strings::safe_strto32(parts[1], &partition)) {
+ return errors::InvalidArgument("Invalid parameters: ", entry);
+ }
+ }
+ int64 offset = 0;
+ if (parts.size() > 2) {
+ if (!strings::safe_strto64(parts[2], &offset)) {
+ return errors::InvalidArgument("Invalid parameters: ", entry);
+ }
+ }
+
+ topic_partition_.reset(
+ RdKafka::TopicPartition::create(topic, partition, offset));
+
+ offset_ = topic_partition_->offset();
+ limit_ = -1;
+ if (parts.size() > 3) {
+ if (!strings::safe_strto64(parts[3], &limit_)) {
+ return errors::InvalidArgument("Invalid parameters: ", entry);
+ }
+ }
+
+ std::unique_ptr<RdKafka::Conf> conf(
+ RdKafka::Conf::create(RdKafka::Conf::CONF_GLOBAL));
+ std::unique_ptr<RdKafka::Conf> topic_conf(
+ RdKafka::Conf::create(RdKafka::Conf::CONF_TOPIC));
+
+ std::string errstr;
+
+ RdKafka::Conf::ConfResult result =
+ conf->set("default_topic_conf", topic_conf.get(), errstr);
+ if (result != RdKafka::Conf::CONF_OK) {
+ return errors::Internal("Failed to set default_topic_conf:", errstr);
+ }
+
+ result = conf->set("bootstrap.servers", dataset()->servers_, errstr);
+ if (result != RdKafka::Conf::CONF_OK) {
+ return errors::Internal("Failed to set bootstrap.servers ",
+ dataset()->servers_, ":", errstr);
+ }
+ result = conf->set("group.id", dataset()->group_, errstr);
+ if (result != RdKafka::Conf::CONF_OK) {
+ return errors::Internal("Failed to set group.id ", dataset()->group_,
+ ":", errstr);
+ }
+
+ consumer_.reset(RdKafka::KafkaConsumer::create(conf.get(), errstr));
+ if (!consumer_.get()) {
+ return errors::Internal("Failed to create consumer:", errstr);
+ }
+
+ std::vector<RdKafka::TopicPartition*> partitions;
+ partitions.emplace_back(topic_partition_.get());
+ RdKafka::ErrorCode err = consumer_->assign(partitions);
+ if (err != RdKafka::ERR_NO_ERROR) {
+ return errors::Internal(
+ "Failed to assign partition [", topic_partition_->topic(), ", ",
+ topic_partition_->partition(), ", ", topic_partition_->offset(),
+ "]:", RdKafka::err2str(err));
+ }
+
+ return Status::OK();
+ }
+
+ // Resets all Kafka streams.
+ void ResetStreamsLocked() EXCLUSIVE_LOCKS_REQUIRED(mu_) {
+ consumer_->unassign();
+ consumer_->close();
+ consumer_.reset(nullptr);
+ }
+
+ mutex mu_;
+ size_t current_topic_index_ GUARDED_BY(mu_) = 0;
+ int64 offset_ GUARDED_BY(mu_) = 0;
+ int64 limit_ GUARDED_BY(mu_) = -1;
+ std::unique_ptr<RdKafka::TopicPartition> topic_partition_ GUARDED_BY(mu_);
+ std::unique_ptr<RdKafka::KafkaConsumer> consumer_ GUARDED_BY(mu_);
+ };
+
+ const std::vector<string> topics_;
+ const std::string servers_;
+ const std::string group_;
+ const bool eof_;
+ const int64 timeout_;
+ };
+};
+
+REGISTER_KERNEL_BUILDER(Name("KafkaDataset").Device(DEVICE_CPU),
+ KafkaDatasetOp);
+
+} // namespace tensorflow
--- /dev/null
+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/framework/common_shape_fns.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/shape_inference.h"
+
+namespace tensorflow {
+
+REGISTER_OP("KafkaDataset")
+ .Input("topics: string")
+ .Input("servers: string")
+ .Input("group: string")
+ .Input("eof: bool")
+ .Input("timeout: int64")
+ .Output("handle: variant")
+ .SetIsStateful()
+ .SetShapeFn(shape_inference::ScalarShape)
+ .Doc(R"doc(
+Creates a dataset that emits the messages of one or more Kafka topics.
+
+topics: A `tf.string` tensor containing one or more subscriptions,
+ in the format of [topic:partition:offset:length],
+  where length defaults to -1 for unlimited.
+servers: A list of bootstrap servers.
+group: The consumer group id.
+eof: If True, the kafka reader will stop on EOF.
+timeout: The timeout value for the Kafka Consumer to wait
+ (in millisecond).
+)doc");
+
+} // namespace tensorflow
--- /dev/null
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"); you may not
+# use this file except in compliance with the License. You may obtain a copy of
+# the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+# License for the specific language governing permissions and limitations under
+# the License.
+# ==============================================================================
+"""Tests for KafkaDataset."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.contrib.kafka.python.ops import kafka_dataset_ops
+from tensorflow.python.data.ops import iterator_ops
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import errors
+from tensorflow.python.ops import array_ops
+from tensorflow.python.platform import test
+
+
+class KafkaDatasetTest(test.TestCase):
+
+ def setUp(self):
+    # The Kafka server has to be set up before the test
+    # and torn down after the test manually.
+    # The docker engine has to be installed.
+    #
+    # To set up the Kafka server:
+    # $ bash kafka_test.sh start kafka
+    #
+    # To tear down the Kafka server:
+ # $ bash kafka_test.sh stop kafka
+ pass
+
+ def testKafkaDataset(self):
+ topics = array_ops.placeholder(dtypes.string, shape=[None])
+ num_epochs = array_ops.placeholder(dtypes.int64, shape=[])
+ batch_size = array_ops.placeholder(dtypes.int64, shape=[])
+
+ repeat_dataset = kafka_dataset_ops.KafkaDataset(
+ topics, group="test", eof=True).repeat(num_epochs)
+ batch_dataset = repeat_dataset.batch(batch_size)
+
+ iterator = iterator_ops.Iterator.from_structure(batch_dataset.output_types)
+ init_op = iterator.make_initializer(repeat_dataset)
+ init_batch_op = iterator.make_initializer(batch_dataset)
+ get_next = iterator.get_next()
+
+ with self.test_session() as sess:
+ # Basic test: read from topic 0.
+ sess.run(init_op, feed_dict={topics: ["test:0:0:4"], num_epochs: 1})
+ for i in range(5):
+ self.assertEqual("D" + str(i), sess.run(get_next))
+ with self.assertRaises(errors.OutOfRangeError):
+ sess.run(get_next)
+
+ # Basic test: read from topic 1.
+ sess.run(init_op, feed_dict={topics: ["test:0:5:-1"], num_epochs: 1})
+ for i in range(5):
+ self.assertEqual("D" + str(i + 5), sess.run(get_next))
+ with self.assertRaises(errors.OutOfRangeError):
+ sess.run(get_next)
+
+ # Basic test: read from both topics.
+ sess.run(
+ init_op,
+ feed_dict={
+ topics: ["test:0:0:4", "test:0:5:-1"],
+ num_epochs: 1
+ })
+ for j in range(2):
+ for i in range(5):
+ self.assertEqual("D" + str(i + j * 5), sess.run(get_next))
+ with self.assertRaises(errors.OutOfRangeError):
+ sess.run(get_next)
+
+      # Test repeated iteration through both subscriptions.
+ sess.run(
+ init_op,
+ feed_dict={
+ topics: ["test:0:0:4", "test:0:5:-1"],
+ num_epochs: 10
+ })
+ for _ in range(10):
+ for j in range(2):
+ for i in range(5):
+ self.assertEqual("D" + str(i + j * 5), sess.run(get_next))
+ with self.assertRaises(errors.OutOfRangeError):
+ sess.run(get_next)
+
+      # Test batched and repeated iteration through both subscriptions.
+ sess.run(
+ init_batch_op,
+ feed_dict={
+ topics: ["test:0:0:4", "test:0:5:-1"],
+ num_epochs: 10,
+ batch_size: 5
+ })
+ for _ in range(10):
+ self.assertAllEqual(["D" + str(i) for i in range(5)],
+ sess.run(get_next))
+ self.assertAllEqual(["D" + str(i + 5) for i in range(5)],
+ sess.run(get_next))
+
+
+if __name__ == "__main__":
+ test.main()
--- /dev/null
+#!/usr/bin/env bash
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+
+set -e
+set -o pipefail
+
+if [ "$#" -ne 2 ]; then
+ echo "Usage: $0 start|stop <kafka container name>" >&2
+ exit 1
+fi
+
+container=$2
+if [ "$1" == "start" ]; then
+  docker run -d --rm --net=host --name="$container" spotify/kafka
+  echo "Waiting 5 seconds until Kafka is up and running"
+  sleep 5
+  echo "Creating test topic"
+  docker exec "$container" bash -c '/opt/kafka_2.11-0.10.1.0/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test'
+  echo "Creating test messages"
+  docker exec "$container" bash -c 'echo -e "D0\nD1\nD2\nD3\nD4\nD5\nD6\nD7\nD8\nD9" > /test'
+  echo "Producing test messages"
+  docker exec "$container" bash -c '/opt/kafka_2.11-0.10.1.0/bin/kafka-console-producer.sh --topic test --broker-list 127.0.0.1:9092 < /test'
+
+  echo "Container $container started successfully"
+elif [ "$1" == "stop" ]; then
+  docker rm -f "$container"
+
+  echo "Container $container stopped successfully"
+else
+ echo "Usage: $0 start|stop <kafka container name>" >&2
+ exit 1
+fi
+
+
+
--- /dev/null
+# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Kafka Dataset."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.contrib.kafka.python.ops import gen_kafka_ops
+from tensorflow.python.data.ops.readers import Dataset
+from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import ops
+from tensorflow.python.framework import tensor_shape
+
+
+class KafkaDataset(Dataset):
+ """A Kafka Dataset that consumes the message.
+ """
+
+ def __init__(self,
+ topics,
+ servers="localhost",
+ group="",
+ eof=False,
+ timeout=1000):
+ """Create a KafkaReader.
+
+ Args:
+      topics: A `tf.string` tensor containing one or more subscriptions, each
+              in the format `topic:partition:offset:length`, where a length
+              of -1 (the default) means unlimited.
+ servers: A list of bootstrap servers.
+ group: The consumer group id.
+      eof: If True, the Kafka reader will stop on EOF.
+      timeout: The timeout value for the Kafka consumer to wait
+               (in milliseconds).
+ """
+ super(KafkaDataset, self).__init__()
+ self._topics = ops.convert_to_tensor(
+ topics, dtype=dtypes.string, name="topics")
+ self._servers = ops.convert_to_tensor(
+ servers, dtype=dtypes.string, name="servers")
+ self._group = ops.convert_to_tensor(
+ group, dtype=dtypes.string, name="group")
+ self._eof = ops.convert_to_tensor(eof, dtype=dtypes.bool, name="eof")
+ self._timeout = ops.convert_to_tensor(
+ timeout, dtype=dtypes.int64, name="timeout")
+
+ def _as_variant_tensor(self):
+ return gen_kafka_ops.kafka_dataset(self._topics, self._servers, self._group,
+ self._eof, self._timeout)
+
+ @property
+ def output_classes(self):
+ return ops.Tensor
+
+ @property
+ def output_shapes(self):
+ return tensor_shape.scalar()
+
+ @property
+ def output_types(self):
+ return dtypes.string
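
For orientation, a minimal usage sketch of the new dataset (assuming the local broker and `test` topic created by `kafka_test.sh` above; `make_one_shot_iterator` comes from the core `Dataset` base class, and the variable names are illustrative):

```python
from tensorflow.contrib.kafka.python.ops import kafka_dataset_ops

# Consume messages 0-4 of partition 0 of topic "test", stopping at EOF.
dataset = kafka_dataset_ops.KafkaDataset(
    ["test:0:0:4"], servers="localhost", group="demo", eof=True)
iterator = dataset.make_one_shot_iterator()
next_message = iterator.get_next()  # a scalar tf.string tensor per sess.run
```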
@@convolution2d_transpose
@@conv3d_transpose
@@convolution3d_transpose
+@@dense_to_sparse
@@dropout
@@elu
@@embedding_lookup_unique
from tensorflow.contrib.layers.python.layers import initializers
from tensorflow.contrib.layers.python.layers import utils
from tensorflow.python.eager import context
+from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import function
from tensorflow.python.framework import ops
'avg_pool2d', 'avg_pool3d', 'batch_norm', 'bias_add', 'conv2d', 'conv3d',
'conv2d_in_plane', 'conv2d_transpose', 'conv3d_transpose', 'convolution',
'convolution2d', 'convolution2d_in_plane', 'convolution2d_transpose',
- 'convolution3d', 'convolution3d_transpose', 'dropout', 'elu', 'flatten',
- 'fully_connected', 'GDN', 'gdn', 'layer_norm', 'linear', 'pool',
- 'max_pool2d', 'max_pool3d', 'one_hot_encoding', 'relu', 'relu6', 'repeat',
- 'scale_gradient', 'separable_conv2d', 'separable_convolution2d', 'softmax',
- 'spatial_softmax', 'stack', 'unit_norm', 'legacy_fully_connected',
- 'legacy_linear', 'legacy_relu', 'maxout'
+ 'convolution3d', 'convolution3d_transpose', 'dense_to_sparse', 'dropout',
+ 'elu', 'flatten', 'fully_connected', 'GDN', 'gdn', 'layer_norm', 'linear',
+ 'pool', 'max_pool2d', 'max_pool3d', 'one_hot_encoding', 'relu', 'relu6',
+ 'repeat', 'scale_gradient', 'separable_conv2d', 'separable_convolution2d',
+ 'softmax', 'spatial_softmax', 'stack', 'unit_norm',
+ 'legacy_fully_connected', 'legacy_linear', 'legacy_relu', 'maxout'
]
DATA_FORMAT_NCHW = 'NCHW'
return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
+@add_arg_scope
+def dense_to_sparse(tensor, eos_token=0, outputs_collections=None, scope=None):
+ """Converts a dense tensor into a sparse tensor.
+ An example use would be to convert dense labels to sparse ones
+ so that they can be fed to the ctc_loss.
+
+ Args:
+ tensor: An `int` `Tensor` to be converted to a `Sparse`.
+ eos_token: An integer.
+ It is part of the target label that signfies the end of a sentence.
+ outputs_collections: Collection to add the outputs.
+ scope: Optional scope for name_scope.
+ """
+ with variable_scope.variable_scope(scope, 'dense_to_sparse', [tensor]) as sc:
+ tensor = ops.convert_to_tensor(tensor)
+ indices = array_ops.where(
+ math_ops.not_equal(tensor, constant_op.constant(eos_token,
+ tensor.dtype)))
+ values = array_ops.gather_nd(tensor, indices)
+ shape = array_ops.shape(tensor, out_type=dtypes.int64)
+ outputs = sparse_tensor.SparseTensor(indices, values, shape)
+ return utils.collect_named_outputs(outputs_collections, sc.name, outputs)
+
+
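As a quick illustration of the CTC use case mentioned in the docstring (a sketch; the label values and the padding token are made up):

```python
import tensorflow as tf

# Dense labels padded with 0 (the eos_token) become a SparseTensor that can
# be passed as the `labels` argument of tf.nn.ctc_loss.
labels_dense = tf.constant([[3, 1, 4, 0, 0],
                            [2, 7, 0, 0, 0]], dtype=tf.int32)
labels_sparse = tf.contrib.layers.dense_to_sparse(labels_dense, eos_token=0)
```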
@add_arg_scope
def dropout(inputs,
keep_prob=0.5,
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import partitioned_variables
from tensorflow.python.ops import random_ops
+from tensorflow.python.ops import sparse_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import template
from tensorflow.python.ops import variable_scope
self.assertAllClose(result, expected, rtol=1e-5, atol=1e-5)
+class DenseToSparseTest(test.TestCase):
+
+ def testDenseFromConstantToSparse(self):
+ expected_constant = np.reshape(np.arange(24, dtype=np.int64), (3, 4, 2))
+ tensor = constant_op.constant(expected_constant)
+ sparse = _layers.dense_to_sparse(tensor)
+ dense = sparse_ops.sparse_to_dense(sparse.indices, sparse.dense_shape,
+ sparse.values)
+ with self.test_session() as sess:
+ constant = sess.run(dense)
+ self.assertAllEqual(expected_constant, constant)
+
+
class DropoutTest(test.TestCase):
def testCreateDropout(self):
# Add more points if n_samples is not divisible by n_classes (unbalanced!)
extras = n_samples % n_classes
if extras > 0:
- x_exrta, y_extra = _modes[mode](np.random.rand(extras) * 2 * np.pi, *args,
+ x_extra, y_extra = _modes[mode](np.random.rand(extras) * 2 * np.pi, *args,
**kwargs)
spir_x = np.append(spir_x, x_extra)
spir_y = np.append(spir_y, y_extra)
self.assertRaises(AssertionError, np.testing.assert_array_equal,
spir0.data, spir1.data)
+ def test_spirals_synthetic(self):
+ synthetic.spirals(3)
+
if __name__ == '__main__':
test.main()
self, predictions, expected_shape):
predictions_nparray = np.array(predictions)
self.assertAllEqual(expected_shape, predictions_nparray.shape)
- self.assertTrue(np.issubdtype(predictions_nparray.dtype, np.float))
+ self.assertTrue(np.issubdtype(predictions_nparray.dtype, np.floating))
def testPredict_AsIterableFalse(self):
"""Tests predict method with as_iterable=False."""
copts = [
"-DFARMHASH_NO_CXX_STRING",
] + select({
- "//tensorflow:android_arm64": [
+ str(Label("//tensorflow:android_arm64")): [
"-std=c++11",
"-O3",
],
- "//tensorflow:android_arm": [
+ str(Label("//tensorflow:android_arm")): [
"-mfpu=neon",
"-mfloat-abi=softfp",
"-std=c++11",
"-O3",
],
- "//tensorflow:android_x86": [
+ str(Label("//tensorflow:android_x86")): [
"-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK",
],
- "//tensorflow:ios_x86_64": [
+ str(Label("//tensorflow:ios_x86_64")): [
"-msse4.1",
],
"//conditions:default": [],
}) + select({
- "//tensorflow:with_default_optimizations": [],
+ str(Label("//tensorflow:with_default_optimizations")): [],
"//conditions:default": ["-DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK"],
})
"bitmap_helpers_impl.h",
"label_image.h",
],
- deps = ["//tensorflow/contrib/lite:string"],
+ deps = [
+ "//tensorflow/contrib/lite:builtin_op_data",
+ "//tensorflow/contrib/lite:framework",
+ "//tensorflow/contrib/lite:schema_fbs_version",
+ "//tensorflow/contrib/lite:string",
+ "//tensorflow/contrib/lite:string_util",
+ "//tensorflow/contrib/lite/kernels:builtin_ops",
+ "//tensorflow/contrib/lite/schema:schema_fbs",
+ ],
)
# TODO(ahentz): Test disabled as it has a memory leak from read_bmp
limitations under the License.
==============================================================================*/
-#ifndef TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_H
-#define TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_H
+#ifndef TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_H_
+#define TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_H_
#include "tensorflow/contrib/lite/examples/label_image/bitmap_helpers_impl.h"
#include "tensorflow/contrib/lite/examples/label_image/label_image.h"
int* channels, Settings* s);
template <class T>
-void downsize(T* out, uint8_t* in, int image_height, int image_width,
- int image_channels, int wanted_height, int wanted_width,
- int wanted_channels, Settings* s);
+void resize(T* out, uint8_t* in, int image_height, int image_width,
+ int image_channels, int wanted_height, int wanted_width,
+ int wanted_channels, Settings* s);
// explicit instantiation
-template void downsize<uint8_t>(uint8_t*, unsigned char*, int, int, int, int,
- int, int, Settings*);
-template void downsize<float>(float*, unsigned char*, int, int, int, int, int,
+template void resize<uint8_t>(uint8_t*, unsigned char*, int, int, int, int, int,
int, Settings*);
+template void resize<float>(float*, unsigned char*, int, int, int, int, int,
+ int, Settings*);
} // namespace label_image
} // namespace tflite
limitations under the License.
==============================================================================*/
-#ifndef TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H
-#define TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H
+#ifndef TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H_
+#define TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H_
+
+#include "tensorflow/contrib/lite/builtin_op_data.h"
+#include "tensorflow/contrib/lite/interpreter.h"
+#include "tensorflow/contrib/lite/kernels/register.h"
+#include "tensorflow/contrib/lite/string_util.h"
+#include "tensorflow/contrib/lite/version.h"
#include "tensorflow/contrib/lite/examples/label_image/label_image.h"
namespace label_image {
template <class T>
-void downsize(T* out, uint8_t* in, int image_height, int image_width,
- int image_channels, int wanted_height, int wanted_width,
- int wanted_channels, Settings* s) {
- for (int y = 0; y < wanted_height; ++y) {
- const int in_y = (y * image_height) / wanted_height;
- uint8_t* in_row = in + (in_y * image_width * image_channels);
- T* out_row = out + (y * wanted_width * wanted_channels);
- for (int x = 0; x < wanted_width; ++x) {
- const int in_x = (x * image_width) / wanted_width;
- uint8_t* in_pixel = in_row + (in_x * image_channels);
- T* out_pixel = out_row + (x * wanted_channels);
- for (int c = 0; c < wanted_channels; ++c) {
- if (s->input_floating)
- out_pixel[c] = (in_pixel[c] - s->input_mean) / s->input_std;
- else
- out_pixel[c] = in_pixel[c];
- }
- }
+void resize(T* out, uint8_t* in, int image_height, int image_width,
+ int image_channels, int wanted_height, int wanted_width,
+ int wanted_channels, Settings* s) {
+ int number_of_pixels = image_height * image_width * image_channels;
+ std::unique_ptr<Interpreter> interpreter(new Interpreter);
+
+ int base_index = 0;
+
+ // two inputs: input and new_sizes
+ interpreter->AddTensors(2, &base_index);
+ // one output
+ interpreter->AddTensors(1, &base_index);
+ // set input and output tensors
+ interpreter->SetInputs({0, 1});
+ interpreter->SetOutputs({2});
+
+ // set parameters of tensors
+ TfLiteQuantizationParams quant;
+ interpreter->SetTensorParametersReadWrite(
+ 0, kTfLiteFloat32, "input",
+ {1, image_height, image_width, image_channels}, quant);
+ interpreter->SetTensorParametersReadWrite(1, kTfLiteInt32, "new_size", {2},
+ quant);
+ interpreter->SetTensorParametersReadWrite(
+ 2, kTfLiteFloat32, "output",
+ {1, wanted_height, wanted_width, wanted_channels}, quant);
+
+ ops::builtin::BuiltinOpResolver resolver;
+ TfLiteRegistration* resize_op =
+ resolver.FindOp(BuiltinOperator_RESIZE_BILINEAR);
+ interpreter->AddNodeWithParameters({0, 1}, {2}, nullptr, 0, nullptr,
+ resize_op, nullptr);
+
+ interpreter->AllocateTensors();
+
+  // Fill the input image. in[] holds uint8 values while the input tensor is
+  // float, so we cannot memcpy() directly.
+ auto input = interpreter->typed_tensor<float>(0);
+ for (int i = 0; i < number_of_pixels; i++) {
+ input[i] = in[i];
+ }
+
+ // fill new_sizes
+ interpreter->typed_tensor<int>(1)[0] = wanted_height;
+ interpreter->typed_tensor<int>(1)[1] = wanted_width;
+
+ interpreter->Invoke();
+
+ auto output = interpreter->typed_tensor<float>(2);
+  auto output_number_of_pixels =
+      wanted_height * wanted_width * wanted_channels;
+
+ for (int i = 0; i < output_number_of_pixels; i++) {
+ if (s->input_floating)
+ out[i] = (output[i] - s->input_mean) / s->input_std;
+ else
+ out[i] = (uint8_t)output[i];
}
}
} // namespace label_image
} // namespace tflite
-#endif // TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H
+#endif // TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_BITMAP_HELPERS_IMPL_H_
int wanted_width = dims->data[2];
int wanted_channels = dims->data[3];
- if (s->input_floating) {
- downsize<float>(interpreter->typed_tensor<float>(input), in, image_height,
+ switch (interpreter->tensor(input)->type) {
+ case kTfLiteFloat32:
+ s->input_floating = true;
+ resize<float>(interpreter->typed_tensor<float>(input), in, image_height,
image_width, image_channels, wanted_height, wanted_width,
wanted_channels, s);
- } else {
- downsize<uint8_t>(interpreter->typed_tensor<uint8_t>(input), in,
+ break;
+ case kTfLiteUInt8:
+ resize<uint8_t>(interpreter->typed_tensor<uint8_t>(input), in,
image_height, image_width, image_channels, wanted_height,
wanted_width, wanted_channels, s);
+ break;
+ default:
+ LOG(FATAL) << "cannot handle input type "
+ << interpreter->tensor(input)->type << " yet";
+ exit(-1);
}
struct timeval start_time, stop_time;
std::vector<std::pair<float, int>> top_results;
- if (s->input_floating) {
- get_top_n<float>(interpreter->typed_output_tensor<float>(0), output_size,
- num_results, threshold, &top_results, s->input_floating);
- } else {
- get_top_n<uint8_t>(interpreter->typed_output_tensor<uint8_t>(0),
- output_size, num_results, threshold, &top_results,
- s->input_floating);
+ int output = interpreter->outputs()[0];
+ switch (interpreter->tensor(output)->type) {
+ case kTfLiteFloat32:
+ get_top_n<float>(interpreter->typed_output_tensor<float>(0), output_size,
+ num_results, threshold, &top_results, true);
+ break;
+ case kTfLiteUInt8:
+ get_top_n<uint8_t>(interpreter->typed_output_tensor<uint8_t>(0),
+ output_size, num_results, threshold, &top_results,
+ false);
+ break;
+ default:
+ LOG(FATAL) << "cannot handle output type "
+ << interpreter->tensor(input)->type << " yet";
+ exit(-1);
}
std::vector<string> labels;
LOG(INFO) << "label_image\n"
<< "--accelerated, -a: [0|1], use Android NNAPI or note\n"
<< "--count, -c: loop interpreter->Invoke() for certain times\n"
- << "--input_floating, -f: [0|1] type of input layer is floating "
- "point numbers\n"
<< "--input_mean, -b: input mean\n"
<< "--input_std, -s: input standard deviation\n"
<< "--image, -i: image_name.bmp\n"
<< "--labels, -l: labels for the model\n"
- << "--tflite_mode, -m: model_name.tflite\n"
+ << "--tflite_model, -m: model_name.tflite\n"
<< "--threads, -t: number of threads\n"
<< "--verbose, -v: [0|1] print more information\n"
<< "\n";
static struct option long_options[] = {
{"accelerated", required_argument, 0, 'a'},
{"count", required_argument, 0, 'c'},
- {"input_floating", required_argument, 0, 'f'},
{"verbose", required_argument, 0, 'v'},
{"image", required_argument, 0, 'i'},
{"labels", required_argument, 0, 'l'},
s.loop_count = strtol( // NOLINT(runtime/deprecated_fn)
optarg, (char**)NULL, 10);
break;
- case 'f':
- s.input_floating = strtol( // NOLINT(runtime/deprecated_fn)
- optarg, (char**)NULL, 10);
- s.input_layer_type = "float";
- break;
case 'i':
s.input_bmp_name = optarg;
break;
#ifndef TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_LABEL_IMAGE_H
#define TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_LABEL_IMAGE_H
-#include <string>
#include "tensorflow/contrib/lite/string.h"
+namespace tflite {
+namespace label_image {
+
struct Settings {
bool verbose = false;
bool accel = false;
int number_of_threads = 4;
};
+} // namespace label_image
+} // namespace tflite
+
#endif // TENSORFLOW_CONTRIB_LITE_EXAMPLES_LABEL_IMAGE_LABEL_IMAGE_H
label_image for TensorFlow Lite inspired by TensorFlow's label_image.
+
+To build label_image for Android, first run $TENSORFLOW_ROOT/configure
+and set the Android NDK there, or configure the NDK settings in
+$TENSORFLOW_ROOT/WORKSPACE directly.
To build it for android ARMv8:
```
-> bazel build --cxxopt=-std=c++11 \
+> bazel build --config monolithic --cxxopt=-std=c++11 \
--crosstool_top=//external:android/crosstool \
--host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
--cpu=arm64-v8a \
```
or
```
-> bazel build --config android_arm64 --cxxopt=-std=c++11 \
+> bazel build --config android_arm64 --config monolithic --cxxopt=-std=c++11 \
//tensorflow/contrib/lite/examples/label_image:label_image
```
To build it for android arm-v7a:
```
-> bazel build --cxxopt=-std=c++11 \
+> bazel build --config monolithic --cxxopt=-std=c++11 \
--crosstool_top=//external:android/crosstool \
--host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
--cpu=armeabi-v7a \
```
or
```
-> bazel build --config android_arm --cxxopt=-std=c++11 \
+> bazel build --config android_arm --config monolithic --cxxopt=-std=c++11 \
//tensorflow/contrib/lite/examples/label_image:label_image
```
"optimized/neon_tensor_utils.cc",
],
hdrs = [
+ "common.h",
+ "optimized/cpu_check.h",
"optimized/neon_tensor_utils.h",
"optimized/tensor_utils_impl.h",
],
deps = [
":cpu_check",
":portable_tensor_utils",
+ ":types",
"//tensorflow/contrib/lite:builtin_op_data",
"//tensorflow/contrib/lite/kernels:activation_functor",
+ "@arm_neon_2_x86_sse",
+ "@gemmlowp",
],
)
"tensor_utils.cc",
],
hdrs = [
+ "common.h",
+ "compatibility.h",
+ "optimized/cpu_check.h",
+ "optimized/neon_tensor_utils.h",
"optimized/tensor_utils_impl.h",
"reference/portable_tensor_utils.h",
"tensor_utils.h",
+ "types.h",
],
copts = NEON_FLAGS_IF_APPLICABLE,
deps = [
"//tensorflow/contrib/lite/kernels:activation_functor",
"//tensorflow/contrib/lite:builtin_op_data",
+ "@arm_neon_2_x86_sse",
+ "@gemmlowp",
] + select({
":arm": [
":neon_tensor_utils",
":ios_arm64": [
":neon_tensor_utils",
],
+ ":x86_64": [
+ ":neon_tensor_utils",
+ ],
+ ":x86": [
+ ":neon_tensor_utils",
+ ],
+ ":k8": [
+ ":neon_tensor_utils",
+ ],
+ ":darwin": [
+ ":neon_tensor_utils",
+ ],
"//conditions:default": [
":portable_tensor_utils",
],
#endif // __aarch64__
}
-#elif __ARM_NEON
+#elif defined USE_NEON || defined __ARM_NEON
inline bool TestCPUFeatureNeon() { return true; }
#include "tensorflow/contrib/lite/builtin_op_data.h"
#include "tensorflow/contrib/lite/kernels/activation_functor.h"
+#include "tensorflow/contrib/lite/kernels/internal/common.h"
#include "tensorflow/contrib/lite/kernels/internal/optimized/tensor_utils_impl.h"
#ifdef USE_NEON
-#include <arm_neon.h>
#define kFloatWeightsPerNeonLane 4
namespace tflite {
limitations under the License.
==============================================================================*/
#include "tensorflow/contrib/lite/kernels/internal/tensor_utils.h"
+#include "tensorflow/contrib/lite/kernels/internal/common.h"
#ifndef USE_NEON
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
}
/**
- * Specfifies which operands will be the model's inputs and outputs.
+ * Specifies which operands will be the model's inputs and outputs.
*
* An operand cannot be used for both input and output. Doing so will
* return an error.
CHECK(increment == 1 || increment == -1);
bool changed = false;
if (model->operators.empty()) {
+ LOG(INFO) << "Model is empty!!!";
return false;
}
int op_index = increment == 1 ? 0 : model->operators.size() - 1;
// Remove all the resolved arrays.
for (const string& input_name : concat_op->inputs) {
- model->EraseArray(input_name);
+ // Check to prevent removal of shared tensors
+ if (CountOpsWithInput(*model, input_name) == 1) {
+ model->EraseArray(input_name);
+ }
}
// Remove concatenate operator
#ifndef TENSORFLOW_CONTRIB_LITE_TOCO_MODEL_H_
#define TENSORFLOW_CONTRIB_LITE_TOCO_MODEL_H_
+#include <functional>
#include <initializer_list>
#include <memory>
#include <string>
void CheckNoMissingArray(const Model& model) {
for (const auto& op : model.operators) {
for (const auto& input : op->inputs) {
- CHECK(model.HasArray(input) || model.optional_arrays.count(input));
+ CHECK(model.HasArray(input) || model.optional_arrays.count(input))
+ << "Input: " << input << " missing for op: " << op->outputs[0] << ".";
}
for (const auto& output : op->outputs) {
- CHECK(model.HasArray(output));
+ CHECK(model.HasArray(output)) << "Output: " << output << " missing.";
}
}
CheckNonExistentIOArrays(model);
ifeq ($(BUILD_FOR_TEGRA),1)
NVCC := $(JETPACK)/cuda/bin/nvcc
- NVCCFLAGS := -x=cu -D__CUDACC__ -DNVCC -DNVIDIA_TEGRA -ccbin $(NDK_ROOT)/toolchains/$(TOOLCHAIN)/prebuilt/$(ANDROID_HOST_OS_ARCH)/bin/$(BIN_PREFIX)-g++ --std c++11 --expt-relaxed-constexpr -m64 -gencode arch=compute_53,\"code=sm_53\" -gencode arch=compute_62,\"code=sm_62\" -DEIGEN_AVOID_STL_ARRAY -DTENSORFLOW_USE_EIGEN_THREADPOOL -DLANG_CXX11 -DEIGEN_HAS_C99_MATH -DGOOGLE_CUDA=1 -DTF_EXTRA_CUDA_CAPABILITIES=5.3
+ NVCCFLAGS := -x=cu -D__CUDACC__ -DNVCC -DANDROID_TEGRA -ccbin $(NDK_ROOT)/toolchains/$(TOOLCHAIN)/prebuilt/$(ANDROID_HOST_OS_ARCH)/bin/$(BIN_PREFIX)-g++ --std c++11 --expt-relaxed-constexpr -m64 -gencode arch=compute_53,\"code=sm_53\" -gencode arch=compute_62,\"code=sm_62\" -DEIGEN_AVOID_STL_ARRAY -DTENSORFLOW_USE_EIGEN_THREADPOOL -DLANG_CXX11 -DEIGEN_HAS_C99_MATH -DGOOGLE_CUDA=1 -DTF_EXTRA_CUDA_CAPABILITIES=5.3
CXXFLAGS4NVCC =\
-DIS_SLIM_BUILD \
--DNVIDIA_TEGRA \
+-DANDROID_TEGRA \
-fno-exceptions \
-DNDEBUG $(OPTFLAGS) \
-march=armv8-a \
CXXFLAGS +=\
-DGOOGLE_CUDA=1 \
-D__ANDROID_TYPES_FULL__ \
--DNVIDIA_TEGRA \
+-DANDROID_TEGRA \
-DEIGEN_AVOID_STL_ARRAY \
-DEIGEN_HAS_C99_MATH \
-DLANG_CXX11 -DTENSORFLOW_USE_EIGEN_THREADPOOL -DTF_EXTRA_CUDA_CAPABILITIES=5.3
-I$(JETPACK)/cuda/extras/CUPTI/include
- LIBS += \
+ CUDA_LIBS := \
-ltfcuda \
-lcudart_static \
-lcudnn \
-lculibos \
-lcurand_static
- OBJDIR := $(OBJDIR)Tegra/
- LIBDIR := $(LIBDIR)Tegra/
- BINDIR := $(BINDIR)Tegra/
- DEPDIR := $(DEPDIR)Tegra/
+ OBJDIR := $(OBJDIR)android_arm64-v8a/
+ LIBDIR := $(LIBDIR)android_arm64-v8a/
+ BINDIR := $(BINDIR)android_arm64-v8a/
+ DEPDIR := $(DEPDIR)android_arm64-v8a/
TEGRA_LIBS := \
-L$(JETPACK)/cuda/targets/aarch64-linux-androideabi/lib \
tensorflow/core/util/version_info.cc
# Remove duplicates (for version_info.cc)
CORE_CC_ALL_SRCS := $(sort $(CORE_CC_ALL_SRCS))
-CORE_CC_EXCLUDE_SRCS := \
+
+CORE_CC_EXCLUDE_SRCS_NON_GPU := \
$(wildcard tensorflow/core/*/*test.cc) \
$(wildcard tensorflow/core/*/*testutil*) \
$(wildcard tensorflow/core/*/*testlib*) \
$(wildcard tensorflow/core/lib/png/*) \
$(wildcard tensorflow/core/util/events_writer.*) \
$(wildcard tensorflow/core/util/reporter.*) \
-$(wildcard tensorflow/core/platform/default/cuda_libdevice_path.*) \
-$(wildcard tensorflow/core/platform/default/stream_executor.*) \
$(wildcard tensorflow/core/platform/default/test_benchmark.*) \
-$(wildcard tensorflow/core/platform/cuda.h) \
-$(wildcard tensorflow/core/platform/cuda_libdevice_path.*) \
$(wildcard tensorflow/core/platform/cloud/*) \
$(wildcard tensorflow/core/platform/google/*) \
$(wildcard tensorflow/core/platform/google/*/*) \
$(wildcard tensorflow/core/platform/jpeg.*) \
$(wildcard tensorflow/core/platform/png.*) \
$(wildcard tensorflow/core/platform/s3/*) \
-$(wildcard tensorflow/core/platform/stream_executor.*) \
$(wildcard tensorflow/core/platform/windows/*) \
-$(wildcard tensorflow/core/user_ops/*.cu.cc) \
-$(wildcard tensorflow/core/common_runtime/gpu/*) \
-$(wildcard tensorflow/core/common_runtime/gpu_device_factory.*) \
$(wildcard tensorflow/core/grappler/inputs/trivial_test_graph_input_yielder.*) \
$(wildcard tensorflow/core/grappler/inputs/file_input_yielder.*) \
-$(wildcard tensorflow/core/grappler/clusters/single_machine.*)
+$(wildcard tensorflow/core/grappler/clusters/single_machine.*) \
+tensorflow/core/util/cuda_kernel_helper_test.cu.cc
+
+CORE_CC_EXCLUDE_SRCS := \
+$(CORE_CC_EXCLUDE_SRCS_NON_GPU) \
+$(wildcard tensorflow/core/platform/stream_executor.*) \
+$(wildcard tensorflow/core/platform/default/cuda_libdevice_path.*) \
+$(wildcard tensorflow/core/platform/cuda.h) \
+$(wildcard tensorflow/core/platform/cuda_libdevice_path.*) \
+$(wildcard tensorflow/core/user_ops/*.cu.cc) \
+$(wildcard tensorflow/core/common_runtime/gpu/*) \
+$(wildcard tensorflow/core/common_runtime/gpu_device_factory.*)
ifeq ($(BUILD_FOR_TEGRA),1)
-CORE_CC_ALL_SRCS := \
-$(wildcard tensorflow/core/*.cc) \
-$(wildcard tensorflow/core/common_runtime/*.cc) \
-$(wildcard tensorflow/core/common_runtime/gpu/*.cc) \
-$(wildcard tensorflow/core/framework/*.cc) \
-$(wildcard tensorflow/core/graph/*.cc) \
-$(wildcard tensorflow/core/platform/*.cc) \
-$(wildcard tensorflow/core/platform/*/*.cc) \
-$(wildcard tensorflow/core/platform/*/*/*.cc) \
-$(wildcard tensorflow/core/util/*.cc) \
-$(wildcard tensorflow/core/util/*/*.cc) \
-$(wildcard tensorflow/cc/training/*.cc) \
-$(wildcard tensorflow/stream_executor/*.cc) \
-$(wildcard tensorflow/stream_executor/*/*.cc) \
-$(wildcard tensorflow/core/grappler/optimizers/*.cc) \
-$(wildcard tensorflow/core/grappler/*.cc) \
-$(wildcard tensorflow/core/grappler/costs/*.cc) \
-$(wildcard tensorflow/core/grappler/clusters/*.cc) \
-$(wildcard tensorflow/core/grappler/utils/*.cc) \
-$(wildcard tensorflow/core/lib/core/*.cc) \
-$(wildcard tensorflow/core/lib/*/*.cc) \
-tensorflow/core/grappler/inputs/utils.cc \
+CORE_CC_ALL_SRCS := $(CORE_CC_ALL_SRCS) \
tensorflow/core/kernels/concat_lib_gpu.cc \
tensorflow/core/kernels/cuda_solvers.cc \
tensorflow/core/kernels/cudnn_pooling_gpu.cc \
tensorflow/core/kernels/fractional_max_pool_op.cc \
tensorflow/core/kernels/fractional_pool_common.cc \
tensorflow/core/kernels/pooling_ops_3d.cc \
-tensorflow/core/kernels/sparse_fill_empty_rows_op.cc
+tensorflow/core/kernels/sparse_fill_empty_rows_op.cc \
+tensorflow/core/kernels/list_kernels.cc \
+$(wildcard tensorflow/core/common_runtime/gpu/*.cc) \
+$(wildcard tensorflow/stream_executor/*.cc) \
+$(wildcard tensorflow/stream_executor/*/*.cc)
CORE_CC_EXCLUDE_SRCS := \
-$(wildcard tensorflow/core/*/*test.cc) \
-$(wildcard tensorflow/core/*/*testutil*) \
-$(wildcard tensorflow/core/*/*testlib*) \
-$(wildcard tensorflow/core/*/*/*test.cc) \
-$(wildcard tensorflow/core/*/*/*testutil*) \
-$(wildcard tensorflow/core/framework/op_gen_lib.cc) \
-$(wildcard tensorflow/core/lib/gif/*) \
-$(wildcard tensorflow/core/lib/jpeg/*) \
-$(wildcard tensorflow/core/lib/png/*) \
-$(wildcard tensorflow/core/lib/db/*) \
-$(wildcard tensorflow/core/platform/jpeg.*) \
-$(wildcard tensorflow/core/platform/png.*) \
-$(wildcard tensorflow/core/platform/cloud/*) \
-$(wildcard tensorflow/core/platform/s3/*) \
-$(wildcard tensorflow/core/platform/windows/*) \
-$(wildcard tensorflow/core/*/*/*testlib*) \
-$(wildcard tensorflow/cc/training/*test.cc) \
-tensorflow/core/lib/io/record_reader.cc \
-tensorflow/core/util/cuda_kernel_helper_test.cu.cc
+$(CORE_CC_EXCLUDE_SRCS_NON_GPU)
CUDA_CC_SRCS := $(wildcard tensorflow/core/kernels/*.cu.cc)
CUDA_CC_OBJS := $(addprefix $(OBJDIR), $(CUDA_CC_SRCS:.cc=.o))
@mkdir -p $(dir $@)
$(CXX) $(CXXFLAGS) $(INCLUDES) \
-o $(BENCHMARK_NAME) $(BENCHMARK_OBJS) \
- $(LIBFLAGS) $(TEGRA_LIBS) $(LIB_PATH) $(LDFLAGS) $(LIBS)
+ $(LIBFLAGS) $(TEGRA_LIBS) $(LIB_PATH) $(LDFLAGS) $(LIBS) $(CUDA_LIBS)
# NVCC compilation rules for Tegra
ifeq ($(BUILD_FOR_TEGRA),1)
set -e
usage() {
- echo "Usage: NDK_ROOT=<path to ndk root> $(basename "$0") [-Es:t:Tx:a:X]"
+ echo "Usage: NDK_ROOT=<path to ndk root> $(basename "$0") [-Es:t:Tx:a]"
echo "-E enable experimental hexnn ops"
echo "-s [sub_makefiles] sub makefiles separated by white space"
echo "-t [build_target] build target for Android makefile [default=all]"
if [[ -z "${BUILD_ARCH}" ]]; then
# Compile protobuf for the target iOS device architectures.
- tensorflow/contrib/makefile/compile_ios_protobuf.sh -a ${DEFAULT_ARCH}
+ tensorflow/contrib/makefile/compile_ios_protobuf.sh
else
# Compile protobuf for the target iOS device architectures.
tensorflow/contrib/makefile/compile_ios_protobuf.sh -a ${BUILD_ARCH}
GEN_DOWNLOAD_DIR="${GEN_DIR}/downloads"
URL_BASE="https://storage.googleapis.com/download.tensorflow.org"
+ARCH="armeabi-v7a"
+
source "${SCRIPT_DIR}/../build_helper.subr"
rm -rf "${GEN_DIR}"
adb push "${GEN_LIBS_DIR}/libhexagon_nn_skel.so" "/vendor/lib/rfsa/adsp"
adb push -p \
- "${TF_ROOT_DIR}/tensorflow/contrib/makefile/gen/bin/hexagon_graph_execution" \
+ "${TF_ROOT_DIR}/tensorflow/contrib/makefile/gen/bin/android_${ARCH}/hexagon_graph_execution" \
"/data/local/tmp/"
adb wait-for-device
adb shell chmod "${ANDROID_EXEC_FILE_MODE}" \
-o $@ $(INFERENCE_OBJS) $(LIB_OBJS) $(TEGRA_LIBS) \
$(LIBFLAGS) $(LDFLAGS) \
-shared -Wl,-soname,$(INFERENCE_SO_NAME) \
- $(LIBS)
+ $(LIBS) $(CUDA_LIBS)
$(INFERENCE_SO_NAME): $(INFERENCE_SO_PATH)
tensorflow/core/kernels/reduction_ops_common.cc
tensorflow/core/kernels/reduction_ops_any.cc
tensorflow/core/kernels/reduction_ops_all.cc
+tensorflow/core/kernels/roll_op.cc
tensorflow/core/kernels/queue_ops.cc
tensorflow/core/kernels/queue_base.cc
tensorflow/core/kernels/pooling_ops_common.cc
tensorflow/core/ops/no_op.cc
tensorflow/core/ops/nn_ops.cc
tensorflow/core/ops/nn_grad.cc
+tensorflow/core/ops/manip_ops.cc
tensorflow/core/ops/math_ops.cc
tensorflow/core/ops/math_grad.cc
tensorflow/core/ops/logging_ops.cc
tensorflow/core/kernels/warn_about_ints.cc
tensorflow/core/kernels/segment_reduction_ops.cc
tensorflow/core/kernels/batch_util.cc
+tensorflow/core/ops/audio_ops.cc
void MPIRendezvousMgr::AddRequest(RecvTensorRequest request,
const int mpi_dst) {
TF_CHECK_OK(recv_tensor_recent_request_ids_.TrackUnique(
- req.request_id(), "RecvTensor (MPIRendezvousMgr)", req));
+ request.request_id(), "RecvTensor (MPIRendezvousMgr)", request));
const int64 step_id = request.step_id();
const std::string& key = request.rendezvous_key();
Rendezvous::ParsedKey parsed;
#include "tensorflow/contrib/mpi/mpi_msg.pb.h"
#include "tensorflow/contrib/mpi/mpi_utils.h"
#include "tensorflow/core/distributed_runtime/base_rendezvous_mgr.h"
+#include "tensorflow/core/distributed_runtime/recent_request_ids.h"
#include "tensorflow/core/distributed_runtime/request_id.h"
#include "tensorflow/core/distributed_runtime/worker_env.h"
#include "tensorflow/core/protobuf/worker.pb.h"
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
+"""Library of multidimensional LSTM models and related code."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
+
+from tensorflow.contrib.ndlstm.python import lstm1d
+from tensorflow.contrib.ndlstm.python import lstm2d
from tensorflow.contrib.framework.python.ops import variables
from tensorflow.python.framework import constant_op
from tensorflow.python.ops import array_ops
-from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn_ops
from tensorflow.python.ops import random_ops
from tensorflow.python.ops import rnn
Output sequence (length, batch_size, noutput)
"""
with variable_scope.variable_scope(scope, "SeqLstm", [inputs]):
- # TODO(tmb) make batch size, sequence_length dynamic
- # example: sequence_length = tf.shape(inputs)[0]
- _, batch_size, _ = _shape(inputs)
- lstm_cell = rnn_cell.BasicLSTMCell(noutput, state_is_tuple=False)
- state = array_ops.zeros([batch_size, lstm_cell.state_size])
- sequence_length = int(inputs.get_shape()[0])
- sequence_lengths = math_ops.to_int64(
- array_ops.fill([batch_size], sequence_length))
+ lstm_cell = rnn_cell.BasicLSTMCell(noutput)
if reverse:
inputs = array_ops.reverse_v2(inputs, [0])
outputs, _ = rnn.dynamic_rnn(
- lstm_cell, inputs, sequence_lengths, state, time_major=True)
+ lstm_cell, inputs, time_major=True, dtype=inputs.dtype)
if reverse:
outputs = array_ops.reverse_v2(outputs, [0])
return outputs
'automatically and cannot be injected manually'.format(kwarg))
minimize_kwargs.update(optimizer_kwargs)
- if method == 'SLSQP':
- # SLSQP doesn't support step callbacks. Obviate associated warning
- # message.
- del minimize_kwargs['callback']
import scipy.optimize # pylint: disable=g-import-not-at-top
result = scipy.optimize.minimize(*minimize_args, **minimize_kwargs)
method = optimizer.optimizer_kwargs.get('method')
self.assertEqual('SLSQP', method)
+ def test_callbacks(self):
+ vector_val = np.array([7., -2.], dtype=np.float32)
+    vector = variables.Variable(vector_val, name='vector')
+
+ minimum_location_val = np.arange(2)
+ minimum_location = constant_op.constant(
+ minimum_location_val, dtype=dtypes.float32)
+
+ loss = math_ops.reduce_sum(math_ops.square(vector - minimum_location)) / 2.
+ loss_val_first = ((vector_val - minimum_location_val)**2).sum() / 2.
+
+ optimizer = external_optimizer.ScipyOptimizerInterface(loss, method='SLSQP')
+
+ with self.test_session() as sess:
+ sess.run(variables.global_variables_initializer())
+
+ initial_vector_val = sess.run(vector)
+
+ extra_fetches = [loss]
+
+ step_callback = test.mock.Mock()
+ loss_callback = test.mock.Mock()
+
+ optimizer.minimize(
+ sess,
+ fetches=extra_fetches,
+ loss_callback=loss_callback,
+ step_callback=step_callback)
+
+ loss_val_last = sess.run(loss)
+
+ call_first = test.mock.call(loss_val_first)
+ call_last = test.mock.call(loss_val_last)
+ loss_calls = [call_first, call_last]
+ loss_callback.assert_has_calls(loss_calls, any_order=True)
+
+ args, _ = step_callback.call_args
+ self.assertAllClose(minimum_location_val, args[0])
+
if __name__ == '__main__':
test.main()
def convert(recursive=False, arg_types=None):
"""Decorator that compiles a function to graph mode.
- The decorator is dynamic - invoking compilation whenever the decorated fuction
- is called. This means the parameter values are known at compilation.
+ The decorator is dynamic - invoking compilation whenever the decorated
+ function is called. This means the parameter values are known at compilation.
Args:
    recursive: Whether to recursively convert any functions that the decorator
stride: Stride (int).
total_padding: Total padding to be applied (int).
Returns:
- output_resolution: Ouput dimension (int) or None.
+ output_resolution: Output dimension (int) or None.
"""
if (input_spatial_resolution is None) or (kernel_size is None) or (
stride is None) or (total_padding is None):
[1,1]
[0,2]],
-the the output will be [[ 1, 2, 3]
- [ 0, 0, 0]
- [41,52,63]].
+the output will be [[ 1, 2, 3]
+ [ 0, 0, 0]
+ [41,52,63]].
```
The data must be at least rank 1. The indices must be of shape (?,2) where the
[1,1]
[0,2]],
-the the output will be [[ 1, 2, 3]
- [ 1, 1, 1]
- [40,100,180]].
+the output will be [[ 1, 2, 3]
+ [ 1, 1, 1]
+ [40,100,180]].
```
The data must be at least rank 1. The indices can be of shape (?,2) where the
[1,1]
[0,2]],
-the the output will be [[ 1, 20, 3]
- [ -BIG_VALUE, -BIG_VALUE, -BIG_VALUE]
- [ 400, 20, 60]].
+the output will be [[ 1, 20, 3]
+ [ -BIG_VALUE, -BIG_VALUE, -BIG_VALUE]
+ [ 400, 20, 60]].
```
The data must be at least rank 1. The indices can be of shape (?,2) where the
[1,1]
[0,2]],
-the the output will be [[ 1, 20, 3]
- [ +BIG_VALUE, +BIG_VALUE, +BIG_VALUE]
- [ 1, 5, 3]].
+the output will be [[ 1, 20, 3]
+ [ +BIG_VALUE, +BIG_VALUE, +BIG_VALUE]
+ [ 1, 5, 3]].
```
The data must be at least rank 1. The indices can be of shape (?,2) where the
# Smoke test
self.assertAllClose(res[0], [[0.509682, 0.509682]])
+ def testSRUCellWithDiffSize(self):
+ with self.test_session() as sess:
+ with variable_scope.variable_scope(
+ "root", initializer=init_ops.constant_initializer(0.5)):
+ x = array_ops.zeros([1, 3])
+ m = array_ops.zeros([1, 2])
+ g, _ = contrib_rnn_cell.SRUCell(2)(x, m)
+ sess.run([variables_lib.global_variables_initializer()])
+ res = sess.run([g], {
+ x.name: np.array([[1., 1., 1.]]),
+ m.name: np.array([[0.1, 0.1]])
+ })
+ # Smoke test
+ self.assertAllClose(res[0], [[0.55255556, 0.55255556]])
+
def testBasicLSTMCell(self):
for dtype in [dtypes.float16, dtypes.float32]:
np_dtype = dtype.as_numpy_dtype
self.assertAllClose(expected_c, actual_c, 1e-5)
self.assertAllClose(expected_h, actual_h, 1e-5)
-
if __name__ == "__main__":
test.main()
input_depth = inputs_shape[1].value
- # Here the contributor believes that the following constraints
- # are implied. The reasoning is explained here with reference to
- # the paper https://arxiv.org/pdf/1709.02755.pdf upon which this
- # implementation is based.
- # In section 2.1 Equation 5, specifically:
- # h_t = r_t \odot g(c_t) + (1 - r_t) \odot x_t
- # the pointwise operation between r_t and x_t means they have
- # the same shape (since we are implementing an RNN cell, braodcasting
- # does not happen to input of a single timestep); by the same
- # reasons, x_t has the same shape as h_t, essentially mandating that
- # input_depth = unit_num.
- if input_depth != self._num_units:
- raise ValueError("SRU requires input_depth == num_units, got "
- "input_depth = %s, num_units = %s" % (input_depth,
- self._num_units))
-
self._kernel = self.add_variable(
rnn_cell_impl._WEIGHTS_VARIABLE_NAME,
- shape=[input_depth, 3 * self._num_units])
+ shape=[input_depth, 4 * self._num_units])
self._bias = self.add_variable(
rnn_cell_impl._BIAS_VARIABLE_NAME,
"""Simple recurrent unit (SRU) with num_units cells."""
U = math_ops.matmul(inputs, self._kernel)
- x_bar, f_intermediate, r_intermediate = array_ops.split(
- value=U, num_or_size_splits=3, axis=1)
+ x_bar, f_intermediate, r_intermediate, x_tx = array_ops.split(
+ value=U, num_or_size_splits=4, axis=1)
f_r = math_ops.sigmoid(
nn_ops.bias_add(
f, r = array_ops.split(value=f_r, num_or_size_splits=2, axis=1)
c = f * state + (1.0 - f) * x_bar
- h = r * self._activation(c) + (1.0 - r) * inputs
+ h = r * self._activation(c) + (1.0 - r) * x_tx
return h, c
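
For readers tracking the shape change, the new 4-way split corresponds to one SRU step roughly like the following NumPy sketch (a reading aid that mirrors the gating equations above, not the contrib implementation itself; the bias is assumed to cover the concatenated f and r gates):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_step(x, c_prev, kernel, bias, activation=np.tanh):
    # kernel: [input_depth, 4 * num_units]; bias: [2 * num_units].
    x_bar, f_pre, r_pre, x_tx = np.split(x.dot(kernel), 4, axis=1)
    f_r = sigmoid(np.concatenate([f_pre, r_pre], axis=1) + bias)
    f, r = np.split(f_r, 2, axis=1)
    c = f * c_prev + (1.0 - f) * x_bar          # cell state update
    h = r * activation(c) + (1.0 - r) * x_tx    # highway now uses x_tx, not x
    return h, c
```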
_monotonic_probability_fn, sigmoid_noise=sigmoid_noise, mode=mode,
seed=sigmoid_noise_seed)
super(LuongMonotonicAttention, self).__init__(
- query_layer=layers_core.Dense(
- num_units, name="query_layer", use_bias=False, dtype=dtype),
+ query_layer=None,
memory_layer=layers_core.Dense(
num_units, name="memory_layer", use_bias=False, dtype=dtype),
memory=memory,
"""
default_signature = signatures.default_signature
signature_def = meta_graph_pb2.SignatureDef()
- if default_signature.WhichOneof("type") == "regression_signature":
+ if (default_signature.WhichOneof("type") ==
+ legacy_constants.REGRESSION_SIGNATURE):
regression_signature = default_signature.regression_signature
signature_def.method_name = signature_constants.REGRESS_METHOD_NAME
_add_input_to_signature_def(regression_signature.input.tensor_name,
_add_output_to_signature_def(regression_signature.output.tensor_name,
signature_constants.REGRESS_OUTPUTS,
signature_def)
- elif default_signature.WhichOneof("type") == "classification_signature":
+ elif (default_signature.WhichOneof("type") ==
+ legacy_constants.CLASSIFICATION_SIGNATURE):
classification_signature = default_signature.classification_signature
signature_def.method_name = signature_constants.CLASSIFY_METHOD_NAME
_add_input_to_signature_def(classification_signature.input.tensor_name,
signature_constants.PREDICT_OUTPUTS]
# TODO(pdudnik): what if there are other signatures? Mimic cr/140900781 once
# it is submitted.
- if (input_signature.WhichOneof("type") != "generic_signature" or
- output_signature.WhichOneof("type") != "generic_signature"):
+ if (input_signature.WhichOneof("type") != legacy_constants.GENERIC_SIGNATURE
+ or output_signature.WhichOneof("type") !=
+ legacy_constants.GENERIC_SIGNATURE):
raise RuntimeError("Named input and output signatures can only be "
"up-converted if they are generic signature. "
"Input signature type is %s, output signature type is "
SIGNATURES_KEY = "serving_signatures"
ASSETS_KEY = "serving_assets"
GRAPH_KEY = "serving_graph"
+REGRESSION_SIGNATURE = "regression_signature"
+CLASSIFICATION_SIGNATURE = "classification_signature"
+GENERIC_SIGNATURE = "generic_signature"
from tensorflow.contrib.metrics.python.ops import metric_ops
from tensorflow.contrib.slim.python.slim import evaluation
from tensorflow.contrib.training.python.training import evaluation as evaluation_lib
-from tensorflow.core.protobuf import saver_pb2
from tensorflow.python.debug.lib import debug_data
from tensorflow.python.debug.wrappers import hooks
from tensorflow.python.framework import constant_op
def _prepareCheckpoint(self, checkpoint_path):
init_op = control_flow_ops.group(variables.global_variables_initializer(),
variables.local_variables_initializer())
- saver = saver_lib.Saver(write_version=saver_pb2.SaverDef.V1)
+ saver = saver_lib.Saver()
with self.test_session() as sess:
sess.run(init_op)
saver.save(sess, checkpoint_path)
low=-1.0, high=1.0, size=np.prod(shape_)).reshape(shape_).astype(dtype_)
# Make a selfadjoint, positive definite.
a_np = np.dot(a_np.T, a_np)
+    # Jacobi preconditioner: the inverse of the diagonal of a_np.
+ jacobi_np = np.zeros_like(a_np)
+ jacobi_np[range(a_np.shape[0]), range(a_np.shape[1])] = (
+ 1.0 / a_np.diagonal())
rhs_np = np.random.uniform(
low=-1.0, high=1.0, size=shape_[0]).astype(dtype_)
+ x_np = np.zeros_like(rhs_np)
tol = 1e-6 if dtype_ == np.float64 else 1e-3
max_iter = 20
with self.test_session() as sess:
if use_static_shape_:
a = constant_op.constant(a_np)
rhs = constant_op.constant(rhs_np)
+ x = constant_op.constant(x_np)
+ jacobi = constant_op.constant(jacobi_np)
else:
a = array_ops.placeholder(dtype_)
rhs = array_ops.placeholder(dtype_)
+ x = array_ops.placeholder(dtype_)
+ jacobi = array_ops.placeholder(dtype_)
operator = util.create_operator(a)
- cg_graph = linear_equations.conjugate_gradient(
- operator, rhs, tol=tol, max_iter=max_iter)
- if use_static_shape_:
- cg_val = sess.run(cg_graph)
- else:
- cg_val = sess.run(cg_graph, feed_dict={a: a_np, rhs: rhs_np})
- norm_r0 = np.linalg.norm(rhs_np)
- norm_r = np.sqrt(cg_val.gamma)
- self.assertLessEqual(norm_r, tol * norm_r0)
- # Validate that we get an equally small residual norm with numpy
- # using the computed solution.
- r_np = rhs_np - np.dot(a_np, cg_val.x)
- norm_r_np = np.linalg.norm(r_np)
- self.assertLessEqual(norm_r_np, tol * norm_r0)
+ preconditioners = [
+ None, util.identity_operator(a),
+ util.create_operator(jacobi)
+ ]
+ cg_results = []
+ for preconditioner in preconditioners:
+ cg_graph = linear_equations.conjugate_gradient(
+ operator,
+ rhs,
+ preconditioner=preconditioner,
+ x=x,
+ tol=tol,
+ max_iter=max_iter)
+ if use_static_shape_:
+ cg_val = sess.run(cg_graph)
+ else:
+ cg_val = sess.run(
+ cg_graph,
+ feed_dict={
+ a: a_np,
+ rhs: rhs_np,
+ x: x_np,
+ jacobi: jacobi_np
+ })
+ norm_r0 = np.linalg.norm(rhs_np)
+ norm_r = np.linalg.norm(cg_val.r)
+ self.assertLessEqual(norm_r, tol * norm_r0)
+ # Validate that we get an equally small residual norm with numpy
+ # using the computed solution.
+ r_np = rhs_np - np.dot(a_np, cg_val.x)
+ norm_r_np = np.linalg.norm(r_np)
+ self.assertLessEqual(norm_r_np, tol * norm_r0)
+ cg_results.append(cg_val)
+ # Validate that we get same results using identity_preconditioner
+ # and None
+ self.assertEqual(cg_results[0].i, cg_results[1].i)
+ self.assertAlmostEqual(cg_results[0].gamma, cg_results[1].gamma)
+ self.assertAllClose(cg_results[0].r, cg_results[1].r, rtol=tol)
+ self.assertAllClose(cg_results[0].x, cg_results[1].x, rtol=tol)
+ self.assertAllClose(cg_results[0].p, cg_results[1].p, rtol=tol)
return [test_conjugate_gradient]
def testCreateOperatorUnknownShape(self):
self._testCreateOperator(False)
+ def _testIdentityOperator(self, use_static_shape_):
+ for dtype in np.float32, np.float64:
+ a_np = np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=dtype)
+ x_np = np.array([[2.], [-3.]], dtype=dtype)
+ y_np = np.array([[2], [-3.], [5.]], dtype=dtype)
+ with self.test_session() as sess:
+ if use_static_shape_:
+ a = constant_op.constant(a_np, dtype=dtype)
+ x = constant_op.constant(x_np, dtype=dtype)
+ y = constant_op.constant(y_np, dtype=dtype)
+ else:
+ a = array_ops.placeholder(dtype)
+ x = array_ops.placeholder(dtype)
+ y = array_ops.placeholder(dtype)
+ id_op = util.identity_operator(a)
+ ax = id_op.apply(x)
+ aty = id_op.apply_adjoint(y)
+ op_shape = ops.convert_to_tensor(id_op.shape)
+ if use_static_shape_:
+ op_shape_val, ax_val, aty_val = sess.run([op_shape, ax, aty])
+ else:
+ op_shape_val, ax_val, aty_val = sess.run(
+ [op_shape, ax, aty], feed_dict={
+ a: a_np,
+ x: x_np,
+ y: y_np
+ })
+ self.assertAllEqual(op_shape_val, [3, 2])
+ self.assertAllClose(ax_val, x_np)
+ self.assertAllClose(aty_val, y_np)
+
+ def testIdentityOperator(self):
+ self._testIdentityOperator(True)
+
+ def testIdentityOperatorUnknownShape(self):
+ self._testIdentityOperator(False)
+
def testL2Norm(self):
with self.test_session():
x_np = np.array([[2], [-3.], [5.]])
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import control_flow_ops
+from tensorflow.python.ops import linalg_ops
from tensorflow.python.ops import math_ops
def conjugate_gradient(operator,
rhs,
+ preconditioner=None,
+ x=None,
tol=1e-4,
max_iter=20,
name="conjugate_gradient"):
vector with the result of applying the operator to `x`, i.e. if
`operator` represents matrix `A`, `apply` should return `A * x`.
rhs: A rank-1 `Tensor` of shape `[N]` containing the right-hand size vector.
+    preconditioner: An object representing a linear operator, see `operator`
+      for details. The preconditioner should approximate the inverse of `A`.
+      An efficient preconditioner can dramatically improve the rate of
+      convergence. If `preconditioner` represents matrix `M` (`M`
+      approximating `A^{-1}`), the algorithm uses `preconditioner.apply(x)` to
+      estimate `A^{-1}x`. For this to be useful, the cost of applying `M`
+      should be much lower than computing `A^{-1}` directly.
+ x: A rank-1 `Tensor` of shape `[N]` containing the initial guess for the
+ solution.
tol: A float scalar convergence tolerance.
max_iter: An integer giving the maximum number of iterations.
name: A name scope for the operation.
- x: A rank-1 `Tensor` of shape `[N]` containing the computed solution.
- r: A rank-1 `Tensor` of shape `[M]` containing the residual vector.
- p: A rank-1 `Tensor` of shape `[N]`. `A`-conjugate basis vector.
- - gamma: \\(||r||_2^2\\)
+    - gamma: \\(r^T M r\\), equivalent to \\(||r||_2^2\\) when
+      `preconditioner=None`.
"""
# ephemeral class holding CG state.
cg_state = collections.namedtuple("CGState", ["i", "x", "r", "p", "gamma"])
def stopping_criterion(i, state):
- return math_ops.logical_and(i < max_iter, state.gamma > tol)
+ return math_ops.logical_and(i < max_iter, linalg_ops.norm(state.r) > tol)
- # TODO(rmlarsen): add preconditioning
- def cg_step(i, state):
+ def cg_step(i, state): # pylint: disable=missing-docstring
z = operator.apply(state.p)
alpha = state.gamma / util.dot(state.p, z)
x = state.x + alpha * state.p
r = state.r - alpha * z
- gamma = util.l2norm_squared(r)
- beta = gamma / state.gamma
- p = r + beta * state.p
+ if preconditioner is None:
+ gamma = util.dot(r, r)
+ beta = gamma / state.gamma
+ p = r + beta * state.p
+ else:
+ q = preconditioner.apply(r)
+ gamma = util.dot(r, q)
+ beta = gamma / state.gamma
+ p = q + beta * state.p
return i + 1, cg_state(i + 1, x, r, p, gamma)
with ops.name_scope(name):
n = operator.shape[1:]
rhs = array_ops.expand_dims(rhs, -1)
- gamma0 = util.l2norm_squared(rhs)
- tol = tol * tol * gamma0
- x = array_ops.expand_dims(
- array_ops.zeros(
- n, dtype=rhs.dtype.base_dtype), -1)
+ if x is None:
+ x = array_ops.expand_dims(
+ array_ops.zeros(n, dtype=rhs.dtype.base_dtype), -1)
+ r0 = rhs
+ else:
+ x = array_ops.expand_dims(x, -1)
+ r0 = rhs - operator.apply(x)
+ if preconditioner is None:
+ p0 = r0
+ else:
+ p0 = preconditioner.apply(r0)
+ gamma0 = util.dot(r0, p0)
+ tol *= linalg_ops.norm(r0)
i = constant_op.constant(0, dtype=dtypes.int32)
- state = cg_state(i=i, x=x, r=rhs, p=rhs, gamma=gamma0)
+ state = cg_state(i=i, x=x, r=r0, p=p0, gamma=gamma0)
_, state = control_flow_ops.while_loop(stopping_criterion, cg_step,
[i, state])
return cg_state(
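
For intuition, the preconditioned update above reduces to this dense-NumPy sketch (illustrative only; `a`, `b` and `m` are plain arrays rather than the operator objects used in the code, `m` plays the role of an approximation to inv(a), and the tolerance here is absolute for simplicity):

```python
import numpy as np

def pcg(a, b, m=None, x=None, tol=1e-6, max_iter=20):
    """Preconditioned conjugate gradient for a symmetric positive definite a."""
    x = np.zeros_like(b) if x is None else x
    r = b - a.dot(x)
    q = r if m is None else m.dot(r)  # preconditioned residual
    p = q.copy()
    gamma = r.dot(q)                  # gamma = r^T M r
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        z = a.dot(p)
        alpha = gamma / p.dot(z)
        x = x + alpha * p
        r = r - alpha * z
        q = r if m is None else m.dot(r)
        gamma_new = r.dot(q)
        p = q + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```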
apply_adjoint=lambda v: math_ops.matmul(matrix, v, adjoint_a=True))
+def identity_operator(matrix):
+ """Creates a linear operator from a rank-2 identity tensor."""
+
+ linear_operator = collections.namedtuple(
+ "LinearOperator", ["shape", "dtype", "apply", "apply_adjoint"])
+ shape = matrix.get_shape()
+ if shape.is_fully_defined():
+ shape = shape.as_list()
+ else:
+ shape = array_ops.shape(matrix)
+ return linear_operator(
+ shape=shape,
+ dtype=matrix.dtype,
+ apply=lambda v: v,
+ apply_adjoint=lambda v: v)
+
+
# TODO(rmlarsen): Measure if we should just call matmul.
def dot(x, y):
return math_ops.reduce_sum(math_ops.conj(x) * y)
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
+from absl import flags
import os
import subprocess
import tensorflow as tf
-tf.flags.DEFINE_string('service_addr', '',
- 'Address of TPU profiler service e.g. localhost:8466')
-tf.flags.DEFINE_string('logdir', '',
- 'Path of TensorBoard log directory e.g. /tmp/tb_log')
-tf.flags.DEFINE_integer('duration_ms', 2000, 'Duration of tracing in ms.')
+flags.DEFINE_string(
+ 'service_addr', None, 'Address of TPU profiler service e.g. '
+ 'localhost:8466')
+flags.DEFINE_string(
+ 'logdir', None, 'Path of TensorBoard log directory e.g. /tmp/tb_log, '
+ 'gs://tb_bucket')
+flags.DEFINE_integer('duration_ms', 2000, 'Duration of tracing in ms.')
+flags.DEFINE_integer(
+ 'num_tracing_attempts', 3, 'Automatically retry N times when no trace '
+ 'event is collected.')
+flags.DEFINE_boolean(
+ 'include_dataset_ops', True, 'Set to false to profile longer TPU '
+ 'device traces.')
-FLAGS = tf.flags.FLAGS
+FLAGS = flags.FLAGS
EXECUTABLE = 'data/capture_tpu_profile'
if not FLAGS.service_addr or not FLAGS.logdir:
sys.exit('service_addr and logdir must be provided.')
executable_path = os.path.join(os.path.dirname(__file__), EXECUTABLE)
+ logdir = os.path.expandvars(os.path.expanduser(FLAGS.logdir))
cmd = [executable_path]
- cmd.append('--logdir='+FLAGS.logdir)
+ cmd.append('--logdir='+logdir)
cmd.append('--service_addr='+FLAGS.service_addr)
cmd.append('--duration_ms='+str(FLAGS.duration_ms))
+ cmd.append('--num_tracing_attempts='+str(FLAGS.num_tracing_attempts))
+ cmd.append('--include_dataset_ops='+str(FLAGS.include_dataset_ops).lower())
subprocess.call(cmd)
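For example (hypothetical flag values), the packaged console script would now be invoked as `capture_tpu_profile --service_addr=localhost:8466 --logdir=gs://tb_bucket --duration_ms=2000 --num_tracing_attempts=3`.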
from setuptools import setup
-_VERSION = '1.3.0-a1'
+_VERSION = '1.5.0-rc1'
CONSOLE_SCRIPTS = [
'capture_tpu_profile=cloud_tpu_profiler.main:run_main',
]
-REQUIRED_PACKAGES = [
- 'tensorflow >= 1.2.0',
-]
-
setup(
name='cloud_tpu_profiler',
version=_VERSION.replace('-', ''),
entry_points={
'console_scripts': CONSOLE_SCRIPTS,
},
- install_requires=REQUIRED_PACKAGES,
classifiers=[
# How mature is this project? Common values are
# 3 - Alpha
# 4 - Beta
# 5 - Production/Stable
- 'Development Status :: 3 - Alpha',
-
+ 'Development Status :: 4 - Beta',
'Intended Audience :: Developers',
'Intended Audience :: Education',
'Intended Audience :: Science/Research',
-
'License :: OSI Approved :: Apache Software License',
-
'Programming Language :: Python :: 2',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.4',
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
-
'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence',
'Topic :: Software Development :: Libraries :: Python Modules',
],
license='Apache 2.0',
- keywords='tensorflow performance tpu',)
+ keywords='tensorflow performance tpu',
+)
"framework/reader_interface.h",
"framework/reader_op_kernel.h",
"framework/register_types.h",
+ "framework/register_types_traits.h",
"framework/resource_mgr.h",
"framework/resource_op_kernel.h",
"framework/selective_registration.h",
"list_ops",
"lookup_ops",
"logging_ops",
+ "manip_ops",
"math_ops",
"nn_ops",
"no_op",
":list_ops_op_lib",
":logging_ops_op_lib",
":lookup_ops_op_lib",
+ ":manip_ops_op_lib",
":math_ops_op_lib",
":nn_ops_op_lib",
":no_op_op_lib",
"//tensorflow/core/kernels:list_kernels",
"//tensorflow/core/kernels:lookup",
"//tensorflow/core/kernels:logging",
+ "//tensorflow/core/kernels:manip",
"//tensorflow/core/kernels:math",
"//tensorflow/core/kernels:multinomial_op",
"//tensorflow/core/kernels:nn",
deps = [
":protos_all_cc_impl",
"//third_party/eigen3",
+ "@nsync//:nsync_cpp",
"@protobuf_archive//:protobuf",
],
alwayslink = 1,
description: <<END
Note that this routine only supports wildcard characters in the
basename portion of the pattern, not in the directory portion.
+Note also that the order of filenames returned can be non-deterministic.
END
}
--- /dev/null
+op {
+ graph_op_name: "Roll"
+ in_arg {
+ name: "shift"
+ description: <<END
+Must be 0-D or 1-D. `shift[i]` specifies the number of places by which
+elements are shifted positively (towards larger indices) along the dimension
+specified by `axis[i]`. Negative shifts will roll the elements in the opposite
+direction.
+END
+ }
+ in_arg {
+ name: "axis"
+ description: <<END
+Must be 0-D or 1-D. `axis[i]` specifies the dimension in which the shift
+`shift[i]` should occur. If the same axis is referenced more than once, the
+total shift for that axis will be the sum of all the shifts that belong to that
+axis.
+END
+ }
+ out_arg {
+ name: "output"
+ description: <<END
+Has the same shape and size as the input. The elements are shifted
+positively (towards larger indices) by the offsets of `shift` along the
+dimensions of `axis`.
+END
+ }
+ summary: "Rolls the elements of a tensor along an axis."
+ description: <<END
+The elements are shifted positively (towards larger indices) by the offset of
+`shift` along the dimension of `axis`. Negative `shift` values will shift
+elements in the opposite direction. Elements that roll past the last position
+will wrap around to the first and vice versa. Multiple shifts along multiple
+axes may be specified.
+
+For example:
+
+```
+# 't' is [0, 1, 2, 3, 4]
+roll(t, shift=2, axis=0) ==> [3, 4, 0, 1, 2]
+
+# shifting along multiple dimensions
+# 't' is [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
+roll(t, shift=[1, -2], axis=[0, 1]) ==> [[7, 8, 9, 5, 6], [2, 3, 4, 0, 1]]
+
+# shifting along the same axis multiple times
+# 't' is [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
+roll(t, shift=[2, -3], axis=[1, 1]) ==> [[1, 2, 3, 4, 0], [6, 7, 8, 9, 5]]
+```
+END
+}
--- /dev/null
+op {
+ graph_op_name: "UnravelIndex"
+ in_arg {
+ name: "indices"
+ description: <<END
+A 0-D or 1-D `int` Tensor whose elements are indices into the
+flattened version of an array of dimensions `dims`.
+END
+ }
+ in_arg {
+ name: "dims"
+ description: <<END
+A 1-D `int` Tensor. The shape of the array to use for unraveling
+indices.
+END
+ }
+ out_arg {
+ name: "output"
+ description: <<END
+A 2-D (or 1-D if `indices` is 0-D) tensor where each row has the
+same shape as the `indices` array.
+END
+ }
+ summary: "Converts a flat index or array of flat indices into a tuple of"
+ description: <<END
+coordinate arrays.
+
+@compatibility(numpy)
+Equivalent to np.unravel_index
+@end_compatibility
+END
+}
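
As a quick reference for the NumPy behavior this op mirrors (the values follow directly from the row-major definition):

```python
import numpy as np

# Flat indices into a (7, 6) array map back to (row, col) coordinates:
rows, cols = np.unravel_index([22, 41, 37], (7, 6))
print(rows)  # [3 6 6]   e.g. 22 == 3 * 6 + 4
print(cols)  # [4 5 1]
```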
// is necessary.
min_system_memory *= 2;
#endif
-#if defined(NVIDIA_TEGRA)
+
+#if defined(ANDROID_TEGRA)
// 1GB system mem for NVIDIA Tegra devices since they use the same mem for RAM
// and Video RAM
min_system_memory = 1 << 30;
"//tensorflow/core:core_cpu_internal",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
+ "//tensorflow/core:worker_proto_cc",
],
)
const auto count = run_state->count;
pss.collect_timeline =
req.options().trace_level() == RunOptions::FULL_TRACE;
+ pss.collect_rpcs = req.options().trace_level() == RunOptions::FULL_TRACE;
pss.report_tensor_allocations_upon_oom =
req.options().report_tensor_allocations_upon_oom();
TRACEPRINTF("stepid %llu", step_id);
pss.collect_timeline = req.options().trace_level() == RunOptions::FULL_TRACE;
+ pss.collect_rpcs = req.options().trace_level() == RunOptions::FULL_TRACE;
pss.report_tensor_allocations_upon_oom =
req.options().report_tensor_allocations_upon_oom();
// Build the cost model every 'build_cost_model_every' steps after skipping an
});
}
+void GrpcWorker::LoggingAsync(const LoggingRequest* request,
+ LoggingResponse* response, StatusCallback done) {
+ auto env = this->env();
+ if (env) {
+ auto session_mgr = (SessionMgr*)env->session_mgr;
+ if (session_mgr) {
+ session_mgr->SetLogging(request->rpc_logging());
+ for (const auto& step_id : request->fetch_step_id()) {
+ session_mgr->RetrieveLogs(step_id, response);
+ }
+ if (request->clear()) {
+ session_mgr->ClearLogs();
+ }
+ }
+ }
+ done(Status::OK());
+}
+
WorkerEnv* GrpcWorker::env() { return env_; }
std::unique_ptr<GrpcWorker> NewGrpcWorker(WorkerEnv* env) {
::grpc::ByteBuffer* response,
StatusCallback done);
+ virtual void LoggingAsync(const LoggingRequest* request,
+ LoggingResponse* response, StatusCallback done);
+
WorkerEnv* env();
private:
TF_RETURN_IF_ERROR(worker_cache_factory_(server_def, &worker_cache));
}
+  if (worker_cache != nullptr && default_worker_cache_.get() != nullptr) {
+ worker_cache->SetLogging(this->is_logging_active_);
+ }
+
CHECK(!worker_env_->local_devices.empty())
<< "The WorkerEnv must have at least one device in `local_devices`.";
+
std::vector<Device*> renamed_devices;
for (Device* d : worker_env_->local_devices) {
renamed_devices.push_back(RenamedDevice::NewRenamedDevice(
return legacy_session_;
}
+void SessionMgr::SetLogging(bool active) {
+ mutex_lock l(mu_);
+ this->is_logging_active_ = active;
+ // Legacy Session
+ if (legacy_session_) {
+ auto* worker_cache = legacy_session_->worker_cache.get();
+ if (worker_cache) {
+ worker_cache->SetLogging(active);
+ }
+ }
+
+ for (const auto& session_kv : sessions_) {
+ auto session = session_kv.second.get();
+ if (session) {
+ auto* worker_cache = session->worker_cache.get();
+ if (worker_cache) {
+ worker_cache->SetLogging(active);
+ }
+ }
+ }
+}
+
+void SessionMgr::RetrieveLogs(tensorflow::int64 step_id,
+ LoggingResponse* response) {
+ mutex_lock l(mu_);
+ // Legacy Session
+ if (legacy_session_) {
+ auto* worker_cache = legacy_session_->worker_cache.get();
+ if (worker_cache) {
+ auto step_stats = StepStats();
+ if (worker_cache->RetrieveLogs(step_id, &step_stats)) {
+ auto* labeled_step_stats = response->add_step();
+ labeled_step_stats->set_step_id(step_id);
+ labeled_step_stats->mutable_step_stats()->Swap(&step_stats);
+ }
+ }
+ }
+ for (const auto& session_kv : sessions_) {
+ auto session = session_kv.second.get();
+ if (session) {
+ auto* worker_cache = session->worker_cache.get();
+ if (worker_cache) {
+ auto step_stats = StepStats();
+ if (worker_cache->RetrieveLogs(step_id, &step_stats)) {
+ auto* labeled_step_stats = response->add_step();
+ labeled_step_stats->set_step_id(step_id);
+ labeled_step_stats->mutable_step_stats()->Swap(&step_stats);
+ }
+ }
+ }
+ }
+}
+
+void SessionMgr::ClearLogs() {
+ mutex_lock l(mu_);
+ // Legacy Session
+ if (legacy_session_) {
+ auto* worker_cache = legacy_session_->worker_cache.get();
+ if (worker_cache) {
+ worker_cache->ClearLogs();
+ }
+ }
+
+ for (const auto& session_kv : sessions_) {
+ auto session = session_kv.second.get();
+ if (session) {
+ auto* worker_cache = session->worker_cache.get();
+ if (worker_cache) {
+ worker_cache->ClearLogs();
+ }
+ }
+ }
+}
} // namespace tensorflow
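
`SetLogging`, `RetrieveLogs`, and `ClearLogs` all follow one fan-out pattern under the manager's mutex: apply the action to the legacy session's worker cache first, then to every registered session's cache. A toy sketch of that pattern with hypothetical stand-in types (`FakeWorkerCache` and `FakeSession` are not TF classes):

```
// Toy sketch of the SessionMgr fan-out pattern (stand-in types, not TF's).
#include <map>
#include <memory>
#include <mutex>
#include <string>

struct FakeWorkerCache {
  bool logging = false;
  void SetLogging(bool active) { logging = active; }
};
struct FakeSession {
  std::unique_ptr<FakeWorkerCache> worker_cache;
};

class FakeSessionMgr {
 public:
  void SetLogging(bool active) {
    std::lock_guard<std::mutex> l(mu_);
    is_logging_active_ = active;
    if (legacy_session_ && legacy_session_->worker_cache)
      legacy_session_->worker_cache->SetLogging(active);  // legacy first
    for (auto& kv : sessions_)                            // then every session
      if (kv.second && kv.second->worker_cache)
        kv.second->worker_cache->SetLogging(active);
  }

 private:
  std::mutex mu_;
  bool is_logging_active_ = false;
  std::shared_ptr<FakeSession> legacy_session_;
  std::map<std::string, std::shared_ptr<FakeSession>> sessions_;
};

int main() {
  FakeSessionMgr mgr;
  mgr.SetLogging(true);
}
```
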
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/mutex.h"
#include "tensorflow/core/protobuf/tensorflow_server.pb.h"
+#include "tensorflow/core/protobuf/worker.pb.h"
namespace tensorflow {
static string WorkerNameFromServerDef(const ServerDef& server_def);
+ void SetLogging(bool active);
+
+ void RetrieveLogs(tensorflow::int64 step_id, LoggingResponse* response);
+
+ void ClearLogs();
+
private:
const WorkerEnv* const worker_env_; // Not owned.
std::unique_ptr<WorkerCacheInterface> default_worker_cache_;
std::shared_ptr<WorkerSession> legacy_session_;
+ bool is_logging_active_ = false;
+
const WorkerCacheFactory worker_cache_factory_;
std::shared_ptr<WorkerSession> WorkerSessionForSessionUnlocked(
*/
#if !defined(IS_MOBILE_PLATFORM) || defined(SUPPORT_SELECTIVE_REGISTRATION) || \
- defined(NVIDIA_TEGRA)
+ defined(ANDROID_TEGRA)
// All types are supported, so all macros are invoked.
//
// Special casing UnaryOpFn per op and per device.
UnaryVariantOpRegistry::VariantUnaryOpFn* UnaryVariantOpRegistry::GetUnaryOpFn(
VariantUnaryOp op, StringPiece device, StringPiece type_name) {
- auto found = unary_op_fns.find(std::make_tuple(op, device, type_name));
+ auto found = unary_op_fns.find({op, device, type_name});
if (found == unary_op_fns.end()) return nullptr;
return &found->second;
}
CHECK_EQ(existing, nullptr)
<< "Unary VariantUnaryOpFn for type_name: " << type_name
<< " already registered for device type: " << device;
- unary_op_fns.insert(
- std::pair<std::tuple<VariantUnaryOp, StringPiece, StringPiece>,
- VariantUnaryOpFn>(
- std::make_tuple(op, GetPersistentStringPiece(device),
- GetPersistentStringPiece(type_name)),
- unary_op_fn));
+ unary_op_fns.insert(std::pair<FuncTuple<VariantUnaryOp>, VariantUnaryOpFn>(
+ {op, GetPersistentStringPiece(device),
+ GetPersistentStringPiece(type_name)},
+ unary_op_fn));
}
namespace {
UnaryVariantOpRegistry::VariantBinaryOpFn*
UnaryVariantOpRegistry::GetBinaryOpFn(VariantBinaryOp op, StringPiece device,
StringPiece type_name) {
- auto found = binary_op_fns.find(std::make_tuple(op, device, type_name));
+ auto found = binary_op_fns.find({op, device, type_name});
if (found == binary_op_fns.end()) return nullptr;
return &found->second;
}
CHECK_EQ(existing, nullptr)
<< "Unary VariantBinaryOpFn for type_name: " << type_name
<< " already registered for device type: " << device;
- binary_op_fns.insert(
- std::pair<std::tuple<VariantBinaryOp, StringPiece, StringPiece>,
- VariantBinaryOpFn>(
- std::make_tuple(op, GetPersistentStringPiece(device),
- GetPersistentStringPiece(type_name)),
- add_fn));
+ binary_op_fns.insert(std::pair<FuncTuple<VariantBinaryOp>, VariantBinaryOpFn>(
+ {op, GetPersistentStringPiece(device),
+ GetPersistentStringPiece(type_name)},
+ add_fn));
}
namespace {
device_copy_fns;
// Map std::tuple<Op, device, type_name> to function.
+
+  // Using std::tuple as the key breaks here by falling victim to
+  // "too perfect forwarding"; see
+  // https://stackoverflow.com/questions/44475317/variadic-template-issue
+  // and references therein. Hence this hand-rolled FuncTuple struct.
+ template <typename Op>
+ struct FuncTuple {
+ FuncTuple(const Op& op, const StringPiece& dev, const StringPiece& tname)
+        : op_type_(op), device_(dev), typename_(tname) {}
+ Op op_type_;
+ StringPiece device_, typename_;
+ };
+ // friend declaration for operator==
+ // needed for clang
+ template <typename Op>
+ friend bool operator==(const FuncTuple<Op>& l, const FuncTuple<Op>& r);
struct TupleHash {
template <typename Op>
std::size_t operator()(
ret = Hash64Combine(ret, sp_hasher_(std::get<2>(x)));
return ret;
}
+
+ template <typename Op>
+ std::size_t operator()(const FuncTuple<Op>& x) const {
+ // The hash of an enum is just its value as a std::size_t.
+ std::size_t ret = static_cast<std::size_t>(x.op_type_);
+ ret = Hash64Combine(ret, sp_hasher_(x.device_));
+ ret = Hash64Combine(ret, sp_hasher_(x.typename_));
+ return ret;
+ }
StringPieceHasher sp_hasher_;
};
- std::unordered_map<std::tuple<VariantUnaryOp, StringPiece, StringPiece>,
- VariantUnaryOpFn, TupleHash>
+ std::unordered_map<FuncTuple<VariantUnaryOp>, VariantUnaryOpFn, TupleHash>
unary_op_fns;
- std::unordered_map<std::tuple<VariantBinaryOp, StringPiece, StringPiece>,
- VariantBinaryOpFn, TupleHash>
+ std::unordered_map<FuncTuple<VariantBinaryOp>, VariantBinaryOpFn, TupleHash>
binary_op_fns;
// Find or insert a string into a persistent string storage
- // container; return the StringPiece pointing to the permanent
- // string location.
+ // container; return the StringPiece pointing to the permanent string
+ // location.
static StringPiece GetPersistentStringPiece(const string& str) {
const auto string_storage = PersistentStringStorage();
auto found = string_storage->find(str);
}
}
};
-
+template <typename Op>
+inline bool operator==(const UnaryVariantOpRegistry::FuncTuple<Op>& lhs,
+ const UnaryVariantOpRegistry::FuncTuple<Op>& rhs) {
+ return (lhs.op_type_ == rhs.op_type_) && (lhs.device_ == rhs.device_) &&
+ (lhs.typename_ == rhs.typename_);
+}
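
The pattern here is the standard one for a hand-rolled `std::unordered_map` key: a plain struct, a free `operator==`, and a hasher that folds the fields together. A self-contained sketch using `std::string` in place of `StringPiece` and a toy enum in place of the variant-op enums (all names below are illustrative):

```
// Standalone sketch of the FuncTuple pattern: struct key + hasher + operator==.
#include <cstdio>
#include <functional>
#include <string>
#include <unordered_map>

enum class Op { kZerosLike, kConj };

struct Key {
  Op op;
  std::string device, type_name;
};

inline bool operator==(const Key& l, const Key& r) {
  return l.op == r.op && l.device == r.device && l.type_name == r.type_name;
}

struct KeyHash {
  std::size_t operator()(const Key& k) const {
    std::size_t h = static_cast<std::size_t>(k.op);  // enum hashes to its value
    h = h * 31 + std::hash<std::string>()(k.device);
    h = h * 31 + std::hash<std::string>()(k.type_name);
    return h;
  }
};

int main() {
  std::unordered_map<Key, int, KeyHash> fns;
  fns.insert({{Op::kZerosLike, "CPU", "MyVariant"}, 42});
  // Brace-initialized lookup keys sidestep the forwarding trouble that
  // std::make_tuple ran into above.
  auto it = fns.find({Op::kZerosLike, "CPU", "MyVariant"});
  std::printf("%d\n", it == fns.end() ? -1 : it->second);  // prints 42
}
```
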
// Gets a TensorShape from a Tensor containing a scalar Variant.
// Returns an Internal error if the Variant does not have a registered shape
// function, or if it's a serialized Variant that cannot be decoded.
namespace tensorflow {
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
// This pass implements rewriting of the graph to support the following scenarios:
// (A) Merging nodes in the graph
return Status::OK();
}
-#else // INTEL_MKL_DNN
+#else // INTEL_MKL_ML
// This pass implements rewriting of the graph to support the following scenarios:
// (A) Merging nodes in the graph
// NOTE: names are alphabetically sorted.
rinfo_.push_back({csinfo_.addn, mkl_op_registry::GetMklOpName(csinfo_.addn),
CopyAttrsAddN, AddNRewrite});
- /* rinfo_.push_back({csinfo_.add,
- mkl_op_registry::GetMklOpName(csinfo_.add),
- CopyAttrsDataType, AlwaysRewrite}); */
+ rinfo_.push_back({csinfo_.add, mkl_op_registry::GetMklOpName(csinfo_.add),
+ CopyAttrsDataType, AlwaysRewrite});
rinfo_.push_back({csinfo_.avg_pool,
mkl_op_registry::GetMklOpName(csinfo_.avg_pool),
CopyAttrsPooling, AlwaysRewrite});
rinfo_.push_back({csinfo_.max_pool_grad,
mkl_op_registry::GetMklOpName(csinfo_.max_pool_grad),
CopyAttrsPooling, AlwaysRewrite});
- /*
+
rinfo_.push_back({csinfo_.maximum,
mkl_op_registry::GetMklOpName(csinfo_.maximum),
CopyAttrsDataType, AlwaysRewrite});
rinfo_.push_back({csinfo_.mul,
mkl_op_registry::GetMklOpName(csinfo_.mul),
CopyAttrsDataType, AlwaysRewrite});
- */
rinfo_.push_back({csinfo_.relu, mkl_op_registry::GetMklOpName(csinfo_.relu),
CopyAttrsDataType, AlwaysRewrite});
rinfo_.push_back({csinfo_.relu_grad,
rinfo_.push_back({csinfo_.softmax,
mkl_op_registry::GetMklOpName(csinfo_.softmax),
CopyAttrsDataType, AlwaysRewrite});
- /*
+
rinfo_.push_back({csinfo_.squared_difference,
mkl_op_registry::GetMklOpName(csinfo_.squared_difference),
CopyAttrsDataType, AlwaysRewrite});
rinfo_.push_back({csinfo_.sub,
mkl_op_registry::GetMklOpName(csinfo_.sub),
CopyAttrsDataType, AlwaysRewrite});
- */
// Add info about which ops to add workspace edge to and the slots.
wsinfo_.push_back({csinfo_.lrn, csinfo_.lrn_grad, 0, 2, 1, 3});
return Status::OK();
}
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
} // namespace tensorflow
#endif
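
The recurring edit in these MKL hunks is a polarity flip: code that used to be opt-in under `#ifdef INTEL_MKL_DNN` now compiles by default under `#ifndef INTEL_MKL_ML`, and the legacy MKL-ML paths become the explicit opt-in. A one-file sketch of the resulting guard semantics (the `-D` flags are illustrative, not the TF build definitions):

```
// Sketch of the flipped default: compile with -DINTEL_MKL_ML to get the
// legacy MKL-ML branch; otherwise the MKL-DNN branch is taken.
#include <cstdio>

int main() {
#ifdef INTEL_MKL_ML
  std::printf("legacy MKL-ML code path (must be requested explicitly)\n");
#else
  std::printf("MKL-DNN code path (now the default)\n");
#endif
}
```
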
namespace tensorflow {
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
namespace {
} // namespace
-#else // INTEL_MKL_DNN
+#else // INTEL_MKL_ML
namespace {
} // namespace
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
} // namespace tensorflow
return Binary(g, "ReverseV2", tensor, axis);
}
+Node* Roll(Graph* g, Node* input, Node* shift, Node* axis) {
+ Node* ret;
+ TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Roll", g->op_registry())
+ .Input(input)
+ .Input(shift)
+ .Input(axis)
+ .Finalize(g, &ret));
+ return ret;
+}
+
Node* Error(Graph* g, Node* input, const string& errmsg) {
Node* ret;
TF_CHECK_OK(NodeBuilder(g->NewName("n"), "Error")
// Output dtype determined by lam.
Node* RandomPoisson(Graph* g, Node* shape, Node* lam);
+// Rolls tensor by an offset of <shift> along the corresponding
+// <axis> dimensions.
+Node* Roll(Graph* g, Node* input, Node* shift, Node* axis);
+
// Generates random parameters from the truncated standard normal distribution
// of the input shape
Node* TruncatedNormal(Graph* g, Node* input, DataType dtype);
":transpose_op",
":unique_op",
":unpack_op",
+ ":unravel_index_op",
":where_op",
],
)
deps = ARRAY_DEPS + [":split_lib"],
)
+tf_kernel_library(
+ name = "unravel_index_op",
+ prefix = "unravel_index_op",
+ deps = ARRAY_DEPS,
+)
+
tf_kernel_library(
name = "where_op",
srcs = ["where_op.cc"],
],
)
+cc_library(
+ name = "manip",
+ deps = [
+ ":roll_op",
+ ],
+)
+
+MANIP_DEPS = [
+ "//tensorflow/core:framework",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:manip_ops_op_lib",
+ "//third_party/eigen3",
+]
+
+tf_kernel_library(
+ name = "roll_op",
+ prefix = "roll_op",
+ deps = MANIP_DEPS,
+)
+
+tf_cc_test(
+ name = "roll_op_test",
+ size = "small",
+ srcs = ["roll_op_test.cc"],
+ deps = [
+ ":ops_testutil",
+ ":ops_util",
+ ":roll_op",
+ "//tensorflow/core:core_cpu",
+ "//tensorflow/core:core_cpu_internal",
+ "//tensorflow/core:framework",
+ "//tensorflow/core:lib",
+ "//tensorflow/core:protos_all_cc",
+ "//tensorflow/core:test",
+ "//tensorflow/core:test_main",
+ "//tensorflow/core:testlib",
+ ],
+)
+
MATH_DEPS = [
":bounds_check",
":fill_functor",
typename TTypes<bool>::ConstMatrix input,
typename TTypes<uint8>::Matrix output, bool /*thresh*/, int64 start,
int64 limit) {
- // NOTE(ebrevdo): This assumes memory is little-endian.
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ for (int64 i = start; i < limit; ++i) {
+ uint8* out = output.data() + i;
+ const int64 block = *reinterpret_cast<const int64*>(input.data() + 8 * i);
+ *out = ((((block & (1LL << (7 * 8))) >> (7 * 8 - 7))) |
+ (((block & (1LL << (6 * 8))) >> (6 * 8 - 6))) |
+ (((block & (1LL << (5 * 8))) >> (5 * 8 - 5))) |
+ (((block & (1LL << (4 * 8))) >> (4 * 8 - 4))) |
+ (((block & (1LL << (3 * 8))) >> (3 * 8 - 3))) |
+ (((block & (1LL << (2 * 8))) >> (2 * 8 - 2))) |
+ (((block & (1LL << 8)) >> (1 * 8 - 1))) | (((block & (1LL)))));
+ }
+#else
for (int64 i = start; i < limit; ++i) {
uint8* out = output.data() + i;
const int64 block = *reinterpret_cast<const int64*>(input.data() + 8 * i);
(((block & (1LL << (2 * 8))) >> (2 * 8 - 5))) |
(((block & (1LL << 8)) >> (1 * 8 - 6))) | (((block & (1LL)) << 7)));
}
+#endif
}
};
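
Both byte-order branches implement the same mapping, differing only in where each bool's byte sits inside the loaded `int64`: the first of every eight bools lands in the most significant bit of the output byte. A portable (and slower) reference version of that mapping, as a sketch:

```
// Portable reference: pack eight bools into one byte, first bool in the
// most significant bit, matching both byte-order branches above.
#include <cstdint>
#include <cstdio>

uint8_t PackBools(const bool b[8]) {
  uint8_t out = 0;
  for (int k = 0; k < 8; ++k) {
    out |= static_cast<uint8_t>(b[k] ? 1 : 0) << (7 - k);
  }
  return out;
}

int main() {
  const bool b[8] = {true, false, false, false, false, false, false, true};
  std::printf("0x%02x\n", PackBools(b));  // prints 0x81
}
```
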
errors::InvalidArgument("channels must be 0, 1, 3 or 4, got ",
channels_));
}
+ inline int32 ByteSwapInt32ForBigEndian(int32 x) {
+#if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return le32toh(x);
+#else
+ return x;
+#endif
+ }
void Compute(OpKernelContext* context) override {
const Tensor& contents = context->input(0);
input.size(), " bytes"));
const uint8* img_bytes = reinterpret_cast<const uint8*>(input.data());
- const int32 header_size = internal::SubtleMustCopy(
+ int32 header_size_ = internal::SubtleMustCopy(
*(reinterpret_cast<const int32*>(img_bytes + 10)));
- const int32 width = internal::SubtleMustCopy(
+ const int32 header_size = ByteSwapInt32ForBigEndian(header_size_);
+ int32 width_ = internal::SubtleMustCopy(
*(reinterpret_cast<const int32*>(img_bytes + 18)));
- const int32 height = internal::SubtleMustCopy(
+ const int32 width = ByteSwapInt32ForBigEndian(width_);
+ int32 height_ = internal::SubtleMustCopy(
*(reinterpret_cast<const int32*>(img_bytes + 22)));
- const int32 bpp = internal::SubtleMustCopy(
+ const int32 height = ByteSwapInt32ForBigEndian(height_);
+ int32 bpp_ = internal::SubtleMustCopy(
*(reinterpret_cast<const int32*>(img_bytes + 28)));
+ const int32 bpp = ByteSwapInt32ForBigEndian(bpp_);
if (channels_) {
OP_REQUIRES(context, (channels_ == bpp / 8),
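
The BMP header fields read above are stored little-endian, which is why the swap is needed only on big-endian hosts. An alternative is to assemble the value from individual bytes, which is byte-order independent; a sketch (the `ReadLE32` helper is hypothetical, not part of the kernel):

```
// Sketch: read a little-endian int32 from a byte buffer without caring
// about host byte order (an alternative to swap-after-load).
#include <cstdint>
#include <cstdio>

int32_t ReadLE32(const uint8_t* p) {
  return static_cast<int32_t>(static_cast<uint32_t>(p[0]) |
                              (static_cast<uint32_t>(p[1]) << 8) |
                              (static_cast<uint32_t>(p[2]) << 16) |
                              (static_cast<uint32_t>(p[3]) << 24));
}

int main() {
  const uint8_t bytes[4] = {0x28, 0x00, 0x00, 0x00};  // 40: BITMAPINFOHEADER size
  std::printf("%d\n", ReadLE32(bytes));
}
```
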
// * sum(generated_diff_pooling_sequence) = input_length
// * Let's define floor(input_length / output_length) = K, then
// K <= generated_diff_pooling_sequence[i] <= K+1
-// For example, when input_length = 10, output_length = 6, the followings are
+// For example, when input_length = 10, output_length = 6, the following are
// valid pooling sequences:
// * [1, 2, 2, 1, 2, 2]
// * [1, 1, 2, 2, 2, 2]
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::stream;
using mkldnn::sum;
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklAddNOp : public OpKernel {
} MklAddNOpContext;
};
-#else // INTEL_MKL_DNN
+#else // INTEL_MKL_ML
template <typename Device, typename T>
class MklAddNOp : public OpKernel {
public:
: src2_tensor.dims();
// if the shapes of the two tensors are not the same, raise an op error
TensorShape src1_shape, src2_shape;
- src1_shape = src1_tensor.shape();
- src2_shape = src2_tensor.shape();
+ src1_shape = input1_in_mkl_format ? src1_mkl_shape.GetTfShape()
+ : src1_tensor.shape();
+ src2_shape = input2_in_mkl_format ? src2_mkl_shape.GetTfShape()
+ : src2_tensor.shape();
+
if (!src1_shape.IsSameSize(src2_shape)) {
ctx->SetStatus(errors::InvalidArgument(
"Inputs to operation ", this->name(), " of type ",
#include "tensorflow/core/kernels/mkl_pooling_ops_common.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::algorithm;
using mkldnn::engine;
typedef Eigen::ThreadPoolDevice CPUDevice;
-// For now, MKL-ML is default. So making MKL-DNN not a default choice.
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklAvgPoolingOp : public OpKernel {
TensorFormat data_format_;
}; // MklAvgPoolingGradOp
-#else // INTEL_MKL_DNN is defined
+#else
template <typename Device, typename T>
class MklAvgPoolingOp : public MklPoolingForwardOpBase<T> {
memory::dims output_dims_mkl_order;
this->GetOutputDims(pool_params, &output_dims_mkl_order);
+ // If input is an empty tensor, allocate an empty output tensor and return
+ if (input_tensor.NumElements() == 0) {
+ MklDnnShape output_mkl_shape;
+ output_mkl_shape.SetMklTensor(false);
+ TensorShape output_tf_shape;
+ if (pool_params.data_format == TensorFormat::FORMAT_NCHW) {
+ output_tf_shape = MklDnnDimsToTFShape(output_dims_mkl_order);
+ } else {
+ memory::dims output_dims_NHWC_order;
+ output_dims_NHWC_order = {pool_params.tensor_in_batch,
+ static_cast<int>(pool_params.out_height),
+ static_cast<int>(pool_params.out_width),
+ pool_params.out_depth};
+ output_tf_shape = MklDnnDimsToTFShape(output_dims_NHWC_order);
+ }
+ const int kOutputIndex = 0;
+ AllocateOutputSetMklShape(context, kOutputIndex, &output_tensor,
+ output_tf_shape, output_mkl_shape);
+ CHECK_NOTNULL(output_tensor);
+ return;
+ }
+
// If input is in Mkl layout, then just get the memory format from it
// directly, instead of using input data_format to AvgPool.
if (dnn_shape_input.IsMklTensor()) {
}
}; // MklAvgPoolingGradOp
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
REGISTER_KERNEL_BUILDER(Name("_MklAvgPool")
.Device(DEVICE_CPU)
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::concat;
// we need to have an empty Compute because Compute is a pure virtual function.
void Compute(OpKernelContext* c) {}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
void Compute(OpKernelContext* c, const std::vector<Tensor>& values) {
const Tensor* concat_dim_tensor;
#endif
};
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
// --------------------------------------------------------------------------
// Mkl Concat Op
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::convolution_backward_weights;
typedef Eigen::ThreadPoolDevice CPUDevice;
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, class T>
class MklConv2DCustomBackpropFilterOp : public OpKernel {
TF_CALL_float(REGISTER_MKL_FILTER_KERNELS);
#undef REGISTER_MKL_FILTER_KERNELS
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
} // namespace tensorflow
#include "tensorflow/core/util/use_cudnn.h"
#include "tensorflow/core/util/work_sharder.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::convolution_backward_data;
typedef Eigen::ThreadPoolDevice CPUDevice;
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, class T>
class MklConv2DCustomBackpropInputOp : public OpKernel {
}
};
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
#define REGISTER_MKL_CPU_KERNELS(T) \
REGISTER_KERNEL_BUILDER(Name("_MklConv2DBackpropInput") \
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
+
#include "mkldnn.hpp"
using mkldnn::prop_kind;
typedef Eigen::ThreadPoolDevice CPUDevice;
-// For now, MKL-ML is default. So making MKL-DNN not a default choice.
-#ifndef INTEL_MKL_DNN
+// MKL-DNN is now default. MKL-ML must be specified explicitly.
+#ifdef INTEL_MKL_ML
template <typename Device, typename T, bool biasEnabled>
class MklConv2DOp : public OpKernel {
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::prop_kind;
namespace tensorflow {
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
class MklDnnConvUtil {
protected:
Padding padding_;
TensorFormat data_format_;
};
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
/////////////////////////////////////////////////////////////////////
/// Dummy Mkl op that is just used for operators that are intermediate
-/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+/* Copyright 2015 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::batch_normalization_backward;
namespace tensorflow {
using CPUDevice = Eigen::ThreadPoolDevice;
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklFusedBatchNormOp : public OpKernel {
};
#endif
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
template <typename Device, typename T>
class MklFusedBatchNormOp : public OpKernel {
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
#endif
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklIdentityOp : public OpKernel {
#include "tensorflow/core/kernels/mkl_tfconv_op.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::stream;
// convert the TF format input to MKL format
///////////////////////////////////////////////////////////
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklInputConversionOp : public OpKernel {
public:
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
// If both inputs are in MKL format
if (input_shape_0.IsMklTensor() && input_shape_1.IsMklTensor()) {
- // If both have the same shape, pass them through
if (tf_shapes_are_same) {
- VLOG(1) << "MklInputConversionOp: No conversion needed, "
- << "copying MKL inputs with identical shapes to output";
-
- ForwardMklTensorInToOut(context, 0, 0);
- ForwardMklTensorInToOut(context, 1, 1);
- return;
+ auto input0_md = input_shape_0.GetMklLayout();
+ auto input1_md = input_shape_1.GetMklLayout();
+
+ // If both have the same shape and same format, pass them through
+ if (input0_md.data.format == input1_md.data.format) {
+ VLOG(1) << "MklInputConversionOp: No conversion needed, "
+ << "copying MKL inputs with identical shapes to output";
+
+ ForwardMklTensorInToOut(context, 0, 0);
+ ForwardMklTensorInToOut(context, 1, 1);
+ return;
+ } else {
+        VLOG(1) << "MklInputConversionOp: Shapes are the same, but formats "
+                << "differ; converting input0 to input1's format";
+
+ // Convert input0, and keep input1 unchanged
+ // Create MklDnnShape for output mkl tensor based on input0
+ Tensor* tensor_out;
+ MklDnnShape mkl_output_mkl_shape;
+ mkl_output_mkl_shape.SetMklTensor(true);
+ mkl_output_mkl_shape.SetElemType(MklDnnType<T>());
+ mkl_output_mkl_shape.SetTfLayout(input_shape_0.GetDimension(),
+ input_shape_0.GetSizesAsMklDnnDims(),
+ input_shape_0.GetTfDataFormat());
+
+ // Get MKL layout from input1 as destination layout
+ mkl_output_mkl_shape.SetMklLayout(&input1_md);
+
+ // Create output Mkl tensor for index 0
+ AllocateOutputSetMklShape(context, 0, &tensor_out,
+ input_tensor_0.shape(),
+ mkl_output_mkl_shape);
+
+        // Create MklDnnData object for input0 tensor
+ auto cpu_engine = engine(engine::cpu, 0);
+ MklDnnData<T> input(&cpu_engine);
+ input.SetUsrMem(input0_md, &input_tensor_0);
+
+ // Create reorder from input0's layout to input1's layout
+ std::vector<primitive> net;
+ CHECK_EQ(input.CheckReorderToOpMem(
+ memory::primitive_desc(input1_md, cpu_engine),
+ tensor_out, &net),
+ true);
+ stream(stream::kind::eager).submit(net).wait();
+
+ // Input1 will be passed through
+ ForwardMklTensorInToOut(context, 1, 1);
+ return;
+ }
}
// Sanity check
#include "tensorflow/core/util/work_sharder.h"
#endif
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::lrn_across_channels;
using mkldnn::lrn_backward;
} // namespace
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename T>
class MklLRNOp : public OpKernel {
float beta_;
};
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
#define REGISTER_MKL_LRN_CPU(T) \
REGISTER_KERNEL_BUILDER(Name("_MklLRN") \
#include "tensorflow/core/util/mkl_util.h"
#include "tensorflow/core/util/padding.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include <algorithm>
#include "mkldnn.hpp"
using mkldnn::algorithm;
typedef Eigen::ThreadPoolDevice CPUDevice;
-// For now, MKL-ML is default. So making MKL-DNN not a default choice.
-#ifndef INTEL_MKL_DNN
+// MKL-DNN is now default. MKL-ML must be specified explicitly.
+#ifdef INTEL_MKL_ML
// An implementation of MaxPooling (forward).
template <typename Device, typename T>
bool workspace_enabled_;
}; // MklMaxPoolingGradOp
-#else // INTEL_MKL_DNN is defined
+#else
// An implementation of MaxPooling (forward).
template <typename Device, typename T>
}
}; // MklMaxPoolingGradOp
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
REGISTER_KERNEL_BUILDER(Name("_MklMaxPool")
.Device(DEVICE_CPU)
Init(context, ksize, stride, padding, data_format);
}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
// Initialization for MKL format
void MklPoolParameters::Init(OpKernelContext* context,
const std::vector<int32>& ksize,
Init(context, ksize, stride, padding, data_format);
}
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
// Common Initialization for TensorFlow and MKL formats
void MklPoolParameters::Init(OpKernelContext* context,
const std::vector<int32>& ksize,
OP_REQUIRES_OK(context, GetWindowedOutputSizeVerbose(
tensor_in_cols, window_cols, col_stride,
padding, &out_width, &pad_left, &pad_right));
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
// TF can work with int64, but mkldnn only supports int32
// Fail if the height or width are greater than MAX_INT
#include "tensorflow/core/util/mkl_util.h"
#include "tensorflow/core/util/padding.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::memory;
using mkldnn::pooling_backward;
void Init(OpKernelContext* context, const std::vector<int32>& ksize,
const std::vector<int32>& stride, Padding padding,
TensorFormat data_format, const TensorShape& tensor_in_shape);
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
void Init(OpKernelContext* context, const std::vector<int32>& ksize,
const std::vector<int32>& stride, Padding padding,
TensorFormat data_format, const MklShape* mkl_in_shape);
TensorFormat data_format);
};
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
template <class T>
class MklPoolingOpBase : public OpKernel {
return grad_reorder_needed ? target_diff_dst_md : original_input_grad_md;
}
};
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
//-------------------------------------------------------------------
// Utility functions
#include "tensorflow/core/platform/default/logging.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::algorithm;
}
};
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename Device, typename T>
class MklReluOp : public OpKernel {
mkl_context.MklCleanup();
}
-#else // INTEL_MKL_DNN
+#else // INTEL_MKL_ML
template <typename Device, typename T, algorithm alg_kind>
class MklReluOpBase : public OpKernel {
MklReluGradOp<CPUDevice, type>);
TF_CALL_float(REGISTER_RELU_MKL_SUPPORTED_KERNELS_TYPES);
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
// register dnn kernels for supported operations and supported types
#define REGISTER_ELU_MKL_SUPPORTED_KERNELS_TYPES(type) \
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::stream;
#endif
public:
explicit MklReshapeOp(OpKernelConstruction* context) : OpKernel(context) {}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
void Compute(OpKernelContext* context) override {
const Tensor& input = MklGetInput(context, 0);
const Tensor& sizes = MklGetInput(context, 1);
}
}
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
private:
const int kInputSlotIdx = 0;
// See docs in ../ops/nn_ops.cc.
#ifdef INTEL_MKL
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/numeric_op.h"
} // namespace tensorflow
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
#endif // INTEL_MKL
#include "mkl_dnn_types.h"
#include "tensorflow/core/util/mkl_util.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
using mkldnn::stream;
#endif
VLOG(1) << "MKLToTFConversion complete successfully.";
}
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
static void ConvertMklToTf(OpKernel* op_kernel, OpKernelContext* context,
string data_format_str, DataType op_data_type,
bool has_avx512f, uint input_number) {
--- /dev/null
+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/framework/common_shape_fns.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/register_types_traits.h"
+#include "tensorflow/core/framework/shape_inference.h"
+#include "tensorflow/core/lib/gtl/array_slice.h"
+#include "tensorflow/core/platform/types.h"
+#include "tensorflow/core/util/work_sharder.h"
+
+namespace tensorflow {
+
+#define EIGEN_USE_THREADS
+using CPUDevice = Eigen::ThreadPoolDevice;
+
+// dim_size - the size of each dimension
+// dim_range - the number of indices over in the flattened tensor
+// you need to skip in order to make it over from one side of a dimension
+// to the other. Used to make the shifts wrap around after a threshold.
+// threshold - the index for each dimension that the roll starts to wrap
+// back to the front
+template <typename T>
+void DoRoll(OpKernelContext* context, const int64 num_elements,
+ const int num_dims, const gtl::ArraySlice<int>& dim_size,
+ const T* input, T* output, const gtl::ArraySlice<int>& threshold,
+ const gtl::ArraySlice<int64>& dim_range) {
+ auto work = [input, output, num_dims, &dim_size, &threshold, &dim_range](
+ int64 start, int64 end) {
+ // array of indices for each dimension
+ gtl::InlinedVector<int, 4> indices(num_dims);
+ int offset = 0; // the shift along the flattened tensor for current element
+ // initialize indices and offset
+ for (int i = 0; i < num_dims; i++) {
+ // stride is the number of indices over in the flattened tensor
+ // you need to skip in order to make it over to an adjacent element
+ // along a dimension. dim_size[i] != 0 because we set it to max(dim, 1)
+ const int64 stride = dim_range[i] / dim_size[i];
+ const int shift = dim_size[i] - threshold[i];
+ const int indx = (start / stride) % dim_size[i];
+ indices[i] = indx;
+ // calculate dimension index after the shift
+ const int shifted_indx = (indx + shift) % dim_size[i];
+ offset += (shifted_indx - indx) * stride;
+ }
+
+ for (int64 i = start; i < end; i++) {
+ output[i + offset] = input[i];
+ // create next combination of indices
+ // while at it adjust offset if needed
+ for (int j = num_dims - 1; j >= 0; j--) {
+ const int indx = (indices[j] + 1) % dim_size[j];
+ indices[j] = indx;
+ if (indx != 0) {
+ if (indx == threshold[j]) { // we've reached the threshold
+ // dim_range[j] = threshold[j] + shift[j]
+ // offset = shift[j] + ... other offsets
+ // offset - dim_range[j] = -threshold[j] + ... other offsets
+ // thus we undo our previous offset as well as add a new offset of
+ // -threshold[j] in one operation
+ offset -= dim_range[j]; // now wraps around
+ }
+ break; // indx != 0 don't need to carry
+ } else if (threshold[j] != 0) { // if threshold is 0 shift is 0
+ offset += dim_range[j]; // indx became 0 so reverse wrap around
+ }
+ }
+ }
+ };
+ // Shard
+ auto worker_threads = context->device()->tensorflow_cpu_worker_threads();
+  // 15 - experimentally determined with float and bool types
+  const int cost_per_element = 15 * sizeof(T);  // rough estimate
+ Shard(worker_threads->num_threads, worker_threads->workers, num_elements,
+ cost_per_element, std::move(work));
+}
+
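Collapsed to one dimension, DoRoll's bookkeeping reduces to: indices below the threshold move forward by `shift`, and indices at or past it wrap by `shift - n`. A standalone distillation of that arithmetic:

```
// 1-D distillation of DoRoll's offset arithmetic.
#include <cstdio>

int main() {
  const int n = 5, shift = 2;
  const int threshold = (n - shift) % n;  // 3: first index that wraps
  const int in[5] = {0, 1, 2, 3, 4};
  int out[5];
  for (int i = 0; i < n; ++i) {
    // below the threshold: move forward by shift; otherwise wrap around
    const int offset = (i < threshold) ? shift : shift - n;
    out[i + offset] = in[i];
  }
  for (int i = 0; i < n; ++i) std::printf("%d ", out[i]);  // 3 4 0 1 2
}
```
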
+// dim_size - the size of each dimension
+// dim_range - the number of indices over in the flattened tensor
+// you need to skip in order to make it over from one side of a dimension
+// to the other. Used to make the shifts wrap around after a threshold.
+// threshold - the index for each dimension that the roll starts to wrap
+// back to the front
+// isd - inner shift dimension
+// Use memcpy to copy memory in groups when the data type supports memcpy
+template <typename T>
+void DoRollWithMemcpy(OpKernelContext* context, const int64 num_elements,
+ const int num_dims, const gtl::ArraySlice<int>& dim_size,
+ const T* input, T* output,
+ const gtl::ArraySlice<int>& threshold,
+ const gtl::ArraySlice<int64>& dim_range,
+ const int64 isd) {
+ auto work = [input, output, num_dims, &dim_size, &threshold, &dim_range, isd](
+ int64 start, int64 end) {
+ // the number of indices over in the flattened tensor you need to skip in
+ // order to make it over from one side of the isd to the other
+ const int64 isd_range = std::max<int>(dim_range[isd], 1);
+    // the distance along the flattened tensor to the next element in the isd
+ const int64 isd_stride = isd_range / std::max<int>(dim_size[isd], 1);
+
+ // start and end represent the i-th group currently so we will convert
+ // them into numbers representing the i-th elements.
+ // there are 2 groups per isd one for all elements before threshold[isd]
+ // and another for all elements after threshold[isd].
+ const int64 start_remainder = (start % 2) * threshold[isd] * isd_stride;
+ const int64 end_remainder = (end % 2) * threshold[isd] * isd_stride;
+ start = (start / 2) * isd_range + start_remainder;
+ end = (end / 2) * isd_range + end_remainder;
+
+ const T* in_ptr = &input[0];
+ T* out_ptr = &output[0];
+ in_ptr += start;
+ out_ptr += start;
+
+ // array of indices for each dimension
+ // indicies = [i, j, k, l, m, n]
+ gtl::InlinedVector<int, 4> indicies(num_dims);
+ // the offset needed to make all inner non-shifting dimensions become 0
+ int64 remainder_offset = 0;
+ // initialize indicies
+ for (int i = 0; i < num_dims; i++) {
+ // stride is the number of indices over in the flattened tensor
+ // you need to skip in order to make it over to an adjacent element
+ // along a dimension. dim_size[i] != 0 because we set it to max(dim, 1)
+ const int64 stride = dim_range[i] / dim_size[i];
+ const int shift = dim_size[i] - threshold[i];
+ const int indx = (start / stride) % dim_size[i];
+ indicies[i] = indx;
+ // calculate dimension index after the shift
+ int out_indx = (indx + shift) % dim_size[i];
+ if (i > isd) {
+ // trailing zeroes for indices after the inner shifted dimension
+ out_indx = 0;
+ remainder_offset += (out_indx - indx) * stride;
+ }
+ out_ptr += (out_indx - indx) * stride;
+ }
+ // set trailing zeroes for indices after the inner shifted dimension
+ for (int i = num_dims - 1; i > isd; i--) indicies[i] = 0;
+
+ // the number of indices in the isd dimension the next group will skip
+ // to make it to the next threshold or end point
+ int isd_indx_skip = 0;
+ // the size of the next group
+ int64 group_size = 0;
+ // initialize isd_indx_skip and group_size
+ if (indicies[isd] < threshold[isd]) {
+ isd_indx_skip = threshold[isd] - indicies[isd];
+ group_size = isd_indx_skip * isd_stride + remainder_offset;
+ } else {
+ isd_indx_skip = dim_size[isd] - indicies[isd];
+ group_size = isd_indx_skip * isd_stride + remainder_offset;
+ }
+
+ int64 i = start;
+ while (i < end) {
+ // copy group of elements
+ memcpy(out_ptr, in_ptr, group_size * sizeof(T));
+
+ // shift i and the pointers over to the next group position
+ i += group_size;
+ out_ptr += group_size;
+ in_ptr += group_size;
+
+ // produce next combination of indices and adjust the out_ptr position
+ // to fix the offset if necessary
+ // the isd (inner shift dim) should skip to next threshold or endpoint
+ // all dimensions to the left increment by 1 when a digit is carried
+ // all dimensions to the right remain set to 0
+ // +1 +1 +1 +isd_indx_skip
+ // indicies = [i, j, k, l, 0, 0]
+ // ^isd
+ for (int j = isd; j >= 0; j--) {
+ int inc = 1;
+ if (j == isd) inc = isd_indx_skip;
+ const int indx = (indicies[j] + inc) % dim_size[j];
+ indicies[j] = indx;
+ if (indx != 0) {
+ if (indx == threshold[j]) {
+ out_ptr -= dim_range[j]; // now wraps around
+ }
+ break; // indx != 0 don't need to carry
+ } else if (threshold[j] != 0) { // if threshold is 0 shift is 0
+ out_ptr += dim_range[j]; // indx became 0 so reverse wrap around
+ }
+ }
+
+ // set isd_indx_skip and group_size for next iteration
+ if (indicies[isd] < threshold[isd]) {
+ isd_indx_skip = threshold[isd] - indicies[isd];
+ group_size = isd_indx_skip * isd_stride;
+ } else {
+ isd_indx_skip = dim_size[isd] - indicies[isd];
+ group_size = isd_indx_skip * isd_stride;
+ }
+ }
+ };
+ // Shard
+ auto worker_threads = context->device()->tensorflow_cpu_worker_threads();
+ const int64 ave_group_size = dim_range[isd] / 2;
+ const int total_work = 2 * num_elements / std::max<int>(dim_range[isd], 1);
+  // 25000 - experimentally determined with float and bool types
+ const int cost_per_group = 25000 * sizeof(T) * ave_group_size;
+ Shard(worker_threads->num_threads, worker_threads->workers, total_work,
+ cost_per_group, std::move(work));
+}
+
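The "two groups per isd" decomposition is easiest to see in one dimension, where a roll is exactly two contiguous block copies: the head shifts forward and the tail wraps to the front. A standalone distillation under that simplification:

```
// 1-D distillation of the memcpy grouping: a roll is two block copies.
#include <cstdio>
#include <cstring>

int main() {
  const int n = 5, shift = 2;
  const int threshold = n - shift;  // 3
  const float in[5] = {0, 1, 2, 3, 4};
  float out[5];
  // group 1: indices [0, threshold) shift forward by `shift`
  std::memcpy(out + shift, in, threshold * sizeof(float));
  // group 2: indices [threshold, n) wrap around to the front
  std::memcpy(out, in + threshold, shift * sizeof(float));
  for (int i = 0; i < n; ++i) std::printf("%g ", out[i]);  // 3 4 0 1 2
}
```
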
+template <typename Device, typename T, typename Tshift, typename Taxis>
+class RollOp : public OpKernel {
+ public:
+ explicit RollOp(OpKernelConstruction* context) : OpKernel(context) {}
+
+ void Compute(OpKernelContext* context) override {
+ // Grab the input tensor
+ const Tensor& input = context->input(0);
+ const Tensor& shift = context->input(1);
+ const Tensor& axis = context->input(2);
+
+ auto shift_flat = shift.flat<Tshift>();
+ auto axis_flat = axis.flat<Taxis>();
+
+ OP_REQUIRES(context, TensorShapeUtils::IsVectorOrHigher(input.shape()),
+ errors::InvalidArgument("input must be 1-D or higher"));
+ OP_REQUIRES(context, shift.shape().dims() <= 1,
+ errors::InvalidArgument(
+ "shift must be a scalar or a 1-D vector. Found: ",
+ shift.shape().DebugString()));
+ OP_REQUIRES(context, axis.shape().dims() <= 1,
+ errors::InvalidArgument(
+ "axis must be a scalar or a 1-D vector. Found: ",
+ axis.shape().DebugString()));
+ OP_REQUIRES(
+ context, shift.shape() == axis.shape(),
+ errors::InvalidArgument("shift and axis must have the same size"));
+ const int64 num_elements = input.NumElements();
+ const int num_shifts = static_cast<int>(shift_flat.size());
+ const int num_dims = input.dims();
+
+ // if there are any duplicate axes, shift_mod_sum will have the
+ // total modulo sum of shifts for each dimension
+ gtl::InlinedVector<int, 4> shift_mod_sum(num_dims, 0);
+ for (int i = 0; i < num_shifts; i++) {
+ const int axis = axis_flat(i);
+ OP_REQUIRES(context, axis < num_dims,
+ errors::InvalidArgument("axis ", axis, " is out of range"));
+ const int ds = std::max<int>(static_cast<int>(input.dim_size(axis)), 1);
+ const int sum = shift_mod_sum[axis] + static_cast<int>(shift_flat(i));
+ // modulo that works with negatives: ((x % y) + y) % y
+ shift_mod_sum[axis] = (sum % ds + ds) % ds;
+ }
+ // the size of each dimension
+ gtl::InlinedVector<int, 4> dim_size(num_dims);
+ // threshold[i] is the index that the roll starts to wrap back to the front
+ gtl::InlinedVector<int, 4> threshold(num_dims);
+ // dim_range is the number of indices over in the flattened tensor
+ // you need to skip in order to make it over from one side of a dimension
+ // to the other. Used to make the shifts wrap around after a threshold.
+ gtl::InlinedVector<int64, 4> dim_range(num_dims);
+ int64 dim_size_prod = 1; // dimension size product
+    // inner shift dimension (innermost shifted dimension)
+ int64 isd = 0;
+ for (int i = num_dims - 1; i >= 0; i--) {
+ if (isd == 0 && shift_mod_sum[i] != 0) isd = i;
+ const int ds = std::max<int>(static_cast<int>(input.dim_size(i)), 1);
+ dim_size[i] = ds;
+ threshold[i] = (ds - shift_mod_sum[i]) % ds;
+ dim_size_prod *= static_cast<int64>(input.dim_size(i));
+ dim_range[i] = dim_size_prod;
+ }
+
+    Tensor* output = nullptr;
+ OP_REQUIRES_OK(context,
+ context->allocate_output(0, input.shape(), &output));
+ auto input_flat = input.flat<T>().data();
+ auto output_flat = output->flat<T>().data();
+
+ if (std::is_same<Device, CPUDevice>::value) {
+ if (DataTypeCanUseMemcpy(DataTypeToEnum<T>::v())) {
+ // V2 copies memory in groups instead of element by element
+ DoRollWithMemcpy<T>(context, num_elements, num_dims, dim_size,
+ input_flat, output_flat, threshold, dim_range, isd);
+ } else {
+        // in case memcpy does not work for the current data type
+ DoRoll<T>(context, num_elements, num_dims, dim_size, input_flat,
+ output_flat, threshold, dim_range);
+ }
+ }
+ }
+};
+
+// Register the CPU kernels.
+#define REGISTER_CPU(type) \
+ REGISTER_KERNEL_BUILDER(Name("Roll") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<int32>("Tshift") \
+ .TypeConstraint<int32>("Taxis"), \
+ RollOp<CPUDevice, type, int32, int32>) \
+ REGISTER_KERNEL_BUILDER(Name("Roll") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<int64>("Tshift") \
+ .TypeConstraint<int32>("Taxis"), \
+ RollOp<CPUDevice, type, int64, int32>) \
+ REGISTER_KERNEL_BUILDER(Name("Roll") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<int32>("Tshift") \
+ .TypeConstraint<int64>("Taxis"), \
+ RollOp<CPUDevice, type, int32, int64>) \
+ REGISTER_KERNEL_BUILDER(Name("Roll") \
+ .Device(DEVICE_CPU) \
+ .TypeConstraint<type>("T") \
+ .TypeConstraint<int64>("Tshift") \
+ .TypeConstraint<int64>("Taxis"), \
+ RollOp<CPUDevice, type, int64, int64>)
+
+TF_CALL_ALL_TYPES(REGISTER_CPU);
+#undef REGISTER_CPU
+} // namespace tensorflow
--- /dev/null
+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include <functional>
+#include <memory>
+
+#include "tensorflow/core/common_runtime/device.h"
+#include "tensorflow/core/common_runtime/device_factory.h"
+#include "tensorflow/core/common_runtime/kernel_benchmark_testlib.h"
+#include "tensorflow/core/framework/allocator.h"
+#include "tensorflow/core/framework/fake_input.h"
+#include "tensorflow/core/framework/node_def_builder.h"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/tensor.h"
+#include "tensorflow/core/framework/types.h"
+#include "tensorflow/core/framework/types.pb.h"
+#include "tensorflow/core/kernels/ops_testutil.h"
+#include "tensorflow/core/kernels/ops_util.h"
+#include "tensorflow/core/lib/io/path.h"
+#include "tensorflow/core/lib/strings/strcat.h"
+#include "tensorflow/core/platform/test.h"
+#include "tensorflow/core/platform/test_benchmark.h"
+
+namespace tensorflow {
+namespace {
+
+class RollOpTest : public OpsTestBase {
+ protected:
+ void MakeOp(DataType data_type, DataType index_type) {
+ TF_ASSERT_OK(NodeDefBuilder("myop", "Roll")
+ .Input(FakeInput(data_type))
+ .Input(FakeInput(index_type))
+ .Input(FakeInput(index_type))
+ .Finalize(node_def()));
+ TF_ASSERT_OK(InitOp());
+ }
+};
+
+TEST_F(RollOpTest, ScalarIndices) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5}), {0, 1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({}), {3});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5}));
+ test::FillValues<float>(&expected, {2, 3, 4, 0, 1});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ScalarIndices_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({5}), {"a", "b", "c", "d", "e"});
+ AddInputFromArray<int32>(TensorShape({}), {3});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({5}));
+ test::FillValues<string>(&expected, {"c", "d", "e", "a", "b"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ScalarIndices_Complex) {
+ MakeOp(DT_COMPLEX64, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<std::complex<float>>(
+ TensorShape({5}), {std::complex<float>(0, 10), std::complex<float>(1, 11),
+ std::complex<float>(2, 12), std::complex<float>(3, 13),
+ std::complex<float>(4, 14)});
+ AddInputFromArray<int32>(TensorShape({}), {3});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_COMPLEX64, TensorShape({5}));
+ test::FillValues<std::complex<float>>(
+ &expected, {std::complex<float>(2, 12), std::complex<float>(3, 13),
+ std::complex<float>(4, 14), std::complex<float>(0, 10),
+ std::complex<float>(1, 11)});
+ test::ExpectTensorEqual<std::complex<float>>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_TwoD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({3, 5}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int32>(TensorShape({2}), {2, -1});
+ AddInputFromArray<int32>(TensorShape({2}), {0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({3, 5}));
+ test::FillValues<float>(&expected,
+ {6, 7, 8, 9, 5, 11, 12, 13, 14, 10, 1, 2, 3, 4, 0});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_TwoD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({3, 5}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
+ "k", "l", "m", "n", "o"});
+ AddInputFromArray<int32>(TensorShape({2}), {2, -1});
+ AddInputFromArray<int32>(TensorShape({2}), {0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({3, 5}));
+ test::FillValues<string>(&expected, {"g", "h", "i", "j", "f", "l", "m", "n",
+ "o", "k", "b", "c", "d", "e", "a"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_ThreeD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 2, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11});
+ AddInputFromArray<int32>(TensorShape({3}), {1, -1, -1});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 2, 3}));
+ test::FillValues<float>(&expected, {10, 11, 9, 7, 8, 6, 4, 5, 3, 1, 2, 0});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_ThreeD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(
+ TensorShape({2, 2, 3}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"});
+ AddInputFromArray<int32>(TensorShape({3}), {1, -1, -1});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({2, 2, 3}));
+ test::FillValues<string>(
+ &expected, {"k", "l", "j", "h", "i", "g", "e", "f", "d", "b", "c", "a"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_TwoD64) {
+ MakeOp(DT_FLOAT, DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int64>(TensorShape({2}), {-1, 4});
+ AddInputFromArray<int64>(TensorShape({2}), {0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5, 3}));
+ test::FillValues<float>(&expected,
+ {5, 3, 4, 8, 6, 7, 11, 9, 10, 14, 12, 13, 2, 0, 1});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_TwoD64_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({5, 3}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
+ "k", "l", "m", "n", "o"});
+ AddInputFromArray<int64>(TensorShape({2}), {-1, 4});
+ AddInputFromArray<int64>(TensorShape({2}), {0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({5, 3}));
+ test::FillValues<string>(&expected, {"f", "d", "e", "i", "g", "h", "l", "j",
+ "k", "o", "m", "n", "c", "a", "b"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_ThreeD64) {
+ MakeOp(DT_FLOAT, DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({4, 1, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11});
+ AddInputFromArray<int64>(TensorShape({3}), {4, 3, 2});
+ AddInputFromArray<int64>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({4, 1, 3}));
+ test::FillValues<float>(&expected, {1, 2, 0, 4, 5, 3, 7, 8, 6, 10, 11, 9});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Simple_ThreeD64_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT64);
+
+ // Feed and run
+ AddInputFromArray<string>(
+ TensorShape({4, 1, 3}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"});
+ AddInputFromArray<int64>(TensorShape({3}), {4, 3, 2});
+ AddInputFromArray<int64>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({4, 1, 3}));
+ test::FillValues<string>(
+ &expected, {"b", "c", "a", "e", "f", "d", "h", "i", "g", "k", "l", "j"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ZeroShift_ThreeD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 2, 3}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({2, 2, 3}));
+ test::FillValues<float>(&expected, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ZeroShift_ThreeD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(
+ TensorShape({2, 2, 3}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 0, 0});
+ AddInputFromArray<int32>(TensorShape({3}), {0, 1, 2});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({2, 2, 3}));
+ test::FillValues<string>(
+ &expected, {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ZeroSize_ThreeD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({5, 0, 0}), {});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({5, 0, 0}));
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, ZeroSize_ThreeD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({5, 0, 0}), {});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({5, 0, 0}));
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, OneSize_ThreeD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({1, 1, 1}), {5});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({1, 1, 1}));
+ test::FillValues<float>(&expected, {5});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, OneSize_ThreeD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({1, 1, 1}), {"a"});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({1, 1, 1}));
+ test::FillValues<string>(&expected, {"a"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, MultiShifts_TwoD32) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({3, 5}),
+ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14});
+ AddInputFromArray<int32>(TensorShape({4}), {-2, 2, -1, 1});
+ AddInputFromArray<int32>(TensorShape({4}), {1, 0, 0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_FLOAT, TensorShape({3, 5}));
+ test::FillValues<float>(&expected,
+ {11, 12, 13, 14, 10, 1, 2, 3, 4, 0, 6, 7, 8, 9, 5});
+ test::ExpectTensorEqual<float>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, MultiShifts_TwoD32_NoMemcpy) {
+ MakeOp(DT_STRING, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<string>(TensorShape({3, 5}),
+ {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
+ "k", "l", "m", "n", "o"});
+ AddInputFromArray<int32>(TensorShape({4}), {-2, 2, -1, 1});
+ AddInputFromArray<int32>(TensorShape({4}), {1, 0, 0, 1});
+ TF_ASSERT_OK(RunOpKernel());
+
+ // Check the output.
+ Tensor expected(allocator(), DT_STRING, TensorShape({3, 5}));
+ test::FillValues<string>(&expected, {"l", "m", "n", "o", "k", "b", "c", "d",
+ "e", "a", "g", "h", "i", "j", "f"});
+ test::ExpectTensorEqual<string>(expected, *GetOutput(0));
+}
+
+TEST_F(RollOpTest, Error_InputMustBeVectorOrHigher) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({}), {7});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {0});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("input must be 1-D or higher"))
+ << s;
+}
+
+TEST_F(RollOpTest, Error_AxisMustBeScalarOrVector) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 2}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({1, 2}), {0, 1});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("axis must be a scalar or a 1-D vector"))
+ << s;
+}
+
+TEST_F(RollOpTest, Error_ShiftMustBeScalarOrVector) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 2}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({1, 2}), {0, 1});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("shift must be a scalar or a 1-D vector"))
+ << s;
+}
+
+TEST_F(RollOpTest, Error_ShiftAndAxisMustBeSameSize) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({2, 2}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({1}), {1});
+ AddInputFromArray<int32>(TensorShape({2}), {0, 1});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString())
+ .contains("shift and axis must have the same size"))
+ << s;
+}
+
+TEST_F(RollOpTest, Error_AxisOutOfRange) {
+ MakeOp(DT_FLOAT, DT_INT32);
+
+ // Feed and run
+ AddInputFromArray<float>(TensorShape({4}), {1, 2, 3, 4});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ AddInputFromArray<int32>(TensorShape({}), {1});
+ Status s = RunOpKernel();
+ EXPECT_TRUE(StringPiece(s.ToString()).contains("is out of range")) << s;
+}
+
+// isd - (inner shift dimension) The innermost dimension to be shifted.
+// All outer dimensions will also be shifted for testing.
+static Graph* RollGraph(const TensorShape& shape, int isd) {
+ Graph* g = new Graph(OpRegistry::Global());
+ Tensor input(DT_FLOAT, shape);
+ input.flat<float>().setRandom();
+ const int dims = static_cast<int>(input.dims());
+ Tensor shift(DT_INT32, TensorShape({dims}));
+ for (int i = 0; i < dims; i++) {
+ // shift the inner shift dimension and all outer dimensions
+ shift.flat<int32>()(i) = (i <= isd) ? 2 : 0;
+ }
+ Tensor axis(DT_INT32, TensorShape({dims}));
+ for (int i = 0; i < dims; i++) {
+ axis.flat<int32>()(i) = i;
+ }
+ test::graph::Roll(g, test::graph::Constant(g, input),
+ test::graph::Constant(g, shift),
+ test::graph::Constant(g, axis));
+ return g;
+}
+
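+// BM_ROLL_OUTER (isd = 0) rolls only the outermost dimension of the 2-D
+// shape, so whole rows move as contiguous blocks; BM_ROLL_ALL (isd = 1)
+// also rolls the inner dimension, moving individual elements.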
+#define BM_ROLL_OUTER(DEVICE) \
+ static void BM_##DEVICE##_roll_outer(int iters, int rows, int columns) { \
+ TensorShape shape{rows, columns}; \
+ const int64 num_items = static_cast<int64>(iters) * shape.num_elements(); \
+ testing::ItemsProcessed(num_items); \
+ testing::BytesProcessed(num_items * sizeof(float)); \
+ testing::UseRealTime(); \
+ test::Benchmark(#DEVICE, RollGraph(shape, 0)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_roll_outer) \
+ ->ArgPair(256, 256) \
+ ->ArgPair(512, 512) \
+ ->ArgPair(1024, 1024) \
+ ->ArgPair(2048, 2048)
+
+#define BM_ROLL_ALL(DEVICE) \
+ static void BM_##DEVICE##_roll_all(int iters, int rows, int columns) { \
+ TensorShape shape{rows, columns}; \
+ const int64 num_items = static_cast<int64>(iters) * shape.num_elements(); \
+ testing::ItemsProcessed(num_items); \
+ testing::BytesProcessed(num_items * sizeof(float)); \
+ testing::UseRealTime(); \
+ test::Benchmark(#DEVICE, RollGraph(shape, 1)).Run(iters); \
+ } \
+ BENCHMARK(BM_##DEVICE##_roll_all) \
+ ->ArgPair(256, 256) \
+ ->ArgPair(512, 512) \
+ ->ArgPair(1024, 1024) \
+ ->ArgPair(2048, 2048)
+
+BM_ROLL_OUTER(cpu);
+BM_ROLL_ALL(cpu);
+} // namespace
+} // namespace tensorflow
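As a cross-check on the expected values in `MultiShifts_TwoD32` above: shifts applied to the same axis accumulate, so shifts `{-2, 2, -1, 1}` on axes `{1, 0, 0, 1}` amount to a net shift of `+1` on axis 0 and `-1` on axis 1. A minimal numpy sketch (`np.roll` has the same wrap-around semantics as this kernel):

```python
import numpy as np

m = np.arange(15).reshape(3, 5)  # the test's [3, 5] input
out = np.roll(np.roll(m, 2 - 1, axis=0), -2 + 1, axis=1)
print(out.ravel())  # [11 12 13 14 10  1  2  3  4  0  6  7  8  9  5]
```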
--- /dev/null
+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#define EIGEN_USE_THREADS
+
+#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
+#include "tensorflow/core/framework/op_kernel.h"
+#include "tensorflow/core/framework/register_types.h"
+#include "tensorflow/core/framework/tensor.h"
+#include "tensorflow/core/framework/types.h"
+
+namespace tensorflow {
+
+namespace {
+template <typename T>
+struct mod_op {
+ const T operator()(const T& a, const T& b) const { return a % b; }
+};
+} // namespace
+
+typedef Eigen::ThreadPoolDevice CPUDevice;
+
+template <typename Tidx>
+class UnravelIndexOp : public OpKernel {
+ public:
+ explicit UnravelIndexOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
+
+ void Compute(OpKernelContext* ctx) override {
+ const Tensor& indices_tensor = ctx->input(0);
+ OP_REQUIRES(ctx,
+ TensorShapeUtils::IsVector(indices_tensor.shape()) ||
+ TensorShapeUtils::IsScalar(indices_tensor.shape()),
+ errors::InvalidArgument(
+ "The indices can only be scalar or vector, got \"",
+ indices_tensor.shape().DebugString(), "\""));
+
+ const Tensor& dims_tensor = ctx->input(1);
+ OP_REQUIRES(
+ ctx, TensorShapeUtils::IsVector(dims_tensor.shape()),
+ errors::InvalidArgument("The indices can only be 1-D, got \"",
+ dims_tensor.shape().DebugString(), "\""));
+
+ auto dims = dims_tensor.vec<Tidx>();
+
+ Eigen::array<bool, 1> reverse({true});
+
+ Tensor strides_tensor;
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_temp(DataTypeToEnum<Tidx>::value,
+ TensorShape({dims_tensor.NumElements()}),
+ &strides_tensor));
+
+ auto strides = strides_tensor.vec<Tidx>();
+ strides = dims.reverse(reverse)
+ .scan(0, Eigen::internal::ProdReducer<Tidx>(), false)
+ .reverse(reverse);
+
+ Tensor strides_shifted_tensor;
+ OP_REQUIRES_OK(ctx,
+ ctx->allocate_temp(DataTypeToEnum<Tidx>::value,
+ TensorShape({dims_tensor.NumElements()}),
+ &strides_shifted_tensor));
+
+ auto strides_shifted = strides_shifted_tensor.vec<Tidx>();
+ strides_shifted = dims.reverse(reverse)
+ .scan(0, Eigen::internal::ProdReducer<Tidx>(), true)
+ .reverse(reverse);
+
+ Tensor* output_tensor = nullptr;
+ if (TensorShapeUtils::IsScalar(indices_tensor.shape())) {
+ OP_REQUIRES_OK(
+ ctx, ctx->allocate_output(0, TensorShape({dims_tensor.NumElements()}),
+ &output_tensor));
+
+ auto output = output_tensor->vec<Tidx>();
+
+ output = output.constant(indices_tensor.scalar<Tidx>()());
+ output = output.binaryExpr(strides, mod_op<Tidx>()) / strides_shifted;
+ } else {
+ OP_REQUIRES_OK(
+ ctx, ctx->allocate_output(0,
+ TensorShape({dims_tensor.NumElements(),
+ indices_tensor.NumElements()}),
+ &output_tensor));
+
+ auto output = output_tensor->matrix<Tidx>();
+
+ Eigen::array<int64, 2> reshape{{dims_tensor.NumElements(), 1}};
+ Eigen::array<int64, 2> bcast({1, indices_tensor.NumElements()});
+ Eigen::array<int64, 2> indices_reshape{{1, indices_tensor.NumElements()}};
+ Eigen::array<int64, 2> indices_bcast({dims_tensor.NumElements(), 1});
+
+ output = indices_tensor.vec<Tidx>()
+ .reshape(indices_reshape)
+ .broadcast(indices_bcast);
+ output = output.binaryExpr(strides.reshape(reshape).broadcast(bcast),
+ mod_op<Tidx>()) /
+ strides_shifted.reshape(reshape).broadcast(bcast);
+ }
+ }
+};
+
+#define REGISTER_KERNEL(type) \
+ REGISTER_KERNEL_BUILDER( \
+ Name("UnravelIndex").Device(DEVICE_CPU).TypeConstraint<type>("Tidx"), \
+ UnravelIndexOp<type>);
+TF_CALL_int32(REGISTER_KERNEL) TF_CALL_int64(REGISTER_KERNEL)
+#undef REGISTER_KERNEL
+
+} // namespace tensorflow
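Per output dimension, the kernel above computes `(index % stride) / shifted_stride`, where `strides` is the inclusive right-to-left product scan of `dims` and `strides_shifted` the exclusive one. numpy's `np.unravel_index` implements the same mapping, so it serves as a convenient cross-check:

```python
import numpy as np

# For dims = [3, 4]: strides = [12, 4], strides_shifted = [4, 1].
# index 7 -> ((7 % 12) // 4, (7 % 4) // 1) == (1, 3)
print(np.unravel_index(7, (3, 4)))        # (1, 3)
print(np.unravel_index([7, 11], (3, 4)))  # (array([1, 2]), array([3, 3]))
```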
return Status::OK();
}
+// To limit memory usage, SkipNBytes() reads at most 8MB at a time.
+static constexpr int64 kMaxSkipSize = 8 * 1024 * 1024;
+
+Status RandomAccessInputStream::SkipNBytes(int64 bytes_to_skip) {
+ if (bytes_to_skip < 0) {
+ return errors::InvalidArgument("Can't skip a negative number of bytes");
+ }
+ std::unique_ptr<char[]> scratch(new char[kMaxSkipSize]);
+  // Try to read 1 byte first; if the read completes, EOF has not been
+  // reached yet and we can return.
+ if (bytes_to_skip > 0) {
+ StringPiece data;
+ Status s = file_->Read(pos_ + bytes_to_skip - 1, 1, &data, scratch.get());
+ if ((s.ok() || errors::IsOutOfRange(s)) && data.size() == 1) {
+ pos_ += bytes_to_skip;
+ return Status::OK();
+ }
+ }
+  // Read kMaxSkipSize bytes at a time until bytes_to_skip bytes are skipped.
+ while (bytes_to_skip > 0) {
+ int64 bytes_to_read = std::min<int64>(kMaxSkipSize, bytes_to_skip);
+ StringPiece data;
+ Status s = file_->Read(pos_, bytes_to_read, &data, scratch.get());
+ if (s.ok() || errors::IsOutOfRange(s)) {
+ pos_ += data.size();
+ } else {
+ return s;
+ }
+ if (data.size() < bytes_to_read) {
+ return errors::OutOfRange("reached end of file");
+ }
+ bytes_to_skip -= bytes_to_read;
+ }
+ return Status::OK();
+}
+
int64 RandomAccessInputStream::Tell() const { return pos_; }
} // namespace io
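A sketch of the same chunked-skip strategy in Python, assuming only a file-like object with `read()` (this mirrors the fallback loop above, not the single-byte EOF probe):

```python
def skip_n_bytes(f, bytes_to_skip, chunk=8 * 1024 * 1024):
    """Skip bytes_to_skip bytes, reading at most `chunk` bytes at a time."""
    if bytes_to_skip < 0:
        raise ValueError("Can't skip a negative number of bytes")
    while bytes_to_skip > 0:
        data = f.read(min(chunk, bytes_to_skip))
        if not data:  # end of file reached before the skip completed
            raise EOFError("reached end of file")
        bytes_to_skip -= len(data)
```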
Status ReadNBytes(int64 bytes_to_read, string* result) override;
+ Status SkipNBytes(int64 bytes_to_skip) override;
+
int64 Tell() const override;
Status Seek(int64 position) {
return Status::OK();
});
+REGISTER_OP("UnravelIndex")
+ .Input("indices: Tidx")
+ .Input("dims: Tidx")
+ .Output("output: Tidx")
+ .Attr("Tidx: {int32, int64} = DT_INT32")
+ .SetShapeFn([](InferenceContext* c) { return Status::OK(); });
+
// --------------------------------------------------------------------------
// TODO(josh11b): Remove the >= 2 constraint, once we can rewrite the graph
// in the N == 1 case to remove the node.
.Output("selected_indices: int32")
.Attr("iou_threshold: float = 0.5")
.SetShapeFn([](InferenceContext* c) {
+ // Get inputs and validate ranks.
+ ShapeHandle boxes;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 2, &boxes));
+ ShapeHandle scores;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 1, &scores));
+ ShapeHandle max_output_size;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(2), 0, &max_output_size));
+      // boxes is a 2-D float Tensor of shape [num_boxes, 4].
+ DimensionHandle unused;
+ TF_RETURN_IF_ERROR(c->WithValue(c->Dim(boxes, 1), 4, &unused));
+
c->set_output(0, c->Vector(c->UnknownDim()));
return Status::OK();
});
.Input("iou_threshold: float")
.Output("selected_indices: int32")
.SetShapeFn([](InferenceContext* c) {
+ // Get inputs and validate ranks.
+ ShapeHandle boxes;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 2, &boxes));
+ ShapeHandle scores;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 1, &scores));
+ ShapeHandle max_output_size;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(2), 0, &max_output_size));
+ ShapeHandle iou_threshold;
+ TF_RETURN_IF_ERROR(c->WithRank(c->input(3), 0, &iou_threshold));
+      // boxes is a 2-D float Tensor of shape [num_boxes, 4].
+ DimensionHandle unused;
+ TF_RETURN_IF_ERROR(c->WithValue(c->Dim(boxes, 1), 4, &unused));
+
c->set_output(0, c->Vector(c->UnknownDim()));
return Status::OK();
});
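Both shape functions now enforce the ranks the kernels already assumed: `boxes` must be `[num_boxes, 4]`, `scores` rank-1, and `max_output_size` (plus `iou_threshold` for the second op) scalar. A minimal call satisfying those constraints through the `tf.image.non_max_suppression` Python wrapper:

```python
import tensorflow as tf

boxes = tf.constant([[0, 0, 1, 1], [0, 0, 1, 1.1]], tf.float32)  # [num_boxes, 4]
scores = tf.constant([0.9, 0.8])                                 # [num_boxes]
selected = tf.image.non_max_suppression(
    boxes, scores, max_output_size=1, iou_threshold=0.5)         # scalars
```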
--- /dev/null
+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/framework/common_shape_fns.h"
+#include "tensorflow/core/framework/op.h"
+#include "tensorflow/core/framework/shape_inference.h"
+
+namespace tensorflow {
+
+// --------------------------------------------------------------------------
+REGISTER_OP("Roll")
+ .Input("input: T")
+ .Input("shift: Tshift")
+ .Input("axis: Taxis")
+ .Output("output: T")
+ .Attr("T: type")
+ .Attr("Tshift: {int32,int64}")
+ .Attr("Taxis: {int32,int64}")
+ .SetShapeFn(shape_inference::UnchangedShape);
+
+} // namespace tensorflow
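Because the shape function is `shape_inference::UnchangedShape`, the output always carries the input's static shape. A short usage sketch, assuming the `tf.manip.roll` Python wrapper that the BUILD and API changes later in this patch wire up:

```python
import tensorflow as tf

x = tf.reshape(tf.range(10), [2, 5])
y = tf.manip.roll(x, shift=2, axis=1)  # elements wrap around within axis 1
# y has the same static shape as x: [2, 5]
```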
.Input("input: T")
.Input("mkl_input: uint8")
.Output("output: T")
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
.Output("workspace: T")
#else
.Output("workspace: uint8")
.Input("orig_input: T")
.Input("orig_output: T")
.Input("grad: T")
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
.Input("workspace: T")
#else
.Input("workspace: uint8")
.Input("input: T")
.Input("mkl_input: uint8")
.Output("output: T")
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
.Output("workspace: T")
#else
.Output("workspace: uint8")
.Input("input_grads: T")
.Input("input_image: T")
.Input("output_image: T")
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
.Input("workspace: T")
#else
.Input("workspace: uint8")
void InfoAboutUnusedCPUFeatures() {
std::call_once(g_cpu_feature_guard_warn_once_flag, [] {
string missing_instructions;
-#ifdef PLATFORM_WINDOWS
+#if defined(_MSC_VER) && !defined(__clang__)
+
#ifndef __AVX__
CheckIfFeatureUnused(CPUFeature::AVX, "AVX", missing_instructions);
#endif // __AVX__
#ifndef __AVX2__
CheckIfFeatureUnused(CPUFeature::AVX2, "AVX2", missing_instructions);
#endif // __AVX2__
-#else // ifdef platform windows
+
+#else // if defined(_MSC_VER) && !defined(__clang__)
+
#ifndef __SSE__
CheckIfFeatureUnused(CPUFeature::SSE, "SSE", missing_instructions);
#endif // __SSE__
#ifndef __FMA__
CheckIfFeatureUnused(CPUFeature::FMA, "FMA", missing_instructions);
#endif // __FMA__
-#endif // else of ifdef platform windows
+#endif // else of if defined(_MSC_VER) && !defined(__clang__)
if (!missing_instructions.empty()) {
LOG(INFO) << "Your CPU supports instructions that this TensorFlow "
<< "binary was not compiled to use:" << missing_instructions;
class CpuUtils {
public:
// Constant for invalid frequency.
- // This value is returned when the furequency is not obtained somehow.
+  // This value is returned when the frequency cannot be obtained.
static constexpr int64 INVALID_FREQUENCY = -1;
static constexpr uint64 DUMMY_CYCLE_CLOCK = 1;
static int64 GetCycleCounterFrequency();
#endif
- // Return micro secound per each clock
+  // Returns microseconds per clock cycle.
// As this method caches the cpu frequency internally,
// the first call will incur overhead, but not subsequent calls.
static double GetMicroSecPerClock();
#include <aws/core/Aws.h>
#include <aws/core/config/AWSProfileConfigLoader.h>
#include <aws/core/utils/FileSystemUtils.h>
+#include <aws/core/utils/StringUtils.h>
#include <aws/core/utils/logging/AWSLogging.h>
#include <aws/core/utils/logging/LogSystemInterface.h>
#include <aws/s3/S3Client.h>
return cfg;
};
+void ShutdownClient(Aws::S3::S3Client* s3_client) {
+ if (s3_client != nullptr) {
+ delete s3_client;
+ Aws::SDKOptions options;
+ Aws::ShutdownAPI(options);
+ AWSLogSystem::ShutdownAWSLogging();
+ }
+}
+
Status ParseS3Path(const string& fname, bool empty_object_ok, string* bucket,
string* object) {
if (!bucket || !object) {
class S3RandomAccessFile : public RandomAccessFile {
public:
- S3RandomAccessFile(const string& bucket, const string& object)
- : bucket_(bucket), object_(object) {}
+ S3RandomAccessFile(const string& bucket, const string& object,
+ std::shared_ptr<Aws::S3::S3Client> s3_client)
+ : bucket_(bucket), object_(object), s3_client_(s3_client) {}
Status Read(uint64 offset, size_t n, StringPiece* result,
char* scratch) const override {
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
Aws::S3::Model::GetObjectRequest getObjectRequest;
getObjectRequest.WithBucket(bucket_.c_str()).WithKey(object_.c_str());
string bytes = strings::StrCat("bytes=", offset, "-", offset + n - 1);
getObjectRequest.SetResponseStreamFactory([]() {
return Aws::New<Aws::StringStream>(kS3FileSystemAllocationTag);
});
- auto getObjectOutcome = s3Client.GetObject(getObjectRequest);
+ auto getObjectOutcome = this->s3_client_->GetObject(getObjectRequest);
if (!getObjectOutcome.IsSuccess()) {
n = 0;
*result = StringPiece(scratch, n);
private:
string bucket_;
string object_;
+ std::shared_ptr<Aws::S3::S3Client> s3_client_;
};
class S3WritableFile : public WritableFile {
public:
- S3WritableFile(const string& bucket, const string& object)
+ S3WritableFile(const string& bucket, const string& object,
+ std::shared_ptr<Aws::S3::S3Client> s3_client)
: bucket_(bucket),
object_(object),
+ s3_client_(s3_client),
sync_needed_(true),
outfile_(Aws::MakeShared<Aws::Utils::TempFile>(
kS3FileSystemAllocationTag, "/tmp/s3_filesystem_XXXXXX",
if (!sync_needed_) {
return Status::OK();
}
- Aws::Client::ClientConfiguration clientConfig = GetDefaultClientConfig();
- clientConfig.connectTimeoutMs = 300000;
- clientConfig.requestTimeoutMs = 600000;
- Aws::S3::S3Client s3Client(clientConfig);
Aws::S3::Model::PutObjectRequest putObjectRequest;
putObjectRequest.WithBucket(bucket_.c_str()).WithKey(object_.c_str());
long offset = outfile_->tellp();
outfile_->seekg(0);
putObjectRequest.SetBody(outfile_);
putObjectRequest.SetContentLength(offset);
- auto putObjectOutcome = s3Client.PutObject(putObjectRequest);
+ auto putObjectOutcome = this->s3_client_->PutObject(putObjectRequest);
outfile_->clear();
outfile_->seekp(offset);
if (!putObjectOutcome.IsSuccess()) {
private:
string bucket_;
string object_;
+ std::shared_ptr<Aws::S3::S3Client> s3_client_;
bool sync_needed_;
std::shared_ptr<Aws::Utils::TempFile> outfile_;
};
} // namespace
-S3FileSystem::S3FileSystem() {
- AWSLogSystem::InitializeAWSLogging();
-
- Aws::SDKOptions options;
- options.cryptoOptions.sha256Factory_create_fn = []() {
- return Aws::MakeShared<S3SHA256Factory>(S3CryptoAllocationTag);
- };
- options.cryptoOptions.sha256HMACFactory_create_fn = []() {
- return Aws::MakeShared<S3SHA256HmacFactory>(S3CryptoAllocationTag);
- };
- Aws::InitAPI(options);
-}
-
-S3FileSystem::~S3FileSystem() {
- Aws::SDKOptions options;
- Aws::ShutdownAPI(options);
+S3FileSystem::S3FileSystem()
+ : s3_client_(nullptr, ShutdownClient), client_lock_() {}
+
+S3FileSystem::~S3FileSystem() {}
+
+// Initializes s3_client_, if needed, and returns it.
+std::shared_ptr<Aws::S3::S3Client> S3FileSystem::GetS3Client() {
+ std::lock_guard<mutex> lock(this->client_lock_);
+
+ if (this->s3_client_.get() == nullptr) {
+ AWSLogSystem::InitializeAWSLogging();
+
+ Aws::SDKOptions options;
+ options.cryptoOptions.sha256Factory_create_fn = []() {
+ return Aws::MakeShared<S3SHA256Factory>(S3CryptoAllocationTag);
+ };
+ options.cryptoOptions.sha256HMACFactory_create_fn = []() {
+ return Aws::MakeShared<S3SHA256HmacFactory>(S3CryptoAllocationTag);
+ };
+ Aws::InitAPI(options);
+
+ // The creation of S3Client disables virtual addressing:
+    //   S3Client(clientConfiguration, signPayloads, useVirtualAddressing = true)
+    // This works around the issue encountered when there is a `.` in the
+    // bucket name: due to TLS hostname validation or DNS rules, the bucket
+    // may not resolve. Disabling virtual addressing avoids the problem.
+    // See GitHub issue 16397 for details.
+ this->s3_client_ = std::shared_ptr<Aws::S3::S3Client>(new Aws::S3::S3Client(
+ GetDefaultClientConfig(),
+ Aws::Client::AWSAuthV4Signer::PayloadSigningPolicy::Never, false));
+ }
- AWSLogSystem::ShutdownAWSLogging();
+ return this->s3_client_;
}
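The same lock-guarded lazy-initialization pattern as a Python sketch (names are hypothetical; the point is that both the client construction and the one-time SDK setup happen under the lock, on first use):

```python
import threading

class LazyClient(object):
    """Builds an expensive client on first use, guarded by a lock."""

    def __init__(self, factory):
        self._factory = factory  # invoked at most once, under the lock
        self._client = None
        self._lock = threading.Lock()

    def get(self):
        with self._lock:
            if self._client is None:
                self._client = self._factory()
            return self._client
```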
Status S3FileSystem::NewRandomAccessFile(
const string& fname, std::unique_ptr<RandomAccessFile>* result) {
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(fname, false, &bucket, &object));
- result->reset(new S3RandomAccessFile(bucket, object));
+ result->reset(new S3RandomAccessFile(bucket, object, this->GetS3Client()));
return Status::OK();
}
std::unique_ptr<WritableFile>* result) {
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(fname, false, &bucket, &object));
- result->reset(new S3WritableFile(bucket, object));
+ result->reset(new S3WritableFile(bucket, object, this->GetS3Client()));
return Status::OK();
}
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(fname, false, &bucket, &object));
- result->reset(new S3WritableFile(bucket, object));
+ result->reset(new S3WritableFile(bucket, object, this->GetS3Client()));
while (true) {
status = reader->Read(offset, kS3ReadAppendableFileBufferSize, &read_chunk,
prefix.push_back('/');
}
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
Aws::S3::Model::ListObjectsRequest listObjectsRequest;
listObjectsRequest.WithBucket(bucket.c_str())
.WithPrefix(prefix.c_str())
Aws::S3::Model::ListObjectsResult listObjectsResult;
do {
- auto listObjectsOutcome = s3Client.ListObjects(listObjectsRequest);
+ auto listObjectsOutcome =
+ this->GetS3Client()->ListObjects(listObjectsRequest);
if (!listObjectsOutcome.IsSuccess()) {
string error = strings::StrCat(
listObjectsOutcome.GetError().GetExceptionName().c_str(), ": ",
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(fname, true, &bucket, &object));
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
if (object.empty()) {
Aws::S3::Model::HeadBucketRequest headBucketRequest;
headBucketRequest.WithBucket(bucket.c_str());
- auto headBucketOutcome = s3Client.HeadBucket(headBucketRequest);
+ auto headBucketOutcome = this->GetS3Client()->HeadBucket(headBucketRequest);
if (!headBucketOutcome.IsSuccess()) {
string error = strings::StrCat(
headBucketOutcome.GetError().GetExceptionName().c_str(), ": ",
headObjectRequest.WithBucket(bucket.c_str()).WithKey(object.c_str());
headObjectRequest.SetResponseStreamFactory(
[]() { return Aws::New<Aws::StringStream>(kS3FileSystemAllocationTag); });
- auto headObjectOutcome = s3Client.HeadObject(headObjectRequest);
+ auto headObjectOutcome = this->GetS3Client()->HeadObject(headObjectRequest);
if (headObjectOutcome.IsSuccess()) {
stats->length = headObjectOutcome.GetResult().GetContentLength();
stats->is_directory = 0;
.WithMaxKeys(1);
listObjectsRequest.SetResponseStreamFactory(
[]() { return Aws::New<Aws::StringStream>(kS3FileSystemAllocationTag); });
- auto listObjectsOutcome = s3Client.ListObjects(listObjectsRequest);
+ auto listObjectsOutcome =
+ this->GetS3Client()->ListObjects(listObjectsRequest);
if (listObjectsOutcome.IsSuccess()) {
if (listObjectsOutcome.GetResult().GetContents().size() > 0) {
stats->length = 0;
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(fname, false, &bucket, &object));
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
Aws::S3::Model::DeleteObjectRequest deleteObjectRequest;
deleteObjectRequest.WithBucket(bucket.c_str()).WithKey(object.c_str());
- auto deleteObjectOutcome = s3Client.DeleteObject(deleteObjectRequest);
+ auto deleteObjectOutcome =
+ this->GetS3Client()->DeleteObject(deleteObjectRequest);
if (!deleteObjectOutcome.IsSuccess()) {
string error = strings::StrCat(
deleteObjectOutcome.GetError().GetExceptionName().c_str(), ": ",
TF_RETURN_IF_ERROR(ParseS3Path(dirname, true, &bucket, &object));
if (object.empty()) {
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
Aws::S3::Model::HeadBucketRequest headBucketRequest;
headBucketRequest.WithBucket(bucket.c_str());
- auto headBucketOutcome = s3Client.HeadBucket(headBucketRequest);
+ auto headBucketOutcome = this->GetS3Client()->HeadBucket(headBucketRequest);
if (!headBucketOutcome.IsSuccess()) {
return errors::NotFound("The bucket ", bucket, " was not found.");
}
string bucket, object;
TF_RETURN_IF_ERROR(ParseS3Path(dirname, false, &bucket, &object));
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
string prefix = object;
if (prefix.back() != '/') {
prefix.push_back('/');
.WithMaxKeys(2);
listObjectsRequest.SetResponseStreamFactory(
[]() { return Aws::New<Aws::StringStream>(kS3FileSystemAllocationTag); });
- auto listObjectsOutcome = s3Client.ListObjects(listObjectsRequest);
+ auto listObjectsOutcome =
+ this->GetS3Client()->ListObjects(listObjectsRequest);
if (listObjectsOutcome.IsSuccess()) {
auto contents = listObjectsOutcome.GetResult().GetContents();
if (contents.size() > 1 ||
}
}
- Aws::S3::S3Client s3Client(GetDefaultClientConfig());
-
Aws::S3::Model::CopyObjectRequest copyObjectRequest;
Aws::S3::Model::DeleteObjectRequest deleteObjectRequest;
Aws::S3::Model::ListObjectsResult listObjectsResult;
do {
- auto listObjectsOutcome = s3Client.ListObjects(listObjectsRequest);
+ auto listObjectsOutcome =
+ this->GetS3Client()->ListObjects(listObjectsRequest);
if (!listObjectsOutcome.IsSuccess()) {
string error = strings::StrCat(
listObjectsOutcome.GetError().GetExceptionName().c_str(), ": ",
Aws::String src_key = object.GetKey();
Aws::String target_key = src_key;
target_key.replace(0, src_object.length(), target_object.c_str());
- Aws::String source = Aws::String(src_bucket.c_str()) + "/" + src_key;
+ Aws::String source = Aws::String(src_bucket.c_str()) + "/" +
+ Aws::Utils::StringUtils::URLEncode(src_key.c_str());
copyObjectRequest.SetBucket(target_bucket.c_str());
copyObjectRequest.SetKey(target_key);
copyObjectRequest.SetCopySource(source);
- auto copyObjectOutcome = s3Client.CopyObject(copyObjectRequest);
+ auto copyObjectOutcome =
+ this->GetS3Client()->CopyObject(copyObjectRequest);
if (!copyObjectOutcome.IsSuccess()) {
string error = strings::StrCat(
copyObjectOutcome.GetError().GetExceptionName().c_str(), ": ",
deleteObjectRequest.SetBucket(src_bucket.c_str());
deleteObjectRequest.SetKey(src_key.c_str());
- auto deleteObjectOutcome = s3Client.DeleteObject(deleteObjectRequest);
+ auto deleteObjectOutcome =
+ this->GetS3Client()->DeleteObject(deleteObjectRequest);
if (!deleteObjectOutcome.IsSuccess()) {
string error = strings::StrCat(
deleteObjectOutcome.GetError().GetExceptionName().c_str(), ": ",
#ifndef TENSORFLOW_CONTRIB_S3_S3_FILE_SYSTEM_H_
#define TENSORFLOW_CONTRIB_S3_S3_FILE_SYSTEM_H_
+#include <aws/s3/S3Client.h>
#include "tensorflow/core/platform/env.h"
+#include "tensorflow/core/platform/mutex.h"
namespace tensorflow {
Status GetFileSize(const string& fname, uint64* size) override;
Status RenameFile(const string& src, const string& target) override;
+
+ private:
+  // Returns the member S3 client, initializing it as needed.
+  // When the client accesses an object in S3, e.g.,
+  //   s3://bucket-name/path/to/object
+  // its behavior can be controlled by several environment variables.
+  // By default S3 accesses the regional endpoint, with the region
+  // controlled by `AWS_REGION`. The endpoint can be overridden
+  // explicitly with `S3_ENDPOINT`. S3 uses HTTPS by default; if
+  // S3_USE_HTTPS=0 is specified, HTTP is used instead. In addition,
+  // S3_VERIFY_SSL=0 disables SSL verification when HTTPS is used.
+  // This S3 client does not support virtual hosted-style addressing
+  // for a bucket.
+ std::shared_ptr<Aws::S3::S3Client> GetS3Client();
+
+ std::shared_ptr<Aws::S3::S3Client> s3_client_;
+ // Lock held when checking for s3_client_ initialization.
+ mutex client_lock_;
};
} // namespace tensorflow
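A usage sketch for the environment variables documented above (all values hypothetical); they must be set before the first S3 access so `GetS3Client()` sees them when it builds the client configuration:

```python
import os

os.environ["AWS_REGION"] = "us-east-1"        # region for the regional endpoint
os.environ["S3_ENDPOINT"] = "localhost:9000"  # e.g. a local S3-compatible store
os.environ["S3_USE_HTTPS"] = "0"              # use HTTP instead of HTTPS
os.environ["S3_VERIFY_SSL"] = "0"             # skip SSL verification under HTTPS
```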
TEST_F(S3FileSystemTest, FileExists) {
const string fname = TmpDir("FileExists");
+ // Ensure the file doesn't yet exist.
+ TF_ASSERT_OK(s3fs.DeleteFile(fname));
EXPECT_EQ(error::Code::NOT_FOUND, s3fs.FileExists(fname).code());
TF_ASSERT_OK(WriteString(fname, "test"));
TF_EXPECT_OK(s3fs.FileExists(fname));
// Byte order defines provided by gcc. MSVC doesn't define those so
// we define them here.
// We assume that all Windows platforms out there are little endian.
+#if defined(_MSC_VER) && !defined(__clang__)
#define __ORDER_LITTLE_ENDIAN__ 0x4d2
#define __ORDER_BIG_ENDIAN__ 0x10e1
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
+#endif
#endif // TENSORFLOW_PLATFORM_WINDOWS_CPU_INFO_H_
# can also generate memory profile using `-select bytes`
tfprof> code -select accelerator_micros -max_depth 100000 -output pprof:outfile=<filename> -trim_name_regexes .*apply_op.*
-# Use pprof to visualize the generated file.
-pprof -png --nodecount=100 --sample_index=1 <filename>
+# Use google-pprof, from the google-perftools package, to visualize the generated file.
+# On Ubuntu you can install it with `apt-get install google-perftools`.
+google-pprof --pdf --nodecount=100 <filename>
```
const MultiGraphNodeProto& ShowMultiGraphNode(const string& cmd,
const Options& opts) const;
- // A a (partial) graph to existing graph.
+  // Add a (partial) graph to the existing graph.
void AddGraph(std::unique_ptr<GraphDef> graph);
// Add a step of run time meta data.
MultiGraphNodeProto empty_multi_graph_node_;
std::map<int64, string> id_to_string_;
- // Graph nodes covered by RunMetdata, that is traced with run time stats.
+ // Graph nodes covered by RunMetadata, that is traced with run time stats.
std::set<int64> covered_nodes_;
};
"graph_path,op_log_path,run_meta_path\n");
std::unique_ptr<GraphDef> graph(new GraphDef());
if (!FLAGS_graph_path.empty()) {
- TF_CHECK_OK(
- ReadProtoFile(Env::Default(), FLAGS_graph_path, graph.get(), false));
+ s = ReadProtoFile(Env::Default(), FLAGS_graph_path, graph.get(), false);
+ if (!s.ok()) {
+ fprintf(stderr, "Failed to read graph_path: %s\n",
+ s.ToString().c_str());
+ return 1;
+ }
}
std::unique_ptr<OpLogProto> op_log(new OpLogProto());
// TF_VERSION_SUFFIX is non-empty for pre-releases (e.g. "-alpha", "-alpha.1",
// "-beta", "-rc", "-rc.1")
-#define TF_VERSION_SUFFIX "-rc1"
+#define TF_VERSION_SUFFIX ""
#define TF_STR_HELPER(x) #x
#define TF_STR(x) TF_STR_HELPER(x)
#include "tensorflow/core/util/padding.h"
#include "tensorflow/core/util/tensor_format.h"
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
#include "mkldnn.hpp"
using mkldnn::engine;
nullptr; // TF dimension corresponding to this MKL dimension
};
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
// Forward decl
TensorFormat MklDnnDataFormatToTFDataFormat(memory::format format);
typedef std::vector<MklShape> MklShapeList;
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
typedef std::vector<MklDnnShape> MklDnnShapeList;
#endif
return true;
}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
template <typename T>
inline Tensor ConvertMklToTF(OpKernelContext* context, const Tensor& mkl_tensor,
const MklShape& mkl_shape) {
sizeof(uint8));
}
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
inline void GetMklShape(OpKernelContext* ctext, int n, MklDnnShape* mklshape) {
mklshape->DeSerializeMklDnnShape(
ctext->input(GetTensorMetaDataIndex(n, ctext->num_inputs()))
ctext->input_list(name, input_tensors);
}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
inline void GetMklShapeList(OpKernelContext* ctext, StringPiece name,
MklShapeList* mkl_shapes) {
#endif
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
/// Get shape of input tensor pointed by 'input_idx' in TensorShape format.
/// If the input tensor is in MKL layout, then obtains TensorShape from
/// MklShape.
second_tensor->flat<uint8>().size() * sizeof(uint8));
}
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
// Allocate the second output tensor that will contain
// the MKL shape serialized
inline void AllocateOutputSetMklShape(OpKernelContext* ctext, int n,
second_tensor->flat<uint8>().size() * sizeof(uint8));
}
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
// Allocate the output tensor, create a second output tensor that will contain
// the MKL shape serialized
inline void AllocateOutputSetMklShape(OpKernelContext* ctext, int n,
// Allocates a temp tensor and returns the data buffer for temporary storage.
// Currently
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
template <typename T>
inline void AllocTmpBuffer(OpKernelContext* context, Tensor* tensor_out,
const memory::primitive_desc& pd, void** buf_out) {
context->set_output(idx_meta_out, meta_output);
}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
inline void CopyTfTensorInToOutWithShape(OpKernelContext* context, int idx_in,
int idx_out,
const TensorShape& shape) {
}
#endif
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
inline void ForwardTfTensorInToOut(OpKernelContext* context, int idx_in,
int idx_out) {
}
}
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
inline void ForwardMklTensorInToOutWithMklShape(OpKernelContext* context,
int idx_in, int idx_out,
const MklDnnShape& mkl_shape) {
AllocateOutputSetMklShape(context, idx_data_out, mkl_shape_output);
}
-#ifndef INTEL_MKL_DNN
+#ifdef INTEL_MKL_ML
// We don't need these functions in MKLDNN. We have defined an equality
// operator on the MklDnnShape class directly.
// -------------------------------------------------------------------
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
/// Return MKL-DNN data type (memory::data_type) for input type T
///
}
};
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
} // namespace tensorflow
#endif // INTEL_MKL
namespace tensorflow {
namespace {
-#ifdef INTEL_MKL_DNN
+#ifndef INTEL_MKL_ML
TEST(MklUtilTest, MklDnnTfShape) {
auto cpu_engine = engine(engine::cpu, 0);
EXPECT_EQ(b_md2.data.format, mkldnn_blocked);
}
-#endif // INTEL_MKL_DNN
+#endif // INTEL_MKL_ML
} // namespace
} // namespace tensorflow
Lukasz~Kaiser and
Manjunath~Kudlur and
Josh~Levenberg and
- Dan~Man\'{e} and
+ Dandelion~Man\'{e} and
Rajat~Monga and
Sherry~Moore and
Derek~Murray and
with inner structure (e.g. a spectrogram):
```python
-# `magnitude_spectrograms` is a [batch_size, ?, 127] tensor of spectrograms. We
+# `magnitude_spectrograms` is a [batch_size, ?, 129] tensor of spectrograms. We
# would like to produce overlapping fixed-size spectrogram patches; for example,
# for use in a situation where a fixed size input is needed.
magnitude_spectrograms = tf.abs(tf.contrib.signal.stft(
signals, frame_length=256, frame_step=64, fft_length=256))
-# `spectrogram_patches` is a [batch_size, ?, 64, 127] tensor containing a
-# variable number of [64, 127] spectrogram patches per batch item.
+# `spectrogram_patches` is a [batch_size, ?, 64, 129] tensor containing a
+# variable number of [64, 129] spectrogram patches per batch item.
spectrogram_patches = tf.contrib.signal.frame(
magnitude_spectrograms, frame_length=64, frame_step=16, axis=1)
```
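(With `fft_length=256`, the STFT produces `256 // 2 + 1 = 129` unique frequency bins, which is where the 129 in the shapes above comes from.)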
The `model_fn` returns an
@{tf.estimator.EstimatorSpec$`EstimatorSpec`} which is a simple structure
indicating to the `Estimator` which operations should be run to accomplish
-varions tasks.
+various tasks.
cd models/samples/core/get_started
```
-In this document we wil be looking at
+In this document we will be looking at
[`custom_estimator.py`](https://github.com/tensorflow/models/blob/master/samples/core/get_started/custom_estimator.py).
You can run it with the following command:
To implement a typical model function, you must do the following:
-* (Define the model)[#define_the_model].
+* [Define the model](#define_the_model).
* Specify additional calculations for each of
the [three different modes](#modes):
* [Predict](#predict)
shuffled. The Iris data set only contains 150 examples.
The @{tf.data.Dataset.repeat$`repeat`} method has the `Dataset` restart when
-it reaches the end. To limit the number of epochss, set the `count` argument.
+it reaches the end. To limit the number of epochs, set the `count` argument.
The @{tf.data.Dataset.batch$`batch`} method collects a number of examples and
stacks them, to create batches. This adds a dimension to their shape. The new
We will start by building a function to parse a single line.
-The following `iris_data.parse_line` function acomplishes this taks using the
+The following `iris_data.parse_line` function accomplishes this task using the
@{tf.decode_csv} function, and some simple python code:
We must parse each of the lines in the dataset in order to generate the
contains far fewer cells than an indicator column.
Let's look at an example comparing indicator and embedding columns. Suppose our
-input examples consists of different words from a limited palette of only 81
-words. Further suppose that the data set provides provides the following input
+input examples consist of different words from a limited palette of only 81
+words. Further suppose that the data set provides the following input
words in 4 separate examples:
* `"dog"`
We now have a trained model that produces good evaluation results.
We can now use the trained model to predict the species of an Iris flower
-based on some unlabeled measurments. As with training and evaluation, we make
+based on some unlabeled measurements. As with training and evaluation, we make
predictions using a single function call:
```python
OS="linux" # Change to "darwin" for macOS
TARGET_DIRECTORY="/usr/local"
curl -L \
- "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.5.0-rc1.tar.gz" |
+ "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.5.0.tar.gz" |
sudo tar -C $TARGET_DIRECTORY -xz
The `tar` command extracts the TensorFlow C library into the `lib`
TF_TYPE="cpu" # Change to "gpu" for GPU support
TARGET_DIRECTORY='/usr/local'
curl -L \
- "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.5.0-rc1.tar.gz" |
+ "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.5.0.tar.gz" |
sudo tar -C $TARGET_DIRECTORY -xz
The `tar` command extracts the TensorFlow C library into the `lib`
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>tensorflow</artifactId>
- <version>1.5.0-rc1</version>
+ <version>1.5.0</version>
</dependency>
```
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>tensorflow</artifactId>
- <version>1.5.0-rc1</version>
+ <version>1.5.0</version>
</dependency>
</dependencies>
</project>
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>libtensorflow</artifactId>
- <version>1.5.0-rc1</version>
+ <version>1.5.0</version>
</dependency>
<dependency>
<groupId>org.tensorflow</groupId>
<artifactId>libtensorflow_jni_gpu</artifactId>
- <version>1.5.0-rc1</version>
+ <version>1.5.0</version>
</dependency>
```
Take the following steps to install TensorFlow for Java on Linux or macOS:
1. Download
- [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.5.0-rc1.jar),
+ [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.5.0.jar),
which is the TensorFlow Java Archive (JAR).
2. Decide whether you will run TensorFlow for Java on CPU(s) only or with
OS=$(uname -s | tr '[:upper:]' '[:lower:]')
mkdir -p ./jni
curl -L \
- "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.5.0-rc1.tar.gz" |
+ "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.5.0.tar.gz" |
tar -xz -C ./jni
### Install on Windows
Take the following steps to install TensorFlow for Java on Windows:
1. Download
- [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.5.0-rc1.jar),
+ [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.5.0.jar),
which is the TensorFlow Java Archive (JAR).
2. Download the following Java Native Interface (JNI) file appropriate for
- [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.5.0-rc1.zip).
+ [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.5.0.zip).
3. Extract this .zip file.
downloaded `.jar` in your `classpath` by using the `-cp` compilation flag
as follows:
-<pre><b>javac -cp libtensorflow-1.5.0-rc1.jar HelloTF.java</b></pre>
+<pre><b>javac -cp libtensorflow-1.5.0.jar HelloTF.java</b></pre>
### Running
For example, the following command line executes the `HelloTF` program on Linux
and macOS X:
-<pre><b>java -cp libtensorflow-1.5.0-rc1.jar:. -Djava.library.path=./jni HelloTF</b></pre>
+<pre><b>java -cp libtensorflow-1.5.0.jar:. -Djava.library.path=./jni HelloTF</b></pre>
And the following command line executes the `HelloTF` program on Windows:
-<pre><b>java -cp libtensorflow-1.5.0-rc1.jar;. -Djava.library.path=jni HelloTF</b></pre>
+<pre><b>java -cp libtensorflow-1.5.0.jar;. -Djava.library.path=jni HelloTF</b></pre>
If the program prints <tt>Hello from <i>version</i></tt>, you've successfully
installed TensorFlow for Java and are ready to use the API. If the program
mechanisms described in this guide, then the following NVIDIA software
must be installed on your system:
- * CUDA® Toolkit 8.0. For details, see
+ * CUDA® Toolkit 9.0. For details, see
[NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A).
Ensure that you append the relevant Cuda pathnames to the
`LD_LIBRARY_PATH` environment variable as described in the
NVIDIA documentation.
- * The NVIDIA drivers associated with CUDA Toolkit 8.0.
- * cuDNN v6.0. For details, see
+ * The NVIDIA drivers associated with CUDA Toolkit 9.0.
+ * cuDNN v7.0. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn).
Ensure that you create the `CUDA_HOME` environment variable as
described in the NVIDIA documentation.
Virtualenv environment:
<pre>(tensorflow)$ <b>pip3 install --upgrade \
- https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp34-cp34m-linux_x86_64.whl</b></pre>
+ https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp34-cp34m-linux_x86_64.whl</b></pre>
If you encounter installation problems, see
[Common Installation Problems](#common_installation_problems).
<pre>
$ <b>sudo pip3 install --upgrade \
- https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp34-cp34m-linux_x86_64.whl</b>
+ https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp34-cp34m-linux_x86_64.whl</b>
</pre>
If this step fails, see
<pre>
(tensorflow)$ <b>pip install --ignore-installed --upgrade \
- https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp34-cp34m-linux_x86_64.whl</b></pre>
+ https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp34-cp34m-linux_x86_64.whl</b></pre>
<a name="ValidateYourInstallation"></a>
CPU only:
<pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp27-none-linux_x86_64.whl
</pre>
GPU support:
<pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0rc1-cp27-none-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp27-none-linux_x86_64.whl
</pre>
Note that GPU support requires the NVIDIA hardware and software described in
CPU only:
<pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp34-cp34m-linux_x86_64.whl
</pre>
GPU support:
<pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0rc1-cp34-cp34m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp34-cp34m-linux_x86_64.whl
</pre>
Note that GPU support requires the NVIDIA hardware and software described in
CPU only:
<pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp35-cp35m-linux_x86_64.whl
</pre>
GPU support:
<pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0rc1-cp35-cp35m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp35-cp35m-linux_x86_64.whl
</pre>
CPU only:
<pre>
-https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0rc1-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.5.0-cp36-cp36m-linux_x86_64.whl
</pre>
GPU support:
<pre>
-https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0rc1-cp36-cp36m-linux_x86_64.whl
+https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.5.0-cp36-cp36m-linux_x86_64.whl
</pre>
TensorFlow in the active Virtualenv is as follows:
<pre> $ <b>pip3 install --upgrade \
- https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0rc1-py2-none-any.whl</b></pre>
+ https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py3-none-any.whl</b></pre>
If you encounter installation problems, see
[Common Installation Problems](#common-installation-problems).
issue the following command:
<pre> $ <b>sudo pip3 install --upgrade \
- https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0rc1-py2-none-any.whl</b> </pre>
+ https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py3-none-any.whl</b> </pre>
If the preceding command fails, see
[installation problems](#common-installation-problems).
TensorFlow for Python 2.7:
<pre> (<i>targetDirectory</i>)$ <b>pip install --ignore-installed --upgrade \
- https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0rc1-py2-none-any.whl</b></pre>
+ https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py2-none-any.whl</b></pre>
<a name="ValidateYourInstallation"></a>
<pre>
-https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0rc1-py2-none-any.whl
+https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py2-none-any.whl
</pre>
<pre>
-https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0rc1-py3-none-any.whl
+https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.5.0-py3-none-any.whl
</pre>
The following NVIDIA <i>software</i> must be installed on your system:
- * NVIDIA's Cuda Toolkit (>= 7.0). We recommend version 8.0.
+ * NVIDIA's Cuda Toolkit (>= 7.0). We recommend version 9.0.
For details, see
[NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-linux/#axzz4VZnqTJ2A).
Ensure that you append the relevant Cuda pathnames to the
CUDA support will be enabled for TensorFlow
Do you want to use clang as CUDA compiler? [y/N]
nvcc will be used as CUDA compiler
-Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: <b>8.0</b>
-Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
+Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: <b>9.0</b>
+Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
-Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: <b>6</b>
-Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
+Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: <b>7</b>
+Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
The filename of the `.whl` file depends on your platform.
For example, the following command will install the pip package
-for TensorFlow 1.5.0rc1 on Linux:
+for TensorFlow 1.5.0 on Linux:
<pre>
-$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.5.0rc1-py2-none-any.whl</b>
+$ <b>sudo pip install /tmp/tensorflow_pkg/tensorflow-1.5.0-py2-none-any.whl</b>
</pre>
## Validate your installation
<table>
<tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.5.0-rc1</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.8.0</td><td>N/A</td><td>N/A</td></tr>
-<tr><td>tensorflow_gpu-1.5.0-rc1</td><td>GPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.8.0</td><td>7</td><td>9</td></tr>
+<tr><td>tensorflow-1.5.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.8.0</td><td>N/A</td><td>N/A</td></tr>
+<tr><td>tensorflow_gpu-1.5.0</td><td>GPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.8.0</td><td>7</td><td>9</td></tr>
<tr><td>tensorflow-1.4.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.5.4</td><td>N/A</td><td>N/A</td></tr>
<tr><td>tensorflow_gpu-1.4.0</td><td>GPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.5.4</td><td>6</td><td>8</td></tr>
<tr><td>tensorflow-1.3.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>GCC 4.8</td><td>Bazel 0.4.5</td><td>N/A</td><td>N/A</td></tr>
**Mac**
<table>
<tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.5.0-rc1</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.8.1</td><td>N/A</td><td>N/A</td></tr>
+<tr><td>tensorflow-1.5.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.8.1</td><td>N/A</td><td>N/A</td></tr>
<tr><td>tensorflow-1.4.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.5.4</td><td>N/A</td><td>N/A</td></tr>
<tr><td>tensorflow-1.3.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.4.5</td><td>N/A</td><td>N/A</td></tr>
<tr><td>tensorflow-1.2.0</td><td>CPU</td><td>2.7, 3.3-3.6</td><td>Clang from xcode</td><td>Bazel 0.4.5</td><td>N/A</td><td>N/A</td></tr>
**Windows**
<table>
<tr><th>Version:</th><th>CPU/GPU:</th><th>Python Version:</th><th>Compiler:</th><th>Build Tools:</th><th>cuDNN:</th><th>CUDA:</th></tr>
-<tr><td>tensorflow-1.5.0-rc1</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
-<tr><td>tensorflow_gpu-1.5.0-rc1</td><td>GPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>7</td><td>9</td></tr>
+<tr><td>tensorflow-1.5.0</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
+<tr><td>tensorflow_gpu-1.5.0</td><td>GPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>7</td><td>9</td></tr>
<tr><td>tensorflow-1.4.0</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
<tr><td>tensorflow_gpu-1.4.0</td><td>GPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>6</td><td>8</td></tr>
<tr><td>tensorflow-1.3.0</td><td>CPU</td><td>3.5-3.6</td><td>MSVC 2015 update 3</td><td>Cmake v3.6.3</td><td>N/A</td><td>N/A</td></tr>
described in this guide, then the following NVIDIA software must be
installed on your system:
- * CUDA® Toolkit 8.0. For details, see
+ * CUDA® Toolkit 9.0. For details, see
[NVIDIA's
documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/)
Ensure that you append the relevant Cuda pathnames to the `%PATH%`
environment variable as described in the NVIDIA documentation.
- * The NVIDIA drivers associated with CUDA Toolkit 8.0.
- * cuDNN v6.0. For details, see
+ * The NVIDIA drivers associated with CUDA Toolkit 9.0.
+ * cuDNN v7.0. For details, see
[NVIDIA's documentation](https://developer.nvidia.com/cudnn).
Note that cuDNN is typically installed in a different location from the
other CUDA DLLs. Ensure that you add the directory where you installed
@{tf.Tensor} accepts an optional `name` argument. For example,
`tf.constant(42.0, name="answer")` creates a new @{tf.Operation} named
`"answer"` and returns a @{tf.Tensor} named `"answer:0"`. If the default graph
- already contained an operation named `"answer"`, the TensorFlow would append
+  already contains an operation named `"answer"`, then TensorFlow appends
`"_1"`, `"_2"`, and so on to the name, in order to make it unique.
* The @{tf.name_scope} function makes it possible to add a **name scope** prefix
to all operations created in a particular context. The current name scope
prefix is a `"/"`-delimited list of the names of all active @{tf.name_scope}
context managers. If a name scope has already been used in the current
- context, TensorFlow appens `"_1"`, `"_2"`, and so on. For example:
+ context, TensorFlow appends `"_1"`, `"_2"`, and so on. For example:
```python
c_0 = tf.constant(0, name="c") # => operation named "c"
filegroup(
name = "external_assets",
srcs = [
- "@inception5h//:model_files",
+ "@inception_v1//:model_files",
"@mobile_ssd//:model_files",
"@speech_commands//:model_files",
"@stylize//:model_files",
def nativeBuildRule = 'buildNativeBazel'
def demoLibPath = '../../../bazel-bin/tensorflow/examples/android/libtensorflow_demo.so'
def inferenceLibPath = '../../../bazel-bin/tensorflow/contrib/android/libtensorflow_inference.so'
+
+// Override for Makefile builds.
if (nativeBuildSystem == 'makefile') {
nativeBuildRule = 'buildNativeMake'
- demoLibPath = '../../../tensorflow/contrib/makefile/gen/lib/libtensorflow_demo.so'
- inferenceLibPath = '../../../tensorflow/contrib/makefile/gen/lib/libtensorflow_inference.so'
+ demoLibPath = '../../../tensorflow/contrib/makefile/gen/lib/android_' + cpuType + '/libtensorflow_demo.so'
+ inferenceLibPath = '../../../tensorflow/contrib/makefile/gen/lib/android_' + cpuType + '/libtensorflow_inference.so'
}
// If building with Bazel, this is the location of the bazel binary.
'-s', \
'tensorflow/contrib/makefile/sub_makefiles/android/Makefile.in', \
'-t', \
- 'libtensorflow_inference.so libtensorflow_demo.so' \
+ 'libtensorflow_inference.so libtensorflow_demo.so all' \
+ , '-a', cpuType \
//, '-T' // Uncomment to skip protobuf and speed up subsequent builds.
}
*/
// hard coded model files
// LINT.IfChange
-def models = ['inception5h.zip',
+def models = ['inception_v1.zip',
'object_detection/ssd_mobilenet_v1_android_export.zip',
'stylize_v1.zip',
'speech_commands_conv_actions.zip']
try {
Camera.Parameters parameters = camera.getParameters();
- parameters.setFocusMode(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE);
-
+ List<String> focusModes = parameters.getSupportedFocusModes();
+ if (focusModes != null
+ && focusModes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE)) {
+ parameters.setFocusMode(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE);
+ }
List<Camera.Size> cameraSizes = parameters.getSupportedPreviewSizes();
Size[] sizes = new Size[cameraSizes.size()];
int i = 0;
final int w,
final int h,
final int rowStride,
- final int sensorOrienation,
+ final int sensorOrientation,
final byte[] frame,
final long timestamp) {
if (objectTracker == null && !initialized) {
objectTracker = ObjectTracker.getInstance(w, h, rowStride, true);
frameWidth = w;
frameHeight = h;
- this.sensorOrientation = sensorOrienation;
+ this.sensorOrientation = sensorOrientation;
initialized = true;
if (objectTracker == null) {
apt-get clean && \
rm -rf /var/lib/apt/lists/*
-RUN pip install scikit-learn pyreadline Pillow
+RUN pip install scikit-learn pyreadline Pillow imageio
RUN rm -rf /notebooks/*
ADD *.ipynb /notebooks/
WORKDIR /notebooks
":layers",
":lib",
":list_ops",
+ ":manip_ops",
":math_ops",
":metrics",
":nn",
],
)
+tf_gen_op_wrapper_private_py(
+ name = "manip_ops_gen",
+ visibility = [
+ "//learning/brain/python/ops:__pkg__",
+ "//tensorflow/python/kernel_tests:__pkg__",
+ ],
+)
+
tf_gen_op_wrapper_private_py(
name = "math_ops_gen",
visibility = [
":linalg_grad",
":linalg_ops",
":logging_ops",
+ ":manip_grad",
+ ":manip_ops",
":math_grad",
":math_ops",
":platform",
],
)
+py_library(
+ name = "manip_grad",
+ srcs = ["ops/manip_grad.py"],
+ srcs_version = "PY2AND3",
+ deps = [
+ ":control_flow_ops",
+ ":framework_for_generated_wrappers",
+ ":manip_ops",
+ ],
+)
+
+py_library(
+ name = "manip_ops",
+ srcs = ["ops/manip_ops.py"],
+ srcs_version = "PY2AND3",
+ deps = [
+ ":dtypes",
+ ":framework_ops",
+ ":manip_ops_gen",
+ "//third_party/py/numpy",
+ ],
+)
+
py_library(
name = "logging_ops",
srcs = ["ops/logging_ops.py"],
":linalg_ops",
":logging_ops",
":lookup_ops",
+ ":manip_grad",
+ ":manip_ops",
":math_grad",
":math_ops",
":numerics",
from tensorflow.python.layers import layers
from tensorflow.python.ops import bitwise_ops as bitwise
from tensorflow.python.ops import image_ops as image
+from tensorflow.python.ops import manip_ops as manip
from tensorflow.python.ops import metrics
from tensorflow.python.ops import nn
from tensorflow.python.ops import sets
'linalg',
'logging',
'losses',
+ 'manip',
'metrics',
'newaxis',
'nn',
import numpy as np
+from six.moves import xrange # pylint: disable=redefined-builtin
from tensorflow.python.client import session
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
tensors: A nested structure of tensors.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return TensorDataset(tensors)
0th dimension.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return TensorSliceDataset(tensors)
sparse_tensor: A `tf.SparseTensor`.
Returns:
- A `Dataset` of rank-(N-1) sparse tensors.
+ Dataset: A `Dataset` of rank-(N-1) sparse tensors.
"""
return SparseTensorSliceDataset(sparse_tensor)
`generator`.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
if not callable(generator):
raise TypeError("`generator` must be callable.")
      len(args) == 3 -> start = args[0], stop = args[1], step = args[2]
Returns:
- A `RangeDataset`.
+ Dataset: A `RangeDataset`.
Raises:
ValueError: if len(args) == 0.
datasets: A nested structure of datasets.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return ZipDataset(datasets)
dataset: `Dataset` to be concatenated.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return ConcatenateDataset(self, dataset)
maximum number elements that will be buffered when prefetching.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return PrefetchDataset(self, buffer_size)
- /path/to/dir/b.py
- /path/to/dir/c.py
+ NOTE: The order of the file names returned can be non-deterministic.
+
Args:
file_pattern: A string or scalar string `tf.Tensor`, representing
the filename pattern that will be matched.
Returns:
- A `Dataset` of strings corresponding to file names.
+ Dataset: A `Dataset` of strings corresponding to file names.
"""
return Dataset.from_tensor_slices(gen_io_ops.matching_files(file_pattern))
indefinitely.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return RepeatDataset(self, count)
iterated over. (Defaults to `True`.)
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return ShuffleDataset(self, buffer_size, seed, reshuffle_each_iteration)
If a filename is not provided, the dataset will be cached in memory.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return CacheDataset(self, filename)
dataset, the new dataset will contain all elements of this dataset.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return TakeDataset(self, count)
is -1, skips the entire dataset.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return SkipDataset(self, count)
index: A `tf.int64` scalar `tf.Tensor`, representing the worker index.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
Raises:
ValueError: if `num_shards` or `index` are illegal values. Note: error
consecutive elements of this dataset to combine in a single batch.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return BatchDataset(self, batch_size)
the empty string for string types.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return PaddedBatchDataset(self, batch_size, padded_shapes, padding_values)
specified, elements will be processed sequentially.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
if num_parallel_calls is None:
return MapDataset(self, map_func)
`Dataset`.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return FlatMapDataset(self, map_func)
input element before cycling to another input element.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return InterleaveDataset(self, map_func, cycle_length, block_length)
scalar `tf.bool` tensor.
Returns:
- A `Dataset`.
+ Dataset: A `Dataset`.
"""
return FilterDataset(self, predicate)
Args:
transformation_func: A function that takes one `Dataset` argument and
- returns a `Dataset`.
+ returns a `Dataset`.
Returns:
- The `Dataset` returned by applying `transformation_func` to this dataset.
+ Dataset: The `Dataset` returned by applying `transformation_func` to this
+ dataset.
"""
dataset = transformation_func(self)
if not isinstance(dataset, Dataset):
and the return value will contain the results in the same structure.
Args:
- func: A callable that acceps as many arguments are there are structures.
+ func: A callable that accepts as many arguments as there are structures.
*structure: scalar, or tuple or list of constructed scalars and/or other
tuples/lists, or scalars. Note: numpy arrays are considered scalars.
**check_types_dict: only valid keyword argument is `check_types`. If set to
The `inputs`, can be thought of as having the same structure as
`shallow_tree`, but with leaf nodes that are themselves tree structures.
- This function therefore will return something with the same base structure as
- `shallow_tree`.
+ This function, therefore, will return something with the same base structure
+ as `shallow_tree`.
Examples:
tensors: a tensor structure to serialize.
Returns:
- `tensors` with any sparse tensors replaced by the their serialized version.
+ `tensors` with any sparse tensors replaced by their serialized version.
"""
ret = nest.pack_sequence_as(tensors, [
if not isinstance(tensor, np.ndarray) or not np.size(tensor):
return debugger_cli_common.RichTextLines([
"No numeric summary available due to empty tensor."])
- elif (np.issubdtype(tensor.dtype, np.float) or
+ elif (np.issubdtype(tensor.dtype, np.floating) or
np.issubdtype(tensor.dtype, np.complex) or
np.issubdtype(tensor.dtype, np.integer)):
counts = [
# Also return False for data types that cannot be represented as numpy
# arrays.
return False
- elif (np.issubdtype(tensor.dtype, np.float) or
+ elif (np.issubdtype(tensor.dtype, np.floating) or
np.issubdtype(tensor.dtype, np.complex) or
np.issubdtype(tensor.dtype, np.integer)):
return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
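
The `np.float` to `np.floating` changes above matter because `np.float` was merely an alias for the builtin `float` (deprecated in NumPy 1.20 and removed in 1.24), so it does not name the abstract floating-point scalar type that `np.issubdtype` is meant to test against. A quick check of the corrected predicate:

```python
import numpy as np

# np.floating is the abstract parent of all floating-point dtypes.
assert np.issubdtype(np.float32, np.floating)
assert np.issubdtype(np.float64, np.floating)
assert not np.issubdtype(np.int32, np.floating)
assert not np.issubdtype(np.complex64, np.floating)
```
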
continue
numpy_dtype = output.dtype.as_numpy_dtype
- if (np.issubdtype(numpy_dtype, np.float) or
+ if (np.issubdtype(numpy_dtype, np.floating) or
np.issubdtype(numpy_dtype, np.complex) or
np.issubdtype(numpy_dtype, np.integer)):
try:
# Create a second DNNClassifier, warm-started from the first. Use a
# learning_rate = 0.0 optimizer to check values (use SGD so we don't have
- # accumulator values that change). Use a a new FeatureColumn with a
+ # accumulator values that change). Use a new FeatureColumn with a
# different vocabulary for occupation.
new_vocab_list = ['doctor', 'consultant', 'engineer']
new_vocab_file = os.path.join(self._ckpt_and_vocab_dir,
# Create a second LinearClassifier, warm-started from the first. Use a
# learning_rate = 0.0 optimizer to check values (use SGD so we don't have
- # accumulator values that change). Use a a new FeatureColumn with a
+ # accumulator values that change). Use a new FeatureColumn with a
# different vocabulary for occupation.
new_vocab_list = ['doctor', 'consultant', 'engineer']
new_vocab_file = os.path.join(self._ckpt_and_vocab_dir,
from tensorflow.python.training import training
from tensorflow.python.training import training_util
from tensorflow.python.util import compat
+from tensorflow.python.util import compat_internal
from tensorflow.python.util import nest
self._config = config
# Model directory.
- model_dir = compat.path_to_str(model_dir)
+ model_dir = compat_internal.path_to_str(model_dir)
if (model_dir is not None) and (self._config.model_dir is not None):
if model_dir != self._config.model_dir:
# TODO(alanyee): remove this suppression after it is no longer needed
from tensorflow.core.protobuf import config_pb2
from tensorflow.python.platform import tf_logging as logging
from tensorflow.python.training import server_lib
-from tensorflow.python.util import compat
+from tensorflow.python.util import compat_internal
_USE_DEFAULT = object()
if tf_config:
logging.info('TF_CONFIG environment variable: %s', tf_config)
- model_dir = _get_model_dir(tf_config, compat.path_to_str(model_dir))
+ model_dir = _get_model_dir(tf_config,
+ compat_internal.path_to_str(model_dir))
RunConfig._replace(
self,
return dict(list(base_config.items()) + list(config.items()))
-class Conv3DTranspose(tf_convolutional_layers.Conv3D, Layer):
+class Conv3DTranspose(tf_convolutional_layers.Conv3DTranspose, Layer):
"""Transposed convolution layer (sometimes called Deconvolution).
The need for transposed convolutions generally arises
],
)
+cuda_py_test(
+ name = "manip_ops_test",
+ size = "small",
+ srcs = ["manip_ops_test.py"],
+ additional_deps = [
+ "//third_party/py/numpy",
+ "//tensorflow/python:manip_ops",
+ "//tensorflow/python:client_testlib",
+ "//tensorflow/python:framework_for_generated_wrappers",
+ ],
+ tags = ["no_windows_gpu"],
+)
+
cuda_py_test(
name = "matmul_op_test",
size = "small",
self.assertAllEqual(y.eval(), [2, 4, 3, 0, 1])
+class UnravelIndexTest(test_util.TensorFlowTestCase):
+
+ def testUnravelIndex(self):
+ with self.test_session():
+ for dtype in [dtypes.int32, dtypes.int64]:
+ indices_1 = constant_op.constant(1621, dtype=dtype)
+ dims_1 = constant_op.constant([6, 7, 8, 9], dtype=dtype)
+ out_1 = array_ops.unravel_index(indices_1, dims_1)
+ self.assertAllEqual(out_1.eval(), [3, 1, 4, 1])
+
+ indices_2 = constant_op.constant([1621], dtype=dtype)
+ dims_2 = constant_op.constant([6, 7, 8, 9], dtype=dtype)
+ out_2 = array_ops.unravel_index(indices_2, dims_2)
+ self.assertAllEqual(out_2.eval(), [[3], [1], [4], [1]])
+
+ indices_3 = constant_op.constant([22, 41, 37], dtype=dtype)
+ dims_3 = constant_op.constant([7, 6], dtype=dtype)
+ out_3 = array_ops.unravel_index(indices_3, dims_3)
+ self.assertAllEqual(out_3.eval(), [[3, 6, 6], [4, 5, 1]])
+
+
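For readers unfamiliar with the op exercised above: `unravel_index` converts a flat index into per-dimension coordinates for a given shape, i.e. a mixed-radix decomposition. The expected values in the test can be reproduced with NumPy:

```python
import numpy as np

# 1621 = 3*(7*8*9) + 1*(8*9) + 4*9 + 1 within shape (6, 7, 8, 9).
print(np.unravel_index(1621, (6, 7, 8, 9)))    # (3, 1, 4, 1)
print(np.unravel_index([22, 41, 37], (7, 6)))  # (array([3, 6, 6]), array([4, 5, 1]))
```
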
class GuaranteeConstOpTest(test_util.TensorFlowTestCase):
def testSimple(self):
def testZerosLikeCPU(self):
for dtype in [
- dtypes_lib.float32, dtypes_lib.float64, dtypes_lib.int8,
- dtypes_lib.uint8, dtypes_lib.int16, dtypes_lib.uint16, dtypes_lib.int32,
- dtypes_lib.int64, dtypes_lib.bool, dtypes_lib.complex64,
- dtypes_lib.complex128, dtypes_lib.string
+ dtypes_lib.half, dtypes_lib.float32, dtypes_lib.float64,
+ dtypes_lib.int8, dtypes_lib.uint8, dtypes_lib.int16, dtypes_lib.uint16,
+ dtypes_lib.int32, dtypes_lib.int64, dtypes_lib.bool,
+ dtypes_lib.complex64, dtypes_lib.complex128, dtypes_lib.string
]:
self._compareZeros(dtype, fully_defined_shape=False, use_gpu=False)
self._compareZeros(dtype, fully_defined_shape=True, use_gpu=False)
def testZerosLikeGPU(self):
for dtype in [
- dtypes_lib.float32, dtypes_lib.float64, dtypes_lib.int32,
- dtypes_lib.bool, dtypes_lib.int64, dtypes_lib.string
+ dtypes_lib.half, dtypes_lib.float32, dtypes_lib.float64,
+ dtypes_lib.int32, dtypes_lib.int64, dtypes_lib.complex64,
+ dtypes_lib.complex128, dtypes_lib.bool
]:
self._compareZeros(dtype, fully_defined_shape=False, use_gpu=True)
self._compareZeros(dtype, fully_defined_shape=True, use_gpu=True)
import numpy as np
+from six.moves import xrange # pylint: disable=redefined-builtin
from tensorflow.contrib import layers
from tensorflow.python.client import session as session_lib
from tensorflow.python.framework import constant_op
dilations=[2, 2],
padding="VALID")
- # TODO this currently fails.
+ # TODO(yzhwang): this currently fails.
# self._VerifyValues(tensor_in_sizes=[1, 8, 8, 1],
# filter_in_sizes=[2, 2, 1, 1],
# strides=[4, 4], padding="SAME",
import os
import time
+from six.moves import xrange  # pylint: disable=redefined-builtin
from tensorflow.python.client import session
from tensorflow.python.framework import ops
from tensorflow.python.ops import array_ops
-# -*- coding: utf-8 -*-
+# -*- coding: utf-8 -*-
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# Compute the expected loss 'manually'.
total = np.zeros((batch_size,))
for b in range(batch_size):
- for i in range(dims):
- for j in range(dims):
+ for i in range(dims - 1):
+ for j in range(i + 1, dims):
x = self._predictions[b, i].item() - self._predictions[b, j].item()
y = self._labels[b, i].item() - self._labels[b, j].item()
diff = (x - y)
total[b] += (diff * diff)
- self._expected_losses = np.divide(total, 9.0)
+ self._expected_losses = np.divide(total, 3.0)
def testValueErrorThrownWhenWeightIsNone(self):
with self.test_session():
[[4, 8, 12], [1, 2, 3], [4, 5, 6]],
[[8, 1, 3], [7, 8, 9], [10, 11, 12]],
])
- self._test_valid_weights(
- labels, predictions, expected_loss=122.22222)
+ self._test_valid_weights(labels, predictions, expected_loss=137.5)
def test3dWeightedScalar(self):
labels = np.array([
])
weight = 3.0
self._test_valid_weights(
- labels, predictions, expected_loss=weight * 122.22222,
- weights=weight)
+ labels, predictions, expected_loss=weight * 137.5, weights=weight)
def _test_invalid_weights(
self, labels, predictions, weights=1.0):
])
self._test_valid_weights(
# TODO(ptucker): This doesn't look right.
- labels, predictions, expected_loss=9 * 122.22222,
+ labels,
+ predictions,
+ expected_loss=9 * 137.5,
weights=np.ones((2, 3, 3)))
def testLossWithAllZeroBatchSpecificWeights(self):
--- /dev/null
+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for manip_ops."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import numpy as np
+
+from tensorflow.python.framework import constant_op
+from tensorflow.python.framework import errors_impl
+from tensorflow.python.framework import test_util
+from tensorflow.python.ops import gradient_checker
+from tensorflow.python.ops import manip_ops
+from tensorflow.python.platform import test as test_lib
+
+# pylint: disable=g-import-not-at-top
+try:
+ from distutils.version import StrictVersion as Version
+ # numpy.roll for multiple shifts was introduced in numpy version 1.12.0
+ NP_ROLL_CAN_MULTISHIFT = Version(np.version.version) >= Version("1.12.0")
+except ImportError:
+ NP_ROLL_CAN_MULTISHIFT = False
+# pylint: enable=g-import-not-at-top
+
+
+class RollTest(test_util.TensorFlowTestCase):
+
+ def _testRoll(self, np_input, shift, axis):
+ expected_roll = np.roll(np_input, shift, axis)
+ with self.test_session():
+ roll = manip_ops.roll(np_input, shift, axis)
+ self.assertAllEqual(roll.eval(), expected_roll)
+
+ def _testGradient(self, np_input, shift, axis):
+ with self.test_session():
+ inx = constant_op.constant(np_input.tolist())
+ xs = list(np_input.shape)
+ y = manip_ops.roll(inx, shift, axis)
+ # y is expected to have the same shape as the input.
+ ys = xs
+ jacob_t, jacob_n = gradient_checker.compute_gradient(
+ inx, xs, y, ys, x_init_value=np_input)
+ self.assertAllClose(jacob_t, jacob_n, rtol=1e-5, atol=1e-5)
+
+ def _testAll(self, np_input, shift, axis):
+ self._testRoll(np_input, shift, axis)
+ if np_input.dtype == np.float32:
+ self._testGradient(np_input, shift, axis)
+
+ def testIntTypes(self):
+ for t in [np.int32, np.int64]:
+ self._testAll(np.random.randint(-100, 100, (5)).astype(t), 3, 0)
+ if NP_ROLL_CAN_MULTISHIFT:
+ self._testAll(
+ np.random.randint(-100, 100, (4, 4, 3)).astype(t), [1, -2, 3],
+ [0, 1, 2])
+ self._testAll(
+ np.random.randint(-100, 100, (4, 2, 1, 3)).astype(t), [0, 1, -2],
+ [1, 2, 3])
+
+ def testFloatTypes(self):
+ for t in [np.float32, np.float64]:
+ self._testAll(np.random.rand(5).astype(t), 2, 0)
+ if NP_ROLL_CAN_MULTISHIFT:
+ self._testAll(np.random.rand(3, 4).astype(t), [1, 2], [1, 0])
+ self._testAll(np.random.rand(1, 3, 4).astype(t), [1, 0, -3], [0, 1, 2])
+
+ def testComplexTypes(self):
+ for t in [np.complex64, np.complex128]:
+ x = np.random.rand(4, 4).astype(t)
+ self._testAll(x + 1j * x, 2, 0)
+ if NP_ROLL_CAN_MULTISHIFT:
+ x = np.random.rand(2, 5).astype(t)
+ self._testAll(x + 1j * x, [1, 2], [1, 0])
+ x = np.random.rand(3, 2, 1, 1).astype(t)
+ self._testAll(x + 1j * x, [2, 1, 1, 0], [0, 3, 1, 2])
+
+ def testRollInputMustVectorHigherRaises(self):
+ tensor = 7
+ shift = 1
+ axis = 0
+ with self.test_session():
+ with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
+ "input must be 1-D or higher"):
+ manip_ops.roll(tensor, shift, axis).eval()
+
+ def testRollAxisMustBeScalarOrVectorRaises(self):
+ tensor = [[1, 2], [3, 4]]
+ shift = 1
+ axis = [[0, 1]]
+ with self.test_session():
+ with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
+ "axis must be a scalar or a 1-D vector"):
+ manip_ops.roll(tensor, shift, axis).eval()
+
+ def testRollShiftMustBeScalarOrVectorRaises(self):
+ tensor = [[1, 2], [3, 4]]
+ shift = [[0, 1]]
+ axis = 1
+ with self.test_session():
+ with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
+ "shift must be a scalar or a 1-D vector"):
+ manip_ops.roll(tensor, shift, axis).eval()
+
+ def testRollShiftAndAxisMustBeSameSizeRaises(self):
+ tensor = [[1, 2], [3, 4]]
+ shift = [1]
+ axis = [0, 1]
+ with self.test_session():
+ with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
+ "shift and axis must have the same size"):
+ manip_ops.roll(tensor, shift, axis).eval()
+
+ def testRollAxisOutOfRangeRaises(self):
+ tensor = [1, 2]
+ shift = 1
+ axis = 1
+ with self.test_session():
+ with self.assertRaisesRegexp(errors_impl.InvalidArgumentError,
+ "is out of range"):
+ manip_ops.roll(tensor, shift, axis).eval()
+
+
+if __name__ == "__main__":
+ test_lib.main()
import numpy as np
+from six.moves import xrange # pylint: disable=redefined-builtin
from tensorflow.contrib import rnn as contrib_rnn
from tensorflow.core.protobuf import config_pb2
from tensorflow.python.client import session
a = [[1, 2], [3, 4]]
b = [[1, 2], [3, 4]]
# Invalid static axes.
- for axes_value in -1, 0, [1], [[1]], [[1], [0, 1]]:
+ for axes_value in -1, 3, [1], [[1]], [[1], [0, 1]]:
with self.assertRaises(ValueError):
math_ops.tensordot(a, b, axes_value)
# Test case for 11950
def test_valid_axis(self):
- for axes_value in [1, 2], [[1], [2]]:
+ for axes_value in [1, 2], [[1], [2]], [[], []], 0:
with self.test_session() as sess:
np_a = np.ones((3, 3))
np_b = np.array([2, 3, 1])[None, None]
self.assertAllEqual(tf_ans, np_ans)
def test_partial_shape_inference(self):
- a = array_ops.placeholder(dtypes.float32)
- b = array_ops.placeholder(dtypes.float32)
- axes = ([1], [0])
- output = math_ops.tensordot(a, b, axes)
- self.assertEqual(output.get_shape().ndims, None)
- a.set_shape([None, 2])
- b.set_shape([2, 3])
- output = math_ops.tensordot(a, b, axes)
- output_shape = output.get_shape()
- self.assertEqual(output_shape.ndims, 2)
- output_shape = output_shape.as_list()
- self.assertEqual(output_shape[0], None)
- self.assertEqual(output_shape[1], 3)
- a = array_ops.placeholder(dtypes.float32)
- b = array_ops.placeholder(dtypes.float32)
- a.set_shape([2, 2])
- b.set_shape([2, None])
- output = math_ops.tensordot(a, b, axes)
- output_shape = output.get_shape()
- self.assertEqual(output_shape.ndims, 2)
- output_shape = output_shape.as_list()
- self.assertEqual(output_shape[0], 2)
- self.assertEqual(output_shape[1], None)
+ for axes in ([1], [0]), 1:
+ a = array_ops.placeholder(dtypes.float32)
+ b = array_ops.placeholder(dtypes.float32)
+ output = math_ops.tensordot(a, b, axes)
+ self.assertEqual(output.get_shape().ndims, None)
+ a.set_shape([None, 2])
+ b.set_shape([2, 3])
+ output = math_ops.tensordot(a, b, axes)
+ output_shape = output.get_shape()
+ self.assertEqual(output_shape.ndims, 2)
+ output_shape = output_shape.as_list()
+ self.assertEqual(output_shape[0], None)
+ self.assertEqual(output_shape[1], 3)
+ a = array_ops.placeholder(dtypes.float32)
+ b = array_ops.placeholder(dtypes.float32)
+ a.set_shape([2, 2])
+ b.set_shape([2, None])
+ output = math_ops.tensordot(a, b, axes)
+ output_shape = output.get_shape()
+ self.assertEqual(output_shape.ndims, 2)
+ output_shape = output_shape.as_list()
+ self.assertEqual(output_shape[0], 2)
+ self.assertEqual(output_shape[1], None)
def _get_tensordot_tests(dtype_, rank_a_, rank_b_, num_dims_, dynamic_shape_):
low=-1.0, high=1.0, size=np.prod(shape)).reshape(shape).astype(dtype_)
b_np = np.random.uniform(
low=-1.0, high=1.0, size=np.prod(shape)).reshape(shape).astype(dtype_)
- all_axes = [1]
- if a_np.ndim > 1:
+ all_axes = [0, 1]
+ if a_np.ndim > 2:
all_axes.append(a_np.ndim - 1)
for axes in all_axes:
np_ans = np.tensordot(a_np, b_np, axes=axes)
# Do some special casing of equality of indices: if indices
# are not the same, but values are floating type, ensure that
# the values are within epsilon of each other.
- if not np.issubdtype(np_expected_values.dtype, np.float):
+ if not np.issubdtype(np_expected_values.dtype, np.floating):
# Values are not floating point type; check indices exactly
self.assertAllEqual(np_expected_indices, indices)
else:
strides = (1, 1, 1) + self.strides
spatial_start_dim = 2
- # Explictly broadcast inputs and kernels to 4D.
+ # Explicitly broadcast inputs and kernels to 4D.
# TODO(fchollet): refactor when a native separable_conv1d op is available.
inputs = array_ops.expand_dims(inputs, spatial_start_dim)
depthwise_kernel = array_ops.expand_dims(self.depthwise_kernel, 0)
dtype=self.dtype)
else:
self.bias = None
+ self.built = True
def call(self, inputs):
inputs_shape = array_ops.shape(inputs)
if self.use_bias:
outputs_shape = outputs.shape.as_list()
+ if outputs_shape[0] is None:
+ outputs_shape[0] = -1
if self.data_format == 'channels_first':
outputs_4d = array_ops.reshape(outputs, [
outputs_shape[0], outputs_shape[1],
output_shape[c_axis] = self.filters
output_shape[d_axis] = utils.deconv_output_length(
- output_shape[d_axis], stride_d, kernel_d, self.padding)
+ output_shape[d_axis], kernel_d, self.padding, stride_d)
output_shape[h_axis] = utils.deconv_output_length(
- output_shape[h_axis], stride_h, kernel_h, self.padding)
+ output_shape[h_axis], kernel_h, self.padding, stride_h)
output_shape[w_axis] = utils.deconv_output_length(
- output_shape[w_axis], stride_w, kernel_w, self.padding)
+ output_shape[w_axis], kernel_w, self.padding, stride_w)
return tensor_shape.TensorShape(output_shape)
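
The argument reorder above passes the kernel size before `padding` and the stride last, matching a signature of `deconv_output_length(input_length, filter_size, padding, stride)`. As a hedged sketch (assuming the standard Keras-style transposed-convolution length formula; the helper name and body below are reconstructed, not quoted from this change), it behaves like:

```python
def deconv_output_length(input_length, filter_size, padding, stride):
  """Length of one spatial axis after a transposed convolution (sketch)."""
  if input_length is None:
    return None
  input_length *= stride
  if padding == 'valid':
    # 'valid' padding recovers the extra overlap of the kernel.
    input_length += max(filter_size - stride, 0)
  elif padding == 'full':
    input_length -= (stride + filter_size - 2)
  return input_length

# e.g. a stride-2, kernel-3 'valid' deconv doubles the length and adds one:
assert deconv_output_length(4, 3, 'valid', 2) == 9
```

The bug being fixed is a silent one: with the old positional order, `stride` and `filter_size` swapped roles, producing plausible but wrong output shapes.
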
for single_value in value_tuple:
try:
int(single_value)
- except ValueError:
+ except (ValueError, TypeError):
raise ValueError('The `' + name + '` argument must be a tuple of ' +
str(n) + ' integers. Received: ' + str(value) + ' '
'including element ' + str(single_value) + ' of type' +
@@reshape
@@squeeze
@@expand_dims
+@@unravel_index
@@meshgrid
@@slice
@@strided_slice
Args:
tensor: A `Tensor`.
- dtype: A type for the returned `Tensor`. Must be `float32`, `float64`,
- `int8`, `uint8`, `int16`, `uint16`, int32`, `int64`,
- `complex64`, `complex128` or `bool`.
+ dtype: A type for the returned `Tensor`. Must be `float16`, `float32`,
+ `float64`, `int8`, `uint8`, `int16`, `uint16`, `int32`, `int64`,
+ `complex64`, `complex128`, `bool` or `string`.
name: A name for the operation (optional).
optimize: if true, attempt to statically determine the shape of 'tensor'
and encode it as a constant.
For example, if `elems` is `(t1, [t2, t3])` and `initializer` is
`[i1, i2]` then an appropriate signature for `fn` in `python2` is:
- `fn = lambda (acc_p1, acc_p2), (t1 [t2, t3]):` and `fn` must return a list,
+ `fn = lambda (acc_p1, acc_p2), (t1, [t2, t3]):` and `fn` must return a list,
`[acc_n1, acc_n2]`. An alternative correct signature for `fn`, and the
one that works in `python3`, is:
`fn = lambda a, t:`, where `a` and `t` correspond to the input tuples.
from tensorflow.python.ops import linalg_grad # pylint: disable=unused-import
from tensorflow.python.ops import linalg_ops # pylint: disable=unused-import
from tensorflow.python.ops import logging_ops # pylint: disable=unused-import
+from tensorflow.python.ops import manip_grad # pylint: disable=unused-import
from tensorflow.python.ops import math_grad # pylint: disable=unused-import
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import resource_variable_ops
@@grayscale_to_rgb
@@hsv_to_rgb
@@rgb_to_hsv
+@@rgb_to_yiq
+@@yiq_to_rgb
+@@rgb_to_yuv
+@@yuv_to_rgb
@@convert_image_dtype
@@adjust_brightness
@@random_brightness
bounding_boxes,
seed=None,
seed2=None,
- min_object_covered=None,
+ min_object_covered=0.1,
aspect_ratio_range=None,
area_range=None,
max_attempts=None,
return gen_image_ops._non_max_suppression_v2(boxes, scores, max_output_size,
iou_threshold)
# pylint: enable=protected-access
+
+
+_rgb_to_yiq_kernel = [[0.299, 0.59590059, 0.2115],
+                      [0.587, -0.27455667, -0.52273617],
+                      [0.114, -0.32134392, 0.31119955]]
+
+
+def rgb_to_yiq(images):
+ """Converts one or more images from RGB to YIQ.
+
+ Outputs a tensor of the same shape as the `images` tensor, containing the YIQ
+ value of the pixels.
+ The output is only well defined if the values in `images` are in [0, 1].
+
+ Args:
+ images: 2-D or higher rank. Image data to convert. Last dimension must be
+ size 3.
+
+ Returns:
+ images: tensor with the same shape as `images`.
+ """
+ images = ops.convert_to_tensor(images, name='images')
+ kernel = ops.convert_to_tensor(
+ _rgb_to_yiq_kernel, dtype=images.dtype, name='kernel')
+ ndims = images.get_shape().ndims
+ return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
+
+
+_yiq_to_rgb_kernel = [[1, 1, 1], [0.95598634, -0.27201283, -1.10674021],
+ [0.6208248, -0.64720424, 1.70423049]]
+
+
+def yiq_to_rgb(images):
+ """Converts one or more images from YIQ to RGB.
+
+ Outputs a tensor of the same shape as the `images` tensor, containing the RGB
+ value of the pixels.
+ The output is only well defined if the Y values in `images` are in [0, 1],
+ the I values are in [-0.5957, 0.5957], and the Q values are in [-0.5226, 0.5226].
+
+ Args:
+ images: 2-D or higher rank. Image data to convert. Last dimension must be
+ size 3.
+
+ Returns:
+ images: tensor with the same shape as `images`.
+ """
+ images = ops.convert_to_tensor(images, name='images')
+ kernel = ops.convert_to_tensor(
+ _yiq_to_rgb_kernel, dtype=images.dtype, name='kernel')
+ ndims = images.get_shape().ndims
+ return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
+
+
+_rgb_to_yuv_kernel = [[0.299, -0.14714119, 0.61497538],
+                      [0.587, -0.28886916, -0.51496512],
+                      [0.114, 0.43601035, -0.10001026]]
+
+
+def rgb_to_yuv(images):
+ """Converts one or more images from RGB to YUV.
+
+ Outputs a tensor of the same shape as the `images` tensor, containing the YUV
+ value of the pixels.
+ The output is only well defined if the values in `images` are in [0, 1].
+
+ Args:
+ images: 2-D or higher rank. Image data to convert. Last dimension must be
+ size 3.
+
+ Returns:
+ images: tensor with the same shape as `images`.
+ """
+ images = ops.convert_to_tensor(images, name='images')
+ kernel = ops.convert_to_tensor(
+ _rgb_to_yuv_kernel, dtype=images.dtype, name='kernel')
+ ndims = images.get_shape().ndims
+ return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
+
+
+_yuv_to_rgb_kernel = [[1, 1, 1], [0, -0.394642334, 2.03206185],
+ [1.13988303, -0.58062185, 0]]
+
+
+def yuv_to_rgb(images):
+ """Converts one or more images from YUV to RGB.
+
+ Outputs a tensor of the same shape as the `images` tensor, containing the RGB
+ value of the pixels.
+ The output is only well defined if the Y values in `images` are in [0, 1]
+ and the U and V values are in [-0.5, 0.5].
+
+ Args:
+ images: 2-D or higher rank. Image data to convert. Last dimension must be
+ size 3.
+
+ Returns:
+ images: tensor with the same shape as `images`.
+ """
+ images = ops.convert_to_tensor(images, name='images')
+ kernel = ops.convert_to_tensor(
+ _yuv_to_rgb_kernel, dtype=images.dtype, name='kernel')
+ ndims = images.get_shape().ndims
+ return math_ops.tensordot(images, kernel, axes=[[ndims - 1], [0]])
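
All four converters above are plain 3x3 color-matrix multiplies applied over the last axis with `tensordot`, so `yiq_to_rgb` composed with `rgb_to_yiq` should be the identity up to the rounding of the hard-coded kernels. A quick NumPy sanity check of that round trip:

```python
import numpy as np

rgb_to_yiq_kernel = np.array([[0.299, 0.59590059, 0.2115],
                              [0.587, -0.27455667, -0.52273617],
                              [0.114, -0.32134392, 0.31119955]])
yiq_to_rgb_kernel = np.array([[1, 1, 1],
                              [0.95598634, -0.27201283, -1.10674021],
                              [0.6208248, -0.64720424, 1.70423049]])

# A pixel row-vector is multiplied on the right, so the kernel product
# must be (close to) the identity for a lossless round trip.
assert np.allclose(rgb_to_yiq_kernel @ yiq_to_rgb_kernel, np.eye(3), atol=1e-4)
```
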
self.assertAllClose(rgb_tf, rgb_np)
+class RGBToYIQTest(test_util.TensorFlowTestCase):
+
+ def testBatch(self):
+ # Build an arbitrary RGB image
+ np.random.seed(7)
+ batch_size = 5
+ shape = (batch_size, 2, 7, 3)
+
+ for nptype in [np.float32, np.float64]:
+ inp = np.random.rand(*shape).astype(nptype)
+
+ # Convert to YIQ and back, as a batch and individually
+ with self.test_session(use_gpu=True) as sess:
+ batch0 = constant_op.constant(inp)
+ batch1 = image_ops.rgb_to_yiq(batch0)
+ batch2 = image_ops.yiq_to_rgb(batch1)
+ split0 = array_ops.unstack(batch0)
+ split1 = list(map(image_ops.rgb_to_yiq, split0))
+ split2 = list(map(image_ops.yiq_to_rgb, split1))
+ join1 = array_ops.stack(split1)
+ join2 = array_ops.stack(split2)
+ batch1, batch2, join1, join2 = sess.run([batch1, batch2, join1, join2])
+
+ # Verify that processing batch elements together is the same as separate
+ self.assertAllClose(batch1, join1, rtol=1e-4, atol=1e-4)
+ self.assertAllClose(batch2, join2, rtol=1e-4, atol=1e-4)
+ self.assertAllClose(batch2, inp, rtol=1e-4, atol=1e-4)
+
+
+class RGBToYUVTest(test_util.TensorFlowTestCase):
+
+ def testBatch(self):
+ # Build an arbitrary RGB image
+ np.random.seed(7)
+ batch_size = 5
+ shape = (batch_size, 2, 7, 3)
+
+ for nptype in [np.float32, np.float64]:
+ inp = np.random.rand(*shape).astype(nptype)
+
+ # Convert to YUV and back, as a batch and individually
+ with self.test_session(use_gpu=True) as sess:
+ batch0 = constant_op.constant(inp)
+ batch1 = image_ops.rgb_to_yuv(batch0)
+ batch2 = image_ops.yuv_to_rgb(batch1)
+ split0 = array_ops.unstack(batch0)
+ split1 = list(map(image_ops.rgb_to_yuv, split0))
+ split2 = list(map(image_ops.yuv_to_rgb, split1))
+ join1 = array_ops.stack(split1)
+ join2 = array_ops.stack(split2)
+ batch1, batch2, join1, join2 = sess.run([batch1, batch2, join1, join2])
+
+ # Verify that processing batch elements together is the same as separate
+ self.assertAllClose(batch1, join1, rtol=1e-4, atol=1e-4)
+ self.assertAllClose(batch2, join2, rtol=1e-4, atol=1e-4)
+ self.assertAllClose(batch2, inp, rtol=1e-4, atol=1e-4)
+
+
class GrayscaleToRGBTest(test_util.TensorFlowTestCase):
def _RGBToGrayscale(self, images):
self.assertAllEqual([3], end.get_shape().as_list())
self.assertAllEqual([1, 1, 4], bbox_for_drawing.get_shape().as_list())
+ def testDefaultMinObjectCovered(self):
+ # min_object_covered defaults to 0.1 when not provided.
+ with self.test_session(use_gpu=True):
+ image_size = constant_op.constant(
+ [40, 50, 1], shape=[3], dtype=dtypes.int32)
+ bounding_box = constant_op.constant(
+ [0.0, 0.0, 1.0, 1.0],
+ shape=[4],
+ dtype=dtypes.float32,
+ )
+ begin, end, bbox_for_drawing = image_ops.sample_distorted_bounding_box(
+ image_size=image_size,
+ bounding_boxes=bounding_box,
+ aspect_ratio_range=(0.75, 1.33),
+ area_range=(0.05, 1.0))
+
+ self.assertAllEqual([3], begin.get_shape().as_list())
+ self.assertAllEqual([3], end.get_shape().as_list())
+ self.assertAllEqual([1, 1, 4], bbox_for_drawing.get_shape().as_list())
+
class ResizeImagesTest(test_util.TensorFlowTestCase):
boxes, scores, max_output_size, iou_threshold).eval()
self.assertAllClose(selected_indices, [3, 0, 5])
+ def testInvalidShape(self):
+ # The boxes should be 2D of shape [num_boxes, 4].
+ with self.assertRaisesRegexp(ValueError,
+ "Shape must be rank 2 but is rank 1"):
+ boxes = constant_op.constant([0.0, 0.0, 1.0, 1.0])
+ scores = constant_op.constant([0.9])
+ image_ops.non_max_suppression(boxes, scores, 3, 0.5)
+
+ with self.assertRaisesRegexp(ValueError, "Dimension must be 4 but is 3"):
+ boxes = constant_op.constant([[0.0, 0.0, 1.0]])
+ scores = constant_op.constant([0.9])
+ image_ops.non_max_suppression(boxes, scores, 3, 0.5)
+
+ # The scores should be 1D of shape [num_boxes].
+ with self.assertRaisesRegexp(ValueError,
+ "Shape must be rank 1 but is rank 2"):
+ boxes = constant_op.constant([[0.0, 0.0, 1.0, 1.0]])
+ scores = constant_op.constant([[0.9]])
+ image_ops.non_max_suppression(boxes, scores, 3, 0.5)
+
+ # The max_output_size should be a scalar (0-D).
+ with self.assertRaisesRegexp(ValueError,
+ "Shape must be rank 0 but is rank 1"):
+ boxes = constant_op.constant([[0.0, 0.0, 1.0, 1.0]])
+ scores = constant_op.constant([0.9])
+ image_ops.non_max_suppression(boxes, scores, [3], 0.5)
+
+ # The iou_threshold should be a scalar (0-D).
+ with self.assertRaisesRegexp(ValueError,
+ "Shape must be rank 0 but is rank 2"):
+ boxes = constant_op.constant([[0.0, 0.0, 1.0, 1.0]])
+ scores = constant_op.constant([0.9])
+ image_ops.non_max_suppression(boxes, scores, 3, [[0.5]])
+
if __name__ == "__main__":
googletest.main()
# https://j-towns.github.io/papers/svd-derivative.pdf
a = op.inputs[0]
a_shape = a.get_shape().with_rank_at_least(2)
+ grad_s_mat = array_ops.matrix_diag(grad_s)
- if op.get_attr("compute_uv"):
- # TODO(rmlarsen): Make this work with complex types.
- if a.dtype.is_complex:
- raise NotImplementedError(
- "SVD gradient is not implemented for complex types and "
- "compute_uv=True.")
- grad_u_shape = grad_u.get_shape().with_rank_at_least(2)
- grad_v_shape = grad_v.get_shape().with_rank_at_least(2)
- m = a_shape[-2].merge_with(grad_u_shape[-2])
- n = a_shape[-1].merge_with(grad_v_shape[-2])
- batch_shape = a_shape[:-2].merge_with(grad_u_shape[:-2]).merge_with(
- grad_v_shape[:-2])
- a_shape = batch_shape.concatenate([m, n])
+ if not op.get_attr("compute_uv"):
+ s, u, v = linalg_ops.svd(a, compute_uv=True)
+ grad_a = math_ops.matmul(u, math_ops.matmul(grad_s_mat, v, adjoint_b=True))
+ grad_a.set_shape(a_shape)
+ return grad_a
+
+ full_matrices = op.get_attr("full_matrices")
+
+ # TODO(rmlarsen): Make this work with complex types.
+ if a.dtype.is_complex:
+ raise NotImplementedError(
+ "SVD gradient is not implemented for complex types and "
+ "compute_uv=True.")
+ grad_u_shape = grad_u.get_shape().with_rank_at_least(2)
+ grad_v_shape = grad_v.get_shape().with_rank_at_least(2)
+ m = a_shape[-2].merge_with(grad_u_shape[-2])
+ n = a_shape[-1].merge_with(grad_v_shape[-2])
+ batch_shape = a_shape[:-2].merge_with(grad_u_shape[:-2]).merge_with(
+ grad_v_shape[:-2])
+ a_shape = batch_shape.concatenate([m, n])
m = a_shape[-2].value
n = a_shape[-1].value
"SVD gradient has not been implemented for input with unknown "
"inner matrix shape.")
- if not op.get_attr("compute_uv"):
- s, u, v = linalg_ops.svd(a, compute_uv=True, full_matrices=True)
- else:
- s = op.outputs[0]
- u = op.outputs[1]
- v = op.outputs[2]
+ s = op.outputs[0]
+ u = op.outputs[1]
+ v = op.outputs[2]
use_adjoint = False
if m > n:
grad_u, grad_v = grad_v, grad_u
with ops.control_dependencies([grad_s, grad_u, grad_v]):
- grad_s_mat = array_ops.matrix_diag(grad_s)
- if not op.get_attr("compute_uv"):
- if use_adjoint:
- grad_a = math_ops.matmul(
- v[..., :, :m], math_ops.matmul(u, grad_s_mat), adjoint_b=True)
- else:
- grad_a = math_ops.matmul(u,
- math_ops.matmul(
- grad_s_mat, v[..., :, :m], adjoint_b=True))
- grad_a.set_shape(a_shape)
- return grad_a
-
- if op.get_attr("full_matrices") and abs(m - n) > 1:
+ if full_matrices and abs(m - n) > 1:
raise NotImplementedError(
"svd gradient is not implemented for abs(m - n) > 1 "
"when full_matrices is True")
gv1t_v1 = math_ops.matmul(gv1t, v1)
term2_nous = gv1t - math_ops.matmul(gv1t_v1, v1, adjoint_b=True)
- if op.get_attr("full_matrices"):
+ if full_matrices:
v2 = v[..., :, m:n]
grad_v2 = grad_v[..., :, m:n]
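
The refactor above handles the `compute_uv=False` branch first: when only singular values are returned, the gradient of `A` given `grad_s` is simply `U @ diag(grad_s) @ V^T`, with `U` and `V` recomputed on the fly. A hedged NumPy check of that formula against finite differences, with hypothetical weights `w` standing in for `grad_s` (valid for matrices with distinct singular values, which a random draw gives almost surely):

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.randn(4, 3)
w = rng.randn(3)                      # upstream gradient w.r.t. s

u, s, vt = np.linalg.svd(a, full_matrices=False)
grad_a = u @ np.diag(w) @ vt          # analytic gradient of sum(w * s)

# Central finite differences of sum(w * s) as a check.
eps = 1e-6
num = np.zeros_like(a)
for i in range(a.shape[0]):
  for j in range(a.shape[1]):
    da = np.zeros_like(a)
    da[i, j] = eps
    num[i, j] = (np.linalg.svd(a + da, compute_uv=False).dot(w) -
                 np.linalg.svd(a - da, compute_uv=False).dot(w)) / (2 * eps)
assert np.allclose(grad_a, num, atol=1e-4)
```
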
num_present_per_batch = _num_present(diffs, weights, per_batch=True)
term1 = 2.0 * _safe_div(sum_squares_diff_per_batch,
- num_present_per_batch)
+ num_present_per_batch - 1)
sum_diff = math_ops.reduce_sum(
diffs, reduction_indices=reduction_indices, keep_dims=True)
- term2 = 2.0 * _safe_div(math_ops.square(sum_diff),
- math_ops.square(num_present_per_batch))
+ term2 = 2.0 * _safe_div(
+ math_ops.square(sum_diff),
+ math_ops.multiply(num_present_per_batch, num_present_per_batch - 1))
weighted_losses = math_ops.multiply(term1 - term2, weights)
loss = math_ops.reduce_sum(weighted_losses)
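
The `- 1` denominators above change the normalization to the number of ordered pairs, `n * (n - 1)`. Writing `d_i` for the per-element difference between predictions and labels, the code relies on the identity `sum_{i<j} (d_i - d_j)^2 = n * sum(d^2) - (sum(d))^2`, so that `term1 - term2` equals `2 / (n * (n - 1)) * sum_{i<j} (d_i - d_j)^2`. A quick numeric confirmation of the algebra:

```python
import numpy as np

d = np.random.randn(5)  # per-element (prediction - label) differences
n = d.size

pairwise = sum((d[i] - d[j]) ** 2 for i in range(n) for j in range(i + 1, n))
term1 = 2.0 * np.sum(d ** 2) / (n - 1)
term2 = 2.0 * np.sum(d) ** 2 / (n * (n - 1))
assert np.isclose(term1 - term2, 2.0 * pairwise / (n * (n - 1)))
```

For the three-element case in the test changes above, `2 / (n * (n - 1)) = 1/3`, which is why the expected losses became `total / 3.0`.
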
--- /dev/null
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Gradients for operators defined in manip_ops.py."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python.framework import ops
+from tensorflow.python.ops import manip_ops
+
+
+@ops.RegisterGradient("Roll")
+def _RollGrad(op, grad):
+ # The gradient of roll is a roll in the opposite direction.
+ shift = op.inputs[1]
+ axis = op.inputs[2]
+ roll_grad = manip_ops.roll(grad, -shift, axis)
+ return roll_grad, None, None
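
The gradient rule registered above relies on roll being a pure permutation: rolling by the negated shift undoes it exactly, so the upstream gradient is routed back to where each element came from, with `None` gradients for the integer `shift` and `axis` inputs. The inverse property in NumPy terms:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
assert np.array_equal(np.roll(np.roll(x, 2, axis=1), -2, axis=1), x)
```
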
--- /dev/null
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Operators for manipulating tensors.
+
+@@roll
+"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python.ops import gen_manip_ops as _gen_manip_ops
+from tensorflow.python.util.all_util import remove_undocumented
+
+
+# pylint: disable=protected-access
+def roll(input, shift, axis): # pylint: disable=redefined-builtin
+ return _gen_manip_ops.roll(input, shift, axis)
+
+
+roll.__doc__ = _gen_manip_ops.roll.__doc__
+# pylint: enable=protected-access
+
+_allowed_symbols = ['roll']
+
+remove_undocumented(__name__, allowed_exception_list=_allowed_symbols)
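
The new `tf.manip.roll` wrapper mirrors `np.roll` semantics: elements shifted past the last position wrap around to the first. A minimal usage sketch in the graph-and-session style of this release:

```python
import tensorflow as tf

with tf.Session() as sess:
  x = tf.constant([1, 2, 3, 4, 5])
  print(sess.run(tf.manip.roll(x, shift=2, axis=0)))  # [4 5 1 2 3]
```
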
"""Generates two sets of contraction axes for the two tensor arguments."""
a_shape = a.get_shape()
if isinstance(axes, compat.integral_types):
- if axes < 1:
- raise ValueError("'axes' must be at least 1.")
+ if axes < 0:
+ raise ValueError("'axes' must be at least 0.")
if a_shape.ndims is not None:
- return range(a_shape.ndims - axes, a_shape.ndims), range(axes)
+ if axes > a_shape.ndims:
+ raise ValueError("'axes' must not be larger than the number of "
+ "dimensions of tensor %s." % a)
+ return (list(xrange(a_shape.ndims - axes, a_shape.ndims)),
+ list(xrange(axes)))
else:
rank = array_ops.rank(a)
return (range(rank - axes, rank, dtype=dtypes.int32),
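
The relaxed validation above makes `axes=0` legal: a scalar `axes=n` contracts the last `n` dimensions of the first argument with the first `n` of the second, so `n = 0` contracts nothing and yields an outer product, matching NumPy's behavior:

```python
import numpy as np

a = np.array([1, 2])
b = np.array([10, 20, 30])
print(np.tensordot(a, b, axes=0))
# [[10 20 30]
#  [20 40 60]]
```
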
def _copy_some_through(current, candidate):
"""Copy some tensors through via array_ops.where."""
def copy_fn(cur_i, cand_i):
+ # TensorArray and scalar get passed through.
+ if isinstance(cur_i, tensor_array_ops.TensorArray):
+ return cand_i
+ if cur_i.shape.ndims == 0:
+ return cand_i
+ # Otherwise propagate the old or the new value.
with ops.colocate_with(cand_i):
return array_ops.where(elements_finished, cur_i, cand_i)
return nest.map_structure(copy_fn, current, candidate)
# Imports the following modules so that @RegisterGradient get executed.
from tensorflow.python.ops import array_grad
from tensorflow.python.ops import data_flow_grad
+from tensorflow.python.ops import manip_grad
from tensorflow.python.ops import math_grad
from tensorflow.python.ops import sparse_grad
from tensorflow.python.ops import spectral_grad
# TODO(vrv): Switch to import * once we're okay with exposing the module.
from tensorflow.python.ops.confusion_matrix import confusion_matrix
from tensorflow.python.ops.control_flow_ops import Assert
+from tensorflow.python.ops.control_flow_ops import case
+from tensorflow.python.ops.control_flow_ops import cond
from tensorflow.python.ops.control_flow_ops import group
from tensorflow.python.ops.control_flow_ops import no_op
+# pylint: disable=redefined-builtin
from tensorflow.python.ops.control_flow_ops import tuple
-from tensorflow.python.ops.control_flow_ops import cond
-from tensorflow.python.ops.control_flow_ops import case
+# pylint: enable=redefined-builtin
from tensorflow.python.ops.control_flow_ops import while_loop
from tensorflow.python.ops.data_flow_ops import *
from tensorflow.python.ops.functional_ops import *
from tensorflow.python.ops.logging_ops import get_summary_op
from tensorflow.python.ops.lookup_ops import initialize_all_tables
from tensorflow.python.ops.lookup_ops import tables_initializer
+from tensorflow.python.ops.manip_ops import *
from tensorflow.python.ops.math_ops import *
from tensorflow.python.ops.numerics import *
from tensorflow.python.ops.parsing_ops import *
from tensorflow.python.ops import io_ops as _io_ops
from tensorflow.python.ops import linalg_ops as _linalg_ops
from tensorflow.python.ops import logging_ops as _logging_ops
+from tensorflow.python.ops import manip_ops as _manip_ops
from tensorflow.python.ops import math_ops as _math_ops
from tensorflow.python.ops import numerics as _numerics
from tensorflow.python.ops import parsing_ops as _parsing_ops
_allowed_symbols_misc +
_allowed_symbols_partitioned_variables)
-remove_undocumented(__name__, _allowed_symbols,
- [_sys.modules[__name__],
- _array_ops,
- _check_ops,
- _clip_ops,
- _confusion_matrix,
- _control_flow_ops,
- _constant_op,
- _data_flow_ops,
- _functional_ops,
- _gradients,
- _histogram_ops,
- _init_ops,
- _io_ops,
- _linalg_ops,
- _logging_ops,
- _math_ops,
- _numerics,
- _parsing_ops,
- _partitioned_variables,
- _random_ops,
- _script_ops,
- _session_ops,
- _sparse_ops,
- _special_math_ops,
- _state_ops,
- _string_ops,
- _template,
- _tensor_array_ops,
- _variable_scope,
- _variables,])
+remove_undocumented(__name__, _allowed_symbols, [
+ _sys.modules[__name__],
+ _array_ops,
+ _check_ops,
+ _clip_ops,
+ _confusion_matrix,
+ _control_flow_ops,
+ _constant_op,
+ _data_flow_ops,
+ _functional_ops,
+ _gradients,
+ _histogram_ops,
+ _init_ops,
+ _io_ops,
+ _linalg_ops,
+ _logging_ops,
+ _manip_ops,
+ _math_ops,
+ _numerics,
+ _parsing_ops,
+ _partitioned_variables,
+ _random_ops,
+ _script_ops,
+ _session_ops,
+ _sparse_ops,
+ _special_math_ops,
+ _state_ops,
+ _string_ops,
+ _template,
+ _tensor_array_ops,
+ _variable_scope,
+ _variables,
+])
asset_tensors_dictionary = _get_asset_tensors(export_dir,
meta_graph_def_to_load)
- main_op_tensor = _get_main_op_tensor(meta_graph_def_to_load)
+ main_op_tensor = (
+ _get_main_op_tensor(meta_graph_def_to_load) or
+ (_get_legacy_init_op_tensor(meta_graph_def_to_load)))
if main_op_tensor is not None:
sess.run(fetches=[main_op_tensor], feed_dict=asset_tensors_dictionary)
- else:
- legacy_init_op_tensor = _get_legacy_init_op_tensor(meta_graph_def_to_load)
- if legacy_init_op_tensor is not None:
- sess.run(
- fetches=[legacy_init_op_tensor], feed_dict=asset_tensors_dictionary)
return meta_graph_def_to_load
variable_names_blacklist="",
input_meta_graph_def=None,
input_saved_model_dir=None,
- saved_model_tags=None):
+ saved_model_tags=None,
+ checkpoint_version=saver_pb2.SaverDef.V2):
"""Converts all variables in a graph and checkpoint into constants."""
del restore_op_name, filename_tensor_name # Unused by updated loading code.
_ = importer.import_graph_def(input_graph_def, name="")
with session.Session() as sess:
if input_saver_def:
- saver = saver_lib.Saver(saver_def=input_saver_def)
+ saver = saver_lib.Saver(
+ saver_def=input_saver_def, write_version=checkpoint_version)
saver.restore(sess, input_checkpoint)
elif input_meta_graph_def:
restorer = saver_lib.import_meta_graph(
# 'global_step' or a similar housekeeping element) so skip it.
continue
var_list[key] = tensor
- saver = saver_lib.Saver(var_list=var_list)
+ saver = saver_lib.Saver(
+ var_list=var_list, write_version=checkpoint_version)
saver.restore(sess, input_checkpoint)
if initializer_nodes:
sess.run(initializer_nodes.split(","))
variable_names_blacklist="",
input_meta_graph=None,
input_saved_model_dir=None,
- saved_model_tags=tag_constants.SERVING):
+ saved_model_tags=tag_constants.SERVING,
+ checkpoint_version=saver_pb2.SaverDef.V2):
"""Converts all variables in a graph and checkpoint into constants."""
input_graph_def = None
if input_saved_model_dir:
if input_saver:
input_saver_def = _parse_input_saver_proto(input_saver, input_binary)
freeze_graph_with_def_protos(
- input_graph_def, input_saver_def, input_checkpoint, output_node_names,
- restore_op_name, filename_tensor_name, output_graph, clear_devices,
- initializer_nodes, variable_names_whitelist, variable_names_blacklist,
- input_meta_graph_def, input_saved_model_dir, saved_model_tags.split(","))
+ input_graph_def,
+ input_saver_def,
+ input_checkpoint,
+ output_node_names,
+ restore_op_name,
+ filename_tensor_name,
+ output_graph,
+ clear_devices,
+ initializer_nodes,
+ variable_names_whitelist,
+ variable_names_blacklist,
+ input_meta_graph_def,
+ input_saved_model_dir,
+ saved_model_tags.split(","),
+ checkpoint_version=checkpoint_version)
def main(unused_args):
FLAGS.output_graph, FLAGS.clear_devices, FLAGS.initializer_nodes,
FLAGS.variable_names_whitelist, FLAGS.variable_names_blacklist,
FLAGS.input_meta_graph, FLAGS.input_saved_model_dir,
- FLAGS.saved_model_tags)
+ FLAGS.saved_model_tags, FLAGS.checkpoint_version)
if __name__ == "__main__":
type=str,
default="",
help="TensorFlow variables file to load.")
+ parser.add_argument(
+ "--checkpoint_version",
+ type=int,
+ default=saver_pb2.SaverDef.V2,
+ help="Tensorflow variable file format")
parser.add_argument(
"--output_graph",
type=str,
input_meta_graph = checkpoint_meta_graph_file
freeze_graph.freeze_graph(
- input_graph_path, input_saver_def_path, input_binary, checkpoint_path,
- output_node_names, restore_op_name, filename_tensor_name,
- output_graph_path, clear_devices, "", "", input_meta_graph)
+ input_graph_path,
+ input_saver_def_path,
+ input_binary,
+ checkpoint_path,
+ output_node_names,
+ restore_op_name,
+ filename_tensor_name,
+ output_graph_path,
+ clear_devices,
+ "",
+ "",
+ input_meta_graph,
+ checkpoint_version=saver_write_version)
# Now we make sure the variable is now a constant, and that the graph still
# produces the expected result.
bias_add_op.op = "BiasAdd"
bias_add_op.name = node.name
bias_add_op.attr["T"].CopyFrom(conv_op.attr["T"])
+ bias_add_op.attr["data_format"].CopyFrom(conv_op.attr["data_format"])
bias_add_op.input.extend([new_conv_op.name, offset_op.name])
new_ops.extend([scaled_weights_op, new_conv_op, offset_op, bias_add_op])
self.assertNotEqual("BatchNormWithGlobalNormalization", node.op)
def testFoldFusedBatchNorms(self):
- with self.test_session() as sess:
- inputs = [1, 4, 2, 5, 3, 6, -1, -4, -2, -5, -3, -6]
- input_op = constant_op.constant(
- np.array(inputs), shape=[1, 1, 6, 2], dtype=dtypes.float32)
- weights = [1, 2, 3, 4, 0.1, 0.2, 0.3, 0.4]
- weights_op = constant_op.constant(
- np.array(weights), shape=[1, 2, 2, 2], dtype=dtypes.float32)
- conv_op = nn_ops.conv2d(
- input_op, weights_op, [1, 1, 1, 1], padding="SAME", name="conv_op")
- mean_op = constant_op.constant(
- np.array([10, 20]), shape=[2], dtype=dtypes.float32)
- variance_op = constant_op.constant(
- np.array([0.25, 0.5]), shape=[2], dtype=dtypes.float32)
- beta_op = constant_op.constant(
- np.array([0.1, 0.6]), shape=[2], dtype=dtypes.float32)
- gamma_op = constant_op.constant(
- np.array([1.0, 2.0]), shape=[2], dtype=dtypes.float32)
- ops.get_default_graph().graph_def_versions.producer = 9
- gen_nn_ops._fused_batch_norm(
- conv_op,
- gamma_op,
- beta_op,
- mean_op,
- variance_op,
- 0.00001,
- is_training=False,
- name="output")
- original_graph_def = sess.graph_def
- original_result = sess.run(["output:0"])
- optimized_graph_def = optimize_for_inference_lib.fold_batch_norms(
- original_graph_def)
-
- with self.test_session() as sess:
- _ = importer.import_graph_def(
- optimized_graph_def, input_map={}, name="optimized")
- optimized_result = sess.run(["optimized/output:0"])
-
- self.assertAllClose(
- original_result, optimized_result, rtol=1e-04, atol=1e-06)
-
- for node in optimized_graph_def.node:
- self.assertNotEqual("FusedBatchNorm", node.op)
+ for data_format, use_gpu in [("NHWC", False), ("NCHW", True)]:
+ with self.test_session(use_gpu=use_gpu) as sess:
+ inputs = [1, 4, 2, 5, 3, 6, -1, -4, -2, -5, -3, -6]
+ input_op = constant_op.constant(
+ np.array(inputs),
+ shape=[1, 1, 6, 2] if data_format == "NHWC" else [1, 2, 1, 6],
+ dtype=dtypes.float32)
+ weights = [1, 2, 3, 4, 0.1, 0.2, 0.3, 0.4]
+ weights_op = constant_op.constant(
+ np.array(weights), shape=[1, 2, 2, 2], dtype=dtypes.float32)
+ conv_op = nn_ops.conv2d(
+ input_op,
+ weights_op, [1, 1, 1, 1],
+ padding="SAME",
+ data_format=data_format,
+ name="conv_op")
+ mean_op = constant_op.constant(
+ np.array([10, 20]), shape=[2], dtype=dtypes.float32)
+ variance_op = constant_op.constant(
+ np.array([0.25, 0.5]), shape=[2], dtype=dtypes.float32)
+ beta_op = constant_op.constant(
+ np.array([0.1, 0.6]), shape=[2], dtype=dtypes.float32)
+ gamma_op = constant_op.constant(
+ np.array([1.0, 2.0]), shape=[2], dtype=dtypes.float32)
+ ops.get_default_graph().graph_def_versions.producer = 9
+ gen_nn_ops._fused_batch_norm(
+ conv_op,
+ gamma_op,
+ beta_op,
+ mean_op,
+ variance_op,
+ 0.00001,
+ is_training=False,
+ data_format=data_format,
+ name="output")
+ original_graph_def = sess.graph_def
+ original_result = sess.run(["output:0"])
+ optimized_graph_def = optimize_for_inference_lib.fold_batch_norms(
+ original_graph_def)
+
+ with self.test_session(use_gpu=use_gpu) as sess:
+ _ = importer.import_graph_def(
+ optimized_graph_def, input_map={}, name="optimized")
+ optimized_result = sess.run(["optimized/output:0"])
+
+ self.assertAllClose(
+ original_result, optimized_result, rtol=1e-04, atol=1e-06)
+
+ for node in optimized_graph_def.node:
+ self.assertNotEqual("FusedBatchNorm", node.op)
def testFuseResizePadAndConv(self):
with self.test_session() as sess:
import numpy as np
+from six import integer_types
from tensorflow.contrib.saved_model.python.saved_model import reader
from tensorflow.contrib.saved_model.python.saved_model import signature_def_utils
from tensorflow.core.example import example_pb2
elif isinstance(feature_list[0], str):
example.features.feature[feature_name].bytes_list.value.extend(
feature_list)
- elif isinstance(feature_list[0], (int, long)):
+ elif isinstance(feature_list[0], integer_types):
example.features.feature[feature_name].int64_list.value.extend(
feature_list)
else:
`CheckpointSaverHook`, as in this example:
```python
- class ExampleCheckpointSaverListerner(CheckpointSaverListener):
+ class ExampleCheckpointSaverListener(CheckpointSaverListener):
def begin(self):
# You can add ops to the graph here.
print('Starting the session.')
print('Done with the session.')
...
- listener = ExampleCheckpointSaverListerner()
+ listener = ExampleCheckpointSaverListener()
saver_hook = tf.train.CheckpointSaverHook(
checkpoint_dir, listeners=[listener])
with tf.train.MonitoredTrainingSession(chief_only_hooks=[saver_hook]):
def match_filenames_once(pattern, name=None):
"""Save the list of files matching pattern, so it is only computed once.
+ NOTE: The order of the files returned can be non-deterministic.
+
Args:
pattern: A file pattern (glob), or 1D tensor of file patterns.
name: A name for the operations (optional).
[Stripping Default-Valued Attributes](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#stripping-default-valued-attributes).
Returns:
- A string: path prefix used for the checkpoint files. If the saver is
- sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn'
- is the number of shards created.
+ A string: path prefix used for the checkpoint files. If checkpoint
+ format is V1 and the saver is sharded, this string ends with:
+ '-?????-of-nnnnn' where 'nnnnn' is the number of shards created.
If the saver is empty, returns None.
Raises:
return
if save_path is None:
raise ValueError("Can't load save_path when it is None.")
+ if (os.path.isfile(save_path) and
+ self._write_version not in (
+ saver_pb2.SaverDef.V1, saver_pb2.SaverDef.LEGACY)):
+ raise ValueError("The specified path: %s is a file."
+ " Please specify only the path prefix"
+ " to the checkpoint files." % save_path)
logging.info("Restoring parameters from %s", save_path)
if context.in_graph_mode():
sess.run(self.saver_def.restore_op_name,
--- /dev/null
+# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Functions for Python 2 vs. 3 compatibility that are private to TensorFlow."""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python.util.compat import as_str_any
+
+
+def path_to_str(path):
+ """Returns the file system path representation of a `PathLike` object,
+ else as it is.
+
+ Args:
+ path: An object that can be converted to path representation.
+
+ Returns:
+ A `str` object.
+ """
+ if hasattr(path, "__fspath__"):
+ path = as_str_any(path.__fspath__())
+ return path
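
`path_to_str` exists so that arguments like `model_dir` can accept `os.PathLike` objects such as `pathlib.Path` (which define `__fspath__` on Python 3.6+) while leaving plain strings untouched. A small illustration of the duck-typing, using the `path_to_str` defined above:

```python
import pathlib

print(path_to_str(pathlib.Path("/tmp/model")))  # "/tmp/model" as a str
print(path_to_str("/tmp/model"))                # unchanged: "/tmp/model"
```
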
result = StringToDriverVersion(version);
}
#else
-#if !defined(PLATFORM_WINDOWS) && !defined(NVIDIA_TEGRA)
+#if !defined(PLATFORM_WINDOWS) && !defined(ANDROID_TEGRA)
// Callback used when iterating through DSOs. Looks for the driver-interfacing
// DSO and yields its version number into the callback data, when found.
auto iterate_phdr =
}
/* static */ port::Status DsoLoader::GetLibcuptiDsoHandle(void** dso_handle) {
+#if defined(ANDROID_TEGRA)
+ // On Android devices the CUDA version number is not added to the library
+ // name.
+ return GetDsoHandle(
+ FindDsoPath(port::Env::Default()->FormatLibraryFileName("cupti", ""),
+ GetCudaCuptiLibraryPath()),
+ dso_handle);
+#else
return GetDsoHandle(FindDsoPath(port::Env::Default()->FormatLibraryFileName(
"cupti", GetCudaVersion()),
GetCudaCuptiLibraryPath()),
dso_handle);
+#endif
}
static mutex& GetRpathMutex() {
clean_dep("//tensorflow:darwin"): [
"-Wl,%s" % (_make_search_paths("@loader_path", levels_to_root),),
],
+ clean_dep("//tensorflow:windows"): [],
+ clean_dep("//tensorflow:windows_msvc"): [],
"//conditions:default": [
"-Wl,%s" % (_make_search_paths("$$ORIGIN", levels_to_root),),
],
"-Wl,-install_name,@rpath/" + name.split("/")[-1],
],
"//conditions:default": [
+ "-Wl,-soname," + name.split("/")[-1],
],
}),
**kwargs)
"//tensorflow:android": [
"-pie",
],
+ clean_dep("//tensorflow:windows"): [],
+ clean_dep("//tensorflow:windows_msvc"): [],
"//conditions:default": [
"-lpthread",
"-lm"
"//conditions:default": [
"-lm",
],
+ clean_dep("//tensorflow:windows"): [],
+ clean_dep("//tensorflow:windows_msvc"): [],
clean_dep("//tensorflow:darwin"): [],
}),)
name: "rgb_to_hsv"
argspec: "args=[\'images\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
}
+ member_method {
+ name: "rgb_to_yiq"
+ argspec: "args=[\'images\'], varargs=None, keywords=None, defaults=None"
+ }
+ member_method {
+ name: "rgb_to_yuv"
+ argspec: "args=[\'images\'], varargs=None, keywords=None, defaults=None"
+ }
member_method {
name: "rot90"
argspec: "args=[\'image\', \'k\', \'name\'], varargs=None, keywords=None, defaults=[\'1\', \'None\'], "
}
member_method {
name: "sample_distorted_bounding_box"
- argspec: "args=[\'image_size\', \'bounding_boxes\', \'seed\', \'seed2\', \'min_object_covered\', \'aspect_ratio_range\', \'area_range\', \'max_attempts\', \'use_image_if_no_bounding_boxes\', \'name\'], varargs=None, keywords=None, defaults=[\'None\', \'None\', \'None\', \'None\', \'None\', \'None\', \'None\', \'None\'], "
+ argspec: "args=[\'image_size\', \'bounding_boxes\', \'seed\', \'seed2\', \'min_object_covered\', \'aspect_ratio_range\', \'area_range\', \'max_attempts\', \'use_image_if_no_bounding_boxes\', \'name\'], varargs=None, keywords=None, defaults=[\'None\', \'None\', \'0.1\', \'None\', \'None\', \'None\', \'None\', \'None\'], "
}
member_method {
name: "total_variation"
name: "transpose_image"
argspec: "args=[\'image\'], varargs=None, keywords=None, defaults=None"
}
+ member_method {
+ name: "yiq_to_rgb"
+ argspec: "args=[\'images\'], varargs=None, keywords=None, defaults=None"
+ }
+ member_method {
+ name: "yuv_to_rgb"
+ argspec: "args=[\'images\'], varargs=None, keywords=None, defaults=None"
+ }
}
path: "tensorflow.keras.layers.Conv3DTranspose"
tf_class {
is_instance: "<class \'tensorflow.python.keras._impl.keras.layers.convolutional.Conv3DTranspose\'>"
+ is_instance: "<class \'tensorflow.python.layers.convolutional.Conv3DTranspose\'>"
is_instance: "<class \'tensorflow.python.layers.convolutional.Conv3D\'>"
is_instance: "<class \'tensorflow.python.layers.convolutional._Conv\'>"
is_instance: "<class \'tensorflow.python.keras._impl.keras.engine.topology.Layer\'>"
path: "tensorflow.keras.layers.Convolution3DTranspose"
tf_class {
is_instance: "<class \'tensorflow.python.keras._impl.keras.layers.convolutional.Conv3DTranspose\'>"
+ is_instance: "<class \'tensorflow.python.layers.convolutional.Conv3DTranspose\'>"
is_instance: "<class \'tensorflow.python.layers.convolutional.Conv3D\'>"
is_instance: "<class \'tensorflow.python.layers.convolutional._Conv\'>"
is_instance: "<class \'tensorflow.python.keras._impl.keras.engine.topology.Layer\'>"
--- /dev/null
+path: "tensorflow.manip"
+tf_module {
+ member_method {
+ name: "roll"
+ argspec: "args=[\'input\', \'shift\', \'axis\'], varargs=None, keywords=None, defaults=None"
+ }
+}
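A short usage sketch of the new module's single method; the argspec suggests semantics analogous to `np.roll`:

```python
import tensorflow as tf

x = tf.constant([1, 2, 3, 4, 5])
rolled = tf.manip.roll(x, shift=2, axis=0)

with tf.Session() as sess:
    print(sess.run(rolled))  # expected: [4 5 1 2 3]
```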
name: "losses"
mtype: "<type \'module\'>"
}
+ member {
+ name: "manip"
+ mtype: "<type \'module\'>"
+ }
member {
name: "metrics"
mtype: "<type \'module\'>"
name: "unique_with_counts"
argspec: "args=[\'x\', \'out_idx\', \'name\'], varargs=None, keywords=None, defaults=[\"<dtype: \'int32\'>\", \'None\'], "
}
+ member_method {
+ name: "unravel_index"
+ argspec: "args=[\'indices\', \'dims\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
+ }
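A sketch of the newly exported `unravel_index`, assuming numpy-compatible semantics (flat index to per-dimension coordinates):

```python
import tensorflow as tf

# Flat indices into a 7x6 grid, back to (row, col) coordinates.
coords = tf.unravel_index(indices=[22, 41, 37], dims=[7, 6])

with tf.Session() as sess:
    print(sess.run(coords))  # expected, mirroring np.unravel_index:
                             # [[3 6 6]
                             #  [4 5 1]]
```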
member_method {
name: "unsorted_segment_max"
argspec: "args=[\'data\', \'segment_ids\', \'num_segments\', \'name\'], varargs=None, keywords=None, defaults=[\'None\'], "
EXTRA_LICENSES_FILE="$(mktemp)_extra_licenses.log"
echo "Getting external dependencies for ${BUILD_TARGET}"
- bazel query "attr('licenses', 'notice', deps(${BUILD_TARGET}))" --no_implicit_deps --no_host_deps --keep_going \
+ bazel query "attr('licenses', 'notice', deps(${BUILD_TARGET}))" --keep_going \
| grep -E -v "^//tensorflow" \
| sed -e 's|:.*||' \
| sort \
echo
echo "Getting list of external licenses mentioned in ${LICENSES_TARGET}."
- bazel query "deps(${LICENSES_TARGET})" --no_implicit_deps --no_host_deps --keep_going \
+ bazel query "deps(${LICENSES_TARGET})" --keep_going \
| grep -E -v "^//tensorflow" \
| sed -e 's|:.*||' \
| sort \
EXTERNAL_LICENSES_CHECK_END_TIME=$(date +'%s')
+  # Blacklist: filter out dependencies that are known not to need a license
+  # reference.
+  echo ${MISSING_LICENSES_FILE}
+  grep -e "@bazel_tools//third_party/" -e "@com_google_absl//absl" -e "@org_tensorflow//" -v ${MISSING_LICENSES_FILE} > temp.txt
+  mv temp.txt ${MISSING_LICENSES_FILE}
+
+  # Whitelist: filter out license references that are expected even without a
+  # matching dependency in the build graph.
+  echo ${EXTRA_LICENSES_FILE}
+  grep -e "@bazel_tools//src/" -e "@bazel_tools//tools/" -e "@com_google_absl//" -e "//external" -e "@local" -v ${EXTRA_LICENSES_FILE} > temp.txt
+  mv temp.txt ${EXTRA_LICENSES_FILE}
+
echo
echo "do_external_licenses_check took $((EXTERNAL_LICENSES_CHECK_END_TIME - EXTERNAL_LICENSES_CHECK_START_TIME)) s"
echo
# build_libtensorflow_tarball in ../builds/libtensorflow.sh
# cannot be used on Windows since it relies on pkg_tar rules.
# So we do something special here
-bazel build -c opt \
+bazel build -c opt --copt=/arch:AVX \
tensorflow:libtensorflow.so \
tensorflow/tools/lib_package:clicenses_generate \
tensorflow/java:libtensorflow_jni.so \
# build_libtensorflow_tarball in ../builds/libtensorflow.sh
# cannot be used on Windows since it relies on pkg_tar rules.
# So we do something special here
-bazel build -c opt \
+bazel build -c opt --copt=/arch:AVX \
tensorflow:libtensorflow.so \
tensorflow/tools/lib_package:clicenses_generate \
tensorflow/java:libtensorflow_jni.so \
import os
from IPython.lib import passwd
+c = c # pylint:disable=undefined-variable
c.NotebookApp.ip = '*'
c.NotebookApp.port = int(os.getenv('PORT', 8888))
c.NotebookApp.open_browser = False
"""
def __init__(self, name):
- """Creata a Metadata builder.
+ """Create a Metadata builder.
Args:
name: The name of the page being described by the Metadata block.
"//third_party/hadoop:LICENSE.txt",
"//third_party/eigen3:LICENSE",
"//third_party/fft2d:LICENSE",
+ "@aws//:LICENSE",
"@boringssl//:LICENSE",
"@com_googlesource_code_re2//:LICENSE",
"@cub_archive//:LICENSE.TXT",
"@jemalloc//:COPYING",
"@jpeg//:LICENSE.md",
"@libxsmm_archive//:LICENSE",
+ "@llvm//:LICENSE.TXT",
"@lmdb//:LICENSE",
"@local_config_sycl//sycl:LICENSE.text",
+ "@nasm//:LICENSE",
"@nsync//:LICENSE",
"@png_archive//:LICENSE",
"@protobuf_archive//:LICENSE",
"//third_party/hadoop:LICENSE.txt",
"//third_party/eigen3:LICENSE",
"//third_party/fft2d:LICENSE",
+ "@aws//:LICENSE",
"@boringssl//:LICENSE",
"@com_googlesource_code_re2//:LICENSE",
"@cub_archive//:LICENSE.TXT",
"@libxsmm_archive//:LICENSE",
"@lmdb//:LICENSE",
"@local_config_sycl//sycl:LICENSE.text",
+ "@nasm//:LICENSE",
"@nsync//:LICENSE",
"@png_archive//:LICENSE",
"@protobuf_archive//:LICENSE",
"//third_party/eigen3:LICENSE",
"//third_party/fft2d:LICENSE",
"//third_party/hadoop:LICENSE.txt",
+ "@absl_py//absl/flags:LICENSE",
+ "@arm_neon_2_x86_sse//:LICENSE",
+ "@astor_archive//:LICENSE",
+ "@aws//:LICENSE",
"@boringssl//:LICENSE",
+ "@com_google_absl//:LICENSE",
"@com_googlesource_code_re2//:LICENSE",
"@cub_archive//:LICENSE.TXT",
"@curl//:COPYING",
"@eigen_archive//:COPYING.MPL2",
"@farmhash_archive//:COPYING",
"@fft2d//:fft/readme.txt",
+ "@flatbuffers//:LICENSE.txt",
+ "@gast_archive//:PKG-INFO",
"@gemmlowp//:LICENSE",
"@gif_archive//:COPYING",
"@grpc//:LICENSE",
"@lmdb//:LICENSE",
"@local_config_sycl//sycl:LICENSE.text",
"@grpc//third_party/nanopb:LICENSE.txt",
+ "@nasm//:LICENSE",
"@nsync//:LICENSE",
+ "@pcre//:LICENCE",
"@png_archive//:LICENSE",
"@protobuf_archive//:LICENSE",
"@six_archive//:LICENSE",
"@snappy//:COPYING",
+ "@swig//:LICENSE",
+ "@termcolor_archive//:COPYING.txt",
"@zlib_archive//:zlib.h",
"@org_python_pypi_backports_weakref//:LICENSE",
] + if_mkl([
fi
fi
fi
- # Install toco as a binary in aux-bin.
mkdir "${TMPDIR}/tensorflow/aux-bin"
+ # Install toco as a binary in aux-bin.
cp bazel-bin/tensorflow/contrib/lite/toco/toco ${TMPDIR}/tensorflow/aux-bin/
fi
# This version string is semver compatible, but incompatible with pip.
# For pip, we will remove all '-' characters from this string, and use the
# result for pip.
-_VERSION = '1.5.0-rc1'
+_VERSION = '1.5.0'
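The conversion the comment above describes is a plain character strip; for example:

```python
semver_version = '1.5.0-rc1'
pip_version = semver_version.replace('-', '')  # '1.5.0rc1', valid for pip
```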
REQUIRED_PACKAGES = [
'absl-py >= 0.1.6',
'numpy >= 1.12.1',
'six >= 1.10.0',
'protobuf >= 3.4.0',
- 'tensorflow-tensorboard >= 0.4.0',
+ 'tensorflow-tensorboard >= 1.5.0, < 1.6.0',
'termcolor >= 1.1.0',
]
matches = ['../' + x for x in find_files('*', 'external') if '.py' not in x]
-so_lib_paths = [i for i in os.listdir('.')
- if os.path.isdir(i)
- and fnmatch.fnmatch(i, '_solib_*')]
+so_lib_paths = [
+ i for i in os.listdir('.')
+ if os.path.isdir(i) and fnmatch.fnmatch(i, '_solib_*')
+]
for path in so_lib_paths:
matches.extend(
],
sha256 = "5996380e3e8b981f55d1c8d58e709c00dbb4806ba367be75d0925a68cc2f6478",
strip_prefix = "abseil-cpp-720c017e30339fd1786ce4aac68bc8559736e53f",
+ build_file = str(Label("//third_party:com_google_absl.BUILD")),
)
tf_http_archive(
name = "eigen_archive",
urls = [
- "https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/14e1418fcf12.tar.gz",
- "https://bitbucket.org/eigen/eigen/get/14e1418fcf12.tar.gz",
+ "https://mirror.bazel.build/bitbucket.org/eigen/eigen/get/2355b229ea4c.tar.gz",
+ "https://bitbucket.org/eigen/eigen/get/2355b229ea4c.tar.gz",
],
- sha256 = "2b526c6888639025323fd4f2600533c0f982d304ea48e4f1663e8066bd9f6368",
- strip_prefix = "eigen-eigen-14e1418fcf12",
+ sha256 = "0cadb31a35b514bf2dfd6b5d38205da94ef326ec6908fc3fd7c269948467214f",
+ strip_prefix = "eigen-eigen-2355b229ea4c",
build_file = str(Label("//third_party:eigen.BUILD")),
)
build_file = str(Label("//third_party:nccl.BUILD")),
)
+ tf_http_archive(
+ name = "kafka",
+ urls = [
+ "https://mirror.bazel.build/github.com/edenhill/librdkafka/archive/v0.11.1.tar.gz",
+ "https://github.com/edenhill/librdkafka/archive/v0.11.1.tar.gz",
+ ],
+ sha256 = "dd035d57c8f19b0b612dd6eefe6e5eebad76f506e302cccb7c2066f25a83585e",
+ strip_prefix = "librdkafka-0.11.1",
+ build_file = str(Label("//third_party:kafka/BUILD")),
+ patch_file = str(Label("//third_party/kafka:config.patch")),
+ )
+
tf_http_archive(
name = "aws",
urls = [
--- /dev/null
+package(default_visibility = ["//visibility:public"])
+
+licenses(["notice"]) # Apache
+
+exports_files(["LICENSE"])
licenses(["notice"]) # Apache 2.0
+exports_files(["LICENSE.txt"])
+
config_setting(
name = "freebsd",
values = {"cpu": "freebsd"},
licenses(["notice"]) # BSD 3-clause
-exports_files(["LICENSE"])
+exports_files(["PKG-INFO"])
py_library(
name = "gast",
if src_dir != None:
src_dir = _norm_path(src_dir)
dest_dir = _norm_path(dest_dir)
- files = _read_dir(repository_ctx, src_dir)
+ files = '\n'.join(sorted(_read_dir(repository_ctx, src_dir).splitlines()))
# Create a list with the src_dir stripped to use for outputs.
dest_files = files.replace(src_dir, '').splitlines()
src_files = files.splitlines()
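Sorting here makes the generated file list independent of filesystem enumeration order, so the repository rule's outputs are reproducible; a minimal sketch of the transformation (the example listing is invented, assuming `_read_dir` returns a newline-separated listing):

```python
listing = "include/b.h\ninclude/a.h\ninclude/c.h"  # order can vary by filesystem
files = '\n'.join(sorted(listing.splitlines()))
print(files.splitlines())  # ['include/a.h', 'include/b.h', 'include/c.h']
```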
"-mfloat-abi=softfp",
"-fprefetch-loop-arrays",
],
+ ":linux_ppc64le": [
+ "-mcpu=power8",
+ "-mtune=power8",
+ ],
"//conditions:default": [],
})
":k8": [":simd_x86_64"],
":armeabi-v7a": [":simd_armv7a"],
":arm64-v8a": [":simd_armv8a"],
+ ":linux_ppc64le": [":simd_altivec"],
"//conditions:default": [":simd_none"],
}),
)
+cc_library(
+ name = "simd_altivec",
+ srcs = [
+ "jchuff.h",
+ "jconfig.h",
+ "jdct.h",
+ "jerror.h",
+ "jinclude.h",
+ "jmorecfg.h",
+ "jpegint.h",
+ "jpeglib.h",
+ "jsimd.h",
+ "jsimddct.h",
+ "simd/jccolor-altivec.c",
+ "simd/jcgray-altivec.c",
+ "simd/jcsample.h",
+ "simd/jcsample-altivec.c",
+ "simd/jdcolor-altivec.c",
+ "simd/jdmerge-altivec.c",
+ "simd/jdsample-altivec.c",
+ "simd/jfdctfst-altivec.c",
+ "simd/jfdctint-altivec.c",
+ "simd/jidctfst-altivec.c",
+ "simd/jidctint-altivec.c",
+ "simd/jquanti-altivec.c",
+ "simd/jsimd.h",
+ "simd/jsimd_altivec.h",
+ "simd/jsimd_powerpc.c",
+ ],
+ hdrs = [
+ "simd/jccolext-altivec.c", # should have been named .inc
+ "simd/jcgryext-altivec.c", # should have been named .inc
+ "simd/jdcolext-altivec.c", # should have been named .inc
+ "simd/jdmrgext-altivec.c", # should have been named .inc
+ ],
+ copts = libjpegturbo_copts,
+ nocopts = libjpegturbo_nocopts,
+)
+
cc_library(
name = "simd_x86_64",
srcs = [
":k8": "cp $(location jconfig_nowin_simd.h) $@",
":armeabi-v7a": "cp $(location jconfig_nowin_simd.h) $@",
":arm64-v8a": "cp $(location jconfig_nowin_simd.h) $@",
+ ":linux_ppc64le": "cp $(location jconfig_nowin_simd.h) $@",
"//conditions:default": "cp $(location jconfig_nowin_nosimd.h) $@",
}),
)
name = "windows_msvc",
values = {"cpu": "x64_windows_msvc"},
)
+
+config_setting(
+ name = "linux_ppc64le",
+ values = {"cpu": "ppc"},
+)
--- /dev/null
+# Description:
+# Kafka C/C++ (librdkafka) client library
+
+licenses(["notice"]) # 2-clause BSD license
+
+exports_files(["LICENSE"])
+
+cc_library(
+ name = "kafka",
+ srcs = [
+ "config.h",
+ "src-cpp/ConfImpl.cpp",
+ "src-cpp/ConsumerImpl.cpp",
+ "src-cpp/HandleImpl.cpp",
+ "src-cpp/KafkaConsumerImpl.cpp",
+ "src-cpp/MessageImpl.cpp",
+ "src-cpp/MetadataImpl.cpp",
+ "src-cpp/QueueImpl.cpp",
+ "src-cpp/RdKafka.cpp",
+ "src-cpp/TopicImpl.cpp",
+ "src-cpp/TopicPartitionImpl.cpp",
+ "src/crc32c.c",
+ "src/crc32c.h",
+ "src/lz4.c",
+ "src/lz4.h",
+ "src/lz4frame.c",
+ "src/lz4frame.h",
+ "src/lz4frame_static.h",
+ "src/lz4hc.c",
+ "src/lz4hc.h",
+ "src/lz4opt.h",
+ "src/queue.h",
+ "src/rd.h",
+ "src/rdaddr.c",
+ "src/rdaddr.h",
+ "src/rdatomic.h",
+ "src/rdavg.h",
+ "src/rdavl.c",
+ "src/rdavl.h",
+ "src/rdbuf.c",
+ "src/rdbuf.h",
+ "src/rdcrc32.h",
+ "src/rddl.h",
+ "src/rdendian.h",
+ "src/rdgz.c",
+ "src/rdgz.h",
+ "src/rdinterval.h",
+ "src/rdkafka.c",
+ "src/rdkafka.h",
+ "src/rdkafka_assignor.c",
+ "src/rdkafka_assignor.h",
+ "src/rdkafka_broker.c",
+ "src/rdkafka_broker.h",
+ "src/rdkafka_buf.c",
+ "src/rdkafka_buf.h",
+ "src/rdkafka_cgrp.c",
+ "src/rdkafka_cgrp.h",
+ "src/rdkafka_conf.c",
+ "src/rdkafka_conf.h",
+ "src/rdkafka_event.h",
+ "src/rdkafka_feature.c",
+ "src/rdkafka_feature.h",
+ "src/rdkafka_int.h",
+ "src/rdkafka_interceptor.c",
+ "src/rdkafka_interceptor.h",
+ "src/rdkafka_lz4.c",
+ "src/rdkafka_lz4.h",
+ "src/rdkafka_metadata.c",
+ "src/rdkafka_metadata.h",
+ "src/rdkafka_metadata_cache.c",
+ "src/rdkafka_msg.c",
+ "src/rdkafka_msg.h",
+ "src/rdkafka_msgset.h",
+ "src/rdkafka_msgset_reader.c",
+ "src/rdkafka_msgset_writer.c",
+ "src/rdkafka_offset.c",
+ "src/rdkafka_offset.h",
+ "src/rdkafka_op.c",
+ "src/rdkafka_op.h",
+ "src/rdkafka_partition.c",
+ "src/rdkafka_partition.h",
+ "src/rdkafka_pattern.c",
+ "src/rdkafka_pattern.h",
+ "src/rdkafka_proto.h",
+ "src/rdkafka_queue.c",
+ "src/rdkafka_queue.h",
+ "src/rdkafka_range_assignor.c",
+ "src/rdkafka_request.c",
+ "src/rdkafka_request.h",
+ "src/rdkafka_roundrobin_assignor.c",
+ "src/rdkafka_sasl.c",
+ "src/rdkafka_sasl.h",
+ "src/rdkafka_sasl_int.h",
+ "src/rdkafka_sasl_plain.c",
+ "src/rdkafka_subscription.c",
+ "src/rdkafka_subscription.h",
+ "src/rdkafka_timer.c",
+ "src/rdkafka_timer.h",
+ "src/rdkafka_topic.c",
+ "src/rdkafka_topic.h",
+ "src/rdkafka_transport.c",
+ "src/rdkafka_transport.h",
+ "src/rdkafka_transport_int.h",
+ "src/rdlist.c",
+ "src/rdlist.h",
+ "src/rdlog.c",
+ "src/rdlog.h",
+ "src/rdports.c",
+ "src/rdports.h",
+ "src/rdposix.h",
+ "src/rdrand.c",
+ "src/rdrand.h",
+ "src/rdregex.c",
+ "src/rdregex.h",
+ "src/rdstring.c",
+ "src/rdstring.h",
+ "src/rdsysqueue.h",
+ "src/rdtime.h",
+ "src/rdtypes.h",
+ "src/rdunittest.c",
+ "src/rdunittest.h",
+ "src/rdvarint.c",
+ "src/rdvarint.h",
+ "src/snappy.c",
+ "src/snappy.h",
+ "src/tinycthread.c",
+ "src/tinycthread.h",
+ "src/xxhash.c",
+ "src/xxhash.h",
+ ],
+ hdrs = [
+ "config.h",
+ ],
+ defines = [
+ ],
+ includes = [
+ "src",
+ "src-cpp",
+ ],
+ linkopts = [
+ "-lpthread",
+ ],
+ visibility = ["//visibility:public"],
+ deps = [
+ "@boringssl//:ssl",
+ ],
+)
--- /dev/null
+diff -Naur a/config.h b/config.h
+--- a/config.h 1970-01-01 00:00:00.000000000 +0000
++++ b/config.h 2017-10-28 00:57:03.316957390 +0000
+@@ -0,0 +1,40 @@
++#pragma once
++#define WITHOUT_OPTIMIZATION 0
++#define ENABLE_DEVEL 0
++#define ENABLE_REFCNT_DEBUG 0
++#define ENABLE_SHAREDPTR_DEBUG 0
++
++#define HAVE_ATOMICS_32 1
++#define HAVE_ATOMICS_32_SYNC 1
++
++#if (HAVE_ATOMICS_32)
++# if (HAVE_ATOMICS_32_SYNC)
++# define ATOMIC_OP32(OP1,OP2,PTR,VAL) __sync_ ## OP1 ## _and_ ## OP2(PTR, VAL)
++# else
++# define ATOMIC_OP32(OP1,OP2,PTR,VAL) __atomic_ ## OP1 ## _ ## OP2(PTR, VAL, __ATOMIC_SEQ_CST)
++# endif
++#endif
++
++#define HAVE_ATOMICS_64 1
++#define HAVE_ATOMICS_64_SYNC 1
++
++#if (HAVE_ATOMICS_64)
++# if (HAVE_ATOMICS_64_SYNC)
++# define ATOMIC_OP64(OP1,OP2,PTR,VAL) __sync_ ## OP1 ## _and_ ## OP2(PTR, VAL)
++# else
++# define ATOMIC_OP64(OP1,OP2,PTR,VAL) __atomic_ ## OP1 ## _ ## OP2(PTR, VAL, __ATOMIC_SEQ_CST)
++# endif
++#endif
++
++
++#define WITH_ZLIB 1
++#define WITH_LIBDL 1
++#define WITH_PLUGINS 0
++#define WITH_SNAPPY 1
++#define WITH_SOCKEM 1
++#define WITH_SSL 1
++#define WITH_SASL 0
++#define WITH_SASL_SCRAM 0
++#define WITH_SASL_CYRUS 0
++#define HAVE_REGEX 1
++#define HAVE_STRNDUP 1
licenses(["notice"]) # BSD
-exports_files(["COPYING"])
+exports_files(["LICENCE"])
cc_library(
name = "pcre",
if src_dir != None:
src_dir = _norm_path(src_dir)
dest_dir = _norm_path(dest_dir)
- files = _read_dir(repository_ctx, src_dir)
+ files = '\n'.join(sorted(_read_dir(repository_ctx, src_dir).splitlines()))
# Create a list with the src_dir stripped to use for outputs.
dest_files = files.replace(src_dir, '').splitlines()
src_files = files.splitlines()
licenses(["notice"]) # MIT
-exports_files(["LICENSE"])
+exports_files(["COPYING.txt"])
py_library(
name = "termcolor",