Jiri Simsa [Sat, 26 May 2018 00:05:33 +0000 (17:05 -0700)]
[tf.data] Fixing concurrency issue in `map_and_batch`.
PiperOrigin-RevId:
198124860
Alexandre Passos [Fri, 25 May 2018 23:43:40 +0000 (16:43 -0700)]
Ignore while loops instead of mangling them in the automatic control dependencies.
PiperOrigin-RevId:
198122188
A. Unique TensorFlower [Fri, 25 May 2018 23:43:29 +0000 (16:43 -0700)]
Extracts the 'remove reverse node' optimization into its own method.
PiperOrigin-RevId:
198122165
A. Unique TensorFlower [Fri, 25 May 2018 23:07:25 +0000 (16:07 -0700)]
Automated g4 rollback of changelist 198087342
PiperOrigin-RevId:
198117552
Nick Felt [Fri, 25 May 2018 22:11:46 +0000 (15:11 -0700)]
Add warning to LookupOrCreate about reentrancy issue
PiperOrigin-RevId:
198110382
Igor Ganichev [Fri, 25 May 2018 20:58:51 +0000 (13:58 -0700)]
Add EagerTensor profiler and device shape utilities
This change includes the following steps to make
EagerTensor profiler work:
- Add a PaddedShapeFn to XlaDevice::Metadata. We need a
backend-independent way to get a fully-padded shape and
its layout on the device. This function is set during
device construction. CPU and GPU devices effectively get
an identity function since they neither change the layout
nor pad. TPU gets the appropriate function.
- Add TFE_TensorDebugInfo struct and C API methods for it.
These methods are necessary to fetch the shape and layout
from under the C API to the Python level. This can be a home
for more debug information later.
- Make EagerTensor weakly referenceable. This involves adding a
pointer to the list of current weak references. This addition
should have negligible overhead when profiler is not used.
The only operations on this field are setting it to null on
construction and checking if it is null on destruction.
- Adding C++ functions callable from Python to register an instance
of EagerTensorProfiler and retrieve debug information for a given
EagerTensor. These functions are used in the new "inspect" module.
- Finally, writing the actual profiler.
PiperOrigin-RevId:
198098380
A. Unique TensorFlower [Fri, 25 May 2018 20:44:36 +0000 (13:44 -0700)]
Disable //tensorflow/contrib/lite/python:lite_test on Windows
PiperOrigin-RevId:
198096344
A. Unique TensorFlower [Fri, 25 May 2018 20:39:25 +0000 (13:39 -0700)]
[tpu:profiler] Capture the data for generating a memory viewer of the profiling results.
PiperOrigin-RevId:
198095564
Sanjoy Das [Fri, 25 May 2018 20:38:24 +0000 (13:38 -0700)]
[TF:XLA] Bump open source llvm revision to r333273
PiperOrigin-RevId:
198095416
Alexandre Passos [Fri, 25 May 2018 20:20:13 +0000 (13:20 -0700)]
Public API to switch between eager execution and graph building.
Now, after tf.enable_eager_execution() has been executed, entering the context
manager of a tf.Graph will enable graph mode. So, for example
```
tf.enable_eager_execution()
with tf.Graph().as_default():
  c = tf.constant(1.0)  # this is a graph tensor
c2 = tf.constant(1.0)  # this is an eager tensor
```
The main use-case of this is allowing documentation writers to make a single
notebook which starts with eager execution and seamlessly transitions to
building graphs.
This also makes many explicit enablings of graph mode in the code redundant
(a cleanup cl will follow).
PiperOrigin-RevId:
198092991
A. Unique TensorFlower [Fri, 25 May 2018 19:58:55 +0000 (12:58 -0700)]
Use functions to build dense splits. TensorFlow Function invocations share the same graph, so using them reduces graph construction overhead.
PiperOrigin-RevId:
198090110
A. Unique TensorFlower [Fri, 25 May 2018 19:56:40 +0000 (12:56 -0700)]
[tpu:profiler] Minor change in the description of tool name proto.
PiperOrigin-RevId:
198089875
A. Unique TensorFlower [Fri, 25 May 2018 19:54:49 +0000 (12:54 -0700)]
Add ScopedAllocatorOptimizer in support of CollectiveReduce.
The efficiency of CollectiveReduce is greatly improved by merging
multiple parallel reductions over smaller tensors into a single
reduction over a larger tensor that is the concatenation of the
smaller tensors. Because CollectiveReduce is essentially an
element-wise array operation which operates on a 1-D reshape of
the input tensor it is eligible for a ScopedAllocation optimization.
The optimization works by looking for serially independent instances
of CollectiveReduce that lie within the same name-scope tier and
have the same control-flow (e.g. loop) embedding structure. Where
two or more such nodes are found the upstream nodes that generate
their inputs are modified to write their outputs into consecutive
regions of a single tensor buffer maintained by a ScopedAllocator.
The multiple CollectiveReduce nodes are then replaced by a single
CollectiveReduce that operates in-place on the backing buffer.
The effectiveness of the optimization depends on there being candidate
CollectiveReduce nodes with these characteristics that become eligible
for execution at close to the same time. If the name scope is too
large, and includes nodes that become execution eligible at very different
times, this graph rewrite could result in a slowdown.
Note that this optimization is experimental: it is not guaranteed to
work, especially for ops other than CollectiveReduce.
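The merge described above can be sketched with plain numpy (an illustrative model of the rewrite, not the actual Grappler code; all names here are hypothetical):

```python
import numpy as np

# Hypothetical per-device inputs: three small tensors that would each get
# their own CollectiveReduce node.
dev0 = [np.full(4, 1.0), np.full((2, 3), 2.0), np.full(2, 3.0)]
dev1 = [np.full(4, 4.0), np.full((2, 3), 5.0), np.full(2, 6.0)]

# The rewrite makes the upstream nodes write into consecutive regions of one
# backing buffer, so a single element-wise reduction covers all inputs.
buf0 = np.concatenate([t.ravel() for t in dev0])
buf1 = np.concatenate([t.ravel() for t in dev1])
merged = buf0 + buf1  # one reduction over the concatenated backing buffer

# Slicing the merged result back out matches reducing each tensor separately.
offsets = np.cumsum([0] + [t.size for t in dev0])
pieces = [merged[offsets[i]:offsets[i + 1]].reshape(dev0[i].shape)
          for i in range(len(dev0))]
for a, b, p in zip(dev0, dev1, pieces):
    assert np.array_equal(p, a + b)
```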
PiperOrigin-RevId:
198089642
A. Unique TensorFlower [Fri, 25 May 2018 19:35:50 +0000 (12:35 -0700)]
Enhance TensorFlow GBDT and GBRT models by exposing a new two-dimensional output in prediction ops (example id, tree leaf node index id) for use as input to other model features.
PiperOrigin-RevId:
198087342
A. Unique TensorFlower [Fri, 25 May 2018 19:22:45 +0000 (12:22 -0700)]
Extracts the 'simplify slice' optimization into its own method.
PiperOrigin-RevId:
198085532
Peter Hawkins [Fri, 25 May 2018 19:04:49 +0000 (12:04 -0700)]
[TF:XLA] Register Switch and Merge ops on XLA devices.
PiperOrigin-RevId:
198083156
Derek Murray [Fri, 25 May 2018 18:42:33 +0000 (11:42 -0700)]
Automated g4 rollback of changelist 192848921
PiperOrigin-RevId:
198079927
A. Unique TensorFlower [Fri, 25 May 2018 18:34:30 +0000 (11:34 -0700)]
Extracts the 'simplify strided slice' optimization into its own method.
PiperOrigin-RevId:
198078724
Igor Ganichev [Fri, 25 May 2018 18:27:39 +0000 (11:27 -0700)]
Bump TPU batch size and wrap apply_grads in defun
PiperOrigin-RevId:
198077643
Akshay Modi [Fri, 25 May 2018 18:02:42 +0000 (11:02 -0700)]
Release C++ lock before calling back into python
PiperOrigin-RevId:
198073059
A. Unique TensorFlower [Fri, 25 May 2018 17:54:38 +0000 (10:54 -0700)]
DepthwiseConv optimizations.
PiperOrigin-RevId:
198071709
Mark Daoust [Fri, 25 May 2018 17:45:27 +0000 (10:45 -0700)]
Link to tf.estimator docs for premade estimators.
PiperOrigin-RevId:
198070157
A. Unique TensorFlower [Fri, 25 May 2018 15:55:24 +0000 (08:55 -0700)]
Code simplification in dump_graphviz.cc:
Just output all arrays before writing edges, so we don't
need to keep track of which arrays we've already output.
PiperOrigin-RevId:
198055327
Shanqing Cai [Fri, 25 May 2018 13:56:38 +0000 (06:56 -0700)]
Minor clarification to model_to_estimator() doc string
PiperOrigin-RevId:
198044106
Asim Shankar [Fri, 25 May 2018 09:23:06 +0000 (02:23 -0700)]
eager: Update introduction notebooks.
PiperOrigin-RevId:
198022387
Asim Shankar [Fri, 25 May 2018 08:36:23 +0000 (01:36 -0700)]
Fix typo, fix build.
PiperOrigin-RevId:
198017870
A. Unique TensorFlower [Fri, 25 May 2018 03:36:45 +0000 (20:36 -0700)]
Extracts the 'simplify tile node' optimization into its own method.
PiperOrigin-RevId:
197996636
A. Unique TensorFlower [Fri, 25 May 2018 02:49:05 +0000 (19:49 -0700)]
Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId:
197993384
A. Unique TensorFlower [Fri, 25 May 2018 02:45:27 +0000 (19:45 -0700)]
Initialize the score threshold to -inf to avoid filtering out negative logits
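A minimal numpy illustration of the failure mode being fixed (the values are made up):

```python
import numpy as np

# Hypothetical logits for three detections; all negative, but still the best
# scores available.
logits = np.array([-0.5, -1.2, -3.0])

kept_with_zero = logits[logits > 0.0]         # a 0.0 default drops everything
kept_with_neg_inf = logits[logits > -np.inf]  # a -inf default keeps all candidates

assert kept_with_zero.size == 0
assert kept_with_neg_inf.size == 3
```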
PiperOrigin-RevId:
197993147
A. Unique TensorFlower [Fri, 25 May 2018 02:20:31 +0000 (19:20 -0700)]
Update ops-related pbtxt files.
PiperOrigin-RevId:
197991672
Yifei Feng [Fri, 25 May 2018 02:12:26 +0000 (19:12 -0700)]
Merge changes from github.
Revert #18413. Too many internal test failures due to the name scope change caused by this change.
Revert #18192. Cannot use re2::StringPiece internally. Need alternative for set call. Will pull and clean this up in a separate change.
PiperOrigin-RevId:
197991247
A. Unique TensorFlower [Fri, 25 May 2018 01:55:30 +0000 (18:55 -0700)]
Extracts the 'simplify pad node' optimization into its own method.
PiperOrigin-RevId:
197989813
Sanjoy Das [Fri, 25 May 2018 01:23:48 +0000 (18:23 -0700)]
Rename TileLoader to MemoryTile; NFC
In a later change I will expand MemoryTile to store tiles and load "3d" tiles
(where we broadcast along one dimension as we load).
PiperOrigin-RevId:
197987185
A. Unique TensorFlower [Fri, 25 May 2018 00:48:21 +0000 (17:48 -0700)]
Add heuristic on picking NHWC layout for (V100, fp16) convolutions.
Also move AlgorithmPicker after layout assignment, as now
cudnn_convolution_runner will return failures on invalid input layouts.
Also add a backend debug option to switch the layout heuristic. By default
it has the old behavior (all NCHW).
PiperOrigin-RevId:
197983747
A. Unique TensorFlower [Fri, 25 May 2018 00:06:34 +0000 (17:06 -0700)]
Enabling some potential optimization using the restrict qualifier.
PiperOrigin-RevId:
197979118
Akshay Modi [Thu, 24 May 2018 23:53:33 +0000 (16:53 -0700)]
When converting a numpy float64 to an EagerTensor, always ensure that it
becomes a float64 tensor.
Previously, py_seq_tensor would fall back to float32 unless a float64 was explicitly
requested (which would not happen when we had no other type information).
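A rough Python sketch of the corrected inference order (a hypothetical helper, not the actual py_seq_tensor implementation; note that numpy's float64 subclasses Python's float, which is why the check order matters):

```python
import numpy as np

def infer_tensor_dtype(value, requested=None):
    """Sketch of the fixed inference (hypothetical helper, not py_seq_tensor):
    honor an explicit request, otherwise preserve an existing float64 instead
    of falling back to float32."""
    if requested is not None:
        return np.dtype(requested)
    # np.float64 subclasses Python float, so this check must come first;
    # the old behavior effectively fell through to the float32 default.
    if isinstance(value, (np.floating, np.ndarray)) and value.dtype == np.float64:
        return np.dtype(np.float64)
    if isinstance(value, float):
        return np.dtype(np.float32)  # plain Python floats still default to float32
    return np.asarray(value).dtype

assert infer_tensor_dtype(np.float64(3.14)) == np.float64   # preserved now
assert infer_tensor_dtype(3.14) == np.float32               # unchanged default
assert infer_tensor_dtype(3.14, requested=np.float64) == np.float64
```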
PiperOrigin-RevId:
197977260
Igor Ganichev [Thu, 24 May 2018 23:44:17 +0000 (16:44 -0700)]
Don't XLA-compile naked variable reads
Before this change, when we executed a naked variable read (i.e. outside of
a defun, directly running <xla_device>->Compute()), tf2xla kernel would
copy the variable's tensor leading to many unnecessary copies.
This change uses the regular non-tf2xla kernel for naked variable reads
and marks the tf2xla one for CompilationOnly().
PiperOrigin-RevId:
197976146
A. Unique TensorFlower [Thu, 24 May 2018 23:31:48 +0000 (16:31 -0700)]
move wide string manipulations out of windows_file_system
PiperOrigin-RevId:
197974385
Akshay Modi [Thu, 24 May 2018 23:20:31 +0000 (16:20 -0700)]
Remove _get_backward_fn and depend on _gradient_function directly.
(_magic_gradient_function was renamed to _gradient_function)
Before:
entry {
  name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
  iters: 30000
  wall_time: 5.88456789653
  extras {
    key: "examples_per_sec"
    value {
      double_value: 169936.011885
    }
  }
}
After:
entry {
  name: "MicroBenchmarks.benchmark_tf_gradient_forward_identity"
  iters: 30000
  wall_time: 5.04853725433
  extras {
    key: "examples_per_sec"
    value {
      double_value: 198077.175551
    }
  }
}
PiperOrigin-RevId:
197972668
Yu-Cheng Ling [Thu, 24 May 2018 23:02:11 +0000 (16:02 -0700)]
Fix the generated builtin_ops enum header.
PiperOrigin-RevId:
197969642
A. Unique TensorFlower [Thu, 24 May 2018 22:53:44 +0000 (15:53 -0700)]
Extracts the 'simplify squeeze node' optimization into its own method.
PiperOrigin-RevId:
197968452
David Majnemer [Thu, 24 May 2018 22:45:25 +0000 (15:45 -0700)]
[XLA] Remove maps with a single instruction
These maps aren't really pulling their weight, fold them to the instruction
that they compute.
PiperOrigin-RevId:
197967117
Priya Gupta [Thu, 24 May 2018 22:35:13 +0000 (15:35 -0700)]
Avoid infinite recursion when checking for indexed slices.
PiperOrigin-RevId:
197965508
Igor Saprykin [Thu, 24 May 2018 22:28:03 +0000 (15:28 -0700)]
Allow combinations to be used on the class level. Make "mode" optional.
Applying a generator to a class is the same as applying that generator to every member of that class. It is meant to avoid repetition in some cases.
The implementation relies on some internals of parameterized tests and how it works with a class level declaration: https://github.com/abseil/abseil-py/blob/master/absl/testing/parameterized.py#L319.
The "mode" argument was required before this change. To accommodate cases where execution mode isn't the point of the test, "mode" became optional with "graph" mode being the default. Another idea I had was to pick a random mode by default.
PiperOrigin-RevId:
197964501
A. Unique TensorFlower [Thu, 24 May 2018 22:27:00 +0000 (15:27 -0700)]
Add local_init_run_options to SessionManager and Supervisor so that
collective_graph_key can be passed in when collective ops are used
in variable initialization.
PiperOrigin-RevId:
197964316
Sanjoy Das [Thu, 24 May 2018 22:19:40 +0000 (15:19 -0700)]
Rename getInt64 to GetInt64 to follow Google style
PiperOrigin-RevId:
197963232
A. Unique TensorFlower [Thu, 24 May 2018 21:59:29 +0000 (14:59 -0700)]
Windows build script change for release job
PiperOrigin-RevId:
197959602
A. Unique TensorFlower [Thu, 24 May 2018 21:59:05 +0000 (14:59 -0700)]
Small fix so that GDN can run on TPU
PiperOrigin-RevId:
197959536
Francois Chollet [Thu, 24 May 2018 21:58:15 +0000 (14:58 -0700)]
Raise ValueError when calling model.summary() before it is built
PiperOrigin-RevId:
197959372
Peter Hawkins [Thu, 24 May 2018 21:20:39 +0000 (14:20 -0700)]
[TF:XLA] Avoid buffer copy when copying a Tensor onto an XLA device.
PiperOrigin-RevId:
197952565
Chris Leary [Thu, 24 May 2018 21:03:41 +0000 (14:03 -0700)]
[XLA] Convert infeed call to take a LiteralSlice.
PiperOrigin-RevId:
197949637
Shanqing Cai [Thu, 24 May 2018 21:02:30 +0000 (14:02 -0700)]
tfdbg: fix issue where total source file size exceeds gRPC message size limit
* Source file contents are now sent one by one, making it less likely that individual
messages will exceed the 4-MB gRPC message size limit.
* In case the message for a single source file exceeds the limit, the client handles
it gracefully by skipping that file and printing a warning message.
Fixes: https://github.com/tensorflow/tensorboard/issues/1118
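The per-file sending logic can be sketched as follows (hypothetical helper and names; the real client speaks gRPC rather than taking callbacks):

```python
GRPC_MESSAGE_SIZE_LIMIT = 4 * 1024 * 1024  # the 4-MB limit mentioned above

def send_source_files(files, send, limit=GRPC_MESSAGE_SIZE_LIMIT, warn=print):
    """Send one file per message; skip any single file whose message would
    exceed the limit (hypothetical helper, not the actual tfdbg client)."""
    for path, content in files:
        if len(content) > limit:
            warn("Skipping %s: %d bytes exceeds the gRPC message size limit"
                 % (path, len(content)))
            continue
        send(path, content)

sent, warnings = [], []
files = [
    ("small_a.py", b"x" * 100),
    ("huge.py", b"x" * (5 * 1024 * 1024)),  # over the limit: skipped with a warning
    ("small_b.py", b"y" * 10),
]
send_source_files(files, lambda path, content: sent.append(path),
                  warn=warnings.append)
assert sent == ["small_a.py", "small_b.py"]
assert len(warnings) == 1
```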
PiperOrigin-RevId:
197949416
Akshay Agrawal [Thu, 24 May 2018 20:30:15 +0000 (13:30 -0700)]
Fix bugs with the code blocks in defun's docstring.
PiperOrigin-RevId:
197943921
A. Unique TensorFlower [Thu, 24 May 2018 20:19:47 +0000 (13:19 -0700)]
Automated g4 rollback of changelist 197868028
PiperOrigin-RevId:
197942379
A. Unique TensorFlower [Thu, 24 May 2018 20:18:32 +0000 (13:18 -0700)]
add maxpoolgrad transposer for layout optimizer.
PiperOrigin-RevId:
197942180
Amit Patankar [Thu, 24 May 2018 20:15:37 +0000 (13:15 -0700)]
Removing outdated links.
PiperOrigin-RevId:
197941740
A. Unique TensorFlower [Thu, 24 May 2018 20:13:42 +0000 (13:13 -0700)]
Extracts the Simplify Pack optimization into its own method.
PiperOrigin-RevId:
197941474
Nick Felt [Thu, 24 May 2018 20:07:50 +0000 (13:07 -0700)]
Ensure ResourceMgr::LookupOrCreate calls create fn just once
This addresses a race condition where LookupOrCreate is called at the same time from two threads, and both Lookup()s fail, so the creator() function is run twice, even though only a single Create() will then succeed.
The motivation is that some creator() functions have side-effects, e.g. tf.contrib.summary.create_file_writer()'s init op opens an events file. This change ensures that if two init ops for file writers with the same resource name are run in the same session.run() call, only one events file will be created. (Current behavior will often open two files; typically the second one overwrites the first but this won't happen if the filename_suffix values are different or the timestamps happen to straddle a second boundary.)
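The fixed semantics can be sketched with a toy Python resource manager (not the real C++ ResourceMgr): holding the lock across both the lookup and the create guarantees the creator runs at most once; as the reentrancy warning above notes, it also means a creator that calls back into LookupOrCreate would deadlock.

```python
import threading

class ToyResourceMgr:
    """Sketch of LookupOrCreate: creator() runs at most once per name,
    even when called concurrently (hypothetical, not the real C++ API)."""

    def __init__(self):
        self._mu = threading.Lock()
        self._resources = {}

    def lookup_or_create(self, name, creator):
        # The lock is held across lookup AND create, so a second caller
        # blocks until the first create finishes and then sees its result.
        # Consequence: a creator that re-enters lookup_or_create deadlocks.
        with self._mu:
            if name not in self._resources:
                self._resources[name] = creator()  # side effects happen once
            return self._resources[name]

calls = []
mgr = ToyResourceMgr()

def creator():
    calls.append(1)  # stands in for a side effect like opening an events file
    return object()

threads = [threading.Thread(target=mgr.lookup_or_create,
                            args=("writer", creator)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(calls) == 1  # the creator ran exactly once
```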
PiperOrigin-RevId:
197940607
A. Unique TensorFlower [Thu, 24 May 2018 20:03:10 +0000 (13:03 -0700)]
Updated documentation for tf.reduce_join.
PiperOrigin-RevId:
197939808
A. Unique TensorFlower [Thu, 24 May 2018 20:00:07 +0000 (13:00 -0700)]
Only wait for one of the input tensors to be ready.
The waiting was implemented to avoid reading stale models as much as possible.
However with this dependency, each input column creates a Send/Recv to PS0
which slows down training significantly.
Colocate Quantile and Stats accumulators for the same handler.
PiperOrigin-RevId:
197939327
A. Unique TensorFlower [Thu, 24 May 2018 19:34:02 +0000 (12:34 -0700)]
Modify tf.image.central_crop to support batched-input.
Currently central_crop works on single images with dynamic dimensions. For large image classification models, it would be nice if central_crop could support batched input. This CL makes that change.
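The batched behavior can be sketched in numpy (an illustrative stand-in, not the TF implementation; the hypothetical helper only handles the static-shape case):

```python
import numpy as np

def central_crop(images, fraction):
    """Numpy sketch of central_crop extended to batched input: accepts a
    single HWC image or an NHWC batch (illustrative, not the TF op)."""
    batched = images.ndim == 4
    if not batched:
        images = images[np.newaxis]    # promote HWC to a batch of one
    _, h, w, _ = images.shape
    ch, cw = int(h * fraction), int(w * fraction)
    top, left = (h - ch) // 2, (w - cw) // 2
    out = images[:, top:top + ch, left:left + cw, :]
    return out if batched else out[0]  # preserve the caller's rank

batch = np.arange(2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
assert central_crop(batch, 0.5).shape == (2, 2, 2, 3)   # batched input
assert central_crop(batch[0], 0.5).shape == (2, 2, 3)   # single image
```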
PiperOrigin-RevId:
197935606
Benoit Steiner [Thu, 24 May 2018 19:23:32 +0000 (12:23 -0700)]
Mark queue related ops as having side effect
PiperOrigin-RevId:
197933941
Jacques Pienaar [Thu, 24 May 2018 19:22:04 +0000 (12:22 -0700)]
Don't use hex floats.
Hex float literals are in C11 and C++17, but not in C++11, so use plain float notation.
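As a side note, a hex float literal encodes an exact binary value, so a decimal rewrite can be checked for exactness; Python's float.fromhex makes this easy (illustrative only, unrelated to the code actually touched):

```python
import math

# A hex float literal encodes the exact bits of the value, e.g. 0x1.8p1 means
# 1.5 * 2**1. When rewriting in plain decimal notation for C++11, float.fromhex
# can confirm the rewrite preserves the value exactly.
assert float.fromhex("0x1.8p1") == 3.0
assert float.fromhex("0x1.0p-3") == 0.125
# Decimal literals with enough digits round-trip exactly too:
assert float.fromhex("0x1.921fb54442d18p+1") == 3.141592653589793 == math.pi
```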
PiperOrigin-RevId:
197933744
A. Unique TensorFlower [Thu, 24 May 2018 19:11:41 +0000 (12:11 -0700)]
Fix doc: "--input_arrays" instead of "--input_array".
PiperOrigin-RevId:
197932202
Sanjoy Das [Thu, 24 May 2018 18:54:56 +0000 (11:54 -0700)]
[TF:XLA] Bump open source llvm revision to r333167
PiperOrigin-RevId:
197929434
Mark Daoust [Thu, 24 May 2018 18:44:15 +0000 (11:44 -0700)]
Fix `tf_inspect.getargspec` for callable objects other than functions.
PiperOrigin-RevId:
197927601
Derek Murray [Thu, 24 May 2018 18:37:12 +0000 (11:37 -0700)]
[tf.data] Add `tf.contrib.data.choose_from_datasets()`.
This is a deterministic counterpart to `tf.contrib.data.sample_from_datasets()`.
PiperOrigin-RevId:
197926386
A. Unique TensorFlower [Thu, 24 May 2018 18:29:34 +0000 (11:29 -0700)]
Extracts the 'Move Constants Past Enter Node' optimization into its own method.
PiperOrigin-RevId:
197924962
Allen Lavoie [Thu, 24 May 2018 18:23:18 +0000 (11:23 -0700)]
Make the existing checkpointable data structure a CheckpointableDataStructure
Gives it better/more consistent handling of Layers.
PiperOrigin-RevId:
197923880
A. Unique TensorFlower [Thu, 24 May 2018 18:18:45 +0000 (11:18 -0700)]
boosted_trees: use double precision instead of single precision when accumulating batches within MakeStatsSummary, since the float type faces numerical precision problems as batches get larger and stats get smaller.
PiperOrigin-RevId:
197923022
Derek Murray [Thu, 24 May 2018 18:14:17 +0000 (11:14 -0700)]
Deprecate `DeviceBase::GetStepAllocator()` and replace with calls to `GetAllocator()`.
The `GetStepAllocator()` API relied on the existence of a "step resource manager",
which is no longer a concept in the runtime (it was replaced by "step containers").
Since the additional flexibility does not appear to be used in the codebase, since
`GetScopedAllocator()` provides a similar extension point (based on step IDs),
and since `OpKernelContext::get_allocator()` is called frequently, this change
simplifies the implementation somewhat.
The `GetStepAllocator()` method is retained as a non-virtual stub that forwards to
`GetAllocator()`, because at least one third-party library (libxsmm) calls this
interface directly.
PiperOrigin-RevId:
197922154
Francois Chollet [Thu, 24 May 2018 18:11:42 +0000 (11:11 -0700)]
Add shape validation for symbolic tensors passed to fit (only graph mode).
PiperOrigin-RevId:
197921675
Akshay Agrawal [Thu, 24 May 2018 17:58:47 +0000 (10:58 -0700)]
Fix convert_to_tensor logic in GradientDescentOptimizer's _prepare method
Previously, eagerly executing an optimizer that had been used in a `defun`
led to a cryptic error because the learning rate tensor supplied to the update
op was in fact a vestigial graph Tensor.
PiperOrigin-RevId:
197919104
Nupur Garg [Thu, 24 May 2018 17:53:28 +0000 (10:53 -0700)]
Improve TOCO Python API.
PiperOrigin-RevId:
197918102
A. Unique TensorFlower [Thu, 24 May 2018 17:52:18 +0000 (10:52 -0700)]
Fix build failure introduced by cl/197457316
PiperOrigin-RevId:
197917867
Alexandre Passos [Thu, 24 May 2018 17:38:48 +0000 (10:38 -0700)]
Warn about tf.Variable semantics
PiperOrigin-RevId:
197915380
Allen Lavoie [Thu, 24 May 2018 17:30:41 +0000 (10:30 -0700)]
Add a checkpointable map data structure
PiperOrigin-RevId:
197913890
Justin Lebar [Thu, 24 May 2018 17:05:10 +0000 (10:05 -0700)]
[XLA] Speed up slice_test again.
Previous patch missed one instance of creating a constant inside of
slice_test.
PiperOrigin-RevId:
197909685
Benjamin Kramer [Thu, 24 May 2018 16:50:19 +0000 (09:50 -0700)]
[XLA] Devectorize constant-sized arrays
A sufficiently smart compiler could promote these from heap to stack; in
practice, no compiler does that. Remove the superfluous heap allocations
manually.
PiperOrigin-RevId:
197907388
A. Unique TensorFlower [Thu, 24 May 2018 16:28:43 +0000 (09:28 -0700)]
Fix a bug in BestExporter: estimator.model_dir is a property, not a function.
PiperOrigin-RevId:
197904351
A. Unique TensorFlower [Thu, 24 May 2018 16:15:17 +0000 (09:15 -0700)]
Internal change.
PiperOrigin-RevId:
197902509
A. Unique TensorFlower [Thu, 24 May 2018 16:05:41 +0000 (09:05 -0700)]
Extracts the 'switch with same input' optimization into its own method.
PiperOrigin-RevId:
197900929
A. Unique TensorFlower [Thu, 24 May 2018 14:00:24 +0000 (07:00 -0700)]
When using fake infeed data, fill the infeed when it is empty.
This makes sure we avoid OOM when there is too much infeed data to send at
once, and we no longer need the magic "num_infeeds" parameter.
PiperOrigin-RevId:
197886121
Dan Moldovan [Thu, 24 May 2018 13:20:04 +0000 (06:20 -0700)]
Style guide edits: refer to the broader Google style guide, which is what was actually used in the code, to replace some of the rules that were spelled out explicitly.
Use AutoGraph, rather than TensorFlow AutoGraph for name.
PiperOrigin-RevId:
197881802
A. Unique TensorFlower [Thu, 24 May 2018 10:48:24 +0000 (03:48 -0700)]
Automated g4 rollback of changelist 197477959
PiperOrigin-RevId:
197868028
A. Unique TensorFlower [Thu, 24 May 2018 09:54:37 +0000 (02:54 -0700)]
Allow generating fake infeed buffers with shapes derived from the computation.
When replaying a computation from a HloSnapshot, we want to be able to provide fake
infeed data. This was already possible when the infeed shape is known by providing
it with the --fake_infeed_shape flag. With this CL, we add the option to derive it
from the provided HloSnapshot. Also, we transfer the infeed shape a fixed number of
times instead of infinitely many times (configurable with a flag).
Otherwise we will definitely run out of memory at some point.
PiperOrigin-RevId:
197863412
A. Unique TensorFlower [Thu, 24 May 2018 08:03:36 +0000 (01:03 -0700)]
[XLA:GPU] Basic multi-output fusion for GPU.
Take a conservative approach and attempt multi-output fusion in cases where "regular" fusion is not an option.
PiperOrigin-RevId:
197852598
Sanjoy Das [Thu, 24 May 2018 05:59:22 +0000 (22:59 -0700)]
Implement support for reshape in IndexedArrayAnalysis
PiperOrigin-RevId:
197843589
A. Unique TensorFlower [Thu, 24 May 2018 05:33:53 +0000 (22:33 -0700)]
Add unit tests to tflite kernels
PiperOrigin-RevId:
197842122
Karmel Allison [Thu, 24 May 2018 03:53:15 +0000 (20:53 -0700)]
Resolve name collisions with assets in SavedModels by deduplicating names that
point to distinct files.
PiperOrigin-RevId:
197835288
A. Unique TensorFlower [Thu, 24 May 2018 03:39:31 +0000 (20:39 -0700)]
Add support for is_recompute optional kwarg to functions decorated with recompute_grad
PiperOrigin-RevId:
197834316
A. Unique TensorFlower [Thu, 24 May 2018 03:03:20 +0000 (20:03 -0700)]
Set the correct shape in transformed distribution.
Also add distribution_util.maybe_get_static_event_ndims to be reused in bijector and transformed distribution classes.
PiperOrigin-RevId:
197831651
Billy Lamberta [Thu, 24 May 2018 01:46:20 +0000 (18:46 -0700)]
Moves estimator getting started docs into programmer's guide.
Update path references and magic links.
Remove getting started with estimators doc.
Add redirects.
PiperOrigin-RevId:
197826223
Shashi Shekhar [Thu, 24 May 2018 01:45:30 +0000 (18:45 -0700)]
Add back some public interface methods.
PiperOrigin-RevId:
197826136
A. Unique TensorFlower [Thu, 24 May 2018 01:38:34 +0000 (18:38 -0700)]
HloSharding parsing from a string, used by the new Sharding HloMatcher for ease of use.
PiperOrigin-RevId:
197825588
A. Unique TensorFlower [Thu, 24 May 2018 01:13:23 +0000 (18:13 -0700)]
Extracts the SimplifyReduction optimization into its own method.
PiperOrigin-RevId:
197823183
Priya Gupta [Thu, 24 May 2018 00:58:42 +0000 (17:58 -0700)]
Aggregating IndexedSlices: Do not require first element to be IndexedSlices.
PiperOrigin-RevId:
197821479
Justin Lebar [Thu, 24 May 2018 00:52:29 +0000 (17:52 -0700)]
[XLA] Speed up SliceTest.
- Use parameters rather than constants, because LLVM and ptxas are slow
with large constants.
- Use iota rather than filling with random values, because the latter is
slow.
PiperOrigin-RevId:
197820897
Sanjoy Das [Thu, 24 May 2018 00:49:42 +0000 (17:49 -0700)]
Cache generated LLVM IR for GEBP
After this change all generated GEBPs with the same shape will share a single
llvm::Function.
This is NFC for any actual workloads because the GEBP emitter isn't exercised by
normal code-paths yet.
PiperOrigin-RevId:
197820606
Nupur Garg [Thu, 24 May 2018 00:44:32 +0000 (17:44 -0700)]
Add import.
PiperOrigin-RevId:
197820050