A. Unique TensorFlower [Thu, 17 May 2018 05:33:38 +0000 (22:33 -0700)]
Move BUILD file to OVIC folder and add model validation.
PiperOrigin-RevId:
196940252
A. Unique TensorFlower [Thu, 17 May 2018 05:22:36 +0000 (22:22 -0700)]
Update installation documentation to reflect that CUDA 8 and cuDNN 6 are the minimum supported versions.
Remove GPU instructions for macOS, because GPUs are no longer supported there.
PiperOrigin-RevId:
196939548
A. Unique TensorFlower [Thu, 17 May 2018 04:40:06 +0000 (21:40 -0700)]
Sort tags before logging.
PiperOrigin-RevId:
196936678
Derek Murray [Thu, 17 May 2018 04:15:44 +0000 (21:15 -0700)]
[tf.data] Accept NumPy dtype objects in `Dataset.from_generator(..., output_types=...)`.
PiperOrigin-RevId:
196935179
James Qin [Thu, 17 May 2018 03:52:32 +0000 (20:52 -0700)]
Add more logging in BaseGPUDevice::ComputeHelper for kernel completion.
PiperOrigin-RevId:
196933479
A. Unique TensorFlower [Thu, 17 May 2018 03:31:29 +0000 (20:31 -0700)]
Making GetOptionalInput from kernel_util.h return a pointer to const data.
PiperOrigin-RevId:
196932028
Sanjoy Das [Thu, 17 May 2018 03:15:53 +0000 (20:15 -0700)]
[TF:XLA] Bump open source llvm revision to r332236
PiperOrigin-RevId:
196930996
Justine Tunney [Thu, 17 May 2018 02:27:28 +0000 (19:27 -0700)]
Internal change.
PiperOrigin-RevId:
196927869
A. Unique TensorFlower [Thu, 17 May 2018 02:12:18 +0000 (19:12 -0700)]
Enhance DenseLayer + XLA compatibility test cases to cover compilation behavior differences in different jit modes.
PiperOrigin-RevId:
196926896
James Qin [Thu, 17 May 2018 01:41:39 +0000 (18:41 -0700)]
Append device name in executor logging
PiperOrigin-RevId:
196924318
A. Unique TensorFlower [Thu, 17 May 2018 01:23:20 +0000 (18:23 -0700)]
[TF:XLA] Make noinline function work with control flow.
1) Make the local function library for control flow self-contained. The control flow function could refer to a noinline function not defined in the local library. Copy the missing FunctionDefs from the global library to the local one.
2) Fix the index used to get the output shapes for functional nodes.
PiperOrigin-RevId:
196922649
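The library-completion step in (1) above can be sketched in plain Python, with dicts standing in for FunctionDef libraries (all names here are hypothetical; this is an illustration of the copy-missing-definitions idea, not the TensorFlow implementation):

```python
def make_self_contained(local_lib, global_lib, roots):
    """Copy every FunctionDef reachable from `roots` that is missing from
    local_lib, pulling it from global_lib.

    Libraries are modeled as dicts: name -> (body, called_function_names).
    """
    visited = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in visited:
            continue
        visited.add(name)
        if name not in local_lib:
            if name not in global_lib:
                raise KeyError("function %r not found in either library" % name)
            local_lib[name] = global_lib[name]  # copy the missing FunctionDef
        _, callees = local_lib[name]
        stack.extend(callees)  # follow transitive references too
    return local_lib
```

After this pass, a control-flow function in the local library can be processed without consulting the global library again.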
A. Unique TensorFlower [Thu, 17 May 2018 01:13:00 +0000 (18:13 -0700)]
[XLA] Remove XlaOp::GetShape. It only works when the builder of the XlaOp has not been freed.
PiperOrigin-RevId:
196921647
Skye Wanderman-Milne [Thu, 17 May 2018 01:05:52 +0000 (18:05 -0700)]
Remove _USE_C_API staging in tests now that the C API is enabled by default.
This is in preparation for removing the _USE_C_API toggle altogether.
PiperOrigin-RevId:
196920890
Skye Wanderman-Milne [Thu, 17 May 2018 01:03:01 +0000 (18:03 -0700)]
Remove _USE_C_API staging in tests now that the C API is enabled by default.
This is in preparation for removing the _USE_C_API toggle altogether.
PiperOrigin-RevId:
196920481
Igor Ganichev [Thu, 17 May 2018 00:48:22 +0000 (17:48 -0700)]
Fix typo in TensorHandle
PiperOrigin-RevId:
196919119
Jiri Simsa [Thu, 17 May 2018 00:25:36 +0000 (17:25 -0700)]
Re-enabling a test after a previous fix.
PiperOrigin-RevId:
196916467
A. Unique TensorFlower [Thu, 17 May 2018 00:24:39 +0000 (17:24 -0700)]
Remove no-op statement. tf_additional_lib_srcs only selects .cc files. When we
do tf_additional_lib_srcs(exclude=[**/*.cc]) we are selecting zero files, and
the statement can be safely removed.
PiperOrigin-RevId:
196916359
Jianwei Xie [Thu, 17 May 2018 00:23:02 +0000 (17:23 -0700)]
Adds basic TPU replicate training support for Keras.
PiperOrigin-RevId:
196916177
Justin Lebar [Thu, 17 May 2018 00:21:22 +0000 (17:21 -0700)]
[XLA] Improve documentation on HloModule, HloComputation, and HloInstruction.
PiperOrigin-RevId:
196915982
Justin Lebar [Thu, 17 May 2018 00:09:42 +0000 (17:09 -0700)]
[XLA] Add documentation explaining FusionKind.
PiperOrigin-RevId:
196914484
A. Unique TensorFlower [Thu, 17 May 2018 00:05:33 +0000 (17:05 -0700)]
Remove unused inclusions
PiperOrigin-RevId:
196913890
Justin Lebar [Wed, 16 May 2018 23:56:48 +0000 (16:56 -0700)]
[XLA:GPU] Add op-tracing to XLA:GPU.
PiperOrigin-RevId:
196912575
Akshay Modi [Wed, 16 May 2018 23:43:29 +0000 (16:43 -0700)]
Allow for remote eager execution.
PiperOrigin-RevId:
196910675
A. Unique TensorFlower [Wed, 16 May 2018 23:16:46 +0000 (16:16 -0700)]
Remove unused inclusions
PiperOrigin-RevId:
196906815
Jeremy Lau [Wed, 16 May 2018 22:54:49 +0000 (15:54 -0700)]
Move DoesNotUseOperandBuffer and CanShareOperandBufferWithUser from
liveness_util to methods on TuplePointsToAnalysis and HloDataflowAnalysis.
PiperOrigin-RevId:
196903216
Dimitris Vardoulakis [Wed, 16 May 2018 22:53:34 +0000 (15:53 -0700)]
[TF:XLA] Take subcomputations into account during HLO scheduling.
In the List scheduler, if an instruction calls subcomputations, we count the memory usage of the subcomputation towards the memory usage of the parent instruction.
PiperOrigin-RevId:
196903042
A. Unique TensorFlower [Wed, 16 May 2018 22:28:11 +0000 (15:28 -0700)]
Fixing test for Topk kernel in TFlite
PiperOrigin-RevId:
196899232
Mustafa Ispir [Wed, 16 May 2018 22:27:34 +0000 (15:27 -0700)]
Make sparse_cross operations publicly available.
PiperOrigin-RevId:
196899145
Jacques Pienaar [Wed, 16 May 2018 22:02:06 +0000 (15:02 -0700)]
Add test for 64-bit clz and sign.
PiperOrigin-RevId:
196894702
Sanjoy Das [Wed, 16 May 2018 22:01:22 +0000 (15:01 -0700)]
Fix typo in comment
PiperOrigin-RevId:
196894582
A. Unique TensorFlower [Wed, 16 May 2018 21:53:11 +0000 (14:53 -0700)]
Add a parameter to the adaptive shared batcher which allows the user to set a lower bound for in_flight_batches_limit.
This can help prevent overloads which may occur during large traffic shifts - a small value learned during a period of low load can be unsuitable at high load.
PiperOrigin-RevId:
196893320
Allen Lavoie [Wed, 16 May 2018 20:52:54 +0000 (13:52 -0700)]
Checkpointable: move python/training/checkpointable_* to python/training/checkpointable/
Need to add some new checkpointable files in core (specifically I had some checkpointable data structures in mind), and prefixing more files with "checkpointable_" in python/training/ seems dirty.
No functional changes, just some branching and build/import fiddling.
PiperOrigin-RevId:
196883136
Peter Hawkins [Wed, 16 May 2018 20:34:10 +0000 (13:34 -0700)]
Automated g4 rollback of changelist
196691101
PiperOrigin-RevId:
196879933
A. Unique TensorFlower [Wed, 16 May 2018 20:27:49 +0000 (13:27 -0700)]
BUILD cleanup in contrib/lite/...
PiperOrigin-RevId:
196878865
Brian Patton [Wed, 16 May 2018 20:22:53 +0000 (13:22 -0700)]
Fix the gradient of reduce_prod for complex dtypes.
Fixes #12514
PiperOrigin-RevId:
196878148
Jacques Pienaar [Wed, 16 May 2018 20:12:25 +0000 (13:12 -0700)]
Remove sorted() call, as the types are not sortable.
PiperOrigin-RevId:
196876502
A. Unique TensorFlower [Wed, 16 May 2018 19:54:32 +0000 (12:54 -0700)]
Fix broken link.
PiperOrigin-RevId:
196873792
Igor Ganichev [Wed, 16 May 2018 19:50:41 +0000 (12:50 -0700)]
Add a test for compiled tfe.defun in GradientTape
PiperOrigin-RevId:
196873235
Ayush Dubey [Wed, 16 May 2018 19:32:04 +0000 (12:32 -0700)]
Remove redundant initialization of collective params.
subdiv_permutations is being resized twice in GenerateSubdivParams.
PiperOrigin-RevId:
196870781
A. Unique TensorFlower [Wed, 16 May 2018 19:24:59 +0000 (12:24 -0700)]
Use sequence_length arg for dynamic_rnn within RNNEstimator
This does not affect correctness, but improves performance by skipping padded
parts of the sequence. Also correct documentation in rnn.py that states the
opposite.
PiperOrigin-RevId:
196869793
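Passing sequence_length lets dynamic_rnn skip the padded tail of each example. Deriving per-example lengths from an end-padded batch can be sketched with NumPy (the pad value and function name are assumptions for illustration):

```python
import numpy as np

def sequence_lengths(batch, pad_value=0):
    """Per-example lengths of a padded [batch, time] array:
    length = index of the last non-pad step + 1 (all-pad rows get 0).
    Assumes padding appears only at the end of each row."""
    mask = batch != pad_value  # True where a real (non-pad) step is
    # A position counts toward the length if any real step occurs at or
    # after it; count those positions per row.
    real_at_or_after = np.cumsum(mask[:, ::-1], axis=1)[:, ::-1] > 0
    return real_at_or_after.sum(axis=1)
```

Feeding these lengths to the RNN avoids computing over padding, which is the performance win the commit describes.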
Nick Desaulniers [Wed, 16 May 2018 19:21:35 +0000 (12:21 -0700)]
[TF:XLA:CPU] enable s32 reduce-window
PiperOrigin-RevId:
196869296
A. Unique TensorFlower [Wed, 16 May 2018 19:16:33 +0000 (12:16 -0700)]
Fix the CCFLAGS mismatch.
PiperOrigin-RevId:
196868601
Jacques Pienaar [Wed, 16 May 2018 19:15:37 +0000 (12:15 -0700)]
Expand tests to include int64 output type.
PiperOrigin-RevId:
196868485
A. Unique TensorFlower [Wed, 16 May 2018 19:04:13 +0000 (12:04 -0700)]
Turn off MirroredStrategy Dataset prefetching in tests when using the
combinations library. It adds some small non-determinism to the input
batches which can make tests flaky.
Also add a default DistributionStrategy combination.
PiperOrigin-RevId:
196866569
Michael Case [Wed, 16 May 2018 18:51:48 +0000 (11:51 -0700)]
Internal Change.
PiperOrigin-RevId:
196864489
Benjamin Kramer [Wed, 16 May 2018 18:45:56 +0000 (11:45 -0700)]
[XLA:GPU] Emit the final write of the tuple pointers
Turns out this doesn't matter when the fusion is emitted as a kernel, but does
when the whole thing is inlined. Oops.
PiperOrigin-RevId:
196863545
Nupur Garg [Wed, 16 May 2018 17:46:46 +0000 (10:46 -0700)]
Fixes tflite_diff script.
PiperOrigin-RevId:
196852157
Benjamin Kramer [Wed, 16 May 2018 17:40:57 +0000 (10:40 -0700)]
[XLA:GPU] Teach ir_emitter_nested how to deal with multi output loop fusion
Most of the plumbing is there already, just set up a loop emitter with a target
for each tuple element. For a simple case the output looks reasonable, though I
haven't checked correctness of anything complex.
PiperOrigin-RevId:
196850926
Michael Case [Wed, 16 May 2018 17:39:46 +0000 (10:39 -0700)]
Remove more Estimator dependencies from core TensorFlow.
Cleaning up some imports and deps I left behind that are no
longer used.
PiperOrigin-RevId:
196850661
A. Unique TensorFlower [Wed, 16 May 2018 17:20:37 +0000 (10:20 -0700)]
Removed C++ ABSL includes from tensorflow/core and tensorflow/compiler.
This is necessary to prevent undefined behavior caused by ODR violations, which could occur if different versions of ABSL are linked together.
PiperOrigin-RevId:
196847315
Jianwei Xie [Wed, 16 May 2018 17:18:05 +0000 (10:18 -0700)]
Add TPUContext for input_fn invocation.
PiperOrigin-RevId:
196846795
Nick Desaulniers [Wed, 16 May 2018 17:03:12 +0000 (10:03 -0700)]
[TF:XLA:INTERPRETER] speed up select and scatter by avoiding memory allocation in loops
HandleSelectAndScatter() has 2 IterateThroughWindow() blocks. Before, we spent (in percent total program time):
11.98% Literal::CreateR0() = 10.82% (block1) + 1.16% (block2)
4.91% Literal::~Literal() = 4.44% (block1) + 0.51% (block2)
1.52% operator delete = 1.38% (block1) + 0.14% (block2)
=====
18.41% total
After:
1.99% Literal::~Literal() = 1.83% (block1) + 0.16% (block2)
0.68% operator delete = 0.61% (block1) + 0.07% (block2)
=====
2.67% total
PiperOrigin-RevId:
196844177
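The commit itself rewrote C++ inside HandleSelectAndScatter, but the general pattern, hoisting a per-iteration allocation out of a hot loop, can be sketched in Python (names and the toy workload are hypothetical):

```python
import numpy as np

def windowed_max_alloc_per_iter(values, window):
    # Allocates a fresh buffer on every iteration (the costly pattern).
    out = []
    for i in range(len(values) - window + 1):
        buf = np.array(values[i:i + window])  # new allocation each pass
        out.append(buf.max())
    return out

def windowed_max_reused_buffer(values, window):
    # Allocates once up front and reuses the buffer across iterations.
    buf = np.empty(window)
    out = []
    for i in range(len(values) - window + 1):
        buf[:] = values[i:i + window]  # copy into the reused buffer
        out.append(buf.max())
    return out
```

Both functions compute the same result; the second avoids the allocator and destructor traffic that dominated the profile above.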
A. Unique TensorFlower [Wed, 16 May 2018 17:02:30 +0000 (10:02 -0700)]
Add tf.contrib.data.make_tf_record_dataset(), like make_csv_dataset() and
make_batched_features_dataset(), which is easy and fast by default.
PiperOrigin-RevId:
196844073
Younghee Kwon [Wed, 16 May 2018 16:44:48 +0000 (09:44 -0700)]
boosted_trees: properly accept integer labels, treated the same as float labels; added tests for labels.
PiperOrigin-RevId:
196841265
A. Unique TensorFlower [Wed, 16 May 2018 16:26:51 +0000 (09:26 -0700)]
Don't initialize GPUs if none will be used.
PiperOrigin-RevId:
196838739
A. Unique TensorFlower [Wed, 16 May 2018 16:17:15 +0000 (09:17 -0700)]
Migrating BestModelExportStrategy to core library.
PiperOrigin-RevId:
196837506
A. Unique TensorFlower [Wed, 16 May 2018 15:45:28 +0000 (08:45 -0700)]
Modify tf.contrib.distributions.BatchReshape to behave a bit more like
tf.reshape: accept a single unknown dimension and infer partial shape
information statically.
PiperOrigin-RevId:
196833267
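The single-unknown-dimension inference described here follows tf.reshape's -1 handling; a plain-Python sketch of that rule (not the TF implementation, names hypothetical):

```python
def infer_shape(shape, total_size):
    """Resolve at most one -1 entry in `shape` so that the product of all
    dimensions equals total_size, tf.reshape-style."""
    unknown = [i for i, d in enumerate(shape) if d == -1]
    if len(unknown) > 1:
        raise ValueError("at most one unknown dimension is allowed")
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    if not unknown:
        if known != total_size:
            raise ValueError("shape does not match total size")
        return list(shape)
    if known == 0 or total_size % known != 0:
        raise ValueError("cannot infer the unknown dimension")
    resolved = list(shape)
    resolved[unknown[0]] = total_size // known
    return resolved
```

With this rule, partial shape information can be filled in statically whenever the other dimensions and the total size are known.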
A. Unique TensorFlower [Wed, 16 May 2018 15:35:29 +0000 (08:35 -0700)]
Resolved inconsistency with shape inference for tf.reduce_join when passing non-Tensor values.
Removed deprecated arguments in tf.reduce_join test.
PiperOrigin-RevId:
196832183
Peter Hawkins [Wed, 16 May 2018 15:10:02 +0000 (08:10 -0700)]
[XLA] Expose MinimumMemoryForComputation in hlo_scheduling.h
PiperOrigin-RevId:
196829414
A. Unique TensorFlower [Wed, 16 May 2018 14:47:44 +0000 (07:47 -0700)]
Add support libraries in core/platform.
PiperOrigin-RevId:
196826860
A. Unique TensorFlower [Wed, 16 May 2018 14:01:04 +0000 (07:01 -0700)]
Migrate HloExecutionProfileTest to textual HLO
Also add lhs_contracting_dims and rhs_contracting_dims to make the test more realistic.
Before, the dot operation was created with CreateBinary instead of CreateCanonicalDot.
PiperOrigin-RevId:
196822255
A. Unique TensorFlower [Wed, 16 May 2018 13:30:19 +0000 (06:30 -0700)]
Employ array flat sizes more directly in optimized_ops, some places in reference_ops.h.
PiperOrigin-RevId:
196819423
A. Unique TensorFlower [Wed, 16 May 2018 12:20:47 +0000 (05:20 -0700)]
internal change
PiperOrigin-RevId:
196813574
A. Unique TensorFlower [Wed, 16 May 2018 12:11:33 +0000 (05:11 -0700)]
Refactor HloInstruction::Fuse and add a method for multi-output fusion.
PiperOrigin-RevId:
196813042
Alexandre Passos [Wed, 16 May 2018 10:56:17 +0000 (03:56 -0700)]
Improving variable_scope documentation.
PiperOrigin-RevId:
196807465
A. Unique TensorFlower [Wed, 16 May 2018 10:43:10 +0000 (03:43 -0700)]
Implementation of transpose_conv
PiperOrigin-RevId:
196806646
Tom Hennigan [Wed, 16 May 2018 06:54:39 +0000 (23:54 -0700)]
Add performance notes for in-context gradient calls.
Also:
* Add _{start,stop}_recording methods to GradientTape.
* Add performance notes when calling gradient in recording context for
persistent tapes.
* s/tfe.GradientTape/tf.GradientTape/ in docstrings.
PiperOrigin-RevId:
196786148
David Majnemer [Wed, 16 May 2018 06:00:32 +0000 (23:00 -0700)]
[TF:XLA] Make softplus more accurate
The softplus function computes log(exp(x) + 1).
We computed it this way but with special cases to handle underflow and
overflow.
This was done by comparing the input against a quantity with the
magnitude 13.94238515. Note that this quantity is not representable as a single
precision float and is instead rounded to 13.9423847.
If softplus would overflow, it will be approximated as x.
If softplus would underflow, it will be approximated as exp(x).
Unfortunately, this can provide inaccurate results for negative floats close to
the threshold.
For example: consider x = -13.9274826049805. softplus(x) is ~8.94068849e-7;
rounded to the nearest single precision float, this is 8.940689e-7.
In this case, x is quite close to the underflow threshold but not close enough
to be approximated by exp(x) == 8.94069273e-7.
Rather, it gets calculated using the canonical definition of softplus and comes
to 8.34464686e-7.
This result comes out to be wrong by 1,048,568 ULPs.
Instead, we can compute it the way one would compute LogSumExp(x, 0):
max(x, 0) + log(exp(x - max(x, 0)) + exp(0 - max(x, 0)))
When x is positive, this is:
x + log(exp(0) + exp(-x))
When x is negative, this is:
log(exp(x) + exp(0))
When x is 0, this is:
log(exp(0) + exp(0))
exp(0) evaluates to 1 which gives us:
if x is positive, x + log(1 + exp(-x))
if x is negative, log(exp(x) + 1)
if x is zero, log(2)
These three cases can be combined like so:
max(x, 0) + log(exp(-abs(x)) + 1)
Further, we can increase the fidelity of the log calculation by using log1p:
max(x, 0) + log1p(exp(-abs(x)))
This computation naturally handles underflow and overflow while also providing
more numerically accurate results for a few small, positive, floating point
values.
PiperOrigin-RevId:
196782814
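The final formula above, max(x, 0) + log1p(exp(-abs(x))), can be checked with a small NumPy sketch (function names are hypothetical):

```python
import numpy as np

def softplus_naive(x):
    # log(exp(x) + 1): overflows for large x, loses precision near underflow.
    return np.log(np.exp(x) + 1.0)

def softplus_stable(x):
    # max(x, 0) + log1p(exp(-|x|)): the exponent is never positive, so exp
    # cannot overflow, and log1p keeps precision when exp(-|x|) is tiny.
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))
```

For large positive x the stable form degrades gracefully to x, for very negative x to 0, and at x = 0 it gives log(2) exactly as derived above.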
Derek Murray [Wed, 16 May 2018 05:55:44 +0000 (22:55 -0700)]
Fix bug in `WorkerService::Logging()` handler.
Since transitioning to proto3, it was not possible to distinguish between the absence of
LoggingRequest::rpc_logging and it being set to false. This led to a bug that ignored
log-disabling messages in some implementations, which meant that logging was never
disabled. This fix adds explicit fields in LoggingRequest for enabling and disabling RPC
logging.
PiperOrigin-RevId:
196782547
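The presence problem and the explicit enable/disable fix can be modeled in plain Python (a sketch; the field names follow the commit's description, but the class itself is hypothetical, not the actual proto):

```python
class LoggingRequest:
    """Sketch of the proto3 presence problem: a plain bool cannot distinguish
    'not set' from 'set to False', so a disable-logging message looks like a
    no-op. The fix mirrors the commit: separate explicit enable/disable
    fields, each of which only acts when set to True."""
    def __init__(self, enable_rpc_logging=False, disable_rpc_logging=False):
        self.enable_rpc_logging = enable_rpc_logging
        self.disable_rpc_logging = disable_rpc_logging

def apply_logging_request(currently_logging, req):
    if req.enable_rpc_logging:
        return True
    if req.disable_rpc_logging:
        return False
    return currently_logging  # neither field set: leave logging unchanged
```

With a single proto3 bool, the disable case would be indistinguishable from the default and logging could never be turned off, which is exactly the bug described.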
Martin Wicke [Wed, 16 May 2018 05:38:20 +0000 (22:38 -0700)]
Trivial message cleanup.
PiperOrigin-RevId:
196781381
Priya Gupta [Wed, 16 May 2018 05:06:10 +0000 (22:06 -0700)]
Enable gpu tests for cross_tower_ops_test
PiperOrigin-RevId:
196779286
A. Unique TensorFlower [Wed, 16 May 2018 04:35:27 +0000 (21:35 -0700)]
Remove check for axis == 3, since if the input dimension is not 4, the input axis is not necessarily 3.
And change the test as well.
PiperOrigin-RevId:
196777020
A. Unique TensorFlower [Wed, 16 May 2018 03:14:18 +0000 (20:14 -0700)]
Hardcode two exceptions to the list of files allowed in a 'platform'
PiperOrigin-RevId:
196771621
A. Unique TensorFlower [Wed, 16 May 2018 03:11:01 +0000 (20:11 -0700)]
Fix TfLite Convolution handling the input batch incorrectly for 1*1 kernels,
and improve test coverage for conv ops.
PiperOrigin-RevId:
196771421
A. Unique TensorFlower [Wed, 16 May 2018 02:19:00 +0000 (19:19 -0700)]
Fix transpose_conv typo in optimized_ops. input_depth should be output_depth.
PiperOrigin-RevId:
196767951
Jiri Simsa [Wed, 16 May 2018 01:32:01 +0000 (18:32 -0700)]
[tf.data] Fixing a race in map_and_batch.
PiperOrigin-RevId:
196764211
Benoit Steiner [Wed, 16 May 2018 01:15:32 +0000 (18:15 -0700)]
Optimize batch normalization when possible
PiperOrigin-RevId:
196762618
Kay Zhu [Wed, 16 May 2018 00:49:43 +0000 (17:49 -0700)]
[TF:XLA] Remove the need for memcpy from Tensor->Literal.
Introducing a new LiteralOwningSlice class that is similar to LiteralSlice, but owns the root piece.
PiperOrigin-RevId:
196759785
Alexandre Passos [Tue, 15 May 2018 23:26:43 +0000 (16:26 -0700)]
Fixes false-positive in tf.enable_eager_execution
Simply using ops which have name scopes and other things will trigger
adding a graph to the stack.
PiperOrigin-RevId:
196748657
A. Unique TensorFlower [Tue, 15 May 2018 22:46:14 +0000 (15:46 -0700)]
Remove misleading declaration-as-default that results in a deleted constructor, and a misguided comment.
PiperOrigin-RevId:
196742616
A. Unique TensorFlower [Tue, 15 May 2018 22:46:03 +0000 (15:46 -0700)]
Remove unused BUILD dependencies
PiperOrigin-RevId:
196742598
Allen Lavoie [Tue, 15 May 2018 21:26:21 +0000 (14:26 -0700)]
Checkpointable: Restore-on-create for name-based checkpoints when executing eagerly
Should make loading name-based checkpoints more natural with object-based APIs when executing eagerly. Before this CL they could be loaded, but users needed to use "run_restore_ops" after all variables were created (which is less useful and confusing).
PiperOrigin-RevId:
196729311
A. Unique TensorFlower [Tue, 15 May 2018 21:24:57 +0000 (14:24 -0700)]
Change the block size used in the triangular system solve step of the Cholesky decomposition algorithm to be the same block size as the Cholesky blocks. Dramatically speeds up compilation time with negligible effect on runtime.
PiperOrigin-RevId:
196729081
Allen Lavoie [Tue, 15 May 2018 20:57:39 +0000 (13:57 -0700)]
Remove a tensor_slice_reader warning when using HDF5 in Model.load_weights.
It may make sense to remove the logged warning entirely, but this change just adds an extra check for the filename.
Fixes #19289.
PiperOrigin-RevId:
196724395
A. Unique TensorFlower [Tue, 15 May 2018 20:38:34 +0000 (13:38 -0700)]
Don't ever use cuDNN to perform depthwise convolutions on CPU.
PiperOrigin-RevId:
196721302
Alexandre Passos [Tue, 15 May 2018 20:30:13 +0000 (13:30 -0700)]
Fixes potential race in ResourceHandleOp::Compute
Fixes #19299
PiperOrigin-RevId:
196720004
Priya Gupta [Tue, 15 May 2018 20:20:01 +0000 (13:20 -0700)]
Handle delayed variable initialization in MirroredStrategy. Test with RNN layer.
Bug reported and solution suggested in #19069
PiperOrigin-RevId:
196718454
Alexandre Passos [Tue, 15 May 2018 20:07:49 +0000 (13:07 -0700)]
Allows scan to work in reverse as well as forward
PiperOrigin-RevId:
196716758
Akshay Agrawal [Tue, 15 May 2018 19:28:33 +0000 (12:28 -0700)]
Improve documentation for tf.contrib.eager.defun.
PiperOrigin-RevId:
196711029
Benjamin Kramer [Tue, 15 May 2018 19:12:44 +0000 (12:12 -0700)]
[XLA] Make HloCSE compare computations
This shows up when you have two otherwise identical instructions that call a
computation, like a fusion or a reduce. Even if the called computations are
identical but not the same it wouldn't get CSE'd. I was a bit worried about the
compile time impact of comparing full computations, but this only happens if
everything else already compares equal. The impact on compile time of
benchmarks seems to be within the noise.
PiperOrigin-RevId:
196708782
Yu-Cheng Ling [Tue, 15 May 2018 18:25:25 +0000 (11:25 -0700)]
Refactoring: Remove mutable_op_resolver module
PiperOrigin-RevId:
196700821
Jianwei Xie [Tue, 15 May 2018 18:08:25 +0000 (11:08 -0700)]
Use the new compile op and print user-friendly error message without mixing with infeed/outfeed if compilation fails.
PiperOrigin-RevId:
196697690
Peter Hawkins [Tue, 15 May 2018 17:33:48 +0000 (10:33 -0700)]
Automated g4 rollback of changelist
196683444
PiperOrigin-RevId:
196691101
Benjamin Kramer [Tue, 15 May 2018 17:23:27 +0000 (10:23 -0700)]
[XLA] Cache computations when creating reduces in algebraic simplifier or batchnorm expander
Otherwise we create a lot of identical small computations. This shouldn't have
an effect except for cluttering the HLO, but turns out HloCSE doesn't look
inside of the computation of reduces, effectively never eliminating reduces
that were produced via this code path.
While there clean up some YAGNI, this only worked for F32 anyways, so just
hardcode it.
PiperOrigin-RevId:
196689316
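The caching idea, one shared helper computation per structural key instead of many identical ones, can be sketched in Python (hypothetical builder API; the real code keys on the reduction computation inside an HloModule):

```python
def make_computation_cache(builder):
    """Memoize helper-computation construction by a structural key, so
    repeated requests for the same (op, dtype) return the identical
    computation object instead of building a fresh clone each time."""
    cache = {}
    def get(op_name, dtype):
        key = (op_name, dtype)
        if key not in cache:
            cache[key] = builder(op_name, dtype)  # build only on first use
        return cache[key]
    return get
```

Because callers now share one computation per key, later passes never see the near-duplicate computations that HloCSE was failing to eliminate.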
Peter Hawkins [Tue, 15 May 2018 16:47:50 +0000 (09:47 -0700)]
[TF:XLA] Generalize existing support for keeping variables on an XLA device in reshaped form, instead allowing XLA devices to keep all tensors in a reshaped form outside an XLA computation.
PiperOrigin-RevId:
196683444
A. Unique TensorFlower [Tue, 15 May 2018 15:38:49 +0000 (08:38 -0700)]
Adding --distinct_host_configuration=false in tools/bazel.rc
When building TensorFlow, the host and target platforms are usually the same. So we don't have to distinguish them by default. This helps avoid building the same targets twice.
If we need to do cross compilation, add --config=cross-compile to distinguish them.
PiperOrigin-RevId:
196673728
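A sketch of what such a tools/bazel.rc fragment could look like (the flag and the cross-compile config name come from the commit; treat the exact lines as illustrative, not the checked-in file):

```
# Host and target platforms are usually identical; don't build targets twice.
build --distinct_host_configuration=false

# Re-enable the host/target distinction when actually cross-compiling:
#   bazel build --config=cross-compile //...
build:cross-compile --distinct_host_configuration=true
```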
A. Unique TensorFlower [Tue, 15 May 2018 15:09:04 +0000 (08:09 -0700)]
update doc
PiperOrigin-RevId:
196670274
A. Unique TensorFlower [Tue, 15 May 2018 14:28:24 +0000 (07:28 -0700)]
Small polishing changes in stream executor, no functional changes.
PiperOrigin-RevId:
196665609
A. Unique TensorFlower [Tue, 15 May 2018 09:48:07 +0000 (02:48 -0700)]
internal change
PiperOrigin-RevId:
196640024
A. Unique TensorFlower [Tue, 15 May 2018 08:22:13 +0000 (01:22 -0700)]
Reland improve fusion logic of (a dot b) * alpha
The previous fusion approach didn't work because a multiplication by a scalar value
will be changed into an explicit broadcast.
Another issue that is fixed in this CL is retrieving the constant value from
the literal. This depends on the PrimitiveType; before, we always assumed it to be double.
Also when checking ImplementedAsGemm() we should not call it recursively, but instead just the check related to kDot.
Finally add an execution test and adjust the fusion logic test.
The fix for the issue that caused the revert is that we check earlier that consumer->operand_count() is 2.
Also, we fix the call to Get() to pass {} instead of {0}.
And we handle an output fusion node in GemmThunk to extract the dimension numbers from the dot operation.
PiperOrigin-RevId:
196631031