Bram Wasti [Sat, 1 Dec 2018 01:27:22 +0000 (17:27 -0800)]
Fix 'unknown type name 'optional'' (#14383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14383
D11669870 seems to have missed a spot that wasn't triggered before the stacked code above
Reviewed By: smessmer
Differential Revision:
D13198269
fbshipit-source-id:
74592bedae0721acee744e31ca95253ea6efdedb
Wanchao Liang [Sat, 1 Dec 2018 01:22:48 +0000 (17:22 -0800)]
fix double precision cast from pybind (#14417)
Summary:
The JIT world only has double, not float, so in insertConstant we need to cast the Python `float_` to double instead of float. This fixes the incorrect values of `math.pi` and other high-precision constants.
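As an illustration (not part of the PR), a minimal sketch of the precision loss that motivated the fix:
```python
# Sketch: single precision drops digits of math.pi, which is what a float
# (rather than double) cast in insertConstant would bake into the graph.
import math
import torch

pi_double = torch.tensor(math.pi, dtype=torch.float64)
pi_single = torch.tensor(math.pi, dtype=torch.float32)
print(pi_double.item())  # 3.141592653589793
print(pi_single.item())  # ~3.1415927410125732 -- visible precision loss
```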
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14417
Differential Revision:
D13282975
Pulled By: wanchaol
fbshipit-source-id:
26a4c89ffc044d28598af673aebfec95153a869e
Elias Ellison [Sat, 1 Dec 2018 00:53:55 +0000 (16:53 -0800)]
Revert existing no_grad_embedding_renorm_ from aten (#14639)
Summary:
Remove no_grad_embedding_renorm_ from ATen. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.
Instead, make a custom op, and use NoGradGuard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639
Differential Revision:
D13285604
Pulled By: eellison
fbshipit-source-id:
c7d343fe8f22e369669e92799f167674f124ffe7
Yan Zhu [Sat, 1 Dec 2018 00:50:37 +0000 (16:50 -0800)]
cuda implementation for PackSegment to support presence mask (#14635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14635
as title
Reviewed By: enosair
Differential Revision:
D13254097
fbshipit-source-id:
b9f40109e2889907c925f9a4df9da14f67f45f38
svcscm [Sat, 1 Dec 2018 00:18:46 +0000 (16:18 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
17487c327cbe48969dff397656fe90efcf23b699
Zeming Lin [Fri, 30 Nov 2018 22:43:29 +0000 (14:43 -0800)]
Build distributed libs in build_libtorch.py (#14037)
Summary:
This patch detects and builds c10d and gloo for the C++ API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14037
Reviewed By: ezyang
Differential Revision:
D13283801
Pulled By: ebetica
fbshipit-source-id:
006dbb691344819833da6b4b844c1f0572942135
Gregory Chanan [Fri, 30 Nov 2018 22:13:51 +0000 (14:13 -0800)]
Remove methods from _th_triu_ and _th_addcmul_. (#14624)
Summary:
These somehow slipped through when we moved all of Declarations.cwrap to functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14624
Reviewed By: ezyang
Differential Revision:
D13277434
Pulled By: gchanan
fbshipit-source-id:
e83451e2d0fdafb55635d4b757688a501454bf8c
Wei Yang [Fri, 30 Nov 2018 22:08:35 +0000 (14:08 -0800)]
sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax the input constraint so that the sparse matrix input doesn't have to be coalesced
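A short usage sketch of the new function (shapes chosen for illustration; a hedged example, not the PR's test code):
```python
# Sketch: sparse (2x3) times dense (3x4), with backward through the dense input.
import torch

i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
S = torch.sparse_coo_tensor(i, v, (2, 3))    # sparse matrix S
D = torch.randn(3, 4, requires_grad=True)    # dense matrix D

out = torch.sparse.mm(S, D)                  # dense 2x4 result
out.sum().backward()                         # gradient flows to D
```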
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526
Reviewed By: ezyang
Differential Revision:
D13252990
Pulled By: weiyangfb
fbshipit-source-id:
8fdb14144405a2122d4b8447ad4055cd0330e6e8
Freddie Mendoza [Fri, 30 Nov 2018 22:04:45 +0000 (14:04 -0800)]
Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)
Summary:
See #14539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14569
Differential Revision:
D13282161
Pulled By: ezyang
fbshipit-source-id:
13a1131b26fa300b037f66d1919b97d14033f9e5
Peter Goldsborough [Fri, 30 Nov 2018 21:28:19 +0000 (13:28 -0800)]
Remove OptionsGuard from ATen (#14524)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/13738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14524
Differential Revision:
D13268031
Pulled By: goldsborough
fbshipit-source-id:
fb306464b673c05ebd26d0f44d688ccd92d1d8c5
Jerry Zhang [Fri, 30 Nov 2018 21:19:31 +0000 (13:19 -0800)]
Explicitly ban uninitialized tensors when invoking Predictor classes (#14377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14377
att
Reviewed By: dzhulgakov
Differential Revision:
D13197348
fbshipit-source-id:
85a451bde3a57a8acdd3af548606c05e223896a6
Fei Sun [Fri, 30 Nov 2018 21:12:34 +0000 (13:12 -0800)]
Report timer in benchmarking when requested
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14570
Reviewed By: llyfacebook
Differential Revision:
D13264904
Pulled By: sf-wind
fbshipit-source-id:
fd05bc32202b7734dc911e3c792357ddf9ecedee
Peter Goldsborough [Fri, 30 Nov 2018 20:27:45 +0000 (12:27 -0800)]
Fix inheritance for SharedDataset (#14629)
Summary:
ezyang ebetica
CC jaliyae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14629
Differential Revision:
D13278988
Pulled By: goldsborough
fbshipit-source-id:
53afbcd1f3fc5cb23046ff92c4345cd90abd4584
David Riazati [Fri, 30 Nov 2018 20:10:49 +0000 (12:10 -0800)]
Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578
Differential Revision:
D13268286
Pulled By: driazati
fbshipit-source-id:
6e8654a4c29ab754d656ac83820c14d1c1843e03
svcscm [Fri, 30 Nov 2018 19:43:28 +0000 (11:43 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
863e9e2a1f0810f96494cabae1724622b9eb91ff
Brennan Vincent [Fri, 30 Nov 2018 19:11:51 +0000 (11:11 -0800)]
Remove default constructor lines that do nothing, and fix warnings with clang trunk (#14300)
Summary:
The lines removed in this diff were no-op, but confusing: the default constructors in `store_handler.h` are implicitly deleted, since `std::runtime_error` has no default constructor.
Clang added a warning for this behavior [in September 2018](https://reviews.llvm.org/rL343285) (note that the warning is not just for cxx2a, despite the slightly confusing commit message), so building pytorch with a recent build of clang trunk causes spew of this warning, which is fixed by the present PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14300
Differential Revision:
D13260039
Pulled By: umanwizard
fbshipit-source-id:
92788dbd6794253e788ef26bde250a66d8fb917e
Roy Li [Fri, 30 Nov 2018 19:10:25 +0000 (11:10 -0800)]
remove copy_wrapper (#13937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13937
We can now replace s_copy_ with our new _copy_ function. Experimented with moving s_copy_ out of VariableManualType.cpp, but it seemed like there was enough special casing to warrant it staying.
Reviewed By: ezyang
Differential Revision:
D13053648
fbshipit-source-id:
e9e04d460baf4ee49b500212cf91b95221acd769
Roy Li [Fri, 30 Nov 2018 19:10:25 +0000 (11:10 -0800)]
Move non_blocking copies to aten (#13866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13866
just a straightforward port
Reviewed By: ezyang
Differential Revision:
D13011878
fbshipit-source-id:
f288efebf78fa634abfb681b938b44277064d5b6
Roy Li [Fri, 30 Nov 2018 19:10:25 +0000 (11:10 -0800)]
Move cuda copy to aten (#13348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13348
Move cross-device, CPU-to-device, and device-to-CPU copies to ATen. Most of it is a direct port; the main difference is that we dispatch from a single _copy_ function for copies.
Reviewed By: ezyang
Differential Revision:
D12850690
fbshipit-source-id:
c2e3f336796b4ae38be6027d2ec131a274a6aa8c
Roy Li [Fri, 30 Nov 2018 19:10:25 +0000 (11:10 -0800)]
Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
Moved vectorized CPU copy to ATen. Notable changes are mainly in _copy_same_type_.
Reviewed By: ezyang
Differential Revision:
D12936031
fbshipit-source-id:
00d28813e3160595e73d104f76685e13154971c1
Sam Gross [Fri, 30 Nov 2018 19:00:48 +0000 (11:00 -0800)]
Changes based on @gchanan's review of #13420 (#14441)
Summary:
```
The most significant change is that this fixes the error message when
indexing an empty tensor with an out-of-bounds index. For example:
x = torch.ones(10, 0)
x[:, [3, 4]]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14441
Differential Revision:
D13226737
Pulled By: colesbury
fbshipit-source-id:
d1c4a35a30e3217e3d1727d13f6b354a4a3b2a24
Michael Carilli [Fri, 30 Nov 2018 18:47:07 +0000 (10:47 -0800)]
Accumulate grad fix (#14587)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/13337.
I don't think the lint errors in the original PR had to do with files I touched, so hopefully the rebase fixes them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14587
Differential Revision:
D13277428
Pulled By: soumith
fbshipit-source-id:
f04c186b1dd4889b4250597eef87f9e9bf7b2426
fehiepsi [Fri, 30 Nov 2018 18:44:56 +0000 (10:44 -0800)]
Fix expanded mvn and lowrankmvn (#14557)
Summary:
This PR fixes the slowness of expanded MVN.
A notebook showing the problem is [here](https://gist.github.com/fehiepsi/b15ac2978f1045d6d96b1d35b640d742). Basically, mvn's sample and log_prob have expensive computations based on `cholesky` and `trtrs`. We can save a lot of computation by caching the unbroadcasted version of `scale_tril` (or `cov_diag`, `cov_factor` in lowrank mvn).
When expanding, this cached tensor should not be expanded together with the other arguments.
Ref: https://github.com/uber/pyro/issues/1586
cc neerajprad fritzo
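A hedged sketch of the pattern the fix targets (not taken from the PR; assumes the distributions `expand()` API):
```python
# Sketch: expanding a MultivariateNormal should reuse the cached (unbroadcast)
# scale_tril instead of re-deriving it per batch element.
import torch
from torch.distributions import MultivariateNormal

base = MultivariateNormal(torch.zeros(3), scale_tril=torch.eye(3))
expanded = base.expand(torch.Size([1000]))   # batch of 1000, no extra cholesky work
samples = expanded.sample()                  # shape (1000, 3)
logp = expanded.log_prob(samples)            # shape (1000,)
```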
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14557
Differential Revision:
D13277408
Pulled By: soumith
fbshipit-source-id:
a6b16f999b008d5da148ccf519b7f32d9c6a5351
Jerry Zhang [Fri, 30 Nov 2018 18:44:43 +0000 (10:44 -0800)]
Tensor construction: combine Resize+mutable_data - 2/4 (#14205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14205
Original commit changeset:
8f9fb55842ae
Reviewed By: dzhulgakov
Differential Revision:
D13126263
fbshipit-source-id:
12ba89e31b7738a81ec5c660ea7b79e8576c35dc
Daya S Khudia [Fri, 30 Nov 2018 17:34:07 +0000 (09:34 -0800)]
Unit tests need better compilation flow (#14547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14547
Unit tests used in dnnlowp need a better compilation flow as some of them need avx. Disabling for now so that pytorch builds with fbgemm.
Reviewed By: jianyuh
Differential Revision:
D13240933
fbshipit-source-id:
e2e187b758c5d89e524470cd261ce35493f427a2
Soumith Chintala [Fri, 30 Nov 2018 17:31:07 +0000 (09:31 -0800)]
clean up linkage options (#14609)
Summary: minor code cleanup
Differential Revision:
D13277803
Pulled By: soumith
fbshipit-source-id:
5ef925fe95037cab540b329054d7070c1ea7031e
Soumith Chintala [Fri, 30 Nov 2018 17:24:21 +0000 (09:24 -0800)]
set mkl_set_dynamic to false (#13868)
Differential Revision:
D13277331
Pulled By: soumith
fbshipit-source-id:
692bb7d5157235e00dea4776d1991bb07e16ff85
Soumith Chintala [Fri, 30 Nov 2018 07:34:23 +0000 (23:34 -0800)]
fix USE_SYSTEM_NCCL build (#14606)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14606
Differential Revision:
D13274156
Pulled By: soumith
fbshipit-source-id:
f834715e8e17dacf60be459b0efffba1d4df40ae
CircleCI [Fri, 30 Nov 2018 07:22:15 +0000 (23:22 -0800)]
Set output of aten::mm to have the same output type as the original node after op canonicalization. (#14602)
Summary:
In the op canonicalization pass, addmm is separated into mm and add, but the output dimension and type are not preserved for the aten::mm node. This fixes that so the dumped graph after this pass contains accurate information.
sample output:
before:
%6 : Dynamic = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
after:
%6 : Float(32, 200) = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14602
Differential Revision:
D13273754
Pulled By: soumith
fbshipit-source-id:
82e22b5f30e9eb6ba9249c5a2216955421f39cc7
David Riazati [Fri, 30 Nov 2018 06:18:43 +0000 (22:18 -0800)]
Add binary cross entropy to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14583
Differential Revision:
D13269423
Pulled By: driazati
fbshipit-source-id:
7cc1594d8189c3e8f2d4ce0462fdc0a03683006e
David Riazati [Fri, 30 Nov 2018 06:16:52 +0000 (22:16 -0800)]
Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551
Differential Revision:
D13272741
Pulled By: driazati
fbshipit-source-id:
3e4fe870d0e268903757f3ae8a56100606906bce
Pieter Noordhuis [Fri, 30 Nov 2018 05:48:58 +0000 (21:48 -0800)]
Misc distributed documentation updates (#14605)
Summary:
* s/environmental/environment/g
* Casing (CUDA, InfiniBand, Ethernet)
* Don't embed torch.multiprocessing.spawn but link to it (not part of the package)
* spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605
Differential Revision:
D13273480
Pulled By: pietern
fbshipit-source-id:
da6b4b788134645f2dcfdd666d1bbfc9aabd97b1
Pieter Noordhuis [Fri, 30 Nov 2018 05:36:51 +0000 (21:36 -0800)]
Enable tests for CPU tensors in test_distributed.py (#14572)
Summary:
These were not enabled after adding support in the Gloo backend. The
argument checks in ProcessGroupGloo raised an error in two cases:
* If the input tensor list to scatter was ``[None]`` on processes other
than the source process.
* If the output tensor list to gather was ``[None]`` on processes other
than the destination process.
This commit prepares these arguments explicitly instead of boxing them
at the process group call site.
This fixes #14536.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14572
Differential Revision:
D13272812
Pulled By: pietern
fbshipit-source-id:
12cb0d85ec92f175365cbada585260f89330aad8
James Reed [Fri, 30 Nov 2018 04:30:02 +0000 (20:30 -0800)]
fix copy_ (#14593)
Summary:
Closes https://github.com/pytorch/pytorch/issues/14590
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14593
Differential Revision:
D13272510
Pulled By: jamesr66a
fbshipit-source-id:
b6921a98460c371d435277c416dad0b5ab0fec8c
Pieter Noordhuis [Fri, 30 Nov 2018 04:03:59 +0000 (20:03 -0800)]
Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.
On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.
Fixes #14394.
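For illustration only (the PR's binding lives in C++), a hedged ctypes sketch of what PR_SET_PDEATHSIG does on Linux:
```python
# Sketch: ask the kernel to deliver SIGTERM to this process when its parent dies.
import ctypes
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>
libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGTERM))
```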
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491
Differential Revision:
D13270374
Pulled By: pietern
fbshipit-source-id:
092c9d3c3cea2622c3766b467957bc27a1bd500c
Teng Li [Fri, 30 Nov 2018 03:55:34 +0000 (19:55 -0800)]
Fixed new_group won't work for two or more different rank groups (#14529)
Summary:
This fixed two things:
(1) The NCCL backend doesn't support two or more groups. This is because we need a group name in the ProcessGroupNCCL class to keep track of the ProcessGroup ID within that group name, as well as the NCCL unique ID within that group name and process group ID. Otherwise, different processes will create different NCCL process groups in different orders and can clash on these names. This fixes the NCCL problem.
(2) When using new_group, each rank should enter this function and update its global group name counter to ensure that every rank always operates on the same group name.
With both fixes: repro code in: https://github.com/pytorch/pytorch/issues/14528 should work with both NCCL and Gloo backends.
```
tengli@learnfair096:~$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=30000 ~/github_issues/nccl_group.py
rank: 0 - val: 6.0
rank: 2 - val: 6.0
rank: 3 - val: 6.0
rank: 1 - val: 6.0
rank: 4 - val: 22.0
rank: 6 - val: 22.0
rank: 5 - val: 22.0
rank: 7 - val: 22.0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14529
Differential Revision:
D13253434
Pulled By: teng-li
fbshipit-source-id:
8eb45882b996b06d951fc9a306d5de86a42e8b84
svcscm [Fri, 30 Nov 2018 03:44:24 +0000 (19:44 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
44cd40cc9bc25629ec9547327a515bac22e5c905
David Riazati [Fri, 30 Nov 2018 03:17:32 +0000 (19:17 -0800)]
Revert
D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293
Original commit changeset:
cb33c6dcdadd
fbshipit-source-id:
214a29b74c85b7b25df0eb48e3fdb81539049130
Teng Li [Fri, 30 Nov 2018 02:46:22 +0000 (18:46 -0800)]
Make env init_method support both env and args for rank and size (#14494)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/14446
This was a supported behavior in old torch.distributed. We want to support it in the new release.
Tests should cover all combinations of scenarios where we have either env or arg set up for rank or size, or both.
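A minimal sketch of the restored behavior (assuming MASTER_ADDR/MASTER_PORT are set in the environment); rank and world size can come from env vars, explicit arguments, or a mix:
```python
# Sketch: env:// init_method with rank/world_size passed as arguments instead of
# (or in addition to) the RANK/WORLD_SIZE environment variables.
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://",
                        rank=0, world_size=1)
```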
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14494
Differential Revision:
D13253433
Pulled By: teng-li
fbshipit-source-id:
c05974d84f1bdf969f74ec45763e11a841fe4848
Edward Yang [Fri, 30 Nov 2018 02:31:00 +0000 (18:31 -0800)]
Delete caffe2_cuda_full_device_control (#14283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14283
According to Yangqing, this code was only used by us to do some end-to-end
performance experiments on the impact of cudaSetDevice and cudaGetDevice. Now
that the frameworks are merged, there are a lot of bare calls to those functions
which are not covered by this flag. It doesn't seem like a priority to restore
this functionality, so I am going to delete it for now. If you want to bring
it back, you'll have to make all get/set calls go through this particular
interface.
Reviewed By: dzhulgakov
Differential Revision:
D13156472
fbshipit-source-id:
4c6d2cc89ab5ae13f7c816f43729b577e1bd985c
Edward Yang [Fri, 30 Nov 2018 02:30:59 +0000 (18:30 -0800)]
Replace use of 'int' with more descriptive 'DeviceIndex' or 'StreamId'. (#14282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14282
This also is a substantive change, as 'DeviceIndex' and 'StreamId' are
narrower types than 'int'.
Reviewed By: Yangqing, smessmer
Differential Revision:
D13156471
fbshipit-source-id:
08aa0f70c4142415b6bd4d17c57da0641c1d0e9a
Zachary DeVito [Fri, 30 Nov 2018 01:51:45 +0000 (17:51 -0800)]
Switch import/export to python printing (#14400)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.
This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.
It also updates torch.proto to directly document the tensor data
structure actually being serialized.
Notes:
* because PythonPrint prints all the methods at once per module, this
removes MethodDef in favor of a single torchscript_area and a separate
caffe2_graphs entry. Note that NetDef's already have method names,
so there is no need for a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
the container format) since it is possible the data in these arenas
may be large and not suited to JSON output.
* Removes 'annotations' -- annotations should be re-added on the first
commit that actually has a practical use for them. In the current state
it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
some cleanup possible combining EncoderBase and GraphEncode now that there
is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400
Reviewed By: suo
Differential Revision:
D13231800
Pulled By: zdevito
fbshipit-source-id:
af5c1152d0bd6bca8b06c4703f59b161bb19f571
Teng Li [Fri, 30 Nov 2018 01:48:04 +0000 (17:48 -0800)]
PT1 distributed doc update (#14530)
Summary:
Removed an incorrect section. We don't support this. I wrote this from my memory :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530
Differential Revision:
D13253471
Pulled By: teng-li
fbshipit-source-id:
c3f1ffc6c98ef8789157e885776e0b775ec47b15
David Riazati [Fri, 30 Nov 2018 01:23:16 +0000 (17:23 -0800)]
Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551
Differential Revision:
D13268293
Pulled By: driazati
fbshipit-source-id:
cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
David Riazati [Fri, 30 Nov 2018 01:21:08 +0000 (17:21 -0800)]
Add `List` to annotations (#14482)
Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
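A small sketch of the kind of annotation this enables (hedged; the decorator usage is an assumption about the surrounding JIT API, not code from the PR):
```python
# Sketch: a scripted function annotated with List[int], which relies on the
# polyfilled typing.List when the real typing module is unavailable.
import torch
from typing import List

@torch.jit.script
def sum_lengths(xs: List[int]) -> int:
    total = 0
    for x in xs:
        total += x
    return total
```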
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482
Differential Revision:
D13237570
Pulled By: driazati
fbshipit-source-id:
6575b7025c2d98198aee3b170f9c4323ad5314bd
Lu Fang [Fri, 30 Nov 2018 00:29:46 +0000 (16:29 -0800)]
update of fbcode/onnx to
f461f7aad9987635b4aff108620ed7918f002d19 (#14568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14568
Previous import was
882c5283c54345d131e8fe5c859e4844dcf7ca8e
Included changes:
- **[f461f7a](https://github.com/onnx/onnx/commit/f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry>
- **[ab8aaf9](https://github.com/onnx/onnx/commit/ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam>
- **[c95357e](https://github.com/onnx/onnx/commit/c95357e)**: link the tutorial (#1650) <Lu Fang>
- **[d7e2420](https://github.com/onnx/onnx/commit/d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin>
- **[6425108](https://github.com/onnx/onnx/commit/6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang>
- **[295889c](https://github.com/onnx/onnx/commit/295889c)**: use an empty initializer to create map (#1643) <Lu Fang>
- **[e38f3ec](https://github.com/onnx/onnx/commit/e38f3ec)**: Remove redundant const (#1639) <daquexian>
- **[ea694bf](https://github.com/onnx/onnx/commit/ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen>
- **[6db386e](https://github.com/onnx/onnx/commit/6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang>
- **[2b67c6e](https://github.com/onnx/onnx/commit/2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang>
- **[c901784](https://github.com/onnx/onnx/commit/c901784)**: remove inappropriate consts (#1632) <Lu Fang>
- **[de82119](https://github.com/onnx/onnx/commit/de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS>
- **[d7ffe3b](https://github.com/onnx/onnx/commit/d7ffe3b)**: Update Optimizer Docs (#1607) <Armen>
- **[d09d139](https://github.com/onnx/onnx/commit/d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto>
- **[eb4b7c2](https://github.com/onnx/onnx/commit/eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam>
- **[4166246](https://github.com/onnx/onnx/commit/4166246)**: Fix onnxifi test (#1617) <Yinghai Lu>
- **[6706a4d](https://github.com/onnx/onnx/commit/6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang>
- **[ae39866](https://github.com/onnx/onnx/commit/ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari>
- **[45ba661](https://github.com/onnx/onnx/commit/45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov>
- **[14853b6](https://github.com/onnx/onnx/commit/14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq>
- **[e0993b8](https://github.com/onnx/onnx/commit/e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov>
- **[e6965cc](https://github.com/onnx/onnx/commit/e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov>
- **[75b782f](https://github.com/onnx/onnx/commit/75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq>
- **[c05b364](https://github.com/onnx/onnx/commit/c05b364)**: Make CI log less verbose (#1595) <bddppq>
- **[fa568e4](https://github.com/onnx/onnx/commit/fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay>
- **[937e64c](https://github.com/onnx/onnx/commit/937e64c)**: add uint8 (#1590) <Lu Fang>
- **[f86e951](https://github.com/onnx/onnx/commit/f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim>
- **[ff45588](https://github.com/onnx/onnx/commit/ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun>
- **[f7dcad0](https://github.com/onnx/onnx/commit/f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko>
- **[a60ac7d](https://github.com/onnx/onnx/commit/a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari>
- **[f6c3a7e](https://github.com/onnx/onnx/commit/f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov>
- **[88d1784](https://github.com/onnx/onnx/commit/88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari>
- **[20041b7](https://github.com/onnx/onnx/commit/20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay>
- **[d6c4c75](https://github.com/onnx/onnx/commit/d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun>
- **[a339598](https://github.com/onnx/onnx/commit/a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu>
Reviewed By: zrphercule
Differential Revision:
D13263831
fbshipit-source-id:
a2ff22c6454e2430429e5a7d18d21661a7ffb0cb
Jane Wang [Fri, 30 Nov 2018 00:14:01 +0000 (16:14 -0800)]
add gloo support for reduce on GPU (#14443)
Summary:
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14443
Reviewed By: pietern
Differential Revision:
D13222907
Pulled By: janewangfb
fbshipit-source-id:
f418c5d84880196f97089114d02957cf739243f8
Edward Yang [Fri, 30 Nov 2018 00:01:46 +0000 (16:01 -0800)]
Expunge use of type() from SparseTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14546
Reviewed By: gchanan
Differential Revision:
D13258512
fbshipit-source-id:
b2d562b6c5228288f60f02beab3c44c50163248f
Edward Yang [Fri, 30 Nov 2018 00:01:45 +0000 (16:01 -0800)]
Expunge occurrences of type() from scalar_test (#14545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14545
Self explanatory
Reviewed By: gchanan
Differential Revision:
D13258513
fbshipit-source-id:
abce357de57b95cde58b3894c251da519ede6b53
Edward Yang [Fri, 30 Nov 2018 00:01:44 +0000 (16:01 -0800)]
Expunge use of type() in Distributions.cpp (#14544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14544
Modern usage is options(). This doesn't have a functional
difference, because all call sites were CPU only (where
getting the device index right doesn't matter.)
Reviewed By: gchanan
Differential Revision:
D13258252
fbshipit-source-id:
c70f8d618ee9caf37ff2469cceaa439348b6114c
Edward Yang [Fri, 30 Nov 2018 00:01:44 +0000 (16:01 -0800)]
Expunge uses of type() from EmbeddingBag. (#14543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14543
The modern way to do this is to use options(). It doesn't
make a functional difference here because everything is CPU
(so loss of device information is not a big deal), but
it's definitely safer this way.
Reviewed By: gchanan
Differential Revision:
D13257847
fbshipit-source-id:
afbc9f7f8d4ca5a8b1cf198997c307e27a2c3333
Edward Yang [Fri, 30 Nov 2018 00:01:44 +0000 (16:01 -0800)]
Expunge direct device index handling from tensor_conversion_dispatch (#14421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14421
Last time I looked at this, I bailed because it seemed like there were
a lot of sites to fix. Well, I need this to work properly for out-of-place
HIPify, so I took another whack at it. Changes should be pretty self-explanatory.
Reviewed By: gchanan
Differential Revision:
D13221302
fbshipit-source-id:
ed21e2668a1a629898a47358baf368fe680263a0
Jerry Zhang [Thu, 29 Nov 2018 23:16:52 +0000 (15:16 -0800)]
call raw_mutable_data when data type didn't match in BlobGetMutableTensor (#14513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14513
att
Reviewed By: dzhulgakov
Differential Revision:
D13245875
fbshipit-source-id:
3398a1f41a6195e120ed574dee887070e86dfe1f
David Riazati [Thu, 29 Nov 2018 23:13:45 +0000 (15:13 -0800)]
Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
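For context, a hedged usage sketch of one of the `max_unpool` functions whose signature uses such a broadcasting-list argument (`kernel_size` here):
```python
# Sketch: max_pool2d with indices, then max_unpool2d; kernel_size is the
# BroadcastingList-style argument that can now take a plain scalar default.
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
restored = F.max_unpool2d(pooled, indices, kernel_size=2)  # back to 1x1x4x4
```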
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361
Differential Revision:
D13192231
Pulled By: driazati
fbshipit-source-id:
a25b75a0e88ba3dfa22d6a83775e9778d735e249
Michael Carilli [Thu, 29 Nov 2018 22:47:32 +0000 (14:47 -0800)]
Added launch bounds in VolumetricConvolution.cu (#14564)
Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.
This PR ensures that those kernels are always compiled such that at least one block can be resident on an SM, and such errors will not be encountered at runtime on any architecture after compiling for that architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564
Differential Revision:
D13266136
Pulled By: soumith
fbshipit-source-id:
35464b20848bb0a1168e8f3b233172331c50b35b
rohithkrn [Thu, 29 Nov 2018 21:58:11 +0000 (13:58 -0800)]
Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify CUDA and HIP device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221
Differential Revision:
D13148564
Pulled By: bddppq
fbshipit-source-id:
ef9bd2c7d238200165f217097ac5727e686d887b
Lin Huang [Thu, 29 Nov 2018 21:54:19 +0000 (13:54 -0800)]
Fix tautological-compare in aten/src/ATen/native/cuda/SummaryOps.cu (#14540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14540
refactor the HANDLE_SWITCH_CASE to avoid tautological-compare in macro
Reviewed By: ezyang
Differential Revision:
D13255725
fbshipit-source-id:
cfa64bb7bc53d19c93a693015202f207567690b4
zrphercule [Thu, 29 Nov 2018 21:47:50 +0000 (13:47 -0800)]
Update to export in onnx_aten_fallback option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14492
Differential Revision:
D13265701
Pulled By: zrphercule
fbshipit-source-id:
b339c92078f73d152a14db7d5d2b3f5edda9dda6
Junjie Bai [Thu, 29 Nov 2018 21:19:45 +0000 (13:19 -0800)]
Add back the MAX_JOBS=4 restriction to make rocm CI more stable (#14566)
Summary:
As a workaround until hcc has fixed its high memory usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14566
Differential Revision:
D13263555
Pulled By: bddppq
fbshipit-source-id:
479c7a76aff3919f028e03ef345795537480f0fa
Michael Suo [Thu, 29 Nov 2018 21:04:33 +0000 (13:04 -0800)]
assorted alias analysis fixes (#14556)
Summary:
- Correctly report whether nodes write to an alias set.
- Fix loop convergence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14556
Differential Revision:
D13261376
Pulled By: suo
fbshipit-source-id:
8123c0fb1f8f137a15bd82719be2d99e502bccc2
Adam Paszke [Thu, 29 Nov 2018 21:02:15 +0000 (13:02 -0800)]
Broadcast prim::FusedConcat inputs independently when checking kernels (#14503)
Summary:
Fixes #14483.
cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14503
Differential Revision:
D13256343
Pulled By: zou3519
fbshipit-source-id:
1c68a23f425be067a742bada7ee8cdfab7fc3fa2
Your Name [Thu, 29 Nov 2018 19:15:03 +0000 (11:15 -0800)]
Do not load ROCm cmake files if USE_ROCM is off (#14261)
Summary:
Previously it unconditionally tried to load the ROCm cmake files, so there was no way to disable the ROCm build. After this change, USE_ROCM=0 will disable the ROCm build.
Should fix #14025
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261
Differential Revision:
D13242090
Pulled By: bddppq
fbshipit-source-id:
652ec7d49dce9b357778bfa53a8e04b7079787ab
Sebastian Messmer [Thu, 29 Nov 2018 19:04:40 +0000 (11:04 -0800)]
Remove at references in c10 Allocator.h (#14434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14434
The referenced classes live now in c10, so we don't need to specify their namespace.
Reviewed By: ezyang
Differential Revision:
D13224015
fbshipit-source-id:
6d154b8e3f9a1e38ff0407dbb1151f5c1d5df260
Pieter Noordhuis [Thu, 29 Nov 2018 17:14:37 +0000 (09:14 -0800)]
Add sourceRank() to ProcessGroup::Work (#14453)
Summary:
This function is only implemented for the subclasses where it makes
sense. If it's not overridden it will throw an error. Having this
function removes the need for a pointer passing hack to pass the
source rank of a recv operation back to the caller. Instead, the
caller can now call `source_rank` on the work object and achieve
the same result.
Closes #11804.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14453
Differential Revision:
D13230898
Pulled By: pietern
fbshipit-source-id:
ef38f48bfaca8ef9a364e5be122951bafc9f8e49
Matthew Heidemann [Thu, 29 Nov 2018 16:16:29 +0000 (08:16 -0800)]
Fixed typo for BCEWithLogitLoss doc comments (#14532)
Summary:
The math symbol was missing a prefix `:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14532
Differential Revision:
D13256077
Pulled By: soumith
fbshipit-source-id:
2359819d8aa664f915be1c436cbb0c0756504028
Ryan Moore [Thu, 29 Nov 2018 15:14:39 +0000 (07:14 -0800)]
typo in Module docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14511
Differential Revision:
D13246061
Pulled By: soumith
fbshipit-source-id:
6c13a2957c4c4324ab5d839d634689c61e25b0fe
Jaliya Ekanayake [Thu, 29 Nov 2018 15:04:52 +0000 (07:04 -0800)]
Jaliyae/samplers (#13870)
Summary:
Make samplers optionally accept a new size in their reset() method. This helps a dataloader or dataset reset the sampler for an epoch or a chunk of data with a different size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870
Differential Revision:
D13240120
Pulled By: soumith
fbshipit-source-id:
19c53f8be13c0fdcf504f0637b0d3e6009a8e599
David Riazati [Thu, 29 Nov 2018 07:28:59 +0000 (23:28 -0800)]
Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`
Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238
Differential Revision:
D13252887
Pulled By: driazati
fbshipit-source-id:
e9638cf74089884a32b8f0f38396cf432c02c988
svcscm [Thu, 29 Nov 2018 05:37:40 +0000 (21:37 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
f957056bb48c583738c5defaf3d1f01cd7df3915
svcscm [Thu, 29 Nov 2018 05:07:02 +0000 (21:07 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
9800251baaa09d9f7988eff340ef36e0ab11f579
Peter Goldsborough [Thu, 29 Nov 2018 04:25:21 +0000 (20:25 -0800)]
Fix version.groups() (#14505)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14502
fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14505
Differential Revision:
D13242386
Pulled By: goldsborough
fbshipit-source-id:
faebae8795e1efd9c0ebc2294fe9648193d16624
Elias Ellison [Thu, 29 Nov 2018 03:14:16 +0000 (19:14 -0800)]
Support Embedding + EmbeddingBag in Script + (Ignore flakey test) (#14509)
Summary:
Resubmitting PR #14415
The tests added for Embedding + EmbeddingBag had random numbers as input, which affected the random number generator and caused the flaky test to break.
Everything but the last two commits has already been accepted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509
Differential Revision:
D13247917
Pulled By: eellison
fbshipit-source-id:
ea6963c47f666c07687787e2fa82020cddc6aa15
Elias Ellison [Thu, 29 Nov 2018 02:12:22 +0000 (18:12 -0800)]
pointwise_loss (#14134)
Summary:
Adding pointwise loss ops to weak_script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14134
Differential Revision:
D13209455
Pulled By: eellison
fbshipit-source-id:
87fc0222121f34a2f4edb24c2da2a11124b097d8
James Sun [Thu, 29 Nov 2018 02:05:10 +0000 (18:05 -0800)]
Merge Caffe2 and PyTorch thread pool definitions (#14114)
Summary:
(1) Move Caffe2 thread pool to aten
(2) Use the same thread pool definition for PyTorch interpreter
(3) Make ivalue::Future thread-safe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14114
Reviewed By: ilia-cher
Differential Revision:
D13110451
Pulled By: highker
fbshipit-source-id:
a83acb6a4bafb7f674e3fe3d58f7a74c68064fac
Sam Gross [Thu, 29 Nov 2018 01:51:01 +0000 (17:51 -0800)]
Ensure that indices are on the same device as self
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14504
Reviewed By: wat3rBro
Differential Revision:
D13242200
Pulled By: colesbury
fbshipit-source-id:
82731cee808681ec612d406342070640eb26e519
Dmytro Dzhulgakov [Wed, 28 Nov 2018 23:43:22 +0000 (15:43 -0800)]
Remove Context dependency from Tensor class (#14269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269
Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)
For CopyFrom, I haven't tweaked all call sites yet. Instead I rely on a terrible hack: a pointer to the context is implicitly converted to bool when passed, haha :) It's not good code and I propose to fix it in a follow-up diff (maybe using clangr tooling).
Reviewed By: ezyang
Differential Revision:
D13117981
fbshipit-source-id:
7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
Dmytro Dzhulgakov [Wed, 28 Nov 2018 23:43:22 +0000 (15:43 -0800)]
Change Tensor::CopyFrom to a simple double dispatch (#14268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268
Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes for a proper copy_ op, but before that is done, let's get clear logic for how copies are implemented and clean up some cruft in the CopyFrom implementation.
Note that with these changes, one can probably get rid of Context::CopyFromCPU/CopyToCPU, but that's a matter for follow-up diffs.
This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes the copy async if the device is CUDA and has no effect otherwise (that's how the Context methods are implemented).
This doesn't change the semantics of the async copy implementation - as before, it blindly calls cudaMemcpyAsync, which probably means it can be misused if invoked separately outside of an operator body. I'll leave that for the follow-up copy_ unification.
For Extend() we always do an async copy - this makes sense as it's an in-place device-to-device operation and only a subsequent op would observe the result.
Note: there are now three ways of invoking copy in C2 code - templated CopyBytes, virtual CopyFromCPU/etc, and double-dispatch free method here. Hopefully we can get rid of the second one.
Also, please advise whether it's c10-worthy :)
Reviewed By: ezyang
Differential Revision:
D13117987
fbshipit-source-id:
a6772d6dcf3effaf06717da3a656fc9873b310b5
albanD [Wed, 28 Nov 2018 23:25:09 +0000 (15:25 -0800)]
Update Tensor doc (#14339)
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function for adding docstrings to attributes.
There is an explicit cast here, but I am not sure how to handle it properly. The thing is that the doc field for getsetdescr is written as a const char * (as are all other doc fields in descriptor objects) in the CPython online documentation, but in the code it is the only one that is not const.
I assumed here that it is a bug in the code, because it does not follow the doc and the convention of the other descriptors, so I cast away the const.
EDIT: the online doc I was looking at is for 3.7, and in that version both the code and the doc are const. For older versions, both are non-const.
Please let me know if this should not be done, and if it should be done, whether there is a cleaner way to do it!
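A quick illustration of the attributes the doc update covers (hedged example, not from the PR):
```python
# Sketch: the Tensor attributes now documented.
import torch

x = torch.randn(3, requires_grad=True)
print(x.device)         # cpu
print(x.is_cuda)        # False
print(x.requires_grad)  # True
print(x.is_leaf)        # True, since x was created directly by the user
print(x.grad)           # None until backward() populates it
```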
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339
Differential Revision:
D13243266
Pulled By: ezyang
fbshipit-source-id:
75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
andersj [Wed, 28 Nov 2018 22:40:50 +0000 (14:40 -0800)]
nccl fixes (#14195)
Summary:
This has 4 changes
1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195
Differential Revision:
D13237502
Pulled By: anderspapitto
fbshipit-source-id:
1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
Edward Yang [Wed, 28 Nov 2018 21:59:59 +0000 (13:59 -0800)]
Clean up house on CUDAStream (#14247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14247
Just a bunch of clean up to get the code in a good state before we
enshrine it in c10.
Billing of changes:
- Inline all "pointer" API functions into their real implementations,
so we don't have a bunch of dead pointer functions hanging around.
- Replace all occurrences of int64_t with DeviceIndex, as appropriate
- Rename device field to device_index
- Add documentation for everything in CUDAStream.h
- Bring CUDAStream to API parity with Stream (e.g., support equality)
- Delete uncheckedSetCurrentCUDAStream, it didn't work anyway because
StreamId to internal pointer conversion has a bunch of ways it can
fail. Just hope for the best!
Reviewed By: dzhulgakov
Differential Revision:
D13141949
fbshipit-source-id:
a02f34921e3d8294bd77c262bd05da07d1740a71
Edward Yang [Wed, 28 Nov 2018 21:52:44 +0000 (13:52 -0800)]
Make clang-tidy shut up about Python C API macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14480
Reviewed By: goldsborough
Differential Revision:
D13235001
fbshipit-source-id:
cd7f00b12ed3d9ef0fb0d7bd6c428e21561ec1b6
Sebastian Messmer [Wed, 28 Nov 2018 21:37:31 +0000 (13:37 -0800)]
Make TensorImpl/StorageImpl safer (#14429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14429
- forbid copying
- make final what ought to be
Reviewed By: dzhulgakov
Differential Revision:
D13223125
fbshipit-source-id:
e6176cc916d4cd8370c835f243ca90d5c3124c4a
Sebastian Messmer [Wed, 28 Nov 2018 21:37:31 +0000 (13:37 -0800)]
Handle copying intrusive_ptr_target correctly (#14428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14428
See in-code comment
Reviewed By: ezyang
Differential Revision:
D13223126
fbshipit-source-id:
1e87e6112bbcca6377ca04ef2ba25ef937931061
Edward Yang [Wed, 28 Nov 2018 21:36:40 +0000 (13:36 -0800)]
Revert
D13219647: [pytorch][PR] Support Embedding + EmbeddingBag in Script
Differential Revision:
D13219647
Original commit changeset:
c90706aa6fbd
fbshipit-source-id:
d189e717ba0773de43d633876bc3a688830a9303
Sebastian Messmer [Wed, 28 Nov 2018 21:30:36 +0000 (13:30 -0800)]
Remove StorageImpl::type() (#14139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14139
This seems to be neither used nor implemented. Also, it is a c10->aten dependency which we don't want.
Reviewed By: ezyang
Differential Revision:
D13112298
fbshipit-source-id:
0407c4c3ac9b02bbd6fca478336cb6a6ae334930
Jerry Zhang [Wed, 28 Nov 2018 21:24:30 +0000 (13:24 -0800)]
Add XBlobGetMutableTensor that returns Tensor (#14424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14424
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14136
Since Tensor is now a shared_ptr, it doesn't make sense to have Tensor* around anymore,
so we want to change Tensor* to Tensor in the interface.
We added functions that work with `Tensor` instead of `Tensor*` in this diff.
To remove Tensor*, we'll do following
```
auto* Y = Ouptut(0);
Y->mutable_data...
```
-->
```
auto Y = Output(0);
Y.mutable_data...
```
But to run clangr codemod, we'll keep both APIs in different names, e.g. `Output` and `XOutput`, and do the refactor and then delete the old method and rename the new method into the old one.
For example for `Output`, we'll first codemod the callsites from `Output` to `XOutput`, then delete the old `Output` and rename `XOutput` to `Output` in the end.
Reviewed By: smessmer
Differential Revision:
D12934074
fbshipit-source-id:
d0e85f6ef8d13ed4e7a7505faa5db292a507d54c
Pieter Noordhuis [Wed, 28 Nov 2018 19:32:47 +0000 (11:32 -0800)]
Add timeout kwarg to init_process_group (#14435)
Summary:
This applies to the gloo backend only. Timeout support for the NCCL and
MPI backends is tracked in issues #14371 and #14372 respectively.
When creating a new process group (either the global one or any subgroup
created through `new_group`) you can specify a timeout keyword
argument (of type datetime.timedelta). This timeout applies to all
collective operations executed against that process group, such that any
operation taking longer than the timeout will throw a runtime error.
Using a different, better catchable error type is tracked in #14433.
This fixes #14376.
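A minimal sketch of the new keyword argument (gloo backend only, per the above):
```python
# Sketch: every collective on this process group now fails with a runtime error
# if it takes longer than the configured timeout.
import datetime
import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://",
                        timeout=datetime.timedelta(seconds=60))
```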
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14435
Differential Revision:
D13234317
Pulled By: pietern
fbshipit-source-id:
973993b67994dc64861c0977cbb6f051ec9d87f6
Edward Yang [Wed, 28 Nov 2018 19:05:36 +0000 (11:05 -0800)]
Add support for HIP to DispatchStub. (#14413)
Summary:
I feel a bit bad writing this patch, because there isn't really
any reason not to use the normal dispatch mechanism for CUDA
and HIP here (so we have *yet another dispatcher*), but I don't
really want to sign up to rewrite DispatchStub to deduplicate the
dispatcher right now.
Need to natively add support for HIP here, as I don't want to
have to HIPify files which are not in a CUDA directory.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14413
Differential Revision:
D13220358
Pulled By: ezyang
fbshipit-source-id:
cc61218322589a1dc2ab8eb9d5ddd3c616f6b712
Elias Ellison [Wed, 28 Nov 2018 18:50:26 +0000 (10:50 -0800)]
Support Embedding + EmbeddingBag in Script (#14415)
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `with torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an embedding_renorm function without derivatives.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415
Reviewed By: wanchaol
Differential Revision:
D13219647
Pulled By: eellison
fbshipit-source-id:
c90706aa6fbd48686eb10f3efdb65844be7b8717
Jongsoo Park [Wed, 28 Nov 2018 18:39:46 +0000 (10:39 -0800)]
fix build error from
D13188595 (#14481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14481
Fix build error in mode/opt
Reviewed By: dskhudia
Differential Revision:
D13234688
fbshipit-source-id:
6c8515c45f75e7b88713a303f22990ad85d68beb
Raghavendra Thodime [Wed, 28 Nov 2018 18:39:31 +0000 (10:39 -0800)]
Revert
D13144472: [fix] condition blob in while_op test changes data type
Differential Revision:
D13144472
Original commit changeset:
af4d920a3148
fbshipit-source-id:
74d9f69fc66964b5e68b4b2cd2fd2be1f63e9d69
Jiong Gong [Wed, 28 Nov 2018 18:35:28 +0000 (10:35 -0800)]
Fix the build issue in setup.py due to cmake version type x.x.x.x vio… (#14331)
Summary:
See https://github.com/pytorch/pytorch/issues/13226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14331
Differential Revision:
D13234639
Pulled By: orionr
fbshipit-source-id:
87880057e84242e4af5ad6bf87e08831aa2c5459
JerryShih [Wed, 28 Nov 2018 17:26:25 +0000 (09:26 -0800)]
Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/11563
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473
Differential Revision:
D13234208
Pulled By: ezyang
fbshipit-source-id:
7d874c63659e93728af239ecdfb85547613e52ad
Edward Yang [Wed, 28 Nov 2018 15:38:04 +0000 (07:38 -0800)]
Revert
D13166626: [pytorch][PR] ignore generated caffe2 docs and virtualenvs
Differential Revision:
D13166626
Original commit changeset:
4f11228d8b5d
fbshipit-source-id:
ff301f1791ca8a390767ae43cde8637dcd044d0c
Brennan Vincent [Wed, 28 Nov 2018 14:50:49 +0000 (06:50 -0800)]
Make `mean` function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.
Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
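A brief sketch of the new behavior (hedged example, not the PR's test):
```python
# Sketch: mean over multiple dimensions, equivalent to the existing multi-dim
# sum divided by the number of reduced elements.
import torch

x = torch.randn(2, 3, 4)
m = x.mean(dim=(0, 2))            # reduce dims 0 and 2 -> shape (3,)
s = x.sum(dim=(0, 2)) / (2 * 4)   # same result via multi-dimensional sum
print(torch.allclose(m, s))       # True
```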
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252
Differential Revision:
D13161157
Pulled By: umanwizard
fbshipit-source-id:
c45da692ba83c0ec80815200c5543302128da75c
Francisco Massa [Wed, 28 Nov 2018 14:11:08 +0000 (06:11 -0800)]
Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863
The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.
Some quick runtime analysis:
Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)
In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After this PR
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)
In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: b = a.cuda()
In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418
Reviewed By: weiyangfb
Differential Revision:
D13226950
Pulled By: soumith
fbshipit-source-id:
19eb4b855db4c8f891d0925a9c56ae8a2824bb23
Wei Yang [Wed, 28 Nov 2018 10:16:56 +0000 (02:16 -0800)]
torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen, and expose it as `torch.sparse.sum()`; `SparseTensor.sum()` is not supported currently (a usage sketch follows at the end of this summary)
- this PR depends on #11253, and will need to be updated once it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
- sum all dims is fastest for sparse tensor
- when the input is sparse enough (nnz = 0.1%), sum of a sparse tensor is faster than dense on CPU, but not necessarily on CUDA
- CUDA backward is comparable (<2x) between `sum several dims` vs `sum all dims` in sparse
- CPU backward, which uses binary search, is still slow in sparse; it takes `5x` the time for `sum [0, 2, 3] dims` vs `sum all dims`
- optimize CUDA backward for now
- using thrust for sort and binary search, but runtime not improved
- both CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`), at most `20x` slower on CPU and `10x` on CUDA
- improve CPU and CUDA forward kernels
(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000, [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000, [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000, [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000, [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000, [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs
- after improving CPU and CUDA forward kernels
- in the `(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent on `coalesce()`; for CUDA, the total time is ~~`331 µs`~~, of which `141 µs` is spent on `coalesce()`. We need to reduce the time spent at places outside `coalesce()`.
- after a few simple tweaks, the forward is now at most `10x` slower on CPU and `7x` on CUDA. The time taken for `sum dense dims only [2, 3]` is `~2x` that of `sum all dims`. The speed of `sum all sparse dims [0, 1]` is on par with `sum all dims`
(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000, [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000, [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000, [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000, [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000, [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000, [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000, [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000, [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs
- the time taken in the CUDA backward of sparse is very long with large variance (with nnz=10000, it normally takes 6-7ms). To improve the backward of sparse ops, we will need to debug places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
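As noted above, a brief usage sketch of the new `torch.sparse.sum()` API (a hedged example with made-up shapes, not the PR's tests):
```python
# Sketch: summing over all dims returns a 0-dim dense tensor; summing over a
# subset that leaves some sparse dims unreduced returns a sparse tensor.
import torch

i = torch.tensor([[0, 1, 1], [2, 0, 2]])
v = torch.randn(3, 2, 2)
S = torch.sparse_coo_tensor(i, v, (4, 4, 2, 2)).coalesce()

total = torch.sparse.sum(S)                # sum over all dims
partial = torch.sparse.sum(S, dim=[0, 2])  # mix of sparse (0) and dense (2) dims
```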
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430
Differential Revision:
D12878313
Pulled By: weiyangfb
fbshipit-source-id:
e16dc7681ba41fdabf4838cf05e491ca9108c6fe
Jiyan Yang [Wed, 28 Nov 2018 10:13:21 +0000 (02:13 -0800)]
Ensure FP16 rowwise Adagrad can be run
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317
Reviewed By: hyuen
Differential Revision:
D10190778
fbshipit-source-id:
720a9aaa4e6b1736023d8c6326a613e4ea592b31
Jongsoo Park [Wed, 28 Nov 2018 09:11:19 +0000 (01:11 -0800)]
use fbgemm's im2col fusion and thread partitioning (#14350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14350
acc32 for now. Will have a separate diff for acc16, but that will need another out processing that does sparse convolution without im2col.
Reviewed By: dskhudia
Differential Revision:
D13188595
fbshipit-source-id:
e8faee46c7ea43e4a600aecb8b8e93e6c860a8c8
Teng Li [Wed, 28 Nov 2018 08:31:34 +0000 (00:31 -0800)]
PT1 Stable Release Distributed Documentation (#14444)
Summary:
The doc covers pretty much all we have had on distributed for PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080
Tested by previewing the sphinx generated webpages. All look good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444
Differential Revision:
D13227675
Pulled By: teng-li
fbshipit-source-id:
752f00df096af38dd36e4a337ea2120ffea79f86