platform/upstream/pytorch.git
5 years agoRelax lower bound for nogil timing test to avoid false alarm (#16259)
Shen Li [Fri, 25 Jan 2019 01:11:26 +0000 (17:11 -0800)]
Relax lower bound for nogil timing test to avoid false alarm (#16259)

Summary:
fixes #16250, #16271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16259

Differential Revision: D13784505

Pulled By: mrshenli

fbshipit-source-id: 0b7ad98cd3c018b9907d70158de3abc3c4cb57ef

5 years agoCode-style fixes. (#16342)
Mikhail Zolotukhin [Fri, 25 Jan 2019 00:34:46 +0000 (16:34 -0800)]
Code-style fixes. (#16342)

Summary:
Some cleanups in ir.{h,cpp}. I plan to continue cleaning it up, so this is a first step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16342

Differential Revision: D13808897

Pulled By: ZolotukhinM

fbshipit-source-id: 2dedb414576c3efbf8e36434145d7f14a66b1ee7

5 years agodisable testing group conv with EIGEN engine (#16335)
Jongsoo Park [Fri, 25 Jan 2019 00:32:24 +0000 (16:32 -0800)]
disable testing group conv with EIGEN engine (#16335)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335

group conv is not implemented with EIGEN engine so this diff disables related tests

Reviewed By: jamesr66a

Differential Revision: D13807204

fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7

5 years agoRemove unneeded manual unwrap optionals (#16245)
Elias Ellison [Thu, 24 Jan 2019 23:41:50 +0000 (15:41 -0800)]
Remove unneeded manual unwrap optionals (#16245)

Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.

The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245

Differential Revision: D13804292

Pulled By: eellison

fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626

5 years agofix buildindexop (#16341)
Yan Shang [Thu, 24 Jan 2019 23:22:01 +0000 (15:22 -0800)]
fix buildindexop (#16341)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16341

as in the title

Reviewed By: intermilan

Differential Revision: D13808679

fbshipit-source-id: 0d12d3253f380bec66bc9be899be565861b8163a

5 years agoRevert D13747581: Optimize SpatialBN on GPU
Hoa Dinh [Thu, 24 Jan 2019 23:20:09 +0000 (15:20 -0800)]
Revert D13747581: Optimize SpatialBN on GPU

Differential Revision:
D13747581

Original commit changeset: 48a885a240ef

fbshipit-source-id: 58cec6023843d7459865eb80c9db8dac463cb96c

5 years agoAdd Test for ReinitializeTensor (#16338)
Jerry Zhang [Thu, 24 Jan 2019 23:01:47 +0000 (15:01 -0800)]
Add Test for ReinitializeTensor (#16338)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16338

att

Reviewed By: ezyang

Differential Revision: D13806760

fbshipit-source-id: 322b9b7d314aeb0194f52b803ca35c0cb8efcdec

5 years agoAdd thread-local guard: at::AutoNonVariableTypeMode (#15939)
Will Feng [Thu, 24 Jan 2019 22:29:06 +0000 (14:29 -0800)]
Add thread-local guard: at::AutoNonVariableTypeMode (#15939)

Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f

5 years agoreduce parameter space of test_1x1_conv to avoid timeout (#16223)
Jongsoo Park [Thu, 24 Jan 2019 22:09:11 +0000 (14:09 -0800)]
reduce parameter space of test_1x1_conv to avoid timeout (#16223)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16223

As title says

Reviewed By: jamesr66a

Differential Revision: D13758202

fbshipit-source-id: 3cdffb80a5dad53b29e65e8eb0ae128edba70dbb

5 years agoUpdate docs to include variable annotation example (#16324)
Sidney Zhang [Thu, 24 Jan 2019 21:07:35 +0000 (13:07 -0800)]
Update docs to include variable annotation example (#16324)

Summary:
Relates to this issue https://github.com/pytorch/pytorch/issues/16288
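
A rough illustration of the kind of variable annotation the docs now cover (a sketch, not the exact snippet added to the docs; the function name is made up):

```python
import torch
from typing import List

@torch.jit.script
def make_list(n):
    # type: (int) -> List[int]
    # An empty list needs an explicit element type in TorchScript; this is
    # the variable annotation pattern the docs update illustrates.
    xs = torch.jit.annotate(List[int], [])
    for i in range(n):
        xs.append(i)
    return xs
```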
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16324

Reviewed By: ezyang

Differential Revision: D13805412

Pulled By: suo

fbshipit-source-id: 8b80f988262da2c717452a71142327bbc23d1b8f

5 years agoDelete duplicate copy of THCCachingAllocator. (#16226)
Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Delete duplicate copy of THCCachingAllocator. (#16226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226

Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

Reviewed By: dzhulgakov, smessmer

Differential Revision: D13762540

fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138

5 years agoMove THCCachingAllocator to c10_cuda. (#16119)
Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Move THCCachingAllocator to c10_cuda. (#16119)

Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656

5 years agoRemove unnecessary includes and headers from THCCachingAllocator, move to at::cuda...
Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00

5 years agoDirectly include headers from ATen.
Mikhail Zolotukhin [Thu, 24 Jan 2019 19:05:07 +0000 (11:05 -0800)]
Directly include headers from ATen.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3

5 years agoRefactor the docs build workflow (#16265)
Richard Zou [Thu, 24 Jan 2019 19:01:57 +0000 (11:01 -0800)]
Refactor the docs build workflow (#16265)

Summary:
In preparation for setting up a doc build job for stable docs, I wanted
to refactor the workflow so that future changes will be easier.

This PR the following changes:
- Refactor the doc push script into a reusable command
- Add command line options for the doc push script.
  These don't matter too much for now but will be useful
  for setting up future jobs for building different versions of the
  docs.
- Instead of checking out pytorch/pytorch:master, we re-use the pytorch
  installation inside the docker image.
- Change the sed in the script to a perl command. sed is annoyingly
  different across platforms; the perl command is more stable
- Run the script in dry run mode (without pushing the doc build)
  whenever a PR is opened. This lets us test changes to the doc build workflow.

Test Plan
- I tested the doc build script locally with my own credentials and it
  worked fine.
- Wait for the pytorch_doc_push CI.
- After merging this PR, keep an eye on the pytorch_doc_push CI status.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16265

Differential Revision: D13803511

Pulled By: zou3519

fbshipit-source-id: 4564bca3e74d490f89a1d1da9fb8b98eb44bdbb1

5 years agoSave a little bit of work in constant pooling by not moving nodes that will get deleted.
Owen Anderson [Thu, 24 Jan 2019 18:40:38 +0000 (10:40 -0800)]
Save a little bit of work in constant pooling by not moving nodes that will get deleted.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16161

Differential Revision: D13791247

Pulled By: resistor

fbshipit-source-id: 2a5a4f98309509b4ba875373ee57e6f63c75a4fd

5 years agoHandle non-contiguous inputs with mkldnn convolution. (#16300)
Gregory Chanan [Thu, 24 Jan 2019 15:36:54 +0000 (07:36 -0800)]
Handle non-contiguous inputs with mkldnn convolution. (#16300)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/16018.

Backwards appears to be fine because the derivative is written in terms of mkldnn_convolution itself.
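
A rough sketch of the shape of call this handles (a non-contiguous CPU input fed to a convolution that may take the mkldnn path); the sizes are illustrative only:

```python
import torch

conv = torch.nn.Conv2d(8, 4, kernel_size=3)
x = torch.randn(1, 8, 16, 16).transpose(2, 3)  # a non-contiguous view
assert not x.is_contiguous()
y = conv(x)                                    # the case this change handles
print(y.shape)                                 # torch.Size([1, 4, 14, 14])
```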
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16300

Differential Revision: D13797776

Pulled By: gchanan

fbshipit-source-id: 68a990b8a3c186412a99d176931314806c9ed7bf

5 years agoOptimize SpatialBN on GPU (#16202)
Xiaomeng Yang [Thu, 24 Jan 2019 10:50:35 +0000 (02:50 -0800)]
Optimize SpatialBN on GPU (#16202)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16202

Optimize SpatialBN on GPU

Reviewed By: houseroad

Differential Revision: D13747581

fbshipit-source-id: 48a885a240ef2a325235e8f89ebbe50e7c780c84

5 years agooptimize group_norm (#16216)
Xiaomeng Yang [Thu, 24 Jan 2019 07:55:06 +0000 (23:55 -0800)]
optimize group_norm (#16216)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16216

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D13754145

fbshipit-source-id: 650f64c81486c6c9d276f2e3325392d5838751ba

5 years agoFix the tensor deserialization problem of jit script module on CUDA (#16279)
Lu Fang [Thu, 24 Jan 2019 05:32:57 +0000 (21:32 -0800)]
Fix the tensor deserialization problem of jit script module on CUDA (#16279)

Summary:
Now we create a temporary tensor for the whole record.

Fix https://github.com/pytorch/pytorch/issues/15271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16279

Reviewed By: BIT-silence

Differential Revision: D13791442

Pulled By: houseroad

fbshipit-source-id: 6f52ca09627fb684f74121357cc42e4adadec36a

5 years agoSmall fixes for pdist (#16210)
Erik Brinkman [Thu, 24 Jan 2019 03:37:29 +0000 (19:37 -0800)]
Small fixes for pdist (#16210)

Summary:
pdist was recently patched to remove buggy batch support and fix issues
with large tensors. This fix missed a few spots, and didn't handle a
few recommendations that this commit addresses.
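
For reference, a minimal pdist call of the kind affected (a sketch; the fixes here concern internals, not the signature):

```python
import torch

x = torch.randn(5, 3)
d = torch.pdist(x)        # condensed pairwise distances, length 5 * 4 / 2 = 10
assert d.shape == (10,)
```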
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16210

Differential Revision: D13791914

Pulled By: gchanan

fbshipit-source-id: 0595841be1b298f7268fd4c02a6628acfec918f2

5 years agoFix comparison in ReinitializeTensor (#16294)
Jerry Zhang [Thu, 24 Jan 2019 03:24:38 +0000 (19:24 -0800)]
Fix comparison in ReinitializeTensor (#16294)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294

In `ReinitializeTensor`, we compare `tensor->GetDevice()` and `options.device()`, but at the call site we actually just provide an option with a `device_type`, which means the `device_id` will always be the default (-1) for `options`. For the tensor, although it is passed a `device` with the default `device_id`, when we allocate the data the `device` of the `tensor` becomes the `device` of its `Storage`, which is the `device` of the underlying `DataPtr`, which is the same as the `device` of the operator's `Context` and has a non-default `device_id`.

Therefore, every time we do `ReinitializeTensor`, we find the `device` does not match, and after the `ReinitializeTensor` call the `device` still does not match. That's why we allocate a new Tensor every time, which causes perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3

5 years agoFix issues under caffe round 1
Benny Chen [Thu, 24 Jan 2019 03:02:14 +0000 (19:02 -0800)]
Fix issues under caffe round 1

Summary: Some automation to fix uninitialized members for caffe2 code. Ran canary to make sure I don't have any regression in prod, but not sure how to test comprehensively for caffe2

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24

5 years agoAdd support for overloaded functions (#15556)
David Riazati [Thu, 24 Jan 2019 02:11:04 +0000 (18:11 -0800)]
Add support for overloaded functions (#15556)

Summary:
This PR adds support for overloaded functions as a step toward adding rnn modules to the JIT standard library.

Possible overloads must be manually specified, and when resolving the overload it chooses by the first one that passes the schema matching logic. The structure is very similar to boolean dispatch in #14425. The overload will only work on weak modules.

In order to avoid supporting overloaded methods in Python to match the JIT execution, the current setup offloads that work to the user. In the test added in `test_jit.py`, two methods are used to overload the `forward` method. In order to call `forward` outside the JIT, a Python-only `forward` that does the right argument type switching must also be provided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15556

Differential Revision: D13576348

Pulled By: driazati

fbshipit-source-id: 7d3bdd4ee5a6088cc20c92f26a696d1ee5b9204b

5 years agoConstant propagation changes (#16244)
Elias Ellison [Thu, 24 Jan 2019 01:47:29 +0000 (17:47 -0800)]
Constant propagation changes (#16244)

Summary:
- remove loop node that is guaranteed not to execute
- remove extra loop outputs that are no longer needed

- if we are inlining an if node, only run constant propagation on the block that will execute

- remove the recurse argument since we only expose the Graph Constant Propagation and it's not used

This also includes a few extra hooks to python_ir that I think make it a little bit easier to test graph conditions from Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16244

Differential Revision: D13791635

Pulled By: eellison

fbshipit-source-id: d16351fffcfc8013b02015db200f8fde002e0577

5 years agoraise exception if try jit.load non-existent file (#16270)
nlml [Thu, 24 Jan 2019 00:02:55 +0000 (16:02 -0800)]
raise exception if try jit.load non-existent file (#16270)

Summary:
addresses https://github.com/pytorch/pytorch/issues/16267
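
A minimal sketch of the new behavior (the file name is a placeholder; the exact exception type may differ):

```python
import torch

try:
    torch.jit.load("no_such_model.pt")   # path does not exist
except Exception as e:
    print("jit.load raised as expected:", e)
```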
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16270

Differential Revision: D13791773

Pulled By: suo

fbshipit-source-id: 256304a02dbf724a7c0baade48c94b3ee77f53cf

5 years agoFixing upload of nightly binaries and clean MacOS output (#16016)
Jesse Hellemn [Wed, 23 Jan 2019 23:35:00 +0000 (15:35 -0800)]
Fixing upload of nightly binaries and clean MacOS output (#16016)

Summary:
- Fix environment variable used to guard binary uploads
- Move common MacOS brew setup-code into a common function to decrease code duplication and also to move that noisy console output into its own CircleCI step
- Split Mac builds into separate build-test and upload jobs. Add one of these jobs to PR runs; add upload jobs to nightly binarybuilds workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16016

Differential Revision: D13791084

Pulled By: pjh5

fbshipit-source-id: 8eeb8e1963d46eab84f0f6dad9f0265163d5bf73

5 years agoCUDA event should only be recorded after NCCL group (#8219)
Teng Li [Wed, 23 Jan 2019 22:04:47 +0000 (14:04 -0800)]
CUDA event should only be recorded after NCCL group (#8219)

Summary:
Otherwise, it won't work if we sync on this event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8219

Reviewed By: pietern

Differential Revision: D13788657

Pulled By: teng-li

fbshipit-source-id: 8c96e9691ed2441d7a685fb7ae8fece906f58daf

5 years agoChange data() accessor in Caffe2 to return non-const pointer. (#16176)
Edward Yang [Wed, 23 Jan 2019 21:51:05 +0000 (13:51 -0800)]
Change data() accessor in Caffe2 to return non-const pointer. (#16176)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176

This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const or non-const data, and thus its data() member
provided a non-const pointer.  Changing the PyTorch API to
return a const pointer would break all mutating code, whereas
changing the Caffe2 API to return a non-const pointer doesn't break
any code, *except* for code which required an exact match
on const-ness (e.g., in template arguments).  Since the latter
is less disruptive, we've opted for it here.

The few places downstream that broke due to this are fixed
in this patch.

Reviewed By: smessmer

Differential Revision: D13742916

fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99

5 years agoUpdating submodules
svcscm [Wed, 23 Jan 2019 21:05:30 +0000 (13:05 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 99d58034f9369846f8c82a5ea11c71e202e52a4e

5 years agoAlign native_functions.yaml func schema more with JIT signature schema (#16111)
Christian Puhrsch [Wed, 23 Jan 2019 20:30:47 +0000 (12:30 -0800)]
Align native_functions.yaml func schema more with JIT signature schema (#16111)

Summary:
This PR applies a few minor modifications leading to 100s of additional matches

Modifications to native_functions.yaml
1) double to float
2) int64_t to int
3) IntList[\d*] to int[\d*]
4) {} to []
5) Tensor? x=[] to Tensor? x=None
6) TensorList to Tensor[]
7) 1e-x to 1e-0x
8) Generator* x = nullptr to Generator? x = None
9) `{.*}` to `[.*]`

Overall this adds about 300 new matches and brings us to about 1/2 compliance of native_functions func with their JIT signature equivalent

While this is still a draft "tools/jit/gen_jit_dispatch.py" contains code to aid in finding close signatures
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16111

Reviewed By: ezyang

Differential Revision: D13738123

Pulled By: cpuhrsch

fbshipit-source-id: d1ec1e089bdb26ec155f6f31ccf768270acb76c7

5 years agoFixes selection of cuDNN algorithm (#15881)
Syed Tousif Ahmed [Wed, 23 Jan 2019 20:29:32 +0000 (12:29 -0800)]
Fixes selection of cuDNN algorithm (#15881)

Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. Current version of cudnn find and get (v7) returns a pair of best algorithm and the convDesc mathType. While we were using the returned algorithm, we didn't update the mathType. As a result, we ended up with a slow choice of algorithm and math type. Without this patch, we are seeing a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose and hence, it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)

Differential Revision: D13787601

Pulled By: ezyang

fbshipit-source-id: 81fe86727673d021306fe1c99c3e528b7c9ad17f

5 years agoDisable flaky test
Edward Yang [Wed, 23 Jan 2019 19:53:55 +0000 (11:53 -0800)]
Disable flaky test

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16274

Reviewed By: pietern

Differential Revision: D13788036

fbshipit-source-id: a9b7353fb0655908e6d47387cc77af33e9471aed

5 years agoUpdate third_party protobuf to v3.6.1
Junjie Bai [Wed, 23 Jan 2019 17:31:14 +0000 (09:31 -0800)]
Update third_party protobuf to v3.6.1

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16251

Reviewed By: ezyang

Differential Revision: D13781444

Pulled By: bddppq

fbshipit-source-id: b713a021033d214f30a49ee02b95edf8633bcc50

5 years agofix sigma in the middle of when word (#16227)
Armaan Sethi [Wed, 23 Jan 2019 16:32:29 +0000 (08:32 -0800)]
fix sigma in the middle of when word (#16227)

Summary:
There is a stray sigma in the middle of the word "when" on:
https://pytorch.org/cppdocs/contributing.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16227

Differential Revision: D13762753

Pulled By: goldsborough

fbshipit-source-id: 3d4bf4be859a3069402fe8c3fbc8ebee4f25cc5a

5 years agoTypos and broken RSTs fixed in torch.distribution (#16136)
Derek Kim [Wed, 23 Jan 2019 10:57:56 +0000 (02:57 -0800)]
Typos and broken RSTs fixed in torch.distribution (#16136)

Summary:
- probabilty -> probability
- make long lines break
- Add LogitRelaxedBernoulli in distribution.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16136

Differential Revision: D13780406

Pulled By: soumith

fbshipit-source-id: 54beb975eb18c7d67779a9631dacf7d1461a6b32

5 years agotune elementwise for AMD uarch (#16217)
Johannes M Dieterich [Wed, 23 Jan 2019 02:21:07 +0000 (18:21 -0800)]
tune elementwise for AMD uarch (#16217)

Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.

No functional/performance change for CUDA - just shifting numbers into constexpr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217

Differential Revision: D13776684

Pulled By: bddppq

fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9

5 years agofix typo in resnet50_trainer.py
rohithkrn [Wed, 23 Jan 2019 01:19:15 +0000 (17:19 -0800)]
fix typo in resnet50_trainer.py

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219

Differential Revision: D13776742

Pulled By: bddppq

fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9

5 years agoAutomatic update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a
Lu Fang [Tue, 22 Jan 2019 22:55:50 +0000 (14:55 -0800)]
update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a

Summary:
Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7

Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>

Reviewed By: BIT-silence

Differential Revision: D13773962

fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090

5 years agoAdd default_stream() and enhance current_stream() (#16200)
Shen Li [Tue, 22 Jan 2019 22:28:18 +0000 (14:28 -0800)]
Add default_stream() and enhance current_stream() (#16200)

Summary:
Closes #16156
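
A rough usage sketch of the per-device forms described in #16156 (guarded so it only runs with CUDA available):

```python
import torch

if torch.cuda.is_available():
    # Query the default and current stream of a specific device.
    print(torch.cuda.default_stream(0))
    print(torch.cuda.current_stream(0))
```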
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200

Differential Revision: D13747455

Pulled By: mrshenli

fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec

5 years agocomplex_registration_extension.cpp includes to angled brackets
Edward Yang [Tue, 22 Jan 2019 22:17:20 +0000 (14:17 -0800)]
complex_registration_extension.cpp includes to angled brackets

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122

Reviewed By: smessmer

Differential Revision: D13717900

fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b

5 years agoRemove ATen/Allocator.h forwarding header.
Edward Yang [Tue, 22 Jan 2019 22:17:20 +0000 (14:17 -0800)]
Remove ATen/Allocator.h forwarding header.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121

Reviewed By: smessmer

Differential Revision: D13717899

fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff

5 years agoRemove dead curVal store.
Edward Yang [Tue, 22 Jan 2019 21:35:03 +0000 (13:35 -0800)]
Remove dead curVal store.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116

Reviewed By: smessmer

Differential Revision: D13717719

fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e

5 years agoMake kernel registration constexpr again (#16166)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:38 +0000 (13:21 -0800)]
Make kernel registration constexpr again (#16166)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166

Since we now don't use std::function anymore, we can make kernel registration constexpr again.

Reviewed By: ezyang

Differential Revision: D13738630

fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387

5 years agoAvoid closure around kernel (#16165)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:38 +0000 (13:21 -0800)]
Avoid closure around kernel (#16165)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165

Store kernels as direct function pointers instead of std::function.
Using direct function pointers avoids a performance risk std::function would introduce.

Reviewed By: ezyang

Differential Revision: D13738627

fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0

5 years agoPass IValues from JIT to c10 dispatcher (#16066)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:37 +0000 (13:21 -0800)]
Pass IValues from JIT to c10 dispatcher (#16066)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066

Don't unwrap and re-wrap but directly pass through the IValues

Reviewed By: ezyang

Differential Revision: D13689037

fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e

5 years agoRelease GIL when synchronize or wait (#16182)
Shen Li [Tue, 22 Jan 2019 21:14:24 +0000 (13:14 -0800)]
Release GIL when synchronize or wait (#16182)

Summary:
address the second future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182

Differential Revision: D13744972

Pulled By: mrshenli

fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92

5 years agoRevert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable
Wanchao Liang [Tue, 22 Jan 2019 20:11:23 +0000 (12:11 -0800)]
Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable

Differential Revision:
D13540278

Original commit changeset: 3768c76a90b0

fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7

5 years agoReturn namedtuples from torch.* function with multiple return arguments for C++ opera...
Xiang Gao [Tue, 22 Jan 2019 19:09:18 +0000 (11:09 -0800)]
Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)

Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate code that looks like the following:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members of the `THPVariable_${op_name}` functions, and initialized the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser will set the specified name as `field_name` when it sees things like `-> (Tensor t1, ...)`. These field names will be the field names of the namedtuple. The class of the namedtuples will be named `torch.return_types.${op_name}`.

In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so we have to create some functions to check whether an object is a tuple or a namedtuple for compatibility reasons.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuple. Tests are added for these two operators to see if the return value works as expected. Docs for these two ops are also updated to explicitly mention the return value is a namedtuple. More ops will be added in later PRs.

There is some issue with the Windows build where the linker is unable to resolve `PyStructSequence_UnnamedField`, and a workaround is added to deal with this case.
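
A quick sketch of the resulting Python-side behavior for one of the converted ops:

```python
import torch

x = torch.randn(4, 3)
res = torch.svd(x)
# The result is a namedtuple (torch.return_types.svd), so both tuple
# unpacking and field access work.
u, s, v = res
assert torch.equal(res.U, u) and torch.equal(res.S, s) and torch.equal(res.V, v)
```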
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf

5 years agoFix formating in caffe2/quantization/server/README.md
Jongsoo Park [Tue, 22 Jan 2019 18:08:33 +0000 (10:08 -0800)]
Fix formating in caffe2/quantization/server/README.md

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237

Reviewed By: dskhudia

Differential Revision: D13751791

Pulled By: jspark1105

fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799

5 years agohip-clang enablement (#16085)
Yaxun (Sam) Liu [Tue, 22 Jan 2019 17:07:18 +0000 (09:07 -0800)]
hip-clang enablement (#16085)

Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.

Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085

Differential Revision: D13709459

Pulled By: ezyang

fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7

5 years agoRaise CalledProcessError when torch.distributed launch process not return 0 (#16069)
Andy Wei [Tue, 22 Jan 2019 16:46:06 +0000 (08:46 -0800)]
Raise CalledProcessError when torch.distributed launch process not return 0 (#16069)

Summary:
`torch.distributed.launch.py` does not raise an error when `subprocess.Popen` does not return 0.
For better debugging, it should always raise an error if the launched processes show unusual behavior.
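
A sketch of the intended behavior using plain subprocess (not the launcher itself), just to show how a non-zero return code surfaces as CalledProcessError:

```python
import subprocess
import sys

try:
    # A child process that exits non-zero, as a stand-in for a failed worker.
    subprocess.check_call([sys.executable, "-c", "import sys; sys.exit(1)"])
except subprocess.CalledProcessError as e:
    print("worker failed with return code", e.returncode)
```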
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069

Differential Revision: D13709467

Pulled By: ezyang

fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20

5 years agoReserve vectors that we know the size in advance for. (#16201)
Shahzad Lone [Tue, 22 Jan 2019 16:00:00 +0000 (08:00 -0800)]
Reserve vectors that we know the size in advance for. (#16201)

Summary:
Save reallocation costs by reserving vectors according to how many elements we expect to put into them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201

Differential Revision: D13762594

Pulled By: ezyang

fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0

5 years agocpp doc fix (#16221)
Will Feng [Tue, 22 Jan 2019 05:53:43 +0000 (21:53 -0800)]
cpp doc fix (#16221)

Summary:
Fixed a few C++ API callsites to work with v1.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221

Differential Revision: D13759207

Pulled By: yf225

fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340

5 years agoMove away from ConstantFill (#16214)
Lu Fang [Tue, 22 Jan 2019 04:13:07 +0000 (20:13 -0800)]
Move away from ConstantFill (#16214)

Summary:
Prerequisite of https://github.com/onnx/onnx/pull/1434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214

Reviewed By: BIT-silence

Differential Revision: D13755116

Pulled By: houseroad

fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879

5 years agoban conv_double_backward from sandcastle, it takes too long
Zachary DeVito [Tue, 22 Jan 2019 03:57:33 +0000 (19:57 -0800)]
ban conv_double_backward from sandcastle, it takes too long

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220

Differential Revision: D13755108

Pulled By: zdevito

fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b

5 years agoRemove dead code from setup.py, remove need for build target. (#16162)
Zachary DeVito [Tue, 22 Jan 2019 01:24:32 +0000 (17:24 -0800)]
Remove dead code from setup.py, remove need for build target. (#16162)

Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.

The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.

Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.

Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162

Differential Revision: D13744155

Pulled By: zdevito

fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892

5 years agoUnhide unique from C++, make unique partially scriptable (#15256)
Xiang Gao [Mon, 21 Jan 2019 20:28:56 +0000 (12:28 -0800)]
Unhide unique from C++, make unique partially scriptable (#15256)

Summary:
This PR does three things:

~~Allow `int64_t?` in function schema,  which provide an elegant way of implementing null-able int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~

~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. There was a problem that C++ does not have kwarg support, which makes it confusing to know whether `unique(t, 1)` actually means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea on how to implement this: there are two ATen operators: `unique` and `unique_dim`. `unique` has the same signature as in python, and exported to both python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and only exported to C++, which could be used more naturally for a C++ user.

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd

5 years agoAutomatic update of fbcode/onnx to c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7 (#16190)
Lu Fang [Mon, 21 Jan 2019 17:43:40 +0000 (09:43 -0800)]
update of fbcode/onnx to c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7 (#16190)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16190

Previous import was fd60104394fa353e1762f44ecad1b2166e33deef

Included changes:
- **[c553fb3](https://github.com/onnx/onnx/commit/c553fb3)**: Handle negative axis in scan shape inference (#1748) <G. Ramalingam>
- **[51b6ecc](https://github.com/onnx/onnx/commit/51b6ecc)**: external_data: Store large tensor values in separate files (#678) <Michał Karzyński>
- **[ba05f26](https://github.com/onnx/onnx/commit/ba05f26)**: Scan output axes (#1737) <G. Ramalingam>
- **[90920c0](https://github.com/onnx/onnx/commit/90920c0)**: Add NonZero op. (#1714) <Sergii Dymchenko>
- **[c4cf112](https://github.com/onnx/onnx/commit/c4cf112)**: fix the test cases for constantofshape (#1746) <Lu Fang>
- **[d902349](https://github.com/onnx/onnx/commit/d902349)**: Add sample implementation support (#1712) <Lu Fang>

Differential Revision: D13745693

fbshipit-source-id: 05e2cce9ae1dfa2865db83840df64673d55cea57

5 years agoSeparate Moments from math and optimize it (#16175)
Xiaomeng Yang [Sun, 20 Jan 2019 16:50:32 +0000 (08:50 -0800)]
Separate Moments from math and optimize it (#16175)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16175

Separate Moments from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13742472

fbshipit-source-id: 90757d908d38c98ca69818855aaf68315e525992

5 years agoUnify device() return type in Stream, Event, and Tensor (#16150)
Shen Li [Sun, 20 Jan 2019 06:58:54 +0000 (22:58 -0800)]
Unify device() return type in Stream, Event, and Tensor (#16150)

Summary:
Addresses one future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16150

Differential Revision: D13732299

Pulled By: mrshenli

fbshipit-source-id: 4d0b35df573a3bf92dea6e2e7eb42fe8bac77b18

5 years agoReplace use of ConstantLike with with ConstantOfShape (#16095)
Spandan Tiwari [Sun, 20 Jan 2019 03:50:20 +0000 (19:50 -0800)]
Replace use of ConstantLike with with ConstantOfShape (#16095)

Summary:
Submitting this PR as an update to existing PR (https://github.com/pytorch/pytorch/pull/15938) on houseroad 's request.

This PR replaces the use of ONNX op `ConstantLike` with `ConstantOfShape` in the ONNX exporter. In addition to removing the call sites in `symbolic.py`, it also replaces the call site in `peephole.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16095

Differential Revision: D13745723

Pulled By: houseroad

fbshipit-source-id: e2a5f534f01adf199df9e27544f7afcfa540e1f0

5 years agoFix LBFGS issue (#16167)
Miro Furtado [Sat, 19 Jan 2019 22:58:36 +0000 (14:58 -0800)]
Fix LBFGS issue (#16167)

Summary:
Resolves #15923 where LBFGS threw "Error: a leaf Variable that requires grad has been used in an in-place operation."
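
For context, a minimal LBFGS loop of the kind that used to hit this error (a sketch):

```python
import torch

x = torch.randn(2, requires_grad=True)     # leaf variable being optimized
opt = torch.optim.LBFGS([x], lr=0.1)

def closure():
    opt.zero_grad()
    loss = (x ** 2).sum()
    loss.backward()
    return loss

opt.step(closure)
```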
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16167

Differential Revision: D13745822

Pulled By: soumith

fbshipit-source-id: 7d1d0511d06838c0c6f4c8a6b53cf15193283059

5 years agoAllow for concurrent quantization in FullyConnectedDNNLowPOp (#16174)
Kjell Schubert [Sat, 19 Jan 2019 13:57:37 +0000 (05:57 -0800)]
Allow for concurrent quantization in FullyConnectedDNNLowPOp (#16174)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16174

Our service creates a new caffe2 workspace for the same underlying network on multiple threads concurrently at service startup time (later these workspaces are being reused for sequential requests), resulting in concurrent quantization via FullyConnectedDNNLowPOp calling GetOrCreateFbgemmPackBMatrix(). The lazily performed quantizations during the first inference in each workspace are all funnelled through GetOrCreateFbgemmPackBMatrix()'s cache_mutex, which means quantization is serialized, so at service startup time only a single CPU core is being used for around a minute until the serial quantization is done.
A better solution would be to avoid quantizing the same weight matrix of the operator copies in different net copies to begin with, but this is the simpler solution for our current problem.

Reviewed By: jspark1105

Differential Revision: D13708785

fbshipit-source-id: 537519896b3b939c552d67f400bafc8a69ce11eb

5 years agoSupport ConstantOfShape in Caffe2 ONNX Backend (#16108)
Lu Fang [Sat, 19 Jan 2019 06:55:41 +0000 (22:55 -0800)]
Support ConstantOfShape in Caffe2 ONNX Backend (#16108)

Summary:
This PR is the prerequisite to land https://github.com/pytorch/pytorch/pull/16095
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16108

Reviewed By: BIT-silence

Differential Revision: D13725722

Pulled By: houseroad

fbshipit-source-id: 28c0fb72f075cd04f9db44dfab0163844c20c620

5 years agoSeparate affine_channel from math and optimize it (#16135)
Xiaomeng Yang [Sat, 19 Jan 2019 06:37:12 +0000 (22:37 -0800)]
Separate affine_channel from math and optimize it (#16135)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16135

Separate affine_channel from math and optimize it

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13727606

fbshipit-source-id: 8980af4afadaf964a18a9da581106fe30896a7e9

5 years agoPass IValue from c10 dispatcher to caffe2 operator (#16065)
Sebastian Messmer [Fri, 18 Jan 2019 23:55:57 +0000 (15:55 -0800)]
Pass IValue from c10 dispatcher to caffe2 operator (#16065)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16065

Before, we registered the caffe2 kernel with the c10 dispatcher using plain C types.
Now, we pass in IValues, which avoids the unwrapping inbetween.

Reviewed By: ezyang

Differential Revision: D13689036

fbshipit-source-id: b976a2c46a5a541f6a926b3df255e8a535e32420

5 years agoMake c10 dispatcher use boxed kernel function pointers (#16051)
Sebastian Messmer [Fri, 18 Jan 2019 23:55:57 +0000 (15:55 -0800)]
Make c10 dispatcher use boxed kernel function pointers (#16051)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16051
This changes the kernels stored in the c10 dispatcher from plain C function pointers to IValue-based KernelFunction*.

Note that KernelFunction is currently taking an `ArrayRef<IValue>` as arguments. A later diff will change that to it taking a `Stack*`.

Reviewed By: ezyang

Differential Revision: D13684518

fbshipit-source-id: 1fa54f60cec2e967b92a4a043d6e3ac1627ed991

5 years agoadd back NNPACK in PyTorch (#15924)
Thomas Viehmann [Fri, 18 Jan 2019 23:27:12 +0000 (15:27 -0800)]
add back NNPACK in PyTorch (#15924)

Summary:
This tests the water for adding back NNPACK in PyTorch, it's a lot better than the fallback THNN versions.

In #6151, we (ezyang and soumith) removed NNPACK support from PyTorch. Of course Maratyszcza might have advice, too. (Or an opinion on the CMake changes.)

The only functional changes are using NNPack more aggressively on mobile and adding a .contiguous() to match NNPack's assumption (I stumbled over that while using NNPack for style transfer).
The CMake changes try to use the NNPack we already have in git.

In terms of lines of code this is a large part of the diff of https://lernapparat.de/pytorch-jit-android/ . As far as I can tell, we don't have MKLDNN on mobile and the native THNN implementations are prohibitively expensive in terms of both CPU and memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15924

Differential Revision: D13709576

Pulled By: ezyang

fbshipit-source-id: f2e287739909451c173abf046588209a7450ca2c

5 years agoimprove performance of unique with inverse indices (#16145)
Natalia Gimelshein [Fri, 18 Jan 2019 22:53:32 +0000 (14:53 -0800)]
improve performance of unique with inverse indices (#16145)

Summary:
Partial fix for #15804, only w/o dim.
For jcjohnson benchmarking script I'm getting the following results on V100:
Before:
```
Running with N = 10000, M = 10000
cuda (no inverse): 0.98 ms
cpu (no inverse): 0.96 ms
cuda (with inverse): 1.07 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.76 ms
cpu (no inverse): 1.53 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 3.02 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.28 ms
cpu (no inverse): 11.22 ms
cuda (with inverse): 69.76 ms
cpu (with inverse): 20.28 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.78 ms
cpu (no inverse): 18.78 ms
cuda (with inverse): 133.45 ms
cpu (with inverse): 34.09 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.43 ms
cpu (no inverse): 61.13 ms
cuda (with inverse): 3315.18 ms
cpu (with inverse): 104.57 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.86 ms
cpu (no inverse): 96.44 ms
cuda (with inverse): 5209.93 ms
cpu (with inverse): 176.10 ms
```
After
```
Running with N = 10000, M = 10000
cuda (no inverse): 1.04 ms
cpu (no inverse): 0.94 ms
cuda (with inverse): 0.64 ms
cpu (with inverse): 1.76 ms

Running with N = 10000, M = 100000
cuda (no inverse): 0.77 ms
cpu (no inverse): 1.55 ms
cuda (with inverse): 0.58 ms
cpu (with inverse): 2.79 ms

Running with N = 100000, M = 100000
cuda (no inverse): 1.30 ms
cpu (no inverse): 14.15 ms
cuda (with inverse): 1.63 ms
cpu (with inverse): 20.90 ms

Running with N = 100000, M = 1000000
cuda (no inverse): 0.82 ms
cpu (no inverse): 18.63 ms
cuda (with inverse): 0.61 ms
cpu (with inverse): 33.52 ms

Running with N = 500000, M = 500000
cuda (no inverse): 1.51 ms
cpu (no inverse): 59.81 ms
cuda (with inverse): 1.23 ms
cpu (with inverse): 110.69 ms

Running with N = 500000, M = 5000000
cuda (no inverse): 0.92 ms
cpu (no inverse): 104.26 ms
cuda (with inverse): 0.84 ms
cpu (with inverse): 187.12 ms
```
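
A sketch of the kind of call being timed above (guarded so it only runs with CUDA available; sizes are illustrative):

```python
import torch

if torch.cuda.is_available():
    x = torch.randint(0, 10000, (100000,), device="cuda")
    values, inverse = torch.unique(x, sorted=True, return_inverse=True)
    # inverse maps every element of x back to its position in values
    assert torch.equal(values[inverse], x)
```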
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16145

Differential Revision: D13738821

Pulled By: soumith

fbshipit-source-id: 0811fb4ade47e3b466cebbc124e3f3333a986749

5 years agofix for clang-tidy (#16164)
Michael Suo [Fri, 18 Jan 2019 22:01:41 +0000 (14:01 -0800)]
fix for clang-tidy (#16164)

Summary:
It turns out that clang-tidy is bundled with travis's standard trusty distribution, so no need to install it manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16164

Differential Revision: D13738986

Pulled By: suo

fbshipit-source-id: d0cd76c615625b2ed7f18951289412989f15849d

5 years agoChange current device in stream context manager if necessary (#16128)
Shen Li [Fri, 18 Jan 2019 20:32:20 +0000 (12:32 -0800)]
Change current device in stream context manager if necessary (#16128)

Summary:
Fixes #16019
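
A sketch of the case from #16019 (requires at least two GPUs, hence the guard):

```python
import torch

if torch.cuda.device_count() >= 2:
    s = torch.cuda.Stream(device=1)
    with torch.cuda.stream(s):
        # With this change, entering a stream that lives on another device
        # also switches the current device for the duration of the block;
        # the previous device is restored on exit.
        assert torch.cuda.current_device() == 1
```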
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16128

Differential Revision: D13721850

Pulled By: mrshenli

fbshipit-source-id: 422c6c0b97c1cd46e127e265b532cb8c74a3aac5

5 years agoFix SoftmaxOps (#16049)
Jerry Zhang [Fri, 18 Jan 2019 20:14:34 +0000 (12:14 -0800)]
Fix SoftmaxOps (#16049)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16049

We might see the pattern
```
if (scale_.numel() != N) {
   scale_->Resize(N);
   // set initial value for scale_
}

// In class:
Tensor scale_{CPU};
```
before in the code, where `scale_` is a member variable of type `caffe2::Tensor`.
This pattern actually serves two purposes: if `scale_` is partially initialized with a device type but no size, this call will
initialize the Tensor with the correct size; if `scale_` is already initialized with a size, it will check whether the size
matches the runtime value `N` and, if not, Resize it. To rewrite this we'll do the following:
```
if (!scale_.defined() || scale_.numel() != N) {
  ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
  // set initial value for scale_
}

```
There are some variants, if `scale_` is resized to a constant size, we can call `ReinitializeTensor` instead
```
if (scale_.numel() != 1) {
  scale_->Resize(1);
}
```
-->
```
ReinitializeTensor(&scale_, {1}, at::dtype<float>().device(CPU));
```

Normal Resize will be refactored directly into ReinitializeTensor:
```
scale_->Resize(N);
```
-->
```
ReinitializeTensor(&scale_, {N}, at::dtype<float>().device(CPU));
```

Reviewed By: dzhulgakov

Differential Revision: D13667883

fbshipit-source-id: 2c7cb61544b72765b594011b99150eb5a1b50836

5 years agorest of uses for deprecation of dims() in Tensor (#16118)
Jerry Zhang [Fri, 18 Jan 2019 19:49:36 +0000 (11:49 -0800)]
rest of uses for deprecation of dims() in Tensor (#16118)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16118

att

Differential Revision: D13697211

fbshipit-source-id: 12bf6edd1794240ac748cc1b8fecb0c1e8eb9112

5 years agoRNN operators should inherit step_net device_options (#16086)
Nikita Shulga [Fri, 18 Jan 2019 19:33:40 +0000 (11:33 -0800)]
RNN operators should inherit step_net device_options (#16086)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16086

[caffe2] RNN operators should inherit step_net device_options
According to the NetDef documentation, if a network has a specific device option it applies to all network operators that do not explicitly specify one.
But this does not seem to be the case for RecurrentNetwork operators.

Reviewed By: orionr

Differential Revision: D13699552

fbshipit-source-id: 14529bc9504e3b02f763e3c2429be21e46f82b68

5 years agoAdd implicit optional unwrapping (#15587)
Elias Ellison [Fri, 18 Jan 2019 19:17:34 +0000 (11:17 -0800)]
Add implicit optional unwrapping (#15587)

Summary:
Add support for type inference for optional type refinement.

If a conditional is of the form "x is None" or "x is not None", or is a boolean expression containing multiple none checks, the proper type refinements are inserted in each branch.

For example:
if optional_tensor is not None and len(optional_tensor) < 2:
# optional_tensor is a Tensor

if optional_tensor1 is not None and optional_tensor2 is not None:
# both optional_tensor1 and optional_tensor2 are Tensors
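
A compilable sketch of what this enables (type comments are one way to annotate the Optional argument; the function name is made up):

```python
import torch
from torch import Tensor
from typing import Optional

@torch.jit.script
def numel_or_zero(x):
    # type: (Optional[Tensor]) -> int
    if x is not None:
        # x is refined to Tensor here; no manual unwrap_optional needed
        return x.numel()
    return 0
```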

TODO:

- not run an op for unchecked unwrap optional in the interpreter

- potentially refine types to prim::None (omitted for now to simplify things & because it's not an actual use case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15587

Differential Revision: D13733810

Pulled By: eellison

fbshipit-source-id: 57c32be9f5a09ab5542ba0144a6059b96de23d7a

5 years agoAdd defined() to caffe2::Tensor (#16125)
Jerry Zhang [Fri, 18 Jan 2019 18:59:53 +0000 (10:59 -0800)]
Add defined() to caffe2::Tensor (#16125)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16125

Add defined() method to check whether the Tensor is defined.

Reviewed By: ezyang

Differential Revision: D13719222

fbshipit-source-id: ff8efef2159ed1026bd16acaea40c768a1e20a47

5 years agoRemove ATen/Half.h and ATen/core/Half.h forwarding headers.
Edward Yang [Fri, 18 Jan 2019 18:52:18 +0000 (10:52 -0800)]
Remove ATen/Half.h and ATen/core/Half.h forwarding headers.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16115

Reviewed By: bddppq

Differential Revision: D13717049

fbshipit-source-id: fb1d690183a932a1fa1a2d235f3219520f51620a

5 years agoPort legacy any(*) to ATen
Shen Li [Fri, 18 Jan 2019 18:07:33 +0000 (10:07 -0800)]
Port legacy any(*) to ATen

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15547

Differential Revision: D13549495

Pulled By: mrshenli

fbshipit-source-id: 09a065a8ffa7d73f409759b779c7314cc87f4853

5 years agoImprove pack_sequence and pack_padded_sequence error message (#16084)
Richard Zou [Fri, 18 Jan 2019 15:56:17 +0000 (07:56 -0800)]
Improve pack_sequence and pack_padded_sequence error message (#16084)

Summary:
Mention that if enforce_sorted=True, the user can set
enforce_sorted=False. This is a new flag that is probably hard to
discover unless one thoroughly reads the docs.

Fixes #15567
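
A sketch of the flag the message now points to (shapes and lengths are illustrative):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

x = torch.randn(3, 5, 8)              # 3 padded sequences, max length 5, feature size 8
lengths = torch.tensor([3, 5, 2])     # not sorted by length
# Without enforce_sorted=False this would raise, since the lengths are unsorted.
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
```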
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16084

Differential Revision: D13701118

Pulled By: zou3519

fbshipit-source-id: c9aeb47ae9769d28b0051bcedb8f2f51a5a5c260

5 years agoTCP init method race condition fix (#15684)
Teng Li [Fri, 18 Jan 2019 10:23:51 +0000 (02:23 -0800)]
TCP init method race condition fix (#15684)

Summary:
This PR fixes a race condition for TCP init method, when master rank can exit earlier than slave ranks and thus the TCP daemon thread gets shutdown before other slaves are able to access it.

This will let every rank (process) write a special key to the store to mark that they are completed (and thus about to exit).  The master rank (who is the server) will always wait until all the ranks to complete before complete itself.

This should fix: https://github.com/pytorch/pytorch/issues/15638

Tested using the repro of https://github.com/pytorch/pytorch/issues/15638 and works fine. Also test_distributed and test_c10d should have already had this coverage.

I had to make the rendezvous test in c10d use a world size of 1, since it is single-process code.
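
For reference, a minimal sketch of the TCP init method whose shutdown ordering this change fixes (address, port, and backend are placeholders; a single-process group and a gloo build are assumed so the snippet is self-contained):

```python
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
)
# ... training ...
dist.destroy_process_group()
```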
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15684

Differential Revision: D13570904

Pulled By: teng-li

fbshipit-source-id: 34f3bc471204bbd29320df359347ad5561c6b589

5 years agoRemove caffe2::Tensor copy constructor (#15416)
Dmytro Dzhulgakov [Fri, 18 Jan 2019 08:28:26 +0000 (00:28 -0800)]
Remove caffe2::Tensor copy constructor (#15416)

Summary:
Based on offline discussion it should be less surprising to the users of existing code. Thus caffe2::Tensor is now a move-only class (as it used to be), explicit calls to UnsafeSharedInstance() are necessary to get shared_ptr behavior.

This change also identified a few places that misused the copy constructor - those are fixed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15416

Reviewed By: Yangqing

Differential Revision: D13524598

fbshipit-source-id: aea12d6dff77342606fa88ce4ddddbff266245a7

5 years agoFix RERUN_CMAKE
Zachary DeVito [Fri, 18 Jan 2019 08:01:48 +0000 (00:01 -0800)]
Fix RERUN_CMAKE

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16132

Differential Revision: D13726816

Pulled By: zdevito

fbshipit-source-id: 26ad70651b0138642ad5240670f5c452018c13a2

5 years agoCleanup includes in python_print.cpp.
Mikhail Zolotukhin [Fri, 18 Jan 2019 02:03:38 +0000 (18:03 -0800)]
Cleanup includes in python_print.cpp.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16129

Differential Revision: D13724297

Pulled By: ZolotukhinM

fbshipit-source-id: 24e140bc052c85ef40b928eb84f463d341346a51

5 years agoRefactor attributes.h (#16098)
Mikhail Zolotukhin [Fri, 18 Jan 2019 01:27:36 +0000 (17:27 -0800)]
Refactor attributes.h (#16098)

Summary:
This PR inlines `Attributes` into `Node`. It helps to clean up the code a little, as everything is in one place (some of the cleanups are included in the PR).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16098

Differential Revision: D13717637

Pulled By: ZolotukhinM

fbshipit-source-id: c54ae65178a95a01354688921a9ccb1ca699f8eb

5 years agoFix export macros (#15856)
Sebastian Messmer [Thu, 17 Jan 2019 23:53:52 +0000 (15:53 -0800)]
Fix export macros (#15856)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15856

They seem to be wrong.
cc zdevito to take a look but I think this is now more correct.

It's weird this didn't cause linker errors. Probably, this functionality isn't used across library boundaries yet.

Reviewed By: dzhulgakov

Differential Revision: D13605257

fbshipit-source-id: 7077ca9027c3ac79a4847ec15ead7ddb28696445

5 years agoRemove some dependencies from ivalue.h to ATen (#15855)
Sebastian Messmer [Thu, 17 Jan 2019 23:53:52 +0000 (15:53 -0800)]
Remove some dependencies from ivalue.h to ATen (#15855)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15855

This is preparation work for moving IValue to c10.

Reviewed By: ezyang

Differential Revision: D13605259

fbshipit-source-id: cc545f582ab8607bb02aaf71273cb2710200b295

5 years agoCode style cleanup (#15854)
Sebastian Messmer [Thu, 17 Jan 2019 23:53:52 +0000 (15:53 -0800)]
Code style cleanup (#15854)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15854

- Remove unnecessary `inline` keyword
- Add a TODO stating the intention for Blob::ShareExternal()

Reviewed By: dzhulgakov

Differential Revision: D13605258

fbshipit-source-id: c0bc85c74c4ca4b3811d42ac7f866182e159d840

5 years agoUse intrusive_ptr for Blob in IValue (#16052)
Sebastian Messmer [Thu, 17 Jan 2019 23:47:16 +0000 (15:47 -0800)]
Use intrusive_ptr for Blob in IValue (#16052)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16052

We need IValue to take/return Blob as an intrusive_ptr because we want to pass it around and Blob has disabled copying.
This is needed in a diff on top.

Reviewed By: ezyang

Differential Revision: D13684761

fbshipit-source-id: 7cb3d7e9fec39a2bc9f063d4d30404e6d7016eb2

5 years agoMove c10 dispatcher back to ATen/core (#16050)
Sebastian Messmer [Thu, 17 Jan 2019 23:47:16 +0000 (15:47 -0800)]
Move c10 dispatcher back to ATen/core (#16050)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16050

The c10 dispatcher will (soon) depend on IValue, but IValue can't be moved to c10 yet because it depends on at::Tensor, which in turn depends on the legacy Type dispatch, and we don't want the legacy dispatch in c10.

So instead, we move the c10 dispatcher back to ATen/core until we can actually move at::Tensor to c10.

Reviewed By: ezyang

Differential Revision: D13684517

fbshipit-source-id: 1125f4254223907c52f96ff73034f6d4ae9fd0a7

5 years agoMoving cuda-convnet2 to the internal fb dir to satisfy internal dependencies. (#16104)
Chris Gottbrath [Thu, 17 Jan 2019 22:55:49 +0000 (14:55 -0800)]
Moving cuda-convnet2 to the internal fb dir to satisfy internal dependencies. (#16104)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16104

PyTorch PR 15784 removed cuda-convnet2 from the contrib directory. This broke some internal-only fb dependencies, so this diff moves it to the internal area.

Reviewed By: ezyang

Differential Revision: D13709112

fbshipit-source-id: 2d7811545da67489869b59c350a29817eff693cf

5 years agofurther wildcard cleanups (#16041)
Michael Suo [Thu, 17 Jan 2019 22:38:42 +0000 (14:38 -0800)]
further wildcard cleanups (#16041)

Summary:
Some cleanup to wildcard handling, including one bugfix: previously, we were not considering writes to the wildcard set as part of the potential write set for nodes.
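
To make the bugfix concrete, here is a conceptual sketch with made-up types (not the actual alias-analysis code): when a node may write through a wildcard, the entire wildcard set has to be folded into its potential write set instead of being skipped.

```cpp
#include <unordered_set>

struct Value;  // stand-in for a JIT value
using ValueSet = std::unordered_set<const Value*>;

ValueSet potentialWrites(const ValueSet& directWrites,
                         const ValueSet& wildcardSet,
                         bool mayWriteToWildcard) {
  ValueSet result = directWrites;
  if (mayWriteToWildcard) {
    // The previously missing step: anything in the wildcard set may be written too.
    result.insert(wildcardSet.begin(), wildcardSet.end());
  }
  return result;
}
```
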
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16041

Differential Revision: D13705738

Pulled By: suo

fbshipit-source-id: acb8ccbaa70fe47445577ddf24a69f84630de411

5 years agoRefactor _jit_internal (#16058)
David Riazati [Thu, 17 Jan 2019 21:39:07 +0000 (13:39 -0800)]
Refactor _jit_internal (#16058)

Summary:
Use qualified names in `jit/__init__.py` to avoid polluting that namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16058

Differential Revision: D13718745

Pulled By: driazati

fbshipit-source-id: 19d150569c8374541250a961f24f70c3f523de03

5 years agoInclude all Caffe2 headers in Python installations (#16124)
Jesse Hellemn [Thu, 17 Jan 2019 21:37:04 +0000 (13:37 -0800)]
Include all Caffe2 headers in Python installations (#16124)

Summary:
Confirmed on a local run that all the additional headers are present. This isn't covered by any existing tests, though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16124

Differential Revision: D13720773

Pulled By: pjh5

fbshipit-source-id: 22a42639f5649cac555ecc5a8b6760a8cbfcf01f

5 years agoAdd comment to explain rnn bias vectors (#15843)
Aaron Jaech [Thu, 17 Jan 2019 21:13:07 +0000 (13:13 -0800)]
Add comment to explain rnn bias vectors (#15843)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15843

RNN/LSTMs only need one bias vector, but our implementation uses two to be compatible with CuDNN. This diff adds a comment to explain this.
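
For reference, in the usual formulation (as in the PyTorch LSTM documentation) each gate sums both biases, e.g. the input gate is

    i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})

so mathematically b_{ii} and b_{hi} could be folded into a single vector; keeping both matches CuDNN's parameter layout.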

Reviewed By: ezyang

Differential Revision: D13602365

fbshipit-source-id: eef5bd9383d9f241dc0ef0472f753b4a44cc19b5

5 years agoAdd @yf225 to cpp codeowner
Will Feng [Thu, 17 Jan 2019 20:18:15 +0000 (12:18 -0800)]
Add @yf225 to cpp codeowner

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16120

Differential Revision: D13718880

Pulled By: yf225

fbshipit-source-id: 1c0a41ffba71855a3ad88b8d263ba2bd5076351d

5 years agoUpdate FP16 to latest master (#14498)
Yangqing Jia [Thu, 17 Jan 2019 20:15:07 +0000 (12:15 -0800)]
Update FP16 to latest master (#14498)

Summary:
TSIA - the FP16 CMake build had a bug that is fixed in https://github.com/Maratyszcza/FP16/pull/9.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14498

Differential Revision: D13240829

Pulled By: Yangqing

fbshipit-source-id: 724745750efe4f1b49d29ee07380c36997579915

5 years agoCleanup gumbel_softmax (#13339)
Egil Martinsson [Thu, 17 Jan 2019 20:14:39 +0000 (12:14 -0800)]
Cleanup gumbel_softmax (#13339)

Summary:
Fixes #12643; amends #3341.

- Allow multidimensional input ~~(but apply softmax over `dim=-1`)~~ with a `dim` argument
- Cleaner: fewer lines of code
- Faster (1.32x speedup vs. the original, 2x speedup vs. using `torch.Distributions`)
- Small fixes in the docstring
- Remove some references from the docstring. Was the linked (excellent) ipynb really the first to do the straight-through trick? Instead, I propose referencing the two papers best known for it.
- Add a DeprecationWarning for `eps`; it's no longer needed.
- The initial commit keeps some code alternatives commented out so CI can be used to compare them

- As discussed when `gumbel_softmax` was added (#3341), it was merged into `torch.nn.functional` before all the work on `Distributions` and `Pyro`, and there will probably be several other best practices for this in the future.
I've tested an implementation built on the `Distributions` API, but it was too slow; see below.

I therefore propose not using `Distributions`, to keep it fast and simple, and adding a comment in the docstring that `gumbel_softmax` may be deprecated in the future.

```
dist = torch.distributions.RelaxedOneHotCategorical(temperature=tau, logits=logits, validate_args=False)
y_soft = dist.rsample()
```

Pros:
* Built using tricks like `logsumexp` etc
* Explicitly uses `torch.distributions.utils._finfo` to avoid overflow (old implementation had an `eps` flag)
* Maintained for this exact purpose.

Cons:
* Very slow: constructing the distribution adds overhead; see timings below. May be solved in the future by speedups to `TransformedDistribution` and `Distribution`.
* Assumes which `dim` to apply softmax over.

```
    y_soft = logits.new(logits.shape)
    y_soft = (logits - y_soft.exponential_().log()) / tau  # Gumbel noise
    y_soft = y_soft.softmax(dim)  # Gumbel softmax noise
```
Pros:
* Faster

```
    import time
    import torch
    from torch.nn.functional import gumbel_softmax  # or the candidate implementation under test

    num_draws = 1000000
    logits = torch.randn(1, 3)
    counts = torch.zeros_like(logits)  # accumulate the one-hot draws

    start = time.time()
    for draw in range(num_draws):
        y_draw = gumbel_softmax(logits, hard=True)
        counts = counts + y_draw
    end = time.time()
    print(end - start)

>> 12.995795965194702

>> 7.658372640609741

>> 20.3382670879364
```

Decide on which path to choose. I'll commit changes to the unit tests in a while to show that it passes both the old and new tests. I'll also remove the commented code about `RelaxedOneHotCategorical`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13339

Differential Revision: D13092434

Pulled By: ezyang

fbshipit-source-id: 4c21788df336f4e9c2ac289022e395b261227b4b

5 years agoAdd matches_jit_signature attribute to native_functions.yaml (#16040)
Christian Puhrsch [Thu, 17 Jan 2019 20:07:04 +0000 (12:07 -0800)]
Add matches_jit_signature attribute to native_functions.yaml (#16040)

Summary:
If "matches_jit_signature" is set to True for a particular function, we will assume that the func syntax follows the JIT signature syntax. This is a temporary attribute and doesn't need to be set by developers outside the core team. It serves as a means of tracking an ongoing schema unification with the goal of aligning func syntax with other components of PyTorch in order to reduce overall complexity and match coverage of different function descriptions.

Follow-up PRs might remove _out from native_functions.yaml and use Tensor annotations instead, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16040

Reviewed By: ezyang

Differential Revision: D13703176

Pulled By: cpuhrsch

fbshipit-source-id: ce248e1823a6f18efa95502f9f3eebf023b4a46c

5 years agoadd if in register_buffer like register_parameters (#16110)
FrankHui [Thu, 17 Jan 2019 19:31:31 +0000 (11:31 -0800)]
add if in register_buffer like register_parameters (#16110)

Summary:
without this "if", code below will throw error " Linear' object has no attribute '_buffers' "
And with this if, error would be "cannot assign buffer before Module.\_\_init\_\_() call", which I think it's more accurate, just like register_parameter.
```
import math
import torch
from torch.nn.parameter import Parameter
from torch.nn import functional as F
from torch.nn import Module
class Linear(Module):
    def __init__(self, in_features, out_features, bias=True):

        self.in_features = in_features
        self.out_features = out_features
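        # NOTE: register_buffer is called before super().__init__(); Module.__init__() is what
        # creates self._buffers, so this is the line that triggers the error quoted above.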
        self.register_buffer('test', torch.Tensor(out_features, in_features))
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)

        super(Linear, self).__init__()

        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )

linear = Linear(3,4)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16110

Differential Revision: D13715839

Pulled By: soumith

fbshipit-source-id: c300eff0a8655aade448354cf489a592f7db722a