platform/upstream/pytorch.git
Sebastian Messmer [Mon, 28 Jan 2019 19:36:30 +0000 (11:36 -0800)]
Handle stack correctly (#16246)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16246

The op schema says it returns multiple values, so let's actually return multiple values instead of one tuple.
For some reason, this did work when called from Python (probably due to some auto-unpacking),
but once called from the JIT, it segfaulted. This diff fixes that.

Reviewed By: dzhulgakov

Differential Revision: D13780147

fbshipit-source-id: fe94f82f4c53b7454f77c4484fca4ac9dc444475

Helmut [Mon, 28 Jan 2019 19:25:33 +0000 (11:25 -0800)]
Fix compiler error in swapBytes64 for rare architectures (#16418)

Summary:
swapBytes64 used to reference `SwapByteOrder_32` and `value`, neither of which exists. This commit rewrites that part from scratch.
The error showed up in debug builds with the Microsoft compiler. For that case, the `&& !defined(_DEBUG)` guard is also removed, because `_byteswap_uint64` works fine in debug mode (if the guard is necessary, it should be commented why).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16418

Differential Revision: D13843306

Pulled By: ezyang

fbshipit-source-id: dde1c7baeccec3aaa750d4b7200b3f4ccb4a00cb
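
For reference, the value a correct `swapBytes64` must produce can be pinned down in a few lines of Python (the name `swap_bytes_64` is illustrative, not PyTorch's API):

```python
# Pure-Python model of a 64-bit byte swap: what a portable swapBytes64 must
# compute when compiler intrinsics such as _byteswap_uint64 are unavailable.
def swap_bytes_64(value: int) -> int:
    """Reverse the byte order of a 64-bit unsigned integer."""
    return int.from_bytes(value.to_bytes(8, "little"), "big")
```

Swapping twice is the identity, which is a handy sanity check for any rewrite of this routine.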

Junjie Bai [Mon, 28 Jan 2019 19:10:18 +0000 (11:10 -0800)]
Fix lint errors introduced in pytorch/pytorch@ceece5d (#16454)

Summary:
ifedan

```
./test/common_utils.py:748:1: E302 expected 2 blank lines, found 1
./test/test_torch.py:1235:5: E303 too many blank lines (2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16454

Differential Revision: D13844905

Pulled By: bddppq

fbshipit-source-id: 3dc7c740d86310a8efc9864d7c7798fda8257a21

Syed Tousif Ahmed [Mon, 28 Jan 2019 18:20:47 +0000 (10:20 -0800)]
Report the slowest 10 tests when using pytest (#16423)

Summary:
This flag is useful for identifying tests that take way too long when running the test suite with pytest, like the ones in the following snippet: https://github.com/pytorch/pytorch/blob/9757ad35b0b56cf955f294e751de9b437f9bb4ff/test/common_utils.py#L814-L835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16423

Differential Revision: D13843507

Pulled By: ezyang

fbshipit-source-id: 643e1766a85905b3b112ea5ca562135a17896a72
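
pytest surfaces this report through its `--durations=N` option; conceptually it just sorts the recorded per-test durations, which can be sketched in plain Python (a toy model, not pytest's implementation):

```python
# Illustrative sketch of a "slowest N tests" report: sort recorded durations
# in descending order and keep the top n entries.
def slowest(durations: dict, n: int = 10):
    """Return the n slowest tests as (name, seconds) pairs, longest first."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]
```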

Xiaomeng Yang [Mon, 28 Jan 2019 17:26:41 +0000 (09:26 -0800)]
Optimize SpatialBNOp on GPU (#16395)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16395

Optimize SpatialBNOp on GPU

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13829833

fbshipit-source-id: 04d2a63e8e9830c4c39a91cf87fcd7aa765dc55f

Igor Fedan [Mon, 28 Jan 2019 17:14:07 +0000 (09:14 -0800)]
CPU implementation of torch.cdist (#16168)

Summary:
cdist is used for calculating distances between collections of observations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16168

Differential Revision: D13739147

Pulled By: ifedan

fbshipit-source-id: 9419c2c166891ac7db40672c72f17848f0b446f9
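
As a rough sketch of the semantics (not the vectorized C++ kernel this PR adds), the pairwise p-norm distances `torch.cdist` computes look like this in pure Python:

```python
# Naive O(n*m*d) sketch of cdist: the p-norm distance between every pair of
# rows drawn from two collections of observations (lists of equal-length rows).
def cdist(x1, x2, p=2.0):
    """x1: n x d, x2: m x d -> n x m matrix of pairwise distances."""
    return [
        [sum(abs(a - b) ** p for a, b in zip(r1, r2)) ** (1.0 / p) for r2 in x2]
        for r1 in x1
    ]
```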

Brennan Vincent [Mon, 28 Jan 2019 16:47:35 +0000 (08:47 -0800)]
Don't initialize a new `std::vector` in a loop. (#15850)

Summary:
Before this diff, we execute `std::vector<optional<acc_t>> buffer((unsigned)max_threads, optional<acc_t> {});` in every iteration of `foreach_reduced_elt`. Change the code to only execute that line if we need it; i.e., we are actually about to parallelize.

This overhead is quite significant when we are doing a lot of small reductions in single-threaded code.

```
x=torch.randn((1024,10,1024),dtype=torch.float64)
torch.set_num_threads(1)
%timeit x.std(1)
```

Before (with #15845 applied): 708.25 ms
After: 508 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15850

Differential Revision: D13612960

Pulled By: umanwizard

fbshipit-source-id: f5e61abfe0027775c97ed81ac09c997fbee741df
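
The pattern behind the fix, allocating the scratch buffer lazily and only on the path that parallelizes, can be sketched in Python (the names here are made up for illustration; the actual change is in the C++ reduction code):

```python
# Allocate the per-thread buffer once, and only when the parallel path is
# actually taken, instead of constructing it on every call.
def reduce_all(chunks, max_threads, parallelize):
    total = 0
    buffer = None  # was: buffer = [None] * max_threads on every iteration
    for chunk in chunks:
        if parallelize and buffer is None:
            buffer = [None] * max_threads  # lazy, one-time allocation
            # ... the parallel path would scatter partial results into buffer ...
        total += sum(chunk)
    return total
```

In single-threaded code the buffer is never built, which is where the reported savings come from.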

Edward Yang [Mon, 28 Jan 2019 15:37:43 +0000 (07:37 -0800)]
More documentation on caffe2::Operator

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16371

Reviewed By: dzhulgakov

Differential Revision: D13820472

fbshipit-source-id: efccea0e92c86d30ec2bdda50eb9aab8a3a1504d

rotuna [Mon, 28 Jan 2019 00:26:47 +0000 (16:26 -0800)]
Better error message when creating a module instance in jit.script (#16416)

Summary:
Made the change requested in #15555

The PR was previously failing the build due to a timeout error while fetching packages with pip.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16416

Differential Revision: D13833873

Pulled By: soumith

fbshipit-source-id: e2200e9e8015558fcd359dfa3d025b25802d62b5

peter [Sun, 27 Jan 2019 22:59:34 +0000 (14:59 -0800)]
Fix issues on Windows brought by #16289 (#16412)

Summary:
This one needs to be merged ASAP because the CUDA build for Windows is skipped at this time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16412

Differential Revision: D13833889

Pulled By: soumith

fbshipit-source-id: 95a401a01fb0f9c1045df0bfd72d8206b8a6f3fd

Gemfield [Sun, 27 Jan 2019 22:13:46 +0000 (14:13 -0800)]
Fix a typo in Parallel.h (#16419)

Summary:
Fix a typo in Parallel.h.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16419

Differential Revision: D13833705

Pulled By: soumith

fbshipit-source-id: 824ebe753e028fc8e2b5d7a51fdba98a365fd29a

peterjc123 [Sun, 27 Jan 2019 20:26:24 +0000 (12:26 -0800)]
Don't install PDB for Windows static build of caffe2_observers (#16420)

Summary:
Fixes #16292.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16420

Differential Revision: D13833704

Pulled By: soumith

fbshipit-source-id: 482ad6ce103bed7206e924e8c82454fbb1bfac42

SsnL [Sun, 27 Jan 2019 20:08:09 +0000 (12:08 -0800)]
Fix slogdet sign requiring grad when input requires grad (#16337)

Summary:
The real fix for https://github.com/pytorch/pytorch/issues/15605.

This is sort of BC breaking because now
```py
In [1]: import torch

In [2]: a = torch.randn(3, 3, requires_grad=True)

In [3]: a.slogdet()
Out[3]: (tensor(1.), tensor(0.1356, grad_fn=<SlogdetBackward>))

In [4]: a.slogdet()[0].requires_grad
Out[4]: False
```
while before this patch `a.slogdet()[0]` required grad with `grad_fn=<SlogdetBackward>`. But any attempt to backpropagate through this value would hit the error in #15605, so I don't think this is a problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16337

Differential Revision: D13832644

Pulled By: soumith

fbshipit-source-id: f96c477e99edcbdbd966888e5c5ea7fd058429a8
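
To see why this is harmless: the sign component is piecewise constant (it only takes values in {-1, 0, 1}), so there is no meaningful gradient to propagate through it. A pure-Python 2x2 sketch, purely for illustration:

```python
import math

# For [[a, b], [c, d]], slogdet returns (sign(det), log|det|). The sign is
# piecewise constant in the inputs, so attaching grad_fn to it (the behavior
# before this patch) could never yield a useful gradient.
def slogdet_2x2(a, b, c, d):
    det = a * d - b * c
    sign = (det > 0) - (det < 0)
    logabsdet = math.log(abs(det)) if det != 0 else float("-inf")
    return sign, logabsdet
```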

Zachary DeVito [Sun, 27 Jan 2019 09:24:58 +0000 (01:24 -0800)]
CI Fix: restore MAX_JOBS variable (#16415)

Summary:
Restores a CI workaround (https://github.com/pytorch/pytorch/pull/7361) that got dropped with build_pytorch_libs.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16415

Differential Revision: D13833092

Pulled By: zdevito

fbshipit-source-id: f78b60cafd8da945790dba28de373b8faf46e9f5

Samuel Fadel [Sun, 27 Jan 2019 01:58:31 +0000 (17:58 -0800)]
Update einsum documentation. (#16323)

Summary:
The documentation stated that operands to einsum should be a list of Tensors, not individual arguments. The function, however, now accepts individual arguments for each Tensor operand *and* a single argument consisting of a list of Tensors. The documentation was updated to reflect this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16323

Differential Revision: D13832647

Pulled By: soumith

fbshipit-source-id: c01c2b350f47674d3170337f493b0ee2ea381b3f
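
The dual calling convention the updated docs describe can be sketched with a small normalization helper (a hypothetical function, not torch's implementation):

```python
# Sketch of the argument handling described above: operands may be passed
# individually, f(eq, a, b), or as one argument holding a list, f(eq, [a, b]).
def normalize_einsum_operands(*operands):
    if len(operands) == 1 and isinstance(operands[0], (list, tuple)):
        return list(operands[0])  # legacy form: a single list of tensors
    return list(operands)         # newer form: one argument per tensor
```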

James Reed [Sun, 27 Jan 2019 01:39:34 +0000 (17:39 -0800)]
Fix flake8 warnings/errors in test_jit.py (#16409)

Summary:
These were really annoying to see in the phabricator UI when trying to land PRs that touched test_jit.py, so this fixes them.

One remaining item is the T484 error. Locally, flake8 still chokes on that line even though I put the noqa comment there (and tried varying whitespaces around it etc). Not sure why it still persists...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16409

Differential Revision: D13832658

Pulled By: jamesr66a

fbshipit-source-id: 46356ba6444ae5ee1a141c28489bdcc7c99e39c0

James Reed [Sat, 26 Jan 2019 22:38:12 +0000 (14:38 -0800)]
Trace fork and join calls

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16232

Differential Revision: D13772974

Pulled By: jamesr66a

fbshipit-source-id: b2db370271809e26d3301f8cc98eec567db5e62b

vishwakftw [Sat, 26 Jan 2019 19:14:19 +0000 (11:14 -0800)]
Switch to CUDA implementation if batch size >= 65536 for affine_grid (#16403)

Summary:
Changelog:

- Append a condition that switches to the native CUDA implementation for affine_grid

Fixes #16365

Differential Revision: D13832192

Pulled By: soumith

fbshipit-source-id: 3f484e6673d71e3ba7627b170cb8f1611e12b9b2

SsnL [Sat, 26 Jan 2019 17:42:48 +0000 (09:42 -0800)]
gitignore gdb history

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16404

Differential Revision: D13832191

Pulled By: soumith

fbshipit-source-id: ab23d1ad72c041ec2d9616c273bbf399e0feb10d

Juan Miguel Pino [Sat, 26 Jan 2019 06:49:00 +0000 (22:49 -0800)]
Revert D13821061: [redo][c10] layernorm example

Differential Revision:
D13821061

Original commit changeset: 82f0dade0145

fbshipit-source-id: e5b0b1bab0c9e731ae04add35e9a6c91656dd178

Jerry Zhang [Sat, 26 Jan 2019 00:59:07 +0000 (16:59 -0800)]
trying to fix testX (#16370)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16370

Passed locally, but testX seems to have some problem.

Reviewed By: ezyang

Differential Revision: D13820250

fbshipit-source-id: e4ad9d1ec99508867d4ead46753a7fb7019c50bd

Bram Wasti [Sat, 26 Jan 2019 00:45:35 +0000 (16:45 -0800)]
layernorm example (#16374)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16374

this fixes the original attempt in OSS (adds to CMake and python build files)

Reviewed By: smessmer

Differential Revision: D13821061

fbshipit-source-id: 82f0dade0145fd04bdf8e3cb3954b5790e918162

Bram Wasti [Sat, 26 Jan 2019 00:45:34 +0000 (16:45 -0800)]
plug caffe2 into jit (#16388)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16388

previous diff broke master -- this refactors out the custom_operator.cpp file into a separate header + cpp pair (caffe2_operator.{h,cpp})

Reviewed By: smessmer

Differential Revision: D13823550

fbshipit-source-id: 00e005e650336132d05aef97c1f0e5242ccad5ba

Junjie Bai [Sat, 26 Jan 2019 00:06:59 +0000 (16:06 -0800)]
Enable centos pytorch rocm CI

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14879

Differential Revision: D13821534

Pulled By: bddppq

fbshipit-source-id: 45151b880992f1efa83e29c4985a723374575506

Zachary DeVito [Fri, 25 Jan 2019 23:57:09 +0000 (15:57 -0800)]
Remove bash from build (#16289)

Summary:
This commit removes the dependency on `build_pytorch_libs.sh` by moving the remaining functionality that is not expressible in cmake into python. Removing the indirection through bash also removes over 300 lines of environment munging code that is incredibly hard to understand because it passes a lot of secret parameters through `os.env`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16289

Reviewed By: ezyang

Differential Revision: D13821662

Pulled By: zdevito

fbshipit-source-id: d658d26925e3b1169ac1e3d44a159cf8a1f0d9b1

Jerry Zhang [Fri, 25 Jan 2019 23:32:45 +0000 (15:32 -0800)]
Remove caffe2::ShareData (#16139)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16139

Original commit changeset: 4b15a4c62995

Reviewed By: dzhulgakov

Differential Revision: D13677464

fbshipit-source-id: 1a644a88fac02b44feebac48ccc01bc72cc47edb

Jesse Hellemn [Fri, 25 Jan 2019 23:11:33 +0000 (15:11 -0800)]
Trying a fix to anaconda logins on nightlies

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16387

Differential Revision: D13826227

Pulled By: pjh5

fbshipit-source-id: 769a53e40a4912879faf9716a80c0e0c86acdbf8

Elias Ellison [Fri, 25 Jan 2019 23:02:30 +0000 (15:02 -0800)]
Update Documentation for Optionals (#16380)

Summary:
Now that https://github.com/pytorch/pytorch/pull/15587 has landed, updating docs.

Will close https://github.com/pytorch/pytorch/issues/15278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16380

Differential Revision: D13825221

Pulled By: eellison

fbshipit-source-id: c5a7a7fbb40ba7be46a80760862468f2c9967169

Zachary DeVito [Fri, 25 Jan 2019 20:20:29 +0000 (12:20 -0800)]
Revert D13740752: [c10] plug caffe2 into jit

Differential Revision:
D13740752

Original commit changeset: 2d9383574d42

fbshipit-source-id: e9ff217a438720423340a10af7fa263b33f2ae24

Gu, Jinghui [Fri, 25 Jan 2019 19:00:32 +0000 (11:00 -0800)]
Impl Shape op for mkldnn (#15266)

Summary:
Impl Shape op for mkldnn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15266

Differential Revision: D13804558

Pulled By: yinghai

fbshipit-source-id: 8a35f608c23973d7a15c3d645aee4059eb55f245

Bram Wasti [Fri, 25 Jan 2019 18:10:07 +0000 (10:10 -0800)]
Back out "[c10] layernorm example"

Summary: Original commit changeset: 87240ca7f48d

Reviewed By: bddppq

Differential Revision: D13816657

fbshipit-source-id: bafcf0779d811c7e4a134cfb323a89352fa8c180

Ailing Zhang [Fri, 25 Jan 2019 16:35:55 +0000 (08:35 -0800)]
Add xla test in CI (#15978)

Summary:
Adding xla CPU tests in our CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15978

Differential Revision: D13816344

Pulled By: ailzhang

fbshipit-source-id: f74c52e846976ea4ac439313847908a0e99d05eb

Edward Yang [Fri, 25 Jan 2019 15:48:00 +0000 (07:48 -0800)]
Delete Tensor::swap(), replace with pointer swap (#12730)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12730

i-am-not-moving-c2-to-c10

Reviewed By: smessmer

Differential Revision: D10415430

fbshipit-source-id: 8a2ce8611c5fa77bbbd73fb6788c1baa3b370f07

SsnL [Fri, 25 Jan 2019 15:24:18 +0000 (07:24 -0800)]
Make test_proper_exit more robust (#16249)

Summary:
1. Improve error message for better debugging info
2. Increase timeout
3. Also apply the windows worker failure detection mechanism on non-Windows platforms, for better robustness

Attempt to fix #14501

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16249

Differential Revision: D13784702

Pulled By: ezyang

fbshipit-source-id: 09a7cff83ab9edce561ed69f9fb555ab35d1275f

Si Chen [Fri, 25 Jan 2019 15:23:06 +0000 (07:23 -0800)]
fix contbuild (#16362)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16362

https://our.intern.facebook.com/intern/testinfra/diagnostics/281475065177800.844424930381786.1548397180/

Reviewed By: ezyang

Differential Revision: D13816639

fbshipit-source-id: 024117233f6d3bc6244013ca2ee1aea065560212

Xiaomeng Yang [Fri, 25 Jan 2019 08:54:09 +0000 (00:54 -0800)]
Minor change of group_norm_gradient on GPU (#16307)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16307

Minor change of group_norm_gradient on GPU

Reviewed By: houseroad

Differential Revision: D13800613

fbshipit-source-id: 9e55f93b1e322efe3fc2d684b9c47c3dbb7a0f48

Junjie Bai [Fri, 25 Jan 2019 07:50:21 +0000 (23:50 -0800)]
Revert D13551909: [fbcode] logdevice for generic feature type

Differential Revision:
D13551909

Original commit changeset: 807830c50bee

fbshipit-source-id: 48cacf4ec1765253a9be9d78f4b28cc48330be59

Qin Huang [Fri, 25 Jan 2019 07:21:25 +0000 (23:21 -0800)]
logdevice for generic feature type (#16191)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16191

Logdevice-related modifications for the generic feature type.

We directly convert the generic feature structures to JSON strings, which corresponds to the column input in offline and dper.

Reviewed By: itomatik

Differential Revision: D13551909

fbshipit-source-id: 807830c50bee569de202530bc3700374757793a2

Bram Wasti [Fri, 25 Jan 2019 05:46:50 +0000 (21:46 -0800)]
layernorm example (#16350)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16350

Example usage of the new caffe2 integration

Reviewed By: smessmer

Differential Revision: D13408546

fbshipit-source-id: 87240ca7f48d653a70241d243aa0eb25efa67611

Bram Wasti [Fri, 25 Jan 2019 05:46:50 +0000 (21:46 -0800)]
plug caffe2 into jit (#16331)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16331

Temporary measure to enable caffe2 ops in pytorch

Reviewed By: smessmer

Differential Revision: D13740752

fbshipit-source-id: 2d9383574d42ce84ee471aba32eeb4f5a0cc7a4c

Bram Wasti [Fri, 25 Jan 2019 05:46:50 +0000 (21:46 -0800)]
Add RunOperator for using FunctionSchema registered ops easily in caffe2 (#16173)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16173

Helper to make it easy to run ops in caffe2

Reviewed By: smessmer

Differential Revision: D13468240

fbshipit-source-id: 2276c7870af6dcdf829957f005fd16ac1ef319b5

Bram Wasti [Fri, 25 Jan 2019 05:46:50 +0000 (21:46 -0800)]
Add correct Input() shim to caffe2 operator impl (#16048)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16048

This enables full shimming of the operator (previously it was only
Output() shimmed).

Reviewed By: smessmer

Differential Revision: D13468241

fbshipit-source-id: c853b775ab5cdcd968f4a6cc4766e91c3c6b1c45

Shen Li [Fri, 25 Jan 2019 01:11:26 +0000 (17:11 -0800)]
Relax lower bound for nogil timing test to avoid false alarm (#16259)

Summary:
fixes #16250, #16271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16259

Differential Revision: D13784505

Pulled By: mrshenli

fbshipit-source-id: 0b7ad98cd3c018b9907d70158de3abc3c4cb57ef

Mikhail Zolotukhin [Fri, 25 Jan 2019 00:34:46 +0000 (16:34 -0800)]
Code-style fixes. (#16342)

Summary:
Some cleanups in ir.{h,cpp}. I plan to continue cleaning it up, so this is a first step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16342

Differential Revision: D13808897

Pulled By: ZolotukhinM

fbshipit-source-id: 2dedb414576c3efbf8e36434145d7f14a66b1ee7

Jongsoo Park [Fri, 25 Jan 2019 00:32:24 +0000 (16:32 -0800)]
disable testing group conv with EIGEN engine (#16335)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16335

Group conv is not implemented with the EIGEN engine, so this diff disables the related tests.

Reviewed By: jamesr66a

Differential Revision: D13807204

fbshipit-source-id: 41f6de43da40882f57e64474520e185733caefb7

Elias Ellison [Thu, 24 Jan 2019 23:41:50 +0000 (15:41 -0800)]
Remove unneeded manual unwrap optionals (#16245)

Summary:
Remove calls to torch.jit._unwrap_optional that are no longer needed.

The remaining instances would require control flow logic for exceptions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16245

Differential Revision: D13804292

Pulled By: eellison

fbshipit-source-id: 08c5cbe4b956519be2333de5cf4e202488aff626

Yan Shang [Thu, 24 Jan 2019 23:22:01 +0000 (15:22 -0800)]
fix buildindexop (#16341)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16341

as in the title

Reviewed By: intermilan

Differential Revision: D13808679

fbshipit-source-id: 0d12d3253f380bec66bc9be899be565861b8163a

Hoa Dinh [Thu, 24 Jan 2019 23:20:09 +0000 (15:20 -0800)]
Revert D13747581: Optimize SpatialBN on GPU

Differential Revision:
D13747581

Original commit changeset: 48a885a240ef

fbshipit-source-id: 58cec6023843d7459865eb80c9db8dac463cb96c

Jerry Zhang [Thu, 24 Jan 2019 23:01:47 +0000 (15:01 -0800)]
Add Test for ReinitializeTensor (#16338)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16338

att

Reviewed By: ezyang

Differential Revision: D13806760

fbshipit-source-id: 322b9b7d314aeb0194f52b803ca35c0cb8efcdec

Will Feng [Thu, 24 Jan 2019 22:29:06 +0000 (14:29 -0800)]
Add thread-local guard: at::AutoNonVariableTypeMode (#15939)

Summary:
This PR adds thread-local guard (`at::AutoNonVariableTypeMode`) to make sure that in VariableType.cpp the operations on baseType still dispatch to non-Variable type, even if the parameters will become Variables after the Tensor/Variable merge. We achieve this by making `legacyTensorType()` and `getType()` check the `at::AutoNonVariableTypeMode` guard to decide whether to return non-Variable type for a variable.

This is part of the VariableImpl/TensorImpl merge work: https://github.com/pytorch/pytorch/issues/13638.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15939

Reviewed By: ezyang

Differential Revision: D13640980

Pulled By: yf225

fbshipit-source-id: d12c2543822958558d7d70d36c50999a5eb8783f
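
The guard can be modeled in Python with a thread-local flag and a scope object (a toy analogue; the real `at::AutoNonVariableTypeMode` is a C++ RAII class):

```python
import threading

# Toy model of a thread-local mode guard: while active on a thread, type
# lookups on that thread see non-variable mode; other threads are unaffected.
_mode = threading.local()

class AutoNonVariableTypeMode:
    def __enter__(self):
        self._prev = getattr(_mode, "non_variable", False)
        _mode.non_variable = True
        return self

    def __exit__(self, *exc):
        _mode.non_variable = self._prev  # restore on scope exit, like RAII
        return False

def in_non_variable_mode():
    """What a type lookup would consult before deciding how to dispatch."""
    return getattr(_mode, "non_variable", False)
```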

Jongsoo Park [Thu, 24 Jan 2019 22:09:11 +0000 (14:09 -0800)]
reduce parameter space of test_1x1_conv to avoid timeout (#16223)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16223

As title says

Reviewed By: jamesr66a

Differential Revision: D13758202

fbshipit-source-id: 3cdffb80a5dad53b29e65e8eb0ae128edba70dbb

Sidney Zhang [Thu, 24 Jan 2019 21:07:35 +0000 (13:07 -0800)]
Update docs to include variable annotation example (#16324)

Summary:
Relates to this issue https://github.com/pytorch/pytorch/issues/16288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16324

Reviewed By: ezyang

Differential Revision: D13805412

Pulled By: suo

fbshipit-source-id: 8b80f988262da2c717452a71142327bbc23d1b8f

Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Delete duplicate copy of THCCachingAllocator. (#16226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16226

Now that the caching allocator is moved to c10_cuda, we can
delete the duplicate copy from Caffe2.

Reviewed By: dzhulgakov, smessmer

Differential Revision: D13762540

fbshipit-source-id: 03f1ebf7f11c68c19aa0d66110156fe228da6138

Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Move THCCachingAllocator to c10_cuda. (#16119)

Summary:
Some renaming and renamespacing also took place. I was originally planning not to do anything, but it turns out that it was easier to make HIPify work by using a namespace CUDACachingAllocator:: rather than THCCachingAllocator_, since :: is a word boundary but _ is not.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/16119

Reviewed By: smessmer

Differential Revision: D13718768

fbshipit-source-id: 884a481d99027fd3e34471c020f826aa12225656

Edward Yang [Thu, 24 Jan 2019 20:00:34 +0000 (12:00 -0800)]
Remove unnecessary includes and headers from THCCachingAllocator, move to at::cuda:: namespace (#16117)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16117

This means I can move it to c10_cuda with minimal fuss.

Reviewed By: smessmer

Differential Revision: D13717836

fbshipit-source-id: a94c7dc649af64542480fc1c226b289588886c00

Mikhail Zolotukhin [Thu, 24 Jan 2019 19:05:07 +0000 (11:05 -0800)]
Directly include headers from ATen.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16287

Differential Revision: D13792949

Pulled By: ZolotukhinM

fbshipit-source-id: d627d8dc469df048063c70d0b5b8d33fede809a3

Richard Zou [Thu, 24 Jan 2019 19:01:57 +0000 (11:01 -0800)]
Refactor the docs build workflow (#16265)

Summary:
In preparation for setting up a doc build job for stable docs, I wanted
to refactor the workflow so that future changes will be easier.

This PR the following changes:
- Refactor the doc push script into a reusable command
- Add command line options for the doc push script.
  These don't matter too much for now but will be useful
  for setting up future jobs for building different versions of the
  docs.
- Instead of checking out pytorch/pytorch:master, we re-use the pytorch
  installation inside the docker image.
- Change the sed in the script to a perl command. sed is annoyingly
  different across platforms; the perl command is more stable
- Run the script in dry run mode (without pushing the doc build)
  whenever a PR is opened. This lets us test changes to the doc build workflow.

Test Plan
- I tested the doc build script locally with my own credentials and it
  worked fine.
- Wait for the pytorch_doc_push CI.
- After merging this PR, keep an eye on the pytorch_doc_push CI status.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16265

Differential Revision: D13803511

Pulled By: zou3519

fbshipit-source-id: 4564bca3e74d490f89a1d1da9fb8b98eb44bdbb1

Owen Anderson [Thu, 24 Jan 2019 18:40:38 +0000 (10:40 -0800)]
Save a little bit of work in constant pooling by not moving nodes that will get deleted.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16161

Differential Revision: D13791247

Pulled By: resistor

fbshipit-source-id: 2a5a4f98309509b4ba875373ee57e6f63c75a4fd

Gregory Chanan [Thu, 24 Jan 2019 15:36:54 +0000 (07:36 -0800)]
Handle non-contiguous inputs with mkldnn convolution. (#16300)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/16018.

Backwards appears to be fine because the derivative is written in terms of mkldnn_convolution itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16300

Differential Revision: D13797776

Pulled By: gchanan

fbshipit-source-id: 68a990b8a3c186412a99d176931314806c9ed7bf

Xiaomeng Yang [Thu, 24 Jan 2019 10:50:35 +0000 (02:50 -0800)]
Optimize SpatialBN on GPU (#16202)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16202

Optimize SpatialBN on GPU

Reviewed By: houseroad

Differential Revision: D13747581

fbshipit-source-id: 48a885a240ef2a325235e8f89ebbe50e7c780c84

Xiaomeng Yang [Thu, 24 Jan 2019 07:55:06 +0000 (23:55 -0800)]
optimize group_norm (#16216)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16216

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D13754145

fbshipit-source-id: 650f64c81486c6c9d276f2e3325392d5838751ba

Lu Fang [Thu, 24 Jan 2019 05:32:57 +0000 (21:32 -0800)]
Fix the tensor deserialization problem of jit script module on CUDA (#16279)

Summary:
Now we create a temporary tensor for the whole record.

Fix https://github.com/pytorch/pytorch/issues/15271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16279

Reviewed By: BIT-silence

Differential Revision: D13791442

Pulled By: houseroad

fbshipit-source-id: 6f52ca09627fb684f74121357cc42e4adadec36a

Erik Brinkman [Thu, 24 Jan 2019 03:37:29 +0000 (19:37 -0800)]
Small fixes for pdist (#16210)

Summary:
pdist was recently patched to remove buggy batch support and fix issues
with large tensors. This fixed missed a few spots, and didn't handle a
few recommendations that this commit addresses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16210

Differential Revision: D13791914

Pulled By: gchanan

fbshipit-source-id: 0595841be1b298f7268fd4c02a6628acfec918f2

Jerry Zhang [Thu, 24 Jan 2019 03:24:38 +0000 (19:24 -0800)]
Fix comparison in ReinitializeTensor (#16294)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16294

In `ReinitializeTensor`, we compare `tensor->GetDevice()` and `options.device()`, but at the call site we only provide an option with a `device_type`, so the `device_id` of `options` is always the default (-1). The tensor, although initially passed a `device` with the default `device_id`, picks up a concrete device once data is allocated: the `device` of the `tensor` is the `device` of its `Storage`, which is the `device` of the underlying `DataPtr`, which matches the `device` of the operator's `Context` and has a non-default `device_id`.

Therefore every time we call `ReinitializeTensor`, the `device` does not match, and it still does not match after the call. As a result we allocate a new Tensor on every call, causing perf regressions for ops that use `ReinitializeTensor` on multiple GPUs.

Reviewed By: BIT-silence

Differential Revision: D13795635

fbshipit-source-id: 24d6afa1a0196a32eb0134ee08b4280244cdb0c3
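
The mismatch described above can be modeled in a few lines (a toy model; this `Device` is illustrative, not the caffe2 class):

```python
from dataclasses import dataclass

# Toy model of the bug: the caller's options carry the default device_id (-1),
# while an allocated tensor reports the concrete id of its storage (e.g. 0),
# so an exact equality check never matches and the tensor is recreated.
@dataclass(frozen=True)
class Device:
    type: str
    index: int = -1  # -1 means "unspecified"

def needs_reinit(tensor_device, option_device):
    return tensor_device != option_device  # exact compare: the buggy check
```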

Benny Chen [Thu, 24 Jan 2019 03:02:14 +0000 (19:02 -0800)]
Fix issues under caffe round 1

Summary: Some automation to fix uninitialized members in caffe2 code. Ran a canary to make sure there are no regressions in prod, but I'm not sure how to test comprehensively for caffe2.

Reviewed By: ezyang

Differential Revision: D13776185

fbshipit-source-id: fb2a479971cc0276d8784be1c44f01252410bd24

5 years agoAdd support for overloaded functions (#15556)
David Riazati [Thu, 24 Jan 2019 02:11:04 +0000 (18:11 -0800)]
Add support for overloaded functions (#15556)

Summary:
This PR adds support for overloaded functions as a step toward adding rnn modules to the JIT standard library.

Possible overloads must be manually specified, and when resolving an overload the first one that passes the schema-matching logic is chosen. The structure is very similar to the boolean dispatch in #14425. Overloads only work on weak modules.

To avoid having to support overloaded methods in Python that mirror the JIT's execution, the current setup offloads that work to the user. In the test added in `test_jit.py`, two methods are used to overload the `forward` method. In order to call `forward` outside the JIT, a Python-only `forward` that does the corresponding argument-type switching must also be provided.
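The "first overload that passes schema matching wins" rule can be sketched as follows. This is a hedged toy model; `matches`, `resolve`, and the tuple-of-types "schema" are illustrative stand-ins, not the JIT's actual schema matcher:

```python
# Toy overload resolution: try candidates in declaration order and pick the
# first whose (simplified) schema type-checks against the arguments.
def matches(schema, args):
    # A toy "schema": a tuple of types the positional args must satisfy.
    return len(schema) == len(args) and all(
        isinstance(a, t) for a, t in zip(args, schema))

def resolve(overloads, args):
    for schema, fn in overloads:
        if matches(schema, args):
            return fn
    raise RuntimeError("no matching overload")

# Two manually specified overloads of a hypothetical forward().
forward_overloads = [
    ((int,), lambda x: x + 1),
    ((str,), lambda s: s.upper()),
]

assert resolve(forward_overloads, (3,))(3) == 4
assert resolve(forward_overloads, ("hi",))("hi") == "HI"
```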
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15556

Differential Revision: D13576348

Pulled By: driazati

fbshipit-source-id: 7d3bdd4ee5a6088cc20c92f26a696d1ee5b9204b

5 years agoConstant propagation changes (#16244)
Elias Ellison [Thu, 24 Jan 2019 01:47:29 +0000 (17:47 -0800)]
Constant propagation changes (#16244)

Summary:
- remove a loop node that is guaranteed not to execute
- remove extra loop outputs that are no longer needed
- when inlining an if node, only run constant propagation on the block that will execute
- remove the recurse argument, since we only expose graph-level constant propagation and it is not used

This also includes a few extra hooks to python_ir that make it a little easier to test graph conditions from Python.
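The if-inlining rule above can be sketched like this. The node and block representations are toy stand-ins, not the torch.jit IR:

```python
# Toy constant propagation over an "if": when the condition has folded to a
# constant, only the branch that will execute is kept and propagated into;
# otherwise both branches survive and are propagated independently.
def propagate_if(cond, then_block, else_block, propagate):
    if isinstance(cond, bool):            # condition is a known constant
        taken = then_block if cond else else_block
        return propagate(taken)           # only run on the live branch
    # Unknown condition: keep the if node, recurse into both blocks.
    return ("if", cond, propagate(then_block), propagate(else_block))

# A stand-in pass that deletes nodes marked "dead".
const_fold = lambda block: [n for n in block if n != "dead"]

assert propagate_if(True, ["a", "dead"], ["b"], const_fold) == ["a"]
assert propagate_if("x > 0", ["a"], ["b"], const_fold)[0] == "if"
```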
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16244

Differential Revision: D13791635

Pulled By: eellison

fbshipit-source-id: d16351fffcfc8013b02015db200f8fde002e0577

5 years agoraise exception if try jit.load non-existent file (#16270)
nlml [Thu, 24 Jan 2019 00:02:55 +0000 (16:02 -0800)]
raise exception if try jit.load non-existent file (#16270)

Summary:
addresses https://github.com/pytorch/pytorch/issues/16267
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16270

Differential Revision: D13791773

Pulled By: suo

fbshipit-source-id: 256304a02dbf724a7c0baade48c94b3ee77f53cf

5 years agoFixing upload of nightly binaries and clean MacOS output (#16016)
Jesse Hellemn [Wed, 23 Jan 2019 23:35:00 +0000 (15:35 -0800)]
Fixing upload of nightly binaries and clean MacOS output (#16016)

Summary:
- Fix environment variable used to guard binary uploads
- Move common MacOS brew setup-code into a common function to decrease code duplication and also to move that noisy console output into its own CircleCI step
- Split Mac builds into separate build-test and upload jobs. Add one of these jobs to PR runs; add upload jobs to nightly binarybuilds workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16016

Differential Revision: D13791084

Pulled By: pjh5

fbshipit-source-id: 8eeb8e1963d46eab84f0f6dad9f0265163d5bf73

5 years agoCUDA event should only be recorded after NCCL group (#8219)
Teng Li [Wed, 23 Jan 2019 22:04:47 +0000 (14:04 -0800)]
CUDA event should only be recorded after NCCL group (#8219)

Summary:
Otherwise, it won't work if we sync on this event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8219

Reviewed By: pietern

Differential Revision: D13788657

Pulled By: teng-li

fbshipit-source-id: 8c96e9691ed2441d7a685fb7ae8fece906f58daf

5 years agoChange data() accessor in Caffe2 to return non-const pointer. (#16176)
Edward Yang [Wed, 23 Jan 2019 21:51:05 +0000 (13:51 -0800)]
Change data() accessor in Caffe2 to return non-const pointer. (#16176)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16176

This makes PyTorch and Caffe2's data() method line up.
Historically, PyTorch made no distinction between tensors
with const and non-const data, and thus its data() member
provided a non-const pointer.  Changing the PyTorch API to
return a const pointer would break all mutating code, whereas
changing the Caffe2 API to return a non-const pointer breaks
no code, *except* code that requires an exact match
on const-ness (e.g., in template arguments).  Since the latter
is less disruptive, we've opted for it here.

The few places downstream that broke due to this are fixed
in this patch.

Reviewed By: smessmer

Differential Revision: D13742916

fbshipit-source-id: baa4b4544cfdf7c1f369f4d69a1e0d5953c1bd99

5 years agoUpdating submodules
svcscm [Wed, 23 Jan 2019 21:05:30 +0000 (13:05 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 99d58034f9369846f8c82a5ea11c71e202e52a4e

5 years agoAlign native_functions.yaml func schema more with JIT signature schema (#16111)
Christian Puhrsch [Wed, 23 Jan 2019 20:30:47 +0000 (12:30 -0800)]
Align native_functions.yaml func schema more with JIT signature schema (#16111)

Summary:
This PR applies a few minor modifications leading to 100s of additional matches

Modifications to native_functions.yaml
1) double to float
2) int64_t to int
3) IntList[\d*] to int[\d*]
4) {} to []
5) Tensor? x=[] to Tensor? x=None
6) TensorList to Tensor[]
7) 1e-x to 1e-0x
8) Generator* x = nullptr to Generator? x = None
9) `{.*}` to `[.*]`

Overall this adds about 300 new matches and brings us to about 1/2 compliance of native_functions func with their JIT signature equivalent

While this is still a draft "tools/jit/gen_jit_dispatch.py" contains code to aid in finding close signatures
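As a hedged illustration, the rewrites listed above amount to a small set of textual substitutions. The real changes were made directly in `native_functions.yaml`; this sketch just models the mapping with regexes:

```python
# Illustrative regex substitutions mirroring some of the listed rewrites
# (double -> float, int64_t -> int, IntList[n] -> int[n], TensorList ->
# Tensor[], Generator* x = nullptr -> Generator? x = None).
import re

REWRITES = [
    (r"\bdouble\b", "float"),
    (r"\bint64_t\b", "int"),
    (r"\bIntList\[(\d*)\]", r"int[\1]"),
    (r"\bTensorList\b", "Tensor[]"),
    (r"Generator\* (\w+) = nullptr", r"Generator? \1 = None"),
]

def to_jit_schema(sig):
    for pat, repl in REWRITES:
        sig = re.sub(pat, repl, sig)
    return sig

assert to_jit_schema("foo(Tensor self, int64_t dim, double eps)") == \
       "foo(Tensor self, int dim, float eps)"
assert to_jit_schema("bar(IntList[2] stride)") == "bar(int[2] stride)"
```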
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16111

Reviewed By: ezyang

Differential Revision: D13738123

Pulled By: cpuhrsch

fbshipit-source-id: d1ec1e089bdb26ec155f6f31ccf768270acb76c7

5 years agoFixes selection of cuDNN algorithm (#15881)
Syed Tousif Ahmed [Wed, 23 Jan 2019 20:29:32 +0000 (12:29 -0800)]
Fixes selection of cuDNN algorithm (#15881)

Summary:
This PR updates the logic for using cudnnGet* and cudnnFind*. The current version of cudnn find and get (v7) returns a pair of the best algorithm and the convDesc mathType. While we were using the returned algorithm, we did not update the mathType. As a result, we ended up with a slow choice of both algorithm and math type. Without this patch, we see a 10x regression in group convolutions.

Changelist:
- Changed the template arguments to be `perf_t` instead of `algo_t` to unify cudnnFind and cudnnGet. Both cudnnFind and cudnnGet have the same purpose and hence, it made sense to unify them and get rid of `getAlgorithm`.
- Used cudnnGet*_v7 everywhere cudnnGet* was being used.
- Removed all cudnn6 paths (This PR depends on https://github.com/pytorch/pytorch/pull/15851)
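The fix amounts to applying both fields of the returned performance result. A hedged model of that logic, with `Perf`, `pick_best`, and `configure` as hypothetical names standing in for the cuDNN structs and calls:

```python
# Toy model of the *_v7 behavior: queries return ranked (algorithm, mathType)
# pairs; the bug was applying only the algorithm and never the math type.
from collections import namedtuple

Perf = namedtuple("Perf", ["algo", "math_type", "time"])

def pick_best(perf_results):
    # cudnnFind*/cudnnGet*_v7 results are ranked; take the fastest entry.
    return min(perf_results, key=lambda p: p.time)

def configure(conv_desc, perf):
    # The buggy path used perf.algo but left the descriptor's math type
    # untouched; the fix applies both together.
    conv_desc["math_type"] = perf.math_type
    return perf.algo

results = [Perf("IMPLICIT_GEMM", "TENSOR_OP_MATH", 0.8),
           Perf("WINOGRAD", "DEFAULT_MATH", 1.9)]
desc = {"math_type": "DEFAULT_MATH"}
algo = configure(desc, pick_best(results))
assert algo == "IMPLICIT_GEMM" and desc["math_type"] == "TENSOR_OP_MATH"
```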

Differential Revision: D13787601

Pulled By: ezyang

fbshipit-source-id: 81fe86727673d021306fe1c99c3e528b7c9ad17f

5 years agoDisable flaky test
Edward Yang [Wed, 23 Jan 2019 19:53:55 +0000 (11:53 -0800)]
Disable flaky test

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16274

Reviewed By: pietern

Differential Revision: D13788036

fbshipit-source-id: a9b7353fb0655908e6d47387cc77af33e9471aed

5 years agoUpdate third_party protobuf to v3.6.1
Junjie Bai [Wed, 23 Jan 2019 17:31:14 +0000 (09:31 -0800)]
Update third_party protobuf to v3.6.1

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16251

Reviewed By: ezyang

Differential Revision: D13781444

Pulled By: bddppq

fbshipit-source-id: b713a021033d214f30a49ee02b95edf8633bcc50

5 years agofix sigma in the middle of when word (#16227)
Armaan Sethi [Wed, 23 Jan 2019 16:32:29 +0000 (08:32 -0800)]
fix sigma in the middle of when word (#16227)

Summary:
there is a stray sigma in the word "when" on:
https://pytorch.org/cppdocs/contributing.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16227

Differential Revision: D13762753

Pulled By: goldsborough

fbshipit-source-id: 3d4bf4be859a3069402fe8c3fbc8ebee4f25cc5a

5 years agoTypos and broken RSTs fixed in torch.distribution (#16136)
Derek Kim [Wed, 23 Jan 2019 10:57:56 +0000 (02:57 -0800)]
Typos and broken RSTs fixed in torch.distribution (#16136)

Summary:
- probabilty -> probability
- make long lines break
- Add LogitRelaxedBernoulli in distribution.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16136

Differential Revision: D13780406

Pulled By: soumith

fbshipit-source-id: 54beb975eb18c7d67779a9631dacf7d1461a6b32

5 years agotune elementwise for AMD uarch (#16217)
Johannes M Dieterich [Wed, 23 Jan 2019 02:21:07 +0000 (18:21 -0800)]
tune elementwise for AMD uarch (#16217)

Summary:
Tune elementwise kernel for AMD architectures by increasing the work group sizes and launch bounds. This change improves training throughput for torchvision models by up to 11% in our tests while exhibiting no significant performance regression.

No functional/performance change for CUDA - just shifting numbers into constexpr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16217

Differential Revision: D13776684

Pulled By: bddppq

fbshipit-source-id: edbaebe904598b2de66a9e9a68a1aa219ebc01e9

5 years agofix typo in resnet50_trainer.py
rohithkrn [Wed, 23 Jan 2019 01:19:15 +0000 (17:19 -0800)]
fix typo in resnet50_trainer.py

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16219

Differential Revision: D13776742

Pulled By: bddppq

fbshipit-source-id: 10a6ab4c58159b3f619b739074f773662722c1d9

5 years agoAutomatic update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a
Lu Fang [Tue, 22 Jan 2019 22:55:50 +0000 (14:55 -0800)]
update of fbcode/onnx to dc75285d4a1cff9618400164dfdb26c5a1bab70a

Summary:
Previous import was c553fb32a0902ce5dd42e1b40123e9e9b38bdbe7

Included changes:
- **[dc75285](https://github.com/onnx/onnx/commit/dc75285)**: Relax constraint that the initializers must be a subset of graph inputs (#1718) <G. Ramalingam>
- **[985c8cd](https://github.com/onnx/onnx/commit/985c8cd)**: Fix typo in scan shape inferencing (#1753) <Scott McKay>
- **[ab52a5d](https://github.com/onnx/onnx/commit/ab52a5d)**: remove stale test cases <Lu Fang>
- **[56434bb](https://github.com/onnx/onnx/commit/56434bb)**: Removing experimental ConstantFill op. <Spandan Tiwari>
- **[881c63c](https://github.com/onnx/onnx/commit/881c63c)**: Show string names of data types instead of int IDs (#1749) <Shinichiro Hamaji>
- **[0a12fe4](https://github.com/onnx/onnx/commit/0a12fe4)**: Update ConstantOfShape op. (#1744) <Bowen Bao>
- **[ef028e5](https://github.com/onnx/onnx/commit/ef028e5)**: Update definition of Cast Op to support casting to/from string (#1704) <Raymond Yang>

Reviewed By: BIT-silence

Differential Revision: D13773962

fbshipit-source-id: b98079277994a699d4807210ba1d9c27f4672090

5 years agoAdd default_stream() and enhance current_stream() (#16200)
Shen Li [Tue, 22 Jan 2019 22:28:18 +0000 (14:28 -0800)]
Add default_stream() and enhance current_stream() (#16200)

Summary:
Closes #16156
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16200

Differential Revision: D13747455

Pulled By: mrshenli

fbshipit-source-id: 00c0d5f341c3ac7a757bdb4631a17e11fbc6d3ec

5 years agocomplex_registration_extension.cpp includes to angled brackets
Edward Yang [Tue, 22 Jan 2019 22:17:20 +0000 (14:17 -0800)]
complex_registration_extension.cpp includes to angled brackets

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16122

Reviewed By: smessmer

Differential Revision: D13717900

fbshipit-source-id: 8401f39d993482d3e08d2d79bc1841deafee2a5b

5 years agoRemove ATen/Allocator.h forwarding header.
Edward Yang [Tue, 22 Jan 2019 22:17:20 +0000 (14:17 -0800)]
Remove ATen/Allocator.h forwarding header.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16121

Reviewed By: smessmer

Differential Revision: D13717899

fbshipit-source-id: 83488f2aa801ca75059949ec85171ec03e64c4ff

5 years agoRemove dead curVal store.
Edward Yang [Tue, 22 Jan 2019 21:35:03 +0000 (13:35 -0800)]
Remove dead curVal store.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16116

Reviewed By: smessmer

Differential Revision: D13717719

fbshipit-source-id: 2ecee3f08f64e64ec5ac3c92fb326bc3df37e40e

5 years agoMake kernel registration constexpr again (#16166)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:38 +0000 (13:21 -0800)]
Make kernel registration constexpr again (#16166)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16166

Since we now don't use std::function anymore, we can make kernel registration constexpr again.

Reviewed By: ezyang

Differential Revision: D13738630

fbshipit-source-id: 918fa3a3c8c6f0ddbd0f08b3b143cdf066265387

5 years agoAvoid closure around kernel (#16165)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:38 +0000 (13:21 -0800)]
Avoid closure around kernel (#16165)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16165

Store kernels as direct function pointers instead of std::function.
Using direct function pointers avoids a performance risk std::function would introduce.

Reviewed By: ezyang

Differential Revision: D13738627

fbshipit-source-id: a348906c8a201436699681980a82ca95065a06a0

5 years agoPass IValues from JIT to c10 dispatcher (#16066)
Sebastian Messmer [Tue, 22 Jan 2019 21:21:37 +0000 (13:21 -0800)]
Pass IValues from JIT to c10 dispatcher (#16066)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16066

Don't unwrap and re-wrap but directly pass through the IValues

Reviewed By: ezyang

Differential Revision: D13689037

fbshipit-source-id: 99b8155e640eb61a3c0597bf0f2b9c338712b45e

5 years agoRelease GIL when synchronize or wait (#16182)
Shen Li [Tue, 22 Jan 2019 21:14:24 +0000 (13:14 -0800)]
Release GIL when synchronize or wait (#16182)

Summary:
address the second future work item in #15937
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16182

Differential Revision: D13744972

Pulled By: mrshenli

fbshipit-source-id: e9812e3fd4a5623e99b639d9f334bfc2d1827d92

5 years agoRevert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable
Wanchao Liang [Tue, 22 Jan 2019 20:11:23 +0000 (12:11 -0800)]
Revert D13540278: [pytorch][PR] Unhide unique from C++, make unique partially scriptable

Differential Revision:
D13540278

Original commit changeset: 3768c76a90b0

fbshipit-source-id: 7a31c239f9dca6ff467344d99820095addcae9d7

5 years agoReturn namedtuples from torch.* function with multiple return arguments for C++ opera...
Xiang Gao [Tue, 22 Jan 2019 19:09:18 +0000 (11:09 -0800)]
Return namedtuples from torch.* function with multiple return arguments for C++ operators (#15429)

Summary:
Partially fixes: https://github.com/pytorch/pytorch/issues/394

Implementation detail:

Codegen is modified to generate codes that looks like below:
```C++
static PyObject * THPVariable_svd(PyObject* self_, PyObject* args, PyObject* kwargs)
{
  HANDLE_TH_ERRORS
  static PythonArgParser parser({
    "svd(Tensor input, bool some=True, bool compute_uv=True, *, TensorList[3] out=None)",
  }, /*traceable=*/true);

  ParsedArgs<6> parsed_args;
  auto r = parser.parse(args, kwargs, parsed_args);
  static PyStructSequence_Field fields0[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc0 = {
    "torch.return_types.svd_out", nullptr,
    fields0, 3
  };
  static PyTypeObject type0;
  static bool namedtuple_type_initialized0 = false;
  if (!namedtuple_type_initialized0) {
    PyStructSequence_InitType(&type0, &desc0);
    namedtuple_type_initialized0 = true;
  }
  static PyStructSequence_Field fields1[] = {
    {"U", ""}, {"S", ""}, {"V", ""}, {nullptr}
  };
  static PyStructSequence_Desc desc1 = {
    "torch.return_types.svd", nullptr,
    fields1, 3
  };
  static PyTypeObject type1;
  static bool namedtuple_type_initialized1 = false;
  if (!namedtuple_type_initialized1) {
    PyStructSequence_InitType(&type1, &desc1);
    namedtuple_type_initialized1 = true;
  }
  if (r.idx == 0) {
    if (r.isNone(3)) {
      return wrap(&type1, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2)));
    } else {
      auto results = r.tensorlist_n<3>(3);
      return wrap(&type0, dispatch_svd(r.tensor(0), r.toBool(1), r.toBool(2), results[0], results[1], results[2]));
    }
  }
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS
}
```
Types are defined as static members of the `THPVariable_${op_name}` functions, and initialized the first time the function is called.

When parsing function prototypes in `native_functions.yaml`, the parser sets the specified name as `field_name` when it sees things like `-> (Tensor t1, ...)`. These field names become the field names of the namedtuple. The namedtuple class is named `torch.return_types.${op_name}`.

In some Python 2 builds, `PyStructSequence` is not a subtype of tuple, so we had to add functions that check whether an object is a tuple or a namedtuple, for compatibility.

Operators in `native_functions.yaml` are changed such that only `max` and `svd` are generated as namedtuples. Tests are added for these two operators to verify that the return values work as expected. Docs for these two ops are also updated to explicitly mention that the return value is a namedtuple. More ops will be added in later PRs.

There is an issue on Windows builds with the linker being unable to resolve `PyStructSequence_UnnamedField`, and a workaround is added to deal with this case.
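The user-visible behavior can be approximated with `collections.namedtuple`. This is a hedged model only: the real return types are `PyStructSequence`-backed classes under `torch.return_types`, not Python namedtuples:

```python
# Approximate model of the new return type: field access by name works, and
# ordinary tuple unpacking still works, preserving backward compatibility.
from collections import namedtuple

svd_result = namedtuple("svd", ["U", "S", "V"])  # models torch.return_types.svd

res = svd_result("u_tensor", "s_tensor", "v_tensor")
U, S, V = res                      # positional unpacking, as before
assert res.U == U == "u_tensor"    # new: access by field name
assert isinstance(res, tuple)      # still a tuple for legacy callers
```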
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15429

Differential Revision: D13709678

Pulled By: ezyang

fbshipit-source-id: 23a511c9436977098afc49374e9a748b6e30bccf

5 years agoFix formating in caffe2/quantization/server/README.md
Jongsoo Park [Tue, 22 Jan 2019 18:08:33 +0000 (10:08 -0800)]
Fix formating in caffe2/quantization/server/README.md

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14237

Reviewed By: dskhudia

Differential Revision: D13751791

Pulled By: jspark1105

fbshipit-source-id: 54f73d5134e596817802c66d43098d18458c2799

5 years agohip-clang enablement (#16085)
Yaxun (Sam) Liu [Tue, 22 Jan 2019 17:07:18 +0000 (09:07 -0800)]
hip-clang enablement (#16085)

Summary:
Initial enabling of the upcoming hip-clang compiler for the PyTorch source base.

Changes:
* update the Eigen submodule to a version including our upstreamed hip-clang enabling there
* modify a few ifdef guards with the `__HIP__` macro used by hip-clang
* use `__lane_id` instead of `hc::__lane_id`
* add Debug flags for ROCm to the cmake infrastructure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16085

Differential Revision: D13709459

Pulled By: ezyang

fbshipit-source-id: 1b7b33fe810a0434766180580d4443ea177eb7c7

5 years agoRaise CalledProcessError when torch.distributed launch process not return 0 (#16069)
Andy Wei [Tue, 22 Jan 2019 16:46:06 +0000 (08:46 -0800)]
Raise CalledProcessError when torch.distributed launch process not return 0 (#16069)

Summary:
`torch.distributed.launch.py` does not raise an error when a process started via `subprocess.Popen` does not return 0.
For better debugging, it should always raise an error when launched processes behave abnormally.
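A minimal sketch of the change: after waiting on a worker, raise `subprocess.CalledProcessError` on a nonzero exit code instead of returning silently. The helper name `wait_and_check` is illustrative; the actual `launch.py` manages many workers at once:

```python
# Illustrative sketch: surface nonzero worker exit codes as exceptions.
import subprocess
import sys

def wait_and_check(process, cmd):
    process.wait()
    if process.returncode != 0:
        raise subprocess.CalledProcessError(returncode=process.returncode,
                                            cmd=cmd)

ok = subprocess.Popen([sys.executable, "-c", "pass"])
wait_and_check(ok, "ok-cmd")  # exits 0: no error raised

bad = subprocess.Popen([sys.executable, "-c", "raise SystemExit(3)"])
try:
    wait_and_check(bad, "bad-cmd")
    raised = False
except subprocess.CalledProcessError as e:
    raised = (e.returncode == 3)
assert raised
```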
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16069

Differential Revision: D13709467

Pulled By: ezyang

fbshipit-source-id: 31d32a5ec8fed7bccd62d845bfba0e670ed3fe20

5 years agoReserve vectors that we know the size in advance for. (#16201)
Shahzad Lone [Tue, 22 Jan 2019 16:00:00 +0000 (08:00 -0800)]
Reserve vectors that we know the size in advance for. (#16201)

Summary:
Save reallocation costs, by reserving vectors according to how many elements we expect to put in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16201

Differential Revision: D13762594

Pulled By: ezyang

fbshipit-source-id: 7e3bfe421489dde48a2ddb0920dd155f69baecc0

5 years agocpp doc fix (#16221)
Will Feng [Tue, 22 Jan 2019 05:53:43 +0000 (21:53 -0800)]
cpp doc fix (#16221)

Summary:
Fixed a few C++ API callsites to work with v1.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16221

Differential Revision: D13759207

Pulled By: yf225

fbshipit-source-id: bd92c2b95a0c6ff3ba5d73cb249d0bc88cfdc340

5 years agoMove away from ConstantFill (#16214)
Lu Fang [Tue, 22 Jan 2019 04:13:07 +0000 (20:13 -0800)]
Move away from ConstantFill (#16214)

Summary:
Prerequisite of https://github.com/onnx/onnx/pull/1434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16214

Reviewed By: BIT-silence

Differential Revision: D13755116

Pulled By: houseroad

fbshipit-source-id: a46be8d7df959b5ede93e1f9c911a9a9326e6879

5 years agoban conv_double_backward from sandcastle, it takes too long
Zachary DeVito [Tue, 22 Jan 2019 03:57:33 +0000 (19:57 -0800)]
ban conv_double_backward from sandcastle, it takes too long

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16220

Differential Revision: D13755108

Pulled By: zdevito

fbshipit-source-id: 46b1b128b155964c25249add0c84680491845e9b

5 years agoRemove dead code from setup.py, remove need for build target. (#16162)
Zachary DeVito [Tue, 22 Jan 2019 01:24:32 +0000 (17:24 -0800)]
Remove dead code from setup.py, remove need for build target. (#16162)

Summary:
Now it is only necessary to use 'develop' or 'install' to build. Incremental cmake is on by default. `develop --cmake` forces it to rerun.

The NinjaBuilder stuff is dead. It was used to make building _C.so
faster but now _C.so is just an empty stub file.

Removed a bunch of custom build commands from setup.py that are
no longer meaningful now that cmake handles most of the build.

Removed unused targets in build_pytorch_lib.sh/bat
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16162

Differential Revision: D13744155

Pulled By: zdevito

fbshipit-source-id: d836484782c65b7f8e8c7a82620886f7a7777892

5 years agoUnhide unique from C++, make unique partially scriptable (#15256)
Xiang Gao [Mon, 21 Jan 2019 20:28:56 +0000 (12:28 -0800)]
Unhide unique from C++, make unique partially scriptable (#15256)

Summary:
This PR does three things:

~~Allow `int64_t?` in function schema, which provides an elegant way of implementing nullable int arguments, as discussed in https://github.com/pytorch/pytorch/pull/15208#pullrequestreview-185230081~~

~~Originally implemented in https://github.com/pytorch/pytorch/pull/15235~~

~~Example:~~

```yaml
- func: myop(Tensor self, int64_t? dim=None) -> Tensor
  variants: function
```

~~cc: zou3519~~

Edit: implemented in https://github.com/pytorch/pytorch/pull/15234

Previously tried in https://github.com/pytorch/pytorch/pull/12064. The problem was that C++ has no kwarg support, which makes it ambiguous whether `unique(t, 1)` means `unique(t, dim=1)` or `unique(t, sorted=1)`.

Now I think I have a better idea on how to implement this: there are two ATen operators: `unique` and `unique_dim`. `unique` has the same signature as in python, and exported to both python and C++. `unique_dim` has signature `unique_dim(tensor, dim, sorted=False, return_inverse=False)`, and only exported to C++, which could be used more naturally for a C++ user.
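The split can be sketched as follows. This is a hedged toy model operating on Python lists rather than tensors; the function bodies stand in for the real ATen kernels:

```python
# Toy model of the unique / unique_dim split: one Python-facing entry point
# dispatches on `dim`, while C++ (no kwargs) gets two distinct named ops.
def unique_flat(values, is_sorted=False):
    out = list(dict.fromkeys(values))   # order-preserving dedup
    return sorted(out) if is_sorted else out

def unique_dim(values, dim, is_sorted=False):
    # Stand-in: dedup rows (dim=0) of a list-of-lists "tensor".
    assert dim == 0
    seen, out = set(), []
    for row in values:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return sorted(out) if is_sorted else out

def unique(values, dim=None, is_sorted=False):
    # Python keeps one signature and dispatches; a C++ caller would invoke
    # unique_flat or unique_dim directly, so unique(t, 1) is never ambiguous.
    if dim is None:
        return unique_flat(values, is_sorted)
    return unique_dim(values, dim, is_sorted)

assert unique([3, 1, 3, 2], is_sorted=True) == [1, 2, 3]
assert unique([[1, 2], [1, 2], [3, 4]], dim=0) == [[1, 2], [3, 4]]
```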

Differential Revision: D13540278

Pulled By: wanchaol

fbshipit-source-id: 3768c76a90b0881f565a1f890459ebccbdfe6ecd