platform/upstream/pytorch.git
5 years agoMove abs, frac, reciprocal, and neg to TensorIterator (#19041)
James Reed [Wed, 10 Apr 2019 04:48:49 +0000 (21:48 -0700)]
Move abs, frac, reciprocal, and neg to TensorIterator (#19041)

Summary:
I've been messing around with vectorizing the fusion compiler in JIT, and noticed that these ops were pathologically slow. I moved them to use TensorIterator + Vec256<> and got some speed wins.

Benchmark script:

```
import torch, time

ops = ['abs', 'neg', 'reciprocal', 'frac']

x = torch.rand(1024, 1024)
NITER = 10000

print('op', 'time per iter (ms)', 'gops/s', 'GB/s', sep='\t')

for op in ops:
    s = time.time()
    for i in range(NITER):
        getattr(x, op)()
    elapsed_sec = ((time.time() - s) / NITER)
    print(op, elapsed_sec * 1000, (1024*1024/elapsed_sec)/1e9, (1024*1024*4*2) / elapsed_sec / 1e9, sep='\t')

```

Before this change (on my mac with a skylake):
```
op      time per iter (ms)      gops/s  GB/s
abs     0.9730974197387695      1.0775652866097343      8.620522292877874
neg     1.0723679780960083      0.9778136063534356      7.822508850827485
reciprocal      1.2610594034194946      0.8315040490215421      6.6520323921723366
frac    1.1681334018707275      0.8976509004200546      7.181207203360437
```

After this change:
```
op      time per iter (ms)      gops/s  GB/s
abs     0.5031076192855835      2.084198210889721       16.673585687117768
neg     0.4433974027633667      2.3648672578256087      18.91893806260487
reciprocal      0.47145988941192624     2.2241043693195985      17.79283495455679
frac    0.5036592721939087      2.0819154096627024      16.65532327730162
```

So, after this change it looks like we are hitting machine peak for bandwidth and are bandwidth bound.
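
The derived columns in both tables follow directly from the measured time per iteration. As a quick cross-check, the sketch below recomputes them and the before/after speedup; the helper names (`throughput`, `speedup`, `TIMINGS_MS`) are illustrative and not part of the PR.

```python
# Recompute the derived columns of the benchmark tables from the measured
# time per iteration. Helper names are illustrative, not from the PR.

NUM_ELEMENTS = 1024 * 1024      # shape of x in the benchmark script
BYTES_PER_ELEMENT = 4           # factor 4 in the script (float32)
ACCESSES_PER_ELEMENT = 2        # factor 2 in the script: one load, one store


def throughput(elapsed_ms):
    """Return (gops/s, GB/s) for one elementwise pass taking elapsed_ms."""
    elapsed_sec = elapsed_ms / 1000.0
    gops = NUM_ELEMENTS / elapsed_sec / 1e9
    gbps = (NUM_ELEMENTS * BYTES_PER_ELEMENT * ACCESSES_PER_ELEMENT
            / elapsed_sec / 1e9)
    return gops, gbps


# Times in ms copied from the tables above: (before, after).
TIMINGS_MS = {
    'abs': (0.9730974197387695, 0.5031076192855835),
    'neg': (1.0723679780960083, 0.4433974027633667),
    'reciprocal': (1.2610594034194946, 0.47145988941192624),
    'frac': (1.1681334018707275, 0.5036592721939087),
}


def speedup(op):
    """Ratio of the old time per iteration to the new one."""
    before, after = TIMINGS_MS[op]
    return before / after


print('op', 'speedup', 'gops/s', 'GB/s', sep='\t')
for op, (_, after_ms) in TIMINGS_MS.items():
    gops, gbps = throughput(after_ms)
    print(op, round(speedup(op), 2), round(gops, 2), round(gbps, 2), sep='\t')
```

With the numbers above, the speedups fall between roughly 1.9x (`abs`) and 2.7x (`reciprocal`).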
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19041

Differential Revision: D14862037

Pulled By: jamesr66a

fbshipit-source-id: e2032ac0ca962dbf4120bb36812277c260e22912

5 years agoFix aten op output assignment (#18581)
Wanchao Liang [Wed, 10 Apr 2019 04:33:54 +0000 (21:33 -0700)]
Fix aten op output assignment (#18581)

Summary:
Fixes the problem of #18391

The issue is that when we code-gen the ATenOp, we always generate a static number of outputs for each operator. E.g., if an operator from an old model requires only two outputs, its createOperator allocates only two output blobs, while the newer version of the operator (`unique` in this case) requires more output blobs to be allocated.
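
The commit message does not spell out the fix, so the following is only a toy sketch of the mismatch and of one tolerant strategy: write as many results as the model allocated blobs for and drop the rest. The function `assign_outputs` and the dict-based "blobs" are hypothetical stand-ins, not the generated ATenOp code.

```python
# Toy model of the output-count mismatch. `assign_outputs` and the
# dict-based blobs are hypothetical; they are not the real ATenOp codegen.

def assign_outputs(results, output_blobs):
    """Copy operator results into the blobs the model actually allocated.

    zip() stops at the shorter sequence, so results beyond
    len(output_blobs) are silently skipped instead of indexing past
    the end of the allocated outputs.
    """
    for blob, value in zip(output_blobs, results):
        blob['value'] = value
    return output_blobs


# A newer operator version produces three results ...
new_results = ('values', 'inverse_indices', 'counts')
# ... but the old model only allocated two output blobs.
old_model_blobs = [{}, {}]

print(assign_outputs(new_results, old_model_blobs))
```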
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18581

Differential Revision: D14865647

Pulled By: wanchaol

fbshipit-source-id: 85f63fe16d6fe408a09eca84798c7e8cab3070e9

5 years agoEmbeddingBag w/ differentiable per_sample_weights (#18957)
Richard Zou [Wed, 10 Apr 2019 01:09:01 +0000 (18:09 -0700)]
EmbeddingBag w/ differentiable per_sample_weights (#18957)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18957
ghimport-source-id: 7396ca08b137ea40f04285764a9d9a6d4f19227e

Reviewed By: cpuhrsch

Differential Revision: D14856526

Pulled By: zou3519

fbshipit-source-id: 949faea219c7c02ad981b1db610a477194d3f5c9

5 years agoEmbeddingBag w/ per_sample_weights CUDA fwd + bwd (#18800)
Richard Zou [Wed, 10 Apr 2019 01:08:59 +0000 (18:08 -0700)]
EmbeddingBag w/ per_sample_weights CUDA fwd + bwd (#18800)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18800
ghimport-source-id: 17f638dea0e1ac9a86ec06b223c60362ed78449c

Reviewed By: cpuhrsch

Differential Revision: D14851422

Pulled By: zou3519

fbshipit-source-id: 27b114e51e66112e4bc9cfc63d1d1ddfa650d347

5 years agoEmbeddingBag w/ per_sample_weights CPU backward (#18799)
Richard Zou [Wed, 10 Apr 2019 01:08:59 +0000 (18:08 -0700)]
EmbeddingBag w/ per_sample_weights CPU backward (#18799)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18799
ghimport-source-id: 58a6f629e890449013f24a9b6282664ca2a1e3ba

Reviewed By: cpuhrsch

Differential Revision: D14851417

Pulled By: zou3519

fbshipit-source-id: c36b9d469989354bf6cef1c2c3dc4f13e7cb1a25

5 years agoEmbeddingBag CPU forward with per_sample_weights. (#18735)
Richard Zou [Wed, 10 Apr 2019 01:08:59 +0000 (18:08 -0700)]
EmbeddingBag CPU forward with per_sample_weights. (#18735)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18735
ghimport-source-id: d81bef54dafd7167d2451250d7be478d3c013920

Reviewed By: cpuhrsch

Differential Revision: D14851415

Pulled By: zou3519

fbshipit-source-id: cea6039e760ad571b90f0a536e420498f34be325

5 years agoRefactor CPU embedding_bag implementation (#18734)
Richard Zou [Wed, 10 Apr 2019 01:08:59 +0000 (18:08 -0700)]
Refactor CPU embedding_bag implementation (#18734)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18734
ghimport-source-id: e0e50d4b47f2fb8c86e464aacb950521d601f8d3

Reviewed By: cpuhrsch

Differential Revision: D14851413

Pulled By: zou3519

fbshipit-source-id: 8ac4e4de590a363e9807dc552fe4ca52b92652ed

5 years agoMake BlackBoxPredictor handle networks throwing exceptions (#19080)
Alexander Sidorov [Tue, 9 Apr 2019 23:32:52 +0000 (16:32 -0700)]
Make BlackBoxPredictor handle networks throwing exceptions (#19080)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19080

OSS: add a tiny unit test utility function to create tensors given shape and data outside of any workspace. I use it in an internal test

Reviewed By: dzhulgakov

Differential Revision: D14814194

fbshipit-source-id: 6d53b235d99a97da812215f5c7f11fecad363c8c

5 years agoRemind users to set map_location properly when using DDP
Shen Li [Tue, 9 Apr 2019 23:11:05 +0000 (16:11 -0700)]
Remind users to set map_location properly when using DDP

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19084

Differential Revision: D14861702

Pulled By: mrshenli

fbshipit-source-id: 10ca4a9b41e707050a6bce228ccca4177c9fa4a6

5 years agoRename btrisolve to lu_solve (#18726)
Vishwak Srinivasan [Tue, 9 Apr 2019 22:15:06 +0000 (15:15 -0700)]
Rename btrisolve to lu_solve (#18726)

Summary:
Changelog:
- Rename `btrisolve` to `lu_solve` to remain consistent with names of solve methods (`cholesky_solve`, `triangular_solve`, `solve`)
- Fix all callsites
- Rename all tests
- Create a tentative alias for `lu_solve` under the name `btrisolve` and add a deprecation warning to not promote usage
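
The alias-plus-warning pattern from the last bullet can be sketched in plain Python as follows. This is a minimal illustration using the standard `warnings` module; the function bodies are stand-ins and the warning text is an assumption, not the wording used in the PR.

```python
import warnings


def lu_solve(b, lu_data, lu_pivots):
    """Stand-in for the renamed solver; it just echoes its arguments."""
    return (b, lu_data, lu_pivots)


def btrisolve(*args, **kwargs):
    """Deprecated alias that forwards to lu_solve.

    The message below is illustrative, not the text used in PyTorch.
    """
    warnings.warn(
        "btrisolve is deprecated in favour of lu_solve",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return lu_solve(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = btrisolve("b", "LU", "pivots")

print(result)
print([str(w.message) for w in caught])
```

Old call sites keep working through the alias, while the warning nudges callers toward the new name.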
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18726

Differential Revision: D14726237

Pulled By: zou3519

fbshipit-source-id: bf25f6c79062183a4153015e0ec7ebab2c8b986b

5 years agoAvoid calling tensor.data.set_() in DDP
Shen Li [Tue, 9 Apr 2019 21:10:04 +0000 (14:10 -0700)]
Avoid calling tensor.data.set_() in DDP

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18961

Differential Revision: D14811208

Pulled By: mrshenli

fbshipit-source-id: c1c46dfa13e0a6ec83aefd35696ee31a7ea3d810

5 years agoReapply Wrap workaround for cpp custom types a bit prettier and add an example" ...
Dmytro Dzhulgakov [Tue, 9 Apr 2019 19:13:41 +0000 (12:13 -0700)]
Reapply Wrap workaround for cpp custom types a bit prettier and add an example" (#19062)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19062

As a temporary demonstration on how to extend this hack further until custom C types are ready.

Reviewed By: ezyang

Differential Revision: D14817809

fbshipit-source-id: 6eaf731e9135313eb858e178abcd9f25380ab8fe

5 years agoPropagate ProcessGroup timeout to Store (#16571)
Shen Li [Tue, 9 Apr 2019 19:06:04 +0000 (12:06 -0700)]
Propagate ProcessGroup timeout to Store (#16571)

Summary:
closes #16520

Hi pietern, I am not sure if this is the expected way to pass timeout to `Store`, could you please help take a look? Thanks!

Questions:
1. How do I write tests for this? I wanted to do something like `test_barrier_timeout_global`, but it seems I need to set the pg's timeout larger than the `Store`'s default timeout (3 min) to see a difference, which is too long for a unit test. And I do not want to change the `Store`'s default timeout either. Any suggestion?
2. Should I also propagate timeout configuration down to `PrefixStore` in `_new_process_group_helper`?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16571

Differential Revision: D13954527

Pulled By: mrshenli

fbshipit-source-id: 77f2653903f24255207233eb298f7c0321119a87

5 years agomake test_jit_fuser runnable
Wanchao Liang [Tue, 9 Apr 2019 18:53:23 +0000 (11:53 -0700)]
make test_jit_fuser runnable

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19036

Differential Revision: D14839800

Pulled By: wanchaol

fbshipit-source-id: b52c131b58e1b42a8c3da5d1117217c3dc2e5f5b

5 years agoFix documentation for unfold(dimension=..., ...), fixes #18793 (#19020)
Edward Yang [Tue, 9 Apr 2019 18:48:56 +0000 (11:48 -0700)]
Fix documentation for unfold(dimension=..., ...), fixes #18793 (#19020)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19020
ghimport-source-id: 8f31e51b79daba11939aa7992450984054713b9c

Differential Revision: D14851890

Pulled By: ezyang

fbshipit-source-id: 8498e86a63633fdfd9ecae9b7f85b773b75fe27a

5 years agoDebugging: Increase process reporting for apt/dpkg. (#18880)
Edward Yang [Tue, 9 Apr 2019 18:34:37 +0000 (11:34 -0700)]
Debugging: Increase process reporting for apt/dpkg. (#18880)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18880
ghimport-source-id: b43a33c12df379ec75c1fd4c713c1fc723a763e1

Differential Revision: D14856296

Pulled By: ezyang

fbshipit-source-id: 30691eb14dddfe998b2605b416aaa1b14d1b6ad5

5 years agoAdd torch.__config__.show(), reporting detailed version of all libraries. (#18579)
Edward Yang [Tue, 9 Apr 2019 18:09:31 +0000 (11:09 -0700)]
Add torch.__config__.show(), reporting detailed version of all libraries. (#18579)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18579
ghimport-source-id: 65124c95e49423de4ad1008c65e75057fea09b94

Differential Revision: D14778507

Pulled By: ezyang

fbshipit-source-id: 1e4bb79f4800a116ce8fb7af2fefbd34da8d102c

5 years agoFix torch::nn::init::orthogonal_ with CNNs (#18915)
Omegastick [Tue, 9 Apr 2019 17:36:13 +0000 (10:36 -0700)]
Fix torch::nn::init::orthogonal_ with CNNs (#18915)

Summary:
Fixes #18518

I changed the C++ API torch::nn::init::orthogonal_ implementation to match the Python implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18915

Differential Revision: D14851833

Pulled By: ezyang

fbshipit-source-id: 45b5e9741582777c203e9ebed564ab3ac1f94baf

5 years agomove nightlies to 1.1.0xxx
Soumith Chintala [Tue, 9 Apr 2019 17:09:58 +0000 (10:09 -0700)]
move nightlies to 1.1.0xxx

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19069

Differential Revision: D14854600

Pulled By: soumith

fbshipit-source-id: 85c703bddbd47c1b3914d58ab9521ed22ddeb62a

5 years agoadd an utility function to check whether it's in the middle of onnx export or not
Lu Fang [Tue, 9 Apr 2019 17:01:48 +0000 (10:01 -0700)]
add an utility function to check whether it's in the middle of onnx export or not

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19050

Reviewed By: yinghai

Differential Revision: D14849878

Pulled By: houseroad

fbshipit-source-id: a0a4a57f5f9f315ba1334edfccc9284a8099d17f

5 years agoremove interned_string.h dep (#19061)
Lu Fang [Tue, 9 Apr 2019 16:56:34 +0000 (09:56 -0700)]
remove interned_string.h dep (#19061)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19061

remove the deps on interned_string.h

Reviewed By: BIT-silence

Differential Revision: D14850078

fbshipit-source-id: 07e6ad72a7de369049ea56f32b72276fb4c59b32

5 years agoadd logging to make the saving action visible (#19042)
Liang Xiong [Tue, 9 Apr 2019 16:25:03 +0000 (09:25 -0700)]
add logging to make the saving action visible (#19042)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19042

show the model saving step in the log.

Reviewed By: kennyhorror

Differential Revision: D14809385

fbshipit-source-id: c7a1e50ff92bb45b16b1c501d9325b304b07fbd3

5 years agoNamedtuple return for gels, triangular_solve, and test refactor (#17195)
Xiang Gao [Tue, 9 Apr 2019 16:10:42 +0000 (09:10 -0700)]
Namedtuple return for gels, triangular_solve, and test refactor (#17195)

Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/394
- `gels` and `triangular_solve` now return a namedtuple
- refactor test for namedtuple API for better coverage and maintainability
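
A namedtuple return stays compatible with callers that unpack a plain tuple while also allowing access by field name. The sketch below shows this with the standard library only; `SolveResult`, its field names, and `toy_solve` are hypothetical and do not reflect the field names chosen in the PR.

```python
from collections import namedtuple

# Hypothetical result type; the real field names are defined by the PR.
SolveResult = namedtuple('SolveResult', ['solution', 'factor'])


def toy_solve(b, a):
    """Solve the scalar equation a * x = b and return a namedtuple."""
    return SolveResult(solution=b / a, factor=a)


result = toy_solve(6.0, 3.0)

# Existing callers that unpack positionally keep working ...
solution, factor = result
# ... and new callers can use field names instead of positions.
print(result.solution, result.factor)
print(solution == result.solution, result[1] == result.factor)
```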
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17195

Differential Revision: D14851875

Pulled By: ezyang

fbshipit-source-id: 9b2cba95564269d2c3a15324ba48751d68ed623c

5 years agoConvert all tabs to spaces, add CI. (#18959)
Edward Yang [Tue, 9 Apr 2019 15:02:30 +0000 (08:02 -0700)]
Convert all tabs to spaces, add CI. (#18959)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18959
ghimport-source-id: a934163fa34cb2019732d5f49dc7290c376bf156

Differential Revision: D14831246

Pulled By: ezyang

fbshipit-source-id: beb92dc4ee8c82f4c8259c081dd72e477fe7a9d0

5 years agoFix BN tests for >= 8 GPU test environments (#19049)
Shen Li [Tue, 9 Apr 2019 15:01:18 +0000 (08:01 -0700)]
Fix BN tests for >= 8 GPU test environments (#19049)

Summary:
DDP does not support replicating BN layers within a process. Existing BN tests fail if the test environment has more than 8 GPUs. This is fixed by explicitly setting each process to use a single replica.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19049

Differential Revision: D14845286

Pulled By: mrshenli

fbshipit-source-id: 937dda5081d415ece48b21f2781b6b4e008dd42f

5 years agodo not use constexpr with CUDA >= 9.2 compiler on Windows. (#18986)
Shuichi KITAGUCHI [Tue, 9 Apr 2019 15:00:18 +0000 (08:00 -0700)]
do not use constexpr with CUDA >= 9.2 compiler on Windows. (#18986)

Summary:
Define `AT_CPP14_CONSTEXPR` as empty instead of `constexpr` on Windows with CUDA >= 9.2 as a workaround.

Discussed in #18425.

When using CUDA 10.1 on Windows, I faced the following errors:
~~~
D:/data/source/pytorch\c10/util/ArrayRef.h(144): error: variable in constexpr function does not have automatic storage duration
          detected during instantiation of "const T &c10::ArrayRef<T>::front() const [with T=at::Tensor]"
D:/data/source/pytorch/aten/src\ATen/DeviceGuard.h(30): here
~~~

According to the documentation of CUDA Toolkit v10.1.105, the compiler supports `constexpr` with the relaxed requirements of C++14, but compilation failed.

I suppose this could be a compiler bug that requires this workaround.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18986

Differential Revision: D14821836

Pulled By: ezyang

fbshipit-source-id: 9800da2fe7291e7c09e8e5e882adebab08d83ae3

5 years agoAdd torch/lib/protobuf to gitignore, fixes #18700 (#19019)
Edward Yang [Tue, 9 Apr 2019 14:29:42 +0000 (07:29 -0700)]
Add torch/lib/protobuf to gitignore, fixes #18700 (#19019)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19019
ghimport-source-id: 84d36f8d27912d1d094d5672154b82187dd88761

Differential Revision: D14846615

Pulled By: ezyang

fbshipit-source-id: e402557ec321c85be3b28c8602b680246c8eecfe

5 years agoAutomatic update of fbcode/onnx to 971311db58f2fa8306d15e1458b5fd47dbc8d11c (#19046)
Lu Fang [Tue, 9 Apr 2019 06:12:58 +0000 (23:12 -0700)]
Automatic update of fbcode/onnx to 971311db58f2fa8306d15e1458b5fd47dbc8d11c (#19046)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19046

Previous import was 079c2639f9bb79b1774d1e3bfa05b0c093816ca7

Included changes:
- **[971311db](https://github.com/onnx/onnx/commit/971311db)**: use ONNX_NAMESPACE::to_string instead of std::to_string (#1915) <Lu Fang>
- **[65227446](https://github.com/onnx/onnx/commit/65227446)**: Remove all the experimental ops (#1909) <Lu Fang>
- **[bdb28f29](https://github.com/onnx/onnx/commit/bdb28f29)**: opset converter backward compatibility support for opset versions 9 and 8 (#1847) <Peyman Manikashani>
- **[47692338](https://github.com/onnx/onnx/commit/47692338)**: Create CODEOWNERS for automatic reviewer assignment for PRs (#1910) <Prasanth Pulavarthi>
- **[8121c731](https://github.com/onnx/onnx/commit/8121c731)**: Revert "quantization support in onnx (#1872)" (#1911) <Lu Fang>
- **[4cfa5426](https://github.com/onnx/onnx/commit/4cfa5426)**: quantization support in onnx (#1872) <Ke Zhang>
- **[030bbb80](https://github.com/onnx/onnx/commit/030bbb80)**: Update LICENSE formatting and clarify # of WG chairs (#1907) <Prasanth Pulavarthi>

Reviewed By: yinghai

Differential Revision: D14843284

fbshipit-source-id: 96c1c79abb62beff227a9fc8b2af9382c4673755

5 years agoFix default CXX for Windows in cpp_extensions.py (#19052)
peter [Tue, 9 Apr 2019 06:10:51 +0000 (23:10 -0700)]
Fix default CXX for Windows in cpp_extensions.py (#19052)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/19017.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19052

Differential Revision: D14846702

Pulled By: soumith

fbshipit-source-id: b0e4dadaa749da0fa2d0405a1a064820d094220a

5 years agofix the onnx ci
Lu Fang [Tue, 9 Apr 2019 06:03:56 +0000 (23:03 -0700)]
fix the onnx ci

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/19048

Reviewed By: yinghai

Differential Revision: D14844917

Pulled By: houseroad

fbshipit-source-id: 30719e05a443981284dedf34a9e51213271aa934

5 years agoAdd gelu op (#18992)
Xiaomeng Yang [Tue, 9 Apr 2019 04:55:43 +0000 (21:55 -0700)]
Add gelu op (#18992)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18992

Add gelu op

Reviewed By: houseroad

Differential Revision: D14814811

fbshipit-source-id: 00f126b8b83763c57ebbf28fbd2de5a8fab6d491

5 years agoAdd MKL-DNN Tensor (#17748)
jgong5 [Tue, 9 Apr 2019 04:30:50 +0000 (21:30 -0700)]
Add MKL-DNN Tensor (#17748)

Summary:
This is a minimalist PR to add MKL-DNN tensor per discussion from Github issue: https://github.com/pytorch/pytorch/issues/16038

Ops with MKL-DNN tensors will be supported in follow-up PRs to speed up the imperative path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17748

Reviewed By: dzhulgakov

Differential Revision: D14614640

Pulled By: bddppq

fbshipit-source-id: c58de98e244b0c63ae11e10d752a8e8ed920c533

5 years agodetect C++ ABI flag for cpp extensions from available runtime information (#18994)
Soumith Chintala [Tue, 9 Apr 2019 00:43:57 +0000 (17:43 -0700)]
detect C++ ABI flag for cpp extensions from available runtime information (#18994)

Summary:
Previously, when a user built PyTorch from source, but set the version string manually to be binary-formatted, it would've simply used CXX11_ABI=0 incorrectly.

We have this information available at runtime with `torch._C._GLIBCXX_USE_CXX11_ABI`, so this PR improves the situation by simply using that information.
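
As a rough sketch of the idea, the compiler define can be derived from a boolean such as the runtime value mentioned above. The helper name `cxx11_abi_flag` is hypothetical, and the snippet deliberately takes the boolean as a parameter instead of importing torch.

```python
def cxx11_abi_flag(use_cxx11_abi):
    """Build the libstdc++ dual-ABI define from a runtime boolean.

    In the PR the boolean comes from torch._C._GLIBCXX_USE_CXX11_ABI;
    here it is passed in so the sketch runs without torch installed.
    """
    return '-D_GLIBCXX_USE_CXX11_ABI=' + str(int(bool(use_cxx11_abi)))


print(cxx11_abi_flag(True))
print(cxx11_abi_flag(False))
```

Deriving the flag from the running binary avoids guessing the ABI from the version string, which is the failure mode described above.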
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18994

Differential Revision: D14839393

Pulled By: soumith

fbshipit-source-id: ca92e0810b29ffe688be82326e02a64a5649a3ad

5 years agoFix momentum setting in BatchNorm forward pass. (#18764)
Spandan Tiwari [Mon, 8 Apr 2019 23:21:30 +0000 (16:21 -0700)]
Fix momentum setting in BatchNorm forward pass. (#18764)

Summary:
This is a fix for issue https://github.com/pytorch/pytorch/issues/18525. The issue is related not only to ONNX export, but can manifest in other scenarios.
An existing test point in test/onnx/test_operators.py has been updated to cover this scenario as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18764

Reviewed By: zrphercule

Differential Revision: D14735166

Pulled By: houseroad

fbshipit-source-id: 5a737c648f64355929ff31eb12bd4869e744768d

5 years agoadd android build workflow to pytorch CI jobs (#18919)
Jiakai Liu [Mon, 8 Apr 2019 23:19:51 +0000 (16:19 -0700)]
add android build workflow to pytorch CI jobs (#18919)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18919
ghimport-source-id: 3f0ce4334c899d262403d88bd8bd7513e99570f0

Reviewed By: kostmo

Differential Revision: D14800728

Pulled By: ljk53

fbshipit-source-id: fec2e34c192181b8fa31c9a30f60c9bf7388f083

5 years agoExport C10 operator in PyTorch Model (#18210)
Lu Fang [Mon, 8 Apr 2019 23:01:30 +0000 (16:01 -0700)]
Export C10 operator in PyTorch Model (#18210)

Summary:
Almost there, feel free to review.

These c10 operators are exported to the _caffe2 domain.

TODO:

- [x] let the onnx checker pass
- [x] test tensor list as argument
- [x] test caffe2 backend and converter
- [x] check the c10 schema can be exported to onnx
- [x] refactor the test case to share some code
- [x] fix the problem in ONNX_ATEN_FALLBACK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18210

Reviewed By: zrphercule

Differential Revision: D14600916

Pulled By: houseroad

fbshipit-source-id: 2592a75f21098fb6ceb38c5d00ee40e9e01cd144

5 years agoFix interpolate tracing (#19034)
Zachary DeVito [Mon, 8 Apr 2019 21:56:26 +0000 (14:56 -0700)]
Fix interpolate tracing (#19034)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19034
ghimport-source-id: 874e0b0a8685184416152a77fc1850d9a06516ae

Differential Revision: D14837282

Pulled By: zdevito

fbshipit-source-id: b0ed82b607c288a54eecec3d6ed62c4626e5a563

5 years agoFix default dtype in shape analysis (#18968)
Elias Ellison [Mon, 8 Apr 2019 21:44:45 +0000 (14:44 -0700)]
Fix default dtype in shape analysis (#18968)

Summary:
Fix for https://github.com/pytorch/pytorch/issues/18823

Previously we were setting the dtype to Float when in torchscript the default is double. When the problem in https://github.com/pytorch/pytorch/issues/17662 gets landed, we will have to reevaluate (and this test will fail).

We should still be consistent in shape_analysis in the meantime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18968

Differential Revision: D14837939

Pulled By: eellison

fbshipit-source-id: 32383b55c14bdc7753e26dec33c39ab10124c255

5 years agoRenamed bool tensors into byte tensors (#19021)
Iurii Zdebskyi [Mon, 8 Apr 2019 20:46:52 +0000 (13:46 -0700)]
Renamed bool tensors into byte tensors (#19021)

Summary:
Renamed bool tensors into byte tensors to represent the correct type in generated code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19021

Differential Revision: D14835188

Pulled By: izdeby

fbshipit-source-id: 0252d2c69dab35ac2f076cf9a87423463e902c76

5 years agoHandle None indexing in TorchScript (#18615)
Thomas Viehmann [Mon, 8 Apr 2019 20:35:34 +0000 (13:35 -0700)]
Handle None indexing in TorchScript (#18615)

Summary:
t[None], t[None, 1:, None] and friends for unsqueezing
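
To make the unsqueezing effect concrete, the pure-Python sketch below computes the result shape of basic indexing with integers, slices, and `None`. The helper `indexed_shape` is illustrative only: it handles neither `Ellipsis` nor tensor indices and says nothing about how TorchScript implements the feature.

```python
def indexed_shape(shape, index):
    """Shape after basic indexing with ints, slices and None (no Ellipsis)."""
    if not isinstance(index, tuple):
        index = (index,)
    out = []
    dim = 0  # next input dimension to consume
    for item in index:
        if item is None:
            out.append(1)  # None inserts a new axis of size 1
        elif isinstance(item, slice):
            # slice.indices clamps the slice to the dimension's length
            out.append(len(range(*item.indices(shape[dim]))))
            dim += 1
        else:
            dim += 1  # an integer index removes the dimension
    out.extend(shape[dim:])  # untouched trailing dimensions
    return tuple(out)


# t[None] on a 3x4 tensor
print(indexed_shape((3, 4), None))                          # (1, 3, 4)
# t[None, 1:, None] on a 3x4 tensor
print(indexed_shape((3, 4), (None, slice(1, None), None)))  # (1, 2, 1, 4)
```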

Fixes: #12810
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18615

Differential Revision: D14837039

Pulled By: wanchaol

fbshipit-source-id: ab3862c41629f087b0a46b7c59c93dac4018e6fe

5 years agoTurn on mkldnn in most builds except rocm
Junjie Bai [Mon, 8 Apr 2019 20:09:11 +0000 (13:09 -0700)]
Turn on mkldnn in most builds except rocm

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18965

Differential Revision: D14836931

Pulled By: bddppq

fbshipit-source-id: 463a9bc5043a1f3194158f7bbfae3b71c6cd4b20

5 years agoRemove dead code in module.cpp (#19022)
David Riazati [Mon, 8 Apr 2019 20:01:09 +0000 (13:01 -0700)]
Remove dead code in module.cpp (#19022)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19022
ghimport-source-id: cdf694c1b426eb9f82d4c148c9f2c2cfc180cedd

Reviewed By: eellison

Differential Revision: D14833409

Pulled By: driazati

fbshipit-source-id: 8914c7227add7f3e07f56b21a513ba7727fb6800

5 years agoConvert test_recursive_cse to use Filecheck inline annotations. (#19032)
Mikhail Zolotukhin [Mon, 8 Apr 2019 19:22:52 +0000 (12:22 -0700)]
Convert test_recursive_cse to use Filecheck inline annotations. (#19032)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19032
ghimport-source-id: 58a146542deb08dd3057d099167ba530a5e51400

Differential Revision: D14836689

Pulled By: ZolotukhinM

fbshipit-source-id: e65ca5f09193eb7c16c204aedd50c474ea31210c

5 years agoAdd a document 'How to Write Tests Using FileCheck' (#19005)
Mikhail Zolotukhin [Mon, 8 Apr 2019 19:06:55 +0000 (12:06 -0700)]
Add a document 'How to Write Tests Using FileCheck' (#19005)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19005
ghimport-source-id: f9c3eff54adc8eef3ead2c77be62c44d88d22a00

Differential Revision: D14826845

Pulled By: ZolotukhinM

fbshipit-source-id: 62cc3657ee89acc979403da15e39bd4cd09a866d

5 years agocaffe2 - Expose tensor filler util to Python (#18886)
Duc Ngo [Mon, 8 Apr 2019 18:48:42 +0000 (11:48 -0700)]
caffe2 - Expose tensor filler util to Python (#18886)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18886

Expose tensor filler util to Python and add a unit test (both C++/Python)

Reviewed By: salexspb

Differential Revision: D14784470

fbshipit-source-id: bb8e013d1755c27c166e87d5a8491a97c65d3d8d

5 years agocall build_android.sh from pytorch CI build script (#18918)
Jiakai Liu [Mon, 8 Apr 2019 17:54:59 +0000 (10:54 -0700)]
call build_android.sh from pytorch CI build script (#18918)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18918
ghimport-source-id: 98c63da263adbbc6ac74a69ac117740c852833cd

Reviewed By: dreiss

Differential Revision: D14800727

Pulled By: ljk53

fbshipit-source-id: 4d06f845bb34bcdb74b0602404f2a0782f8c8783

5 years agoType annotations for `util.data`. (#18963)
Jon Malmaud [Mon, 8 Apr 2019 16:45:49 +0000 (09:45 -0700)]
Type annotations for `util.data`. (#18963)

Summary:
I haven't had a chance to rigorously try these out yet so don't merge yet.
Closes #18725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18963

Differential Revision: D14832897

Pulled By: ezyang

fbshipit-source-id: 4780e7a34126bc66ddbfd9d808dfc9e0edd77e68

5 years agoifdef guard some explicit pragma unrolls (#19018)
Johannes M Dieterich [Mon, 8 Apr 2019 16:44:08 +0000 (09:44 -0700)]
ifdef guard some explicit pragma unrolls (#19018)

Summary:
The ROCm compiler cannot and will not satisfy them because the loop trip count is only known at runtime, which causes compile-time warnings.

Some warnings remain arising from other parts of the ROCm stack - tickets are filed and they will be resolved within these components.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19018

Differential Revision: D14832859

Pulled By: ezyang

fbshipit-source-id: 0d66e4aebe4e56af14dd5e2967d3c374a82be25c

5 years agoFix a dev mode bug in activation distribution observer (#19004)
Summer Deng [Mon, 8 Apr 2019 16:26:37 +0000 (09:26 -0700)]
Fix a dev mode bug in activation distribution observer (#19004)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19004

Handling the exception case when the data has min 3.40282e+38 max -3.40282e+38
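
One plausible reading, assumed here and not confirmed by the commit message, is that these values are the untouched initial sentinels of a running min/max (the largest finite float32 and its negation), which remain in place when no data was observed. A minimal sketch of detecting that case, with hypothetical helper names:

```python
# Largest finite float32; prints as 3.40282e+38 at six significant digits.
FLT_MAX = 3.4028234663852886e+38


def running_min_max(values):
    """Track min/max starting from sentinels, as a typical observer would."""
    lo, hi = FLT_MAX, -FLT_MAX
    for v in values:
        lo = min(lo, v)
        hi = max(hi, v)
    return lo, hi


def has_valid_range(lo, hi):
    """False when the sentinels were never updated (min ends up above max)."""
    return lo <= hi


print(has_valid_range(*running_min_max([])))           # no data observed
print(has_valid_range(*running_min_max([1.0, -2.0])))  # normal case
```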

Reviewed By: jspark1105

Differential Revision: D14822193

fbshipit-source-id: b9771d1584fdf8317f5b8c7f5806be5d27314386

5 years agoClean up some sparse code. (#18962)
Gregory Chanan [Mon, 8 Apr 2019 15:10:19 +0000 (08:10 -0700)]
Clean up some sparse code. (#18962)

Summary:
1) sparse_dispatches in native_parse was not used anymore, got rid of it.
2) got rid of overloaded sizes_ in SparseTensorImpl, which just uses the base implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18962

Differential Revision: D14811545

Pulled By: gchanan

fbshipit-source-id: 2fa60ef50456b5f605caa63beae1d8d2542fd527

5 years agoRemove tensorWithAllocator() from Type (#18780)
Roy Li [Mon, 8 Apr 2019 06:56:02 +0000 (23:56 -0700)]
Remove tensorWithAllocator() from Type (#18780)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18780
ghimport-source-id: 7d18a11ce87d988bd32f6ebb96acd878ab8d61be

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18780 Remove tensorWithAllocator() from Type**
* #18779 Remove tensorFromBlob() from Type

Differential Revision: D14739336

fbshipit-source-id: 429ab10bb9f6ac9f97b5a11c7a836b6b6336cb2d

5 years agoFix sparse mm for ROCm (#18985)
Johannes M Dieterich [Mon, 8 Apr 2019 01:13:33 +0000 (18:13 -0700)]
Fix sparse mm for ROCm (#18985)

Summary:
* Annotate also two pass reduction with launch bounds
* ifdef some shortcomings of ROCm w.r.t. short-circuit returns - internal tickets filed
* while there, plug memory leak by destroying matrix descriptor after the sparse call (applicable to cuSPARSE)
* while there, fix types for cusparseXcoo2csr as per cuSPARSE documentation
* enable test_dsmm in test_sparse which now passes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18985

Differential Revision: D14822009

Pulled By: bddppq

fbshipit-source-id: 757267a47a63ee56ef396c33059f7eca099f4833

5 years agoCheck if profiler is disabled in push/pop event (#18908)
Ilia Cherniavskii [Sun, 7 Apr 2019 22:03:21 +0000 (15:03 -0700)]
Check if profiler is disabled in push/pop event (#18908)

Summary:
Make sure to check if profiler is disabled in push/pop and mark event
functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18908

Differential Revision: D14791931

Pulled By: ilia-cher

fbshipit-source-id: e4f5149e69999ee2b9238c21cccad6d27c6a714a

5 years agoImplement Observer pass on simple model and validate stats (#18848)
Nishant Pandit [Sun, 7 Apr 2019 16:09:33 +0000 (09:09 -0700)]
Implement Observer pass on simple model and validate stats (#18848)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18848

The Observer Module is based on eager mode compute qparam implementation.
Goal is to validate QParam result for EagerMode and Script Mode for simple
model

Observer stats are collected and qparams are computed only for activations at this point

Reviewed By: zafartahirov

Differential Revision: D14720805

fbshipit-source-id: cb2f321b4b9927b37905fdb8eb55c5610d41b351

5 years agoAVX2 with GCC9 fix. (#18991)
Balint Cristian [Sun, 7 Apr 2019 15:23:10 +0000 (08:23 -0700)]
AVX2 with GCC9 fix. (#18991)

Summary:
Dear All,

The proposed patch fixes the test code snippets used in the cmake infrastructure, which implicitly failed to set the `CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS` flag properly. Without the fix, libcaffe2.so has some `UND` AVX2-related references, rendering it unusable.

* Using GCC 9 test code from cmake build infra always fails:
```
$ gcc  -O2 -g -pipe -Wall -m64 -mtune=generic -fopenmp -DCXX_HAS_AVX_1 -fPIE -o test.o -c test.c -mavx2
test.c: In function ‘main’:
test.c:11:26: error: incompatible type for argument 1 of ‘_mm256_extract_epi64’
   11 |     _mm256_extract_epi64(x, 0); // we rely on this in our AVX2 code
      |                          ^
      |                          |
      |                          __m256 {aka __vector(8) float}
In file included from /usr/lib/gcc/x86_64-redhat-linux/9/include/immintrin.h:51,
                 from test.c:4:
/usr/lib/gcc/x86_64-redhat-linux/9/include/avxintrin.h:550:31: note: expected ‘__m256i’ {aka ‘__vector(4) long long int’} but argument is of type ‘__m256’ {aka ‘__vector(8) float’}
  550 | _mm256_extract_epi64 (__m256i __X, const int __N)
      |

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/9/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 9.0.1 20190328 (Red Hat 9.0.1-0.12) (GCC)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18991

Differential Revision: D14821838

Pulled By: ezyang

fbshipit-source-id: 7eb3a854a1a831f6fda8ed7ad089746230b529d7

5 years agoRemove tensorFromBlob() from Type (#18779)
Roy Li [Sun, 7 Apr 2019 08:35:11 +0000 (01:35 -0700)]
Remove tensorFromBlob() from Type (#18779)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18779
ghimport-source-id: e7453b74fcce0e4f4a9cbce0324992a85272a426

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18780 Remove tensorWithAllocator() from Type
* **#18779 Remove tensorFromBlob() from Type**

Differential Revision: D14739335

fbshipit-source-id: 8a0619a5b412332efa3b2d60c1edebd53d089d50

5 years agoImprove precision of emitted code for prim::Constant (#18817)
James Reed [Sun, 7 Apr 2019 07:15:42 +0000 (00:15 -0700)]
Improve precision of emitted code for prim::Constant (#18817)

Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18815 and https://github.com/pytorch/pytorch/pull/18811.

This makes it so that we emit a higher-precision literal for float values in the fusion kernel and assign it to a `double` variable. This prevents us from losing precision for values such as `pi`, while the previous fixes ensure the value is still downcast to `float` if downstream operations require it. Therefore, we should not lose performance because of implicit promotions.
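
To illustrate why the literal's precision matters, here is a minimal Python sketch (not the fuser's codegen) that rounds `pi` to single precision and compares it with the double value. The helper name `round_to_float32` is hypothetical and exists only for this illustration.

```python
import math
import struct


def round_to_float32(x: float) -> float:
    """Round a Python float (a C double) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


pi_double = math.pi                    # what a double literal can hold
pi_float = round_to_float32(math.pi)   # what a float literal can hold
float32_error = abs(pi_double - pi_float)

# The float32 value differs from the double value in the 8th significant digit.
print(repr(pi_double), repr(pi_float), float32_error)
```

A kernel that stores the constant as a single-precision literal starts from `pi_float`, so any later computation in `double` inherits that rounding error.
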
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18817

Differential Revision: D14820842

Pulled By: jamesr66a

fbshipit-source-id: 519671c6ca5e7adac746a4c4c72760a6d91e332f

5 years agoconvert_sync_batch_norm to SyncBatchNorm (#18787)
Arunava [Sun, 7 Apr 2019 07:07:24 +0000 (00:07 -0700)]
convert_sync_batch_norm to SyncBatchNorm (#18787)

Summary:
Closes #18382

Please let me know if any changes are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18787

Differential Revision: D14821147

Pulled By: soumith

fbshipit-source-id: edd98eab1b3f4151c4ae5148146435ddb2ae678d

5 years agofix bug when falling back to acc32 when weight is prepacked (#18974)
Summer Deng [Sun, 7 Apr 2019 04:50:28 +0000 (21:50 -0700)]
fix bug when falling back to acc32 when weight is prepacked (#18974)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18974

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

Reviewed By: bddppq

Differential Revision: D14814067

fbshipit-source-id: aec917322de695e283f0aca1e930c5603d196404

5 years agomove 2ops back to autodiff (#18969)
Ailing Zhang [Sun, 7 Apr 2019 04:36:22 +0000 (21:36 -0700)]
move 2ops back to autodiff (#18969)

Summary:
Move these 2 ops back to autodiff to unblock the XLA CI.
I will leave them for my next PR to clean up symbolic_variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18969

Differential Revision: D14816811

Pulled By: ailzhang

fbshipit-source-id: dd8a7e133dcad29560d3d1d25691883960117299

5 years agoPreserve naming for inputs/outputs with observer insertion (#18713)
Nishant Pandit [Sun, 7 Apr 2019 03:56:17 +0000 (20:56 -0700)]
Preserve naming for inputs/outputs with observer insertion (#18713)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18713

  - The Quantizer observer node's output is hooked up to the following node,
which mutates the naming of the input/output. This is neither desired nor
required, because the observer op can be a sync node.

  - The Quantizer is aimed at quantizing tensors, so we should insert observer
ops for Values that are of tensor type.

Reviewed By: zafartahirov

Differential Revision: D14715916

fbshipit-source-id: feca04c65a43103b46084d3548998498b19ee599

5 years agoEmit math functions specific to output type (#18815)
James Reed [Sun, 7 Apr 2019 00:44:53 +0000 (17:44 -0700)]
Emit math functions specific to output type (#18815)

Summary:
Stacked on https://github.com/pytorch/pytorch/pull/18811

This makes it so that we only emit the *f variants of math functions if the output value's type is FloatTensor; otherwise we call the double variants to prevent loss of precision. This fixes more numerical issues.
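
The selection rule described above can be sketched roughly as follows. This is an illustrative Python model, not the fuser's actual code; the function name `math_function_name` and the boolean parameter are assumptions made for the example. It relies only on the C convention that single-precision math functions carry an `f` suffix (`expf`, `sqrtf`, `tanhf`).

```python
def math_function_name(base: str, output_is_float: bool) -> str:
    """Pick the C math function to emit for a given output type.

    Hypothetical helper: `base` is the double-precision name (e.g. "exp"),
    and the single-precision variant is formed by appending "f".
    """
    return base + "f" if output_is_float else base


# A float output gets the *f variant; anything else keeps the double variant.
print(math_function_name("exp", True))    # single-precision call
print(math_function_name("exp", False))   # double-precision call
```
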
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18815

Differential Revision: D14816965

Pulled By: jamesr66a

fbshipit-source-id: 464be644168875ede987142281fb2168f4041e81

5 years agoadd instructions for NVIDIA Jetson platforms (#18990)
Soumith Chintala [Sat, 6 Apr 2019 19:36:58 +0000 (12:36 -0700)]
add instructions for NVIDIA Jetson platforms (#18990)

Summary:
Thanks to dusty-nv, we now have Stable and Weekly wheels provided for the NVIDIA Jetson Platform. They require JetPack 4.2.

He's also maintaining source build instructions.

This PR adds links to the binaries and source build instructions to the README.

The links are dynamic, so when new stable / weekly wheels are available, Dustin will update the same URL to point to the new files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18990

Differential Revision: D14820158

Pulled By: soumith

fbshipit-source-id: 761a56557decb72ad9c1b9f8a2745667f558eec3

5 years agoQuantizer pass to insert quant-dequant nodes into IR (#18446)
Nishant Pandit [Sat, 6 Apr 2019 19:34:33 +0000 (12:34 -0700)]
Quantizer pass to insert quant-dequant nodes into IR (#18446)

Summary:
- Quantizer pass to mutate the IR by inserting quant-dequant nodes
before and after nodes which support quantized ops. This information
will be used by the JIT compiler to substitute quantized ops.

- This currently covers simple models. It will be expanded later
for subgraph pattern matching to cover more complex patterns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18446

Differential Revision: D14592265

Pulled By: nishantpdce

fbshipit-source-id: c9ba6c12aa96cb9c117826e386721eec83a55ea6

5 years agoadd SyncBatchNorm to docs (#18988)
Soumith Chintala [Sat, 6 Apr 2019 18:37:41 +0000 (11:37 -0700)]
add SyncBatchNorm to docs (#18988)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/18983
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18988

Differential Revision: D14820042

Pulled By: soumith

fbshipit-source-id: 356169f554a42303b266d700d3379a5288f9671d

5 years agoAdd c10_cuda to libraries in CUDAExtension for Windows (#18982)
mooncake4132 [Sat, 6 Apr 2019 17:25:56 +0000 (10:25 -0700)]
Add c10_cuda to libraries in CUDAExtension for Windows (#18982)

Summary:
This change was necessary for me to compile [apex](https://github.com/NVIDIA/apex) on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18982

Differential Revision: D14819818

Pulled By: soumith

fbshipit-source-id: 37ff9b93a72ab2b7c87f23a61e9f776c71c4c1a8

5 years agoRemove Trainer from README.md (#18980)
Gao, Xiang [Sat, 6 Apr 2019 16:09:52 +0000 (09:09 -0700)]
Remove Trainer from README.md (#18980)

Summary:
Trainer was removed a long time ago.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18980

Differential Revision: D14819855

Pulled By: ezyang

fbshipit-source-id: f62020e688ebf6663416aec7435bf1f531607941

5 years agoCreate Object that represents a Module (#18469)
Zachary DeVito [Sat, 6 Apr 2019 01:53:31 +0000 (18:53 -0700)]
Create Object that represents a Module (#18469)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18469
ghimport-source-id: 73cb8b58f43f10b1dcfca805fd5b25c4fa977632

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18469 Create Object that represents a Module**
* #18468 slots with explicit value/setValue make more sense in future patches
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

This changes the underlying storage for script::Module to hold
an ivalue::Object which has slots for all the parameters and attributes.

NamedIValue and Slot are now merged together into one class Slot that stores
the tuple (ivalue::Object, offset) and can be used to read the name, type,
or value of the slot and also to set the value. This cleans up a bunch
of client uses.
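
The merged `Slot` idea, a handle made of an (object, offset) pair, can be sketched as below. This is a simplified Python model for illustration only, not the C++ implementation; the class layouts, field names, and method names (`type_name`, `set_value`, and so on) are assumptions.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ClassType:
    """Hypothetical stand-in for a class type: parallel lists of names and types."""
    name: str
    attribute_names: List[str]
    attribute_types: List[str]


@dataclass
class Object:
    """Hypothetical stand-in for ivalue::Object: a type plus one value per slot."""
    class_type: ClassType
    slots: List[Any]


@dataclass
class Slot:
    """A handle storing the pair (object, offset)."""
    obj: Object
    offset: int

    def name(self) -> str:
        return self.obj.class_type.attribute_names[self.offset]

    def type_name(self) -> str:
        return self.obj.class_type.attribute_types[self.offset]

    def value(self) -> Any:
        return self.obj.slots[self.offset]

    def set_value(self, new_value: Any) -> None:
        # Writes go through to the owning object, so every Slot that
        # refers to the same (object, offset) pair sees the update.
        self.obj.slots[self.offset] = new_value


linear_type = ClassType("Linear", ["weight", "bias"], ["Tensor", "Tensor"])
module_object = Object(linear_type, [[1.0, 2.0], [0.0]])
bias_slot = Slot(module_object, 1)
bias_slot.set_value([0.5])
print(bias_slot.name(), bias_slot.type_name(), bias_slot.value())
```

Because the handle carries no data of its own, reading the name, type, or value always goes through the owning object, which is what lets a single class replace the two earlier ones in this model.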

This PR does not actually use the module object in any generated code.
A future PR will switch how code is generated to treat modules as
first class.

Differential Revision: D14613508

fbshipit-source-id: d853a7559f58d244de2ef54a781427fcd1060ed0

5 years agoAdd numpy like repeat as torch.repeat_interleave (#18395)
Gao, Xiang [Sat, 6 Apr 2019 01:13:39 +0000 (18:13 -0700)]
Add numpy like repeat as torch.repeat_interleave (#18395)

Summary:
Fixes: https://github.com/pytorch/pytorch/issues/14093
cc: SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18395

Differential Revision: D14599509

Pulled By: umanwizard

fbshipit-source-id: 2391a1cc135fe5bab38475f1c8ed87c4a96222f3

5 years agoFix interpolate trace (#18875)
Elias Ellison [Sat, 6 Apr 2019 00:52:12 +0000 (17:52 -0700)]
Fix interpolate trace (#18875)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/10654

The issue is that in tracing `.size` returns an int tensor, and when an int tensor is multiplied by a scalar the int dominates and the scalar gets cast to 0.
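
The effect can be modelled in plain Python without calling PyTorch. The sketch below is only an illustration of the casting order described above; both function names are hypothetical, and the real tracer code is not shown here.

```python
def scaled_size_int_dominant(size: int, scale: float) -> int:
    """Model of the bug: the scalar is first cast to the integer operand's type."""
    return size * int(scale)


def scaled_size_float(size: int, scale: float) -> int:
    """Model of the intended result: multiply in floating point, then truncate."""
    return int(size * scale)


# With a fractional scale factor, the int-dominant path collapses to zero.
print(scaled_size_int_dominant(10, 0.5), scaled_size_float(10, 0.5))
```
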
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18875

Differential Revision: D14814441

Pulled By: eellison

fbshipit-source-id: a4e96a2698f2fcbf3ec4b2bb4c43a30250f30ad9

5 years agoCode string API for fuser testing (#18884)
James Reed [Sat, 6 Apr 2019 00:10:13 +0000 (17:10 -0700)]
Code string API for fuser testing (#18884)

Summary:
This adds a C++ function `debugGetFusedKernelCode` as well as a Python binding `_jit_fuser_get_fused_kernel_code` that will, given a FusionGroup graph and a set of specified inputs, return the compiled kernel source code. We can then check the contents of this source code for verification of the fuser codegen backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18884

Differential Revision: D14795508

Pulled By: jamesr66a

fbshipit-source-id: 8f6e9dd13ebbb517737d893b0b5f5e9aa06af124

5 years agoremove unused func (#18712)
Michael Suo [Fri, 5 Apr 2019 22:13:35 +0000 (15:13 -0700)]
remove unused func (#18712)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18712
ghimport-source-id: e435150a501b20695a5276addee93d795e04b532

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18712 [jit][easy] remove unused func**
* #18711 [jit] fix side-effects and aliasing for custom ops

as title

Differential Revision: D14730979

fbshipit-source-id: 381d16ea2a45779bf6d5fc6d90a4f8585461e902

5 years agoRevert D14778810: [caffe2/int8] fix bug when falling back to acc32 when weight is...
Junjie Bai [Fri, 5 Apr 2019 20:56:34 +0000 (13:56 -0700)]
Revert D14778810: [caffe2/int8] fix bug when falling back to acc32 when weight is prepacked

Differential Revision:
D14778810

Original commit changeset: d49a8c4b7c81

fbshipit-source-id: 15568b084848de74437582548bec42aadc74080d

5 years agoslots with explicit value/setValue make more sense in future patches (#18468)
Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
slots with explicit value/setValue make more sense in future patches (#18468)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18468
ghimport-source-id: d4b41c521f2269a695e03c8e7d05d5542731ee48

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* **#18468 slots with explicit value/setValue make more sense in future patches**
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Reviewed By: suo

Differential Revision: D14613509

fbshipit-source-id: 9f2208d0efd01465c78cebdc3e8365a9e0adf9ff

5 years agoMake Object hold its ClassType (#18467)
Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
Make Object hold its ClassType (#18467)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18467
ghimport-source-id: d51bdd64d2529d08c634c58df1a0870b54ad49fb

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* #18468 slots with explicit value/setValue make more sense in future patches
* **#18467 Make Object hold its ClassType**
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.

Currently it holds a symbol whose unqualified name is the name of the
class. This will get confusing when there are multiple possible registries,
and it makes getting the class type from the object difficult.
The pointer to the class is only 4 more bytes so this patch just puts
it in the object.

Reviewed By: suo

Differential Revision: D14613510

fbshipit-source-id: b35175ba4be83d2522deaa6dad5070d6ec691fed

5 years agoEnforce single parent for script submodules (#18379) (#18860)
Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
Enforce single parent for script submodules (#18379) (#18860)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18860
ghimport-source-id: 96305349bf3db564f43df2263b1e5bddcc9e9dae

Reviewed By: suo

Differential Revision: D14780421

Pulled By: zdevito

fbshipit-source-id: 2bdd89b35866ba035ebea0adab037e441c1006e2

5 years agoCUDA_NVCC_EXECUTABLE is not needed, as nvcc is in PATH (#18958)
Stas Bekman [Fri, 5 Apr 2019 19:46:44 +0000 (12:46 -0700)]
CUDA_NVCC_EXECUTABLE is not needed, as nvcc is in PATH (#18958)

Summary:
As indicated by f0k: https://github.com/pytorch/pytorch/pull/18495#issuecomment-480178763
nvcc via ccache is already first in the PATH in the instructions I provided, so CUDA_NVCC_EXECUTABLE is not needed.

I re-built to test that it's so.

Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18958

Differential Revision: D14810732

Pulled By: ezyang

fbshipit-source-id: 3758ae2253c745c5d7cfccedd49fa00cc4629965

5 years agoFix precision issue with expansion that prefers 'probs' over 'logits' (#18614)
Ahmad Salim Al-Sibahi [Fri, 5 Apr 2019 19:45:37 +0000 (12:45 -0700)]
Fix precision issue with expansion that prefers 'probs' over 'logits' (#18614)

Summary:
I have experienced that sometimes both were in `__dict__`, but it chose to copy `probs`, which loses precision compared with `logits`. This is especially important when training (Bayesian) neural networks or doing other types of optimization, since the loss is heavily affected.
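
A small pure-Python sketch shows why `probs` can carry less information than `logits`. It does not use `torch.distributions`; the helper names are hypothetical, and the log-probability formula is the standard identity log(1 - sigmoid(x)) = -(x + log1p(exp(-x))).

```python
import math


def sigmoid(logit: float) -> float:
    """Convert a logit to a probability in double precision."""
    return 1.0 / (1.0 + math.exp(-logit))


def log_one_minus_p_from_logit(logit: float) -> float:
    """log(1 - sigmoid(logit)), computed from the logit directly."""
    return -(logit + math.log1p(math.exp(-logit)))


# Two clearly different logits round to the same probability, so the
# difference between them cannot be recovered from probs alone.
prob_a = sigmoid(40.0)
prob_b = sigmoid(50.0)
print(prob_a, prob_b)

# Computed from the logit, the log-probability of the other class stays finite.
print(log_one_minus_p_from_logit(40.0))
```

Here `1 - prob_a` is exactly zero, so a loss computed from `probs` would hit `log(0)`, while the same quantity computed from the logit is about -40.
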
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18614

Differential Revision: D14793486

Pulled By: ezyang

fbshipit-source-id: d4ff5e34fbb4021ea9de9f58af09a7de00d80a63

5 years agoMethod is supposed to be in-place (#18684)
Joakim Rishaug [Fri, 5 Apr 2019 19:44:49 +0000 (12:44 -0700)]
Method is supposed to be in-place (#18684)

Summary:
Tracing models that attempt to return this in-place value doesn't turn out well.

I haven't run any tests to confirm the results to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before.

Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
    input_1: Tensor) -> Tensor:
  _0 = torch.norm(self.item_embedding.weight, 2, 1, True)
  _1 = torch.div(self.item_embedding.weight, _0)
  m_weight = torch.t(_1)
  input_2 = torch.contiguous(input_1)
  weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
             ~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  x = torch.embedding(weight_1, input_2, -1, False, False)
  input_3 = torch.div(x, torch.norm(x, 2, 2, True))
  max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
  hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
  _2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
  input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
  input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
  tastevec = torch.div(input, torch.norm(input, 2, 2, True))
  outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18684

Differential Revision: D14782041

Pulled By: ezyang

fbshipit-source-id: 7b2fc19b7d5b6600263644498bb728319a19f39d

5 years agofix bug when falling back to acc32 when weight is prepacked (#18881)
Summer Deng [Fri, 5 Apr 2019 19:44:09 +0000 (12:44 -0700)]
fix bug when falling back to acc32 when weight is prepacked (#18881)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881

Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878

When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.

TODO: add unit tests with better coverage

Reviewed By: feiyu1990

Differential Revision: D14778810

fbshipit-source-id: d49a8c4b7c815ab29b77feb53ee730ad63780488

5 years agoMore numerically stable lerp (#18871)
Marek Kolodziej [Fri, 5 Apr 2019 19:43:02 +0000 (12:43 -0700)]
More numerically stable lerp (#18871)

Summary:
The C++ and CUDA implementations of the lerp are not numerically stable. This is discussed on Wikipedia [here](https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support). I checked the GPU SASS output and there's no overhead from using the more precise implementation, from Kepler all the way to Turing. I haven't looked at CPU ASM though.
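
For reference, the two formulations contrasted in the linked Wikipedia section can be sketched in Python as follows. This is an illustration of the general issue under double-precision arithmetic, not the exact expression used in the C++ or CUDA kernels; the function names are chosen for this example.

```python
def lerp_imprecise(v0: float, v1: float, t: float) -> float:
    """v0 + t * (v1 - v0): does not guarantee the result is v1 when t == 1."""
    return v0 + t * (v1 - v0)


def lerp_precise(v0: float, v1: float, t: float) -> float:
    """(1 - t) * v0 + t * v1: returns exactly v1 when t == 1."""
    return (1 - t) * v0 + t * v1


# With endpoints of very different magnitude, (v1 - v0) is rounded and the
# imprecise form misses the endpoint at t == 1.
print(lerp_imprecise(1e16, 1.0, 1.0), lerp_precise(1e16, 1.0, 1.0))
```
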
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18871

Differential Revision: D14793438

Pulled By: ezyang

fbshipit-source-id: 2ddc2e026c5285466cae7d1b4101174253100445

5 years agoIncrease default c10d/ProcessGroupGloo test timeout (#18916)
Pieter Noordhuis [Fri, 5 Apr 2019 19:13:31 +0000 (12:13 -0700)]
Increase default c10d/ProcessGroupGloo test timeout (#18916)

Summary:
See #18659.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18916

Differential Revision: D14808749

Pulled By: pietern

fbshipit-source-id: 9a9c8beddb2dbbb1bf4c5e575743d9e1fa3f07fa

5 years agoremove symbolic variable part 1 (#17986)
Ailing Zhang [Fri, 5 Apr 2019 18:57:17 +0000 (11:57 -0700)]
remove symbolic variable part 1 (#17986)

Summary:
As discussed with gchanan we should deduplicate symbolic_variable and symbolic_script to prepare for the future merge with derivatives.yaml.

This PR moves most easy formulas to symbolic_script.

TODO: run benchmarks to make sure no perf regression

cc: apaszke zdevito wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17986

Differential Revision: D14766412

Pulled By: ailzhang

fbshipit-source-id: d95a3f876e256c0f505779a71587c985571d3b8f

5 years agoRevert D14742020: Wrap workaround for cpp custom types a bit prettier and add an...
Edward Yang [Fri, 5 Apr 2019 18:55:38 +0000 (11:55 -0700)]
Revert D14742020: Wrap workaround for cpp custom types a bit prettier and add an example

Differential Revision:
D14742020

Original commit changeset: 0f2fd83ae56a

fbshipit-source-id: 5640255aef0319b7d8996e07132e87213130d31c

5 years agoDecompose more Windows scripts (#18917)
Karl Ostmo [Fri, 5 Apr 2019 18:26:31 +0000 (11:26 -0700)]
Decompose more Windows scripts (#18917)

Summary:
This PR:

* pulls four distinct installation steps out of `build_pytorch.bat` and into their own scripts.
* eliminates the copy step for helper scripts called by `win-build.sh` and `win-test.sh`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18917

Differential Revision: D14807236

Pulled By: kostmo

fbshipit-source-id: 03e91a5834dfd6d68903ad9725eacc099bbf6d53

5 years agoWrap workaround for cpp custom types a bit prettier and add an example (#18791)
Dmytro Dzhulgakov [Fri, 5 Apr 2019 18:14:11 +0000 (11:14 -0700)]
Wrap workaround for cpp custom types a bit prettier and add an example (#18791)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18791

As a temporary demonstration on how to extend this hack further until custom C types are ready.

Reviewed By: jamesr66a

Differential Revision: D14742020

fbshipit-source-id: 0f2fd83ae56ab2abe16977a1829ed421e6abe74b

5 years agoRemove cuda::compat functions in aten (#18905)
bddppq [Fri, 5 Apr 2019 18:09:15 +0000 (11:09 -0700)]
Remove cuda::compat functions in aten (#18905)

Summary:
Looks like the issue of using `std::` functions is fixed in the new ROCm version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18905

Differential Revision: D14792943

Pulled By: bddppq

fbshipit-source-id: af11acbb85872943f23b6e55415db1f0699e7b8f

5 years agofix side-effects and aliasing for custom ops (#18711)
Michael Suo [Fri, 5 Apr 2019 17:40:19 +0000 (10:40 -0700)]
fix side-effects and aliasing for custom ops (#18711)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18711
ghimport-source-id: c9caedc0660b2b7ba3730cd0e1a2e0e9c3cf422b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18711 [jit] fix side-effects and aliasing for custom ops**

Previously we didn't track aliasing, mutation, or side effects for
custom ops. This PR adds in guards with the most conservative
assumptions possible: the op will
1) have side effects,
2) write to everything,
3) produce a wildcard.

In order to tell whether a given operator is a custom op, this PR introduces
the concept of a "reserved" namespace (basically all our builtin namespaces).
Custom ops live in non-reserved namespaces, so a check on the namespace
is sufficient to tell whether a schema/node is "custom" or not.
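
The namespace check can be sketched as follows. This is a simplified Python model, not the JIT's implementation: the contents of `RESERVED_NAMESPACES`, the function names, and the dictionary keys are all assumptions made for illustration.

```python
# Hypothetical set of builtin namespaces; the real list lives in the JIT sources.
RESERVED_NAMESPACES = frozenset({"aten", "prim", "onnx"})


def namespace_of(qualified_name: str) -> str:
    """Return the part before '::' in a name such as 'aten::add'."""
    return qualified_name.split("::", 1)[0]


def is_custom_op(qualified_name: str) -> bool:
    """An op is treated as custom when its namespace is not reserved."""
    return namespace_of(qualified_name) not in RESERVED_NAMESPACES


def conservative_effects(qualified_name: str) -> dict:
    """Apply the most conservative assumptions to custom ops only."""
    custom = is_custom_op(qualified_name)
    return {
        "has_side_effects": custom,
        "writes_to_everything": custom,
        "produces_wildcard": custom,
    }


print(is_custom_op("aten::add"), is_custom_op("my_ops::fancy_relu"))
```

Keying the decision on the namespace alone means no per-operator registry lookup is needed to tell a builtin from a custom op in this model.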

This is just to get things correct for now. Follow-ups to this:
- Users should be able to specify aliasing/mutability without having to learn
the whole alias annotation schema.
- Relax assumptions a bit. In particular outputs can only alias input tensors,
they don't have to be wildcards.

Fixes #18490

Differential Revision: D14730978

fbshipit-source-id: 540b47a24ccf24145051609bdcc99c97e46e0fe0

5 years agoExpand the list of ops that mutate an inputs shape (#18812)
Elias Ellison [Fri, 5 Apr 2019 17:37:58 +0000 (10:37 -0700)]
Expand the list of ops that mutate an inputs shape (#18812)

Summary:
Expand the list of ops that resize an input in-place to include broadcasting ops and other ops that affect shape. Whoever is reviewing the PR, could you please look through PyTorch's in-place ops and see if I missed any?

Expanding the PR from: https://github.com/pytorch/pytorch/pull/17518

This is already being tested in test_resize_input_ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18812

Differential Revision: D14793410

Pulled By: eellison

fbshipit-source-id: 125f4f5375ac1036fb96fabc9da2aaccc9adc778

5 years agoadd launch bounds, enable more tests (#18909)
J M Dieterich [Fri, 5 Apr 2019 17:11:43 +0000 (10:11 -0700)]
add launch bounds, enable more tests (#18909)

Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.

Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909

Differential Revision: D14801490

Pulled By: ezyang

fbshipit-source-id: b81c97fc783a2627bc7e31b32036a364cfe40cc7

5 years agoAdd backward pass to infer single missing input shape for Concat opportunitiscally...
Yinghai Lu [Fri, 5 Apr 2019 17:09:14 +0000 (10:09 -0700)]
Add backward pass to infer single missing input shape for Concat opportunitiscally (#18911)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18911

Att.

Reviewed By: bddppq

Differential Revision: D14791295

fbshipit-source-id: 4b7a775924f0eadb0cb73aa6c434a6a5be8b92be

5 years agochange to use clang if NDK >= 18 (#18914)
Jiakai Liu [Fri, 5 Apr 2019 16:54:27 +0000 (09:54 -0700)]
change to use clang if NDK >= 18 (#18914)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18914
ghimport-source-id: 4d9d9322ee5559d96e13533ec37ff3be86a0227c

Reviewed By: ezyang

Differential Revision: D14794162

Pulled By: ljk53

fbshipit-source-id: caac55e12b1e62bf6ebcc6e2062d5ed122ad4e64

5 years agoRevert D14673459: [pytorch][PR] [jit] Replace Slot on script::Method with NamedIValue
Zachary DeVito [Fri, 5 Apr 2019 16:46:10 +0000 (09:46 -0700)]
Revert D14673459: [pytorch][PR] [jit] Replace Slot on script::Method with NamedIValue

Differential Revision:
D14673459

Original commit changeset: 21200180c47f

fbshipit-source-id: 9c01de4cf5bb7c87ac0c55705b901db990cd917b

5 years agoDisable flaky test_proper_exit test. (#18950)
Edward Yang [Fri, 5 Apr 2019 16:37:11 +0000 (09:37 -0700)]
Disable flaky test_proper_exit test. (#18950)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18950
ghimport-source-id: 27bd575fd3c73a51ace1360aa020fa63a792a5d2

Differential Revision: D14802009

Pulled By: ezyang

fbshipit-source-id: 051e1d038892c2c6e8337357fa80771b8dc42680

5 years agoCheckout pytorch_sphinx_theme with https. (#18859)
Edward Yang [Fri, 5 Apr 2019 16:33:08 +0000 (09:33 -0700)]
Checkout pytorch_sphinx_theme with https. (#18859)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18859
ghimport-source-id: fbbcb8a2dd9c9f0a317de489b6bbb83e9071a7d8

Differential Revision: D14801989

Pulled By: ezyang

fbshipit-source-id: a9bc02e1383adafcac01994e6346b28551d95c71

5 years agoAdd tests for reducer class (#18845)
Pieter Noordhuis [Fri, 5 Apr 2019 16:04:43 +0000 (09:04 -0700)]
Add tests for reducer class (#18845)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18845

This adds a few CPU only test cases for the reducer class.

Reviewed By: mrshenli

Differential Revision: D14768432

fbshipit-source-id: c008a52206826304e634a95bc14167ed94c97662

5 years agoFix a few instances of notifying on a CV while holding the lock (#18857)
Owen Anderson [Fri, 5 Apr 2019 15:34:41 +0000 (08:34 -0700)]
Fix a few instances of notifying on a CV while holding the lock (#18857)

Summary:
Fix a few instances of notifying on a condition variable (CV) while holding the lock, so that the lock is released before notifying. This avoids an extra thread suspension when the notified thread tries to grab the lock.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18857

Differential Revision: D14779132

Pulled By: resistor

fbshipit-source-id: b18a05c4c15be1426ebfdffac1c8f002b771cfd7

5 years agoUnify caffe2 and libtorch build scripts on Windows (#18683)
peter [Fri, 5 Apr 2019 14:44:43 +0000 (07:44 -0700)]
Unify caffe2 and libtorch build scripts on Windows (#18683)

Summary:
`scripts/build_windows.bat` is the original way to build caffe2 on Windows, but since caffe2 has been merged into libtorch, the build scripts should be unified: they do the same thing apart from a few different flags.

The follow-up is to add the tests. Looks like the CI job for caffe2 windows is defined [here](https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/caffe2.groovy#L906). Could we move them into a separate file, just like what we've done in `.jenkins/pytorch/win-build.sh`? There's a bunch of things we can do there, like using ninja and sccache to accelerate the build.

cc orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18683

Differential Revision: D14730188

Pulled By: ezyang

fbshipit-source-id: ea287d7f213d66c49faac307250c31f9abeb0ebe

5 years agoSimplify storage wrapping in TH. (#18855)
Gregory Chanan [Fri, 5 Apr 2019 14:18:39 +0000 (07:18 -0700)]
Simplify storage wrapping in TH. (#18855)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18855
ghimport-source-id: 01faa229fa4db901ab8539d3778b716d909ba4cf

Reviewed By: dzhulgakov

Differential Revision: D14790669

Pulled By: gchanan

fbshipit-source-id: 167b9bc9c9872743fa8f6040a26ddf7ff5789c27

5 years agoCache device on TensorImpl; clean up TensorImpl constructors. (#18833)
Gregory Chanan [Fri, 5 Apr 2019 14:18:38 +0000 (07:18 -0700)]
Cache device on TensorImpl; clean up TensorImpl constructors. (#18833)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18833
ghimport-source-id: 6f2be25fcc5e6be3ffe20582e604bd2c1fbab66b

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.**
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.

1) We cache device on TensorImpl.  This means we can access the device without a virtual function and allows us to more easily extend TensorImpls (because they don't need to figure out how to store the Device for themselves).

2) Clean up TensorImpl APIs.  We had a constructor that took a TensorTypeId and an allocator and would allocate a Storage based on the recognized types of TensorTypeIds.  Instead, we just have two different constructors: one for types with a storage, one without.

Reviewed By: dzhulgakov

Differential Revision: D14766230

fbshipit-source-id: 745b8db84dcd6cb58f1a8675ad3ff8d033bc50df