platform/upstream/pytorch.git
5 years agoUpgrade mkl-dnn to v0.17.3 to fix core dump issue (#17107)
Gu, Jinghui [Fri, 15 Feb 2019 09:19:33 +0000 (01:19 -0800)]
Upgrade mkl-dnn to v0.17.3 to fix core dump issue (#17107)

Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17107

Differential Revision: D14097600

Pulled By: yinghai

fbshipit-source-id: 2baa44e211ce37fbdf01585344c98745f5ba008c

5 years agoUpdated bbox_transform and nms unit test for caffe2 ops. (#16722)
Peizhao Zhang [Fri, 15 Feb 2019 08:14:45 +0000 (00:14 -0800)]
Updated bbox_transform and nms unit test for caffe2 ops. (#16722)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16722

Updated bbox_transform and nms unit test for caffe2 ops.

Differential Revision: D13937416

fbshipit-source-id: 034743d29671c6e73d323a935e2d734ecc071bff

5 years agoExtend support for exporting reshape to onnx. (#16971)
BowenBao [Fri, 15 Feb 2019 08:14:25 +0000 (00:14 -0800)]
Extend support for exporting reshape to onnx. (#16971)

Summary:
Resolve issue with reshape_as test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16971

Differential Revision: D14098871

Pulled By: houseroad

fbshipit-source-id: ed6b966821462d374313256abbbe27f96ce11b2c

5 years agoadd std to autodiff, and mean/var/std to operator set (#17137)
Wanchao Liang [Fri, 15 Feb 2019 07:15:53 +0000 (23:15 -0800)]
add std to autodiff, and mean/var/std to operator set (#17137)

Summary:
supersedes #16684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17137

Differential Revision: D14096724

Pulled By: wanchaol

fbshipit-source-id: d801d70029a6a1f5851400ff4094c0299c102b2b

5 years agoScript module data parallel (#16891)
Guoqiang Jerry Chen [Fri, 15 Feb 2019 06:43:46 +0000 (22:43 -0800)]
Script module data parallel (#16891)

Summary:
support data parallel for ScriptModule.

See the unit tests for the testing done for this PR. I also tried a traced version of resnet18 from torchvision.

I have yet to try complete end-to-end data parallel training. That will be the next step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16891

Differential Revision: D14002222

Pulled By: gqchen

fbshipit-source-id: fce3598169113215599815c6978e66d3c3a8c282

5 years agoadd pre-packing operation in README.md (#17151)
Jongsoo Park [Fri, 15 Feb 2019 06:41:18 +0000 (22:41 -0800)]
add pre-packing operation in README.md (#17151)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17151

As title

Reviewed By: jianyuh

Differential Revision: D14084272

fbshipit-source-id: e58c041e0374f6e82b337e5b6325ef06981ad8b4

5 years agoMinor fix of the histogram observer in FBL eval flows (#17118)
Summer Deng [Fri, 15 Feb 2019 05:58:22 +0000 (21:58 -0800)]
Minor fix of the histogram observer in FBL eval flows (#17118)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17118

Fix the bug in quantization eval workflow; Add mul_nets option in histogram observer pybind

Reviewed By: yinghai

Differential Revision: D14085321

fbshipit-source-id: 08e3153148522ebc9512a57144d9a8ad154bb6f8

5 years agomore test coverage on emitIf none dispatch (#16794)
Wanchao Liang [Fri, 15 Feb 2019 05:37:08 +0000 (21:37 -0800)]
more test coverage on emitIf none dispatch (#16794)

Summary:
Follow up of #14533, add more test coverage for emitif metaprogramming conditions. Also delete some unwrap optional usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16794

Differential Revision: D14096868

Pulled By: wanchaol

fbshipit-source-id: ee1cec609c58d0dd65211249a90207be06649e71

5 years agoSpeed-up adaptive average pooling for the common case of size=1 output (#17011)
ngimel [Fri, 15 Feb 2019 05:11:30 +0000 (21:11 -0800)]
Speed-up adaptive average pooling for the common case of size=1 output (#17011)

Summary:
When adaptive pooling has to produce a single pixel feature map, it is faster to do so by calling .mean(). Backward calls a pretty inefficient cuda kernel with atomics, which becomes ridiculously slow for halfs. For half this PR provides approx 30x speed-up for adaptive average pooling, which results in 30% end-to-end speed-up on senet. Improvements are smaller for float, but still significant (approx 5x).
Also this PR unifies handling of 3d (no batch dimension) and 4d tensors, using negative dimension indices.
cc ezyang for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17011
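
The equivalence this change relies on can be sketched in plain Python: when the output size is 1, the single adaptive bin covers the whole input, so the pooled value is just the mean. This is a minimal 1-D illustration, not the ATen kernel; `adaptive_avg_pool_1d` is a hypothetical helper, and the floor/ceil bin edges are an assumption about how adaptive bins are laid out.

```python
def adaptive_avg_pool_1d(values, output_size):
    """Average `values` into `output_size` adaptive bins (illustrative only)."""
    n = len(values)
    out = []
    for i in range(output_size):
        start = (i * n) // output_size                # floor(i * n / output_size)
        end = -((-(i + 1) * n) // output_size)        # ceil((i + 1) * n / output_size)
        window = values[start:end]
        out.append(sum(window) / len(window))
    return out


row = [1.0, 2.0, 3.0, 4.0, 6.0]
pooled = adaptive_avg_pool_1d(row, 1)   # one bin spanning the whole row
mean = sum(row) / len(row)              # same value computed as a plain mean
```

With `output_size=1` the loop runs once with `start=0` and `end=len(values)`, which is why a single `.mean()` call can replace the pooling kernel in that case.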

Reviewed By: ailzhang

Differential Revision: D14078747

Pulled By: soumith

fbshipit-source-id: 0eb9255da2351190a6bcaf68c30e2ae2402a2dd9

5 years agoImprove example for torch.mode (#17069)
Thomas Viehmann [Fri, 15 Feb 2019 02:41:35 +0000 (18:41 -0800)]
Improve example for torch.mode (#17069)

Summary:
This updates the example for `torch.mode` to show a case where there is a mode.
Also add a bit of a description to the explanation as well as being a bit more precise about "a" mode rather than "the" mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17069
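
The "a" mode versus "the" mode distinction can be illustrated without torch. The sketch below uses a hypothetical helper, `modes`, that returns every most-frequent value; it says nothing about which of several modes `torch.mode` itself returns.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest count, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


no_repeats = modes([6, 5, 1, 0, 2])        # every value ties with count 1
single_mode = modes([6, 5, 1, 0, 2, 1])    # 1 is the only value seen twice
two_modes = modes([1, 2, 2, 3, 3])         # 2 and 3 tie
```

When several values tie, any of them is a valid mode, which is why the documentation wording matters.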

Differential Revision: D14078722

Pulled By: soumith

fbshipit-source-id: 837a238d53a9b8e868511acbdc258633975bea48

5 years agoCreate BackendTransformerBase to host common functions used for backend lowering...
Yinghai Lu [Fri, 15 Feb 2019 01:45:36 +0000 (17:45 -0800)]
Create BackendTransformerBase to host common functions used for backend lowering (#17074)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17074

Backend lowering has some common functionality. This diff creates a base class that hosts it.

Reviewed By: ipiszy

Differential Revision: D14073192

fbshipit-source-id: 9617603d0e73db6f7fcc5572756b9dbab506dae5

5 years agoFix android crash when model detects nothing
Zhizhen Qin [Fri, 15 Feb 2019 01:22:34 +0000 (17:22 -0800)]
Fix android crash when model detects nothing

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17119

Reviewed By: sf-wind

Differential Revision: D14087835

Pulled By: ZhizhenQin

fbshipit-source-id: 32e61d46679bae645fd0bbec724513cfa5c553ab

5 years agoFix some documentation links in torch.tensor (#17109)
kngwyu [Fri, 15 Feb 2019 01:07:12 +0000 (17:07 -0800)]
Fix some documentation links in torch.tensor (#17109)

Summary:
The link is currently broken: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.norm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17109

Differential Revision: D14093567

Pulled By: ezyang

fbshipit-source-id: b167cde2150ee97ccf5689fcf50ff8157acfce10

5 years agoApply modernize-use-override (2nd iteration)
Michael Liu [Fri, 15 Feb 2019 00:21:50 +0000 (16:21 -0800)]
Apply modernize-use-override (2nd iteration)

Summary:
Use C++11’s override and remove virtual where applicable.
Changes are automatically generated.

Reviewed By: Orvid

Differential Revision: D14086124

fbshipit-source-id: 2005227d095d776ca3b4309a57f54e25782b9b58

5 years agoGeneralize catArray for contiguous inputs and dim != 0 (#17032)
James Reed [Thu, 14 Feb 2019 23:58:06 +0000 (15:58 -0800)]
Generalize catArray for contiguous inputs and dim != 0 (#17032)

Summary:
I noticed that we were sinking a lot of time into `cat` operations in machine translation on CPU, and drilled down to us doing the cat element-by-element, even though all the inputs were contiguous. The reason was we were doing the cat along a dimension that was not 0, and that caused us to not use the fast `memcpy` branch. This PR generalizes that branch.

Quick benchmark script:
```
import torch, time

tensors = [torch.rand(6, 2, 1024) for i in range(5)]

NITER = 1000
s = time.time()
for i in range(NITER):
    torch.cat(tensors, dim=1)
print('time per iter ', (time.time() - s) / NITER)
```

Before:
```
time per iter  8.089399337768554e-05
```

After:
```
time per iter  2.183413505554199e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17032
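
Why contiguous inputs allow block copies even when `dim != 0` can be sketched in plain Python. This is an illustration of the idea, not the TH code: `cat_contiguous` is a hypothetical helper that represents each tensor as a flat row-major list plus a shape, and each slice copy stands in for one `memcpy`.

```python
from math import prod


def cat_contiguous(tensors, dim):
    """Concatenate (flat_data, shape) pairs along `dim` using block copies."""
    first_shape = tensors[0][1]
    outer = prod(first_shape[:dim])          # number of leading index combinations
    out = []
    for o in range(outer):
        for data, shape in tensors:
            chunk = prod(shape[dim:])        # contiguous run: shape[dim] * inner size
            out.extend(data[o * chunk:(o + 1) * chunk])   # one block copy
    out_shape = list(first_shape)
    out_shape[dim] = sum(shape[dim] for _, shape in tensors)
    return out, tuple(out_shape)


a = ([1, 2, 3, 4], (2, 2))   # [[1, 2], [3, 4]]
b = ([5, 6], (2, 1))         # [[5], [6]]
flat, shape = cat_contiguous([a, b], dim=1)
```

For `dim=0` the outer count is 1 and each input is copied in a single block; for other dimensions the number of copies grows with the outer size, but each copy is still a contiguous run rather than an element-by-element loop.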

Differential Revision: D14090038

Pulled By: jamesr66a

fbshipit-source-id: 2c733a84915896008ac95f2233f44894bd2573de

5 years agofix test_jit canonicalize_tensor_iterator
Wanchao Liang [Thu, 14 Feb 2019 23:37:42 +0000 (15:37 -0800)]
fix test_jit canonicalize_tensor_iterator

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17104

Differential Revision: D14089928

Pulled By: wanchaol

fbshipit-source-id: 8b288514ab9ee8d24a11d39b75eef95783f28f20

5 years agoUse new constructor in USE_SIMPLE_CTOR_DTOR (#17080)
Sebastian Messmer [Thu, 14 Feb 2019 23:06:53 +0000 (15:06 -0800)]
Use new constructor in USE_SIMPLE_CTOR_DTOR (#17080)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17080

This changes all operators using this macro to the new format

Reviewed By: dzhulgakov

Differential Revision: D14078628

fbshipit-source-id: 67048e485e326765fd49567cc008633d3d500d5c

5 years agoCaffe2 TARGETS for HIP (#17076)
Xiaodong Wang [Thu, 14 Feb 2019 23:02:56 +0000 (15:02 -0800)]
Caffe2 TARGETS for HIP (#17076)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17076

OSS: slightly change tools/amd_build/build_amd.py to add the output_directory for internal use. Also modify the renaming convention in the hipify script to reflect the updated rules.

Reviewed By: bddppq

Differential Revision: D13767218

fbshipit-source-id: cbcadc51daab42197d545f204840dcc18176bb3d

5 years agomaskrcnn & bert AD coverage part 1 (#16689)
Ailing Zhang [Thu, 14 Feb 2019 22:55:44 +0000 (14:55 -0800)]
maskrcnn & bert AD coverage part 1 (#16689)

Summary:
- Moved a few functions from `autograd` namespace to `aten` namespace to be visible from JIT nativeResolver.
- Added a hack to look up keyword-only arguments. Proper support for keyword-only arguments will be added later.
- Simulate function overloads in aten using `_<number>` as a function name suffix.
- Even when `forward` returns multiple outputs, as in `kthvalue`, we currently support at most one output that requires grad.
- Removed the `TensorList` related ops here since partial `TensorList` support is prone to bugs. Our symbolic diff for `cat` was never tested with autodiff, and it seems broken. We need to find another proper way to support these ops (either by properly supporting `TensorList` or with something like `prim::ConstantChunk`) and leave them for the next PR.

Ops supported in this PR:
```
erf
expand_as
index
kthvalue
mean
permute
pow
rsub
select
sqrt
squeeze
t
to
topk
transpose
view
var
embedding
logsumexp
// grad is None
_dim_arange
contiguous
nonzero
ones_like
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16689

Differential Revision: D14020806

Pulled By: ailzhang

fbshipit-source-id: a5e2c144a7be5a0d39d7ac5f93cb402ec12503a5

5 years agoSecond PR to restore reverted commit (#16224) (#17040)
jiej [Thu, 14 Feb 2019 22:40:13 +0000 (14:40 -0800)]
Second PR to restore reverted commit (#16224) (#17040)

Summary:
update:
  1. global_reduce now checks should_block_y_reduce first.
     This avoids enabling global_reduce without block_y_reduce, which led to
     accessing shared memory during global reduce without allocating it.
  2. Update the block_y_reduce heuristics. This improves perf on tiny tensors.
  3. Add a test case covering the old cases where illegal memory access might occur.

  TensorIterator cuda launch configs update (#16224)
    Update launch configs for TensorIterator gpu_reduce_kernel. Enable flexible
    block dimension to improve efficiency for reduction cases with small fast
    dimension.

    Previously TensorIterator launches blocks with fixed 32x16 threads.
    For cases like:

      import torch
      torch.randn(2**20, 4, device='cuda').sum(0)

    The fixed launch config does not handle coalesced memory access efficiently.

    The updated launch config enables flexible block dimensions. Combined with
    the improved reduction scheme (flexible vertical / horizontal reduction
    instead of the limited warp / block reduction in the old code), it ensures an optimal
    memory access pattern even when reducing over a dimension with small stride.

    Possible future improvements:
    1. Precise dynamic shared memory allocation.
    2. Using warp shuffle for vertical (block_y) reduction.
    Pull Request resolved: https://github.com/pytorch/pytorch/pull/16224
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17040

Differential Revision: D14078295

Pulled By: umanwizard

fbshipit-source-id: ecc55054a5a4035e731f0196d633412225c3b06c

5 years agoRemove fake inference for shape info in ONNXIFI transform (#17046)
Yinghai Lu [Thu, 14 Feb 2019 22:22:51 +0000 (14:22 -0800)]
Remove fake inference for shape info in ONNXIFI transform (#17046)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17046

As we are moving to use bound shape inference, we can remove the awkward fake inference run path and make the code cleaner.

Reviewed By: ipiszy

Differential Revision: D14061501

fbshipit-source-id: b3ace98b3dabef3c3359086a0bb1410518cefa26

5 years agoUpdate alexnet expect.
Gregory Chanan [Thu, 14 Feb 2019 21:45:04 +0000 (13:45 -0800)]
Update alexnet expect.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17122

Reviewed By: colesbury

Differential Revision: D14090209

Pulled By: gchanan

fbshipit-source-id: 78c5961dd7d752b237782b6ed90c376bbd6d3145

5 years agoadd clear functionality to list (#17050)
Michael Kösel [Thu, 14 Feb 2019 21:42:27 +0000 (13:42 -0800)]
add clear functionality to list (#17050)

Summary:
Add clear functionality to list. See #16662

```python
import torch

@torch.jit.script
def foo():
    a = [1, 2, 3, 4]
    a.clear()

    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17050

Differential Revision: D14071799

Pulled By: driazati

fbshipit-source-id: 305551c16f7db127c43de0ad5885d9f10678e101

5 years agoModerate the dim type after LengthsRangeFill (#17096)
Yinghai Lu [Thu, 14 Feb 2019 21:33:52 +0000 (13:33 -0800)]
Moderate the dim type after LengthsRangeFill (#17096)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17096

LengthsRangeFill will take a batch size of lengths input and expand it into sequence. Later op should follow this type until it hits another batch type moderating op, e.g. SparseLengthsSum.
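
A rough model of the expansion, written in plain Python rather than as a Caffe2 operator. `lengths_range_fill` is a hypothetical stand-in and assumes the op emits `0..n-1` for each length `n`. The point is that the input has one entry per batch item while the output has one entry per sequence element, which is why the dimension type changes.

```python
def lengths_range_fill(lengths):
    """Expand each length n into the range 0..n-1 and concatenate the ranges."""
    out = []
    for n in lengths:
        out.extend(range(n))
    return out


lengths = [4, 3, 1]                      # one entry per batch item
filled = lengths_range_fill(lengths)     # one entry per sequence element
```

The output length is `sum(lengths)`, not `len(lengths)`, so ops that follow see a sequence-sized dimension until something like SparseLengthsSum reduces it back to batch size.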

Reviewed By: ipiszy

Differential Revision: D14079422

fbshipit-source-id: 1a26925d502c32875ea95c160268bf6a256cc955

5 years agofix behavior of ConcatDataset w/ negative indices (#15756)
jayleverett [Thu, 14 Feb 2019 19:46:55 +0000 (11:46 -0800)]
fix behavior of ConcatDataset w/ negative indices (#15756)

Summary:
Currently, when you pass a negative index to a `Dataset` created with `ConcatDataset`, it simply passes that index to the first dataset in the list. So if, for example, we took `concatenated_dataset[-1]`, this will give us the last entry of the *first* dataset, rather than the last entry of the *last* dataset, as we would expect.

This is a simple fix to support the expected behavior for negative indices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15756
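
The expected behavior can be sketched with a small stand-alone class. `ConcatSketch` is a hypothetical illustration, not the `ConcatDataset` source; the cumulative-size lookup with `bisect` and the choice of `IndexError` for out-of-range indices are assumptions made for this example.

```python
import bisect
from itertools import accumulate


class ConcatSketch:
    """Index into several sequences as if they were one long sequence."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of the dataset lengths, e.g. [2, 5] for lengths 2 and 3.
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        if idx < 0:
            # Normalize against the total length, not the first dataset's length.
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("index out of range")
        dataset_idx = bisect.bisect_right(self.cumulative_sizes, idx)
        start = self.cumulative_sizes[dataset_idx - 1] if dataset_idx else 0
        return self.datasets[dataset_idx][idx - start]


concatenated = ConcatSketch([[10, 11], [20, 21, 22]])
last = concatenated[-1]    # last entry of the last dataset
```

Normalizing the negative index before the lookup is the whole fix: once the index is non-negative, the existing positive-index path picks the right dataset.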

Reviewed By: ezyang

Differential Revision: D14081811

Pulled By: fmassa

fbshipit-source-id: a7783fd3fd9e1a8c00fd076c4978ca39ad5a8a2a

5 years agoAdd support of count_include_pad and test end to end test for AveragePool (#17034)
Dwarak Rajagopal [Thu, 14 Feb 2019 18:28:25 +0000 (10:28 -0800)]
Add support of count_include_pad and test end to end test for AveragePool (#17034)

Summary:
Add support of count_include_pad end to end test for AveragePool

We can export AveragePool from PyTorch with count_include_pad attribute. However, we don't directly support it in Caffe2's ONNX backend.
We also want to check whether we can pass the end to end test for average pool operator with count_include_pad attribute (pytorch => onnx => caffe2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17034
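
What the `count_include_pad` attribute changes can be shown with a 1-D sketch in plain Python. `avg_pool_1d` is a hypothetical helper, not the Caffe2 or ONNX implementation, and it assumes zero padding with `pad < kernel` so every window holds at least one real element.

```python
def avg_pool_1d(values, kernel, stride, pad, count_include_pad):
    """1-D average pooling over zero-padded input (illustrative only)."""
    n = len(values)
    out = []
    start = -pad
    while start + kernel <= n + pad:
        # Padded positions contribute zero to the sum, so only real elements are summed.
        valid = [values[i] for i in range(start, start + kernel) if 0 <= i < n]
        divisor = kernel if count_include_pad else len(valid)
        out.append(sum(valid) / divisor)
        start += stride
    return out


data = [2.0, 4.0, 6.0]
with_pad = avg_pool_1d(data, kernel=2, stride=1, pad=1, count_include_pad=True)
without_pad = avg_pool_1d(data, kernel=2, stride=1, pad=1, count_include_pad=False)
```

The two settings differ only at the borders, where a window overlaps the padding. Interior windows give the same result either way.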

Reviewed By: houseroad

Differential Revision: D14060186

Pulled By: dwarakrajagopal

fbshipit-source-id: 10dae532611c71f8c8cfc3fa701cc7c1c1c02695

5 years agoSupport nonzero onnx export
BowenBao [Thu, 14 Feb 2019 07:43:14 +0000 (23:43 -0800)]
Support nonzero onnx export

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17036

Differential Revision: D14079676

Pulled By: houseroad

fbshipit-source-id: 562b538dd9ab330c26f15fdb34c98dc7a23571a1

5 years agoAdd more headers to setup.py to make pytorch/benchmark work (#16890)
Dmytro Dzhulgakov [Thu, 14 Feb 2019 06:53:56 +0000 (22:53 -0800)]
Add more headers to setup.py to make pytorch/benchmark work (#16890)

Summary:
Since we don't do tmp_install any more it's better to include all necessary headers.

cc kostmo for better suggestions of how to list all headers here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16890

Differential Revision: D14079848

Pulled By: dzhulgakov

fbshipit-source-id: 4522c80d05e5d91f99f6700cde46cac559330d28

5 years agoClean up Storage/StorageImpl constructors (#16948)
Dmytro Dzhulgakov [Thu, 14 Feb 2019 06:38:24 +0000 (22:38 -0800)]
Clean up Storage/StorageImpl constructors (#16948)

Summary:
Small cleanup while doing https://github.com/pytorch/pytorch/pull/16857:

- rename C2 constructors as create_legacy
- remove duplicated constructors
- make resizable flag non-default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16948

Differential Revision: D14062755

Pulled By: dzhulgakov

fbshipit-source-id: 3b7b4ec9cdf67d2628cccc001156e040006b673e

5 years agoSafety check for negative alloc_cpu() attempt (#17071)
Dmytro Dzhulgakov [Thu, 14 Feb 2019 06:18:27 +0000 (22:18 -0800)]
Safety check for negative alloc_cpu() attempt (#17071)

Summary:
Some legacy TH code was relying on alloc to throw when called with a negative number, e.g. `torch.linspace(0, 1, -1)`, and that breaks the ASAN build. I still believe alloc should receive size_t, but I added a safety enforce inside.

It should fix ASAN. I'll follow up with a proper fix for empty_cpu (which is probably the right place to do it) separately
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17071
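
Why a negative size is a problem for an allocator that takes `size_t` can be sketched with `ctypes`. This is an illustration of the integer conversion only; `as_size_t` and `checked_alloc_size` are hypothetical helpers, not the c10 functions, and the `ValueError` is a stand-in for the enforce mentioned above.

```python
import ctypes


def as_size_t(n):
    """Reinterpret a Python int the way a C cast to size_t would."""
    return ctypes.c_size_t(n).value


def checked_alloc_size(nbytes):
    """Reject negative sizes before they are converted to an unsigned type."""
    if nbytes < 0:
        raise ValueError("alloc called with negative size: %d" % nbytes)
    return as_size_t(nbytes)


wrapped = as_size_t(-1)   # becomes the largest value size_t can hold
```

Without the explicit check, the negative value silently turns into an enormous allocation request, and whether that fails cleanly depends on the underlying allocator.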

Differential Revision: D14074157

Pulled By: dzhulgakov

fbshipit-source-id: 3ed3bdb873e446edecb558e1df491310fd7179e3

5 years agoUpdating submodules
svcscm [Thu, 14 Feb 2019 05:38:37 +0000 (21:38 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: b4e7a3850b01bbec56faa3eb0feb3bc6197c0393

5 years agoApply modernize-use-override - 2/2
Michael Liu [Thu, 14 Feb 2019 04:51:55 +0000 (20:51 -0800)]
Apply modernize-use-override - 2/2

Summary:
Use C++11’s override and remove virtual where applicable.
Changes are automatically generated.

Reviewed By: Orvid

Differential Revision: D14054721

fbshipit-source-id: 15d266fa1779b1e3ea6270f00841d7fb1e4d44ee

5 years agoUpdating submodules
svcscm [Thu, 14 Feb 2019 04:49:07 +0000 (20:49 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 5d9763a6f26ba53c6402b978004aaa7508f4e354

5 years ago#16627 convert weights using torch.as_tensor to avoid warning (#17067)
ptrblck [Thu, 14 Feb 2019 04:42:45 +0000 (20:42 -0800)]
#16627 convert weights using torch.as_tensor to avoid warning (#17067)

Summary:
Minor change which fixes #16627
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17067

Differential Revision: D14078726

Pulled By: soumith

fbshipit-source-id: c04a5f1eff44e4a4b04b981f0ae8de6ff018515b

5 years agoUpdating submodules
svcscm [Thu, 14 Feb 2019 04:26:10 +0000 (20:26 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: e074a865b859fd72b34b012505dfbd3a27a0cc41

5 years agoRevert D14062537: [pytorch][PR] Implement NetDef <--> JIT IR converters.
Edward Yang [Thu, 14 Feb 2019 04:21:06 +0000 (20:21 -0800)]
Revert D14062537: [pytorch][PR] Implement NetDef <--> JIT IR converters.

Differential Revision: D14062537

Original commit changeset: 88b184ee7276

fbshipit-source-id: 01971bbe20daade40cc2cbf85fc08edb380b445c

5 years agoPyTorch model metadata. (#16275)
Pritam Damania [Thu, 14 Feb 2019 03:41:25 +0000 (19:41 -0800)]
PyTorch model metadata. (#16275)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16275

Adding a generic string `metadata` field as part of the model to capture additional metadata with the model.

Reviewed By: dzhulgakov

Differential Revision: D13579029

fbshipit-source-id: 7456ef2edbe73bb70bbb31889cecd94e0db329a2

5 years agoTrim libshm deps, move tempfile.h to c10 (#17019)
Dmytro Dzhulgakov [Thu, 14 Feb 2019 03:28:05 +0000 (19:28 -0800)]
Trim libshm deps, move tempfile.h to c10 (#17019)

Summary:
libshm_manager doesn't need to depend on all of libtorch. It only uses tiny tempfile.h which can be moved to c10. I could just duplicate the file too, but it's not worth it as c10 is small enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17019

Differential Revision: D14052688

Pulled By: dzhulgakov

fbshipit-source-id: 8797d15f8c7c49c49d40b7ab2f43aa3bf6becb0c

5 years agoImplement NetDef <--> JIT IR converters. (#16967)
Mikhail Zolotukhin [Thu, 14 Feb 2019 02:15:57 +0000 (18:15 -0800)]
Implement NetDef <--> JIT IR converters. (#16967)

Summary:
Currently the converters are very straightforward, i.e. there is no code that tries to
preserve semantics; we purely convert from one format to another.

Two things that we might want to add/change:
1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16967

Differential Revision: D14062537

Pulled By: ZolotukhinM

fbshipit-source-id: 88b184ee7276779e5e9152b149d69857515ad98a

5 years agoRemove IgnoredPythonOp sugared value
David Riazati [Thu, 14 Feb 2019 01:56:52 +0000 (17:56 -0800)]
Remove IgnoredPythonOp sugared value

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17042

Differential Revision: D14072497

Pulled By: driazati

fbshipit-source-id: 68fe3fa89c22e60142d758c8cbe0e6e258e7d5c2

5 years agoSeparate reduce functions from math (#16929)
Xiaomeng Yang [Thu, 14 Feb 2019 01:47:49 +0000 (17:47 -0800)]
Separate reduce functions from math (#16929)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16929

Separate CPU reduce functions from math

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D13999469

fbshipit-source-id: bd628b15a6e3c1f04cc62aefffb0110690e1c0d1

5 years agoSkip test_cudnn_multiple_threads_same_device on ROCm (flaky) (#17061)
Junjie Bai [Thu, 14 Feb 2019 01:12:01 +0000 (17:12 -0800)]
Skip test_cudnn_multiple_threads_same_device on ROCm (flaky) (#17061)

Summary:
cc iotamudelta
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10722//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10710//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/10753//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/1756//console
```
19:07:18 ======================================================================
19:07:18 FAIL: test_cudnn_multiple_threads_same_device (test_nn.TestNN)
19:07:18 ----------------------------------------------------------------------
19:07:18 Traceback (most recent call last):
19:07:18   File "/var/lib/jenkins/workspace/test/test_nn.py", line 3905, in test_cudnn_multiple_threads_same_device
19:07:18     (2048 - test_iters) * (2048 - test_iters))
19:07:18   File "/var/lib/jenkins/workspace/test/common_utils.py", line 453, in assertEqual
19:07:18     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:07:18 AssertionError: 3794704.0 not less than or equal to 1e-05 :
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17061

Differential Revision: D14069324

Pulled By: bddppq

fbshipit-source-id: e33b09abca217a62a8b577f9c332ea22985ef4ff

5 years agoSupport FC (Caffe2) -> Gemm (ONNX) with variable input shape. (#16184)
Tongliang Liao [Thu, 14 Feb 2019 01:08:40 +0000 (17:08 -0800)]
Support FC (Caffe2) -> Gemm (ONNX) with variable input shape. (#16184)

Summary:
For >2D input, the code previously used the static shape captured during tracing and reshaped before/after `Gemm`.
Now we add `-1` to the first `Reshape`, and use `Shape(X) => Slice(outer) => Concat(with -1 for inner) => Reshape` for the second.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16184
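
The shape arithmetic behind the two reshapes can be sketched in plain Python. This models shapes only and is not the exporter code: `resolve_reshape` and `fc_as_gemm_shapes` are hypothetical helpers, and placing the `-1` on the outer dimension of the first `Reshape` is an assumption made for this example.

```python
from math import prod


def resolve_reshape(numel, target):
    """Replace a single -1 in `target` so the result holds `numel` elements."""
    known = prod(d for d in target if d != -1)
    return tuple(numel // known if d == -1 else d for d in target)


def fc_as_gemm_shapes(x_shape, axis, k, n):
    """Return (flattened input shape, final output shape) for FC lowered to Gemm."""
    # First Reshape: collapse to 2-D, leaving the outer size to be inferred.
    flat = resolve_reshape(prod(x_shape), (-1, k))
    gemm_out = (flat[0], n)
    # Second Reshape: target built at run time as Shape(X)[:axis] + [-1].
    target = tuple(x_shape[:axis]) + (-1,)
    return flat, resolve_reshape(prod(gemm_out), target)


flat_a, out_a = fc_as_gemm_shapes((2, 3, 4), axis=2, k=4, n=5)
flat_b, out_b = fc_as_gemm_shapes((7, 3, 4), axis=2, k=4, n=5)   # different batch size
```

Because no outer dimension is baked in as a constant, the same graph handles both batch sizes, which is the point of dropping the traced static shape.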

Differential Revision: D14070754

Pulled By: ezyang

fbshipit-source-id: 86c69e9b254945b3406c07e122e57a00dfeba3df

5 years agoMake timeout in resnet50_trainer configurable (#17058)
Junjie Bai [Thu, 14 Feb 2019 00:57:30 +0000 (16:57 -0800)]
Make timeout in resnet50_trainer configurable (#17058)

Summary:
xw285cornell petrex dagamayank
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17058

Differential Revision: D14068458

Pulled By: bddppq

fbshipit-source-id: 15df4007859067a22df4c6c407df4121e19aaf97

5 years agoMake mkldnn Stream object thread_local and enable mkldnn thread-safe (#17022)
Boris Daskalov [Wed, 13 Feb 2019 23:30:39 +0000 (15:30 -0800)]
Make mkldnn Stream object thread_local and enable mkldnn thread-safe (#17022)

Summary:
This PR fixes following issue: https://github.com/pytorch/pytorch/issues/16828

It is a combination of two things:
1) MKLDNN streams are not thread-safe but are currently shared between different threads. This change makes them thread_local
2) By default MKLDNN primitives can share global memory and can't be invoked from multiple threads. This PR enables the MKLDNN_ENABLE_CONCURRENT_EXEC cmake configuration option that makes them thread-safe.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17022
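
The first point, one stream per thread instead of one shared stream, has a close analogue in Python's `threading.local`. The sketch below is only an analogy for the C++ `thread_local` change: `Stream`, `get_stream`, and `streams_from_threads` are hypothetical names and have nothing to do with the MKLDNN API.

```python
import threading


class Stream:
    """Hypothetical stand-in for a stream object that is not thread-safe."""


_local = threading.local()


def get_stream():
    """Return the calling thread's own Stream, creating it on first use."""
    if not hasattr(_local, "stream"):
        _local.stream = Stream()
    return _local.stream


def streams_from_threads(n):
    """Collect the Stream object seen by each of `n` worker threads."""
    seen = []
    lock = threading.Lock()

    def worker():
        stream = get_stream()
        with lock:
            seen.append(stream)   # keep a reference so objects stay alive

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


streams = streams_from_threads(4)
```

Each thread lazily creates its own object, so no stream is ever touched by two threads. The second point of the fix, making the primitives themselves safe for concurrent execution, is a build option and is not modeled here.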

Differential Revision: D14069052

Pulled By: ezyang

fbshipit-source-id: f8f7fcb86c40f5d751fb35dfccc2f802b6e137c6

5 years agoSupport conversion from Caffe2 MergeDim to ONNX Reshape + Squeeze. (#16189)
Tongliang Liao [Wed, 13 Feb 2019 22:57:27 +0000 (14:57 -0800)]
Support conversion from Caffe2 MergeDim to ONNX Reshape + Squeeze. (#16189)

Summary:
`MergeDim` can be done by `Reshape([1, -1, 0, 0, ...]) + Squeeze`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16189
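
How `Reshape([1, -1, 0, 0, ...]) + Squeeze` merges the first two dimensions can be checked with shape arithmetic. The sketch relies on the ONNX `Reshape` conventions that `0` copies the input dimension at the same position and `-1` is inferred; `onnx_reshape_shape` and `merge_dim_shape` are hypothetical helpers that model shapes only.

```python
from math import prod


def onnx_reshape_shape(in_shape, target):
    """Resolve 0 (copy input dim at that index) and a single -1 (inferred)."""
    dims = [in_shape[i] if d == 0 else d for i, d in enumerate(target)]
    if -1 in dims:
        known = prod(d for d in dims if d != -1)
        dims[dims.index(-1)] = prod(in_shape) // known
    return tuple(dims)


def merge_dim_shape(in_shape):
    """Output shape of MergeDim expressed as Reshape followed by Squeeze."""
    target = [1, -1] + [0] * (len(in_shape) - 2)
    reshaped = onnx_reshape_shape(in_shape, target)
    return reshaped[1:]   # Squeeze removes the leading dimension of size 1


merged = merge_dim_shape((2, 3, 4, 5))
```

The leading `1` keeps the rank unchanged so the `0` entries line up with the trailing input dimensions; the `Squeeze` then drops that placeholder.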

Differential Revision: D14070676

Pulled By: ezyang

fbshipit-source-id: 28d7e9b35cc2c1dcbd4afb3fbdf7383e219b1777

5 years agoFix mvlgamma doc (#17045)
vishwakftw [Wed, 13 Feb 2019 21:35:40 +0000 (13:35 -0800)]
Fix mvlgamma doc (#17045)

Summary:
Changelog:
- Fix the constant in the docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17045

Differential Revision: D14068698

Pulled By: ezyang

fbshipit-source-id: af040b9a9badea213785f5bf3b6daf4d90050eb2

5 years agoChange IR graph print format to make it look more pythonic (#16986)
Mikhail Zolotukhin [Wed, 13 Feb 2019 18:32:38 +0000 (10:32 -0800)]
Change IR graph print format to make it look more pythonic (#16986)

Summary:
This removes curly braces from the outputs (we have indentation to indicate scopes), also adds ':' after graph and blocks declaration and removes ';' from the return line. ".expect" tests are updated to keep up with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16986

Differential Revision: D14062540

Pulled By: ZolotukhinM

fbshipit-source-id: 7f8e2d11619152a21ef7f1f7f8579c49392c3eca

5 years agoTurn off the ability for Declarations.cwrap entries to be methods.
Gregory Chanan [Wed, 13 Feb 2019 18:27:23 +0000 (10:27 -0800)]
Turn off the ability for Declarations.cwrap entries to be methods.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17053

Differential Revision: D14065887

Pulled By: gchanan

fbshipit-source-id: 5d06ac66d27d28d48c2aff2b0d911f34ea0cd6fd

5 years agoRemove chunk count check on the ChunkBuffer (#16868)
Jaliya Ekanayake [Wed, 13 Feb 2019 18:26:15 +0000 (10:26 -0800)]
Remove chunk count check on the ChunkBuffer (#16868)

Summary:
Previously, the ChunkBuffer depended on the remaining chunk count to signal the end of data loading. This does not work with distributed samplers, where each sampler loads only a subset of the chunks. This refactor removes the ChunkBuffer's dependency on the remaining chunk count.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16868

Differential Revision: D14066517

Pulled By: goldsborough

fbshipit-source-id: 293dfe282ceff326dff0876c2f75c2ee4f4463e2

5 years agoUse IndexError instead of RuntimeError in ATen CPU kernels
Stefan Krah [Wed, 13 Feb 2019 17:24:04 +0000 (09:24 -0800)]
Use IndexError instead of RuntimeError in ATen CPU kernels

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17049

Reviewed By: ezyang

Differential Revision: D14064700

Pulled By: fmassa

fbshipit-source-id: 3575db103bba5a7d82f574cbb082beca419151ec

5 years agoMark IntList as deprecated; add C10_DEPRECATED_USING (#16824)
Edward Yang [Wed, 13 Feb 2019 16:44:43 +0000 (08:44 -0800)]
Mark IntList as deprecated; add C10_DEPRECATED_USING (#16824)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16824

There was a big wooly yak getting the deprecated macros to work.
Gory details are in Deprecated.h

Reviewed By: smessmer

Differential Revision: D13978429

fbshipit-source-id: f148e5935ac36eacc481789d22c7a9443164fe95

5 years agoAdd more debugging facilities to ONNXIFI transform (#17043)
Yinghai Lu [Wed, 13 Feb 2019 07:59:40 +0000 (23:59 -0800)]
Add more debugging facilities to ONNXIFI transform (#17043)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17043

Add more debugging facilities for the ONNXIFI transform.

Reviewed By: ipiszy

Differential Revision: D14019492

fbshipit-source-id: 8c258ccba2f8ce77db096031fc8a61e15bd8af93

5 years agoUpdating submodules
svcscm [Wed, 13 Feb 2019 05:45:34 +0000 (21:45 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 399afdc341075c383227d0d410a30eeb6c1d3b08

5 years agoUpdating submodules
svcscm [Wed, 13 Feb 2019 05:20:09 +0000 (21:20 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: edb216d2eca7120d0f7729b2e4640096a0341154

5 years agounify c2 and TH allocator (#16892)
Dmytro Dzhulgakov [Wed, 13 Feb 2019 05:13:25 +0000 (21:13 -0800)]
unify c2 and TH allocator (#16892)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16892

Replaces https://github.com/pytorch/pytorch/pull/14517

Merged caffe2 and TH CPU Allocators. Mostly using the code from caffe2 allocators.
`memset` of caffe2 allocator is gone now. These two allocators should be almost the same.

Baseline:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       148 ns        148 ns    4676594
BM_StorageImplCtor                        54 ns         54 ns   12957810
BM_MallocStorageImpl                      62 ns         62 ns   11254745
BM_TensorImplCtor                         22 ns         22 ns   31939472
BM_MallocTensorImpl                      105 ns        105 ns    6505661
BM_Malloc_1                               43 ns         43 ns   16464905
BM_MakeTensorFromStorage                 126 ns        126 ns    5586116
BM_MakeVariableFromTensor                236 ns        236 ns    2995528
BM_ATenCPUTensorAllocationSmall1         319 ns        319 ns    2268884
BM_ATenCPUTensorAllocationSmall2         318 ns        318 ns    2163332
BM_ATenCPUTensorAllocationMedium1        403 ns        403 ns    1663228
BM_ATenCPUTensorAllocationMedium2        448 ns        448 ns    1595004
BM_ATenCPUTensorAllocationBig1           532 ns        532 ns    1352634
BM_ATenCPUTensorAllocationBig2          4486 ns       4486 ns     160978
```
Changed:
```
Running ./tensor_allocation
Run on (48 X 2501 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 256K (x24)
  L3 Unified 30720K (x2)
-------------------------------------------------------------------------
Benchmark                                  Time           CPU Iterations
-------------------------------------------------------------------------
BM_MakeStorageImpl                       141 ns        141 ns    4803576
BM_StorageImplCtor                        55 ns         55 ns   13129391
BM_MallocStorageImpl                      64 ns         64 ns   11088143
BM_TensorImplCtor                         23 ns         23 ns   31616273
BM_MallocTensorImpl                      101 ns        101 ns    7017585
BM_Malloc_1                               39 ns         39 ns   18523954
BM_MakeTensorFromStorage                 118 ns        118 ns    5877919
BM_MakeVariableFromTensor                452 ns        452 ns    1565722
BM_ATenCPUTensorAllocationSmall1         384 ns        384 ns    1819763
BM_ATenCPUTensorAllocationSmall2         389 ns        389 ns    1857483
BM_ATenCPUTensorAllocationMedium1        425 ns        425 ns    1646284
BM_ATenCPUTensorAllocationMedium2        430 ns        430 ns    1561319
BM_ATenCPUTensorAllocationBig1           508 ns        508 ns    1309969
BM_ATenCPUTensorAllocationBig2          3799 ns       3799 ns     173674
```

lstm benchmark:
Before:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

After:
```
INFO:lstm_bench:Iter: 1 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 21 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 41 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 61 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 81 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 101 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 121 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 141 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 161 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 181 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 201 / 390. Entries Per Second: 0.8k.
INFO:lstm_bench:Iter: 221 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 241 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 261 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 281 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 301 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 321 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 341 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 361 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Iter: 381 / 390. Entries Per Second: 0.7k.
INFO:lstm_bench:Done. Total EPS excluding 1st iteration: 0.8k
```

Reviewed By: ezyang

Differential Revision: D13202632

fbshipit-source-id: db6d2ec756ed15b0732b15396c82ad42302bb79d

5 years agoUpdating submodules
svcscm [Wed, 13 Feb 2019 04:42:10 +0000 (20:42 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 7d730945dbdd7bb7d10192061229ee6e759a1a7f

5 years agoRemove second output of Reshape during ONNXIFI transform (#17027)
Yinghai Lu [Wed, 13 Feb 2019 02:21:09 +0000 (18:21 -0800)]
Remove second output of Reshape during ONNXIFI transform (#17027)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17027

Glow doesn't support the second output of Reshape right now, and that output is useless anyway. For correctness, we make sure that the second output of Reshape is of Constant type during bound shape inference.

Reviewed By: ipiszy

Differential Revision: D14056555

fbshipit-source-id: f39cca7ba941bf5a5cc3adc96e2b1f943cc0be93

5 years agoenable more unit tests in test_nn (#16994)
Johannes M Dieterich [Wed, 13 Feb 2019 01:41:24 +0000 (17:41 -0800)]
enable more unit tests in test_nn (#16994)

Summary:
These tests work with ROCm 2.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16994

Differential Revision: D14059802

Pulled By: bddppq

fbshipit-source-id: 8e2cbb13196c2e0283d3e02b7f761374bc580751

5 years agofix bicubic upsampling and enable tests (#17020)
Johannes M Dieterich [Wed, 13 Feb 2019 01:18:40 +0000 (17:18 -0800)]
fix bicubic upsampling and enable tests (#17020)

Summary:
Fix macro name in ifdef guard, enable upsampling tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17020

Differential Revision: D14059780

Pulled By: bddppq

fbshipit-source-id: 82c57d17d5bccdccb548c65d2b7a1ff8ab05af30

5 years agoFold col offsets into bias; optimize A symmetric quant (#16942)
Jongsoo Park [Wed, 13 Feb 2019 01:00:33 +0000 (17:00 -0800)]
Fold col offsets into bias; optimize A symmetric quant (#16942)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16942

We can fold col offsets into the bias if the zero point of the activation is constant.
fbgemm still needs to provide an option to pass col offsets in case the zero point of the activation keeps changing (e.g., dynamic quantization).
A trick to optimize the static quantization case is setting the A zero point to 0 after folding into the bias.

This diff also optimizes the case where weights use symmetric quantization. When the B zero point is 0, we use PackAMatrix instead of PackAWithRowOffset.

TODO:
Ideally, PackAWithRowOffset should perform as fast as PackAMatrix when B_zero_point is 0 to make client code simpler
Same in PackAWithIm2Col and depth-wise convolution (group convolution is already doing this)
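
The folding described above can be checked with plain integer arithmetic. The following is a minimal sketch in pure Python, not fbgemm code: the helper names (`col_offsets`, `fold_into_bias`, `matmul_folded`) are hypothetical, and it assumes the col offset of column j is the sum over k of `B[k][j] - b_zp`.

```python
# Hypothetical illustration of folding col offsets into the bias; not fbgemm's API.

def col_offsets(B, b_zp):
    """Per-column sums of (B[k][j] - b_zp)."""
    return [sum(row[j] - b_zp for row in B) for j in range(len(B[0]))]

def matmul_reference(A, B, a_zp, b_zp, bias):
    """Direct evaluation of sum_k (A[i][k] - a_zp) * (B[k][j] - b_zp) + bias[j]."""
    K = len(B)
    return [
        [sum((A[i][k] - a_zp) * (B[k][j] - b_zp) for k in range(K)) + bias[j]
         for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def fold_into_bias(bias, B, a_zp, b_zp):
    """Valid only when a_zp is constant: bias'[j] = bias[j] - a_zp * col_offset[j]."""
    offs = col_offsets(B, b_zp)
    return [bias[j] - a_zp * offs[j] for j in range(len(bias))]

def matmul_folded(A, B, b_zp, folded_bias):
    """Same product with the A zero point set to 0 after folding."""
    return matmul_reference(A, B, 0, b_zp, folded_bias)

A = [[3, 7], [10, 1]]
B = [[2, 5, 4], [6, 1, 9]]
bias = [100, 200, 300]
a_zp, b_zp = 4, 3

expected = matmul_reference(A, B, a_zp, b_zp, bias)
folded = matmul_folded(A, B, b_zp, fold_into_bias(bias, B, a_zp, b_zp))
assert folded == expected
```

The identity holds because `sum_k (A - a_zp) * (B - b_zp)` equals `sum_k A * (B - b_zp)` minus `a_zp` times the col offset, so the second term depends only on the weights and the constant activation zero point.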

Reviewed By: csummersea

Differential Revision: D14013931

fbshipit-source-id: e4d313343e2a16a451eb910beed30e35de02a40c

5 years agoenable unit tests in test_cuda that now pass with ROCm 2.1
Johannes M Dieterich [Wed, 13 Feb 2019 00:48:51 +0000 (16:48 -0800)]
enable unit tests in test_cuda that now pass with ROCm 2.1

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17012

Differential Revision: D14059761

Pulled By: bddppq

fbshipit-source-id: 8309c3ffe1efed42b5db69fdec26427413c3f224

5 years agoRegister CUDA kernels for caffe2 operators (#16691)
Sebastian Messmer [Wed, 13 Feb 2019 00:47:53 +0000 (16:47 -0800)]
Register CUDA kernels for caffe2 operators (#16691)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16691

Previous diffs already introduced a macro that registers caffe2 CPU kernels with c10.
This now also registers the CUDA kernels with it.

Reviewed By: bwasti

Differential Revision: D13901619

fbshipit-source-id: c15e5b7081ff10e5219af460779b88d6e091a6a6

5 years agoEnable test_jit tests that work on ROCm 2.1
Johannes M Dieterich [Wed, 13 Feb 2019 00:45:09 +0000 (16:45 -0800)]
Enable test_jit tests that work on ROCm 2.1

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17010

Differential Revision: D14059748

Pulled By: bddppq

fbshipit-source-id: 7a1f7eee4f818dba91e741437415370973e4d429

5 years agoExtract ShapeInfo and some util functions into a separate file. (#17025)
Ying Zhang [Wed, 13 Feb 2019 00:37:50 +0000 (16:37 -0800)]
Extract ShapeInfo and some util functions into a separate file. (#17025)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17025

Extract ShapeInfo and some util functions into a separate file.

Reviewed By: yinghai

Differential Revision: D14017432

fbshipit-source-id: 201db46bce6d52d9355a1a86925aa6206d0336bf

5 years agoAllow customization of blob node in net_drawer (#16915)
Yinghai Lu [Tue, 12 Feb 2019 22:43:44 +0000 (14:43 -0800)]
Allow customization of blob node in net_drawer (#16915)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16915

TSIA

Reviewed By: ipiszy

Differential Revision: D14018010

fbshipit-source-id: df5ccc06fa37f08e7a02a8acc466c4ad47afe04e

5 years agoIgnore unknown_shaped tensor in bound shape inference (#16916)
Yinghai Lu [Tue, 12 Feb 2019 22:43:43 +0000 (14:43 -0800)]
Ignore unknown_shaped tensor in bound shape inference (#16916)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16916

Two fixes for maximum effort bound shape inference
1. Ignore failed and unknown shape
2. Add specialization for `SparseLengthsWeightedSumFused8BitRowwise`.

Reviewed By: ipiszy

Differential Revision: D14017810

fbshipit-source-id: 25cd68d35aa20b9ed077bdb562eb7f9deff0ab96

5 years agoWorkarounds to the lack of nvidia-smi and ldconfig programs in macosx (was PR 16968...
Pearu Peterson [Tue, 12 Feb 2019 22:14:30 +0000 (14:14 -0800)]
Workarounds to the lack of nvidia-smi and ldconfig programs in macosx (was PR 16968) (#16999)

Summary:
Fix issue #12174 for Mac OSX.

PS: This is a duplicate of PR #16968 that got messed up. Sorry for the confusion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16999

Differential Revision: D14050669

Pulled By: zou3519

fbshipit-source-id: a4594c03ae8e0ca91a4836408b6c588720162c9f

5 years agoDispatch the correct legacy function for geqrf_out and ormqr_out (#16964)
vishwakftw [Tue, 12 Feb 2019 21:34:44 +0000 (13:34 -0800)]
Dispatch the correct legacy function for geqrf_out and ormqr_out (#16964)

Summary:
This fixes the segfault.

Changelog:
- Modify the function calls in LegacyDefinitions for `geqrf_out` and `ormqr_out`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16964

Differential Revision: D14025985

Pulled By: gchanan

fbshipit-source-id: aa50e2c1694cbf3642273ee14b09ba12625c7d33

5 years agoRegister layout for XLA backend.
Davide Libenzi [Tue, 12 Feb 2019 21:34:11 +0000 (13:34 -0800)]
Register layout for XLA backend.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16946

Differential Revision: D14054716

Pulled By: gchanan

fbshipit-source-id: 063495b99b9f7d29ca3ad2020a6bc90d36ba0d7d

5 years agoExport ReduceMean/ReduceFrontMean/ReduceBackMean (Caffe2) to ReduceMean (ONNX). ...
Tongliang Liao [Tue, 12 Feb 2019 21:18:13 +0000 (13:18 -0800)]
Export ReduceMean/ReduceFrontMean/ReduceBackMean (Caffe2) to ReduceMean (ONNX). (#16727)

Summary:
The second input (`lengths`) is not supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16727

Differential Revision: D14054105

Pulled By: houseroad

fbshipit-source-id: 36b8d00460f9623696439e1bd2a6bc60b7bb263c

5 years agoClean up allocations in FBGEMM linear (#16985)
James Reed [Tue, 12 Feb 2019 20:18:54 +0000 (12:18 -0800)]
Clean up allocations in FBGEMM linear (#16985)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16985

These statements were causing some redundant allocations + copying, so I cleaned
them up

Reviewed By: zdevito, wanchaol

Differential Revision: D14031067

fbshipit-source-id: f760fb29a2561894d52a2663f557b3e9ab1653de

5 years agoProperly dispatch s_copy__cpu.
Gregory Chanan [Tue, 12 Feb 2019 20:13:11 +0000 (12:13 -0800)]
Properly dispatch s_copy__cpu.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16974

Differential Revision: D14030516

Pulled By: gchanan

fbshipit-source-id: ba4cde5ebf2898d207efbc9117c1f1d6ccae861b

5 years agoGet rid of unused THPStorage defines related to accreal.
Gregory Chanan [Tue, 12 Feb 2019 20:11:45 +0000 (12:11 -0800)]
Get rid of unused THPStorage defines related to accreal.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16973

Differential Revision: D14029538

Pulled By: gchanan

fbshipit-source-id: b51f203ccff97695bf228772bb13e3e6b9bb6d1a

5 years agoFix AddAdjustBatchOp (#16997)
Yinghai Lu [Tue, 12 Feb 2019 19:18:52 +0000 (11:18 -0800)]
Fix AddAdjustBatchOp (#16997)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16997

1. Don't create multiple AdjustBatch ops for the same input name. We create it once and hook input to abc_post_adjust_batch.

2. Dangling tensor. The cause of this error is still in AttachAdjustBatchOp. Consider a net such as
```
op {
  type : "Relu"
  input: "X"
  output: "Y"
}
op {
  type : "Relu"
  input: "Y"
  output: "Y2"
}
external_output: "Y"
external_output: "Y2"
```
In this net, the output of the first Relu is used as an internal node as well as an external output. We cannot simply rename Y into Y_pre_batch_adjust. Instead, we need another pass that checks all the inputs of the ops in the net and renames Y into Y_pre_batch_adjust there as well.
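
The renaming pass can be sketched as follows. This is a hypothetical pure-Python model rather than the Caffe2 implementation: ops are plain dicts, `attach_adjust_batch` is an invented name, and the appended `AdjustBatch` op is simplified to a single input (the real op may take more).

```python
def attach_adjust_batch(ops, external_outputs):
    """Hypothetical sketch: rename external outputs and every use of them."""
    renamed = {name: name + "_pre_batch_adjust" for name in external_outputs}
    new_ops = []
    for op in ops:
        new_ops.append({
            "type": op["type"],
            # The extra pass: consumers inside the net must read the renamed blob,
            # otherwise they would refer to a tensor nobody produces yet.
            "input": [renamed.get(n, n) for n in op["input"]],
            "output": [renamed.get(n, n) for n in op["output"]],
        })
    # One adjust op per external output restores the original name.
    for name in external_outputs:
        new_ops.append({"type": "AdjustBatch", "input": [renamed[name]], "output": [name]})
    return new_ops

net = [
    {"type": "Relu", "input": ["X"], "output": ["Y"]},
    {"type": "Relu", "input": ["Y"], "output": ["Y2"]},
]
result = attach_adjust_batch(net, ["Y", "Y2"])
for op in result:
    print(op)
```

In this sketch the second Relu ends up reading `Y_pre_batch_adjust`, so no op consumes a blob that is only produced later by an adjust op.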

Reviewed By: bertmaher

Differential Revision: D14041446

fbshipit-source-id: f6553e287a8dfb14e4044cc20afaf3f290e5151b

5 years agoRoll back PyTorch DockerVersion to 282
Will Feng [Tue, 12 Feb 2019 18:49:38 +0000 (10:49 -0800)]
Roll back PyTorch DockerVersion to 282

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17013

Differential Revision: D14052415

Pulled By: yf225

fbshipit-source-id: df663fb46ee825174fe06b8d395979b3d4e84766

5 years agofix silent failure on Windows builds (#16984)
Karl Ostmo [Tue, 12 Feb 2019 18:41:45 +0000 (10:41 -0800)]
fix silent failure on Windows builds (#16984)

Summary:
Closes #16983

Remove backticks that are being interpreted by the shell. Add the -e option to the bash script to avoid such failures in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16984

Reviewed By: yf225

Differential Revision: D14039128

Pulled By: kostmo

fbshipit-source-id: c31a1895377ca86c1b59e79351843cc8c4fd7de3

5 years agoAdd module and name to func created with _jit_internal.boolean_dispatch (#16922)
Theo [Tue, 12 Feb 2019 17:35:23 +0000 (09:35 -0800)]
Add module and name to func created with _jit_internal.boolean_dispatch (#16922)

Summary:
The motivation for this PR is the following bug:
(with F = torch.nn.functional)
`F.max_pool2d.__module__` is `torch._jit_internal`
`F.max_pool2d.__name__` is `fn`

With this PR you get:
`F.max_pool2d.__module__` is `torch.nn.functional`
`F.max_pool2d.__name__` is `max_pool2d`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16922

Differential Revision: D14020053

Pulled By: driazati

fbshipit-source-id: c109c1f04640f3b2b69bc4790b16fef7714025dd

5 years agoMore docs for methods in operator.h
Edward Yang [Tue, 12 Feb 2019 16:02:05 +0000 (08:02 -0800)]
More docs for methods in operator.h

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16826

Reviewed By: izdeby

Differential Revision: D13979891

fbshipit-source-id: df8391ffaff0d44845057bb839f05aea6fc5712c

5 years agoMinor typo
Daniel [Tue, 12 Feb 2019 15:52:55 +0000 (07:52 -0800)]
Minor typo

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16980

Differential Revision: D14033686

Pulled By: gchanan

fbshipit-source-id: 9f7967defc6795640e14157d0b701b185061741f

5 years agoFix allow_inf in assertEqual (#16959)
SsnL [Tue, 12 Feb 2019 15:49:48 +0000 (07:49 -0800)]
Fix allow_inf in assertEqual (#16959)

Summary:
gchanan pointed out in https://github.com/pytorch/pytorch/pull/16389 that `allow_inf` is treating `-inf` and `inf` as equal. This fixes it.
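
A minimal sketch of the intended semantics, assuming a scalar comparison helper (the name `scalars_equal` and its signature are hypothetical, not the test-suite code):

```python
import math

def scalars_equal(a, b, tolerance=1e-5, allow_inf=False):
    """Hypothetical helper: infinities match only if allowed and of the same sign."""
    if math.isinf(a) or math.isinf(b):
        # Float equality already distinguishes inf from -inf, so a == b is
        # True only when both values are the same infinity.
        return allow_inf and a == b
    return abs(a - b) <= tolerance

inf = float("inf")
print(scalars_equal(inf, inf, allow_inf=True))
print(scalars_equal(inf, -inf, allow_inf=True))
```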

Also fixing #16448 since it's nearby and 2.1 has been released.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16959

Differential Revision: D14025297

Pulled By: gchanan

fbshipit-source-id: 95348309492e7ab65aa4d7aabb5a1800de66c5d6

5 years agoRefine return type Stream to HIPStream in HIPStreamGuardMasqueradingAsCUDA (#16978)
Edward Yang [Tue, 12 Feb 2019 15:22:05 +0000 (07:22 -0800)]
Refine return type Stream to HIPStream in HIPStreamGuardMasqueradingAsCUDA (#16978)

Summary:
Previously, we used the templated class directly to provide
implementations.  However, there is a subtle difference
between this, and CUDAStreamGuard: CUDAStreamGuard has refined types
for the Streams it returns.  This led to a compilation failure
of HIPified ddp.cpp.  This commit lines them up more closely,
at the cost of copy-paste.

A possible alternate strategy would have been to extend the
InlineDeviceGuard templates to optionally accept refinements
for Stream.  I leave this for future work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16978

Differential Revision: D14045346

Pulled By: ezyang

fbshipit-source-id: 2b101606e62e4db588027c57902ea739a2119410

5 years agoRevert D14030665: [pytorch][PR] [HOTFIX] Pin docker-ce version to the one expected...
Edward Yang [Tue, 12 Feb 2019 14:59:36 +0000 (06:59 -0800)]
Revert D14030665: [pytorch][PR] [HOTFIX] Pin docker-ce version to the one expected by nvidia-docker2

Differential Revision:
D14030665

Original commit changeset: dece6a5aa4d1

fbshipit-source-id: 885a464ec3d1c23d4e07630fa3b67e69a3eab1b8

5 years agoParse the command line and check the arguments before build_deps() (#16914)
Simeon Monov [Tue, 12 Feb 2019 08:12:03 +0000 (00:12 -0800)]
Parse the command line and check the arguments before build_deps() (#16914)

Summary:
This is needed to check for wrong arguments or --help options
before `build_deps()` is executed. Otherwise command line arguments
are not parsed and checked until `setup()` is run.
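
The ordering can be illustrated with a small sketch. It uses `argparse` as a stand-in for the real setup machinery, and `main`, `parse_args`, and the `build_deps` callback are hypothetical names chosen for this example.

```python
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(prog="setup.py")
    parser.add_argument("command", choices=["build", "install", "develop"])
    return parser.parse_args(argv)

def main(argv, build_deps):
    # Parse first: argparse raises SystemExit on --help or on a bad argument,
    # so the expensive dependency build below is never reached in those cases.
    args = parse_args(argv)
    build_deps()
    return args.command

calls = []
print(main(["build"], lambda: calls.append("deps")), calls)
```

With the reverse order, a typo on the command line would only be reported after the dependencies had already been built.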

Fixes: #16707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16914

Differential Revision: D14041236

Pulled By: soumith

fbshipit-source-id: 41f635772ccf47f05114775d5a19ae04c495ab3b

5 years agoFix and add testing for nullptr allocator in c2->pt conversion (#16857)
Dmytro Dzhulgakov [Tue, 12 Feb 2019 07:15:54 +0000 (23:15 -0800)]
Fix and add testing for nullptr allocator in c2->pt conversion (#16857)

Summary:
Fixes the bug that occurs when a tensor is created on the Caffe2 side, then passed to PT and resized. Now we just initialize the allocator correctly.

Note that the code in raw_mutable_data() is still necessary because of non-resizable tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16857

Reviewed By: houseroad

Differential Revision: D14019469

Pulled By: dzhulgakov

fbshipit-source-id: 14d3a3b946d718bbab747ea376903646b885706a

5 years agoFix NERPredictor for zero initialization
Dmytro Dzhulgakov [Tue, 12 Feb 2019 07:04:59 +0000 (23:04 -0800)]
Fix NERPredictor for zero initialization

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16931

Reviewed By: dragonxlwang

Differential Revision: D14016749

fbshipit-source-id: b5512c52cef77651bdba1e31f588ea649daacdd9

5 years agoAllow calling a Python function with a dict
David Riazati [Tue, 12 Feb 2019 05:48:58 +0000 (21:48 -0800)]
Allow calling a Python function with a dict

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16989

Differential Revision: D14037896

Pulled By: driazati

fbshipit-source-id: 5f26d2d8fabf0f267909a3383f19d984645f94d0

5 years agoKeep weights name unchanged during SsaRewrite (#16932)
Kimish Patel [Mon, 11 Feb 2019 22:32:30 +0000 (14:32 -0800)]
Keep weights name unchanged during SsaRewrite (#16932)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16932

During the onnxifi transformation, the net is rewritten into SSA form. At the last step the weight
names are changed back to what they were before. This diff keeps the weight
names unchanged throughout the process.

Reviewed By: yinghai

Differential Revision: D13972597

fbshipit-source-id: 7c29857f788a674edf625c073b345f2b44267b33

5 years agoPin docker-ce version to the one expected by nvidia-docker2 (#16976)
Will Feng [Mon, 11 Feb 2019 22:04:31 +0000 (14:04 -0800)]
Pin docker-ce version to the one expected by nvidia-docker2 (#16976)

Summary:
Fix errors such as https://circleci.com/gh/pytorch/pytorch/760715.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16976

Differential Revision: D14030665

Pulled By: yf225

fbshipit-source-id: dece6a5aa4d13ff771c18b4ce02a0b9f9572a379

5 years agoExpose GenerateProposals to PyTorch
Sebastian Messmer [Mon, 11 Feb 2019 22:03:45 +0000 (14:03 -0800)]
Expose GenerateProposals to PyTorch

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16880

Reviewed By: bwasti

Differential Revision: D13998092

fbshipit-source-id: 23ab886ba137377312557fa718f262f4c8149cc7

5 years agoExpose BBoxTransform to pytorch
Sebastian Messmer [Mon, 11 Feb 2019 22:03:45 +0000 (14:03 -0800)]
Expose BBoxTransform to pytorch

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16879

Reviewed By: bwasti

Differential Revision: D13998093

fbshipit-source-id: ddfe4bff83e9a1a4cedf1e520e6d2977b21cb3af

5 years agoMinimize templated code in caffe2 operator wrapper (#16965)
Sebastian Messmer [Mon, 11 Feb 2019 22:03:45 +0000 (14:03 -0800)]
Minimize templated code in caffe2 operator wrapper (#16965)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16965

Instead of having one large templated function to wrap the caffe2 op, minimize the amount of templated code.
Non-templated code can be reused between different operators and decreases binary size.

Reviewed By: orionr

Differential Revision: D14018806

fbshipit-source-id: bedd4152eec21dd8c5778446963826316d210543

5 years agoDon't keep unnecessary saved_inputs alive (#16583)
Adam Paszke [Mon, 11 Feb 2019 21:31:06 +0000 (13:31 -0800)]
Don't keep unnecessary saved_inputs alive (#16583)

Summary:
Fixes #16577.

This greatly improves memory efficiency of certain ops like Dropout2d. Previously, they were implemented as `input * mask` where mask never requires_grad, but we didn't use that knowledge in forward, and (in case of an in-place dropout) kept input.clone() for the backward, when it would simply get ignored.

This patch tries to address this situation by emitting some guards for stores like this, but only if they are as simple as checking whether a single value requires_grad.

Interestingly, the same optimizations apply to methods like bmm, baddbmm, etc., but _not to mm nor addmm_, because of how their derivatives are defined. Apparently they unnecessarily use `mat1` to compute the derivative of `mat1` just to improve the error message in case `mat1` was sparse. I'd like to apply this optimization to that case, but I don't want to lose the nicer error message, so if anyone has any ideas for solutions, please let me know...
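
The guard idea can be sketched for `mul` in plain Python. This is an illustrative model only, not the generated autograd code: `Value`, `mul_forward`, and `mul_backward` are hypothetical names.

```python
class Value:
    """Hypothetical stand-in for a tensor with a requires_grad flag."""
    def __init__(self, data, requires_grad=False):
        self.data = data
        self.requires_grad = requires_grad

def mul_forward(a, b):
    """Return (output, saved), where saved holds only what backward will read."""
    saved = {}
    # d(a*b)/da = grad * b, so b is needed only if a requires grad, and vice versa.
    if a.requires_grad:
        saved["b"] = b.data
    if b.requires_grad:
        saved["a"] = a.data
    out = Value(a.data * b.data, a.requires_grad or b.requires_grad)
    return out, saved

def mul_backward(grad, saved):
    grad_a = grad * saved["b"] if "b" in saved else None
    grad_b = grad * saved["a"] if "a" in saved else None
    return grad_a, grad_b

inp = Value(3.0, requires_grad=True)
mask = Value(2.0, requires_grad=False)
out, saved = mul_forward(inp, mask)
print(saved, mul_backward(1.5, saved))
```

Because the mask never requires grad, the input is not stored at all, which is where the memory saving for dropout-style ops comes from in this model.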

Full list of operators affected by this patch:
* _nnpack_spatial_convolution
* addbmm
* addcdiv
* addcmul
* addmv
* addr
* baddbmm
* bmm
* cross
* div
* dot
* fmod
* ger
* index_add_
* mul
* mv
* scatter_add_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16583

Differential Revision: D13900881

Pulled By: gchanan

fbshipit-source-id: dd0aeb2ab58c4b6aa95b37b46d3255b3e014291c

5 years agoEnforce same input tensor storage in VariableType functions (#16305)
Will Feng [Mon, 11 Feb 2019 20:48:17 +0000 (12:48 -0800)]
Enforce same input tensor storage in VariableType functions (#16305)

Summary:
In VariableType.cpp, when a function modifies its input tensors, it should only change the input tensors' storage data in-place, and should never change the input tensors' storage pointers. This PR adds checks for this, and also fixes functions that fail this test.
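
A rough model of such a check, written in Python for illustration: `FakeTensor`, `check_same_storage`, and the two example functions are hypothetical, and object identity of a list stands in for the storage pointer.

```python
class FakeTensor:
    """Hypothetical stand-in: 'storage' is a list object that holds the data."""
    def __init__(self, values):
        self.storage = list(values)

def check_same_storage(fn):
    """Fail if fn rebinds the input's storage instead of writing into it."""
    def wrapper(tensor, *args):
        before = tensor.storage
        result = fn(tensor, *args)
        if tensor.storage is not before:
            raise AssertionError(fn.__name__ + " replaced the input's storage")
        return result
    return wrapper

@check_same_storage
def add_inplace(tensor, value):
    # Writes into the existing storage object.
    for i in range(len(tensor.storage)):
        tensor.storage[i] += value
    return tensor

@check_same_storage
def add_rebinding(tensor, value):
    # Allocates a new storage object, which the check rejects.
    tensor.storage = [x + value for x in tensor.storage]
    return tensor

t = FakeTensor([1, 2])
add_inplace(t, 1)
print(t.storage)
```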

This is part of the Variable/Tensor merge work (https://github.com/pytorch/pytorch/issues/13638).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16305

Differential Revision: D13897855

Pulled By: yf225

fbshipit-source-id: 0c4fc7eb530d30db88037b1f0981f6f8454d3b79

5 years agoRevert unneeded fixes in flat_hash_map (#16907)
Sebastian Messmer [Mon, 11 Feb 2019 20:29:47 +0000 (12:29 -0800)]
Revert unneeded fixes in flat_hash_map (#16907)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16907

The begin()/end() fix actually doesn't make sense, see my comment on https://github.com/skarupke/flat_hash_map/pull/8
This diff removes it.

Reviewed By: ezyang

Differential Revision: D13985779

fbshipit-source-id: f08b02c941069e2a4e728e02a19b65dc72f96b41

5 years agoFix constexpr in KernelRegistrationBuilder (#16906)
Sebastian Messmer [Mon, 11 Feb 2019 20:29:47 +0000 (12:29 -0800)]
Fix constexpr in KernelRegistrationBuilder (#16906)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16906

In C++11, constexpr implies const, so these methods actually wouldn't be rvalue overloads as intended but const rvalue overloads.
Let's only apply the constexpr flag in C++14 to be safe.

Reviewed By: bddppq

Differential Revision: D13998486

fbshipit-source-id: a04d17ef0cc8f45e3d0a1ca9843d194f4f0f6f7f

5 years agoCatch cudaError_t return val (nodiscard in rocm) (#16399)
Xiaodong Wang [Mon, 11 Feb 2019 20:27:12 +0000 (12:27 -0800)]
Catch cudaError_t return val (nodiscard in rocm) (#16399)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16399

Catching cudaError_t return values in a few places, because it's nodiscard in rocm. Unless we add -Wno-unused-result, it'll end up with a compilation error.

Also in c10/cuda/test, check whether a host has GPU or not. We were silently throwing out the error before (so not really testing the cuda api).

Reviewed By: bddppq

Differential Revision: D13828281

fbshipit-source-id: 587d1cc31c20b836ce9594e3c18f067d322b2934

5 years agooptionally zero infinite losses in CTCLoss (#16199)
Thomas Viehmann [Mon, 11 Feb 2019 20:26:47 +0000 (12:26 -0800)]
optionally zero infinite losses in CTCLoss (#16199)

Summary:
Here is a stab at implementing an option to zero out infinite losses (and NaN gradients).
It might be nicer to move the zeroing to the respective kernels.
The default is currently `False` to mimic the old behaviour, but I'd be half inclined to set the default to `True`, because the behaviour wasn't consistent between CuDNN and Native anyways and the NaN gradients aren't terribly useful.
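
The intended behaviour can be modelled with a small helper. This is a hypothetical pure-Python sketch, not the CTCLoss kernels; the function name `zero_infinite` is invented, and the flag defaults to `False` to mirror the default described above.

```python
import math

def zero_infinite(losses, grads, zero_infinity=False):
    """Hypothetical helper: zero infinite losses and NaN gradients when enabled."""
    if not zero_infinity:
        return list(losses), list(grads)
    new_losses = [0.0 if math.isinf(x) else x for x in losses]
    new_grads = [0.0 if math.isnan(g) else g for g in grads]
    return new_losses, new_grads

losses = [1.25, float("inf"), 0.5]
grads = [0.1, float("nan"), -0.2]
print(zero_infinite(losses, grads, zero_infinity=True))
```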

This topic seems to come up regularly, e.g. in  #14335
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16199

Differential Revision: D14020462

Pulled By: ezyang

fbshipit-source-id: 5ba8936c66ec6e61530aaf01175dc49f389ae428

5 years agoMerge binaries "convert_image_to_tensor" and "caffe2_benchmark" (#16875)
Zhizhen Qin [Mon, 11 Feb 2019 20:24:10 +0000 (12:24 -0800)]
Merge binaries "convert_image_to_tensor" and "caffe2_benchmark" (#16875)

Summary:
Merge binaries "convert_image_to_tensor" and "caffe2_benchmark" to remove the overhead of writing to/reading from Tensor file.

*TODO next: TensorProtos is another overhead. No need for de-serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16875

Reviewed By: sf-wind

Differential Revision: D13997726

Pulled By: ZhizhenQin

fbshipit-source-id: 4dec17f0ebb59cf1438b9aba5421db2b41c47a9f

5 years agoFix missing CircleCI GPG key (#16961)
SsnL [Mon, 11 Feb 2019 19:59:17 +0000 (11:59 -0800)]
Fix missing CircleCI GPG key (#16961)

Summary:
I'm seeing a bunch of apt gpg key errors on CI with the following message:
```
An error occurred during the signature verification. The repository is not
updated and the previous index files will be used. GPG error:
https://packagecloud.io trusty InRelease: The following signatures couldn't
be verified because the public key is not available:
NO_PUBKEY 4E6910DFCB68C9CD
```

Most of the time apt will reuse the old cached version, but sometimes this results in a build failure: https://circleci.com/gh/pytorch/pytorch/758366?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link.

This should hopefully fix it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16961

Differential Revision: D14028151

Pulled By: ezyang

fbshipit-source-id: 7648a0a58ece38d8d04916937a9fa17f34f8833e