review.tizen.org Git - platform/upstream/pytorch.git/log

Undefined behavior with memset of std::string to 0 (#18703)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18703

`zeroPtr` is sometimes a `std::string` tensor, so `memset` to 0 is undefined behavior.

This might be accidentally safe with `std::string` implementations that use SSO (Small String Optimization), but will crash otherwise.

Reviewed By: zheng-xq

Differential Revision: D14714458

fbshipit-source-id: 012a18464e6514d38ff791509b88ddc3fc55b2b1

Revert D14717015: [pytorch][PR] fix nccl compilation to make sure it compiles for architectures that pytorch compiles for

Differential Revision:
D14717015

Original commit changeset: 4aac036f57e5

fbshipit-source-id: c820b8dfb27564271e6b80e133fe655658a7c25c

update of fbcode/onnx to f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5 (#18695)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18695

Previous import was fb1a80692c1ab0bd27b1072f2e7bffacba336777

Included changes:
- **[f0d7df2c](https://github.com/onnx/onnx/commit/f0d7df2c)**: fix testcase names of maxpool_2d_ceil and averagepool_2d_ceil (#1896) <karljang>

Reviewed By: zrphercule

Differential Revision: D14709993

fbshipit-source-id: 7fe2145a481ea2c1b6d85ba1c85c662200a53241

Adding pin_memory kwarg to zeros, ones, empty, ... tensor constructors. (#18455)

Summary:
Make it possible to construct a pinned memory tensor without creating a storage first and without calling the `pin_memory()` function. It is also faster, as no copy operation is needed.

Supported functions:
```python
torch.rand_like(t, pin_memory=True)
torch.randn_like(t, pin_memory=True)
torch.empty_like(t, pin_memory=True)
torch.full_like(t, 4, pin_memory=True)
torch.zeros_like(t, pin_memory=True)
torch.ones_like(t, pin_memory=True)
torch.tensor([10,11], pin_memory=True)
torch.randn(3, 5, pin_memory=True)
torch.rand(3, pin_memory=True)
torch.zeros(3, pin_memory=True)
torch.randperm(3, pin_memory=True)
torch.empty(6, pin_memory=True)
torch.ones(6, pin_memory=True)
torch.eye(6, pin_memory=True)
torch.arange(3, 5, pin_memory=True)
```

Part of the bigger `Remove Storage` plan.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18455

Reviewed By: ezyang

Differential Revision: D14672084

Pulled By: VitalyFedyunin

fbshipit-source-id: 9d0997ec00f59500ee018f8b851934d334012124

Improve Backend comment. (#18567)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18567
ghimport-source-id: 1e50e611a3afcfae86828b7afe06c3fdc6a7bef7

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18567 Improve Backend comment.**

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Reviewed By: dzhulgakov

Differential Revision: D14666189

fbshipit-source-id: 64a41c4a998b1a59ff780d1ae06fa16e5ef3c7c4

Expose alias multinomial methods to ATen (#17904)

Summary:
This PR exposes the multinomialAliasSetup and multinomialAliasDraw methods.

cc: neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17904

Differential Revision: D14700205

Pulled By: ezyang

fbshipit-source-id: 16462fb1f1ef1d560fd586632ea356b23e966ee3

Update cpp_extension.py (#18638)

Summary:
Hi. It seems that when building C++ extensions with CUDA on Windows, the `extra_cuda_cflags` options are not properly forwarded to `nvcc`.

Extra CUDA options are necessary to build, for instance, InplaceABN (https://github.com/mapillary/inplace_abn), which requires the `--expt-extended-lambda` option.

This PR adds one line that correctly appends `extra_cuda_cflags`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18638

Differential Revision: D14704270

Pulled By: ezyang

fbshipit-source-id: e1e330d193d9afd5707a5437a74c0499460d2b90

fix typo

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18653

Differential Revision: D14713920

Pulled By: ezyang

fbshipit-source-id: 170295a162dd23916c1dcc9330918d33277cc9ed

Kill LegacyBridge functions that don't do multiple dispatch. (#18696)

Summary:
At some point, we needed these functions to deal with autograd dispatching to the sparse or TH version of a backwards. But we rewrote all backwards definitions in terms of native functions, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18696

Differential Revision: D14710834

Pulled By: gchanan

fbshipit-source-id: b22568c58eefc79d672555bd8832398ccd965cb7

Updating submodules

Reviewed By: zpao

fbshipit-source-id: da3cd711bb81b07c6c284426ffc5e10a969b0d2b

add Int8FCRelu (#18673)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18673

Add a fused FC + Relu

Reviewed By: csummersea

Differential Revision: D14667055

fbshipit-source-id: d88fefba008fc0ca450291532d2b320694c6b785

Fix uninitialized value in pickler (#18678)

Summary:
Fixes #18671
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18678

Differential Revision: D14708969

Pulled By: driazati

fbshipit-source-id: d372c6e3a2a3d3fc48d8afc1fa6807f2ce0e5c6e

fixes multiprocessing serialization for integer nn.Parameter (#18639)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/17345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18639

Differential Revision: D14711565

Pulled By: soumith

fbshipit-source-id: 0063ed138a215b95d6571dcd68b18569714abe19

fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18704)

Summary:
cc: t-vi gchanan zou3519

This fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18704

Differential Revision: D14717015

Pulled By: soumith

fbshipit-source-id: 4aac036f57e564b05d759662e8ad7a80170901c0

More type stubs (#18511)

Summary:
Added stubs for:

* The `device` module
* The `cuda` module
* Parts of the `optim` module
* Began adding stubs for the `autograd` module. I'll annotate more later but `no_grad` and friends are probably the most used exports from it so it seemed like a good place to start.

This would close #16996, although comments on that issue reference other missing stubs so maybe it's worth keeping open as an umbrella issue.

The big remaining missing package is `nn`.

Also added a `py.typed` file so mypy will pick up on the type stubs. That closes #17639.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18511

Differential Revision: D14715053

Pulled By: ezyang

fbshipit-source-id: 9e4882ac997063650e6ce47604b3eaf1232c61c9

NCCL build fix WITH_DISTRIBUTED=1.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18691

Reviewed By: ezyang

Differential Revision: D14706205

Pulled By: gchanan

fbshipit-source-id: 802f19bfd7df3703c0dbce03036e2f2e32eb3efb

caffe2 - set up correct inheritance structure for remaining operator test classes (#18622)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18622

Set up correct inheritance structure for remaining operator test classes

Reviewed By: ezyang

Differential Revision: D14685941

fbshipit-source-id: a6b1b3be325935b7fec7515be13a4994b3016bf0

Peephole Optimize Shape Ops (#18549)

Summary:
Peephole optimize ops that just require Dimensioned Tensor Type, which is what we specialize graphs on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18549

Differential Revision: D14690827

Pulled By: eellison

fbshipit-source-id: 9d7439eb584f0a5b877f5aa53cf80150f00e7e5f

Deprecated lambda based API (#18542)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18542

This adds the deprecated API for defining kernels as lambdas. The new API for defining kernels as lambdas was introduced in D14653005.

Reviewed By: dzhulgakov

Differential Revision: D14653551

fbshipit-source-id: 99900f1436716c69e52c83b68333b642ec2c8558

deprecated function based API (#18444)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18444

This adds the deprecated function based API to c10::RegisterOperators().
This is the API currently exposed under jit::RegisterOperators() and we need to support it for backwards compatibility.

Reviewed By: dzhulgakov

Differential Revision: D14514218

fbshipit-source-id: c77676851cfd431d66f18fd8038cf153a3a7d7cc

Revert "Tensor construction codemod(raw_mutable_data) (#16373)" (#18680)

Summary:
This reverts commit d73c830e236f5b980e5c91914b818d150b60278c.

We have observed a significant perf drop when training ResNext101 with multiple AMD GPUs:

Before:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1636/console
2 GPUs ResNext training got 150~160 imgs/sec
4 GPUs ResNext training got 270~280 imgs/sec

After:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang7-rocmdeb-ubuntu16.04-bench/1637/console
Both 2 and 4 GPUs ResNext training drop to 110~120 imgs/sec

Similar perf drops are seen on ResNet50 training jobs as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18680

Differential Revision: D14702941

Pulled By: bddppq

fbshipit-source-id: 828141805afc23f25c08d4a2eb6d4b99f817c128

C++ handler for gradient reduction (#18251)

Summary:
This commit adds the `c10d::Reducer` class that hooks into autograd
and performs gradient bucketing and reduction. These are the core
parts of `nn.parallel.DistributedDataParallel` that up to now were
only usable for CUDA models.

This should enable the following:

* Distributed data parallelism for models defined using the C++ frontend.
* Allow overlap of gradient computation and reduction for non-CUDA models.
* Enable distributed data parallelism for models with some unused parameters.

This does not include any logic for computing bucket assignment, which
can be done separately; either by observing autograd execution order
(this is what Apex does), or by assigning buckets based on some
maximum byte size, or both.
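
To make the byte-size strategy concrete, here is a minimal Python sketch of greedy bucket assignment. It is not part of this commit (which deliberately leaves bucket assignment out); `assign_buckets` and the example sizes are hypothetical and only illustrate the idea.

```python
from typing import List


def assign_buckets(param_nbytes: List[int], max_bucket_bytes: int) -> List[List[int]]:
    """Greedily group parameter indices into buckets capped at max_bucket_bytes.

    Hypothetical helper for illustration; a parameter larger than the cap
    ends up in a bucket of its own.
    """
    buckets: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for index, nbytes in enumerate(param_nbytes):
        # Close the current bucket if adding this parameter would exceed the cap.
        if current and current_bytes + nbytes > max_bucket_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(index)
        current_bytes += nbytes
    if current:
        buckets.append(current)
    return buckets


sizes = [400, 400, 1200, 100, 100]  # made-up parameter sizes in bytes
buckets = assign_buckets(sizes, max_bucket_bytes=1000)
```

An order-based strategy would presumably feed the same helper a list sorted by observed autograd execution order instead of declaration order.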

Also see #17757 and #13273.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18251

Reviewed By: mrshenli

Differential Revision: D14571899

Pulled By: pietern

fbshipit-source-id: 20f95eefd288dfe8cfffe0a28ca22fa7c9c3cd4c

Updating submodules

Reviewed By: zpao

fbshipit-source-id: 735fc388bff7066e8f46526266a73bf35e121442

add ConvRelu schema (#18693)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18693

As title

Reviewed By: protonu

Differential Revision: D14662880

fbshipit-source-id: 3664faa660a04e1f528a413d2a1700b872c3c684

offload scripts from win-test.sh

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18601

Differential Revision: D14711856

Pulled By: kostmo

fbshipit-source-id: 75fe620541fe2903f69a53dbd1b6d51a0d718113

Some fixes for the build script on Windows (#18681)

Summary:
Fixes https://discuss.pytorch.org/t/pytorch-build-from-source-on-windows/40288/13?u=peterjc123.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18681

Differential Revision: D14711039

Pulled By: soumith

fbshipit-source-id: f7e1a94b163064c055670b2925cd4502e7773599

Fix for double backwards tests (#18190)

Summary:
If none of the outputs require_grad, we don't actually check gradgrad; instead, we check that their numerical gradients are 0.
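
A rough sketch of what "numerical gradients are 0" means, in plain Python rather than the actual test code. `numerical_gradient` and `output_ignores_inputs` are hypothetical names; the real check lives in the autograd test utilities and works on tensors.

```python
from typing import Callable, List


def numerical_gradient(fn: Callable[[List[float]], float],
                       inputs: List[float],
                       eps: float = 1e-6) -> List[float]:
    """Central finite differences of fn with respect to each input."""
    grads = []
    for i in range(len(inputs)):
        plus = list(inputs)
        minus = list(inputs)
        plus[i] += eps
        minus[i] -= eps
        grads.append((fn(plus) - fn(minus)) / (2 * eps))
    return grads


def output_ignores_inputs(xs: List[float]) -> float:
    # Stand-in for an output that does not require grad: it never reads xs.
    return 3.0


grads = numerical_gradient(output_ignores_inputs, [0.5, -1.25])
all_zero = all(abs(g) < 1e-12 for g in grads)
```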
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18190

Differential Revision: D14563388

Pulled By: ifedan

fbshipit-source-id: a4eb94c9eb60f14dbe6986cd8cef1fe78a7bc839

Add string index/slice operations (#18247)

Summary:
Adds support for string indexing (`"a"[0]`) and slicing (`"abc"[1:3]`)
to script.
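
For reference, the two operations in plain Python. This sketch assumes script follows the usual Python semantics here; the functions are hypothetical and are not compiled with `torch.jit.script` in this snippet.

```python
def first_char(s: str) -> str:
    # Indexing a string yields a one-character string.
    return s[0]


def middle(s: str) -> str:
    # Slicing uses a half-open range: positions 1 and 2 here.
    return s[1:3]


a = first_char("a")
bc = middle("abc")
```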
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18247

Differential Revision: D14574486

Pulled By: driazati

fbshipit-source-id: 4b42aa0881e5398ea7f112be46c0335e6e19dced

Re-land Parsing file check (#18570)

Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Reviewed By: driazati

Differential Revision: D14707285

Pulled By: eellison

fbshipit-source-id: 3a0265928aa8cad78961723d8bf0fbf871fdb71d

Create Node2Vec ModuleKeeper

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18504

Reviewed By: sunnieshang

Differential Revision: D14632091

fbshipit-source-id: d4544866552dc6bcbc7515be9e88cb11e7622a44

use acc16 only when n>128 and k>128 in Skylake (#18672)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18672

In Skylake, when n < 128 or k < 128, acc16 is slower.

Reviewed By: jianyuh

Differential Revision: D14700576

fbshipit-source-id: 80ca9f1af4626637eed9c5ca49f95ae744811189

Move ideep singleton registration to ATen from C2. (#18335)

Summary:
Since we are going to add ideep to ATen, and ATen is always compiled, it makes sense to have the registration in ATen rather than C2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18335

Reviewed By: bddppq

Differential Revision: D14578652

Pulled By: gchanan

fbshipit-source-id: 4d77fcfc21a362b21d5291a127498aa722548873

Create torch/lib directory before copying _C.lib on Windows environment. (#18666)

Summary:
`python setup.py develop` fails with the following messages.
~~~
...
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using MIOpen
-- Not using CUDA
-- Using MKLDNN
-- Not using NCCL
-- Building without distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd to C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd
copying torch\Lib\site-packages\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> C:\data\source\pytorch\build\lib.win-amd64-3.7\caffe2\python
building 'torch._C' extension
creating build\temp.win-amd64-3.7
creating build\temp.win-amd64-3.7\Release
creating build\temp.win-amd64-3.7\Release\torch
creating build\temp.win-amd64-3.7\Release\torch\csrc
...
creating C:\data\source\pytorch\build\lib.win-amd64-3.7\torch
C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\bin\HostX64\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\data\source\pytorch\torch\lib /LIBPATH:C:\data\dlenv\libs /LIBPATH:C:\data\dlenv\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.16.27023\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.17763.0\um\x64" shm.lib torch_python.lib /EXPORT:PyInit__C build\temp.win-amd64-3.7\Release\torch/csrc/stub.obj /OUT:build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib /NODEFAULTLIB:LIBCMT.LIB
Creating library build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.lib and object build\temp.win-amd64-3.7\Release\torch/csrc\_C.cp37-win_amd64.exp
Generating code
Finished generating code
copying build\lib.win-amd64-3.7\torch\_C.cp37-win_amd64.pyd -> torch
copying build\lib.win-amd64-3.7\caffe2\python\caffe2_pybind11_state.cp37-win_amd64.pyd -> caffe2\python
copying build/temp.win-amd64-3.7/Release/torch/csrc/_C.cp37-win_amd64.lib -> build/lib.win-amd64-3.7/torch/lib/_C.lib
error: could not create 'build/lib.win-amd64-3.7/torch/lib/_C.lib': No such file or directory
~~~

When `python setup.py install` is executed, `torch/lib` has already been created by a previous step (copying many files), so this copy succeeds. But in develop mode, that step is not executed and this copy fails.

This patch creates the `torch/lib` directory if it does not exist.
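
The pattern is roughly the following. This is a standalone sketch using temporary paths, not the actual `setup.py` change; `copy_into_dir` is a hypothetical helper.

```python
import os
import shutil
import tempfile


def copy_into_dir(src: str, dst: str) -> None:
    """Copy src to dst, creating dst's parent directory first if needed."""
    dst_dir = os.path.dirname(dst)
    if dst_dir and not os.path.exists(dst_dir):
        os.makedirs(dst_dir)
    shutil.copyfile(src, dst)


with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "_C.lib")
    with open(src, "wb") as f:
        f.write(b"stub")
    # The destination directory does not exist yet, as in develop mode.
    dst = os.path.join(root, "torch", "lib", "_C.lib")
    copy_into_dir(src, dst)
    copied = os.path.exists(dst)
```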
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18666

Differential Revision: D14704269

Pulled By: ezyang

fbshipit-source-id: b2d7c698a906b945bf34bb78f17b91b4fdfd3294

Move flags that do not work on MSVC (#18686)

Summary:
MSVC errors on these flags as they are not supported
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18686

Differential Revision: D14704254

Pulled By: ezyang

fbshipit-source-id: 936d33ed6b7474d7774a49505cdac50dbe8dd99a

Fix unused lambda capture warnings (#18662)

Summary:
```
aten/src/ATen/native/cpu/DistanceOpsKernel.cpp.DEFAULT.cpp:109:104: warning: lambda capture 'combs' is not used [-Wunused-lambda-capture]
parallel_for(0, combs, internal::GRAIN_SIZE / (16 * m), [p, self_start, self_end, n, m, res_start, combs](int64_t k, int64_t end) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18662

Differential Revision: D14699379

Pulled By: bddppq

fbshipit-source-id: 5062d4327bb5f7b485c2ffa30c98e10576416f03

handle a rare case of histogram min is inf/nan (#18239)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18239

When min is inf or nan, we get UBSAN errors

Reviewed By: csummersea

Differential Revision: D14537668

fbshipit-source-id: e70ffb5ecd2b10793356070c69fdabf8f25b203e

Delete duplicated technical content from contribution_guide.rst (#18628)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18628
ghimport-source-id: d94b81a6f303883d97beaae25344fd591e13ce52

Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18629 Provide flake8 install instructions.
* **#18628 Delete duplicated technical content from contribution_guide.rst**

There's a useful guide in contribution_guide.rst, but the
technical bits were straight up copy-pasted from CONTRIBUTING.md,
and I don't think it makes sense to break the CONTRIBUTING.md
link. Instead, I deleted the duplicate bits and added a cross
reference to the rst document.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14701003

fbshipit-source-id: 3bbb102fae225cbda27628a59138bba769bfa288

Provide flake8 install instructions. (#18629)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18629
ghimport-source-id: 66a8871c56ffcfa7d4bfdf601e180fae99194e28

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18629 Provide flake8 install instructions.**
* #18628 Delete duplicated technical content from contribution_guide.rst

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14701004

fbshipit-source-id: b64292f0ef01b7894cf6b9ff8d5fd9e921c8d162

Adding quantized tensor shape/type info support for caffe2=>glow in caffe2 side (#18621)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18621

This diff added caffe2 support for onnxifi quantization.

Reviewed By: yinghai

Differential Revision: D14648767

fbshipit-source-id: 4ddb492cacbba6142305866e6dbb875880acaea3

Fix test on windows (#18667)

Summary:
Breakage in #18188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18667

Differential Revision: D14700133

Pulled By: driazati

fbshipit-source-id: 4cc26bd579fc1b074b3bef6046cc1030facee130

Enforce check ad in test_jit (#18509)

Summary:
If a test triggers autodiff, it must have a `DifferentiableGraph` in its differentiated forward graph, and this subgraph must have either the original aten node, or the corresponding nodes used in AD formula.

Typically a forward differentiable graph looks like this:
```
graph(%i0 : Float(),
      %i1 : Float()):
  %3 : Float() = prim::DifferentiableGraph_0(%i0, %i1)
  return (%3)
with prim::DifferentiableGraph_0 = graph(%0 : Float(),
      %1 : Float()):
  %2 : Float() = aten::max(%0, %1)
  return (%2)
```
which tells us `aten::max(Tensor self, Tensor other) -> Tensor` is symbolically differentiable.

Update: there are a lot of cases (fusions/ConstantChunk/python implementations) that break it, so I decided to make the check optionally take node names if they differ from the function name.
~~[OLD]Theoretically I could also check if `aten::max` is in the differentiable block or not to be more precise, but there're also cases like `chunk` where in a differentiable block it's replaced with a prim node (ConstantChunk) and we will have to special case them. Any suggestions here (to be more precise or no) is very welcome!~~

We used to have a list of nn tests that should be run against AD; I moved it to a field set when constructing our test (either torch or nn). I think it's cleaner this way, and it matches the fact that for the same op we may support one schema but not all of them; this way we can just turn on the corresponding test that triggers the supported schema.

cc: apaszke zdevito wanchaol ngimel for a review

[Done] :
- Going through a manual second pass of all tests to check if they should enable AD test or not....
- Add a readme about how to add AD for an op and how to add/enable its test in test_jit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18509

Differential Revision: D14696811

Pulled By: ailzhang

fbshipit-source-id: c5e693277baac585cd3aed5ab2c0e7faa5e6f29f

Use proper isnan check

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18663

Differential Revision: D14699385

Pulled By: bddppq

fbshipit-source-id: 596ad3371e7704802591e49f7e1c55dc6cd2896f

pad_circular -> _pad_circular (#18608)

Summary:
pad_circular is really private, as circular padding is exposed via `F.pad`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18608

Differential Revision: D14691704

Pulled By: soumith

fbshipit-source-id: 8c2f90596feed670976115041efed3ca071e8306

Fix wrap(at::Scalar) (#18632)

Summary:
Problem:
```cpp
// This function expects a `Variable` as input
inline PyObject* wrap(at::Tensor tensor) {
  return THPVariable_Wrap(Variable(std::move(tensor)));
}

inline PyObject* wrap(at::Scalar scalar) {
  // This function calls `wrap(at::Tensor tensor)` (the function above), but since
  // `scalar_to_tensor(...)` returns a `Tensor` and not a `Variable`, the call to
  // `wrap(at::Tensor tensor)` will fail with "Tensor that was converted to Variable
  // was not actually a Variable", which is not what we want.
  return wrap(scalar_to_tensor(scalar));
}
```

The right fix is to call `make_variable(...)` with the tensor returned from `scalar_to_tensor(scalar)`.

This unblocks https://github.com/pytorch/pytorch/pull/18230 as it is the only patch that hits this code path now. All other native functions that return Scalar (such as `item()` or `_local_scalar_dense()`) either have a custom-defined implementation that doesn't go through this path, or are not exposed to Python at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18632

Differential Revision: D14689293

Pulled By: yf225

fbshipit-source-id: be7ba5d3de83a69533a2997de97ad92989ff78ee

Deprecated type() -> scalar_type()

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18642

Differential Revision: D14696848

Pulled By: ezyang

fbshipit-source-id: 43d1f86ecee5f6c6c5b70fd7d0e2063c3fc473ab

Turn on F401: Unused import warning. (#18598)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18598
ghimport-source-id: c74597e5e7437e94a43c163cee0639b20d0d0c6a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18598 Turn on F401: Unused import warning.**

This was requested by someone at Facebook; this lint is turned
on for Facebook by default.  "Sure, why not."

I had to noqa a number of imports in __init__.  Hypothetically
we're supposed to use __all__ in this case, but I was too lazy
to fix it.  Left for future work.

Be careful!  flake8-2 and flake8-3 behave differently with
respect to import resolution for # type: comments.  flake8-3 will
report an import unused; flake8-2 will not.  For now, I just
noqa'd all these sites.

All the changes were done by hand.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D14687478

fbshipit-source-id: 30d532381e914091aadfa0d2a5a89404819663e3

Update documentation for CTCLoss (#18415)

Summary:
This is meant to resolve #18249, where I pointed out a few things that could improve the CTCLoss docs.

My main goal was to clarify:
- Target sequences are sequences of class indices, excluding the blank index
- Lengths of `target` and `input` are needed for masking unequal length sequences, and are not necessarily equal to S, which is the length of the longest sequence in the batch.
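
To illustrate the two points, here is a small pure-Python sketch of batching targets. It doesn't call `CTCLoss`; `pad_targets`, the blank index of 0, and the padding value are assumptions for the example.

```python
from typing import List, Tuple

BLANK = 0  # assumed blank index; target sequences never contain it


def pad_targets(targets: List[List[int]],
                pad_value: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Pad target sequences to the longest length S and record true lengths."""
    if any(BLANK in seq for seq in targets):
        raise ValueError("targets must not contain the blank index")
    lengths = [len(seq) for seq in targets]
    longest = max(lengths)  # this is S for the batch
    # The padding value is arbitrary because the lengths mask it out.
    padded = [seq + [pad_value] * (longest - len(seq)) for seq in targets]
    return padded, lengths


padded, target_lengths = pad_targets([[3, 1, 2], [4], [2, 2]])
```

Here S is 3, but only the first sequence actually has that length, which is why the per-sequence lengths have to be passed along with the padded batch.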

I thought about Thomas's suggestion to link the distill.pub article, but I'm not sure about it. I think that should be up to y'all to decide.

I have no experience with .rst, so it might not render as expected :)

t-vi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18415

Differential Revision: D14691969

Pulled By: soumith

fbshipit-source-id: 381a2d52307174661c58053ae9dfae6e40cbfd46

Fallback kernels (#18443)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18443

Allow registering a kernel without a dispatch key. In this case, the kernel becomes a fallback kernel that is called whenever no other kernel matches.
This is also useful for the legacy function based API (since that API doesn't know about dispatch keys) or any other custom ops that don't care about dispatch
and just want one kernel to be called no matter the dispatch key.
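
A toy Python model of the lookup described above, to show the intended behavior. This is not the c10 registration API; `KernelTable` and the key strings are hypothetical.

```python
from typing import Callable, Dict, Optional


class KernelTable:
    """Toy dispatch table for one operator (hypothetical, not the c10 API)."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Callable[[int], str]] = {}
        self._fallback: Optional[Callable[[int], str]] = None

    def register(self, kernel: Callable[[int], str],
                 dispatch_key: Optional[str] = None) -> None:
        # No dispatch key means the kernel is registered as the fallback.
        if dispatch_key is None:
            self._fallback = kernel
        else:
            self._kernels[dispatch_key] = kernel

    def call(self, dispatch_key: str, value: int) -> str:
        # Prefer an exact match; otherwise use the fallback if one exists.
        kernel = self._kernels.get(dispatch_key, self._fallback)
        if kernel is None:
            raise LookupError(f"no kernel for dispatch key {dispatch_key!r}")
        return kernel(value)


table = KernelTable()
table.register(lambda x: f"cpu:{x}", dispatch_key="CPU")
table.register(lambda x: f"fallback:{x}")

cpu_result = table.call("CPU", 1)
other_result = table.call("CUDA", 2)
```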

Reviewed By: dzhulgakov

Differential Revision: D14603258

fbshipit-source-id: 242dc8871dad2989ca25079854d0cc97429e7199

Introduce lambda-based kernel API (#18541)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18541

Allow registering lambdas as c10 kernels.

Reviewed By: dzhulgakov

Differential Revision: D14653005

fbshipit-source-id: f867cc776b1339e83b7a2e1935f5cf924cfba44a

Report better errors when kernel or dispatch key are missing (#18302)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18302

These might be use cases we want to support in the future, but they don't work yet.
Let's at least report an error instead of doing segfaults or worse.

Reviewed By: dzhulgakov

Differential Revision: D14572346

fbshipit-source-id: 49262ce131493bc887defe2978d8b22f202cd8cc

Move stuff to cpp files (#18301)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18301

Move code out of headers and templates into source files and non-templates.

Reviewed By: dzhulgakov

Differential Revision: D14572347

fbshipit-source-id: 9fd5d62d54000a95e93076cd73f591ba2c5c2653

Check kernel against function schema in c10 op registration (#18256)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18256

This diff infers the function schema from the kernel function/functor and checks that it matches the specified function schema.

This diff does not yet allow omitting the function schema in the registration API. That will come in a future diff.
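
As a loose analogy in Python (the real check is done in C++ against the kernel's signature), inferring a schema from type annotations and comparing it with a declared one might look like this. `infer_schema`, `check_schema`, and the tuple-based `Schema` are hypothetical.

```python
import inspect
from typing import Callable, List, Tuple

Schema = Tuple[List[type], type]  # (argument types, return type)


def infer_schema(kernel: Callable) -> Schema:
    """Derive a toy schema from a kernel's type annotations."""
    sig = inspect.signature(kernel)
    arg_types = [p.annotation for p in sig.parameters.values()]
    return arg_types, sig.return_annotation


def check_schema(kernel: Callable, declared: Schema) -> None:
    """Raise if the inferred schema differs from the declared one."""
    inferred = infer_schema(kernel)
    if inferred != declared:
        raise TypeError(f"kernel schema {inferred} does not match declared {declared}")


def add_kernel(a: int, b: int) -> int:
    return a + b


check_schema(add_kernel, ([int, int], int))  # matches, so no error is raised
```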

Reviewed By: dzhulgakov

Differential Revision: D14552738

fbshipit-source-id: 00202b489ede19f26ae686c97416b38c72c11532

Add functor- and function-based kernel registration API (#18162)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18162

- Adds the API to register a functor- and function-based kernel.
- Changes the experimental c10 ops to use this new API instead of the old one
- Deletes the old APIs in KernelRegistration.h and OpSchemaRegistration.h

Reviewed By: dzhulgakov

Differential Revision: D14514239

fbshipit-source-id: 35b2f6e8f62964e54886450a6a5fac812ed20f26

New operator registration MVP (#18161)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18161

This introduces version 0 for the new operator registration.

For now, it only works with kernels that are defined as stack-based functions.
This is actually not the intended public API for defining kernels, but it's the basis which is going to be used to define the public APIs (see diffs on top for them),
and it's also the API used for exposing caffe2 operators.

This diff also switches the mechanism for exposing caffe2 operators to the new mechanism.

Reviewed By: dzhulgakov

Differential Revision: D14514231

fbshipit-source-id: 454ab7b5b46a10203aa27b175400d23f818dd1df

Fix trt installation in CI (#18609)

Summary:
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build is failing
```
...
Mar 29 04:44:46 Need to get 174 MB of archives.
Mar 29 04:44:46 After this operation, 576 MB of additional disk space will be used.
Mar 29 04:44:46 Do you want to continue? [Y/n] Abort.
Exited with code 1
...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18609

Differential Revision: D14694990

Pulled By: bddppq

fbshipit-source-id: 260446a8650f660a2baf123a3f17efdf0a8d6c64

Attribute serialization improvements (#18188)

Summary:
* adds attributes to `ScriptModule.__getattr__` so they can be accessed in Python after re-importing
* full support for all the possible values for an `int64_t`
* this necessitated a bunch more `pushWhatever` functions, so re-introduced a templated version to cut down on duplicate code
* tests to validate references / value sharing works
* adds `torch.jit.Unpickler` which people can use to de-serialize the pickle files into Python / have a quick reference on how to do this without PyTorch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18188

Differential Revision: D14527490

Pulled By: driazati

fbshipit-source-id: efd15579cc04aa2e28c4b2c9490d82d849dee559

support pre-convert filter format for mkldnn training mode and change 'OptimizeForIdeep' to 'OptimizeForMkldnn' (#15171)

Summary:
For MKL-DNN, the filter data is reordered to the primitive format, which takes a lot of time.
So this patch provides a method to convert the filter format before training.
"OptimizeForIdeep" is also renamed to "OptimizeForMkldnn" in this patch.
This patch depends on https://github.com/pytorch/pytorch/pull/12866
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15171

Differential Revision: D14590741

Pulled By: yinghai

fbshipit-source-id: 07971c9977edac3c8eec08ca2c39cda639683492

Tensor construction codemod(raw_mutable_data) (#16373)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16373

motivation: https://github.com/pytorch/pytorch/pull/12407
This is a manual diff.
most of the fixes should be:

```
auto* Y = Output(0);
Y->Resize(dims);
Y->raw_mutable_data(dtype);
```
-->
```
auto* Y = Output(0, dims, at::dtype(dtype));
```
But there might be other cases.

Reviewed By: dzhulgakov

Differential Revision: D13725460

fbshipit-source-id: 649a4b0e42f62cda1a60171dd9fa3e440dc9dca1

Add hash() global (#18258)

Summary:
This adds `hash()`, which supports `int`, `str`, and `float`. It relies on `std::hash`, which is implementation-defined, so the result of `hash()` in TorchScript is not the same as in Python, but should satisfy the same properties.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18258

Differential Revision: D14692317

Pulled By: driazati

fbshipit-source-id: 909df5d024bb3feea157d5a203b7de53c72261c9

Move fuser to test_jit_fuser (#18590)

Summary:
Start of breaking up test_jit.py

New files will have the format test_jit_* so they are easily grepable but remain in the same directory so we don't have to go through multiple sources for imports.

I am adding a test that's expected to fail to be sure it's running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18590

Reviewed By: wanchaol

Differential Revision: D14677094

Pulled By: eellison

fbshipit-source-id: 9782c6aa9525bb6f332fc75cfff004c83a417522

Experimental logging/counters API (#18235)

Summary:
This defines a generic counters API that users can utilize to provide monitoring functionality in e.g. a production service. We expose both counters for runtime internals as well as a TorchScript API to create user-defined counters. Synopsis of the API:

- `torch/csrc/jit/script/logging.h` specifies the externally-facing API in C++
- `torch/jit/_logging.py` specifies the Python API

We use an interface, `LoggerBase`, to define the interactions between users and a logging backend. Implementing a subclass of `LoggerBase` allows the user to handle these events in a custom way, such as logging into a DB or calling into an infra-specific counters API.

From the frontend perspective, we can create log events in two ways:
1. We provide an `add_stat_value(name, val)` function. This calls into the Logger backend with a key/value pair. For example, we might call `add_stat_value('foo', 1)` to bump an event counter.
2. We provide a `time_point()` function to record a timestamp in nanoseconds. This can be used in conjunction with `add_stat_value` to record runtime wall clock durations.

Examples of frontend usage can be found in `test_jit.py TestLogging`.

We provide a trivial `LockingLogger` implementation as an example and for testing purposes. It is likely not ready for production usage. It demonstrates that a backend implementing the API can do things like specify aggregation types and report these aggregate stats via the `get_counters()` API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18235

Differential Revision: D14545060

Pulled By: jamesr66a

fbshipit-source-id: 04099543a1898cfdd411511e46e03d5dce9b4881

Revert D14668859: [pytorch][PR] Re-land Parsing file check

Differential Revision:
D14668859

Original commit changeset: 3825a35ddc61

fbshipit-source-id: f3343ec6b63fe8f1f04959adfac4331865990047

Update argument names of torch::autograd::FunctionPostHook (#18140)

Summary:
They are called as (outputs, inputs) and were named (inputs, outputs).

A possible follow-up fix is to make the outputs argument an lvalue, so that multiple post hooks can be called without ever copying the outputs vector. The copy currently appears to be forced because the hook takes a const reference as input and returns a value. This would change the prototype of the function, so it needs further discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18140

Differential Revision: D14684498

Pulled By: pietern

fbshipit-source-id: 1bd3ddbdd1ff7fe0a18241de5a9ec745a4e7ef07

note on updating existing source (#18409)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/18388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18409

Differential Revision: D14597666

Pulled By: soumith

fbshipit-source-id: 156104c0cd19da06f6f96a225228d1e8cf831af1

Re-land Parsing file check (#18570)

Summary:
The last time I tried to land it there was a merge race with the docs coverage test lol. Re-landing with the fix.

Re-land of https://github.com/pytorch/pytorch/pull/18304
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18570

Differential Revision: D14668859

Pulled By: eellison

fbshipit-source-id: 3825a35ddc6179a0d433d70d22b5c1a96c20b21a

Refactoring serialization of ONNX initializers to be name-based (Resubmission) (#17830)

Summary:
houseroad - this is the resubmission of https://github.com/pytorch/pytorch/pull/17420, as suggested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17830

Reviewed By: zrphercule

Differential Revision: D14398714

Pulled By: houseroad

fbshipit-source-id: bda475f1ae8a5273ebdb0f6883fc66036c29d326

Initial implementation of InsertObserverNodes pass. (#18152)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18152
ghimport-source-id: 1dd5e62c4d93394dcd8d8af2871554575c8d3d1a

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18152 Initial implementation of InsertObserverNodes pass.**
* #18151 Add quant-passes stubs.

gh-metadata: pytorch pytorch 18150 gh/zolotukhinm@gmail.com/2/head

Differential Revision: D14584223

fbshipit-source-id: 30896acc1a8901d22c6a167eb87d2fbaafbbeb6f

Fix bug in tensor feed which caused crash due to wrong tensor type (#18552)

Summary:
In the blob feeder for the ideep device, the wrong device option was given, which led to a crash.
This patch corrects the device option to fix the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18552

Differential Revision: D14679838

Pulled By: yinghai

fbshipit-source-id: bde11e6a6fe44822166881dcb7c9bd0b34b4ecf3

Upgrade mkldnn-bridge to revert tensor capacity patch and prepare for DNNLOWP support (#18471)

Summary:
1. Upgrade mkldnn-bridge to revert tensor capacity patch to avoid ASAN issue.
2. Prepare for DNNLOWP support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18471

Differential Revision: D14621569

Pulled By: yinghai

fbshipit-source-id: 9df300b77d0f2acd1a4f63c2925b7a7cab7a474e

register BoxWithNMSLimit with C10

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17956

Reviewed By: houseroad

Differential Revision: D14417300

fbshipit-source-id: eb5e2ba84513b3b7bfa509dc442424b13fe9148f

Fix c10d build without nccl.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18582

Differential Revision: D14672928

Pulled By: gchanan

fbshipit-source-id: 74e9805cbaf5ebe8e3f579fe08dad72eb410b80a

Add named submodule support to nn::Sequential (#17552)

Summary:
Previously, we were not able to assign names to `nn::Sequential`'s submodules. This PR adds this feature to match the Python API. Example use:
```cpp
Sequential sequential(named_submodule({
      {"linear", Linear(10, 3)},
      {"conv2d", Conv2d(1, 2, 3)},
      {"dropout", Dropout(0.5)},
      {"batchnorm", BatchNorm(5)},
      {"embedding", Embedding(4, 10)},
      {"lstm", LSTM(4, 5)}
}));
```

It also enables loading the parameters of a Python `nn.Sequential` module with custom submodule names into the C++ frontend, unblocking https://github.com/pytorch/vision/pull/728#issuecomment-466661344.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17552

Differential Revision: D14246834

Pulled By: yf225

fbshipit-source-id: 3030b5c5d68f6dd5d3e37ac4b4f98dc6d6d9ba72

Rename `btriunpack` to `lu_unpack` (#18529)

Summary:
Changelog:
- Renames `btriunpack` to `lu_unpack` to remain consistent with the `lu` function interface.
- Rename all relevant tests, fix callsites
- Create a tentative alias for `lu_unpack` under the name `btriunpack` and add a deprecation warning to not promote usage.
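
The alias-plus-warning pattern from the last item can be sketched as follows. This is a minimal illustration, not the PyTorch source: `lu_unpack` here is a hypothetical stub that returns its inputs unchanged, and the warning text is made up.

```python
import warnings


def lu_unpack(lu_data, pivots):
    """Hypothetical stand-in for the renamed function; returns its inputs."""
    return lu_data, pivots


def btriunpack(*args, **kwargs):
    """Deprecated alias: warn, then forward everything to lu_unpack."""
    warnings.warn(
        "btriunpack is deprecated in favor of lu_unpack",  # illustrative text
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return lu_unpack(*args, **kwargs)


# Old call sites keep working, but each call emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = btriunpack([[4.0]], [1])
```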
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18529

Differential Revision: D14683161

Pulled By: soumith

fbshipit-source-id: 994287eaa15c50fd74c2f1c7646edfc61e8099b1

fix lint (#18623)

Summary:
Fix lint
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18623

Differential Revision: D14686265

Pulled By: eellison

fbshipit-source-id: 4bbe0f5bc58f508cbf4bc1baef2029ce1eaa42d8

Manual hipify caffe2/distributed and rocm update (no hcc modules support) (#18088)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18088

Manually hipify the distributed folder

Reviewed By: bddppq

Differential Revision: D14482702

fbshipit-source-id: cc0abdf525b423ab1f18db8010d21e27c6668d36

Change dnnlowp log level from warning to v2 (#18576)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18576

As in title

Reviewed By: feiyu1990

Differential Revision: D14670898

fbshipit-source-id: 1983099b2ba57daab393278553f10dcdb1812fdf

multiline KeyError msg python bug workaround (#18557)

Summary:
Make multiline KeyError messages readable by working around a Python bug: https://bugs.python.org/issue2651

discussion: https://github.com/pytorch/pytorch/issues/16647
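
For context, a small sketch of the underlying behavior and one possible workaround. `str()` of a `KeyError` with a single argument is the `repr()` of that argument, so newlines are shown escaped. The `KeyErrorMessage` helper below is written for illustration and is an assumption about the shape of the fix, not a quote of the patch.

```python
class KeyErrorMessage(str):
    """str subclass whose repr() is the raw text (no quotes, no escapes)."""

    def __repr__(self):
        return str(self)


# Default behavior: the message is passed through repr(), escaping the newline.
plain = KeyError("line one\nline two")

# Workaround: wrap the message so that repr() leaves it untouched.
readable = KeyError(KeyErrorMessage("line one\nline two"))

print(str(plain))
print(str(readable))
```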
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18557

Differential Revision: D14681086

Pulled By: soumith

fbshipit-source-id: acbd13a823302c854c3d364028ed414fd8ce6bc8

ReduceLrOnPlateau: best=current -> best=copy(current) (#16364) (#16697)

Summary:
Fixes #16364
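
The aliasing problem named in the title can be reproduced without PyTorch. In this sketch `Metric` is a hypothetical stand-in for a tensor-like value that is updated in place, and `track_best` is an illustrative loop, not the scheduler's code.

```python
import copy


class Metric:
    """Hypothetical mutable metric, updated in place like a reused tensor."""

    def __init__(self, value):
        self.value = value

    def update_(self, value):
        self.value = value
        return self

    def __lt__(self, other):
        return self.value < other.value


def track_best(history, snapshot):
    """Return the lowest metric seen while reusing one mutable Metric."""
    current = Metric(float("inf"))
    best = None
    for v in history:
        current.update_(v)
        if best is None or current < best:
            best = snapshot(current)
    return best.value


# best = current: `best` aliases `current`, so it silently follows every update.
aliased = track_best([3.0, 1.0, 2.0], snapshot=lambda m: m)

# best = copy(current): `best` keeps the value it had when it was recorded.
copied = track_best([3.0, 1.0, 2.0], snapshot=copy.copy)
```

With the alias, `best` always equals the latest metric, so the comparison never sees an improvement after the first step.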
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16697

Differential Revision: D14680879

Pulled By: soumith

fbshipit-source-id: c50c22f3eacea4474fb3a04fe85fbf11d5a177c9

make InstanceNorm1d raise an error if the input is 2D (#11992)

Summary:
Resolves #11991 .

Any comment is welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11992

Differential Revision: D14680974

Pulled By: soumith

fbshipit-source-id: 8e287a9c32bf43b35edc9d127f16ed6b72c61d91

Fixed torch.arange docs (#18604)

Summary:
Kindly let me know if it's okay and whether there are any places where I need to make a fix. Closes #18534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18604

Differential Revision: D14680712

Pulled By: soumith

fbshipit-source-id: 030e4a3d8f7839cbe2b8a3ef386323f0d39eb81a

Minor fixes in fastrnns benchmarks

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18613

Reviewed By: wanchaol

Differential Revision: D14681838

fbshipit-source-id: 60bd5c9b09398c74335f003cd21ea32dd1c45876

Rename `btrifact*` to `lu` (#18435)

Summary:
Changelog:

- Renames `btrifact` and `btrifact_with_info` to `lu` to remain consistent with other factorization methods (`qr` and `svd`).
- There is now a single function and method named `lu`, which performs LU decomposition. It takes a `get_infos` kwarg which, when set to True, includes an infos tensor in the returned tuple.
- Rename all tests, fix callsites
- Create a tentative alias for `lu` under the name `btrifact` and `btrifact_with_info`, and add a deprecation warning to not promote usage.
- Add the single batch version for `lu` so that users don't have to unsqueeze and squeeze for a single square matrix (see changes in determinant computation in `LinearAlgebra.cpp`)
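
To illustrate the return shape described above, here is a pure-Python stand-in for a single square matrix. It is not the PyTorch implementation: the compact L/U storage, the 1-based pivots, and the integer `info` (0 on success, otherwise the first column with a zero pivot) follow LAPACK conventions and are assumptions here.

```python
def lu(matrix, get_infos=False):
    """LU factorization with partial pivoting of one square matrix.

    Returns (factors, pivots), or (factors, pivots, info) when get_infos=True.
    `factors` stores L (unit diagonal, below the diagonal) and U compactly.
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    pivots = []
    info = 0
    for k in range(n):
        # Pick the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        pivots.append(p + 1)  # 1-based row index
        if a[p][k] == 0.0:
            if info == 0:
                info = k + 1  # remember the first zero pivot
            continue
        if p != k:
            a[k], a[p] = a[p], a[k]
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]  # multiplier, stored in the L part
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    return (a, pivots, info) if get_infos else (a, pivots)


factors, pivots = lu([[4, 3], [6, 3]])
factors, pivots, info = lu([[4, 3], [6, 3]], get_infos=True)
```

The caller passes a plain 2-D matrix, with no extra batch dimension to add and remove.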
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18435

Differential Revision: D14680352

Pulled By: soumith

fbshipit-source-id: af58dfc11fa53d9e8e0318c720beaf5502978cd8

Optimize relu op on GPU (#18506)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18506

Optimize relu op on GPU

Reviewed By: houseroad

Differential Revision: D14633171

fbshipit-source-id: bd3afa9a0bae1325d32ad4153736a0c7ecb0ec64

update of fbcode/onnx to fb1a80692c1ab0bd27b1072f2e7bffacba336777 (#18585)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18585

Previous import was b29e78a4efb8e5d8995f576bbf19a846807829b6

Included changes:
- **[fb1a8069](https://github.com/onnx/onnx/commit/fb1a8069)**: Fix wrongly handled attribute in MVN and test generating scripts (#1877) <Raymond Yang>
- **[b22041c3](https://github.com/onnx/onnx/commit/b22041c3)**: Add dilation attribute to MaxPool (#1864) <karljang>

Reviewed By: zrphercule, benoitsteiner

Differential Revision: D14668623

fbshipit-source-id: fa7f44b1ecc949d8dd654939d20b1e93db98b1d2

update of fbcode/foxi to 81e1683d6348eee4b5ed1145222dc2c41be4269c (#18596)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18596

Previous import was 2bcc4064c90e87b9638615c733485f07c47b7558

Included changes:
- **[81e1683](https://github.com/houseroad/foxi/commit/81e1683)**: Merge pull request #9 from zrphercule/add_foxi_quantization <Rui Zhu>
- **[580559c](https://github.com/houseroad/foxi/commit/580559c)**: char=>uint8 <zrphercule>
- **[1a572f7](https://github.com/houseroad/foxi/commit/1a572f7)**: add quantization <zrphercule>

Reviewed By: zrphercule

Differential Revision: D14677404

fbshipit-source-id: 09429b3bf0e7783a25b8145020e505761bad887d

Delete batch tensor (#18575)

Summary:
Deleting batch tensor since we are no longer maintaining the project and keeping it functional is blocking other improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18575

Differential Revision: D14671126

Pulled By: eellison

fbshipit-source-id: b42d5b699c4d12171ed95e6d3a977532167f0d2c

Update NNPACK to current master (#18580)

Summary:
This fixes builds on x86 (32 bits).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18580

Differential Revision: D14672462

Pulled By: soumith

fbshipit-source-id: 7629b001c2bfa3e5b6ade7f1b03a8280232a4c16

Enhance build_ios.sh to be consistent with build_android.sh (#18564)

Summary:
1. Enhance build_ios.sh to be consistent with build_android.sh.
2. Add docs for build_ios.sh.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18564

Differential Revision: D14680752

Pulled By: soumith

fbshipit-source-id: 6d2667ed8a3c85a057a522838f5d0461dd4788cf

Serialization supports pathlib.Path object for the input argument (#18562)

Summary:
This allows a pathlib.Path object to be passed to torch.load as the input argument.
Fixes #16607
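
A sketch of the kind of dispatch this implies, using `pickle` as a stand-in for the real serialization format. The `load` function below is hypothetical and only shows the path-versus-file-object check; it is not the body of `torch.load`.

```python
import pathlib
import pickle
import tempfile


def load(f):
    """Hypothetical loader: accepts a str path, a pathlib.Path, or a binary file."""
    if isinstance(f, (str, pathlib.Path)):
        # Path-like input: open the file ourselves.
        with open(f, "rb") as handle:
            return pickle.load(handle)
    # Otherwise assume an already opened binary file object.
    return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "state.pkl"
    path.write_bytes(pickle.dumps({"weight": [1.0, 2.0]}))

    from_path = load(path)        # pathlib.Path
    from_str = load(str(path))    # plain string
    with open(path, "rb") as fh:
        from_file = load(fh)      # open file object
```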
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18562

Differential Revision: D14668255

Pulled By: soumith

fbshipit-source-id: 0ae4f7c210918582912f2d1ef2a98f1ab288c540

Target and input sizes mismatch warning in L1 Loss / L1 Smooth Loss (#18565)

Summary:
Adds the warning message already present in the mse_loss function to the L1 losses when the input and target sizes differ.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18565

Differential Revision: D14671415

Pulled By: soumith

fbshipit-source-id: 01f5e1fb1ea119dbb2aecf1d94d0cb462f284982

Resubmit PR-18512: Improved onnx export for 3 onnx ops (#18571)

Summary:
Fix ROCm CI failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18571

Differential Revision: D14669323

Pulled By: bddppq

fbshipit-source-id: 022afe5c20e680295c9cfdfe1ec14650305955a8

in caching allocator, ignore and clear the error if not ready

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18584

Differential Revision: D14675041

Pulled By: bddppq

fbshipit-source-id: c1fab797e0d224e0a481a0395a3f9975c4265ff6

Add external callbacks into RecordFunction (#17844)

Summary:
Add a way to insert external callbacks into PT's RecordFunction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17844

Differential Revision: D14399664

Pulled By: ilia-cher

fbshipit-source-id: 76654799811fefd3ffed4abfb46ed95b492cebab

Implement rotated generate_proposals_op without opencv dependency (CPU version)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18533

Reviewed By: ezyang

Differential Revision: D14648083

fbshipit-source-id: e53e8f537100862f8015c4efa4efe4d387cef551

Use SetOutputTensor instead of copying outputs manually (#17770)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17770

As title

Reviewed By: dzhulgakov

Differential Revision: D14370937

fbshipit-source-id: f415490c38556cf03bb13dce3643775331483448

Fix NCCL/Gloo process groups and DDP stream sync bug (#18465)

Summary:
DDP with the NCCL backend uses a [worker stream](https://github.com/pytorch/pytorch/blob/d3eb941ed96774efb8d89a0b20c9e49807ea85a7/torch/csrc/distributed/c10d/ddp.cpp#L142) to flatten grad batch
tensors, and passes the flattened tensor to [another stream](https://github.com/pytorch/pytorch/blob/d3eb941ed96774efb8d89a0b20c9e49807ea85a7/torch/lib/c10d/ProcessGroupNCCL.cpp#L379) to
conduct ncclAllReduce. The flattened tensor has to record the
ncclAllReduce stream, otherwise multiple streams might access the
same memory space.

cc ppwwyyxx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18465

Differential Revision: D14613449

Pulled By: mrshenli

fbshipit-source-id: b62773732552d12cc87b7adeb6897e9e11753ea9

Inference LSTM integration test (#18559)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18559

Adding integration test for inference LSTM

Reviewed By: houseroad

Differential Revision: D14656698

fbshipit-source-id: 80fb2a72be30fcb695f4471b72bf9d6e3965bf81

Add Slot type to abstract the raw pointers being used for slots. (#18226)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18226
ghimport-source-id: b9ec8651212875b30971cc6859d2ddec6559ae3a

If modules become first-class IValues, then the slots will no longer be raw pointers but (IValue, index) pairs. This commit inserts the Slot abstraction so that this change can be made in later patches.

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18226 Add Slot type to abstract the raw pointers being used for slots.**

Differential Revision: D14542022

fbshipit-source-id: b81d7f4334c983d663e7551bda82df43680d7c5f

Revert D14635130: Improved onnx export for 3 onnx ops.

Differential Revision:
D14635130

Original commit changeset: d54a2b6e2950

fbshipit-source-id: f624e2befdde245cb88435a95508b2a8e6b12e61

Improved onnx export for 3 onnx ops. (#18512)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18512

Ceil and Floor have been supported since version 6 of ONNX: export them using the native ONNX ops instead of an ATen op.
Similarly, support for the Where op was added in version 9, so we no longer need to wrap this op in an ATen op.
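
The version gating described above amounts to a lookup like the one below. This is a toy sketch, not the exporter's code: `MIN_OPSET`, `export_op`, and the returned tuples are hypothetical, and the version numbers are the ones given in this summary.

```python
# Minimum ONNX opset at which each op can be exported natively,
# taken from the summary above.
MIN_OPSET = {"Ceil": 6, "Floor": 6, "Where": 9}


def export_op(op_name, opset_version):
    """Choose between a native ONNX node and an ATen fallback wrapper."""
    min_version = MIN_OPSET.get(op_name)
    if min_version is not None and opset_version >= min_version:
        return ("onnx", op_name)
    return ("aten_fallback", op_name)


print(export_op("Where", 9))
print(export_op("Where", 8))
```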

Reviewed By: houseroad

Differential Revision: D14635130

fbshipit-source-id: d54a2b6e295074a6214b5939b21051a6735c9958