review.tizen.org Git - platform/upstream/pytorch.git/log

Include vec256 headers in setup.py (#17220)

Summary:
Fix #16650.

Headers such as `ATen/cpu/vml.h` contain `#include <ATen/cpu/vec256/vec256.h>`
for example, but these vec256 headers aren't included, due to commit e4c0bb1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17220

Differential Revision: D14165695

Pulled By: ezyang

fbshipit-source-id: 27b2aa2a734b3719ca4af0565f79623b64b2620f

Enable MAX_JOBS for using Ninja on Windows

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17341

Differential Revision: D14164740

Pulled By: soumith

fbshipit-source-id: 7a1c3db0a7c590f72a777fcd32e1c740bb0c6257

Avoid unnecessary CPU-to-GPU copy of torch.load with CUDA (#17297)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17297

When `torch.load` needs to load a tensor, no matter which device it will be end up being loaded on, it first creates a CPU storage for it of the necessary size. This storage is allocated but it's not "set" yet, hence no data is written to it: it exists in the kernel's memory map, but it's not resident and doesn't take up physical pages. Then, this storage is passed to the `map_location` function (if the parameter is a string, a device or a map, PyTorch builds that function automatically). The default map for CUDA consists effectively in `lambda storage, _: storage.cuda()` (I omitted the code needed to pick the correct device). This creates a GPU storage and copies over the data of the CPU storage. *This step is unnecessary as we're copying uninitialized memory*. (Surprisingly enough, though, it appears the kernel is smart enough that reading from the unpaged CPU memory doesn't cause it to become paged.) Once `map_location` returns a storage residing on the correct target device, `torch.load` resumes reading the file and copying the tensor's content over into the storage. This will overwrite the content that had previously been written to it, which confirms that the above copy was pointless.

A way to avoid this useless copy is to just create and return a new empty storage on the target GPU, instead of "transforming" the original one.

This does indeed increase the performance:
```
In [5]: torch.save(torch.rand(100, 100, 100), "/tmp/tensor")

In [6]: %timeit torch.load("/tmp/tensor", map_location="cuda")
1.55 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [7]: %timeit torch.load("/tmp/tensor", map_location=lambda storage, _: torch.cuda.FloatStorage(storage.size()))
1.03 ms ± 44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

Credit for this diff is shared with adamlerer and fmassa.

Differential Revision: D14147673

fbshipit-source-id: a58d4bc0d894ca03a008499334fc2cdd4cc91e9f

allow lists to contain any tensor type (#17321)

Summary:
If something is a TensorList, it should be a list of `TensorType`, not a list of some specialized type.
Fixes #17140, #15642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17321

Differential Revision: D14158192

Pulled By: suo

fbshipit-source-id: ba8fe6ae8d618c73b23cd00cbcb3111c390c5514

Skip convnets benchmark in rocm CI (#17331)

Summary:
random coredump
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17331

Differential Revision: D14162018

Pulled By: bddppq

fbshipit-source-id: 3ed15a79b7bca2498c50f6af80cbd6be7229dea8

Don't have malloc-free pairs that cross DLL boundaries. (#17302)

Summary:
See https://blogs.msdn.microsoft.com/oldnewthing/20060915-04/?p=29723
for more background on this requirement on Windows.

Fixes #17239.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc xkszltl peterjc123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17302

Differential Revision: D14150067

Pulled By: ezyang

fbshipit-source-id: 9dc16ca781ff17515b8df1bb55492477e7843d4c

Add support to build for multiple amd gpu targets (#17329)

Summary:
iotamudelta petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17329

Differential Revision: D14161277

Pulled By: bddppq

fbshipit-source-id: f3eb9f52e96a8fcd779c57df0f8c9a2c54754e35

batched cleanups (#17288)

Summary:
Bunch of random stuff I came across while doing UDT stuff. Putting in a separate PR to avoid noise
- fix up the alias analysis list ops to include fork/wait
- improve dump() for aliasDb to print writes
- Move BuiltinFunction::call() to sugaredvalue with the rest of the methods
- formatting and includes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17288

Differential Revision: D14147105

Pulled By: suo

fbshipit-source-id: 62e2a922a1726b684347365dc42c72188f154e9c

(Permanently) fix CI breakage due to new docker version. (#17338)

Summary:
Pull request resolved: https://github.com/pytorch/pytorch/pull/17338

See comment in config.yml for details.

build-break

Reviewed By: orionr

Differential Revision: D14160934

fbshipit-source-id: a91160ab15dd6c174a7d946a78a7d2d50ae0a011

Implementation convolutionTranspose operator for mkl-dnn (#12866)

Summary:
the speed-up of a single operation is up to 2-3X on BDW.
This PR depend on https://github.com/pytorch/pytorch/pull/14308
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12866

Differential Revision: D13936110

Pulled By: ezyang

fbshipit-source-id: 34e3c2ca982a41e8bf556e2aa0477c999fc939d3

Support multi-device configuration for MKL-DNN (#12856)

Summary:
MKL-DNN support multi-node mode，but not support multi-devices mode,this commit will support multi-devices for MKL-DNN.This commit depend on https://github.com/pytorch/pytorch/pull/11330
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12856

Differential Revision: D13735075

Pulled By: ezyang

fbshipit-source-id: b63f92b7c792051f5cb22e3dda948013676e109b

fix missing std (#17263)

Summary:
add missing std introduced by #16689 . Investigating why this wasn't caught in CI (nor my local dev environment).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17263

Reviewed By: ezyang

Differential Revision: D14134556

Pulled By: ailzhang

fbshipit-source-id: 6f0753fa858d3997e654924779646228d6d49838

Rethrow exceptions from RunAsync (#15034)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15034

Rethrow exception happened during RunAsync, ensure that pending tasks
are not executed after marked as finished

Reviewed By: andrewwdye

Differential Revision: D13409649

fbshipit-source-id: 3fd12b3dcf32af4752f8b6e55eb7a92812a5c057

Reinforce scheduling invariants (#17132)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17132

schedule() function is not supposed to throw exception and is supposed
to succeed in scheduling the full graph of tasks, potential errors (e.g. errors
from underlying thread pool, out of memory exceptions etc) are considered not
recoverable.
The invariant - the graph of tasks is either not executed or
executed in full before the call to finishRun()

Reviewed By: andrewwdye

Differential Revision: D14092457

fbshipit-source-id: a3e5d65dfee5ff5e5e71ec72bb9e576180019698

Modify TileOp GPU implementation to expose more concurrency and better utilize GPU memory bandwidth (#17275)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17275

Previous implementation used a memcpy inside the kernel. It is more efficient to reduce the data fetched per thread to a single word from memory. This exposes more concurrency and takes advantage of GPU memory coalescing support.

Reviewed By: takatosp1

Differential Revision: D14120147

fbshipit-source-id: c4734003d4342e55147c5b858f232a006af60b68

Support str for native_functions.yaml schema

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17276

Differential Revision: D14154222

Pulled By: cpuhrsch

fbshipit-source-id: 411181da5399608c1d1f3218f8f570bb106c88ec

Separate gpu reduce functions (#17146)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17146

Separate gpu reduce functions

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D14097564

fbshipit-source-id: a27de340997111a794b1d083c1673d4263afb9fb

Minor doc updates in c10/core/Allocator.h (#17164)

Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17164

Differential Revision: D14154393

Pulled By: ezyang

fbshipit-source-id: 59d8276d4bb4e7cadb4382769b75e5348ed388de

Namedtuple return for symeig, eig, pstrf, qr, geqrf (#16950)

Summary: More ops for https://github.com/pytorch/pytorch/issues/394

Differential Revision: D14118645

Pulled By: ezyang

fbshipit-source-id: a98646c3ddcbe4e34452aa044951286dcf9df778

Allow PyTorch to be built without NCCL (#17295)

Summary:
With this patch you can use USE_DISTRIBUTED=OFF (possibly in combination with USE_NCCL=OFF (?))

The significance is partly because the NCCL doesn't build with CUDA 8.
This is written under the assumption that NCCL is required for distributed if not, the USE_DISTRIBUTED check in nccl.py should be replaced by a check for the USE_NCCL environment variable.

Fixes: #17274
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17295

Differential Revision: D14155080

Pulled By: ezyang

fbshipit-source-id: 0d133f7c5b4d118849f041bd4d4cbbd7ffc3c7b4

add foxi submodule (#17184)

Removed obsolete argument correct_transform_coords in bbox_transform op. (#16723)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16723

Removed obsolete argument correct_transform_coords in bbox_transform op.
* It was only for backward compatibility. We should not have models using it now.

Differential Revision: D13937430

fbshipit-source-id: 504bb066137ce408c12dc9dcc2e0a513bad9b7ee

make the threshold for acurracy more precise (#17194)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17194

we found that there is a per row absolute error due to int8 quant
and a relative error table-wide in case fp16 is used

Reviewed By: csummersea

Differential Revision: D14113353

fbshipit-source-id: c7065aa9d15c453c2e5609f421ad0155145af889

Add rule based filtering for ONNXIFI transformation (#17198)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17198

We come to the point that we need to apply some rules to bind certain ops together to avoid un-inferrable intermediate shapes. We either lower them together to backend or neither. This diff adds a pass for us to add rules like this. The first one is to bind `Gather` with `SparseLengthsWeighted*`.

Reviewed By: ipiszy

Differential Revision: D14118326

fbshipit-source-id: 14bc62e1feddae02a3dd8eae93b8f553d52ac951

Updating submodules

Reviewed By: zpao

fbshipit-source-id: 4ee15707bcf8c23c2d7feb6987acecef4131d467

caffe2 | added missing operator source file (#17272)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17272

after windows-specific fixes were applied new file was left out of CMakeLists

Reviewed By: orionr

Differential Revision: D14140419

fbshipit-source-id: 6a6c652048ed196ec20241bc2a1d08cbe2a4e155

add list methods: copy,extend (#17092)

Summary:
This PR adds the following methods to python's list.

* copy
* extend

and tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17092

Differential Revision: D14141817

Pulled By: Krovatkin

fbshipit-source-id: c89207f0f25f3d1d4ad903ee634745615d61d576

Improve error message w/ size inference on empty tensors

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17255

Differential Revision: D14143094

Pulled By: soumith

fbshipit-source-id: f96fa7f8eb6eaac72887d3e837546cbfa505f101

add install step and docs for Android build (#17298)

Summary:
This commit did below enhancements:
1, add doc for build_android.sh;
2, add install step for build_android.sh, thus the headers and libraries can be collected together for further usage conveniently;
3, change the default INSTALL_PREFIX from $PYTORCH_ROOT/install to $PYTORCH_ROOT/build_android/install to make the project directory clean.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17298

Differential Revision: D14149709

Pulled By: soumith

fbshipit-source-id: a3a38cb41f26377e21aa89e49e57e8f21c9c1a39

improve libtorch install docs with GPU note (#17299)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/15702
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17299

Differential Revision: D14149712

Pulled By: soumith

fbshipit-source-id: 5b83110bb00e4d4dad04c1f293c2b52e41711f11

Add launch bounds for TopK kernel, be more conservative in sorting (#17296)

Summary:
The particular use case reported is Jetson TX2 and maskrcnn.

Fixes #17144
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17296

Differential Revision: D14147886

Pulled By: soumith

fbshipit-source-id: 44d5a89aaeb4cc07d1b53dd90121013be93c419c

ONNX Export Maxpool Indices

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16455

Differential Revision: D14140375

Pulled By: houseroad

fbshipit-source-id: 12d02c447e7fe0fae49969d1daf40a87660ed416

Revert D14144264: [pytorch][PR] [jit] clean up print from test

Differential Revision:
D14144264

Original commit changeset: eec837d29c46

fbshipit-source-id: ad91cb1d047fd34967385b661a6757111f92026e

Updating submodules

Reviewed By: zpao

fbshipit-source-id: 68a648b2136823994f02fa5b567a2656494f6dd3

clean up print from test

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17279

Differential Revision: D14144264

Pulled By: suo

fbshipit-source-id: eec837d29c46e96be37c54192a841046b486cb8b

Fix dll loading process in newer Python on Windows (#17191)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/17051.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17191

Differential Revision: D14138427

Pulled By: kostmo

fbshipit-source-id: 9f207105161ad0312eb09fd86072afd5f22de785

Fix dll loading issue for Caffe2 and Windows

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17215

Reviewed By: orionr

Differential Revision: D14138445

Pulled By: kostmo

fbshipit-source-id: 0bb4f2f1ed5bda7416ba7e4c6b0618414b328934

Fix cuda softmax backward with empty input (#17259)

Summary:
Fixes #17256
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17259

Differential Revision: D14142196

Pulled By: soumith

fbshipit-source-id: 1f2dc202951b59b43da27684f9f924314bcd3040

at::native batch norm kernel launch config update (#17047)

Summary:
limit block dimension to avoid configuration error on batch norm kernel launch

This should resolve #16998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17047

Differential Revision: D14142132

Pulled By: soumith

fbshipit-source-id: 9c8c52dcd1d108cda1f65f5227e625b8fe6e12a0

False alarm about leak in TestNN.test_variable_sequence_cuda (#17242)

Summary:
`TestNN.test_variable_sequence_cuda` sometimes brakes due to CUDA leak.
The cause appears to be too small tolerance breaking float16 sub-test of the test above.
When it breaks it calls abort disrupting correct tear down of the test
and false alarming about the leak.

~~Also, removed annoying **Upsample** module warning.
IMHO this warning is wrong because the module **Upsample** is not deprecated. Seems like it's been mixed
with `nn.functional.upsample` function which is indeed deprecated in favor of `nn.functional.interpolate`, see `torch/nn/functional.py:2387` for details (this replacement is also performed in `test_nn.py`).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17242

Differential Revision: D14141686

Pulled By: soumith

fbshipit-source-id: faa8f87440d94bdc6ab0ff00be6dad82353115c4

U/kostmo/gen circle conf (#17189)

Summary:
Diagram preview:
![binarysmoketests-config-dimensions](https://user-images.githubusercontent.com/261693/53040977-a0f88d00-3437-11e9-9190-796cc243e0f9.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17189

Differential Revision: D14141362

Pulled By: kostmo

fbshipit-source-id: 0625a1234d0307c6be79f17e756ddb1cc445b374

update doc for multinomial (#17269)

Summary:
Update documentation to raise awareness of the fix in #12490. Thanks matteorr for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17269

Reviewed By: ezyang

Differential Revision: D14138421

Pulled By: ailzhang

fbshipit-source-id: 6433f9807a6ba1d871eba8e9d37aa6b78fa1e1fd

update of fbcode/onnx to 4c091e048ca42682d63ccd3c1811560bc12b732d (#17264)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17264

Previous import was 822d8df0a2a32233c6022f50a158817a0f19bdc7

Included changes:
- **[4c091e0](https://github.com/onnx/onnx/commit/4c091e0)**: Support defined ONNX_ML in parent cmake files (#1821) <Lu Fang>
- **[57372f3](https://github.com/onnx/onnx/commit/57372f3)**: Delete OpsetVersionConverter.md which is a duplicate of VersionConverter.md (#1818) <Prasanth Pulavarthi>
- **[ab1c57e](https://github.com/onnx/onnx/commit/ab1c57e)**: [ONNXIFI]Add extension to be implementable (#1796) <Rui Zhu>
- **[b92eee8](https://github.com/onnx/onnx/commit/b92eee8)**: Revert "Implement Op Annotation's for ONNX (#1648)" (#1812) <Ke Zhang>
- **[61f1e9e](https://github.com/onnx/onnx/commit/61f1e9e)**: Enable ONNX_ML by default (#1810) <Shinichiro Hamaji>
- **[4f064a1](https://github.com/onnx/onnx/commit/4f064a1)**: fix Greater and Less doc (#1811) <Guoliang Hua>
- **[0628582](https://github.com/onnx/onnx/commit/0628582)**: Implement Op Annotation's for ONNX (#1648) <Armen>
- **[ad9d2f7](https://github.com/onnx/onnx/commit/ad9d2f7)**: Versioning doc update for Opset 9 (#1805) <Vinitra Swamy>
- **[e71e3be](https://github.com/onnx/onnx/commit/e71e3be)**: add dilation case for ConvTranspose op (#1797) <Randy>

Reviewed By: yinghai

Differential Revision: D14135024

fbshipit-source-id: 1e4f9dda89abf48994798d080dd5d58207a6e4b6

Make tril_ and triu_ actually in-place (#17031)

Summary:
Currently, when the input tensor `self` is not contiguous, `tril_` and `triu_` calls `self = self.contiguous()`, which allocates a new contiguous tensor and assign it to `self`. This effectively changes the input tensor `self`'s pointer and will break downstream code after Variable/Tensor merge.

This PR fixes it so that `tril_` and `triu_` always update the input tensor in-place and preserve the input tensor's TensorImpl.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17031

Differential Revision: D14069592

Pulled By: yf225

fbshipit-source-id: d188218f426446a44ccc1d33fc28ac3f828c6a05

Fix remaining -Wreturn-std-move violations in fbcode

Summary:
Some value are copied when it could've been moved.
Detected by compiler flag -Wreturn-std-move

Reviewed By: igorsugak

Differential Revision: D14134303

fbshipit-source-id: 8fc3bb2017108b3d65097cb8447e33f5b6c743b4

Lightweight String check Utility (#16858)

Summary:
light weight implementation of LLVM filecheck utility. Currently only handles string matching - regexes & saving a regex to a variable name can be added as needed.

Current intended usage is through FileCheckBuilder python handle, and is shown in the tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16858

Differential Revision: D14096244

Pulled By: eellison

fbshipit-source-id: c7c8d1457691c105e6ccbb3c1a378d96baac2569

move prim::None to prim::Constant (again) (#17186)

Summary:
Trying to land again, make prim::None into a case of prim::Constant. Reverted the previous landing because it broke an important onnx export test.

https://github.com/pytorch/pytorch/pull/16160
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17186

Differential Revision: D14115304

Pulled By: eellison

fbshipit-source-id: 161435fc30460b4e116cdd62c7b2e5b94581dcb7

Clarification of Lerp operation on tensors (#17253)

Summary: The `tensor` be used as `end` clarified in the docs.

Differential Revision: D14132212

Pulled By: ezyang

fbshipit-source-id: e9bca14d5079e5f7adfc18afcb1eec832ef86e9e

reenable rand_like fusion when there is no broadcast (#16087)

Summary:
Reenables rand_like fusion if no tensor is broadcasted in the fusion group. This is a sufficient but not necessary condition for fused rand_like to produce correct results, and it has an unpleasant side effect of falling back to non-fused path if rand_like was optimistically included in the fusion group, but there is a broadcast in the fusion group not necessarily related to rand_like. E.g. before this PR, if the network had (biasAdd -> relu -> dropout), fuser could fuse biasAdd and relu, now it will try fusing the whole thing (if dropout is expressed via rand_like) and fall back every time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16087

Differential Revision: D13720232

Pulled By: zou3519

fbshipit-source-id: 1e19203bec4a59257bfc7078b054a19f00fab4ad

discrepancy in smoke_macos_libtorch_2.7_cpu job spec (#17224)

Summary:
closes #17223
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17224

Reviewed By: pjh5

Differential Revision: D14121612

Pulled By: kostmo

fbshipit-source-id: bfd5a392de5e614031389725535756d7fa7db784

Bool tensor. Part 0: Boolean storage implementation (#16810)

Summary:
This is the first commit from a series of planned changes in order to add boolean tensors to PyTorch. The whole plan looks like this:

0. Storage Implementation (this change)
1. Tensor Creation.
2. Tensor Conversions.
3. Tensor Indexing.
4. Tensor Operations.
5. Back compatibility related changes.

This feature was requested by the community:
https://github.com/pytorch/pytorch/issues/4764
https://github.com/pytorch/pytorch/issues/4219
https://github.com/pytorch/pytorch/issues/4288

**Change**:
Added boolean type to the Storage class for CPU and CUDA backends.

**Tested via**:
1. unit tests
2. running this:
-> import torch
-> torch.BoolStorage
<class 'torch.BoolStorage'>
-> torch.cuda.BoolStorage
<class 'torch.cuda.BoolStorage'>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16810

Reviewed By: gchanan

Differential Revision: D14087246

Pulled By: izdeby

fbshipit-source-id: 042642ced1cb0fd1bb6bff05f9ca871a5c54ee5e

Correct padding and activations docstrings in nn module

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17197

Differential Revision: D14131284

Pulled By: soumith

fbshipit-source-id: 6edd225b47b1dde81b5ad0a23c588c6621987a69

Use move to avoid copying (#17188)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17188

Using flag "-Wreturn-std-move", compiler can identify the cases where a copy
operation
is performed when a move operation would have been available. Wrapped return
statement with std::move to fix.

For some reason, these files are not automatically modded. With D14115372
we should be able to turn on the compile flag

Reviewed By: soumith

Differential Revision: D14115786

fbshipit-source-id: e763b92eecbe4468027fc141d029618d1e9f280b

Replace resize_dim() with set_sizes_and_strides() (#17127)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17127

Replace resize_dim() with set_sizes_and_strides() in `THTensor_(squeeze1d) in aten/src/TH/generic/THTensor.cpp`

Reviewed By: ezyang

Differential Revision: D14088697

fbshipit-source-id: 518b72f7c0c4fbedf11a29a6ceb9fee8eefd9273

C++ Frontend: adding two distributed samples (Random and Sequential) (#16910)

Summary:
Adding two distrbuted samplers, Random and Sequential to the mix. Similar to python counterpart, DistributedSampler introduces a new method `set_epoch(size_t epoch)` which can be use to shuffle data determinstically between distributed processes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16910

Differential Revision: D14130980

Pulled By: soumith

fbshipit-source-id: ec08b7130c01e2fc6dc3693f7ac622a0a6d60f10

Correct recurrent/linear/dropout/sparse layers docstrings

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17238

Differential Revision: D14130811

Pulled By: soumith

fbshipit-source-id: d3998ca7da46aec5a59220c6af489f71f3d60735

Optional arg fixes (#17222)

Summary:
fixes #17210.
cc : ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17222

Differential Revision: D14130833

Pulled By: soumith

fbshipit-source-id: 19ff6020c47208e3436ae28cd16110a0f435b25e

Reset grad attribute when called using del (#16525)

Summary:
del Tensor.grad set PyObject to nullptr
and Tensor.grad = None set PyObject to Py_None
Handling both the cases now
fixes ##16471
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16525

Differential Revision: D14130800

Pulled By: soumith

fbshipit-source-id: ed85c38305bba94d5047311cb58e4e4cedd09832

Logging stuffs (#17177)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17177

Add more logging and flag.

Reviewed By: yinghai

Differential Revision: D14111643

fbshipit-source-id: 4b1c005faf41c21f59100bc401120c6970a24c42

Implement IRParser. (#16987)

Summary:
It might need some cleaning up and might be missing some features, but it should be already working for most cases.

This PR is based on top of PR16986 (so please review only the last commit here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16987

Differential Revision: D14074577

Pulled By: ZolotukhinM

fbshipit-source-id: 712b598f423265655f574bb9903e2066628eaad3

Skip onnx logsoftmax tests in rocm (#17170)

Summary:
similar to softmax there are issues of getting nan randomly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17170

Differential Revision: D14110515

Pulled By: bddppq

fbshipit-source-id: 5c97661184d45a02122fd69d35a839fdf4520c8c

Add namedtuple return for min, median, mode, kthvalue, add test for namedtuple return API (#16186)

Summary:
This partially fixes https://github.com/pytorch/pytorch/issues/394 and depend on https://github.com/pytorch/pytorch/pull/15429. I suggest to review this only after https://github.com/pytorch/pytorch/pull/15429 get landed, otherwise the diff might be large to review.

The test only allows explicitly whitelisted operators to have named return.

Differential Revision: D14070735

Pulled By: ezyang

fbshipit-source-id: ace2a672998b4e4a8094f52cbda5aa1cea6e3b42

Remove templates for GenericDict

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17175

Differential Revision: D14113022

Pulled By: driazati

fbshipit-source-id: 5183e131cc8ccb58525875f76fa03133570a59ea

fix missing constant in adaptive_avg_pool2d AD (#17180)

Summary:
Thanks ngimel for pointing this out!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17180

Differential Revision: D14113001

Pulled By: ailzhang

fbshipit-source-id: 78e7d7f2cda3889138e2bf26a54980c2cc665882

Implement NetDef <--> JIT IR converters. Try 2. (#17123)

Summary:
Currently the converters are very straightforward, i.e. there is no code for trying to
preserve semantics, we're purely perform conversion from one format to another.

Two things that we might want to add/change:

1. Add semantic conversion as well (but probably it would be a good idea to keep
it separate as a temporary thing).
2. Make sure we don't mess with value names, as they are crucial for current
uses of NetDefs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17123

Differential Revision: D14090244

Pulled By: ZolotukhinM

fbshipit-source-id: 07175fa9235582e1d1da5f10a42a5c1280b1b394

change the epsilon for fp32/fp16 to uint8 to be the same (#17062)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17062

from jiyan's training jobs it seems like we found a quantization bug

fp32
fp32->rowwise int8 is fine
fp16 is fine
fp16->rowwise int8 is not fine

we are preconverting everything to fp32 and using the existing code, so there is no need to change the epsilon in the case of fp16 since at the time of converting, everything is a float

Reviewed By: jspark1105

Differential Revision: D14063271

fbshipit-source-id: 747297d64ed8c6fdf4be5bb10ac584e1d21a85e6

Revert D14109636: [pytorch][PR] move prim::None to a case in prim::Constant

Differential Revision:
D14109636

Original commit changeset: d26fd3839761

fbshipit-source-id: c8c8113e2bff49ea93235732603e6ebc89356533

move prim::None to a case in prim::Constant (#16160)

Summary:
This change simplifies analysis done on constants since prim::None does not need to be handled separately now. To check if a constant node is None, use node->isNone().

Next step will be to remove prim::Undefined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16160

Differential Revision: D14109636

Pulled By: eellison

fbshipit-source-id: d26fd383976163a2ddd4c24984bd672a541cc876

Move outplace ops to ATen (#16788)

Summary:
Based on https://github.com/pytorch/pytorch/pull/12413, with the following additional changes:

- Inside `native_functions.yml` move those outplace operators right next to everyone's corresponding inplace operators for convenience of checking if they match when reviewing
- `matches_jit_signature: True` for them
- Add missing `scatter` with Scalar source
- Add missing `masked_fill` and `index_fill` with Tensor source.
- Add missing test for `scatter` with Scalar source
- Add missing test for `masked_fill` and `index_fill` with Tensor source by checking the gradient w.r.t source
- Add missing docs to `tensor.rst`

Differential Revision: D14069925

Pulled By: ezyang

fbshipit-source-id: bb3f0cb51cf6b756788dc4955667fead6e8796e5

Fix for 16939:multinomial performance regressed

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17121

Differential Revision: D14088558

Pulled By: ifedan

fbshipit-source-id: e03583135f1e797fe1d8081ec5e9e6b63d4015c1

Add special ops for BatchNorm symbolic differentiation (#15403)

Summary:
The main problem there is with differentiating batch norm statically
is that we make a lot of complex run-time decisions about the backend
we choose. Then, the autograd derivatives are implemented for every
backend separately, which makes sense, because they might be saving
buffers containing different values. To resolve the issue, the forward
op returns an index of the chosen backend, and the backward function
takes it as an argument, such that it knows how to interpret the buffers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15403

Differential Revision: D14098815

Pulled By: ailzhang

fbshipit-source-id: 7fcd3e6e0566433e81fe8286fb441c1ecaf198ad

improve error msg when module list isn't added to __constants__ (#17167)

Summary:
Add suggestion to add to __constants__ when a ModuleList of Sequential module is used as a tuple

Addresses https://github.com/pytorch/pytorch/issues/13899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17167

Differential Revision: D14107688

Pulled By: eellison

fbshipit-source-id: 8c07d1f3e25a9c6bdcfd96dbf6b72c2130838278

Kaiming Initialization (#14718)

Summary:
/cc goldsborough

Working on #14582

The corresponding python implementations are at: [pytorch/torch/nn/init.py](https://github.com/pytorch/pytorch/blob/6302e4001ab54b3ddeca2b608d337fe7077e801c/torch/nn/init.py#L261-L327)

Here is my initial implementation of Kaiming Initialization. I have not been able to figure out how to successfully run tests locally so I haven't added any yet.

A couple questions:
- Are the enums defined in the right place? I copied their names from Python, but do you prefer different naming conventions for C++?
- To run tests locally do I use `python setup.py test`? Can I run just a subset of the tests somehow?
- Should I add my tests at [test/cpp/api/misc.cpp](https://github.com/pytorch/pytorch/blob/master/test/cpp/api/misc.cpp#L47-L54)?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14718

Differential Revision: D14049159

Pulled By: goldsborough

fbshipit-source-id: 966ac5126875936e69b185b5041f16476ed4cf70

Pass torch.distributed launch process local rank as environment variable instead of argument (#16360)

Summary:
In `torch.distributed.launch.py`, it passes `local_rank` as argument and requires user's program to parse it. However, it would be more flexible for users and consistent with other variables, e.g. `RANK`, `MASTER_PORT`, `WORLD_SIZE`, if passing through environment variables.

https://github.com/pytorch/pytorch/blob/265ed8ff451c17eac82050b1767837ec924d9591/torch/distributed/launch.py#L200-L212
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16360

Differential Revision: D14070372

Pulled By: ezyang

fbshipit-source-id: c3f6a8e55ab513918cad09d1326eccdedb4d98c9

Assert cases exist for unschematized ops in alias analysis

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16334

Differential Revision: D13901238

Pulled By: driazati

fbshipit-source-id: be99f89e7dc6a299b770ea92e217932a5271027d

Fix avg pool2d api (#17166)

Summary:
Fix xla breakage (partially).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17166

Differential Revision: D14106954

Pulled By: ailzhang

fbshipit-source-id: 35ae6713272d0517b66da2ee9209f49015492b89

Fix syntax error in set instantiation (#17174)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17174

Use curly braces syntax to avoid Lint complaint

Reviewed By: yf225

Differential Revision: D14111368

fbshipit-source-id: 44aa21deb9feededb94f23d92262a4164fe0cc1c

Make getting the dtype of a tensor work for backend extensions.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17131

Differential Revision: D14093163

Pulled By: gchanan

fbshipit-source-id: 06638706e26505e3c741b7ae290000ca258599db

Stop reassigning (output) reference arguments in BinaryOps. (#17059)

Summary:
The binary ops that are using TensorIterator do a trick in order to only write the code once for out and non-out variants:

1) Have the non-out variant call the out variant with an undefined tensor.
2) the out variant then reassigns the result tensor to the output of the TensorIterator; this is a no-op in the case where a valid tensor was passed and it correctly propagates the result back to the non-out variant, which is legal because it's just reassigning an undefined tensor.

I believe other solutions to this problem would require an unnecessary reference bump, e.g. defining another out variant that returns a Tensor rather than a reference.

Unfortunately, this doesn't work with const-references, which we want to move our output arguments to be (because const doesn't actually provide const correctness here, and writers mistakenly reassign the parameter in the case it isn't an out variant).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17059

Differential Revision: D14068402

Pulled By: gchanan

fbshipit-source-id: 89fef177a1e174dbe2858e2eae0f6d85460b07d1

Fix batch insert (#17158)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17158

Because of Reshape op, batch size can be changed. This diff addresses first order issue raised from multiple batch size system. We need to export different real_batch_size for different max_batch_size input and attach it to the right output.

It also fixes a false exception.

Reviewed By: ipiszy

Differential Revision: D14099541

fbshipit-source-id: 0fa9e86826f417a11d2b5dd2ee60dff64a7ce8c4

Generate CircleCI config.yml from a script (#17039)

Summary:
This initial PR splits the `.circleci/config.yml` file into several smaller files that are stitched verbatim back into the original. A proof of concept of dynamically generating yaml for the job configuration list is also introduced.

Since the `config.yml` file must exist in the repo in its final form, there must exist a manual update and check-in step to regenerate `config.yml` from its constituent parts.
Consistency between the checked-in `config.yml` file and the authoritative source data is enforced at build time through TravisCI.

closes #17038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17039

Reviewed By: yf225

Differential Revision: D14109059

Pulled By: kostmo

fbshipit-source-id: bc04a73145290358854f5a5e552a45e559118fc3

Add support for simpler for-in-list + tests (#16726)

Summary:
This PR add supports for simpler for-in-list loops such as the example below:

```python
torch.ji.python
def sum_list(a):
    # type: (List[int]) -> int
    sum = 0
    for i in a:
        sum += i

    return sum
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16726

Differential Revision: D14070007

Pulled By: ezyang

fbshipit-source-id: b4d971ee647729a6caa3099ceac34ec5c4f143de

Update pybind11 (#17143)

Summary:
Fixes #17130
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17143

Differential Revision: D14107386

Pulled By: zdevito

fbshipit-source-id: 1834d14bcdcad6857c199bf4fb8f67298394bbf3

Enforce module device at DataParallel construction time (#17129)

Summary:
closes #17065

CC douwekiela
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17129

Differential Revision: D14093353

Pulled By: mrshenli

fbshipit-source-id: 9a5a10f16e392337a7f7073223541cf69b402f82

one_hot docs missing (#17142)

Summary:
one_hot docs is missing [here](https://pytorch.org/docs/master/nn.html#one-hot).

I dug around and could not find a way to get this working properly.

Differential Revision: D14104414

Pulled By: zou3519

fbshipit-source-id: 3f45c8a0878409d218da167f13b253772f5cc963

add pop support to list (#17015)

Summary:
[WIP] add "pop" to list, see https://github.com/pytorch/pytorch/issues/16662
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17015

Differential Revision: D14071680

Pulled By: eellison

fbshipit-source-id: b49a318059c1cc131acda50713132e11b562568f

Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: bbfb709d8681da60ccc9f3bafc6c296c32fcf835

merge fully_connected_rowwise_dnnlowp_op into fully_connected_dnnlowp_op (#17105)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17105

To make FC with rowwise quantization faster, reduce code duplication, and make code consistent with Convolution

Reviewed By: csummersea

Differential Revision: D14080461

fbshipit-source-id: 2b0e67b86e7e3029c90751a8824bf80ae1223680

bug fix when we prepack weight and bias together (#17145)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17145

Prepacked weight contains both weight and bias, so the bias should be obtained from input index 1, not from 2

Reviewed By: jianyuh

Differential Revision: D14097281

fbshipit-source-id: b8b836b85a7b240e2fd1734377c46d9bf2ce3390

caffe2: fix PinnedCPUAllocator cudaHostRegister() leak (#16340)

Summary:
In the NUMA case, PinnedCPUAllocator's allocate() would return a
DataPtr constructed by DefaultCPUAllocator, which would reference
the Default... Delete() rather than the Pinned... Delete(). That
meant Pinned... Delete() would never run, so cudaHostUnregister()
would never be called when regions were freed.

See: https://github.com/pytorch/pytorch/issues/16280

This change adds a 'naked_allocate()' method to the Default allocator
that just returns a pointer to the allocated memory rather than
wrapping it in a DataPtr. Pinned allocator uses that then constructs
a DataPtr with reference to its own Delete().
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16340

Reviewed By: dzhulgakov

Differential Revision: D13843206

Pulled By: ezyang

fbshipit-source-id: 9efb572e5a01b49ef2a4aceeccc13cd0b1066528

Add some missing docs to torch.rst, new unittest to enforce torch.rst no longer miss anything (#16039)

Summary:
This prevent people (reviewer, PR author) from forgetting adding things to `torch.rst`.

When something new is added to `_torch_doc.py` or `functional.py` but intentionally not in `torch.rst`, people should manually whitelist it in `test_docs_coverage.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16039

Differential Revision: D14070903

Pulled By: ezyang

fbshipit-source-id: 60f2a42eb5efe81be073ed64e54525d143eb643e

(#16825)

Summary:
setting the correct math type for cudnn rnn, which is enforced starting from cudnn 7.5+

1. Updating persistent rnn check with input data type instead of rnn math type;
2. Updating rnn type promotion to set correct math type for accumulation;
3. Replace datatype check for filter descriptor from rnn.datatype to input.datatype;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16825

Differential Revision: D14071190

Pulled By: ezyang

fbshipit-source-id: 1c9a1531ccf510cb0619e830be444c20c5e72f3f

Correct conv and pooling docstrings in nn module (#17052)

Summary:
This PR fix conv and pooling docstrings in nn module
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17052

Differential Revision: D14068566

Pulled By: ezyang

fbshipit-source-id: 3ec1de232ff6334b6a544dadefbb0ee6193d443a

Fix AdaptiveLogSoftmaxWithLoss's constructor (#16694)

Summary:
t-ken1 and I are members of a same team.
I have added test codes about the pull request https://github.com/pytorch/pytorch/pull/16656.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16694

Differential Revision: D14070106

Pulled By: ezyang

fbshipit-source-id: ff784dbf45e96a6bcf9a4b5cb9544a661a8acad2

Update Upsample docs to match nn.interpolate

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17134

Reviewed By: ezyang

Differential Revision: D14095694

Pulled By: driazati

fbshipit-source-id: 79afec9ddd50b3b8ce39acf98c2543cf1a3d1127

Remove static_cast insertion/kernel argument extration. (#17055)

Summary:
In light of the antistatic feature being a part of the released ROCm 2.1, remove
the feature in pyHIPIFY for extraction of kernel arguments and insertion of
static_casts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17055

Differential Revision: D14068478

Pulled By: bddppq

fbshipit-source-id: 6895f490c78247a129aa18c520ff8d4d1a3d3642

Upgrade mkl-dnn to v0.17.3 to fix core dump issue (#17107)

Summary:
Upgrade mkl-dnn to 0.17.3 to fix core dump issue in #16183
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17107

Differential Revision: D14097600

Pulled By: yinghai

fbshipit-source-id: 2baa44e211ce37fbdf01585344c98745f5ba008c

Updated bbox_transform and nms unit test for caffe2 ops. (#16722)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16722

Updated bbox_transform and nms unit test for caffe2 ops.

Differential Revision: D13937416

fbshipit-source-id: 034743d29671c6e73d323a935e2d734ecc071bff

Extend support for exporting reshape to onnx. (#16971)

Summary:
Resolve issue with reshape_as test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16971

Differential Revision: D14098871

Pulled By: houseroad

fbshipit-source-id: ed6b966821462d374313256abbbe27f96ce11b2c

add std to autodiff, and mean/var/std to operator set (#17137)

Summary:
supersedes #16684
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17137

Differential Revision: D14096724

Pulled By: wanchaol

fbshipit-source-id: d801d70029a6a1f5851400ff4094c0299c102b2b