review.tizen.org Git - platform/upstream/pytorch.git/log

Support tracing GenericList (#15969)

Summary:
Treat GenericList similarly to tuples and TensorList: recursively unpack them and assignValueTrace accordingly. Also add interpreter support for ListUnpack on GenericList
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15969
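
The recursive unpacking described in the summary above can be sketched in plain Python. This is only an illustration of the idea: `unpack_leaves` is a hypothetical helper, not the tracer's API, and the real code operates on IValues and calls `assignValueTrace`, which is not shown here.

```python
from typing import Any, List, Tuple

Path = Tuple[int, ...]


def unpack_leaves(value: Any, path: Path = ()) -> List[Tuple[Path, Any]]:
    """Recursively unpack nested lists/tuples into (path, leaf) pairs.

    Hypothetical stand-in for the tracer's recursion: containers are
    descended into, everything else is treated as a leaf value.
    """
    if isinstance(value, (list, tuple)):
        leaves: List[Tuple[Path, Any]] = []
        for index, item in enumerate(value):
            leaves.extend(unpack_leaves(item, path + (index,)))
        return leaves
    return [(path, value)]


# A list that contains a tuple that contains another list.
nested = [1.0, (2.0, [3.0, 4.0])]
leaves = unpack_leaves(nested)
```

Each leaf keeps the index path that leads to it, which is the information a tracer would need to associate a traced value with its position in the container.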

Differential Revision: D13665139

Pulled By: jamesr66a

fbshipit-source-id: cd8cb3dd7475f424e48a69d217f2eac529df9f6a

s/fwdproxy.any/fwdproxy/g in fbsource (#16024)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16024

codemod with 'Yes to all': s/fwdproxy.any/fwdproxy/g in fbsource

Reviewed By: maxgeorg

Differential Revision: D13666336

fbshipit-source-id: a5a694d66efec5304a1c8c231d638441f88efe1d

update of fbcode/onnx to 84a0441ae28795a928005863dc142bee81827566 (#16046)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16046

Previous import was 7abd834091f1024c11749dcfd25126802db9fdd5

Included changes:
- **[84a0441](https://github.com/onnx/onnx/commit/84a0441)**: Clarify namescopes in the presence of nested subgraphs (#1665) <G. Ramalingam>
- **[118fec5](https://github.com/onnx/onnx/commit/118fec5)**: Add Where op. (#1569) <Sergii Dymchenko>
- **[beefa15](https://github.com/onnx/onnx/commit/beefa15)**: Use strings directly for casing as np.object w/o redundant StringHolder. (#1736) <Dmitri Smirnov>
- **[4023bae](https://github.com/onnx/onnx/commit/4023bae)**: Add a capability to input/output unicode strings (#1734) <Dmitri Smirnov>
- **[1a8a7fc](https://github.com/onnx/onnx/commit/1a8a7fc)**: typos fixed: iutput -> input (#1726) <Beomsoo Kim>
- **[0128478](https://github.com/onnx/onnx/commit/0128478)**: Scan test update (#1732) <G. Ramalingam>
- **[c6a24fd](https://github.com/onnx/onnx/commit/c6a24fd)**: turn rtol to 0.002 on densenet121, since AMD and Nvidia GPU's precision difference (#1733) <Lu Fang>
- **[5b7ac72](https://github.com/onnx/onnx/commit/5b7ac72)**: Add Shrink operator (#1622) <Rui Zhu>

Reviewed By: yinghai

Differential Revision: D13676711

fbshipit-source-id: 513cc137223469b47af48919432aaecf58006012

Add count_include_pad to average_pool_gradient_op (#15997)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15997

Add count_include_pad to average_pool_gradient_op

Reviewed By: houseroad

Differential Revision: D13648339

fbshipit-source-id: 205cb2acb32dc24a85256b628298b1a11f0ffa2c

Remove cuda from autograd profiler (#15898)

Summary:
This puts stubs in the autograd profiler for the use of cuda APIs allowing the cuda parts of libtorch to be linked separately from the CPU parts.

This also edits the buck build.

Previous:

For GPU builds:
_C -> csrc -> caffe2
For CPU builds:
_C -> csrc-cpu -> caffe2

Now:
GPU:
_C -> libtorch_cuda -> (libtorch -> caffe2, for CPU)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/15898

Reviewed By: ailzhang

Differential Revision: D13617991

Pulled By: zdevito

fbshipit-source-id: 6d84a50bb356a54b4217f93219902755601b00e1

Fix namespace typo. (#16021)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16021

Adds nom:: so that TRIVIAL_CONVERTER works more generally.

Reviewed By: janewangfb

Differential Revision: D13664748

fbshipit-source-id: 100f47a8326e41bd0ac2ae281669f5a0363fe060

Fixing missing cpp tests for Caffe2 setup.py builds (#16037)

Summary:
These were broken (always skipped in setup.py builds) by https://github.com/pytorch/pytorch/pull/15917
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16037

Differential Revision: D13675549

Pulled By: pjh5

fbshipit-source-id: fed50855dd0b5d0c80fface3d8b2156f18aae4e7

Test cases for calling caffe2 LayerNorm from PyTorch and JIT

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15895

Reviewed By: dzhulgakov

Differential Revision: D13615336

fbshipit-source-id: de28fef8ce025d6d37a4c80c029ec97b7195cfd9

Enhance cpu support on gloo based multi-nodes mode. (#11330)

Summary:
1. Add some gloo communication operators to the related fallback list;
2. Work around compile errors when using a fallback operator whose CPU operator inherits directly from 'OperatorBase', such as PrefetchOperator;
3. Add new cpu context support for some python module files and the resnet50 training example file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11330

Reviewed By: yinghai

Differential Revision: D13624519

Pulled By: wesolwsk

fbshipit-source-id: ce39d57ddb8cd7786db2e873bfe954069d972f4f

Constant prop prim::None (#15979)

Summary:
Previously we were only constant propping prim::Constants, but we should be constant propping prim::None as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15979

Differential Revision: D13664692

Pulled By: eellison

fbshipit-source-id: 01839403576c21fc030c427e49275b8e1210fa8f

Add a note about THNN height/width/etc argument reordering. (#15819)

Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15819

Differential Revision: D13665297

Pulled By: ezyang

fbshipit-source-id: 4570275bc9e65269788f836f2447d09474cefeff

Fix Python path finding for benchmark tests

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16022

Differential Revision: D13673792

Pulled By: pjh5

fbshipit-source-id: 177a823ef343b7f60e26ad9ef51415332045438d

Quantized RNNCell modules (#15469)

Summary:
Similarly to https://github.com/pytorch/pytorch/pull/13777, we apply post-processing quantization to RNN cell modules (`RNNCell`, `LSTMCell`, and `GRUCell`).

A further follow-up PR will involve quantizing the full `RNN`, `GRU`, and `LSTM` modules. This depends on those modules being scriptable as part of the standard library scripting effort, though. Note that infrastructure in this PR such as `gather_quantized_params` is currently unused but should be used in the future when we can port over the full RNN modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15469

Differential Revision: D13545802

Pulled By: jamesr66a

fbshipit-source-id: ad3b694517842893ea619438e9f5e88fd7b96510

Miscellaneous broken RSTs fixed (#16033)

Summary:
https://pytorch.org/docs/master/tensors.html#torch.Tensor.bernoulli_
https://pytorch.org/docs/master/torch.html#torch.addmm
https://pytorch.org/docs/master/distributed_deprecated.html#torch.distributed.deprecated.reduce_multigpu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16033

Differential Revision: D13671202

Pulled By: soumith

fbshipit-source-id: 276e10e610affe205376573e7f0f9894695d218d

Add PyTorchPredictorContainer (#15899)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15899

Add PyTorchPredictorContainer to support multiple jit script modules

Reviewed By: pritamdamania87

Differential Revision: D13596139

fbshipit-source-id: 3ce0bdf2f4dbba7aa1d20e824d03e5ac98f5d887

Add `itertools.{prod, combinations, combinations_with_replacement}` like op to pytorch (#9393)

Summary:
closes https://github.com/pytorch/pytorch/issues/7580
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9393
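
For reference, the Python standard-library functions that the commit title alludes to behave as follows (the stdlib name is `itertools.product`). This only shows the `itertools` semantics on plain lists; the tensor ops added by the PR are not shown.

```python
import itertools

a = [1, 2]
b = [3, 4]

# Cartesian product: every pairing of an element of a with an element of b.
product = list(itertools.product(a, b))

# All 2-element combinations without repetition, in input order.
pairs = list(itertools.combinations([1, 2, 3], 2))

# Same, but an element may be paired with itself.
pairs_with_replacement = list(
    itertools.combinations_with_replacement([1, 2, 3], 2)
)
```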

Differential Revision: D13659628

Pulled By: zou3519

fbshipit-source-id: 3a233befa785709395a793ba8833413be394a6fd

use fbgemm gconv in dnnlowp (#16020)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16020

Needs to go over more iterations. For conv, I think we need a high-level interface that abstracts out low-level details of which code path will be taken (acc16, outlier-aware, depth-wise, group conv, ...); otherwise the client code will be complex, as can be seen from the DNNLOWP Conv ops. This will also help us make the interface more stable.

Reviewed By: dskhudia, jianyuh

Differential Revision: D13588996

fbshipit-source-id: 9afce9e441bcaf20437fcc2874fb9d4165a46bcb

`var` for multiple dimensions (#15892)

Summary:
Timings are the same as for `std`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15892

Differential Revision: D13651173

Pulled By: umanwizard

fbshipit-source-id: a26bf1021dd972aa9e3e60fb901cd4983bfa190f

Updating submodules

Reviewed By: yns88

fbshipit-source-id: 19841cff4a7fd69318d7828db75c16cd75757edd

Updating submodules

Reviewed By: yns88

fbshipit-source-id: 68b7c41366618ffd636c2b9c45c7ffbbcbc44f85

nomnigraph - easy - use new test utils in converter_nomnigraph_test (#15751)

Summary:
Use new test utils in converter_nomnigraph_test , and add utils to set device option name, external inputs, outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15751

Differential Revision: D13586228

Pulled By: duc0

fbshipit-source-id: ff809dd7bf9f30641ce2a6fef7e2810f005521c2

Remove code duplication (#15880)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15880

The layer_norm reference was implemented twice. Removing one of them.

Reviewed By: dzhulgakov

Differential Revision: D13611232

fbshipit-source-id: cee96c78d3255c3a4e34300693bf9260cf096615

Fix ormqr docs, fixes #15565 (#15694)

Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
cc meganset
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15694

Differential Revision: D13573064

Pulled By: zou3519

fbshipit-source-id: 1d0b693d7c26db91826b81e6c98b45a69b5e9bc4

Fix c10d checking errno unconditionally (#15986)

Summary:
In #15964, I learned that `errno` is only meaningful if the function call fails. For example, on some macOS versions, a successful `fork()` sets `errno` to `EINVAL` in the child process. This commit changes the `SYSCALL` macro so that error checking is done only when an error happens. For most calls this means checking whether `rv == -1`, but for `inet_ntop` it means checking whether `rv == nullptr`.

Now `SYSCALL` accepts a second argument `success_cond`, which should be an expression returning whether the call succeeded. `SYSCHECK_ERR_RETURN_NEG1` is the shorthand for checking if rv is `-1`.

Any suggestion on better macro names is welcomed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15986
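
The rule in this summary (test the return value first, and look at `errno` only on failure) can be simulated in plain Python. This is a sketch of the control flow only, not the c10d macro: `checked_call` and the `fake_*` functions are hypothetical, and each fake call returns the pair (return value, errno left behind by the call).

```python
import errno
import os


def checked_call(call, success_cond):
    """Run call() and consult its errno only if success_cond(rv) is false.

    `call` is a hypothetical stand-in for a system call; it returns
    (rv, errno_after_call) so that a stale errno can be simulated.
    """
    rv, err = call()
    if success_cond(rv):
        return rv  # success: whatever errno holds is ignored
    raise OSError(err, os.strerror(err))


def fake_fork_child():
    # Succeeds (rv == 0) but leaves a stale EINVAL in errno.
    return 0, errno.EINVAL


def fake_connect():
    # Fails (rv == -1) with a meaningful errno.
    return -1, errno.ECONNREFUSED


def fake_inet_ntop():
    # Pointer-returning call: None plays the role of nullptr on failure.
    return "127.0.0.1", errno.EINVAL


child_rv = checked_call(fake_fork_child, lambda rv: rv != -1)
address = checked_call(fake_inet_ntop, lambda rv: rv is not None)

try:
    checked_call(fake_connect, lambda rv: rv != -1)
    failure = None
except OSError as exc:
    failure = exc.errno
```

Checking `errno` unconditionally would have reported an error for the successful `fake_fork_child` call; passing the success condition explicitly avoids that.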

Reviewed By: janewangfb

Differential Revision: D13661790

Pulled By: pietern

fbshipit-source-id: 9551b14b9f88805454a7bfb8e4d39e0f3aed8131

add tensor.to to script (#15976)

Summary:
Previously it only worked with keyword arguments. Now it is fully compatible.

Fix for: https://github.com/pytorch/pytorch/issues/15478
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15976

Differential Revision: D13643979

Pulled By: eellison

fbshipit-source-id: 6a47bce7db362da80452adffebd2732f8e62a240

Split Caffe2 CI into cmake-only and python builds (#15917)

Summary:
bypass-lint

- Change all Caffe2 builds to use setup.py instead of cmake
- Add a -cmake- Caffe2 build configuration that uses cmake and only builds cpp
- Move skipIfCI logic from onnx test scripts to the rest of CI logic
- Removal of old PYTHONPATH/LD_LIBRARY_PATH/etc. env management
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15917

Reviewed By: orionr

Differential Revision: D13637583

Pulled By: pjh5

fbshipit-source-id: c5c5639db0251ba12b6e4b51b2ac3b26a8953153

Make call operator on module holder call forward (#15831)

Summary:
In Python, you can use the call operator to invoke the `forward()` method of a module. In C++ this was previously not possible, because I couldn't figure out how to deduce the return type of a module's `forward()` method under the constraint that `forward()` may not exist at all (since the base module class in C++ does not mandate a `forward()` method). I have now figured it out, so the call operator can be used.

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15831
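
The Python behavior that this change mirrors can be shown with a minimal stand-in class. `Module` and `Scale` below are hypothetical and deliberately tiny; `torch.nn.Module.__call__` does more than this (for example, it runs hooks), and the C++ return-type deduction is not shown.

```python
class Module:
    """Hypothetical minimal base class; it does not define forward()."""

    def __call__(self, *args, **kwargs):
        # The call operator simply delegates to the subclass's forward().
        return self.forward(*args, **kwargs)


class Scale(Module):
    def __init__(self, factor):
        self.factor = factor

    def forward(self, x):
        return self.factor * x


scale = Scale(3)
result = scale(4)  # same as scale.forward(4)

# The base class has no forward(), so calling it fails at call time.
try:
    Module()(1)
    base_call_failed = False
except AttributeError:
    base_call_failed = True
```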

Differential Revision: D13652676

Pulled By: goldsborough

fbshipit-source-id: ccab45a15215dda56460e560f0038781b539135f

Updating submodules

Reviewed By: yns88

fbshipit-source-id: 0e31357e8a34614226e8948ae76d67e0786a9196

Fix broken rst of torch.nn.utils.spectral_norm and others (#15995)

Summary:
- Currently, the [rst](https://pytorch.org/docs/stable/nn.html#torch.nn.utils.spectral_norm) looks broken, at least in my browser. So I fixed it.
- I thought a subscript may be needed to the left W in the definition.
- A few typos fixed.

crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15995

Differential Revision: D13649888

Pulled By: soumith

fbshipit-source-id: 00a2c3b043c7c8ebdd9fc2bf77ba27ae695fee3f

Add cuda.reset_max_memory_* (#15985)

Summary:
Addresses #15968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15985

Differential Revision: D13649916

Pulled By: soumith

fbshipit-source-id: a207aea5709a79dba7a6fc541d0a70103f49efff

libshm retry on EINTR (#15964)

Summary:
fixes https://github.com/pytorch/pytorch/issues/14314
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15964

Differential Revision: D13639034

Pulled By: soumith

fbshipit-source-id: 44592762aa46982e5d3616d55b5666a2c2ce9105

Improved the documentation for torch.nn.functional.pad (#15984)

Summary:
- Fixed a few typos and grammar errors.
- Changed the sentences a bit.
- Changed the format of the tuples to be consistent with padding notations in the other places. For example, `ReflectionPad2d`'s docstring contains :math:`H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}`.

I also made sure that the generated html doesn't break.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15984
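
The padding notation quoted above can be worked through as shape arithmetic. The sketch below assumes the usual `torch.nn.functional.pad` convention that pad values come in pairs starting from the last dimension, i.e. `(padding_left, padding_right, padding_top, padding_bottom)` for the last two dimensions; `padded_shape` is a hypothetical helper that only computes shapes and pads nothing.

```python
def padded_shape(shape, pad):
    """Return the output shape after padding.

    pad holds (before, after) pairs; the first pair applies to the last
    dimension, the second pair to the second-to-last, and so on.
    """
    if len(pad) % 2 != 0 or len(pad) // 2 > len(shape):
        raise ValueError("pad must hold one (before, after) pair per padded dim")
    out = list(shape)
    for i in range(len(pad) // 2):
        dim = len(shape) - 1 - i
        out[dim] += pad[2 * i] + pad[2 * i + 1]
    return tuple(out)


# N x C x H x W input, padded by (left=1, right=1, top=2, bottom=0):
# W_out = 5 + 1 + 1 = 7 and H_out = 4 + 2 + 0 = 6.
shape = padded_shape((1, 3, 4, 5), (1, 1, 2, 0))
```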

Differential Revision: D13649939

Pulled By: soumith

fbshipit-source-id: 0abfa22a7bf1cbc6546ac4859652ce8741d41232

Improve the docstring of nn.random.fork_rng (#15960)

Summary:
Improved the docstring of nn.random.fork_rng
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15960

Differential Revision: D13649929

Pulled By: soumith

fbshipit-source-id: d3843179a2f1f838792c2f07f34deda2c06af56e

doc fixes (#15990)

Summary: fixes #15597 , #15283 and #10258

Differential Revision: D13649905

Pulled By: soumith

fbshipit-source-id: 753f46c2c96c61fba460019d9ed3e0d047d42ee7

simplify lambda function use in conv dnnlowp ops to fix #15911 (#15996)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15996

As reported in issue #15911, gcc 4.9 was getting internal compiler error due to a complex use of lambda function in conv_dnnlowp_op.cc and conv_acc16_op.cc . This diff simplifies them.

Reviewed By: viswanathgs

Differential Revision: D13648264

fbshipit-source-id: 1551ae8a0a7653749185dca51ccceb2471b96b82

fix RandomSampler length (#15991)

Summary:
Hi!

This PR addresses #15537 issue.
Please review.

Thanks!

Differential Revision: D13649890

Pulled By: soumith

fbshipit-source-id: 166212ae383331345423236dfc4fa2ea907d265d

Fix static build on Windows (#15989)

Summary:
Tested locally. It can now be started by running `set EXTRA_CAFFE2_CMAKE_FLAGS= -DTORCH_STATIC=1` before the build. If we want to make sure it works, then maybe we should add it to CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15989

Differential Revision: D13649935

Pulled By: soumith

fbshipit-source-id: 956945ed572819d8cf0bc9bd48df3ea9bc6f4a8a

Caffe 2: Reshape Op upgrade (#15380)

Summary:
This is a follow-up to #13945, where we had to turn off some TRT tests because some ops were not ready to accept ONNX opset 9+ models. This PR fixes Reshape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15380

Differential Revision: D13649825

Pulled By: houseroad

fbshipit-source-id: b72e62803de5b63cc001c3fe4b3bf64dfa996e94

fix compile error reported in issue #15911 (#15953)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15953

Fix issue reported in https://github.com/pytorch/pytorch/issues/15911

Reviewed By: csummersea

Differential Revision: D13633256

fbshipit-source-id: 3808f100ff7dedfe5e20708e72e6081ff07eb32c

Back out "[pt1][tensor] Remove caffe2::ShareData" (#15983)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15983

Original commit changeset: 6e4275d02f4c

Reviewed By: supertopher, Yangqing

Differential Revision: D13644123

fbshipit-source-id: 4b15a4c62995c0e68aad58465600409e302e6504

Remove StopGradient op when it is inplace in inference (#12152)

Summary:
For inference, if the StopGradient op is in-place, we just remove it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12152

Differential Revision: D13633946

Pulled By: yinghai

fbshipit-source-id: 57762bcc37b38a1d39cb4af316ca50bfe961b105

Add global pooling specialization and also update MaxPooling on GPU (#15824)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15824

Add global pooling specialization and also update MaxPooling on GPU

Reviewed By: houseroad

Differential Revision: D13596340

fbshipit-source-id: c8a42aa69ee92c383c9f19d3ed57b77cb3e5bd28

AliasDB interface cleanup (#15656)

Summary:
This is the first of several PRs to simplify AliasDb usage.
- Hide the concept of wildcards from users. They are too hard to think about and too easy to forget about.
- Start moving "mutability-safe" graph mutation methods into AliasDb (right now, the various methods that deal with topological move).

Eventually I want to create a "mutability-aware" handle to the graph. If you only use that handle to transform the graph, you can be sure that all transformations are safe with respect to mutability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15656

Differential Revision: D13615492

Pulled By: suo

fbshipit-source-id: 5c39a157b4ea76f1f976315d06a314a89cc4f22f

Updating submodules

Reviewed By: zpao

fbshipit-source-id: 2671ea6bb594280a9d3352fbfa3628f28c6847aa

Add the normalize transform to the core library (#15891)

Summary:
Adds the `Normalize` transform to the core C++ frontend library.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15891

Differential Revision: D13642167

Pulled By: goldsborough

fbshipit-source-id: 573428e626d6106cf2aadf3dc2e2aecb9a85efc3

3x3x3 depthwise convolution with per channel quantization (#15775)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15775

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/55

fbgemm didn't have per-channel quantization for 3x3x3 depth-wise convolution

Reviewed By: jianyuh

Differential Revision: D13587438

fbshipit-source-id: 91c36fae7a0e8386e3bc49808e18918b01681dd1

Make it consistent for OperatorBase usage (#15908)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15908

"OperatorBase::" is changed to "this->template ".

For example,

  // This no longer works
  OperatorBase::GetSingleArgument<>()
  // Should change to:
  this->template GetSingleArgument<>()

https://fb.workplace.com/groups/101100140348621/permalink/576804082778222/

Follow up of D13574832.

Sample Diff:
D9319742, D10045844.

Reviewed By: jspark1105

Differential Revision: D13613574

fbshipit-source-id: 2cb4094557b4af78d41e289816cad3e1194fb82c

rocm build (#15981)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15981

caffe2/operators/unique_ops.cu translated to caffe2/operators/hip/unique_ops.hip breaks rocm build

Reviewed By: BIT-silence

Differential Revision: D13646129

fbshipit-source-id: 900a14e14216686ec4560b30df2eabbd7ec2ff91

Updating submodules

Reviewed By: zpao

fbshipit-source-id: 3bbf550cb0bfe71c05b73b8bc4ce97285b50608b

Tensor construction codemod(ResizeLike) - 2/3 (#15940)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15940

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629047

fbshipit-source-id: 5f0641a9aaab9045fa63c32c6a07a4cab3340cc3

Fixed typo in batchnorm docstrings

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15975

Differential Revision: D13642271

Pulled By: soumith

fbshipit-source-id: 60ffa392bf1f916f2b93c943bb44a642a9815c42

Tensor reinitialization codemod - 4/5 (#15967)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15967

Codemod generated with clangr shard mode, 25 files per diff,
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586735

fbshipit-source-id: eae2d79e1107a2e813ce3809e690af4706aaa9ca

Fix the lint (#15973)

Summary:
Fix the lint error introduced in https://github.com/pytorch/pytorch/pull/15965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15973

Differential Revision: D13640856

Pulled By: houseroad

fbshipit-source-id: 3f14d9898dcfb0fc469468f63fa1461c88b66b2e

Tensor reinitialization codemod - 2/5 (#15947)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15947

Codemod generated with clangr shard mode, 25 files per diff,
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, then
call `ReinitializeTensor` to initialize it.
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13586732

fbshipit-source-id: 5295ab27ca0155f96a4fccf9c0ba8a609101ba24

Expose dim() on type and use it in ONNX symbolics (#15933)

Summary:
While integrating fork/join into production translation, we found that trying to export `transpose()` where the input is of `TensorType` (rather than `CompleteTensorType`) failed. This is not ideal, since `TensorType` still contains the number of dimensions of the tensor, and that's all the `transpose` symbolic needs.

This PR introduces a pybind binding for `dim()` on `TensorType` (and `CompleteTensorType` by inheritance). We now use this in places where it logically makes sense in the symbolics: those symbolics which only require knowledge of the number of dimensions rather than concrete sizes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15933

Differential Revision: D13639657

Pulled By: jamesr66a

fbshipit-source-id: 6e50e407e93060085fd00a686a928764d0ec888d

Tensor construction codemod(ResizeLike) - 3/3 (#15943)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15943

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13629082

fbshipit-source-id: d3863615fd612f73bb73ac67159fd0f0d237fe5c

FC shape inference should use int64_t (#15961)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15961

as title
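
The commit gives no rationale beyond its title. One presumable motivation is that a product of dimensions can exceed the 32-bit range, which the following sketch illustrates. `product_as` is a hypothetical helper; it relies on `ctypes` integer types truncating silently to their bit width.

```python
import ctypes


def product_as(ctype, dims):
    """Multiply dims, truncating to the width of `ctype` after each step."""
    total = 1
    for dim in dims:
        # ctypes integers do no overflow checking; they wrap to their width.
        total = ctype(total * dim).value
    return total


dims = [65536, 65536]  # 2**16 * 2**16 == 2**32

as_int32 = product_as(ctypes.c_int32, dims)  # wraps around
as_int64 = product_as(ctypes.c_int64, dims)  # fits
```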

Reviewed By: yinghai

Differential Revision: D13634427

fbshipit-source-id: ec7d168b6272f0dac8a693401cfd0bea368f929a

Undo norm optimizations and add more documentation for parallel.h (#15885)

Summary:
See https://github.com/pytorch/pytorch/issues/15602
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15885

Differential Revision: D13614841

Pulled By: cpuhrsch

fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf

Add/fallback some operators for mkl-dnn (#11696)

Summary:
Implement the LeakyRelu operator for mkl-dnn; the speed-up of a single operation is up to 10X on BDW.
Implement the reshape operator for mkl-dnn; this resolves an occasional crash seen when using the fallback reshape operator.
Implement the CreateBlobQueue and SafeEnqueueBlobs operators; this resolves a crash seen when using the fallback operators.
Fall back the CreateBlobsQueueDBOp, TensorProtosDBInput, and CloseBlobsQueue operators.
Implement the adam operator for mkl-dnn; the speed-up of a single operator is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11696

Reviewed By: yinghai

Differential Revision: D10100438

Pulled By: wesolwsk

fbshipit-source-id: 0b6e06897cc11e0a8e349d80a870b1e72e47f10d

Don't call cudaStreamDestroy at destruction time (#15692)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15692

It was leading to occasional crashes with dynamically linked CUDA because the runtime was already destroyed.

Also, unique_ptr<T[]> is more suitable than deque<T> for the purpose.

Reviewed By: Yangqing

Differential Revision: D13571988

fbshipit-source-id: 37eb26dfbe361c49160367b53f87bd037c6c0e46

Tensor construction codemod(ResizeLike) - 1/3 (#15944)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15944

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: dzhulgakov

Differential Revision: D13628999

fbshipit-source-id: e17c44cec6746674dfd5c2a89c28c4ac0a3da450

Move nightly binary builds to 05:05 UTC (#15966)

Summary:
This corresponds to 00:05 EST
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15966
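
The conversion stated in the summary can be checked with a fixed UTC-5 offset. The date below is arbitrary and chosen only for illustration. A fixed offset is used on purpose: under daylight saving time (EDT, UTC-4) the same UTC time would correspond to 01:05 local time.

```python
from datetime import datetime, timedelta, timezone

# Eastern Standard Time as a fixed offset (no daylight-saving handling).
EST = timezone(timedelta(hours=-5), "EST")

# Arbitrary example date; only the time of day matters here.
build_utc = datetime(2019, 1, 15, 5, 5, tzinfo=timezone.utc)
build_est = build_utc.astimezone(EST)
```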

Differential Revision: D13639027

Pulled By: pjh5

fbshipit-source-id: 6685a7af74329b2730e519afd10e350ef2258f32

Add backend checks for batch norm (#15955)

Summary:
Fixes #15826

Changelog:
- Add backend checks in `batch_norm_cpu` and `batch_norm_cuda`
- Modify check in `checkBackend` to pass on undefined tensors.

Differential Revision: D13636410

Pulled By: soumith

fbshipit-source-id: 3b1cfe5ca8b7c0346569077163503065e75c2659

Add scalar_type_to_pytorch_type dict in ONNX symbolic

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15965

Differential Revision: D13637521

Pulled By: zrphercule

fbshipit-source-id: 922cadc56f6380f67c14444cff4aa354a87150af

Register CPU/CUDA fuser dynamically (#15887)

Summary:
This avoids a bunch of conditional compilation logic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15887

Reviewed By: eellison

Differential Revision: D13613239

Pulled By: zdevito

fbshipit-source-id: a18fc69676b3ef19b4469ab58d8714d1f6efccbb

Simplify cat fusion (#15633)

Summary:
This makes the definition of a "fusable node" much simpler,
as we don't need to keep considering whether something has to be an
"exit node" at every step. The fuser now tries to maximize the
pointwise fusions first, and proceeds to prepending chunks and appending
concats only once a fix point is reached.

This patch not only makes the fuser much simpler to reason about, but
also makes it significantly easier to implement features like SumToSize
fusion, to improve performance of derivative graphs.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15633

Differential Revision: D13575306

Pulled By: zou3519

fbshipit-source-id: 0c55ea61d65d1f1ed3d75a8e1e83bc85a83f3aff

Add bindings for .cpu() & .cuda() to script (#15904)

Summary:
Adding bindings for .cpu() and .cuda() to script.

It's worth noting that if the device remains unchanged, then the returned tensor aliases the input, but if it does change, then they do not alias each other.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15904

Differential Revision: D13632879

Pulled By: eellison

fbshipit-source-id: 024a04f267909674aa1e510562efd9cb081f407c

comment out large test cases for tril(u)_indices (#15959)

Summary:
4GB is still too large and leads to CUDA OOM failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15959

Differential Revision: D13635146

Pulled By: mrshenli

fbshipit-source-id: 3dc34a03d6ed65c458839d8fa37cd05bf3bc8106

update of fbcode/onnx to 7abd834091f1024c11749dcfd25126802db9fdd5 (#15942)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15942

Previous import was 8384c788939bc65463f9754b6a7a00b212b18ba1

Included changes:
- **[7abd834](https://github.com/onnx/onnx/commit/7abd834)**: Clarify some aspects of the Loop spec. (#1587) <Scott McKay>
- **[5a5b15f](https://github.com/onnx/onnx/commit/5a5b15f)**: Support rtol and atol at the model granularity (#1723) <Lu Fang>
- **[ba76e45](https://github.com/onnx/onnx/commit/ba76e45)**: print some information (#1724) <Lu Fang>
- **[797390d](https://github.com/onnx/onnx/commit/797390d)**: Update README.md (#1722) <Prasanth Pulavarthi>
- **[40cdb5f](https://github.com/onnx/onnx/commit/40cdb5f)**: repair convtranspose shape inference (#1660) <peter yang>
- **[68fdb3f](https://github.com/onnx/onnx/commit/68fdb3f)**: [Minor] Fix Windows line ending in test coverage generating script (#1717) <Raymond Yang>
- **[00101bf](https://github.com/onnx/onnx/commit/00101bf)**: Remove ConstantLike op. Updates to ConstantOfShape op. (#1716) <Spandan Tiwari>
- **[c59e90a](https://github.com/onnx/onnx/commit/c59e90a)**: add a shape inference test for group conv (#1719) <Lu Fang>

Reviewed By: zrphercule

Differential Revision: D13629499

fbshipit-source-id: 4b3e4cb29bdb84c3777a8fb26263548efb20f317

Match NumPy by considering NaNs to be larger than any number when sorting (#15886)

Summary:
Fixes #15764
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15886
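
The ordering described in the title (NaN compares as larger than every number, so NaNs end up last in an ascending sort) can be reproduced in plain Python with a sort key. This is a sketch of the semantics only, not the PyTorch implementation; `nan_last_key` is a hypothetical helper.

```python
import math


def nan_last_key(x):
    """Sort key that places NaN after every other float, including +inf."""
    # The flag sorts first: False (a number) < True (NaN). NaNs get a dummy
    # second element so the key never compares NaN with NaN.
    return (True, 0.0) if math.isnan(x) else (False, x)


values = [3.0, float("nan"), -1.0, float("inf"), 2.0]
ordered = sorted(values, key=nan_last_key)
```

Without such a key, sorting a list that contains NaN gives an order that depends on where the NaN happened to be, because every comparison against NaN is false.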

Differential Revision: D13612971

Pulled By: umanwizard

fbshipit-source-id: 91f552a25d1fd108f2f0b10e09a0ce0364f8c21e

Port empty_strided to ATen. (#15948)

Summary:
Turns out this has basically been implemented already in Resize.h / Resize.cuh.
Also added some testing, basically just to check that empty_strided behaves equivalently to as_strided.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15948
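
As background for why a strided allocation is closely related to resizing, the arithmetic behind sizes and strides can be sketched as follows. Both helpers are hypothetical and show only the general indexing rule, not the ATen code in Resize.h: an element at index `(i, j, ...)` lives at offset `i * stride_0 + j * stride_1 + ...`, so the storage must hold at least one more element than the largest reachable offset.

```python
def storage_offset(index, strides):
    """Offset (in elements) of a multi-dimensional index within storage."""
    return sum(i * stride for i, stride in zip(index, strides))


def required_storage_size(sizes, strides):
    """Minimum number of elements needed to back the given sizes/strides."""
    if any(size == 0 for size in sizes):
        return 0  # an empty tensor touches no storage
    # The largest offset is reached at index (size - 1) in every dimension.
    return 1 + sum((size - 1) * stride for size, stride in zip(sizes, strides))


row_major = required_storage_size((2, 3), (3, 1))      # contiguous layout
column_major = required_storage_size((2, 3), (1, 2))   # transposed layout
padded_rows = required_storage_size((2, 3), (4, 1))    # one unused slot per row
last_element = storage_offset((1, 2), (3, 1))
```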

Differential Revision: D13631098

Pulled By: gchanan

fbshipit-source-id: eb0e04eead45e4cff393ebde340f9d265779e185

Move cudaDeviceProp to ATen (#14834)

Summary:
This PR moves `deviceProperties` from `THCState` struct to `CUDAContext` in ATen and hence, takes one more step towards removing `THCState`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14834

Differential Revision: D13633956

Pulled By: soumith

fbshipit-source-id: 51820ac224fc566f17aa92570fd378cff4248596

Trivial typo fixings in nn.functional dropout* docstrings (#15951)

Summary:
Defualt -> Default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15951

Differential Revision: D13633875

Pulled By: soumith

fbshipit-source-id: 0da823ef235418396e9322089f6610b592e6990f

Resolves ptxas warnings when compiling for CUDA_ARCH 750 and a memoryType deprecation warning (#15461)

Summary:
When compiling for `TORCH_CUDA_ARCH_LIST=7.5` we were getting ptxas warnings (https://github.com/pytorch/pytorch/issues/14310). This was because we had some hardcoded values when using launch_bounds in kernels. The maximum number of threads per multiprocessor is 1024 for Turing architecture (7.5) but 2048 for previous architectures. The hardcoded launch_bounds in the kernels were requesting 2048 threads when compiling for Turing and hence were generating the warning.

This PR adds a macro that checks the launch bounds value supplied. The max number of threads per block across all architectures is 1024. If a user supplies more than 1024, I just clamp it down to 512. Depending on this value, I set the minimum number of blocks per sm. This PR should resolve https://github.com/pytorch/pytorch/issues/14310. The wrong gradient computation reported in that issue is probably due to a faulty card.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15461
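
The clamping described in this summary can be sketched as plain arithmetic. The limits (1024 threads per block, 1024 or 2048 threads per multiprocessor, fallback of 512) come from the summary above; the function names and the exact rule for the minimum block count are assumptions made for illustration, not the macro added by the PR.

```python
MAX_THREADS_PER_BLOCK = 1024      # limit across all architectures (per the summary)
FALLBACK_THREADS_PER_BLOCK = 512  # value the summary says it clamps to


def clamp_threads_per_block(requested):
    """Keep a valid request; replace an oversized one with the fallback."""
    if requested <= MAX_THREADS_PER_BLOCK:
        return requested
    return FALLBACK_THREADS_PER_BLOCK


def min_blocks_per_sm(threads_per_block, requested_blocks, max_threads_per_sm):
    """Assumed rule: never ask for more resident threads than the SM supports."""
    if threads_per_block * requested_blocks <= max_threads_per_sm:
        return requested_blocks
    return max(1, max_threads_per_sm // threads_per_block)


# Two blocks of 1024 threads fit on a 2048-thread SM but not on Turing (1024).
older_arch = min_blocks_per_sm(1024, 2, max_threads_per_sm=2048)
turing = min_blocks_per_sm(1024, 2, max_threads_per_sm=1024)
```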

Differential Revision: D13633952

Pulled By: soumith

fbshipit-source-id: 795aa151109f343ab5433bf3cb070cb6ec896fff

Fix fallback issues to handle inplace case (#15726)

Summary:
Fix fallback issues to handle inplace case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15726

Differential Revision: D13591243

Pulled By: yinghai

fbshipit-source-id: 6897f1daacb36beabcdfc22c39242bbdfdd0e534

Optimize CPU version performance of the nonzero function. (#15925)

Summary:
Same as #15190 but compatible with MSVS compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15925

Differential Revision: D13623473

Pulled By: VitalyFedyunin

fbshipit-source-id: d0db9dbc1a0d8fc9bda08348cb1d3763ae9f8679

Tensor reinitialization codemod - 5/5 (#15884)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15884

Codemod generated with clangr shard mode, 25 files per diff.
To eliminate partially initialized Tensors, we split the initialization of local Tensor variables into two steps: first declare an uninitialized Tensor, and
then call `ReinitializeTensor` to initialize it.
Motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: hyuen

Differential Revision: D13586737

fbshipit-source-id: dc8e49e9f29505b8898bb19f84c1a983f2d811ab

Add backward pass notes for eig() and symeig()

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15929

Differential Revision: D13626158

Pulled By: soumith

fbshipit-source-id: ab869560926036053c39d20b217ccef8767e7d3f

caffe2::Tensor::is_same() (#15407)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15407

Don't ask the tensor for its intrusive pointer if we just want to check if two tensors are the same.
This mirrors ATen APIs.

Reviewed By: dzhulgakov

Differential Revision: D13520389

fbshipit-source-id: 681317f36f480ab60e532bb08a073f98f39770fd

Clean up Half (#15317)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15317

- Merge bitcasts.h and Half.h
- Remove 'static' keyword

Reviewed By: dzhulgakov

Differential Revision: D13498492

fbshipit-source-id: 46d47143e7d3a9d3f4aa7d92379dbba015c97435

Move files to/from c10/core and c10/util (#15316)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15316

This starts cleaning up the files in c10 according to the module structure we decided on.

Move to c10/util:
- Half.h, Half-inl.h, Half.cpp, bitcasts.h

Move to c10/core:
- Device.h, Device.cpp
- DeviceType.h, DeviceType.cpp

i-am-not-moving-c2-to-c10

Reviewed By: dzhulgakov

Differential Revision: D13498493

fbshipit-source-id: dfcf1c490474a12ab950c72ca686b8ad86428f63

Remove Context from c10 operator schemas (#15312)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15312

Context will soon be entirely obsolete. Remove it from the operator schema interface.

Reviewed By: dzhulgakov

Differential Revision: D13495323

fbshipit-source-id: caa0f8f092cd6284e510c3e1e3374fe2f8338364

Enable calling caffe2 LayerNorm from PyTorch and JIT (#15243)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15243

Register it as a custom JIT op.

Reviewed By: dzhulgakov

Differential Revision: D13473791

fbshipit-source-id: 0f7e72e3efc85a75060a7597fadaf0a8bd289651

fix rocm build

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15945

Differential Revision: D13630505

Pulled By: zdevito

fbshipit-source-id: a4d2ae1370ab475fc1711027c0c9d2a9192be195

Remove USE_CUDA and USE_ROCM in engine.cpp

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15893

Differential Revision: D13627319

Pulled By: zdevito

fbshipit-source-id: 7c72c1c6cc242143fb66383423c668c9b9810884

Extend note about contributing to the C++ frontend (#15902)

Summary:
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15902

Differential Revision: D13628525

Pulled By: goldsborough

fbshipit-source-id: 70cf36d1bacd9d689d4fa4f2290886fd3765e89b

Fix different env variables in scheduled runs pt 2 (#15934)

Summary:
Unfortunately I do not know how to test this without merging it first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15934

Reviewed By: orionr

Differential Revision: D13627472

Pulled By: pjh5

fbshipit-source-id: 35eced1483bbf3c0c3f6f62fb7bbbf2f200e50e6

Change PoolOp Functors design to support CuDNN CUDA fallback (#15903)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15903

Change PoolOp Functors design to support CuDNN CUDA fallback

Reviewed By: houseroad

Differential Revision: D13617085

fbshipit-source-id: 8a539d77f35bc47afe5dc8e32aaad52e45cb691c

Fix bug in torch::load and unpack torch::optim::detail namespace (#15926)

Summary:
Optimizer buffers weren't being cleared before new entries were added to them during deserialization, so successive calls to `torch::load` with the same optimizer would just append to the buffer container. Also moved the `serialize()` function from `torch::optim::detail` into `torch::optim` so users can use it for custom optimizers.
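
The failure mode can be illustrated with a small language-neutral sketch. The Python below is not the C++ frontend code; `load_buffers_buggy` and `load_buffers_fixed` are hypothetical helpers that only show why clearing the container before refilling it matters.

```python
# Hypothetical sketch: restoring saved buffers into an existing container.
def load_buffers_buggy(buffers, saved):
    """Appends without clearing, so repeated loads keep growing the container."""
    for entry in saved:
        buffers.append(entry)


def load_buffers_fixed(buffers, saved):
    """Clears the container first, so the result always mirrors the saved state."""
    buffers.clear()
    for entry in saved:
        buffers.append(entry)


if __name__ == "__main__":
    saved = [0.1, 0.2]
    buggy, fixed = [], []
    for _ in range(2):  # two successive loads into the same optimizer
        load_buffers_buggy(buggy, saved)
        load_buffers_fixed(fixed, saved)
    print(len(buggy), len(fixed))
```

After two loads the buggy container holds the saved entries twice, while the fixed one still matches the saved state.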

Fixes #15792

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15926

Differential Revision: D13623615

Pulled By: goldsborough

fbshipit-source-id: e193091f25f56a95f2a9648af312cb7caa45f300

fix aliasing on unwrap optional (#15748)

Summary:
Fix for https://github.com/pytorch/pytorch/issues/15604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15748

Differential Revision: D13583632

Pulled By: eellison

fbshipit-source-id: 9655ee010494179e17e34f3047363477dad15fb1

JIT Batch Norm fusion (#15897)

Summary:
Resubmit of #15146, which was accidentally reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15897

Differential Revision: D13616093

Pulled By: zou3519

fbshipit-source-id: 0c3a3bec8f9fed57274da9f6c7cf40cbc05cf91a

Fix different env variables in scheduled runs

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15927

Reviewed By: orionr

Differential Revision: D13624127

Pulled By: pjh5

fbshipit-source-id: e8b14f0401b0c278a5d17af6d7979800917e3ae6

Allow for registration after GlobalInit (#15876)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15876

Build changes made it so some .so libraries are now registered after GlobalInit is called. Although this shouldn't be common, it also shouldn't be explicitly excluded. These changes allow for late Caffe2 registration, but also warn in that case.
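
The "allow but warn" behavior can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Caffe2 registration code: the `Registry` class and its method names are hypothetical.

```python
import warnings


class Registry:
    """Hypothetical registry that tolerates, but flags, late registration."""

    def __init__(self):
        self._entries = {}
        self._initialized = False

    def global_init(self):
        """Mark the point after which registration counts as late."""
        self._initialized = True

    def register(self, name, obj):
        if self._initialized:
            # Late registration is allowed, but it is unusual enough to warn about.
            warnings.warn(f"'{name}' registered after global_init()", RuntimeWarning)
        self._entries[name] = obj

    def has(self, name):
        return name in self._entries
```

An entry registered after `global_init()` is still stored and usable. The warning only makes the unusual load order visible.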

Reviewed By: kuttas

Differential Revision: D13608186

fbshipit-source-id: 0ca7bcd32516d374077db0c2548cf8c28ccdd5f6

Fix TestDataLoader.test_proper_exit (#15665)

Summary:
Currently, in `test_proper_exit`,
1. we do not kill the correct input `pid` in the `kill_pid` function
https://github.com/pytorch/pytorch/blob/fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L325-L329
2. the Windows command that detects process status doesn't actually work
https://github.com/pytorch/pytorch/blob/fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L641-L646
3. `worker_error` and `worker_kill` cases are (sometimes?) not tested because the workers may exit naturally due to the pre-fetching mechanism and too small a `dataset size / batch size`.

In this PR, in separate commits, I:
1. Install `psutil` (a Python package built specifically for process monitoring) on some CI builds. (Installation on Linux builds is done in https://github.com/pietern/pytorch-dockerfiles/pull/29, https://github.com/pietern/pytorch-dockerfiles/pull/30, https://github.com/pytorch/ossci-job-dsl/pull/36 and https://github.com/pytorch/pytorch/pull/15795.)
2. Rewrite `test_proper_exit` with `psutil` so we
   1. do not rely on the hacky `is_process_alive` https://github.com/pytorch/pytorch/blob/fe15d6a2c231a7bc1b32781217ed336ccf9adff7/test/test_dataloader.py#L640-L653
   2. increase the number of tasks per worker so `worker_error` and `worker_kill` properly trigger
   3. test error message content to ensure that the loader exits with the correct message for each exit scenario.
3. Fix the Windows data loader not having any mechanism to detect worker failures.
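
Why a small `dataset size / batch size` lets workers finish before the test can kill them can be sketched with rough arithmetic. This is an illustration only: `workers_may_exit_early` is a hypothetical helper, and the prefetch depth of 2 batches per worker is an assumption, not a value taken from this PR.

```python
import math


def workers_may_exit_early(dataset_size, batch_size, num_workers, prefetch_per_worker=2):
    """Return True if every batch can be handed out during initial prefetching.

    prefetch_per_worker=2 is an assumed value, not taken from the PR.
    """
    total_batches = math.ceil(dataset_size / batch_size)
    prefetched = num_workers * prefetch_per_worker
    return total_batches <= prefetched


if __name__ == "__main__":
    print(workers_may_exit_early(12, 4, 2))    # 3 batches, 4 prefetch slots
    print(workers_may_exit_early(1000, 4, 2))  # 250 batches, 4 prefetch slots
```

With more tasks per worker, the workers are still busy when the test injects an error or kills one of them.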
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15665

Differential Revision: D13615527

Pulled By: soumith

fbshipit-source-id: cfb2f67837d2d87928a53f00b4d20f09754b7949

Unify flags and environmental variable when building LibTorch/PyTorch (#15868)

Summary:
Fixes #15858.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15868

Differential Revision: D13622354

Pulled By: soumith

fbshipit-source-id: bb8c49520ebf926c6194d42db75accba867018c7

Adding binary builds to circleci

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15577

Reviewed By: orionr

Differential Revision: D13617359

Pulled By: pjh5

fbshipit-source-id: 2b2a1b8735f2af6973a2352bee78912794402ae1

Fix lint

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/15910

Differential Revision: D13620684

Pulled By: houseroad

fbshipit-source-id: af3b1e2fed55ecd3417f66e549fa921bf4fd758e

Make SGD match python (#15840)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/15530
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15840

Differential Revision: D13608503

Pulled By: goldsborough

fbshipit-source-id: aad17c110d64cbe2c126bccd36d228e4108ffa9a

test_jit.py: Speedup EndToEnd tests by reducing workload size. (#15906)

Summary:
Currently these tests take up most of the `test_jit.py` run time. With the
proposed changes, the testing time is reduced by ~75%:

```
TestEndToEndHybridFrontendModels.test_neural_style: 203.360s -> 10.650s
TestEndToEndHybridFrontendModels.test_snli: 422.315s -> 9.152s
TestEndToEndHybridFrontendModels.test_super_resolution: 73.362s -> 19.185s

time python test/test_jit.py (real): 13m50.828s -> 3m11.768s
time python test/test_jit.py (user): 85m59.745s -> 13m18.135s
time python test/test_jit.py (sys): 144m9.028s -> 25m58.019s
```
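
The ~75% figure can be checked against the wall-clock (`real`) numbers above. The helper names in this short Python sketch are illustrative, not part of the PR.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '13m50.828s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def reduction_percent(before, after):
    """Percentage by which the run time dropped."""
    return 100.0 * (1.0 - to_seconds(after) / to_seconds(before))


if __name__ == "__main__":
    print(f"{reduction_percent('13m50.828s', '3m11.768s'):.1f}%")
```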
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15906

Differential Revision: D13619659

Pulled By: ZolotukhinM

fbshipit-source-id: 6c22d8740f8ddb865c3a0667af32653723383816