platform/upstream/pytorch.git
2 years ago torch.ao migration: numeric suite, eager and fx (#64817)
Vasiliy Kuznetsov [Sun, 12 Sep 2021 18:59:44 +0000 (11:59 -0700)]
torch.ao migration: numeric suite, eager and fx (#64817)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64817

This migrates `torch.quantization._numeric_suite` to `torch.ao.ns._numeric_suite`, and `torch.quantization._numeric_suite_fx` to `torch.ao.ns._numeric_suite_fx`.

1. move the files
```
HG: move eager mode
hg mv caffe2/torch/quantization/_numeric_suite.py caffe2/torch/ao/ns/
HG: move fx
hg mv caffe2/torch/quantization/_numeric_suite_fx.py caffe2/torch/ao/ns/
hg mv caffe2/torch/quantization/ns/* caffe2/torch/ao/ns/fx/
```

2. create new versions of `_numeric_suite.py` and `_numeric_suite_fx.py` with
imports (see the sketch after this list)

3. update all FB callsites
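
For step 2, a minimal sketch of what such an import shim could look like, assuming a simple star re-export (the actual shim contents are not copied from the PR):

```python
# Hypothetical backward-compatibility shim left at the old path
# torch/quantization/_numeric_suite.py: it re-exports everything from
# the new torch.ao.ns location so existing imports keep working.
from torch.ao.ns._numeric_suite import *  # noqa: F401,F403
```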

Test Plan: buck test mode/dev //caffe2/test:quantization

Reviewed By: z-a-f

Differential Revision: D30867538

fbshipit-source-id: 120ee830434ca490c1183a187a518eebcbbaf22c

2 years ago Pin scipy to 1.6.3 on Windows and Mac (#64844)
Nikita Shulga [Sun, 12 Sep 2021 17:52:28 +0000 (10:52 -0700)]
Pin scipy to 1.6.3 on Windows and Mac (#64844)

Summary:
It's already pinned via docker install on Linux.

`scipy.stats.[poisson|geom|binom]` return quite different results in 1.7+ versions of SciPy.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64844

Reviewed By: driazati

Differential Revision: D30876591

Pulled By: malfet

fbshipit-source-id: 4946e0922063e9ac320c218a0b089f73486466f7

2 years ago Revert D30867266: [pytorch][PR] TST Adds gradcheck and gradgradcheck to module info
Nikita Shulga [Sun, 12 Sep 2021 17:29:10 +0000 (10:29 -0700)]
Revert D30867266: [pytorch][PR] TST Adds gradcheck and gradgradcheck to module info

Test Plan: revert-hammer

Differential Revision:
D30867266 (https://github.com/pytorch/pytorch/commit/67ebde56459557199b3c907b81b3c819f77500b9)

Original commit changeset: cbc073326151

fbshipit-source-id: 00234e01eafc45fb999f7c83a397f9d6b3e01e46

2 years ago [RFC] Modularize functions of parsing bytecode (#61862)
Martin Yuan [Sun, 12 Sep 2021 05:22:28 +0000 (22:22 -0700)]
[RFC] Modularize functions of parsing bytecode (#61862)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61862

Modularize the functions for parsing bytecode tables so that they can be used as needed in situations other than the mobile lite interpreter.
* The decoupled functions are re-used by current lite interpreter loader.
* The bytecode can be serialized/deserialized from other formats.
* The decoupled functions have minimum dependencies on other PyTorch components.

Next:
Build a driver binary that includes the parser and interpreter, with only the necessary dependencies on other PyTorch components.
ghstack-source-id: 137867287

Test Plan:
As an example, a simple bytecode is parsed to a mobile function, and directly run in the added unit test, `RunTimeTest:ParseBytecode`. It contains basic control flow (if, else) and basic data orchestration (list construction).
CI

Reviewed By: larryliu0820

Differential Revision: D29798382

Pulled By: iseeyuan

fbshipit-source-id: 1c173a5f5d37097e3a97baec3f3e48e1eea1400f

2 years ago Revert D30875977: [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ...
Natalia Gimelshein [Sun, 12 Sep 2021 00:17:00 +0000 (17:17 -0700)]
Revert D30875977: [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h

Test Plan: revert-hammer

Differential Revision:
D30875977 (https://github.com/pytorch/pytorch/commit/1f35d20a894bb07e27691332af4beb097142762f)

Original commit changeset: bd593feb5a75

fbshipit-source-id: 4c82dbc857fdb28e0240dacc1a0e607a76552bb4

2 years ago [iOS][OSS][BE] Update Xcode to use 12.5.1 (#64850)
Tao Xu [Sat, 11 Sep 2021 18:21:42 +0000 (11:21 -0700)]
[iOS][OSS][BE] Update Xcode to use 12.5.1 (#64850)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64850

ghstack-source-id: 137827895

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877964

fbshipit-source-id: 803f2506a755b3815024704e6177c7826bc42de8

2 years ago [iOS][OSS][BE] Remove unused files (#64849)
Tao Xu [Sat, 11 Sep 2021 18:21:42 +0000 (11:21 -0700)]
[iOS][OSS][BE] Remove unused files (#64849)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64849

ghstack-source-id: 137827893

Test Plan: CircleCI

Reviewed By: hanton

Differential Revision: D30877962

fbshipit-source-id: a76f7fe888b990ba6cad650f72be7f4a1e58a2f1

2 years ago [TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)
Mikhail Zolotukhin [Sat, 11 Sep 2021 17:21:42 +0000 (10:21 -0700)]
[TensorExpr] Move 2 graph passes from kernel.cpp to graph_opt.cpp (#64828)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64828

Also, make `removeUnusedSelfArgument` more consistent with other passes
by mutating the graph in-place rather than returning a copy.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30870776

Pulled By: ZolotukhinM

fbshipit-source-id: 4873f01b013921143a5aa43746d655a2d8d620c9

2 years ago [TensorExpr] Add debug logging (store/load tracing) to IREval. (#64848)
Mikhail Zolotukhin [Sat, 11 Sep 2021 16:23:52 +0000 (09:23 -0700)]
[TensorExpr] Add debug logging (store/load tracing) to IREval. (#64848)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64848

Test Plan: Imported from OSS

Reviewed By: Chillee

Differential Revision: D30878278

Pulled By: ZolotukhinM

fbshipit-source-id: bd946075336ba2e9786602161c236a0ff8a5a011

2 years ago [TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)
Mikhail Zolotukhin [Sat, 11 Sep 2021 16:23:10 +0000 (09:23 -0700)]
[TensorExpr] LLVMCodegen: fix lowering for UInt->Float casts. (#64862)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64862

Previously we were erroneously looking at the dst signedness. This was
discovered when we tried to implement quantize/dequantize ops.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30881696

Pulled By: ZolotukhinM

fbshipit-source-id: 34af842e5e52a3b6b5d2e70c4ef32f910a20341f

2 years ago [caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h ...
Elias Guestrin [Sat, 11 Sep 2021 07:41:51 +0000 (00:41 -0700)]
[caffe2] [aten] Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h (#64870)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64870

Remove loose (unpaired) #pragma warning ( pop ) in TensorBase.h
The issue started with D30728580 (https://github.com/pytorch/pytorch/commit/d701357d921ef167d42c125e65b6f7da6be3ad0f), was fixed with D30846958 (https://github.com/pytorch/pytorch/commit/40098f48a1a37a06a456fd642d908ca522295706), and was brought back again by the reversion of D30846958 (https://github.com/pytorch/pytorch/commit/40098f48a1a37a06a456fd642d908ca522295706).

Reviewed By: H-Huang

Differential Revision: D30875977

fbshipit-source-id: bd593feb5a75245470e43ad568ebdd3f1738da7c

2 years ago [quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)
Jerry Zhang [Sat, 11 Sep 2021 05:24:10 +0000 (22:24 -0700)]
[quant][fx2trt] Add lowering support for reference linear/conv modules (#64368)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64368

Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py

Imported from OSS

Reviewed By: 842974287

Differential Revision: D30708738

fbshipit-source-id: 88142b7ce43ed96093597112dab03a2d277de993

2 years ago [tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (...
Hui Guo [Sat, 11 Sep 2021 03:30:06 +0000 (20:30 -0700)]
[tensorexpr] Simplify x/100 -> 0 if x is a non-negative integer less than 100. (#64763)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64763

Simplification pattern:
  x/N -> 0; N is a constant positive integer and x is a for-loop index whose range is a subset of [0, N).
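
A toy Python sketch of the rule above (the real implementation lives in the C++ IR simplifier, not here):

```python
# Toy model of the simplification: x / N -> 0 when the loop index x
# ranges over a subset of [0, N) and N is a constant positive integer.
def simplify_index_div(index_lo, index_hi, divisor):
    # The loop index runs over [index_lo, index_hi).
    if divisor > 0 and index_lo >= 0 and index_hi <= divisor:
        return 0      # index // divisor is 0 for the whole range
    return None       # range not provably inside [0, divisor); keep the div

assert simplify_index_div(0, 100, 100) == 0   # x in [0, 100) -> x/100 == 0
assert simplify_index_div(0, 128, 100) is None
```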

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30845854

Pulled By: huiguoo

fbshipit-source-id: 814d69ed4be05e57405c222183cc1c6c526721cd

2 years ago Revert D30869803: .circleci: Skip cuda /cudnn install if existing
Eli Uriegas [Sat, 11 Sep 2021 01:55:07 +0000 (18:55 -0700)]
Revert D30869803: .circleci: Skip cuda /cudnn install if existing

Test Plan: revert-hammer

Differential Revision:
D30869803 (https://github.com/pytorch/pytorch/commit/717d267e191bcc1669acad21d87ffb70e6e89b90)

Original commit changeset: 9eb3bd20875d

fbshipit-source-id: bef8d0c693696307a3be7abd5331b7fa813d754a

2 years ago TST Adds gradcheck and gradgradcheck to module info (#64444)
Thomas J. Fan [Fri, 10 Sep 2021 23:25:21 +0000 (16:25 -0700)]
TST Adds gradcheck and gradgradcheck to module info (#64444)

Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935

cc albanD mruberry jbschlosser walterddr

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64444

Reviewed By: ngimel

Differential Revision: D30867266

Pulled By: jbschlosser

fbshipit-source-id: cbc0733261517dbfcdd3415d969b9e802b62b7ac

2 years ago Preserve types during empty container assignment (#58911)
Ansley Ussery [Fri, 10 Sep 2021 23:18:33 +0000 (16:18 -0700)]
Preserve types during empty container assignment (#58911)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58911

Stack from [ghstack](https://github.com/ezyang/ghstack):
* __->__ #58911

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D30785623

Pulled By: ansley

fbshipit-source-id: 4e05d6369318974290fea02ad2bc148293c25090

2 years ago Always upload stats to S3 (#64853)
Jane Xu [Fri, 10 Sep 2021 22:17:07 +0000 (15:17 -0700)]
Always upload stats to S3 (#64853)

Summary:
It's not very useful that stats are only uploaded when all the tests pass.

For example, for this failing run the stats were not uploaded to Scribe or S3: https://github.com/pytorch/pytorch/runs/3568714658

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64853

Reviewed By: seemethere

Differential Revision: D30878361

Pulled By: janeyx99

fbshipit-source-id: 19a4c520efdd5575785a3ffbc60e6c09456b9e0d

2 years ago [DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)
Kevin Tse [Fri, 10 Sep 2021 21:22:36 +0000 (14:22 -0700)]
[DataPipe] Remove ZipArchiveReader's dependency on FileLoader (#64786)

Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* https://github.com/pytorch/pytorch/issues/64788
* __->__ https://github.com/pytorch/pytorch/issues/64786

This PR removes ZipArchiveReader's dependency on the FileLoader DataPipe by allowing it to take an IterDataPipe of path names as input, rather than a tuple of path name and stream.

It also adds tests to ensure that the DataPipe functions properly when it is read multiple times or reset halfway through reading.

The whole stack fixes issues related to unclosed buffer stream (see https://github.com/pytorch/pytorch/issues/64281).
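
A hedged usage sketch of the new input contract (the import paths and datapipe names below are assumed from the prototype of that era and may differ):

```python
# Before this PR, ZipArchiveReader consumed (path, stream) tuples produced
# by FileLoader; after it, an IterDataPipe of path names works directly.
from torch.utils.data.datapipes.iter import IterableWrapper, ZipArchiveReader

paths = IterableWrapper(["data/archive_0.zip", "data/archive_1.zip"])
for inner_name, stream in ZipArchiveReader(paths):
    print(inner_name)  # name of a file inside the zip archive
    stream.close()     # close the per-file stream explicitly
```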

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64786

Reviewed By: ngimel

Differential Revision: D30870968

Pulled By: NivekT

fbshipit-source-id: 64b04d1697b99772f2fa20fc141668e6b8e18c41

2 years ago .circleci: Skip cuda /cudnn install if existing (#64825)
Eli Uriegas [Fri, 10 Sep 2021 21:00:22 +0000 (14:00 -0700)]
.circleci: Skip cuda /cudnn install if existing (#64825)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64825

Rewrites this script to only install the CUDA tools if they are not already
pre-installed

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30869803

Pulled By: seemethere

fbshipit-source-id: 9eb3bd20875df0f2b18f5314ac825dbdf91637b5

2 years ago [doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)
Ilqar Ramazanli [Fri, 10 Sep 2021 20:33:12 +0000 (13:33 -0700)]
[doc][hackathon] To add Adadelta Optimizer to the documentation (#63255)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the AdaDelta algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1212.5701

<img width="654" alt="AdaDeltaalg" src="https://user-images.githubusercontent.com/73658284/132770544-82ccf90a-1d54-4ad5-8fc4-51c8dec63a12.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63255

Reviewed By: ngimel

Differential Revision: D30867589

Pulled By: iramazanli

fbshipit-source-id: 5ba602c20c724a4486bdd38b73e1b64c0e767bdc

2 years ago Add more error checking in subclass creation (#64746)
Alban Desmaison [Fri, 10 Sep 2021 20:07:37 +0000 (13:07 -0700)]
Add more error checking in subclass creation (#64746)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64746

This extracts the error checking that used to be in the PR above.
We are not going to land the proposed fix there, but I think we want this error checking in right now, as these issues would lead to a memory leak and arbitrary memory read/write, respectively.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867569

Pulled By: albanD

fbshipit-source-id: bf468033fb8b49fcb26eed423f5fad82b4a46c56

2 years ago Move THPVariable_NewWithVar around (#64550)
Alban Desmaison [Fri, 10 Sep 2021 20:07:37 +0000 (13:07 -0700)]
Move THPVariable_NewWithVar around (#64550)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64550

Just moves a function around to make the next PR easier to read.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867570

Pulled By: albanD

fbshipit-source-id: 99ae925568ed29ca7fdea059762c21d430d4a204

2 years ago [MicroBench] Added a log_vml version of the signed log1p kernel (#64205)
Raghavan Raman [Fri, 10 Sep 2021 19:35:24 +0000 (12:35 -0700)]
[MicroBench] Added a log_vml version of the signed log1p kernel (#64205)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64205

The log_vml version of the micro-bench is over **2x** faster than the log1p version. Here are the perf numbers:

```
---------------------------------------------------------------------------------------------
Benchmark                                   Time             CPU   Iterations UserCounters...
---------------------------------------------------------------------------------------------
SignedLog1pBench/ATen/10/1467           45915 ns        45908 ns        14506 GB/s=2.5564G/s
SignedLog1pBench/NNC/10/1467            40469 ns        40466 ns        17367 GB/s=2.9002G/s
SignedLog1pBench/NNCLogVml/10/1467      19560 ns        19559 ns        35902 GB/s=6.00016G/s
```

Thanks to bertmaher for pointing this out.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30644716

Pulled By: navahgar

fbshipit-source-id: ba2b32c79d4265cd48a2886b0c62d0e89ff69c19

2 years ago [nnc] Added an implementation of sign op (#64033)
Raghavan Raman [Fri, 10 Sep 2021 19:35:24 +0000 (12:35 -0700)]
[nnc] Added an implementation of sign op (#64033)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64033

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30579197

Pulled By: navahgar

fbshipit-source-id: f9f7fa7f2ffa109cf4e441eb1af821b8b891d4d3

2 years ago Extend 2Dim embedding bag benchmarking to include 3Dim benchmarks (#64647)
Eddie Ren [Fri, 10 Sep 2021 19:31:27 +0000 (12:31 -0700)]
Extend 2Dim embedding bag benchmarking to include 3Dim benchmarks (#64647)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64647

Add support for benchmarking 8-bit quantizations of N-D batched embeddings. This currently only works for 3Dim embeddings, and ramping up from 3Dim to NDim still requires thought.

Test Plan: ```buck run //caffe2/benchmarks/operator_benchmark/pt:qembedding_pack_test```

Reviewed By: jingsh

Differential Revision: D30770085

fbshipit-source-id: 26659020f3458991592065a05366bde0f060494e

2 years ago Revert D30846958: [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h
Howard Huang [Fri, 10 Sep 2021 18:48:43 +0000 (11:48 -0700)]
Revert D30846958: [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h

Test Plan: revert-hammer

Differential Revision:
D30846958 (https://github.com/pytorch/pytorch/commit/40098f48a1a37a06a456fd642d908ca522295706)

Original commit changeset: 52a3fb66e426

fbshipit-source-id: 1d749f6981756f2169d6867538555a945cbb8ca6

2 years ago [DataPipe] fixing tests related to fork() to remove warnings (#64827)
Kevin Tse [Fri, 10 Sep 2021 18:00:01 +0000 (11:00 -0700)]
[DataPipe] fixing tests related to fork() to remove warnings (#64827)

Summary:
There are two warnings produced by `test_fork_datapipe`. This PR addresses the issues raised by those warnings without impacting the test cases.

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64827

Reviewed By: ejguan

Differential Revision: D30870528

Pulled By: NivekT

fbshipit-source-id: 580a001c6fa3ff6f8b04a7e5183e58861938204b

2 years ago [tensorexpr] Add 'pre_alloc' argument in python API of tensorexpr kernel (#64718)
Hui Guo [Fri, 10 Sep 2021 16:59:25 +0000 (09:59 -0700)]
[tensorexpr] Add 'pre_alloc' argument in python API of tensorexpr kernel (#64718)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64718

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30826582

Pulled By: huiguoo

fbshipit-source-id: 6c173c8964f2643039273cdc83e64fb02bb5f381

2 years ago Skip conjugate and negate fallback for view ops and their in-place versions (#64392)
anjali411 [Fri, 10 Sep 2021 16:55:50 +0000 (09:55 -0700)]
Skip conjugate and negate fallback for view ops and their in-place versions (#64392)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64392

cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30866330

Pulled By: anjali411

fbshipit-source-id: 7b2f51486bf1d610ad2b1472306bab608ee69c37

2 years ago To add Rprop documentation (#63866)
Ilqar Ramazanli [Fri, 10 Sep 2021 16:47:38 +0000 (09:47 -0700)]
To add Rprop documentation (#63866)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Rprop to the documentation. For more details, we refer to the paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1417

<img width="657" alt="Rpropalg" src="https://user-images.githubusercontent.com/73658284/132750009-a5ec059e-6d53-4c67-917b-57174c8ca27b.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63866

Reviewed By: ngimel

Differential Revision: D30867590

Pulled By: iramazanli

fbshipit-source-id: 0d2d4ffc6c4d939290bbbaa84d2c6e901ed8b54a

2 years ago [ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)
Jeff Daily [Fri, 10 Sep 2021 16:36:26 +0000 (09:36 -0700)]
[ROCm] define C10_WARP_SIZE to warpSize HIP constant (#64302)

Summary:
warpSize is defined as a constexpr in the HIP headers. It is incorrect to assume a warpSize of 64. This change fixes the C10_WARP_SIZE definition in torch sources, similar to [how it was done in caffe2](https://github.com/pytorch/pytorch/blob/master/caffe2/utils/GpuDefs.cuh#L10-L14).

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64302

Reviewed By: mrshenli

Differential Revision: D30785975

Pulled By: malfet

fbshipit-source-id: 68f8333182ad4d02bd0c8d02f1751a50bc5bafa7

2 years ago fix typo in torch/onnx/utils.py (#63396)
Corey Levinson [Fri, 10 Sep 2021 16:35:06 +0000 (09:35 -0700)]
fix typo in torch/onnx/utils.py (#63396)

Summary:
Fixes a minor typo.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63396

Reviewed By: pbelevich

Differential Revision: D30644295

Pulled By: SplitInfinity

fbshipit-source-id: c506f67383909aa2c0c7c533038446b4b3d76a3a

2 years ago build: bump bazel to 4.2.1 (#64455)
rui [Fri, 10 Sep 2021 15:28:45 +0000 (08:28 -0700)]
build: bump bazel to 4.2.1 (#64455)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64455

Reviewed By: saketh-are

Differential Revision: D30752580

Pulled By: malfet

fbshipit-source-id: 4f5cc6f820396348181c09463f7e5628b5f69471

2 years ago ROCm MIOpen NHWC Convolution support (#63617)
Aswin John Mathews [Fri, 10 Sep 2021 15:05:21 +0000 (08:05 -0700)]
ROCm MIOpen NHWC Convolution support (#63617)

Summary:
- Added 2D-Convolution NHWC support
  - on ROCm 4.3, with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` flag
  - May need to force MIOpen to search for solutions (see examples below for flags)

**PYTORCH_MIOPEN_SUGGEST_NHWC Environment Flag**
MIOpen does not officially support NHWC yet, although convolution support has been added to MIOpen's tip-of-tree. This is intended to be a short-lived flag that explicitly turns on NHWC support until ROCm officially supports NHWC and performance is verified.

**Examples**
1. Example usage 1: Run a test on ROCm 4.3
`PYTORCH_TEST_WITH_ROCM=1 PYTORCH_MIOPEN_SUGGEST_NHWC=1 MIOPEN_FIND_ENFORCE=4 MIOPEN_DEBUG_CONV_GEMM=0 MIOPEN_FIND_MODE=1 pytest test_nn.py -v -k "test_conv_cudnn_nhwc" `
2. Example usage 2: Run the following with `PYTORCH_MIOPEN_SUGGEST_NHWC=1` on ROCm 4.3.
```
#!/usr/bin/env python3
import torch
model = torch.nn.Conv2d(8, 4, 3).cuda().half()
model = model.to(memory_format=torch.channels_last)
input = torch.randint(1, 10, (2, 8, 4, 4), dtype=torch.float32, requires_grad=True)
input = input.to(device="cuda", memory_format=torch.channels_last, dtype=torch.float16)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print(input.is_contiguous(memory_format=torch.channels_last), input.shape, input.stride() )

out = model(input)

# should print True for is_contiguous(channels_last), and strides must match NHWC format
print("Contiguous channel last :", out.is_contiguous(memory_format=torch.channels_last), " out shape :",  out.shape, "out stride :", out.stride() )
```

See https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html for more examples.

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63617

Reviewed By: saketh-are

Differential Revision: D30730800

Pulled By: ezyang

fbshipit-source-id: 61906a0f30be8299e6547d312ae6ac91cc7c3238

2 years ago Let all_reduce_coalesced and all_gather_coalesced return Future objects (#64722)
Shen Li [Fri, 10 Sep 2021 14:44:09 +0000 (07:44 -0700)]
Let all_reduce_coalesced and all_gather_coalesced return Future objects (#64722)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64722

`all_reduce_coalesced` and `all_gather_coalesced` were never publicly
released in our API docs. So, I would assume the blast radius to be small.

The motivation for this change is to allow implementing
`all_reduce_coalesced` and `all_gather_coalesced` by re-using the `allreduce`
and `allgather` C++ cores and performing the flatten and copy only on the
Python side. With that, we can then remove `all_reduce_coalesced` and
`all_gather_coalesced` from C++ ProcessGroup APIs. For the async mode,
the copy-back logic after the communication will need to be chained
as a callback on the returned Future and use the chained child Future
as the return value (otherwise, we will need to wrap the child Future
into another work handle). This PR tries to test if we can directly
return a Future without breaking tests and internal use cases. If yes,
it will make the consolidation a lot easier.
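
A rough sketch (assumed, not the PR's implementation) of the Python-side pattern described above, with the copy-back chained on the returned Future:

```python
import torch.distributed as dist
from torch._utils import _flatten_dense_tensors, _unflatten_dense_tensors

def all_reduce_coalesced_sketch(tensors, group=None):
    # Flatten and copy on the Python side, re-using the allreduce C++ core.
    flat = _flatten_dense_tensors(tensors)
    work = dist.all_reduce(flat, group=group, async_op=True)

    def copy_back(fut):
        reduced_flat = fut.value()[0]  # the reduced flat tensor
        for t, r in zip(tensors, _unflatten_dense_tensors(reduced_flat, tensors)):
            t.copy_(r)                 # copy results back into the originals
        return tensors

    # Return the chained child Future directly, as proposed above.
    return work.get_future().then(copy_back)
```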

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D30830994

Pulled By: mrshenli

fbshipit-source-id: dcde0ed9245e9e8fee357b3588b07d540a4b6318

2 years ago `torch.lu`: forward AD support (#64742)
Nikita Vedeneev [Fri, 10 Sep 2021 14:17:30 +0000 (07:17 -0700)]
`torch.lu`: forward AD support (#64742)

Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64742

Reviewed By: H-Huang

Differential Revision: D30841227

Pulled By: albanD

fbshipit-source-id: dc4d043ab94358594adb110fbbbb60750c98262a

2 years ago [const_fold] Keep around node.meta for replaced folded ops (#64782)
Jordan Fix [Fri, 10 Sep 2021 06:49:22 +0000 (23:49 -0700)]
[const_fold] Keep around node.meta for replaced folded ops (#64782)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64782

Previously, `get_attr` nodes that were added to the graph did not retain `node.meta` after folding. Add such support, and improve test coverage in general here.
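
A hedged illustration (the module and printout here are invented; only the `const_fold` API name comes from the affected code):

```python
import torch
import torch.fx
from torch.fx.experimental import const_fold

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("w", torch.randn(3))

    def forward(self, x):
        return x + (self.w + 1)  # (self.w + 1) is constant-foldable

folded = const_fold.split_const_subgraphs(torch.fx.symbolic_trace(M()))
# After this change, the get_attr node inserted for the folded constant
# should retain the node.meta of the node it replaced.
for node in folded.graph.nodes:
    print(node.op, node.name, node.meta)
```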

Test Plan: Added test coverage.

Reviewed By: protonu

Differential Revision: D30852704

fbshipit-source-id: ece87a61c69b2e68982964c6adc4dde14dae12c7

2 years ago [caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h (#64773)
Elias Guestrin [Fri, 10 Sep 2021 06:44:03 +0000 (23:44 -0700)]
[caffe2/aten] Remove loose #pragma warning ( pop ) in TensorBase.h (#64773)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64773

Remove loose `#pragma warning ( pop )` in TensorBase.h.

Reviewed By: ezyang

Differential Revision: D30846958

fbshipit-source-id: 52a3fb66e426bc16ef7bde2a13e26e8293969026

2 years ago Add TRTSplitter (#64762)
Shirong Wu [Fri, 10 Sep 2021 04:02:15 +0000 (21:02 -0700)]
Add TRTSplitter (#64762)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64762

Extract and format TRTSplitter from the fx2trt_example code. The current implementation is tentative and subject to change based on the feeds model lowering progress.

Test Plan:
manual print of supported operators:
`{<class 'torch.nn.modules.activation.ReLU'>: None, <function relu at 0x7f9b1abd0790>: None, <class 'torch.nn.modules.activation.Sigmoid'>: None, <class 'torch.nn.modules.pooling.AdaptiveAvgPool2d'>: None, <built-in method add of type object at 0x7f9b7f402498>: None, <built-in function add>: None, <built-in method add of PyCapsule object at 0x7f9b1a3dc690>: None, <built-in method add_relu of PyCapsule object at 0x7f9b1a34cf90>: None, <class 'torch.nn.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.quantized.modules.batchnorm.BatchNorm2d'>: None, <class 'torch.nn.modules.conv.Conv2d'>: None, <class 'torch.nn.quantized.modules.conv.Conv2d'>: None, <class 'torch.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>: None, <class 'torch.nn.modules.linear.Linear'>: None, <class 'torch.nn.quantized.modules.linear.Linear'>: None, <class 'torch.nn.modules.pooling.MaxPool2d'>: None, <built-in function mul>: None, <built-in method mul of type object at 0x7f9b7f402498>: None, <built-in method mul of PyCapsule object at 0x7f9b1a3dc6c0>: None, <built-in method flatten of type object at 0x7f9b7f402498>: None, <class 'torch.nn.quantized.modules.DeQuantize'>: None, <built-in method dequantize of type object at 0x7f9b7f402498>: None, 'dequantize': None, <class 'torch.nn.quantized.modules.Quantize'>: None, <built-in method quantize_per_tensor of type object at 0x7f9b7f402498>: None, <class 'torch.nn.modules.linear.Identity'>: None, <function conv2d at 0x7f9b1a1fe9d0>: None, <function flatten at 0x7f9b1a1f5ca0>: None, <function size at 0x7f9b1a1f5b80>: None, <function batch_norm at 0x7f9b1a1feaf0>: None, <function layer_norm at 0x7f9b1a1feb80>: None, <function softmax at 0x7f9b1a1f9550>: None, <function relu at 0x7f9b1a1fe040>: None, <function sin at 0x7f9b1a2030d0>: None, <function cos at 0x7f9b1a203160>: None, <function tan at 0x7f9b1a2031f0>: None, <function sinh at 0x7f9b1a1fe160>: None, <function cosh at 0x7f9b1a1fe280>: None, <function tanh at 0x7f9b1a1fe310>: None, <function asin at 0x7f9b1a1fe3a0>: None, <function acos at 0x7f9b1a1fe430>: None, <function atan at 0x7f9b1a1fe4c0>: None, <function exp at 0x7f9b1a1fe550>: None, <function log at 0x7f9b1a1fe5e0>: None, <function sqrt at 0x7f9b1a1fe670>: None, <function reciprocal at 0x7f9b1a1fe700>: None, <function abs at 0x7f9b1a1fe790>: None, <function neg at 0x7f9b1a1fe820>: None, <function floor at 0x7f9b1a1fe8b0>: None, <function ceil at 0x7f9b1a1fe940>: None, <function sum at 0x7f9b1a1f9c10>: None, <function max_pool2d at 0x7f9b1a1f5d30>: None, <function squeeze at 0x7f9b1a1f5c10>: None, <function add at 0x7f9b1a1f91f0>: None, <function sub at 0x7f9b1a1f9ca0>: None, <function div at 0x7f9b1a1f9dc0>: None, <function mul at 0x7f9b1a1f9d30>: None, <function pow at 0x7f9b1a1f9e50>: None, <function min_two_tensors_input at 0x7f9b1a1f9940>: None, <function unsqueeze at 0x7f9b1a1f9280>: None, <function topk at 0x7f9b1a203280>: None, <function adaptive_avg_pool2d at 0x7f9b1a1f5dc0>: None, <function avg_pool2d at 0x7f9b1a1f5e50>: None, <function reshape at 0x7f9b1a203550>: None, <function slice_tensor at 0x7f9b1a1fee50>: None, <function split at 0x7f9b1a1fec10>: None, <function linear at 0x7f9b1a1f51f0>: None, <function clamp at 0x7f9b1a1f93a0>: None, <function tuple_construct at 0x7f9b1a1fed30>: None, <function contiguous at 0x7f9b1a1f9430>: None, <function getitem at 0x7f9b1a203310>: None, <function cat at 0x7f9b1a1f9310>: None, <function transpose at 0x7f9b1a1f94c0>: None, <function matmul at 0x7f9b1a1f98b0>: None, <function sigmoid at 
0x7f9b1a1fe1f0>: None, <function permute at 0x7f9b1a1f9670>: None, <function quantize_per_tensor at 0x7f9b1a1f9b80>: None, <function dequantize at 0x7f9b1a1f99d0>: None, <function sign at 0x7f9b1a1f5ee0>: None}`

Reviewed By: 842974287

Differential Revision: D30798047

fbshipit-source-id: 69076a550874425b7186fbbf2ecf03da4a99b42f

2 years ago [PyTorch] Fix missing move in torch::jit::Lexer::next (#64653)
Scott Wolchok [Fri, 10 Sep 2021 01:53:36 +0000 (18:53 -0700)]
[PyTorch] Fix missing move in torch::jit::Lexer::next (#64653)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64653

Saves shared_ptr refcount inc/dec in SourceRange.
ghstack-source-id: 137608457

Test Plan: Profiled startup of framework overheads benchmark from high_per_models; self time spent in next() is way down.

Reviewed By: dhruvbird

Differential Revision: D30739240

fbshipit-source-id: ac455678c9d46e657b111d3788d4369983028674

2 years ago [PyTorch] Use std::find in the JIT lexer (#64652)
Scott Wolchok [Fri, 10 Sep 2021 01:53:36 +0000 (18:53 -0700)]
[PyTorch] Use std::find in the JIT lexer (#64652)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64652

If nothing else, it is slightly clearer code.
ghstack-source-id: 137608456

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30739239

fbshipit-source-id: bc7917b59883ca4a33fc6916b4e422bad79cf04b

2 years ago [TensorExpr] Simplify TE IR before applying any transformations. (#64717)
Mikhail Zolotukhin [Fri, 10 Sep 2021 01:48:17 +0000 (18:48 -0700)]
[TensorExpr] Simplify TE IR before applying any transformations. (#64717)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64717

This also exposed several bugs, which are fixed in this PR.

Differential Revision:
D30826408

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: a67ec5739aceed9ffdf0d24f77eb3787cefe4560

2 years ago [quant][fix] Fix quantization for sub_scalar (#64603)
Jerry Zhang [Fri, 10 Sep 2021 00:17:01 +0000 (17:17 -0700)]
[quant][fix] Fix quantization for sub_scalar (#64603)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64603

We'll insert an observer only when both the operator and the dtype are supported.
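
A hedged repro sketch inferred from the test name (the actual test lives in test/test_quantization.py and is not copied here):

```python
# Sub-with-scalar pattern this fix targets; the qconfig-dict style matches
# the FX graph mode quantization API of the time.
import torch
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

class M(torch.nn.Module):
    def forward(self, x):
        return x - 1  # sub_scalar: observer insertion must check dtype support

m = prepare_fx(M().eval(), {"": get_default_qconfig("fbgemm")})
m = convert_fx(m)
print(m.code)
```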

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30797025

fbshipit-source-id: a77c21e2749405534fc245374cf33a0657a3d2c8

2 years ago [Android] print type name for IValues (#64602)
Linbin Yu [Thu, 9 Sep 2021 23:56:50 +0000 (16:56 -0700)]
[Android] print type name for IValues (#64602)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64602

Print the type name in the error message for easier debugging.

Test Plan:
Example:
java.lang.IllegalStateException: Expected IValue type Tensor, actual type TensorList

Reviewed By: beback4u

Differential Revision: D30782318

fbshipit-source-id: 60d88a659e7b4bb2b574b12c7652a28f0d5ad0d2

2 years ago [caffe2][tiny] Add logging to report what the current lengths are when mismatched...
Xinyi Zhang [Thu, 9 Sep 2021 23:43:55 +0000 (16:43 -0700)]
[caffe2][tiny] Add logging to report what the current lengths are when mismatched lengths are detected (#64768)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64768

as title

Differential Revision: D30846637

fbshipit-source-id: 266768c81b315fdebba854135ea2db1faf67fd6a

2 years ago [doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)
Ilqar Ramazanli [Thu, 9 Sep 2021 22:37:44 +0000 (15:37 -0700)]
[doc][hackathon] To add Adagrad Optimizer to the documentation (#63254)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Adagrad to the documentation. For more details, we refer to the paper: http://jmlr.org/papers/v12/duchi11a.html

<img width="658" alt="AdaGradAlgo" src="https://user-images.githubusercontent.com/73658284/132743276-a52ea3fb-70a5-4788-94b7-f99367907a26.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63254

Reviewed By: albanD

Differential Revision: D30852139

Pulled By: iramazanli

fbshipit-source-id: 9e496560a97e92be8386585b01d9bd3bba4b0c66

2 years ago [Static Runtime] Fix resize_output_check warning coming from prim::VarConcat (#64765)
Harut Movsisyan [Thu, 9 Sep 2021 21:35:00 +0000 (14:35 -0700)]
[Static Runtime] Fix resize_output_check warning coming from prim::VarConcat (#64765)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64765

Test Plan: Tested the fix with BR v1 model predictor-replayer setup.

Reviewed By: ajyu

Differential Revision: D30846506

fbshipit-source-id: 3ef3c93f11285c7cd1e2b188ca298a7ab4fba579

2 years ago Rename profiler metadata key (#63743)
Han Guangyun [Thu, 9 Sep 2021 20:02:56 +0000 (13:02 -0700)]
Rename profiler metadata key (#63743)

Summary:
Rename the metadata key to be the same as the variable name.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63743

Reviewed By: albanD

Differential Revision: D30839501

Pulled By: gdankel

fbshipit-source-id: b9b4e670dcc9557b8d8d0730baea0ad39a1a0ca4

2 years ago Add support for lowering info during serialize_module, and add padding/partial to...
Jordan Fix [Thu, 9 Sep 2021 19:59:54 +0000 (12:59 -0700)]
Add support for lowering info during serialize_module, and add padding/partial to it (#5810)

Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5810

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64725

- Any info added to the dict in node.meta["lowering_info"] will be added to the node_rep during serialization (see the sketch after this list).
- Use this to add annotations on placeholders that allow partial inputs and require padding.
- Check for these annotations and set them in the NNPICompiledFunction as expected
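
A hedged sketch of the annotation mechanism (only the "lowering_info" key comes from the summary; the inner key names are hypothetical):

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()

gm = torch.fx.symbolic_trace(M())
for node in gm.graph.nodes:
    if node.op == "placeholder":
        # Hints that serialize_module would copy into the node_rep;
        # the inner key names here are made up for illustration.
        node.meta.setdefault("lowering_info", {}).update(
            {"allow_partial_input": True, "requires_padding": True}
        )
```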

Test Plan: Validated working on inline_cvr in the stack. Additionally, existing fx_glow end-to-end tests should still pass.

Reviewed By: 842974287

Differential Revision: D30824192

fbshipit-source-id: def64ef097aa35c337abb494415f7d437c6c7fa9

2 years ago cat_shape_check: Fixes dimension in the error message for CUDA cat shape check and...
Palwisha Akhtar [Thu, 9 Sep 2021 19:49:03 +0000 (12:49 -0700)]
cat_shape_check: Fixes dimension in the error message for CUDA cat shape check and removes unnecessary offending index information (#64556)

Summary:
Fixes: https://github.com/pytorch/pytorch/issues/64207

Thank you, SsnL, for providing the reproducing script.

cc ngimel

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64556

Reviewed By: albanD

Differential Revision: D30843859

Pulled By: ngimel

fbshipit-source-id: 457ebe80eaef793d9f5d35ee962b6697e5de1907

2 years ago Enable the on-demand performance PR testing to run on a specified TB branch (#64701)
Xu Zhao [Thu, 9 Sep 2021 19:35:36 +0000 (12:35 -0700)]
Enable the on-demand performance PR testing to run on a specified TB branch (#64701)

Summary:
This is to enable performance testing of experimental features such as LazyTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64701

Test Plan:
TorchBench CI

RUN_TORCHBENCH: BERT_pytorch, mobilenet_v3_large
TORCHBENCH_BRANCH: v1.0

Reviewed By: seemethere

Differential Revision: D30847389

Pulled By: xuzhao9

fbshipit-source-id: 6853b368fa6f1ba8ffde517805c74bf318dcb35b

2 years ago .github: Remove add_annotations workflow (#64449)
Eli Uriegas [Thu, 9 Sep 2021 19:20:43 +0000 (12:20 -0700)]
.github: Remove add_annotations workflow (#64449)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64449

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: suo, janeyx99

Differential Revision: D30738460

Pulled By: seemethere

fbshipit-source-id: f1259fcba2f0c14a9bcfbe811ec0a4bf61106619

2 years ago [Dist/CI] Remove dist from target determinator (#64721)
Rohan Varma [Thu, 9 Sep 2021 19:05:26 +0000 (12:05 -0700)]
[Dist/CI] Remove dist from target determinator (#64721)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64721

There are a couple of PRs where distributed CI did not run and we expected
it to. Examples:

https://github.com/pytorch/pytorch/pull/64513/checks?check_run_id=3539190960,
https://github.com/pytorch/pytorch/pull/64113. All distributed tests should've
been run on these PRs, but we can see they were not:

```
Determination is skipping distributed/test_c10d_common
Determination is skipping distributed/test_c10d_gloo
Determination is skipping distributed/test_c10d_nccl
Determination is skipping distributed/test_c10d_spawn_gloo
Determination is skipping distributed/test_c10d_spawn_nccl
Running distributed/test_data_parallel without determination
Determination is skipping distributed/test_distributed_spawn
Determination is skipping distributed/test_jit_c10d
```

Since it is important to run distributed tests on PRs that touch distributed,
exclude distributed from target_det_list for now.
ghstack-source-id: 137654015

Test Plan: CI

Reviewed By: driazati, mrshenli

Differential Revision: D30830455

fbshipit-source-id: 8b0fdf5b57c2c647b0d82c48e2bb8e2bdbe4d307

2 years ago fix acc topk's handling of the case when dim=0, fix tests as well (#64727)
Emad El-Haraty [Thu, 9 Sep 2021 17:32:22 +0000 (10:32 -0700)]
fix acc topk's handling of the case when dim=0, fix tests as well (#64727)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64727

The acc_ops converter for topk has a subtle bug (I found this while trying to introduce max/min):
the code does not differentiate between dim == None and dim == 0, but these are different computations.
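
A quick illustration of why the two cases differ:

```python
import torch

x = torch.tensor([[1, 5],
                  [4, 2]])
print(torch.topk(x, k=1))          # dim=None: top-k along the last dim -> values [[5], [4]]
print(torch.topk(x, k=1, dim=0))   # dim=0: top-k down each column -> values [[4, 5]]
```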

Reviewed By: jfix71, 842974287

Differential Revision: D30833621

fbshipit-source-id: 6cd84e6ca4e95bb1a6d6465e61830b76808a9c78

2 years ago Fix a shadowed variable (#64695)
Richard Barnes [Thu, 9 Sep 2021 17:30:59 +0000 (10:30 -0700)]
Fix a shadowed variable (#64695)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64695

Resolves this warning:
```
caffe2/aten/src/ATen/ParallelOpenMP.h:109:63: warning: declaration of 'int64_t begin' shadows a parameter [-Wshadow=compatible-local]
  109 |   internal::invoke_parallel(begin, end, grain_size, [&](int64_t begin, int64_t end) {
      |                                                       ~~~~~~~~^~~~~
caffe2/aten/src/ATen/ParallelOpenMP.h:86:1: note: shadowed declaration is here
   85 | inline scalar_t parallel_reduce(
      |                 ~~~~~~~~~~~~~~~~
   86 |     const int64_t begin,
      | ^   ~
```

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D30816128

fbshipit-source-id: 3adff6d94eea9fbd65885e88283cae10b87dba18

2 years ago Added more version comparison operations (#63848)
Nived P A [Thu, 9 Sep 2021 17:29:10 +0000 (10:29 -0700)]
Added more version comparison operations (#63848)

Summary:
Currently the [TorchVersion](https://github.com/pytorch/pytorch/blob/1022443168b5fad55bbd03d087abf574c9d2e9df/torch/torch_version.py#L13) class only supports 'greater than' and 'equal to' operations for comparing torch versions, and something like `TorchVersion('1.5.0') < (1,5,1)` or `TorchVersion('1.5.0') >= (1,5)` will throw an error.

I have added 'less than' (`__lt__()`), 'greater than or equal to' (`__ge__()`) and 'less than or equal to' (`__le__()`) operations, so that the TorchVersion object can be useful for a wider range of version comparisons.
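
For example, after this change (a sketch of the intended behavior):

```python
from torch.torch_version import TorchVersion

v = TorchVersion('1.5.0')
print(v < (1, 5, 1))   # True  (__lt__, newly added)
print(v >= (1, 5))     # True  (__ge__, newly added)
print(v <= '1.5.0')    # True  (__le__, newly added)
print(v > (1, 4))      # True  (previously supported)
```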

cc seemethere zsol

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63848

Reviewed By: fmassa, heitorschueroff

Differential Revision: D30526996

Pulled By: seemethere

fbshipit-source-id: 1db6bee555043e0719fd541cec27810852590940

2 years ago Reverts cat and stack warning when out= is not the expected shape (#64714)
Mike Ruberry [Thu, 9 Sep 2021 17:02:03 +0000 (10:02 -0700)]
Reverts cat and stack warning when out= is not the expected shape (#64714)

Summary:
These warnings are being thrown too aggressively at the moment. See https://github.com/pytorch/pytorch/issues/64709 for a follow-up to reenable them once internal call sites are reviewed.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64714

Reviewed By: ngimel

Differential Revision: D30822965

Pulled By: mruberry

fbshipit-source-id: 3ad7c92d381d42ac6187ed84afab477c579a8f35

2 years ago To add SequentialLR to PyTorch Core Schedulers (#64037)
Ilqar Ramazanli [Thu, 9 Sep 2021 16:32:36 +0000 (09:32 -0700)]
To add SequentialLR to PyTorch Core Schedulers (#64037)

Summary:
Partially resolves https://github.com/pytorch/vision/issues/4281

In this PR we are proposing a new scheduler, SequentialLR, which enables a list of different schedulers to be called in different periods of the training process.

The main motivation for this scheduler is the recently gained popularity of a warm-up phase at training time. It has been shown that taking small steps in the initial stages of training can help convergence.

With the help of SequentialLR we mainly enable calling a small constant (or linearly increasing) learning rate scheduler followed by the actual target learning rate scheduler.

```python
scheduler1 = ConstantLR(optimizer, factor=0.1, total_iters=2)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
scheduler = SequentialLR(optimizer, schedulers=[scheduler1, scheduler2], milestones=[5])

for epoch in range(100):
    train(...)
    validate(...)
    scheduler.step()
```

This code snippet will call `ConstantLR` for the first 5 epochs and follow up with `ExponentialLR` in the subsequent epochs.

This scheduler can be used to call any group of schedulers one after another. The main consideration is that every time we switch to a new scheduler, we assume that it starts from the beginning (the zeroth epoch).

We also add the Chained Scheduler to the `optim.rst` and `lr_scheduler.pyi` files here.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64037

Reviewed By: albanD

Differential Revision: D30841099

Pulled By: iramazanli

fbshipit-source-id: 94f7d352066ee108eef8cda5f0dcb07f4d371751

2 years ago [pytorch] Make qlinear weight packing thread safe (#63804)
John Shen [Thu, 9 Sep 2021 16:30:32 +0000 (09:30 -0700)]
[pytorch] Make qlinear weight packing thread safe (#63804)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63804

Adding a lock around the weight-packing section of qlinear + qlinear_dynamic.

Test Plan: automated tests

Reviewed By: kimishpatel

Differential Revision: D30340957

fbshipit-source-id: 1c9faf796c4ffbc74345396188a6f1154a76bea6

2 years ago `torch.lu_solve`: forward AD support (#64646)
Nikita Vedeneev [Thu, 9 Sep 2021 15:56:29 +0000 (08:56 -0700)]
`torch.lu_solve`: forward AD support (#64646)

Summary:
As per title.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64646

Reviewed By: VitalyFedyunin

Differential Revision: D30807898

Pulled By: albanD

fbshipit-source-id: 1f943c22357dd1b3662cfe0d2a26af68e3a2df4c

2 years ago [nnc] Handled cast in index expression during inlining (#64716)
Raghavan Raman [Thu, 9 Sep 2021 15:26:16 +0000 (08:26 -0700)]
[nnc] Handled cast in index expression during inlining (#64716)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64716

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30826388

Pulled By: navahgar

fbshipit-source-id: 7e446602f650527e0d954e437f0370602019e040

2 years ago [nnc] Updated indices during broadcast to use int64_t (#64627)
Raghavan Raman [Thu, 9 Sep 2021 15:26:16 +0000 (08:26 -0700)]
[nnc] Updated indices during broadcast to use int64_t (#64627)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64627

This fixes the root cause of S242719

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30801686

Pulled By: navahgar

fbshipit-source-id: b6d3ebdc7eb57116eaced53c2f35c7798bb17e80

2 years ago Revert D30745921: [DDP] Fix when buffers are reassigned in module
Howard Huang [Thu, 9 Sep 2021 15:20:40 +0000 (08:20 -0700)]
Revert D30745921: [DDP] Fix when buffers are reassigned in module

Test Plan: revert-hammer

Differential Revision:
D30745921 (https://github.com/pytorch/pytorch/commit/d59ecc02df70bad2273858c2fad2b4993133a3d3)

Original commit changeset: 25eb1edbf445

fbshipit-source-id: 343ead86bf1e2d0b2d4124be331ea2fa437303ad

2 years ago Revert D30745961: [DDP] Remove self.modules_params
Howard Huang [Thu, 9 Sep 2021 15:20:40 +0000 (08:20 -0700)]
Revert D30745961: [DDP] Remove self.modules_params

Test Plan: revert-hammer

Differential Revision:
D30745961 (https://github.com/pytorch/pytorch/commit/8c095102948c9601792a884dad56da5085c51bee)

Original commit changeset: 32d102502570

fbshipit-source-id: 59f7cc50d369b6cc2856cf4ebd0f58b96202336d

2 years ago Revert D30745960: [DDP] Remove SPMD from self.modules_buffers
Howard Huang [Thu, 9 Sep 2021 15:20:40 +0000 (08:20 -0700)]
Revert D30745960: [DDP] Remove SPMD from self.modules_buffers

Test Plan: revert-hammer

Differential Revision:
D30745960 (https://github.com/pytorch/pytorch/commit/15532595209d2daf34d35e10f8d3d3b64966aea2)

Original commit changeset: 66a8f9847e9f

fbshipit-source-id: d3f3fb813c45ac1b0ff15c6154b2e99e5dbab433

2 years ago [JIT] Add gradient check in constants (#64613)
Elias Ellison [Thu, 9 Sep 2021 15:12:30 +0000 (08:12 -0700)]
[JIT] Add gradient check in constants (#64613)

Summary:
Fixes an internal issue.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64613

Reviewed By: Gamrix

Differential Revision: D30799016

Pulled By: eellison

fbshipit-source-id: 48ef52d1cac627919e6cd232216d24878a2a8b58

2 years ago Filter out _disabled_torch_function_impl from handle_torch_function (#64689)
Edward Yang [Thu, 9 Sep 2021 14:17:26 +0000 (07:17 -0700)]
Filter out _disabled_torch_function_impl from handle_torch_function (#64689)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64689

This brings it in line with the C++ implementation.

Fixes https://github.com/pytorch/pytorch/issues/64687
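
A short sketch of the idiom involved (this opt-out is the one used by e.g. `torch.nn.Parameter`; the subclass here is invented):

```python
import torch

class PlainTensor(torch.Tensor):
    # Opt out of __torch_function__ dispatch entirely.
    __torch_function__ = torch._C._disabled_torch_function_impl

# After this fix, the Python handle_torch_function filters out this
# disabled override (matching the C++ implementation) instead of
# dispatching to it.
```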

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D30816215

Pulled By: ezyang

fbshipit-source-id: ed36af6c35467ae678d9548197efd97c36d38dec

2 years ago To add Rectified Adam Description to Documentation (#63772)
Ilqar Ramazanli [Thu, 9 Sep 2021 14:04:57 +0000 (07:04 -0700)]
To add Rectified Adam Description to Documentation (#63772)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Rectified Adam algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1908.03265

<img width="446" alt="RadamAlgo" src="https://user-images.githubusercontent.com/73658284/132587815-4764b642-df53-4e41-975f-72e0f40fdc48.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63772

Reviewed By: datumbox

Differential Revision: D30839694

Pulled By: iramazanli

fbshipit-source-id: 6f5629ce56e10c66a451433334b587b99eda1610

2 years ago [doc][hackathon] To add AdamW Optimizer to the documentation (#63252)
Ilqar Ramazanli [Thu, 9 Sep 2021 14:03:49 +0000 (07:03 -0700)]
[doc][hackathon] To add AdamW Optimizer to the documentation (#63252)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the AdamW algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1711.05101

<img width="442" alt="AdamWalgo" src="https://user-images.githubusercontent.com/73658284/132589957-6d381e96-cb62-40d0-990f-82a32ec455be.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63252

Reviewed By: datumbox

Differential Revision: D30839685

Pulled By: iramazanli

fbshipit-source-id: 1a426c874ab86408d286a34f41aefcf5b21167c0

2 years ago To add Adamax algorithm to documentation (#63903)
Ilqar Ramazanli [Thu, 9 Sep 2021 13:39:14 +0000 (06:39 -0700)]
To add Adamax algorithm to documentation (#63903)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of the Adamax algorithm to the documentation. For more details, we refer to the paper: https://arxiv.org/abs/1412.6980

<img width="447" alt="Adamx" src="https://user-images.githubusercontent.com/73658284/132577306-878ce64c-627a-4086-808c-d0482868d4a1.png">

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63903

Reviewed By: albanD

Differential Revision: D30819055

Pulled By: iramazanli

fbshipit-source-id: 37f748cbea9f93bf37193ee30fc295fb1a1e9ffd

2 years ago [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
CodemodService FBSourceClangFormatLinterBot [Thu, 9 Sep 2021 11:21:58 +0000 (04:21 -0700)]
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`

Reviewed By: zertosh

Differential Revision: D30835585

fbshipit-source-id: a7d35319fd3ae3eddd29b69d299d842f68d587f6

2 years ago Fix lop1p lowering bug (#64724)
Yinghai Lu [Thu, 9 Sep 2021 07:58:39 +0000 (00:58 -0700)]
Fix lop1p lowering bug (#64724)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64724

`1` will introduce an int tensor instead of a float tensor, which doesn't work well with downstream (elementwise) operators. The error would be like
```
[TensorRT] WARNING: IElementWiseLayer with inputs (Unnamed Layer* 1) [Unary]_output and (Unnamed Layer* 2) [Constant]_output: first input has type Float but second input has type Int32.
```
Changing the constant to float type fixes this.
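
The underlying dtype mismatch is easy to see in eager mode:

```python
import torch

print(torch.tensor(1).dtype)    # torch.int64 -> becomes an Int32 TRT constant
print(torch.tensor(1.0).dtype)  # torch.float32 -> matches the Float input, as in the fix
```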

Reviewed By: 842974287

Differential Revision: D30796959

fbshipit-source-id: 0538e4dd960df9ce87a2d4cafe8f1a0c061b6bad

2 years ago Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)
Peter Bell [Thu, 9 Sep 2021 05:07:12 +0000 (22:07 -0700)]
Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64713)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64713

Resubmit of #64442

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30825646

Pulled By: ngimel

fbshipit-source-id: 66b06bd0b30b401833e337920681d19d96b11f9d

2 years ago [DDP] Remove SPMD from self.modules_buffers (#64474)
Rohan Varma [Thu, 9 Sep 2021 02:13:33 +0000 (19:13 -0700)]
[DDP] Remove SPMD from self.modules_buffers (#64474)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64474

No need for a nested list here.
ghstack-source-id: 137526312

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745960

fbshipit-source-id: 66a8f9847e9fe1e02c51b79647e93bf7665cf4d9

2 years ago [DDP] Remove self.modules_params (#64473)
Rohan Varma [Thu, 9 Sep 2021 02:13:33 +0000 (19:13 -0700)]
[DDP] Remove self.modules_params (#64473)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64473

Unused after SPMD was deprecated.
ghstack-source-id: 137526305

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745961

fbshipit-source-id: 32d102502570291e01579e5b47a6d74dc71013bb

2 years ago [DDP] Fix when buffers are reassigned in module (#64472)
Rohan Varma [Thu, 9 Sep 2021 02:13:33 +0000 (19:13 -0700)]
[DDP] Fix when buffers are reassigned in module (#64472)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64472

Sometimes a user module can reassign a tensor buffer, as in:

```
self.buffer = torch.randn(1, 2) # in init
self.buffer += 1 # in forward
```

In this case, `self.modules_buffers` will become outdated, and we should
repopulate `self.modules_buffers` if we need to sync module buffers.

See https://github.com/pytorch/pytorch/issues/63916 for full description of the
issue.
ghstack-source-id: 137526309

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30745921

fbshipit-source-id: 25eb1edbf445703a481802e07f3058d38ea6fc64

2 years ago [PyTorch] Fix MobileDebugInfo vector copy (#64030)
Scott Wolchok [Thu, 9 Sep 2021 01:30:14 +0000 (18:30 -0700)]
[PyTorch] Fix MobileDebugInfo vector copy (#64030)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64030

ghstack-source-id: 137566816

Test Plan:
Pixel 3 before:  https://our.intern.facebook.com/intern/aibench/details/320277034999340
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/724509739115867

We can see the vector copy disappear in the flame graph. The overall mean decreased from 354 ms to 348 ms (though I'm not sure this is outside the usual noise).

Reviewed By: raziel

Differential Revision: D30559032

fbshipit-source-id: 6d8bb5396d3449cc63023ee7acf694b5d146ddc1

2 years ago [PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)
Scott Wolchok [Thu, 9 Sep 2021 01:30:14 +0000 (18:30 -0700)]
[PyTorch] move from input ivalues in ByteCodeDeserializer (#64029)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64029

This should save us a separate pass over the data structure to destroy it.
ghstack-source-id: 137566821

Test Plan:
Pixel3
before:
https://www.internalfb.com/intern/aibench/details/503337445067962
after:
https://our.intern.facebook.com/intern/aibench/details/320277034999340

The overall mean time decreased from 373 ms to 358 ms. In the flame graph, we
can see that some of the time spent destroying a vector of IValues moved
into parseMethods, and the new parseMethods time is less than the old
time plus the recursive destruction time.

Reviewed By: dhruvbird

Differential Revision: D30559530

fbshipit-source-id: d080295a846745ea03ac50f08f4f6c95f4eaf3d8

2 years ago [PyTorch] Copy vectors less in Function::append_operator (#63977)
Scott Wolchok [Thu, 9 Sep 2021 01:30:14 +0000 (18:30 -0700)]
[PyTorch] Copy vectors less in Function::append_operator (#63977)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63977

There doesn't seem to be any reason to copy these argument vectors.
ghstack-source-id: 137566815

Test Plan: CI

Reviewed By: dhruvbird, raziel

Differential Revision: D30550301

fbshipit-source-id: 33c199f975e4fb62c50a8210dc08aa9bb7a3e2f2

2 years ago[FX] make visualizer produce different formatted output (#64699)
Yinghai Lu [Thu, 9 Sep 2021 01:20:46 +0000 (18:20 -0700)]
[FX] make visualizer produce different formatted output (#64699)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64699

Previously we hardcoded the output to svg format. We should give folks a choice of which format they want to see. If given an unrecognized extension like `.abc`, this errors out, which we consider the right behavior.
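
As a sketch of the idea (the helper below is hypothetical and assumes a pydot-like graph object; the visualizer's real API may differ), the output format can be derived from the requested file extension:

```
import os

def save_drawing(dot_graph, path):
    # Derive the output format from the file extension instead of
    # hardcoding "svg". An unrecognized extension such as ".abc" is
    # passed through and errors out in the rendering backend, which is
    # the intended behavior.
    fmt = os.path.splitext(path)[1].lstrip(".")
    dot_graph.write(path, format=fmt)
```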

Reviewed By: houseroad

Differential Revision: D30718883

fbshipit-source-id: fe8827262f94ea6887999bb225de763d1909eef8

2 years agoRe-enable nightly doc pushes (#64708)
Nikita Shulga [Thu, 9 Sep 2021 01:06:04 +0000 (18:06 -0700)]
Re-enable nightly doc pushes (#64708)

Summary:
That were accidentally disabled by https://github.com/pytorch/pytorch/pull/64222

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64708

Reviewed By: seemethere

Differential Revision: D30822089

Pulled By: malfet

fbshipit-source-id: 056b5c006f236c78ffe8afa4a5eab2f35e1bce89

2 years ago[acc_tracer] Enable check_mutable_operations (#64456)
Jordan Fix [Wed, 8 Sep 2021 23:09:53 +0000 (16:09 -0700)]
[acc_tracer] Enable check_mutable_operations (#64456)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64456

As titled.

Test Plan: CI

Reviewed By: protonu

Differential Revision: D30679174

fbshipit-source-id: 73f3a07d58380cd44fb3481aa97d463c0a964de8

2 years ago[tensorexpr] Allocate intermediate buffers at compile time (#64227)
Hui Guo [Wed, 8 Sep 2021 22:30:59 +0000 (15:30 -0700)]
[tensorexpr] Allocate intermediate buffers at compile time (#64227)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64227

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652220

Pulled By: huiguoo

fbshipit-source-id: cd75005cdfa42751318de7174b44e14a3a01634e

2 years ago[tensorexpr] Add 'is_allocated' flag for buffers and use it to insert 'Alloc/Free...
Hui Guo [Wed, 8 Sep 2021 22:30:59 +0000 (15:30 -0700)]
[tensorexpr] Add 'is_allocated' flag for buffers and use it to insert 'Alloc/Free' stmts (#64226)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64226

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30652221

Pulled By: huiguoo

fbshipit-source-id: ef9bb0e3db2c444b476e5fc23956bc34ae0f0111

2 years ago[acc_normalizer] Improve error when kwarg normalization fails (#64408)
Jordan Fix [Wed, 8 Sep 2021 22:30:28 +0000 (15:30 -0700)]
[acc_normalizer] Improve error when kwarg normalization fails (#64408)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64408

As titled.

Test Plan: NFC

Reviewed By: protonu

Differential Revision: D30716392

fbshipit-source-id: e1c3bb1afcd5363a9d502549d8a46b90226be40c

2 years agoUpdate breakpad to an existing commit: 7d188f6 (#64666)
Hector Yuen [Wed, 8 Sep 2021 22:22:21 +0000 (15:22 -0700)]
Update breakpad to an existing commit: 7d188f6 (#64666)

Summary:
Fixes issue https://github.com/pytorch/pytorch/issues/64561

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64666

Reviewed By: driazati

Differential Revision: D30814127

Pulled By: hyuen

fbshipit-source-id: 511a30fc26153569b1cd39f34e4a1a6bb99cc5e4

2 years agoTo add Stochastic Gradient Descent to Documentation (#63805)
Ilqar Ramazanli [Wed, 8 Sep 2021 22:20:52 +0000 (15:20 -0700)]
To add Stochastic Gradient Descent to Documentation (#63805)

Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch core documentation could amount to a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms, with links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.

In this PR we add a description of Stochastic Gradient Descent to the documentation.

<img width="466" alt="SGDalgo" src="https://user-images.githubusercontent.com/73658284/132585881-b351a6d4-ece0-4825-b9c0-126d7303ed53.png">
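
For reference, the update rule being documented is, in condensed form (our own transcription of SGD with momentum as implemented by `torch.optim.SGD`, with learning rate γ, momentum μ, dampening τ, and weight decay λ; not copied from the PR):

```
\begin{aligned}
g_t &\leftarrow \nabla_\theta f_t(\theta_{t-1}) + \lambda\,\theta_{t-1} \\
b_t &\leftarrow \mu\,b_{t-1} + (1 - \tau)\,g_t \\
g_t &\leftarrow \begin{cases} g_t + \mu\,b_t & \text{if Nesterov} \\ b_t & \text{otherwise} \end{cases} \\
\theta_t &\leftarrow \theta_{t-1} - \gamma\,g_t
\end{aligned}
```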

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63805

Reviewed By: albanD

Differential Revision: D30818947

Pulled By: iramazanli

fbshipit-source-id: 3812028e322c8a64f4343552b0c8c4582ea382f3

2 years ago.github: Upgrade windows CUDA 10.1 -> 10.2 (#64658)
Eli Uriegas [Wed, 8 Sep 2021 21:40:03 +0000 (14:40 -0700)]
.github: Upgrade windows CUDA 10.1 -> 10.2 (#64658)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64658

We don't release 10.1 anymore, so let's bump to 10.2.

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30811178

Pulled By: seemethere

fbshipit-source-id: c504ebf7f0d4c0d6229319d774f808b4ba0facd9

2 years agoAdd plugin for linalg norm operation (#64611)
Shirong Wu [Wed, 8 Sep 2021 21:29:33 +0000 (14:29 -0700)]
Add plugin for linalg norm operation (#64611)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64611

Add a plugin for torch.linalg.norm. The plugin only supports norm operations that do not change the batch size, so vector inputs, and matrix inputs with `dim` including 0, are not supported.
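
Concretely, based on the description above (shapes illustrative):

```
import torch

x = torch.randn(8, 16)
torch.linalg.norm(x, dim=1)          # batch size preserved: supported
torch.linalg.norm(x, dim=0)          # reduces over dim 0: not supported
torch.linalg.norm(torch.randn(16))   # vector input: not supported
```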

Test Plan: Unit test

Reviewed By: 842974287

Differential Revision: D30525958

fbshipit-source-id: 0d66b60a390bb6235166e5a80390090d0acf691a

2 years agoRevert D30735341: Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce
Natalia Gimelshein [Wed, 8 Sep 2021 21:25:42 +0000 (14:25 -0700)]
Revert D30735341: Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce

Test Plan: revert-hammer

Differential Revision:
D30735341 (https://github.com/pytorch/pytorch/commit/a5ad08ec704a3f765814eacf5c393e871c0174e1)

Original commit changeset: 3cb58bed8f1f

fbshipit-source-id: 874dd0f93b24a99694db42a15714834069d402bc

2 years ago[fx] make const fold code more pythonic (#64451)
Yinghai Lu [Wed, 8 Sep 2021 20:50:46 +0000 (13:50 -0700)]
[fx] make const fold code more pythonic (#64451)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64451

No functional change.

Test Plan:
```
buck test caffe2/test:fx_const_fold
```

Reviewed By: jfix71, RoshanPAN, houseroad

Differential Revision: D30718255

fbshipit-source-id: 95f98561c7f33fcc6c839db68683c85eb152c949

2 years ago[quant] Enable jit tracing on quantizable LSTM (resubmission) (#64638)
Zafar Takhirov [Wed, 8 Sep 2021 20:32:29 +0000 (13:32 -0700)]
[quant] Enable jit tracing on quantizable LSTM (resubmission) (#64638)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64638

The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.
The main UX feature removed is the user-friendly message shown when accessing the backward path of a unidirectional LSTM: when the `bidirectional` flag is False, we used to throw a nice error message if the user tried accessing the backward weights. Now the message is the default one (the properties were removed).
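
A minimal tracing example (this assumes the quantizable LSTM is importable as below in this version; the exact module path is our assumption):

```
import torch
from torch.nn.quantizable import LSTM

lstm = LSTM(input_size=4, hidden_size=8, num_layers=1)
x = torch.randn(5, 1, 4)  # (seq_len, batch, input_size)
# With the non-traceable paths removed, tracing now succeeds.
traced = torch.jit.trace(lstm, (x,))
```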

Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`

Reviewed By: HDCharles

Differential Revision: D30803753

fbshipit-source-id: a639955a96cee22538d9436f1c952a5d121f50f9

2 years agoFactor out TensorBase that doesn't depend on native operators (#63612)
Peter Bell [Wed, 8 Sep 2021 20:25:42 +0000 (13:25 -0700)]
Factor out TensorBase that doesn't depend on native operators (#63612)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63612

This makes `Tensor` inherit from a new class `TensorBase`, which provides the subset of `Tensor` that doesn't
directly depend on native_functions.yaml. Code that only includes TensorBase.h will thus not need to
be rebuilt every time someone changes an operator signature.

Making `Tensor` inherit from this class means that `const TensorBase&` parameters will be callable
with an ordinary `Tensor`. I've also made `Tensor` constructible and assignable from `TensorBase` to
minimize friction in code mixing the two types.

To help enforce that `Tensor.h` and `Functions.h` aren't accidentally included, I've added an error
into `Operators.h` if `TORCH_ASSERT_NO_OPERATORS` is defined. We can either set this in the build
system for certain folders, or just define it at the top of any file.

I've also included an example of manually special-casing the commonly used `contiguous` operator.
The inline function's slow path defers to `TensorBase::__dispatch_contiguous` which is defined in
`Tensor.cpp`. I've made it so `OptionalTensorRef` is constructible from `TensorBase`, so I can
materialize a `Tensor` for use in dispatch without actually increasing its refcount.

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728580

Pulled By: ezyang

fbshipit-source-id: 2cbc8eee08043382ee6904ea8e743b1286921c03

2 years agoMake doc previews use its own S3 bucket (#64594)
David Riazati [Wed, 8 Sep 2021 18:35:42 +0000 (11:35 -0700)]
Make doc previews use its own S3 bucket (#64594)

Summary:
We had been using the gha-artifacts bucket (which previously only stored workflow artifacts) to keep the docs around. This makes it hard to see how our storage for artifacts vs docs is trending.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64594

Reviewed By: seemethere

Differential Revision: D30794328

Pulled By: driazati

fbshipit-source-id: 6b2721a3d76e8a273bde055783d56551f8409edd

2 years agoTST Adds inplace checks to module_info (#63739)
Thomas J. Fan [Wed, 8 Sep 2021 18:00:11 +0000 (11:00 -0700)]
TST Adds inplace checks to module_info (#63739)

Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935

This PR adds inplace checks to `test_modules`. This version checks the constructor for `inplace` and performs the check automatically.
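
A rough sketch of what the automatic check does (names and structure here are hypothetical; the real version lives in the ModuleInfo machinery):

```
import torch

def check_inplace_variant(module_cls, *args, **kwargs):
    # If the constructor accepts `inplace`, instantiate both variants
    # and verify they compute the same output.
    m = module_cls(*args, inplace=False, **kwargs)
    m_inplace = module_cls(*args, inplace=True, **kwargs)
    x = torch.randn(4, 4)
    assert torch.equal(m(x), m_inplace(x.clone()))

check_inplace_variant(torch.nn.ReLU)
```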

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63739

Reviewed By: saketh-are

Differential Revision: D30737774

Pulled By: jbschlosser

fbshipit-source-id: 8813534511e9296c8424d1ca878412726ddd4043

2 years agoMigrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64442)
Peter Bell [Wed, 8 Sep 2021 17:57:30 +0000 (10:57 -0700)]
Migrate uses of THCReduceApplyUtils to cuda_utils::BlockReduce (#64442)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64442

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D30735341

Pulled By: ngimel

fbshipit-source-id: 3cb58bed8f1f5aa32fd49fd37b10c8490bcc645a

2 years ago.github: Run docker containers in detach mode (#64459)
Eli Uriegas [Wed, 8 Sep 2021 17:51:29 +0000 (10:51 -0700)]
.github: Run docker containers in detach mode (#64459)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64459

Should allow users to exec into the docker container if using with-ssh,
even if the build / test command has finished executing

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: malfet

Differential Revision: D30742797

Pulled By: seemethere

fbshipit-source-id: 969ed8799216c6051439c7d41ab709b2d40938ac

2 years ago[NNC] Add Softplus operator (#64589)
Animesh Jain [Wed, 8 Sep 2021 17:48:09 +0000 (10:48 -0700)]
[NNC] Add Softplus operator (#64589)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64589

Adds a softplus operator lowering for NNC, and enables elementwise fusion for it as well.
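
For example, a scripted function along these lines becomes eligible for fusion (a sketch; actual fusion also depends on the fuser being enabled for the backend):

```
import torch

@torch.jit.script
def f(x):
    # softplus followed by another elementwise op; with the NNC lowering
    # these can be fused into a single TensorExpr kernel.
    return torch.nn.functional.softplus(x) * 2.0
```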

Test Plan: Added a test in test_jit_fuser.py

Reviewed By: bertmaher

Differential Revision: D30736449

fbshipit-source-id: 6c5fc3bceb5cef2322ecd4449f827e4af018ea93

2 years agoAdd `__matmul__` to the magic methods for FX tracing (#64512)
Horace He [Wed, 8 Sep 2021 16:59:04 +0000 (09:59 -0700)]
Add `__matmul__` to the magic methods for FX tracing (#64512)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/64483
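
A quick illustration of what this enables (the traced function is arbitrary):

```
import torch
from torch import fx

def f(a, b):
    # `a @ b` desugars to a.__matmul__(b); with __matmul__ registered
    # among the traced magic methods, symbolic tracing records it as a
    # call instead of failing.
    return a @ b

print(fx.symbolic_trace(f).graph)
```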

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64512

Reviewed By: mrshenli

Differential Revision: D30797265

Pulled By: Chillee

fbshipit-source-id: 7630e048a960e0b27c4309d04d85301abe325189

2 years agoupdate scatter formula (#64546)
kshitij12345 [Wed, 8 Sep 2021 16:52:53 +0000 (09:52 -0700)]
update scatter formula (#64546)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/63430

The updated formula is already exercised by the OpInfo gradient tests:
https://github.com/pytorch/pytorch/blob/544c8e6a5d26efdf1cf679b313893fe119825930/torch/testing/_internal/common_methods_invocations.py#L8575-L8577
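
A self-contained check along the same lines (values illustrative):

```
import torch

index = torch.tensor([[0, 1, 2], [0, 1, 2]])
base = torch.zeros(2, 3, dtype=torch.double, requires_grad=True)
src = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
# gradcheck compares the analytical scatter derivative against
# numerical finite differences.
torch.autograd.gradcheck(lambda b, s: b.scatter(1, index, s), (base, src))
```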

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64546

Reviewed By: saketh-are

Differential Revision: D30768759

Pulled By: albanD

fbshipit-source-id: 27d144971c51a956a232fc7d02df5c9d2706d565