kshitij12345 [Tue, 21 Sep 2021 14:21:15 +0000 (07:21 -0700)]
[nn] TripletMarginLoss and PairwiseDistance : no batch dim (#64882)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
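A minimal sketch of what this enables (unbatched inputs for both modules); the exact shapes and outputs below are assumptions, not taken from the PR:
```
import torch
import torch.nn as nn

# Unbatched (D,) inputs now work in addition to the usual (N, D) batched ones.
anchor, pos, neg = torch.randn(128), torch.randn(128), torch.randn(128)

dist = nn.PairwiseDistance(p=2)(anchor, pos)               # 0-dim distance
loss = nn.TripletMarginLoss(margin=1.0)(anchor, pos, neg)  # 0-dim loss
```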
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64882
Reviewed By: malfet
Differential Revision: D31055577
Pulled By: jbschlosser
fbshipit-source-id: 2f0a5a08619b672026b48a78bc7d83a6dccba0bf
Teng Gao [Tue, 21 Sep 2021 13:38:37 +0000 (06:38 -0700)]
correlate forward and backward op (#62553)
Summary:
Use the startThreadId+seqNumber of the forward op and the fwdThreadId+seqNumber of the backward op to correlate the pair.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553
Reviewed By: malfet
Differential Revision: D30125728
Pulled By: gdankel
fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
Rodrigo Berriel [Tue, 21 Sep 2021 12:59:47 +0000 (05:59 -0700)]
[docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:
- [ ] Remove the use of `.data` in all our internal code:
- [ ] ...
- [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`
In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).
cc albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358
Reviewed By: malfet
Differential Revision: D31061790
Pulled By: albanD
fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
Benjamin Rowell [Tue, 21 Sep 2021 12:59:16 +0000 (05:59 -0700)]
Adds keyword only args to gradcheck (#65290)
Summary:
Changes the call signature of gradcheck so that its kwargs are keyword-only.
Also modifies the return call from gradgradcheck to reflect these changes.
Fixes https://github.com/pytorch/pytorch/issues/65165
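A short sketch of the new calling convention; the exact error type raised on positional use is an assumption:
```
import torch
from torch.autograd import gradcheck

x = torch.randn(4, dtype=torch.double, requires_grad=True)

# Options must now be passed by keyword:
assert gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4)

# gradcheck(torch.sin, (x,), 1e-6, 1e-4)  # positional options now raise (TypeError assumed)
```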
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65290
Reviewed By: soulitzer
Differential Revision: D31061316
Pulled By: albanD
fbshipit-source-id: 3505569a33a497a8be4347bdd425bb2b8e536999
Chen Lai [Tue, 21 Sep 2021 05:22:17 +0000 (22:22 -0700)]
[PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function.
5. Update the expected result from 4 to 3 in the unit test DefaultArgsWithOutArg to cover the number of specified arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651
ghstack-source-id: 138539912
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```
Reviewed By: raziel, tugsbayasgalan
Differential Revision: D30454080
fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
Pavel Belevich [Tue, 21 Sep 2021 04:52:34 +0000 (21:52 -0700)]
[JIT] Delete obsolete message: or if you absolutely have to, use c10::impl::GenericDict(c10::impl::deprecatedUntypedDict()) (#65164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65164
Looks like it was forgotten in https://github.com/pytorch/pytorch/pull/25439
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31072625
Pulled By: pbelevich
fbshipit-source-id: a5ffcfb0836f962ab6952a187ba7717c4d4a6e33
Pavel Belevich [Tue, 21 Sep 2021 04:52:34 +0000 (21:52 -0700)]
[JIT] Support device as Dict key (#65079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65079
This is required to use RPC DeviceMap aka Dict[torch.device, torch.device] in torchscript
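An illustrative sketch of what this enables in TorchScript (function and values are made up for the example):
```
from typing import Dict

import torch

@torch.jit.script
def remap(device_map: Dict[torch.device, torch.device],
          d: torch.device) -> torch.device:
    # torch.device is now usable as a Dict key in TorchScript.
    return device_map[d]

print(remap({torch.device("cpu"): torch.device("cuda:0")}, torch.device("cpu")))
```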
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31072626
Pulled By: pbelevich
fbshipit-source-id: 51cfa5653db86de73b624e9157d68d1b319bfc64
Amr Elshennawy [Tue, 21 Sep 2021 02:50:06 +0000 (19:50 -0700)]
Reduce PyTorch warnings: Cast fix xplat/caffe2/aten/src/ATen/core/DeprecatedTypeProperties.h (#65031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65031
Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: r-barnes
Differential Revision: D30948791
fbshipit-source-id: 13046e1d0ce2c24864ad38f318ca5e34b1bb9552
Pritam Damania [Tue, 21 Sep 2021 00:34:28 +0000 (17:34 -0700)]
Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter, because ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)
# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
driazati [Tue, 21 Sep 2021 00:03:53 +0000 (17:03 -0700)]
Track peak memory usage (#65157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65157
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31029049
Pulled By: driazati
fbshipit-source-id: 3e87e94e4872d118ad191aef2b77b8cefe90aeb6
driazati [Tue, 21 Sep 2021 00:03:53 +0000 (17:03 -0700)]
Fix logic to determine master vs PR (#65155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65155
This was bugged before on empty strings, which caused the hook to write on any job regardless of the `only_on_master` flag, not just on `master`.
Test Plan: see `[scribe] Skipping RDS write on PR` in the logs for `linux-xenial-cuda11.3-py3.6-gcc7`
Reviewed By: malfet
Differential Revision: D31029048
Pulled By: driazati
fbshipit-source-id: 77c4a60e443d8fc19990755a3a346576afee86d8
Ben Koopman [Mon, 20 Sep 2021 23:55:42 +0000 (16:55 -0700)]
[quant] Add fp32/fp16 zero_point support for CPU fakeQuant (#65055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65055
Test Plan: Imported from OSS
Reviewed By: jingsh, supriyar
Differential Revision: D30975238
Pulled By: b-koopman
fbshipit-source-id: 2000660ffe71cb85d00fdabaf8fc3ba7323f9a1e
Hao Lu [Mon, 20 Sep 2021 23:55:09 +0000 (16:55 -0700)]
[PyPer] copy-free freeze_module (#65118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118
Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.
Reviewed By: eellison, movefast1990
Differential Revision: D30955053
fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
Amr Elshennawy [Mon, 20 Sep 2021 23:49:10 +0000 (16:49 -0700)]
Reduce PyTorch warnings: Cast fix xplat/caffe2/c10/core/TensorOptions.h (#65030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65030
Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: r-barnes
Differential Revision: D30948721
fbshipit-source-id: 16fe42daab35709c56a4d3ccc276ea635a3510c1
Tao Xu [Mon, 20 Sep 2021 23:24:12 +0000 (16:24 -0700)]
[iOS] Zero out NSError to avoid heap corruptions for the OSS builds (#65355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355
I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue for the BUCK builds.
ghstack-source-id: 138502708
Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI
Reviewed By: hanton
Differential Revision: D31048010
fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
= [Mon, 20 Sep 2021 21:31:51 +0000 (14:31 -0700)]
Add crow_/col_indices to view types (#63176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61103
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63176
Reviewed By: malfet, albanD
Differential Revision: D30315882
Pulled By: cpuhrsch
fbshipit-source-id: eedae5265a757ed68fd69e4f9d07070b05de4bd8
Protonu Basu [Mon, 20 Sep 2021 21:29:30 +0000 (14:29 -0700)]
Creating a helper function to generate a unique name for an attr in a module (#64970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64970
Add a helper function to create a unique name for an attr.
This can be used when we want to add a weight to a module.
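The helper's actual name and signature aren't shown in this log, so here is an illustrative sketch of the idea (all names hypothetical):
```
import torch

def get_unique_attr_name(module: torch.nn.Module, prefix: str) -> str:
    # Hypothetical sketch: suffix a counter until the name is unused.
    name, i = prefix, 0
    while hasattr(module, name):
        i += 1
        name = f"{prefix}_{i}"
    return name

mod = torch.nn.Linear(2, 2)
setattr(mod, get_unique_attr_name(mod, "weight"), torch.randn(2))  # -> "weight_1"
```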
Test Plan: run CI.
Reviewed By: jfix71
Differential Revision: D30921497
fbshipit-source-id: 598569d107df8b516ff12920a4bef3a42577e987
Protonu Basu [Mon, 20 Sep 2021 21:29:30 +0000 (14:29 -0700)]
Add support to lower acc_ops.transpose (#65036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65036
Reviewed By: jfix71, 842974287
Differential Revision: D30934503
fbshipit-source-id: 51880d3d36492f5206f77c9d1a994d8532597b62
Shiyan Deng [Mon, 20 Sep 2021 20:40:31 +0000 (13:40 -0700)]
[fx] give warning instead of fatal the program when submod not found during adding get_attr (#65225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65225
Currently, when creating a get_attr node, if the attribute is in a submodule, we'll first find the submodule. If the submodule isn't in the owning module we throw an exception.
However, if the attribute can't be found, we give a warning but still allow creating the get_attr node. To align with this behavior, we change the reaction when a submodule isn't found from fatal to giving a warning.
Test Plan: CI
Reviewed By: jamesr66a, jfix71
Differential Revision: D31021535
fbshipit-source-id: 4c0b471448c09cc927d0f47b5bf56594f25a8863
Can Balioglu [Mon, 20 Sep 2021 20:32:05 +0000 (13:32 -0700)]
Remove @balioglu from PyTorch Distributed code owners (#65239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65239
Due to too much noise caused by the GitHub notifications, going forward I prefer to track PRs manually.
ghstack-source-id: 138386041
Test Plan: N/A
Reviewed By: mrshenli
Differential Revision: D31027792
fbshipit-source-id: 6578e41d4ab53ad2c64a41584716f4903298cd6b
Michael Carilli [Mon, 20 Sep 2021 19:54:33 +0000 (12:54 -0700)]
[CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247
Reviewed By: malfet
Differential Revision: D31057940
Pulled By: ngimel
fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
Rodrigo Berriel [Mon, 20 Sep 2021 19:35:07 +0000 (12:35 -0700)]
Fix full backward hook when grad is disabled (#65335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59901. See discussion in the issue.
cc albanD soulitzer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65335
Reviewed By: malfet
Differential Revision: D31055865
Pulled By: albanD
fbshipit-source-id: 53605df62bc73c99d8908248087ab400b81ac495
zhouzhuojie [Mon, 20 Sep 2021 18:53:28 +0000 (11:53 -0700)]
Fix unassigned ciflow trigger (#65354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65250#issuecomment-923120764
This is a limitation of GitHub Action triggers: it's hard to introduce a condition before the workflow runs, which is why we intentionally picked the rare event ("unassigned"). For people who didn't opt into ciflow and manually unassign, I think the fix is to run all the CI (otherwise we'd introduce a new condition for this case, which isn't worth the extra complexity).
`unassigned` event payload looks like this, just to make sure `github.event.assignee.login` is pointing to the right location.
```
{
"action": "unassigned",
"assignee": {
"avatar_url": "https://avatars.githubusercontent.com/u/658840?v=4",
"events_url": "https://api.github.com/users/zhouzhuojie/events{/privacy}",
"followers_url": "https://api.github.com/users/zhouzhuojie/followers",
"following_url": "https://api.github.com/users/zhouzhuojie/following{/other_user}",
"gists_url": "https://api.github.com/users/zhouzhuojie/gists{/gist_id}",
"gravatar_id": "",
"html_url": "https://github.com/zhouzhuojie",
"id": 658840,
"login": "zhouzhuojie",
"node_id": "MDQ6VXNlcjY1ODg0MA==",
"organizations_url": "https://api.github.com/users/zhouzhuojie/orgs",
"received_events_url": "https://api.github.com/users/zhouzhuojie/received_events",
"repos_url": "https://api.github.com/users/zhouzhuojie/repos",
"site_admin": false,
"starred_url": "https://api.github.com/users/zhouzhuojie/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/zhouzhuojie/subscriptions",
"type": "User",
"url": "https://api.github.com/users/zhouzhuojie"
},
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65354
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D31060212
Pulled By: zhouzhuojie
fbshipit-source-id: ce815cc96e8a00016646d6f02f0917169fa652dc
Alban Desmaison [Mon, 20 Sep 2021 18:33:20 +0000 (11:33 -0700)]
fix typo missing f string (#65226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65226
Reviewed By: malfet
Differential Revision: D31055793
Pulled By: albanD
fbshipit-source-id: fafac53e75223c4f599bd2162095aacad7b690df
Tao Xu [Mon, 20 Sep 2021 18:00:02 +0000 (11:00 -0700)]
[iOS] Fix the TestApp (#65319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65319
Test Plan: Imported from OSS
Reviewed By: hanton
Differential Revision: D31049543
Pulled By: xta0
fbshipit-source-id: ff0d0baac30682c63b2a28254ee0a5d8d9b8ca6f
Pritam Damania [Mon, 20 Sep 2021 17:39:08 +0000 (10:39 -0700)]
[Pipe] Add a `WithDevice` wrapper to specify device execution for a module. (#65190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65190
As described in https://github.com/pytorch/pytorch/issues/65093, there
could be modules which don't have any parameters/buffers. In this case, Pipe
determines that the module should be executed on CPU. However this might result
in unnecessary GPU to CPU transfers whereas the user expected the module to be
executed on the GPU itself by keeping its inputs and outputs on GPU.
For this use case, we introduce a `WithDevice` wrapper which can be used to
override which device a particular module should be executed on as part of the
pipeline.
#Closes: https://github.com/pytorch/pytorch/issues/65093
ghstack-source-id: 138376272
Test Plan:
1) waitforbuildbot
2) unit tests
Reviewed By: SciPioneer
Differential Revision: D31010027
fbshipit-source-id: 4c1c61d3c6feeef341e002e5f7e83dd33ff3a516
Nicolas Hug [Mon, 20 Sep 2021 17:27:12 +0000 (10:27 -0700)]
Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753
This PR changes the assumption regarding the default branch of a repo to the following:
> If main exists then use main, otherwise use master
This will make torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development / default branch.
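A minimal sketch of the fallback rule described above (the function name is illustrative, not the actual torchhub internals):
```
def default_branch(branches):
    # Prefer 'main' when the repo has it; otherwise fall back to 'master'.
    return "main" if "main" in branches else "master"

assert default_branch({"main", "gh-pages"}) == "main"
assert default_branch({"master"}) == "master"
```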
cc nairbv NicolasHug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364
Reviewed By: saketh-are
Differential Revision: D30731551
Pulled By: NicolasHug
fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
Mike Iovine [Mon, 20 Sep 2021 17:25:57 +0000 (10:25 -0700)]
[Static Runtime] Implement and enable variadic tuple unpack (#64934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934
Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```
The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.
Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`
Reviewed By: d1jang
Differential Revision: D30872109
fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
Jerry Zhang [Mon, 20 Sep 2021 17:20:51 +0000 (10:20 -0700)]
[quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109
Previously for sub we set the dtype for sub with qconfig since it's matched with a QuantizeHandler,
however this is incorrect, the dtype for sub is decided by whether the output is quantized or not,
so we added a check of is_output_quantized while deciding the dtype for the output of sub.
Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30977826
fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
Natalia Gimelshein [Mon, 20 Sep 2021 17:11:29 +0000 (10:11 -0700)]
Revert "Revert
D30558877: Ported std/var to ReductionOpInfo (#65262)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/63978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65262
Reviewed By: mruberry
Differential Revision: D31037360
Pulled By: ngimel
fbshipit-source-id: 1c60f40c547229767cba3bbe7e11ca0fbbc8f95f
Michael Dagitses [Mon, 20 Sep 2021 17:04:48 +0000 (10:04 -0700)]
simplify `torch.meshgrid`'s shape computation (#62905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62905
Reviewed By: mruberry
Differential Revision: D31021274
Pulled By: dagitses
fbshipit-source-id: c219389bdc543e9592f7b1c707acfbf752ee6f34
Erjia Guan [Mon, 20 Sep 2021 15:54:36 +0000 (08:54 -0700)]
[DataPipe] Unlimited buffer for Forker and Demultiplexer (#64994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64994
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D30934362
Pulled By: ejguan
fbshipit-source-id: d3b774d7e28c0b9659e999511e5a68c3929857d4
Facebook Community Bot [Sat, 18 Sep 2021 23:15:34 +0000 (16:15 -0700)]
Automated submodule update: FBGEMM (#64640)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: https://github.com/pytorch/FBGEMM/commit/d1ecc7dbe28d06cec742b06d541d5f96faf940fc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64640
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D30805660
fbshipit-source-id: 9f783862e89fe3974badd5194ef793db55e7d275
Jerry Zhang [Sat, 18 Sep 2021 19:49:07 +0000 (12:49 -0700)]
[quant][fx2trt] Generate engine graph for explicit quant/implicit quant and fp16 graph (#65289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65289
Turn on VERBOSE logging and use engine visualizer to generate the graph.
Runtime:
```
explicit quant result diff max tensor(0.0771)
implicit quant result diff max tensor(0.1909)
trt fp16 time (ms/iter) 1.0740923881530762
trt int8 time (ms/iter) 0.5288887023925781
trt implicit int8 time (ms/iter) 0.6334662437438965
PyTorch time (CUDA) (ms/iter) 4.448361396789551
PyTorch time (CPU) (ms/iter) 45.13296604156494
```
Generated Graphs:
```
explicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669571
implicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669656
fp16: https://www.internalfb.com/intern/graphviz/?paste=P458669708
```
Test Plan:
```
buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test 2>log
buck run //deeplearning/trt/fx2trt/tools:engine_layer_visualize -- --log_file log
```
Reviewed By: 842974287
Differential Revision: D30955035
fbshipit-source-id: 24949458ad9823fb026d56d78a6ee1c6874b6034
Don Jang [Sat, 18 Sep 2021 18:03:17 +0000 (11:03 -0700)]
[Static Runtime] Add perf metrics for number of managed tensors & unmanaged values (#64992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64992
This change lets Static Runtime print out the number of managed tensors & unmanaged values as performance metrics during profile runs.
We will use/enhance these metrics to guide the effort of managing output tensors.
Test Plan:
Confirmed that a profile run prints out the added metric values on inline_cvr nets:
```
(inline_cvr/local)
...
Total number of managed tensors: 2754
Total number of unmanaged values: 3240
...
(inline_cvr/local_ro)
Total number of managed tensors: 1554
Total number of unmanaged values: 2966
...
(inline_cvr/remote_ro)
Total number of managed tensors: 1439
Total number of unmanaged values: 28
...
```
Reviewed By: hlu1
Differential Revision: D30926617
fbshipit-source-id: b86e071003ac941b9663db103eaa7c614466b4e0
Saketh Are [Sat, 18 Sep 2021 16:56:42 +0000 (09:56 -0700)]
Remove incorrect stride assert in Reduce.cuh (#65227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583
Per discussion with ngimel, the condition asserted here may not always hold after TensorIterator's dimension coalescing and reordering. However, the reduction output should still be correct when `sub_iter.strides(0)[0]` is non-zero.
I've verified correctness empirically by:
1. Lowering the threshold ([configured here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/TensorIterator.cpp#L1127)) at which iterators are split into sub-iterators, making it easier to trigger.
2. Generating many tensors with random dimensions and randint elements which produce a non-zero `sub_iter.strides(0)[0]` in the CUDA kernel.
3. Verifying that the reduction `t.sum(dim=0)` produces the same results for those tensors on CPU and on CUDA.
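A minimal sketch of the check in step 3, assuming a CUDA device is available (the shape below is chosen arbitrarily, not taken from the PR):
```
import torch

t = torch.randint(0, 10, (4097, 3, 5), dtype=torch.float64)
# The dim-0 reduction should agree between CPU and CUDA even when the
# iterator is split into sub-iterators.
assert torch.allclose(t.sum(dim=0), t.cuda().sum(dim=0).cpu())
```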
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65227
Reviewed By: ngimel
Differential Revision: D31031406
Pulled By: saketh-are
fbshipit-source-id: 5cbf2001224454c74f6db42455c507365ad1f2b1
Michael Dagitses [Sat, 18 Sep 2021 13:47:20 +0000 (06:47 -0700)]
support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947
Fixes #62196
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30541485
Pulled By: dagitses
fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
Michael Dagitses [Sat, 18 Sep 2021 13:47:20 +0000 (06:47 -0700)]
clarify implementation of check_grad_usage (#64439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64439
1) remove unused fully_implemented
2) rename used_grad to uses_grad and make it a boolean
3) rename used_grads to num_grads_uses
4) add comments explaining what some of the checks mean
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30733904
Pulled By: dagitses
fbshipit-source-id: dccbbef8a4be8713215ef91aa97a34124f06a7a1
Jerry Zhang [Sat, 18 Sep 2021 06:23:29 +0000 (23:23 -0700)]
[quant][fx2trt] Enable comparison with implicit quant mode (#65043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65043
Currently gets the following result; will take a look at the executed graph again:
```
trt fp16 time (ms/iter) 1.0483217239379883
trt int8 time (ms/iter) 0.5329632759094238
trt implicit int8 time (ms/iter) 0.6769704818725586
PyTorch time (ms/iter) 6.453146934509277
```
Test Plan:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```
Imported from OSS
Reviewed By: 842974287
Differential Revision: D30954871
fbshipit-source-id: 8d7ff82b8f5d0b7946fbd38a7cddede7d40b28aa
CodemodService Bot [Sat, 18 Sep 2021 02:45:14 +0000 (19:45 -0700)]
[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK`
Reviewed By: zertosh
Differential Revision: D31039372
fbshipit-source-id: a5e54a9b1d2ef97e9bc206b9e2a82124e5a22a7a
Jane Xu [Sat, 18 Sep 2021 00:27:49 +0000 (17:27 -0700)]
Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066
Reviewed By: driazati
Differential Revision: D31022520
Pulled By: janeyx99
fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
zhouzhuojie [Sat, 18 Sep 2021 00:15:36 +0000 (17:15 -0700)]
Make github.com in noproxy list (#65256)
Summary:
Fixes #{issue number}
Attempts to solve a rate-limiting issue we saw when calling GitHub APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65256
Reviewed By: seemethere
Differential Revision: D31035115
Pulled By: zhouzhuojie
fbshipit-source-id: 7efd5d5af7d91805e4bf27b86847791e991b741e
Natalia Gimelshein [Sat, 18 Sep 2021 00:04:34 +0000 (17:04 -0700)]
remove utils.cpp (#65184)
Summary:
Dead code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65184
Reviewed By: mruberry
Differential Revision: D31031777
Pulled By: ngimel
fbshipit-source-id: 13633888229a7af8cfd8ea7e55ea2880b2e47273
Shiyan Deng [Fri, 17 Sep 2021 23:32:23 +0000 (16:32 -0700)]
[fx const fold] fix a case when some inputs are unused (#65223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65223
If there are unused inputs, they won't appear in `submod_1`. We need to add all the unused inputs so that the model after const fold has the same inputs as the original model.
Reviewed By: jfix71
Differential Revision: D31021217
fbshipit-source-id: b7452c90d133b747e0699936a81d3fee14af9cc9
Gisle Dankel [Fri, 17 Sep 2021 23:08:03 +0000 (16:08 -0700)]
[Profiler] Update kineto submodule (#65236)
Summary:
Update to latest kineto revision. See Kineto repo for change log.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65236
Reviewed By: malfet
Differential Revision: D31031638
Pulled By: gdankel
fbshipit-source-id: 681655b2e8e151895afa91445ced0fd57a11fa93
Shiyan Deng [Fri, 17 Sep 2021 22:42:57 +0000 (15:42 -0700)]
[fx2trt] re-enable profiler and some miscs for TRTModule (#65072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65072
Previously disabled attaching trt profiler to exec context in TRTModule because https://fburl.com/mc33n880 states that `enqueue()` doesn't support profiling. Seems to be a lie though. Re-enable attaching profiler in this diff.
Also added a bunch of checks for dtype and shape, and fixed saving state_dict and loading back.
Test Plan: buck run mode/opt -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test
Reviewed By: yinghai
Differential Revision: D30962757
fbshipit-source-id: 9c664b0500a8169b7952f6f912239a5a05772aea
Michael Suo [Fri, 17 Sep 2021 22:37:26 +0000 (15:37 -0700)]
[package] Make it possible to re-save a PackageImporter module (#65101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101
As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.
But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.
This PR makes it so that given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.
Differential Revision: D30975712
Test Plan: Imported from OSS
Pulled By: suo
fbshipit-source-id: d9e849bf651713890e72dccdcef74fa52d377149
Jason Ansel [Fri, 17 Sep 2021 21:28:38 +0000 (14:28 -0700)]
[FX] Fix tracing of bitwise and/or (#65196)
Summary:
Previously resulted in `AttributeError: module 'operator' has no attribute 'and'`
`and`/`or` are Python keywords, so they are renamed to `operator.and_` and `operator.or_`
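A small sketch of a module whose tracing previously failed:
```
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x, y):
        return (x & y) | x

traced = fx.symbolic_trace(M())
# The call_function nodes now target operator.and_ / operator.or_
# instead of the nonexistent operator.and / operator.or.
print(traced.graph)
```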
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65196
Reviewed By: Chillee
Differential Revision: D31020336
Pulled By: jansel
fbshipit-source-id: 51d888151fe78c0c1197ecaf161976b219c59694
Mike Ruberry [Fri, 17 Sep 2021 21:23:25 +0000 (14:23 -0700)]
Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer
Differential Revision: D30731191 (https://github.com/pytorch/pytorch/commit/f9bf144a0c5e3627f5fafb256cebf1f02152ab0c)
Original commit changeset: d1ee7c2ef259
fbshipit-source-id: 5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
BowenBao [Fri, 17 Sep 2021 21:11:27 +0000 (14:11 -0700)]
[ONNX] Deprecate enable_onnx_checker argument in torch.onnx.export() (#61708) (#64369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64369
As of now, the "enable_onnx_checker" parameter was described as below:
enable_onnx_checker (bool, default True): If True the ONNX model checker will be run to ensure the exported model is a valid ONNX model.
An invalid ONNX graph is useless to users, so such a check should be done for each call.
In this PR, we still write the model to an ONNX file even if it is invalid, and the exception is thrown after the ONNX file has been created. This enables users to output an invalid ONNX graph for debugging.
This PR still keeps the parameter in the torch.onnx.export() function for backward compatibility, while all backend logic has been changed to behave as if enable_onnx_checker were set to True.
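A hedged sketch of the behavior described above; the exact exception type is an assumption:
```
import torch

model = torch.nn.Linear(4, 2)
try:
    torch.onnx.export(model, torch.randn(1, 4), "model.onnx")
except torch.onnx.CheckerError:  # assumed exception type
    # With this change, model.onnx has already been written by the time the
    # checker raises, so the invalid graph can be inspected for debugging.
    pass
```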
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905267
Pulled By: malfet
fbshipit-source-id: 3ad3f68e77fcec012cc7ef674cc9a61755eebc9e
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Don Jang [Fri, 17 Sep 2021 20:20:33 +0000 (13:20 -0700)]
[Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123
This change re-reverts D30883290 (https://github.com/pytorch/pytorch/commit/0e11454d19e106ba6d5819c1147ca540cbce2943).
D30883290 broke the OSS build since the change implicitly removed the default move constructor of `StaticRuntime`.
```
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 auto sr = getStaticRuntime();
Sep 15 15:39:57 ^ ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 2 errors generated.
```
This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).
Original Summary:
This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.
`MemoryPlanner` performs an independent sub-task of static analysis of a graph, and creating memory planning, and allocating/deallocating managed Tensors.
This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.
Test Plan: - Confirm that OSS build went well (See External Tests section).
Reviewed By: mikeiovine
Differential Revision: D30983292
fbshipit-source-id: a59f407fa1123527824157268111144a1bf58116
Mengwei Liu [Fri, 17 Sep 2021 19:57:48 +0000 (12:57 -0700)]
[PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179
This is following up on this PR: https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.
Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.
Reviewed By: iseeyuan
Differential Revision: D31006555
fbshipit-source-id: c38e221800af4cf72963a353c452c5437f56a0ac
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Improve OperatorEntry::getKernelForDispatchKey (#64838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64838
The returned pointer, if present, could never be nullptr, so there is no reason to wrap it in an optional rather than just using the nullptr state. The repeated calls to kernels_.at() were not getting optimized away, so just use the perfectly good iterator find() already gave us.
ghstack-source-id: 138304429
Test Plan: CI
Reviewed By: bdhirsh
Differential Revision: D30875748
fbshipit-source-id: 9cbb875715b7a582380c7402155fdbe21944dc85
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
avoid moving Argument in infer_schema (#64822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64822
Turns out the suppressed lint message was trying to tell us something: we can construct our Argument in-place rather than creating a temporary and moving it into the argument vector.
ghstack-source-id: 138304423
Test Plan: CI, profile op registration and observe reduced Argument move ctor and dtor costs
Reviewed By: smessmer
Differential Revision: D30860718
fbshipit-source-id: c8da45ab7e61b5df9fa1273301896309bca108b5
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Fix missing move in Argument ctor (#64821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64821
Not moving adds excess refcounting overhead.
ghstack-source-id: 138304432
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D30860720
fbshipit-source-id: de695e5cdfb1fa314b53a8bcb291343ae4eb87a5
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] shrink Argument (#64820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64820
Putting boolean fields next to each other avoids wasting space for padding.
ghstack-source-id: 138304433
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D30860717
fbshipit-source-id: ad45c37574a7c857958978aad42fd1333c6b29ee
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Compare pointers before calling expensive Type comparison (#64784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64784
See code comment for explanation.
ghstack-source-id: 138304431
Test Plan: Reduced overhead in findSchemaDifferences while profiling registration at startup in a case where I forced duplicates to be registered (by looping in RegisterDispatchKey.cpp).
Reviewed By: dhruvbird
Differential Revision: D30854036
fbshipit-source-id: 568733c3cf449697cdeb74cf57fed0926729fa68
Jane Xu [Fri, 17 Sep 2021 19:32:11 +0000 (12:32 -0700)]
CI: Consolidate Build and Test naming for better stats collection (#65232)
Summary:
All build pytorch steps should now be named "Build" and test steps named "Test" for workflows that test PyTorch on Linux and Windows.
I left the binary stuff alone as that build is different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65232
Reviewed By: driazati, seemethere
Differential Revision: D31024232
Pulled By: janeyx99
fbshipit-source-id: 24b1a1e2b1b25aba70b7adc41603ec8fa4ce7dd6
Rohan Varma [Fri, 17 Sep 2021 19:23:09 +0000 (12:23 -0700)]
Back out "Revert
D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778
Original commit changeset: d3f3fb813c45
ghstack-source-id: 138326910
Test Plan: ci
Reviewed By: H-Huang
Differential Revision: D30849443
fbshipit-source-id: 15dab8a959a29d2e2fefac6ad52b8d8168eacc02
Rohan Varma [Fri, 17 Sep 2021 19:23:09 +0000 (12:23 -0700)]
Back out "Revert
D30745961: [DDP] Remove self.modules_params" (#64777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64777
Original commit changeset: 59f7cc50d369
ghstack-source-id: 138326909
Test Plan: ci
Reviewed By: H-Huang
Differential Revision: D30849442
fbshipit-source-id: bb87ba83935374d8a3ebbc29365df1417dd4f26f
Rohan Varma [Fri, 17 Sep 2021 19:23:09 +0000 (12:23 -0700)]
Back out "Revert
D30745921: [DDP] Fix when buffers are reassigned in module" (#64776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64776
Original commit changeset: 343ead86bf1e
ghstack-source-id: 138326914
Test Plan: ci
Reviewed By: H-Huang
Differential Revision: D30849444
fbshipit-source-id: 9a72805416fe7d6c68e51bdcdb88f6e1fecb614d
Sangbaek Park [Fri, 17 Sep 2021 19:16:10 +0000 (12:16 -0700)]
[xplat][pytorch]: Disabling too many logging. (#65170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65170
Disabling excessive logging. These are per-frame logs
that output lots of noise to the Skylight command line.
Test Plan:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```
Reviewed By: SS-JIA
Differential Revision: D30778852
fbshipit-source-id: bcf75ec417dfe3e9ce3df92a1894352772bd663d
Michael Dagitses [Fri, 17 Sep 2021 19:07:19 +0000 (12:07 -0700)]
delegate parallelism to Ninja when possible (#64733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64733
The previous implementation was wrong when CPU scheduling affinity is
set. In fact, it is still wrong if Ninja is not being used.
When there is CPU scheduling affinity set, the number of processors
available on the system likely exceeds the number of processors that
are usable to the build. We ought to use
`len(os.sched_getaffinity(0))` to determine the effective parallelism.
This change is more minimal and instead just delegates to Ninja (which
handles this correctly) when it is used.
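A short illustration of the distinction the fix relies on (note that `os.sched_getaffinity` is Linux-only):
```
import os

# CPUs this process may actually run on (respects scheduling affinity)...
usable = len(os.sched_getaffinity(0))
# ...versus all CPUs present on the machine.
total = os.cpu_count()
print(f"{usable} usable of {total} total")
```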
Test Plan:
I verified this worked as correctly using Ninja on a 96-core machine
with 24 cores available for scheduling by checking:
* the cmake command did not specify "-j"
* the number of top-level jobs in top/pstree never exceeded 26 (24 + 2)
And I verified we get the legacy behavior by specifying USE_NINJA=0 on
the build.
Reviewed By: jbschlosser, driazati
Differential Revision: D30968796
Pulled By: dagitses
fbshipit-source-id: 29547dd378fea793957bcc2f7d52d5def1ecace2
Michael Dagitses [Fri, 17 Sep 2021 19:07:19 +0000 (12:07 -0700)]
add test for number of jobs when building (#65162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65162
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30998006
Pulled By: dagitses
fbshipit-source-id: 8b8d45668acf0e6c0f16df0f705a1af8c6d4f22d
Jane Xu [Fri, 17 Sep 2021 18:45:11 +0000 (11:45 -0700)]
Remove CUDA 9.2 references conditionals and workarounds (#65070)
Summary:
Title says it all
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65070
Reviewed By: malfet
Differential Revision: D30966464
Pulled By: janeyx99
fbshipit-source-id: e454906fd5d7d321d390939ba5d237e1d9b150f8
edward-io [Fri, 17 Sep 2021 18:34:36 +0000 (11:34 -0700)]
fix torch.distributed.elastic event docs (#64974)
Summary:
the example code wasn't working for me.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64974
Reviewed By: kiukchung, cbalioglu
Differential Revision: D30926481
Pulled By: edward-io
fbshipit-source-id: f5e32cc2b948b6ee30d84a8247856f39fc786f67
Raghavan Raman [Fri, 17 Sep 2021 17:50:43 +0000 (10:50 -0700)]
[nnc] Updated inlining to handle cases when producer indices are constants after eval (#65044)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65044
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30954655
Pulled By: navahgar
fbshipit-source-id: dfaedb5af710b2625ceec3a443a6c4e34158ab16
Raghavan Raman [Fri, 17 Sep 2021 17:50:43 +0000 (10:50 -0700)]
[nnc] Updated inliner to remove assertions and exception (#64719)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64719
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30828583
Pulled By: navahgar
fbshipit-source-id: 9826a59085a210e44d101a843ff2cae440dfd633
Nikita Shulga [Fri, 17 Sep 2021 17:32:26 +0000 (10:32 -0700)]
[ONNX] Do not use `numpy` in ONNX opsets (#65188)
Summary:
Replace `torch.tensor([numpy.arange(a, b, c)])` with `torch.arange(a, b, c).unsqueeze(0)`
Replace `tuple(numpy.add(a, b))` with `tuple(x + y for (x, y) in zip(a, b))`
As `numpy` is an optional dependency, it shouldn't be used in PyTorch core by default
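The replacements, spelled out as a small before/after sketch (values are illustrative):
```
import torch

a, b, c = 0, 10, 2
t = torch.arange(a, b, c).unsqueeze(0)    # was: torch.tensor([numpy.arange(a, b, c)])

u, v = (1, 2), (3, 4)
s = tuple(x + y for x, y in zip(u, v))    # was: tuple(numpy.add(u, v))
```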
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65188
Reviewed By: mruberry
Differential Revision: D31009490
Pulled By: malfet
fbshipit-source-id: 528e48f055bf9ac1de1fd7e94c0be41915df9a0b
Tao Xu [Fri, 17 Sep 2021 17:31:18 +0000 (10:31 -0700)]
[CoreML][OSS] Include Core ML in iOS/MacOS nightlies (#65075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65075
Need to drop one line at - https://github.com/pytorch/builder/blob/master/conda/pytorch-nightly/meta.yaml#L65
ghstack-source-id: 138324213
Test Plan:
- Check the iOS nightly builds
- `pod install LibTorch-Lite-Nightly`
Reviewed By: hanton
Differential Revision: D30912269
fbshipit-source-id: b07679b75ecf89beae2975c37cf17d2449a3304f
Shiyan Deng [Fri, 17 Sep 2021 17:24:11 +0000 (10:24 -0700)]
add a test case for const fold (#65224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65224
Add a test case for the fix D30996277 (https://github.com/pytorch/pytorch/commit/8c38d141df429459ea6891847950ce157ac82b2c).
Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=v100,a100 -c fbcode.enable_gpu_sections=true -j 40 caffe2/test:fx_const_fold -- test_const_fold_module_attr
Reviewed By: jfix71
Differential Revision: D31000386
fbshipit-source-id: f444361839decc583bf93ac946cfe2049376719e
Pavithran Ramachandran [Fri, 17 Sep 2021 17:22:41 +0000 (10:22 -0700)]
[PyTorchEdge] promote prim ops by using ops table for mobile runtime (#64816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64816
## Context:
Promoting prim ops:
Certain prim ops are frequent than others (like tupleIndex, raiseException, ...). These ops are frequent that they are chosen to be promoted as first class instructions. To promote it requires multiple steps and support from TS team as it changes how the bytecode is serialized and deserialized. So to prevent multiple bytecode version bumps and provided stability while these changes happen, an iterim iterative process is proposed which uses a table to lookup for "promoted" op's function. This allows us to rapidly update the ops list and test on production model without having to change the bytecode. In case of failure, we can quickly revert this change.
## Observation
The ops are chosen based on the notebook N1135657 which examines the top frequent ops.
## Fix
An interim solution of having a static table, which, given a prim op name, returns a function to be applied on the stack. This lets `function.cpp` check the table to get the "promoted" op. As a fallback, the "promoted" op still resides in `register_prim_ops.cpp` so that the function of a prim op is never missed.
ghstack-source-id: 138261338
Test Plan:
```
[pavithran@67109.od ~/fbsource/fbcode (eddab7da6)]$ buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite
Building: finished in 5.4 sec (100%) 7284/7284 jobs, 0/7284 updated
Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/480191aa-a1ba-42ca-99e9-ee4bf2b06d65
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 867382eb-327f-43d7-a45c-875b7f484b15
Trace available for this run at /tmp/tpx-20210914-100224.283682/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
✓ ListingSuccess: caffe2/test/cpp/jit:jit - main (12.159)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestCompositeWithSetStates (0.797)
✓ Pass: caffe2/test/cpp/jit:jit - BackendTest.TestComposite (0.779)
Summary
Pass: 2
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/844425134506115
```
{F663491347}
Reviewed By: iseeyuan
Differential Revision: D30819926
fbshipit-source-id: 4cbe05d5761bdc9d62ef08e18172dcf64cb49526
Michael Suo [Fri, 17 Sep 2021 17:21:43 +0000 (10:21 -0700)]
Revert D30993855: [pytorch][PR] OpInfo: nn.functional.conv2d
Test Plan: revert-hammer
Differential Revision: D30993855 (https://github.com/pytorch/pytorch/commit/873255c6d95342d144e9d1b633c16410844b934e)
Original commit changeset: 7402f99addb4
fbshipit-source-id: b0539daa195dc6a3739bce5c264cb2177b7721ff
Tao Xu [Fri, 17 Sep 2021 17:14:40 +0000 (10:14 -0700)]
[CoreML][OSS] Integrate with CMake (#64523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64523
- Build Pytorch with CoreML delegate - ` USE_PYTORCH_METAL=ON python setup.py install --cmake`
- Build iOS static libs - `IOS_PLATFORM=SIMULATOR USE_COREML_DELEGATE=1 ./scripts/build_ios.sh`
ghstack-source-id: 138324216
Test Plan:
- Test the Helloword example
{F657778559}
Reviewed By: iseeyuan
Differential Revision: D30594041
fbshipit-source-id: 8cece0b2d4b3ef82d3ef4da8c1054919148beb16
Yi Wang [Fri, 17 Sep 2021 17:00:13 +0000 (10:00 -0700)]
[Reland] [Model Averaging] Simplify PostLocalSGD Optimizer API (#65197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65197
1. The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs and class type.
2. The parameters are read from local optimizer's param_groups instead of a separate input.
Proposal: https://github.com/pytorch/pytorch/issues/59699
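A hedged sketch of the simplified construction; assumes torch.distributed.init_process_group() has already run, and the class/module names below follow the torch.distributed APIs as of this change:
```
import torch
from torch.distributed.optim import PostLocalSGDOptimizer
from torch.distributed.algorithms.model_averaging.averagers import PeriodicModelAverager

# Sketch only: requires an initialized process group.
model = torch.nn.Linear(8, 8)
local_opt = torch.optim.SGD(model.parameters(), lr=0.1)
opt = PostLocalSGDOptimizer(
    optim=local_opt,  # a constructed optimizer, not (params, class, **defaults)
    averager=PeriodicModelAverager(period=4, warmup_steps=100),
)
```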
ghstack-source-id: 138307226
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision: D31007439
fbshipit-source-id: bbb0526e6763ef76775b85088571506b3942c722
haozhe.zhu [Fri, 17 Sep 2021 16:52:47 +0000 (09:52 -0700)]
Bf16 matmul (#64619)
Summary:
Re-create PR to fix https://github.com/pytorch/pytorch/pull/61891.
Drop the support for addbmm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64619
Reviewed By: jbschlosser
Differential Revision: D30902995
Pulled By: VitalyFedyunin
fbshipit-source-id: dc318d73adff8f6974c9752d0d097e69276f8206
Nicolas Hug [Fri, 17 Sep 2021 16:49:46 +0000 (09:49 -0700)]
Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits (#64362)
Summary:
This PR adds more detailed error messages to torchhub if the commit hash validation goes wrong, providing suggestions to the users on how to resolve the issue.
It also documents why such validation is important.
EDIT: it also avoids validating some stuff when we know "stuff" isn't a commit, since there's no risk in this case
CC malfet mthrok
cc nairbv NicolasHug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64362
Reviewed By: gchanan, malfet
Differential Revision: D30731191
Pulled By: NicolasHug
fbshipit-source-id: d1ee7c2ef2591dd7a5291977af1635ada2552d1b
James Reed [Fri, 17 Sep 2021 16:26:37 +0000 (09:26 -0700)]
[FX] Ensure BC coverage for all of torch.fx.passes (#65081)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65081
Test Plan: Imported from OSS
Reviewed By: jbschlosser, khabinov
Differential Revision: D30967428
Pulled By: jamesr66a
fbshipit-source-id: 2ff83da728dc469f086cf504e71b43396db612d8
James Reed [Fri, 17 Sep 2021 16:26:37 +0000 (09:26 -0700)]
[FX] Move graph_manipulation and param_fetch out of experimental and into passes (#65183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65183
ghstack-source-id: 138309655
Test Plan: waitforsadcastle
Reviewed By: protonu
Differential Revision: D31007630
fbshipit-source-id: 77d14b284737aabbe2b9e6394177a0c2e40aafba
Shiyan Deng [Fri, 17 Sep 2021 16:23:05 +0000 (09:23 -0700)]
[fx2trt] make gpu trace better (#65168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65168
Add record_function to TRTModule and EngineHolder so each part shows up on the GPU trace.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D30997968
fbshipit-source-id: b90662f20a8c0d321846c222f3e8c8eb7e010eba
Tao Xu [Fri, 17 Sep 2021 16:16:39 +0000 (09:16 -0700)]
[CoreML][iOS/MacOS] Add the CoreML executor (#64522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522
The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217
allow-large-files
Test Plan:
iOS:
Run the CoreML tests in the playground app
MacOS:
```
buck test pp-macos
PASS 633ms 1 Passed 0 Skipped 0 Failed CoreMLTests
```
{F657776101}
Reviewed By: raziel, iseeyuan
Differential Revision: D30594042
fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
Elias Ellison [Fri, 17 Sep 2021 15:32:05 +0000 (08:32 -0700)]
Allow extra unused arguments in symbolic shape function (#65095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65095
The reason I didn't do this initially was because I was worried that matching one schema to another schema with an extra argument might change semantics, e.g. Add(Tensor, Tensor) to Add(Tensor, Tensor, Tensor) might be different. However we don't actually need to worry about this because the graph schema isn't used for node matching, unlike symbolic_script.cpp
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30972081
Pulled By: eellison
fbshipit-source-id: d4089e8feafc330df2ca158866fe779a7da0b073
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843
Fix for https://github.com/pytorch/pytorch/issues/63767
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991425
Pulled By: albanD
fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Update fx proxy to use classmethod for __torch_function__ (#64842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64842
Change the `__torch_function__` to follow best guidelines of using classmethods.
I am not sure how to handle the case where multiple tracer objects are given as input, but given that before we were getting an arbitrary tracer based on the "self" that was arbitrarily chosen by the torch_function caller, the new implementation is no worse?
Let me know what you think!
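A minimal sketch of the classmethod form these guidelines recommend (the subclass is illustrative):
```
import torch

class MyTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Classmethod form: dispatch does not depend on any particular instance.
        if kwargs is None:
            kwargs = {}
        return super().__torch_function__(func, types, args, kwargs)
```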
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991423
Pulled By: albanD
fbshipit-source-id: d28940df230b543952b278a0eb2d61cf7ae123ce
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991424
Pulled By: albanD
fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
Howard Huang [Fri, 17 Sep 2021 14:55:01 +0000 (07:55 -0700)]
Fix port allocation race condition for elastic test (#65149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149
Fixes #64789
There is a race condition between when the free port is acquired and when it is used to create the store, during which it may have been taken. Since this test only checks that a timeout is triggered for TCPStore, we can bind to any port on TCPStore creation.
This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity
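The underlying idea, as a small sketch: binding to port 0 makes the OS assign a free port atomically, removing the acquire-then-use window:
```
import socket

s = socket.socket()
s.bind(("localhost", 0))   # OS picks a free port; no separate "find a port" step
port = s.getsockname()[1]
print(port)
```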
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D30993166
Pulled By: H-Huang
fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
Stephen Jia [Fri, 17 Sep 2021 14:48:04 +0000 (07:48 -0700)]
Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171
Add the model comparison binary to BUCK, and also add some quality of life features such as controlling the input range.
Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models
# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```
Reviewed By: beback4u
Differential Revision: D30371322
fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
Edward Yang [Fri, 17 Sep 2021 14:40:59 +0000 (07:40 -0700)]
Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147
I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203
Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and ssh'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`
Reviewed By: soulitzer
Differential Revision: D30992685
fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
Michael Dagitses [Fri, 17 Sep 2021 14:32:32 +0000 (07:32 -0700)]
implement "xy" indexing for torch.meshgrid (#62724)
Summary:
This is step 4/7 of https://github.com/pytorch/pytorch/issues/50276. This allows the use of `"xy"` indexing but doesn't change any defaults.
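A small usage sketch; the shape comment assumes the numpy "xy" convention carries over exactly:
```
import torch

x = torch.arange(3)
y = torch.arange(5)
gx, gy = torch.meshgrid(x, y, indexing="xy")
# With "xy", shapes follow numpy's convention: (len(y), len(x)) == (5, 3),
# whereas the default "ij" would give (3, 5).
print(gx.shape, gy.shape)
```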
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62724
Reviewed By: heitorschueroff
Differential Revision: D30995290
Pulled By: dagitses
fbshipit-source-id: 08a6a6144b20bc019f68bc3c52e3bbf967976d8f
Alban Desmaison [Fri, 17 Sep 2021 13:28:41 +0000 (06:28 -0700)]
Allow parametrization to be nested (#65167)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65163
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65167
Reviewed By: jbschlosser
Differential Revision: D31002318
Pulled By: albanD
fbshipit-source-id: b1f1c6c9efa9e83af9789ed13efc133f777f418e
Nicolas Hug [Fri, 17 Sep 2021 10:27:23 +0000 (03:27 -0700)]
Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760
This should hopefully put the torchhub tests back.
This also stops the torchhub tests from being skipped: currently the tests are skipped whenever they fail, which defeats the purpose of having a test in the first place, since we are never notified when they do fail.
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807
Reviewed By: seemethere
Differential Revision:
D30994585
Pulled By: NicolasHug
fbshipit-source-id:
561782c22462b5cfec99cca153eb59623db5660a
Tao Xu [Fri, 17 Sep 2021 07:19:36 +0000 (00:19 -0700)]
[CoreML][fbcode] Add the `preprocess` python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521
Add the preprocess part for the coreml delegate. Check out `example.py` for usage.
ghstack-source-id:
138324214
Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb] buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
name: "input_0"
type {
multiArrayType {
shape: 1
shape: 3
shape: 224
shape: 224
dataType: FLOAT32
}
}
}
output {
name: "645"
type {
multiArrayType {
dataType: FLOAT32
}
}
}
metadata {
userDefined {
key: "com.github.apple.coremltools.source"
value: "torch==1.10.0a0+fb"
}
userDefined {
key: "com.github.apple.coremltools.version"
value: "4.1"
}
}
{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
%x.1 : Tensor):
%51 : str = prim::Constant[value="Exception: Backend is not available."]()
%50 : str = prim::Constant[value="AssertionError: "]()
%14 : str = prim::Constant[value="forward"]() # <string>:5:62
%48 : Tensor = prim::Uninitialized()
%44 : Tensor = prim::Uninitialized()
%typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
%__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
%8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
%49 : Tensor = prim::If(%8) # <string>:4:16
block0():
%__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
%__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
%15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
%17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
%18 : Any = prim::ListUnpack(%17)
%20 : bool = prim::isinstance[types=[Tensor]](%18)
%39 : Tensor = prim::If(%20) # <string>:6:18
block0():
%22 : Tensor = prim::unchecked_cast(%18)
-> (%22)
block1():
= prim::RaiseException(%50) # <string>:6:18
-> (%44)
-> (%39)
block1():
= prim::RaiseException(%51) # <string>:9:18
-> (%48)
return (%49)
```
Reviewed By: raziel
Differential Revision:
D30585154
fbshipit-source-id:
66c7d2e931be6eaa3c43a0ee131ea8046452449d
Don Jang [Fri, 17 Sep 2021 05:32:23 +0000 (22:32 -0700)]
[Static Runtime] Introduce static_runtime::dict_unpack (#64771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64771
Test Plan:
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithImmutableInputDict`
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithMutableInputDict`
- TBD: Perf impact measurement
Reviewed By: mikeiovine
Differential Revision:
D30685083
fbshipit-source-id:
050a92ef3b3ed0fdc0ab7a13a4b5dbfede9342a9
BowenBao [Fri, 17 Sep 2021 04:39:10 +0000 (21:39 -0700)]
[ONNX] Update submodule to 1.10.1 (#63716) (#64576)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/64576 [ONNX] Update submodule to 1.10.1 (https://github.com/pytorch/pytorch/issues/63716)**
* [ONNX] Update IR version to 7
* [ONNX] update submodule to 1.10.1
* Disable some tests in caffe2 that fail because caffe2 doesn't support the new ops.
* Update Bazel file.
* Update expect files for new ONNX IR version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64576
Reviewed By: jansel
Differential Revision:
D31006896
Pulled By: msaroufim
fbshipit-source-id:
f3bf97709f23a5a2cd49c708e7363231f2c1961a
James Reed [Fri, 17 Sep 2021 03:31:03 +0000 (20:31 -0700)]
[FX] Add torch.ops.profiler._record_function_{enter,exit} as stateful ops for DCE (#65180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65180
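For context, a sketch of why these ops must be marked stateful: FX dead code elimination removes nodes whose results are unused unless a node is known to have side effects, and the profiler enter/exit calls produce no value that the graph consumes:
```
import torch
import torch.fx as fx

def f(x):
    y = x + 1  # unused result: a pure op like this gets eliminated
    return x * 2

gm = fx.symbolic_trace(f)
gm.graph.eliminate_dead_code()  # keeps nodes flagged as side-effectful;
gm.recompile()                  # this change adds that flag for
print(gm.code)                  # _record_function_{enter,exit}
```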
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision:
D31007115
Pulled By: jamesr66a
fbshipit-source-id:
823b15db712a382a4f2a4fd409983d47bc067150
Zafar Takhirov [Fri, 17 Sep 2021 03:29:05 +0000 (20:29 -0700)]
[quant] AO migration of the `torch/quantization/utils.py` (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
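At call sites the change is just the import path; a sketch, assuming the old module is left behind as an import shim during the transition, as in the other phase-1 migrations:
```
# before: the historical location
from torch.quantization import utils as quant_utils
# after: the new canonical location under the ao namespace
from torch.ao.quantization import utils as quant_utils
```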
ghstack-source-id:
138303325
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: jerryzh168
Differential Revision:
D30899082
fbshipit-source-id:
85eb38c419e417147e71758b682cd095308dd0c9
Jordan Fix [Fri, 17 Sep 2021 02:55:46 +0000 (19:55 -0700)]
[acc_utils] Add print_model_info (#65045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65045
This is a useful tool for printing out all of the ops that are found in a model after acc_tracer. It assumes the provided model has no `call_module` or `call_method` nodes, which is generally reasonable for a model that has been successfully traced by the acc_tracer.
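A hypothetical usage sketch; the module path and signature are assumed from the op names in the sample output below, not confirmed:
```
# hypothetical: exact module path and trace() signature are assumed
from torch.fx.experimental.fx_acc import acc_tracer, acc_utils

traced = acc_tracer.trace(model, sample_inputs)  # an acc_tracer-traced module
acc_utils.print_model_info(traced)
```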
Test Plan:
Tested locally. Sample output:
```
Model Info:
> placeholder: 1184
> get_attr: 655
> output: 2
> torch.fx.experimental.fx_acc.acc_ops.add: 2
> torch.fx.experimental.fx_acc.acc_ops.cat: 23
> torch.fx.experimental.fx_acc.acc_ops.embedding_bag: 576
> torch.fx.experimental.fx_acc.acc_ops.layer_norm: 15
> torch.fx.experimental.fx_acc.acc_ops.linear: 27
> torch.fx.experimental.fx_acc.acc_ops.matmul: 3
> torch.fx.experimental.fx_acc.acc_ops.mul: 17
> torch.fx.experimental.fx_acc.acc_ops.permute: 2
> torch.fx.experimental.fx_acc.acc_ops.reshape: 419
> torch.fx.experimental.fx_acc.acc_ops.sigmoid: 16
> torch.fx.experimental.fx_acc.acc_ops.slice_tensor: 630
> torch.fx.experimental.fx_acc.acc_ops.sum: 4
> torch.fx.experimental.fx_acc.acc_ops.tanh: 315
```
Reviewed By: 842974287
Differential Revision:
D30954829
fbshipit-source-id:
5c4f0770667b72859b74099d9f4575284fc48bd2
Yinghai Lu [Fri, 17 Sep 2021 02:26:36 +0000 (19:26 -0700)]
Add back the owning_module fix (#65159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65159
This was a legitimate fix originally introduced in D30905949 (https://github.com/pytorch/pytorch/commit/446d95a7f64cb464d28d27c4c87c48900a9fde79). But we hesitated and removed it for some reason. Putting it back.
Reviewed By: 842974287
Differential Revision:
D30996277
fbshipit-source-id:
3f5eede11dba2072e7cd5ae6ca7ac81d55fb75fa
Rui Zhu [Fri, 17 Sep 2021 01:08:00 +0000 (18:08 -0700)]
Add dropout shape inference as no-op in acc_tracer (#65113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65113
Register dropout as a no-op in acc_tracer and add shape inference for no-ops.
Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- test_unary_15_dropout_no_op
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_dropout
Reviewed By: jfix71
Differential Revision:
D30880679
fbshipit-source-id:
592fe50e17137c94c12727658191dedf08daf8cf
Nikita Shulga [Fri, 17 Sep 2021 00:36:14 +0000 (17:36 -0700)]
Pin SciPy to 1.6.2 on Windows (#65017)
Summary:
Re-enable previously disabled test_distributions
Note: conda does not have SciPy-1.6.3, only 1.6.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65017
Reviewed By: seemethere
Differential Revision:
D31003199
Pulled By: malfet
fbshipit-source-id:
96b9d2a833f703008bb1f4df9361db8ec6f8ccc6