Wook Song [Wed, 28 Jun 2023 12:03:23 +0000 (21:03 +0900)]
Dist/Tizen: Revise packaging to fix build breaks on gcc 13
To fix build breaks on Tizen gcc-13, this patch revises the files in the
packaging directory as follows:
- Add an additional local patch file
- Update third-party external packages such as breakpad and kineto
- Append -Wno-unused-function -Wno-array-parameter -Wno-missing-braces -Wno-nonnull to the default C/CXXFLAGS
Change-Id: Ifdb4acd855b65164768edc32e21a761ff69e6cab
Signed-off-by: Wook Song <wook16.song@samsung.com>
Sangjung Woo [Fri, 9 Dec 2022 04:45:47 +0000 (13:45 +0900)]
[Packaging] Update Protobuf with v3.20.3
This patch updates the protobuf package with v3.20.3.
Change-Id: Ice58247829f689a6dc740cb39adb601f6bc87433
Signed-off-by: Sangjung Woo <sangjung.woo@samsung.com>
Yongjoo Ahn [Wed, 26 Jan 2022 05:42:47 +0000 (14:42 +0900)]
[packaging/Tizen] Package PyTorch v1.10.2 for Tizen
- Add spec file to package the project
- Add a python script `typing_extensions.py` which is used at build time
Change-Id: I9568eb83962da1cb434121fbe4980801868ff0a0
Signed-off-by: Yongjoo Ahn <yongjoo1.ahn@samsung.com>
Yongjoo Ahn [Wed, 26 Jan 2022 05:21:56 +0000 (14:21 +0900)]
[packaging] Import external sources
- Import external sources used to build PyTorch
Change-Id: Id42cefb98e2408f2cf3a79bc9939a37e8c97ab4e
Signed-off-by: Yongjoo Ahn <yongjoo1.ahn@samsung.com>
Nikita Shulga [Tue, 14 Dec 2021 17:24:18 +0000 (09:24 -0800)]
fix formatting CIRCLE_TAG when building docs (#67026) (#69876)
Summary:
Similar to pytorch/text#1416
malfet, brianjo
The previous code failed when tags changed from `v0.9.0` to `v0.10.0`. I tested this offline; it would be nice to somehow actually tag the repo and see that this adds the correct documentation directory to the pytorch/pytorch.github.io repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67026
Reviewed By: saketh-are
Differential Revision: D31843381
Pulled By: malfet
fbshipit-source-id: 21526ad9ed4c1751c2d7f6d621da305f166a7f55
Co-authored-by: mattip <matti.picus@gmail.com>
Eli Uriegas [Fri, 10 Dec 2021 19:42:03 +0000 (11:42 -0800)]
[release/1.10] Remove fgrad_input from slow_conv2d (#64280) (#69622)
Co-authored-by: Peter Bell <peterbell10@live.co.uk>
Eli Uriegas [Fri, 10 Dec 2021 19:41:40 +0000 (11:41 -0800)]
[release/1.10] fix pybind issue for get_autocast_cpu_dtype and get_autocast_gpu_dtype (#66396) (#69620)
Co-authored-by: XiaobingSuper <xiaobing.zhang@intel.com>
Eli Uriegas [Thu, 9 Dec 2021 16:59:45 +0000 (08:59 -0800)]
[release/1.10] Fix adaptive_max_pool2d for channels-last on CUDA (#67697) (#69618)
Co-authored-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
Eli Uriegas [Thu, 9 Dec 2021 15:09:08 +0000 (07:09 -0800)]
[release/1.10] TST Adds test for non-contiguous tensors (#64954) (#69617)
* TST Adds test for non-contiguous tensors (#64954)
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935
This PR:
1. Adds test for non-contiguous tensors
2. Fixes a bug in `NLLLoss` that was caught by the test.
The reason this was not caught in `common_nn` is that `CriterionTest` overrides `test_cuda` but does not call `test_nonconfig`.
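For illustration, a minimal sketch (not the test itself) of the kind of non-contiguous input the new test exercises; the shapes here are made up:
```
import torch
import torch.nn.functional as F

# A non-contiguous (4, 3) view, treated as log-probabilities for illustration.
inp = torch.randn(4, 3, 2)[..., 0]
assert not inp.is_contiguous()
target = torch.randint(0, 3, (4,))
print(F.nll_loss(inp, target))
```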
cc albanD mruberry jbschlosser walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64954
Reviewed By: zou3519
Differential Revision: D31174149
Pulled By: jbschlosser
fbshipit-source-id: a16073e59b40ccc01c82ede016b63a8db2e810f5
(cherry picked from commit 0d3bf97fd05ce6ef5ddfb0a100c78ad82914cee4)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
* Cherry-pick changes from #64444
Namely, `make_weight` partial into `module_inputs_torch_nn_NLLLoss`
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Nikita Shulga [Thu, 9 Dec 2021 04:09:11 +0000 (20:09 -0800)]
[ONNX] Update onnxruntime to 1.9 for CI (#65029) (#67269) (#69641)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67269
Test Plan: Imported from OSS
Reviewed By: ngimel, msaroufim
Differential Revision: D31962516
Pulled By: malfet
fbshipit-source-id: 39b3c6a4a05d7b769f0ef5ce7ea597209516cde2
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Eli Uriegas [Wed, 8 Dec 2021 23:01:18 +0000 (15:01 -0800)]
Fix strict aliasing rule violation in bitwise_binary_op (#66194) (#69619)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66119
Failure on ARM Neoverse N1 before this PR:
```
======================================================================
FAIL: test_bitwise_ops_cpu_int16 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
result = test(self, **param_kwargs)
File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
self.assertEqual(op(a, b), op(a_np, b_np))
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
self.assertEqual(
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 176 different element(s) (out of 225), with the greatest difference of 21850 (-21846 vs. 4) occuring at index (0, 2).
======================================================================
FAIL: test_bitwise_ops_cpu_int32 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
result = test(self, **param_kwargs)
File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
self.assertEqual(op(a, b), op(a_np, b_np))
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
self.assertEqual(
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 188 different element(s) (out of 225), with the greatest difference of 1335341061 (-1335341056 vs. 5) occuring at index (14, 8).
----------------------------------------------------------------------
```
which passes now.
CC malfet ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66194
Reviewed By: dagitses, bdhirsh, ngimel
Differential Revision: D31430274
Pulled By: malfet
fbshipit-source-id: bcf1c9d584c02eff328dd5b1f7af064fac5942c9
(cherry picked from commit 0b0674121aeb7d8bbcccd0461d939b64879a1273)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Co-authored-by: pbialecki <pbialecki@nvidia.com>
Nikita Shulga [Wed, 8 Dec 2021 22:59:16 +0000 (14:59 -0800)]
[LiteInterpreter] Specify `Loader` to `yaml.load` (#67694) (#69642)
Summary:
The `Loader` argument became mandatory in PyYAML 6, but has been accepted since PyYAML 3.
This unblocks migration to a newer runtime.
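For reference, a minimal sketch of a call that works on both sides of the PyYAML 6 boundary (SafeLoader is used here for illustration; the loader chosen in the PR may differ):
```
import yaml

# Passing Loader explicitly satisfies PyYAML 6 and is accepted back to PyYAML 3.
data = yaml.load("a: 1", Loader=yaml.SafeLoader)
print(data)  # {'a': 1}
```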
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67694
Reviewed By: seemethere
Differential Revision: D32106043
Pulled By: malfet
fbshipit-source-id: 35246b97a974b168c066396ea31987b267534c7f
Nikita Shulga [Wed, 8 Dec 2021 22:51:06 +0000 (14:51 -0800)]
Fix python version in test tools CI job (#66947) (#69643)
Summary:
On the HUD, the test tools job is failing because the runners now install Python 3.10, which is not compatible with numpy 1.20.
See the Install dependencies step of https://github.com/pytorch/pytorch/runs/3952169950?check_suite_focus=true:
```
ERROR: Command errored out with exit status 1:
command: /opt/hostedtoolcache/Python/3.10.0/x64/bin/python /opt/hostedtoolcache/Python/3.10.0/x64/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptq8aay7m
cwd: /tmp/pip-install-dk_6t98q/numpy_e9431bf106b746148c0e7c36e46551b4
Complete output (1169 lines):
setup.py:66: RuntimeWarning: NumPy 1.20.0 may not yet support Python 3.10.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66947
Reviewed By: suo, malfet
Differential Revision: D31799205
Pulled By: janeyx99
fbshipit-source-id: 64bf10c37c0aa4f5837c48e92d56e81d920722bd
Co-authored-by: Jane Xu <janeyx@fb.com>
kiukchung [Fri, 15 Oct 2021 01:35:23 +0000 (18:35 -0700)]
(torch/elastic) add fqdn hostname to error printout (#66182) (#66662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182
closes https://github.com/pytorch/pytorch/issues/63174
Does a few things:
1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border at most 60 chars long and left-justifies the header
NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because Python exception propagation does NOT work out of the box for IPC, hence the extra record annotation).
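A minimal sketch of the required annotation (the script body and error are made up):
```
from torch.distributed.elastic.multiprocessing.errors import record

@record  # without this, the traceback never reaches the error file
def main():
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```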
Test Plan:
Sample
```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2021-10-05_17:37:22
host : devvm4955.prn0.facebook.com
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 3296201)
error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
traceback :
Traceback (most recent call last):
File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
return f(*args, **kwargs)
File "main.py", line 28, in main
raise RuntimeError(args.throws)
RuntimeError: foobar
============================================================
```
Reviewed By: cbalioglu, aivanou
Differential Revision: D31416492
fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
Nikita Shulga [Fri, 15 Oct 2021 01:34:13 +0000 (18:34 -0700)]
Handle shared memory cases in MathBitFallback (#66667)
* Handle shared memory cases in MathBitFallback (#63602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602
This PR fixes the case where a read and a write are performed on memory shared between mutable and (or) non-mutable arguments. Example:
```
a=torch.tensor([1+1j])
b=a.conj()
b.add_(a) # should return tensor([2]) but returns tensor([2-2j])
```
The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. first scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of its entries. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.
NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.
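A minimal sketch of the behavior this fixes (a behavior demo, not the fallback implementation itself):
```
import torch

a = torch.tensor([1 + 1j])
b = a.conj()   # conjugate view; shares storage with `a`
b.add_(a)      # read of `a` and write to `b` hit the same memory
print(b)       # with the fix: tensor([2.+0.j]), not tensor([2.-2.j])
```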
Fixes https://github.com/pytorch/pytorch/issues/59943
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D30466905
Pulled By: anjali411
fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b
* fix lint (#66572)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/66572
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D31624043
Pulled By: suo
fbshipit-source-id: 9db9cee3140d78c2a2f0c937be84755206fee1dd
Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
Co-authored-by: Michael Suo <suo@fb.com>
anjali411 [Thu, 14 Oct 2021 23:00:56 +0000 (19:00 -0400)]
Disable .numpy() and .tolist() for tensor subclasses and f… (#66642)
* Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082
Fixes https://github.com/pytorch/pytorch/issues/66024 #65779
cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD
Test Plan: Imported from OSS
Reviewed By: Gamrix, albanD
Differential Revision: D31615588
Pulled By: anjali411
fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
* Apply suggestions from code review
Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Nikita Shulga [Thu, 14 Oct 2021 22:57:16 +0000 (15:57 -0700)]
Delete extraneous whitespaces
anjali411 [Thu, 14 Oct 2021 20:16:03 +0000 (16:16 -0400)]
Disable .numpy() and .tolist() for tensor subclasses and fix .tolist() for conjugated and negated tensors (#66082) (#66576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66082
Fixes https://github.com/pytorch/pytorch/issues/66024 #65779
cc ezyang anjali411 dylanbespalko mruberry Lezcano nikitaved albanD
Test Plan: Imported from OSS
Reviewed By: Gamrix, albanD
Differential Revision: D31615588
Pulled By: anjali411
fbshipit-source-id: c3e65ef0fe301630eb76732ccd7819683c09aa19
Nikita Shulga [Thu, 14 Oct 2021 16:46:41 +0000 (09:46 -0700)]
Call `PyArray_Check` only if NumPy is available (#66433) (#66629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66433
Reviewed By: seemethere, janeyx99
Differential Revision: D31548290
Pulled By: malfet
fbshipit-source-id: 3b094bc8195d0392338e0bdc6df2f39587b85bb3
Natalia Gimelshein [Thu, 14 Oct 2021 16:42:41 +0000 (09:42 -0700)]
fix normal with empty std (#66524)
Natalia Gimelshein [Fri, 8 Oct 2021 14:22:40 +0000 (07:22 -0700)]
Fix cosine similarity dim checks (#66214)
* fix cosine similarity dimensionality check
* fix shapes in the doc
Gary Miguel [Fri, 8 Oct 2021 14:21:29 +0000 (07:21 -0700)]
[ONNX] Deprecate various args (#65962)
* [ONNX] Remove argument _retain_param_name from torch.onnx.export() function. (#61702) (#64370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370
As of now, the "_retain_param_name" parameter has no description on the PyTorch docs website. According to the code, this argument determines whether we keep the original parameter names of the PyTorch model in the final ONNX graph. If it is False, those original parameter names are replaced with a series of integers starting from 1.
Since setting numbers as parameter names makes no sense to users, we remove this argument from the torch.onnx.export() function to improve the user experience of calling this function.
This PR still keeps the argument in torch.onnx.export() for backward compatibility, while all backend logic has been changed to behave as if _retain_param_name were set to True.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905270
Pulled By: malfet
fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
* [ONNX] Remove strip_doc_string param from torch.onnx.export() function. (#61712) (#64371)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371
As of now, the "strip_doc_string" parameter was described as below:
strip_doc_string (bool, default True): do not include the field `doc_string` from the exported model. Otherwise the field will mention the source code locations for `model`.
This is usually useless to users who want to convert a PyTorch model to an ONNX one. Only when the user wants to debug the export process could these source code locations provide benefits.
To make the export() function friendlier by providing fewer parameters, we combined "strip_doc_string" into the "verbose" parameter. If a user sets verbose to True, it means they need some log information for debugging the export process, which is similar to the purpose of the strip_doc_string parameter.
But the usage of these two arguments is opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with the verbose argument in this PR.
This PR still keeps strip_doc_string in torch.onnx.export() for backward compatibility, while its behavior has been folded into the verbose argument.
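A minimal sketch of the combined behavior on a toy model (the model and file name here are illustrative):
```
import torch

model = torch.nn.Linear(4, 2)
dummy = torch.randn(1, 4)
# verbose=True now also keeps doc_string source locations
# (the old strip_doc_string=False); the default strips them.
torch.onnx.export(model, dummy, "linear.onnx", verbose=True)
```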
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905268
Pulled By: malfet
fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
* [ONNX] minor doc improvements and cleanup (#62514) (#64373)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373
* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
* Add documentation for previously undocumented args.
* Document that `f` arg is ignored and mark it deprecated.
* Update tests to stop setting `f`.
* Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905271
Pulled By: malfet
fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3
* [ONNX] Deprecated the example_outputs param from torch.onnx.export() function. (#62815) (#64380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380
* `example_outputs` used to determine the type and shape of the outputs without tracing the execution of the model, and it had to be provided when exporting a ScriptModule or ScriptFunction with the export() function.
* Since we can work out `example_outputs` in an internal function instead of having it provided by the user, we deprecated this argument in the export() function to improve the user experience of calling it.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905266
Pulled By: malfet
fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2
Co-authored-by: hwangdeyu <dejack953@outlook.com>
* [ONNX] Deprecate use_external_data_format param from torch.onnx.export() function. (#62257) (#64382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382
* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.
* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.
* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user: if the size is larger than 2GB, then `use_external_data_format = True` automatically.
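Illustratively, the size check amounts to something like the following sketch (the helper name and constant are hypothetical, not the actual export code):
```
# Hypothetical sketch: pick external-data format from proto size, not a user flag.
PROTO_SIZE_LIMIT = 2 * 1024 ** 3  # protobuf's 2GB ceiling

def should_use_external_data_format(serialized_model: bytes) -> bool:
    return len(serialized_model) > PROTO_SIZE_LIMIT
```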
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905265
Pulled By: malfet
fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb
Co-authored-by: hwangdeyu <dejack953@outlook.com>
* fix clang-tidy error introduced by #64382 (#65977)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65977
Reviewed By: ngimel
Differential Revision: D31423174
Pulled By: malfet
fbshipit-source-id: 0ea560b9a6ddd6431f70bd3ac10ace68e26ab352
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Erjia Guan [Fri, 8 Oct 2021 14:20:03 +0000 (10:20 -0400)]
Convert Sampler back to lazily construction (#63646) (#65926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63646
Fixes #63609
Test Plan: Imported from OSS
Reviewed By: NivekT
Differential Revision: D30451774
Pulled By: ejguan
fbshipit-source-id: 550d77494326446d1a42b5da0559e0d384c47413
Prabhat Roy [Fri, 8 Oct 2021 14:17:47 +0000 (15:17 +0100)]
Revert "Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)" (#66308)
This reverts commit 5f1a434599b46afd99607839d15892e09269a1c4.
Prabhat Roy [Wed, 6 Oct 2021 18:13:31 +0000 (19:13 +0100)]
Added option to update parameters using state_dict in AveragedModel (#65495) (#65755)
* Added option to update parameters using state_dict in AveragedModel (#65495)
Summary:
While implementing [EMA](https://github.com/pytorch/vision/pull/4381) (which extends AveragedModel) in torchvision, update_parameters() from AveragedModel could not be used as it did not handle state_dict(), so a custom update_parameters() needed to be defined in the [EMA class](https://github.com/pytorch/vision/pull/4406). This PR aims to handle this scenario, removing the need for that custom update_parameters() implementation.
Discussion: https://github.com/pytorch/vision/pull/4406#pullrequestreview-753734102
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65495
Reviewed By: datumbox
Differential Revision: D31176742
Pulled By: prabhat00155
fbshipit-source-id: 326d14876018f21cf602bab5eaba344678dbabe2
(cherry picked from commit 2ea724b1fd543304e3be7bd223cac451cd093e16)
* Added validation of mode parameter in AveragedModel (#65921)
Summary:
Discussion: https://github.com/pytorch/pytorch/pull/65495#issuecomment-930460469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65921
Reviewed By: albanD
Differential Revision: D31310105
Pulled By: prabhat00155
fbshipit-source-id: 417691832a7c793744830c11e0ce53e3972d21a3
(cherry picked from commit c7748fc172553da66368fd0b7fea3fe5661e2dc1)
Nikita Shulga [Wed, 6 Oct 2021 15:34:46 +0000 (08:34 -0700)]
Tweak `file_diff_from_base` for release/1.10 branch (#66202)
Kevin Tse [Wed, 6 Oct 2021 03:54:40 +0000 (23:54 -0400)]
[DataPipe] DataPipe Fix and Deprecation Warnings for Release 1.10 (#65932)
* Unify the output pathname of archive reader and extractor (#65424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65424
This PR is a re-implementation of https://github.com/facebookexternal/torchdata/pull/93
The same PR has landed in torchdata: https://github.com/facebookexternal/torchdata/pull/157
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D31090447
Pulled By: ejguan
fbshipit-source-id: 45af1ad9b24310bebfd6e010f41cff398946ba65
* [DataPipe] add deprecation warnings for DataPipes that will solely exist in TorchData (#65827)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65827
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31272794
Pulled By: NivekT
fbshipit-source-id: 8da8266184b4df050422904cbc5fca6d7c3d2e02
* [DataPipe] Fixes an issue where TarArchiveReader closes stream when read into a buffer (#65877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65877
Fixes #65808
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31296041
Pulled By: NivekT
fbshipit-source-id: cdcad3a333ae9781d6063678a122a128955b0ff4
Co-authored-by: Erjia Guan <erjia@fb.com>
Nikita Shulga [Wed, 6 Oct 2021 03:12:40 +0000 (20:12 -0700)]
[iOS][CI] Update dev certs (#66004) (#66188)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66004
Reviewed By: xta0
Differential Revision: D31340893
Pulled By: malfet
fbshipit-source-id: 3bf0be266e9686a73d62e86c5cf0bebeb0416260
Co-authored-by: Tao Xu <taox@fb.com>
Nikita Shulga [Wed, 6 Oct 2021 03:12:13 +0000 (20:12 -0700)]
Fix backward compatibility tests (#66186)
Compare operator list against RC1 build rather than against nightly
Nikita Shulga [Tue, 5 Oct 2021 19:03:27 +0000 (12:03 -0700)]
Fix Windows ninja builds when MAX_JOBS is specified (#65444) (#66155)
Summary:
Reported by cloudhan in https://github.com/pytorch/pytorch/pull/64733#issuecomment-924545463
Fixes regression introduced by https://github.com/pytorch/pytorch/commit/047e68235f8ebf8dc9fd816829ba90561d423ff9
cc malfet seemethere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65444
Reviewed By: dagitses, seemethere
Differential Revision: D31103260
Pulled By: malfet
fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
n-v-k [Tue, 5 Oct 2021 19:02:51 +0000 (22:02 +0300)]
Binary building without python fix (#66031) (#66117)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66031
Reviewed By: VitalyFedyunin
Differential Revision: D31356243
Pulled By: malfet
fbshipit-source-id: d1537bc65bbba5d6497ecb8db7160a397eca81fd
Nikita Shulga [Fri, 1 Oct 2021 01:55:44 +0000 (18:55 -0700)]
[ci] try installing libgnutls to fix cert error (#65934) (#65979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65934
see https://github.com/pytorch/pytorch/issues/65931; this was a suggested remediation on the linked issue
Test Plan: Imported from OSS
Reviewed By: malfet, zhouzhuojie
Differential Revision: D31313040
Pulled By: suo
fbshipit-source-id: a9e2b82a1e879962af768ed3049c73ab77394738
Co-authored-by: Michael Suo <suo@fb.com>
Erjia Guan [Fri, 1 Oct 2021 01:36:49 +0000 (21:36 -0400)]
[DataPipe] Fix deepcopy filehandle for Mapper and in-place modification for IterableWrapper (#65220) (#65924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220
Fixes #65221
- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy the iterable instance within each iterator to prevent in-place modification (different data per epoch); see the sketch after this list
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (avoid using `dp`, which is a module reference)
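A minimal sketch of the `IterableWrapper` change (import path assumed as of torch 1.10):
```
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[1], [2]])
for x in dp:
    x.append(0)      # in-place modification during the first epoch
print(list(dp))      # second epoch still sees [[1], [2]]: each iterator deepcopies
```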
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31021886
Pulled By: ejguan
fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
Nikita Shulga [Wed, 29 Sep 2021 21:38:54 +0000 (14:38 -0700)]
Fix the slowdown of _object_to_tensor since 1.9 (#65721) (#65835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65721
#Closes: https://github.com/pytorch/pytorch/issues/65696
The bug was introduced in https://github.com/pytorch/pytorch/pull/55861, and it causes a 100X slowdown since 1.9.
ghstack-source-id: 139128267
Test Plan:
Performance test:
```
import time
from torch.distributed.distributed_c10d import _object_to_tensor
start = time.time()
_object_to_tensor("x" * 50_000_000)
print("Time:", time.time() - start)
```
Reviewed By: rohan-varma
Differential Revision: D31219794
fbshipit-source-id: 1abec38f9d51361c1eab6ad5efd87b589322e208
Co-authored-by: Yi Wang <wayi@fb.com>
Zhuojie Zhou [Tue, 28 Sep 2021 22:48:32 +0000 (15:48 -0700)]
Fix test reporting git merge-base (#65787)
Richard Zou [Fri, 24 Sep 2021 17:29:08 +0000 (13:29 -0400)]
[1.10] Remove torch.vmap (#65496)
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.
Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
Nikita Shulga [Tue, 21 Sep 2021 23:16:22 +0000 (16:16 -0700)]
[release/1.10] Pin builder and xla repo (#65433)
Pin builder to https://github.com/pytorch/builder/commits/release/1.10
Pin xla to https://github.com/pytorch/xla/tree/r1.10
Natalia Gimelshein [Tue, 21 Sep 2021 17:26:19 +0000 (10:26 -0700)]
THCTensor cleanup (#65369)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65369
Reviewed By: bhosmer
Differential Revision: D31071406
Pulled By: ngimel
fbshipit-source-id: bbc3f2781003333641524aeb692b944fd3ad8d7a
Xing Liu [Tue, 21 Sep 2021 16:38:04 +0000 (09:38 -0700)]
[PT/ShardedTensor]Allow zero size local shard (#65007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65007
Relax shard size check in ShardMetadata to allow zero size local shard.
When sharding a tensor on N ranks, some ranks may have empty shard allocated. As we are assuming SPMD, the ranks w/ empty shard still need to participate in all collectives, and we need to allow this in ShardMetadata.
Test Plan: Unit tests and CLI
Reviewed By: jiaqizhai, wanchaol
Differential Revision: D30926566
fbshipit-source-id: afa562c94ffa8f8d91d65ddb4c348156d871dc36
kshitij12345 [Tue, 21 Sep 2021 15:49:28 +0000 (08:49 -0700)]
OpInfo: nn.functional.conv2d (#65233)
Summary:
Reland : https://github.com/pytorch/pytorch/issues/63517
Reference: https://github.com/pytorch/pytorch/issues/54261
Reference: facebookresearch/functorch#78
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65233
Reviewed By: malfet
Differential Revision: D31025538
Pulled By: zou3519
fbshipit-source-id: b1cd38c22f4cb8eedd3f958e02dd7410dcbb8d8d
Mike Iovine [Tue, 21 Sep 2021 14:24:02 +0000 (07:24 -0700)]
[JIT] Re-land "Add aten::slice optimization" (#65341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65341
The changes in D30231044 (https://github.com/pytorch/pytorch/commit/babd4499783abc699faf36f3a72a9fc491e0e572) were removed due to a downstream issue in glow. Now that the issue has been fixed by D30849396, we can safely re-introduce the changes.
Test Plan:
`buck test //caffe2/test:jit -- TestPeephole`
Glow test:
* `buck test //glow/fb/torch_glow/tests:unfuse_glow_ops_test`
* qxy11 confirmed that the problematic glow model now loads correctly with these changes
Reviewed By: eellison
Differential Revision: D31056878
fbshipit-source-id: 049903ee04ba88885cc9d1a91427af0f1f44f681
kshitij12345 [Tue, 21 Sep 2021 14:21:15 +0000 (07:21 -0700)]
[nn] TripletMarginLoss and PairwiseDistance : no batch dim (#64882)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/60585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64882
Reviewed By: malfet
Differential Revision: D31055577
Pulled By: jbschlosser
fbshipit-source-id: 2f0a5a08619b672026b48a78bc7d83a6dccba0bf
Teng Gao [Tue, 21 Sep 2021 13:38:37 +0000 (06:38 -0700)]
correlate forward and backward op (#62553)
Summary:
Use startThreadId+seqNumber of the forward op and fwdThreadId+seqNumber of the backward op to correlate pairs of them.
third_party/kineto should be updated accordingly: https://github.com/pytorch/kineto/pull/372
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62553
Reviewed By: malfet
Differential Revision: D30125728
Pulled By: gdankel
fbshipit-source-id: 9877a54392ba043d0eac56ce5b7bbf244277fa7e
Rodrigo Berriel [Tue, 21 Sep 2021 12:59:47 +0000 (05:59 -0700)]
[docs] Remove .data from some docs (#65358)
Summary:
Related to https://github.com/pytorch/pytorch/issues/30987. Fix the following task:
- [ ] Remove the use of `.data` in all our internal code:
- [ ] ...
- [x] `docs/source/scripts/build_activation_images.py` and `docs/source/notes/extending.rst`
In `docs/source/scripts/build_activation_images.py`, I used `nn.init` because the snippet already assumes `nn` is available (the class inherits from `nn.Module`).
cc albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65358
Reviewed By: malfet
Differential Revision: D31061790
Pulled By: albanD
fbshipit-source-id: be936c2035f0bdd49986351026fe3e932a5b4032
Benjamin Rowell [Tue, 21 Sep 2021 12:59:16 +0000 (05:59 -0700)]
Adds keyword only args to gradcheck (#65290)
Summary:
Changes the call signature of gradcheck so that its kwargs are keyword-only.
Also modifies the return call from gradgradcheck to reflect these changes.
Fixes https://github.com/pytorch/pytorch/issues/65165
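A minimal sketch of the new calling convention:
```
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)
gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4)  # options passed by keyword: OK
# gradcheck(torch.sin, (x,), 1e-6)               # positional: TypeError after this change
```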
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65290
Reviewed By: soulitzer
Differential Revision: D31061316
Pulled By: albanD
fbshipit-source-id: 3505569a33a497a8be4347bdd425bb2b8e536999
Chen Lai [Tue, 21 Sep 2021 05:22:17 +0000 (22:22 -0700)]
[PyTorch Edge] Backport function for defaults args with out args, flag on (#63651)
Summary:
1. Enable support for operators with default args and out args. For `torch.add(x, h, out=x)`, the number of specified arguments will be 3 instead of 4.
2. Bump bytecode version from 6 to 7
3. Implement the backport_v7_to_v6 function. Also slightly refactor the local_thread to allow re-emitting operators.
4. Add a unit test to cover the backport function.
5. Update expect result from 4 to 3 in unit test DefaultArgsWithOutArg to cover the number of specified arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63651
ghstack-source-id: 138539912
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.BackPortByteCodeModelAllVersions
```
Reviewed By: raziel, tugsbayasgalan
Differential Revision: D30454080
fbshipit-source-id: 357c50b96682430675142d20d688d1f64e1de307
Pavel Belevich [Tue, 21 Sep 2021 04:52:34 +0000 (21:52 -0700)]
[JIT] Delete obsolete message: or if you absolutely have to, use c10::impl::GenericDict(c10::impl::deprecatedUntypedDict()) (#65164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65164
Looks like it was forgotten in https://github.com/pytorch/pytorch/pull/25439
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31072625
Pulled By: pbelevich
fbshipit-source-id: a5ffcfb0836f962ab6952a187ba7717c4d4a6e33
Pavel Belevich [Tue, 21 Sep 2021 04:52:34 +0000 (21:52 -0700)]
[JIT] Support device as Dict key (#65079)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65079
This is required to use the RPC DeviceMap, aka Dict[torch.device, torch.device], in TorchScript.
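A minimal sketch of what this enables:
```
from typing import Dict
import torch

@torch.jit.script
def remap(device_map: Dict[torch.device, torch.device], d: torch.device) -> torch.device:
    # torch.device can now be used as a Dict key in TorchScript
    return device_map[d]

print(remap({torch.device("cpu"): torch.device("cuda:0")}, torch.device("cpu")))
```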
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31072626
Pulled By: pbelevich
fbshipit-source-id: 51cfa5653db86de73b624e9157d68d1b319bfc64
Amr Elshennawy [Tue, 21 Sep 2021 02:50:06 +0000 (19:50 -0700)]
Reduce PyTorch warnings: Cast fix xplat/caffe2/aten/src/ATen/core/DeprecatedTypeProperties.h (#65031)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65031
Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: r-barnes
Differential Revision: D30948791
fbshipit-source-id: 13046e1d0ce2c24864ad38f318ca5e34b1bb9552
Pritam Damania [Tue, 21 Sep 2021 00:34:28 +0000 (17:34 -0700)]
Basic implementation of ShardedLinear using ShardedTensor. (#64128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64128
This PR implements a sharded nn.Linear layer using ShardedTensors with
the following limitations:
1) Works only for ChunkShardingSpec.
2) Implementation is only aimed to demonstrate functionality and is most likely
not performant at all.
The PR also introduces a `shard_parameter` API to easily shard parameters of
`nn.Modules`. This also has the following limitations:
1) Works only for ChunkShardingSpec.
2) Is not performant since it uses broadcast instead of scatter since
ProcessGroupNCCL doesn't yet support scatter.
Overall user API for running a sharded linear would be something like this:
```
# SPMD programming paradigm running same code on all nodes.
fc = nn.Linear(10, 10)
# Setup sharding.
sharding_spec=ChunkShardingSpec(...)
shard_parameter(fc, 'weight', sharding_spec, src_rank=0)
# Run as a normal linear layer.
inp = torch.rand(10, 10)
output = fc(inp)
```
ghstack-source-id: 138500985
Test Plan:
1) unit tests.
2) waitforbuildbot
Reviewed By: wanchaol, bowangbj
Differential Revision: D30621215
fbshipit-source-id: 1aa7478568c18a4572f6c3462fdf24a4cbde01d6
driazati [Tue, 21 Sep 2021 00:03:53 +0000 (17:03 -0700)]
Track peak memory usage (#65157)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65157
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31029049
Pulled By: driazati
fbshipit-source-id: 3e87e94e4872d118ad191aef2b77b8cefe90aeb6
driazati [Tue, 21 Sep 2021 00:03:53 +0000 (17:03 -0700)]
Fix logic to determine master vs PR (#65155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65155
This was previously buggy on empty strings, which caused the hook to write on any job, not just `master`, regardless of the `only_on_master` flag.
Test Plan: see `[scribe] Skipping RDS write on PR` in the logs for `linux-xenial-cuda11.3-py3.6-gcc7`
Reviewed By: malfet
Differential Revision: D31029048
Pulled By: driazati
fbshipit-source-id: 77c4a60e443d8fc19990755a3a346576afee86d8
Ben Koopman [Mon, 20 Sep 2021 23:55:42 +0000 (16:55 -0700)]
[quant] Add fp32/fp16 zero_point support for CPU fakeQuant (#65055)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65055
Test Plan: Imported from OSS
Reviewed By: jingsh, supriyar
Differential Revision: D30975238
Pulled By: b-koopman
fbshipit-source-id: 2000660ffe71cb85d00fdabaf8fc3ba7323f9a1e
Hao Lu [Mon, 20 Sep 2021 23:55:09 +0000 (16:55 -0700)]
[PyPer] copy-free freeze_module (#65118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65118
Cloning the module can increase memory use. By freezing the module directly without cloning it first, we can avoid this memory usage increase.
Reviewed By: eellison, movefast1990
Differential Revision: D30955053
fbshipit-source-id: 2feb738eddcf66aa68c92bf695cc05b57bd990f0
Amr Elshennawy [Mon, 20 Sep 2021 23:49:10 +0000 (16:49 -0700)]
Reduce PyTorch warnings: Cast fix xplat/caffe2/c10/core/TensorOptions.h (#65030)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65030
Test Plan:
```
buck build --show-output //caffe2/torch/fb/sparsenn:sparsenn_operators
buck test caffe2/torch/fb/sparsenn:test
```
Reviewed By: r-barnes
Differential Revision: D30948721
fbshipit-source-id: 16fe42daab35709c56a4d3ccc276ea635a3510c1
Tao Xu [Mon, 20 Sep 2021 23:24:12 +0000 (16:24 -0700)]
[iOS] Zero out NSError to avoid heap corruptions for the OSS builds (#65355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65355
I've been seeing heap corruptions in the CMake builds due to the NSError* not being initialized with `nil`. However, I haven't seen this issue in the BUCK builds.
ghstack-source-id: 138502708
Test Plan:
1. Test the OSS builds to make sure the heap corruption has gone.
2. Test the Buck build in the playground app
3. Circle CI
Reviewed By: hanton
Differential Revision: D31048010
fbshipit-source-id: cfd8d614f3f91f09caee4aab61237007ec080481
= [Mon, 20 Sep 2021 21:31:51 +0000 (14:31 -0700)]
Add crow_/col_indices to view types (#63176)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61103
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63176
Reviewed By: malfet, albanD
Differential Revision: D30315882
Pulled By: cpuhrsch
fbshipit-source-id: eedae5265a757ed68fd69e4f9d07070b05de4bd8
Protonu Basu [Mon, 20 Sep 2021 21:29:30 +0000 (14:29 -0700)]
Creating a helper function to generate a unique name for an attr in a module (#64970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64970
Add a helper function to create a unique name for an attr.
This can be used when we want to add a weight to a module.
Test Plan: run CI.
Reviewed By: jfix71
Differential Revision: D30921497
fbshipit-source-id: 598569d107df8b516ff12920a4bef3a42577e987
Protonu Basu [Mon, 20 Sep 2021 21:29:30 +0000 (14:29 -0700)]
Add support to lower acc_ops.transpose (#65036)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65036
Reviewed By: jfix71, 842974287
Differential Revision: D30934503
fbshipit-source-id: 51880d3d36492f5206f77c9d1a994d8532597b62
Shiyan Deng [Mon, 20 Sep 2021 20:40:31 +0000 (13:40 -0700)]
[fx] give warning instead of fataling the program when submod not found during adding get_attr (#65225)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65225
Currently when creating a get_attr node, if the attribute is in a submodule, we'll first find the submodule. If the submodule isn't in the owning module, we throw an exception.
However, if the attribute can't be found, we give a warning but still allow creating the get_attr node. To align with this behavior, we change the reaction when a submodule is not found from fatal to giving a warning.
Test Plan: CI
Reviewed By: jamesr66a, jfix71
Differential Revision: D31021535
fbshipit-source-id: 4c0b471448c09cc927d0f47b5bf56594f25a8863
Can Balioglu [Mon, 20 Sep 2021 20:32:05 +0000 (13:32 -0700)]
Remove @balioglu from PyTorch Distributed code owners (#65239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65239
Due to too much noise caused by the GitHub notifications, going forward I prefer to track PRs manually.
ghstack-source-id: 138386041
Test Plan: N/A
Reviewed By: mrshenli
Differential Revision: D31027792
fbshipit-source-id: 6578e41d4ab53ad2c64a41584716f4903298cd6b
Michael Carilli [Mon, 20 Sep 2021 19:54:33 +0000 (12:54 -0700)]
[CUDA graphs] Beta, not prototype (#65247)
Summary:
Powers have decided this API should be listed as beta.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65247
Reviewed By: malfet
Differential Revision: D31057940
Pulled By: ngimel
fbshipit-source-id: 137b63cbd2c7409fecdc161a22135619bfc96bfa
Rodrigo Berriel [Mon, 20 Sep 2021 19:35:07 +0000 (12:35 -0700)]
Fix full backward hook when grad is disabled (#65335)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59901. See discussion in the issue.
cc albanD soulitzer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65335
Reviewed By: malfet
Differential Revision: D31055865
Pulled By: albanD
fbshipit-source-id: 53605df62bc73c99d8908248087ab400b81ac495
zhouzhuojie [Mon, 20 Sep 2021 18:53:28 +0000 (11:53 -0700)]
Fix unassigned ciflow trigger (#65354)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65250#issuecomment-923120764
This is a limitation of GitHub Actions triggers: it's hard to introduce a condition before the workflow runs, which is why we intentionally picked the rare event ("unassigned"). The fix, I think, for people who didn't opt in to ciflow and manually unassign is to run all the CI (otherwise we'd introduce a new condition for this, and it's not worth making things even more complex).
The `unassigned` event payload looks like this, just to make sure `github.event.assignee.login` points to the right location.
```
{
"action": "unassigned",
"assignee": {
"avatar_url": "https://avatars.githubusercontent.com/u/658840?v=4",
"events_url": "https://api.github.com/users/zhouzhuojie/events{/privacy}",
"followers_url": "https://api.github.com/users/zhouzhuojie/followers",
"following_url": "https://api.github.com/users/zhouzhuojie/following{/other_user}",
"gists_url": "https://api.github.com/users/zhouzhuojie/gists{/gist_id}",
"gravatar_id": "",
"html_url": "https://github.com/zhouzhuojie",
"id": 658840,
"login": "zhouzhuojie",
"node_id": "MDQ6VXNlcjY1ODg0MA==",
"organizations_url": "https://api.github.com/users/zhouzhuojie/orgs",
"received_events_url": "https://api.github.com/users/zhouzhuojie/received_events",
"repos_url": "https://api.github.com/users/zhouzhuojie/repos",
"site_admin": false,
"starred_url": "https://api.github.com/users/zhouzhuojie/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/zhouzhuojie/subscriptions",
"type": "User",
"url": "https://api.github.com/users/zhouzhuojie"
},
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65354
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D31060212
Pulled By: zhouzhuojie
fbshipit-source-id: ce815cc96e8a00016646d6f02f0917169fa652dc
Alban Desmaison [Mon, 20 Sep 2021 18:33:20 +0000 (11:33 -0700)]
fix typo missing f string (#65226)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65226
Reviewed By: malfet
Differential Revision: D31055793
Pulled By: albanD
fbshipit-source-id: fafac53e75223c4f599bd2162095aacad7b690df
Tao Xu [Mon, 20 Sep 2021 18:00:02 +0000 (11:00 -0700)]
[iOS] Fix the TestApp (#65319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65319
Test Plan: Imported from OSS
Reviewed By: hanton
Differential Revision: D31049543
Pulled By: xta0
fbshipit-source-id: ff0d0baac30682c63b2a28254ee0a5d8d9b8ca6f
Pritam Damania [Mon, 20 Sep 2021 17:39:08 +0000 (10:39 -0700)]
[Pipe] Add a `WithDevice` wrapper to specify device execution for a module. (#65190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65190
As described in https://github.com/pytorch/pytorch/issues/65093, there could be modules which don't have any parameters/buffers. In this case, Pipe determines that the module should be executed on CPU. However, this might result in unnecessary GPU-to-CPU transfers, whereas the user expected the module to be executed on the GPU itself by keeping its inputs and outputs on the GPU.
For this use case, we introduce a `WithDevice` wrapper which can be used to
override which device a particular module should be executed on as part of the
pipeline.
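A minimal sketch of the wrapper (assuming an initialized RPC framework and an available GPU; the shapes are illustrative):
```
import torch.nn as nn
from torch.distributed.pipeline.sync import Pipe, WithDevice

fc = nn.Linear(16, 16).cuda(0)
dropout = nn.Dropout()  # parameterless: would be placed on CPU without the wrapper
model = nn.Sequential(fc, WithDevice(dropout, "cuda:0"))
pipe = Pipe(model, chunks=8)
```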
#Closes: https://github.com/pytorch/pytorch/issues/65093
ghstack-source-id: 138376272
Test Plan:
1) waitforbuildbot
2) unit tests
Reviewed By: SciPioneer
Differential Revision: D31010027
fbshipit-source-id: 4c1c61d3c6feeef341e002e5f7e83dd33ff3a516
Nicolas Hug [Mon, 20 Sep 2021 17:27:12 +0000 (10:27 -0700)]
Torchhub: More robust assumption regarding main or master branch (#64364)
Summary:
Closes https://github.com/pytorch/pytorch/issues/63753
This PR changes the assumption regarding the default branch of a repo to the following:
> If `main` exists, use `main`; otherwise use `master`.
This makes torchhub more robust w.r.t. the ongoing changes where repos use `main` instead of `master` as the development / default branch.
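The rule amounts to something like this hypothetical helper (not torchhub's actual code):
```
def default_branch(branch_names):
    # Prefer `main` when present; fall back to `master`.
    return "main" if "main" in branch_names else "master"
```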
cc nairbv NicolasHug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64364
Reviewed By: saketh-are
Differential Revision: D30731551
Pulled By: NicolasHug
fbshipit-source-id: 7232a30e956dcccca21933a29de5eddd711aa99b
Mike Iovine [Mon, 20 Sep 2021 17:25:57 +0000 (10:25 -0700)]
[Static Runtime] Implement and enable variadic tuple unpack (#64934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64934
Add a new op `static_runtime::VarTupleUnpack` and a graph pass transforming graph sequences from:
```
%0, %1 = prim::TupleUnpack(%a)
%2, %3 = prim::TupleUnpack(%b)
```
into:
```
%0, %1, %2, %3 = static_runtime::VarTupleUnpack(%a, %b)
```
The pass is only applied to contiguous blocks of `TupleUnpack` nodes. This is the most straightforward way to guarantee correctness, and it is sufficient for the models we care about.
Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarTupleUnpack`
Reviewed By: d1jang
Differential Revision: D30872109
fbshipit-source-id: 1ed4a7e201c532da28f703a3a50241c392a6c7e9
Jerry Zhang [Mon, 20 Sep 2021 17:20:51 +0000 (10:20 -0700)]
[quant][fx][graphmode] Fix a bug for sub (#65109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65109
Previously we set the dtype for sub with qconfig since it's matched with a QuantizeHandler; however, this is incorrect: the dtype for sub is decided by whether the output is quantized or not, so we added a check of is_output_quantized when deciding the dtype for the output of sub.
Later: is_output_quantized now depends on is_reference, which is pretty confusing and may cause problems down the road; we should remove this dependency in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_sub_scalar
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30977826
fbshipit-source-id: 551fd63bd61b43b3c3415944ff73174e3a21cc8a
Natalia Gimelshein [Mon, 20 Sep 2021 17:11:29 +0000 (10:11 -0700)]
Revert "Revert
D30558877: Ported std/var to ReductionOpInfo (#65262)
Summary:
Reland of https://github.com/pytorch/pytorch/issues/63978
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65262
Reviewed By: mruberry
Differential Revision: D31037360
Pulled By: ngimel
fbshipit-source-id: 1c60f40c547229767cba3bbe7e11ca0fbbc8f95f
Michael Dagitses [Mon, 20 Sep 2021 17:04:48 +0000 (10:04 -0700)]
simplify `torch.meshgrid`'s shape computation (#62905)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62905
Reviewed By: mruberry
Differential Revision: D31021274
Pulled By: dagitses
fbshipit-source-id: c219389bdc543e9592f7b1c707acfbf752ee6f34
Erjia Guan [Mon, 20 Sep 2021 15:54:36 +0000 (08:54 -0700)]
[DataPipe] Unlimited buffer for Forker and Demultiplexer (#64994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64994
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D30934362
Pulled By: ejguan
fbshipit-source-id: d3b774d7e28c0b9659e999511e5a68c3929857d4
Facebook Community Bot [Sat, 18 Sep 2021 23:15:34 +0000 (16:15 -0700)]
Automated submodule update: FBGEMM (#64640)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: https://github.com/pytorch/FBGEMM/commit/d1ecc7dbe28d06cec742b06d541d5f96faf940fc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64640
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D30805660
fbshipit-source-id: 9f783862e89fe3974badd5194ef793db55e7d275
Jerry Zhang [Sat, 18 Sep 2021 19:49:07 +0000 (12:49 -0700)]
[quant][fx2trt] Generate engine graph for explicit quant/implicit quant and fp16 graph (#65289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65289
Turn on VERBOSE logging and use engine visualizer to generate the graph.
Runtime:
```
explicit quant result diff max tensor(0.0771)
implicit quant result diff max tensor(0.1909)
trt fp16 time (ms/iter) 1.0740923881530762
trt int8 time (ms/iter) 0.5288887023925781
trt implicit int8 time (ms/iter) 0.6334662437438965
PyTorch time (CUDA) (ms/iter) 4.448361396789551
PyTorch time (CPU) (ms/iter) 45.13296604156494
```
Generated Graphs:
```
explicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669571
implicit int8: https://www.internalfb.com/intern/graphviz/?paste=P458669656
fp16: https://www.internalfb.com/intern/graphviz/?paste=P458669708
```
Test Plan:
```
buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test 2>log
buck run //deeplearning/trt/fx2trt/tools:engine_layer_visualize -- --log_file log
```
Reviewed By: 842974287
Differential Revision: D30955035
fbshipit-source-id: 24949458ad9823fb026d56d78a6ee1c6874b6034
Don Jang [Sat, 18 Sep 2021 18:03:17 +0000 (11:03 -0700)]
[Static Runtime] Add perf metrics for number of managed tensors & unmanaged values (#64992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64992
This change lets Static Runtime print out the number of managed tensors & unmanaged values as performance metrics during profile runs.
We will use/enhance these metrics to guide the effort of managing output tensors.
Test Plan:
Confirmed that a profile run prints out the added metric values on inline_cvr nets:
```
(inline_cvr/local)
...
Total number of managed tensors: 2754
Total number of unmanaged values: 3240
...
(inline_cvr/local_ro)
Total number of managed tensors: 1554
Total number of unmanaged values: 2966
...
(inline_cvr/remote_ro)
Total number of managed tensors: 1439
Total number of unmanaged values: 28
...
```
Reviewed By: hlu1
Differential Revision: D30926617
fbshipit-source-id: b86e071003ac941b9663db103eaa7c614466b4e0
Saketh Are [Sat, 18 Sep 2021 16:56:42 +0000 (09:56 -0700)]
Remove incorrect stride assert in Reduce.cuh (#65227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37583
Per discussion with ngimel, the condition asserted here may not always hold after TensorIterator's dimension coalescing and reordering. However, the reduction output should still be correct when `sub_iter.strides(0)[0]` is non-zero.
I've verified correctness empirically by:
1. Lowering the threshold ([configured here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/TensorIterator.cpp#L1127)) at which iterators are split into sub-iterators, making it easier to trigger.
2. Generating many tensors with random dimensions and randint elements which produce a non-zero `sub_iter.strides(0)[0]` in the CUDA kernel.
3. Verifying that the reduction `t.sum(dim=0)` produces the same results for those tensors on CPU and on CUDA.
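Step 3 condenses to a check along these lines (assuming a CUDA device; the shape here is made up):
```
import torch

t = torch.randint(0, 100, (50000, 7))
# The reduction must agree between CPU and CUDA for the fix to be correct.
torch.testing.assert_close(t.sum(dim=0), t.cuda().sum(dim=0).cpu())
```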
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65227
Reviewed By: ngimel
Differential Revision: D31031406
Pulled By: saketh-are
fbshipit-source-id: 5cbf2001224454c74f6db42455c507365ad1f2b1
Michael Dagitses [Sat, 18 Sep 2021 13:47:20 +0000 (06:47 -0700)]
support using gradients named for outputs in derivatives (#63947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63947
Fixes #62196
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30541485
Pulled By: dagitses
fbshipit-source-id: ea1dd0edd1a51936a295631e52b85e9c022a9c87
Michael Dagitses [Sat, 18 Sep 2021 13:47:20 +0000 (06:47 -0700)]
clarify implementation of check_grad_usage (#64439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64439
1) remove unused fully_implemented
2) rename used_grad to uses_grad and make it a boolean
3) rename used_grads to num_grads_uses
4) add comments explaining what some of the checks mean
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30733904
Pulled By: dagitses
fbshipit-source-id: dccbbef8a4be8713215ef91aa97a34124f06a7a1
Jerry Zhang [Sat, 18 Sep 2021 06:23:29 +0000 (23:23 -0700)]
[quant][fx2trt] Enable comparison with implicit quant mode (#65043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65043
Currently we get the following result; we will take a look at the executed graph again:
```
trt fp16 time (ms/iter) 1.0483217239379883
trt int8 time (ms/iter) 0.5329632759094238
trt implicit int8 time (ms/iter) 0.6769704818725586
PyTorch time (ms/iter) 6.453146934509277
```
Test Plan:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```
Imported from OSS
Reviewed By: 842974287
Differential Revision: D30954871
fbshipit-source-id: 8d7ff82b8f5d0b7946fbd38a7cddede7d40b28aa
CodemodService Bot [Sat, 18 Sep 2021 02:45:14 +0000 (19:45 -0700)]
[Codemod][FBSourceBlackLinter] Daily `arc lint --take BLACK`
Reviewed By: zertosh
Differential Revision: D31039372
fbshipit-source-id: a5e54a9b1d2ef97e9bc206b9e2a82124e5a22a7a
Jane Xu [Sat, 18 Sep 2021 00:27:49 +0000 (17:27 -0700)]
Remove 9.2 related macros for CONSTEXPR (#65066)
Summary:
Removes C10_HOST_CONSTEXPR_EXCEPT_CUDA92 references in the code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65066
Reviewed By: driazati
Differential Revision: D31022520
Pulled By: janeyx99
fbshipit-source-id: f02cdc6caba5b48405575242921f5845ff18f729
zhouzhuojie [Sat, 18 Sep 2021 00:15:36 +0000 (17:15 -0700)]
Make github.com in noproxy list (#65256)
Summary:
Attempt to solve a rate-limiting issue we saw when calling GitHub APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65256
Reviewed By: seemethere
Differential Revision: D31035115
Pulled By: zhouzhuojie
fbshipit-source-id: 7efd5d5af7d91805e4bf27b86847791e991b741e
Natalia Gimelshein [Sat, 18 Sep 2021 00:04:34 +0000 (17:04 -0700)]
remove utils.cpp (#65184)
Summary:
Dead code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65184
Reviewed By: mruberry
Differential Revision: D31031777
Pulled By: ngimel
fbshipit-source-id: 13633888229a7af8cfd8ea7e55ea2880b2e47273
Shiyan Deng [Fri, 17 Sep 2021 23:32:23 +0000 (16:32 -0700)]
[fx const fold] fix a case when some inputs are unused (#65223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65223
If there are unused inputs, they won't appear in `submod_1`. We need to add all the unused inputs so that the model after const fold has the same inputs as the original model.
Reviewed By: jfix71
Differential Revision: D31021217
fbshipit-source-id: b7452c90d133b747e0699936a81d3fee14af9cc9
Gisle Dankel [Fri, 17 Sep 2021 23:08:03 +0000 (16:08 -0700)]
[Profiler] Update kineto submodule (#65236)
Summary:
Update to latest kineto revision. See Kineto repo for change log.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65236
Reviewed By: malfet
Differential Revision: D31031638
Pulled By: gdankel
fbshipit-source-id: 681655b2e8e151895afa91445ced0fd57a11fa93
Shiyan Deng [Fri, 17 Sep 2021 22:42:57 +0000 (15:42 -0700)]
[fx2trt] re-enable profiler and some miscs for TRTModule (#65072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65072
Previously we disabled attaching the TRT profiler to the execution context in TRTModule because https://fburl.com/mc33n880 states that `enqueue()` doesn't support profiling. That turns out not to be the case, so this diff re-enables attaching the profiler.
Also added a number of checks for dtype and shape, and fixed saving `state_dict` and loading it back.
Test Plan: buck run mode/opt -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test
Reviewed By: yinghai
Differential Revision:
D30962757
fbshipit-source-id:
9c664b0500a8169b7952f6f912239a5a05772aea
Michael Suo [Fri, 17 Sep 2021 22:37:26 +0000 (15:37 -0700)]
[package] Make it possible to re-save a PackageImporter module (#65101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65101
As title. Previously this was guarded against for implementation
simplicity, as we didn't really think there was a use case for saving a
mangled module name directly.
But people started doing stuff like:
```
exporter.save_module(my_imported_obj.__module__)
```
which implicitly passes along the mangled module name.
This PR makes it so that a given `PackageImporter` instance can always
import modules that it created, and changes `PackageExporter` to
properly demangle the resulting module name when writing the package to
the export archive.
Differential Revision:
D30975712
Test Plan: Imported from OSS
Pulled By: suo
fbshipit-source-id:
d9e849bf651713890e72dccdcef74fa52d377149
Jason Ansel [Fri, 17 Sep 2021 21:28:38 +0000 (14:28 -0700)]
[FX] Fix tracing of bitwise and/or (#65196)
Summary:
Previously this resulted in `AttributeError: module 'operator' has no attribute 'and'`.
Since `and` and `or` are Python keywords, the corresponding operator-module functions are named `operator.and_` and `operator.or_`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65196
Reviewed By: Chillee
Differential Revision:
D31020336
Pulled By: jansel
fbshipit-source-id:
51d888151fe78c0c1197ecaf161976b219c59694
Mike Ruberry [Fri, 17 Sep 2021 21:23:25 +0000 (14:23 -0700)]
Revert D30731191: [pytorch][PR] Torchhub: rewrite commit hash check to avoid using unnecessary GitHub API credits
Test Plan: revert-hammer
Differential Revision:
D30731191 (https://github.com/pytorch/pytorch/commit/f9bf144a0c5e3627f5fafb256cebf1f02152ab0c)
Original commit changeset:
d1ee7c2ef259
fbshipit-source-id:
5c7207f66c5354ce7b9ac2594e4f5b8307619b0c
BowenBao [Fri, 17 Sep 2021 21:11:27 +0000 (14:11 -0700)]
[ONNX] Deprecate enable_onnx_checker argument in torch.onnx.export() (#61708) (#64369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64369
As of now, the "enable_onnx_checker" parameter is described as follows:
enable_onnx_checker (bool, default True): If True the ONNX model checker will be run to ensure the exported model is a valid ONNX model.
An invalid ONNX graph is useless to users, so this check should run on every call.
With this PR, the model is still written to an ONNX file even if it is invalid, and the exception is thrown after the file has been created. This lets users inspect the invalid ONNX graph when debugging.
The parameter is kept in torch.onnx.export() for backward compatibility, but the backend logic now always behaves as if enable_onnx_checker were set to True.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision:
D30905267
Pulled By: malfet
fbshipit-source-id:
3ad3f68e77fcec012cc7ef674cc9a61755eebc9e
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Don Jang [Fri, 17 Sep 2021 20:20:33 +0000 (13:20 -0700)]
[Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65123
This change re-reverts D30883290 (https://github.com/pytorch/pytorch/commit/0e11454d19e106ba6d5819c1147ca540cbce2943), which broke the OSS build because it implicitly removed the default move constructor of `StaticRuntime`:
```
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:95:10: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 return torch::jit::StaticRuntime(*smod);
Sep 15 15:39:57 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /var/lib/jenkins/workspace/benchmarks/static_runtime/deep_wide_pt_bench.cc:99:9: error: call to implicitly-deleted copy constructor of 'torch::jit::StaticRuntime'
Sep 15 15:39:57 auto sr = getStaticRuntime();
Sep 15 15:39:57 ^ ~~~~~~~~~~~~~~~~~~
Sep 15 15:39:57 /var/lib/jenkins/workspace/torch/csrc/jit/runtime/static/impl.h:321:34: note: copy constructor of 'StaticRuntime' is implicitly deleted because field 'planner_' has a deleted copy constructor
Sep 15 15:39:57 std::unique_ptr<MemoryPlanner> planner_;
Sep 15 15:39:57 ^
Sep 15 15:39:57 /usr/bin/../lib/gcc/x86_64-linux-gnu/5.4.0/../../../../include/c++/5.4.0/bits/unique_ptr.h:356:7: note: 'unique_ptr' has been explicitly marked deleted here
Sep 15 15:39:57 unique_ptr(const unique_ptr&) = delete;
Sep 15 15:39:57 ^
Sep 15 15:39:57 2 errors generated.
```
This change fixes the issue by explicitly defining the default move constructor (courtesy of mikeiovine).
Original Summary:
This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.
`MemoryPlanner` performs an independent sub-task: statically analyzing a graph, creating a memory plan, and allocating/deallocating managed Tensors.
This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.
Test Plan: Confirm that the OSS build went well (see the External Tests section).
Reviewed By: mikeiovine
Differential Revision:
D30983292
fbshipit-source-id:
a59f407fa1123527824157268111144a1bf58116
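For illustration, a minimal C++ sketch of the pitfall and the fix described above; `StaticRuntime`, `MemoryPlanner`, and `planner_` are taken from the log, everything else is simplified:
```
#include <memory>

struct MemoryPlanner {};

struct StaticRuntime {
  StaticRuntime() = default;
  // A user-declared destructor suppresses the implicit move constructor...
  ~StaticRuntime() = default;
  // ...and the unique_ptr member deletes the implicit copy constructor, so
  // returning a StaticRuntime by value stops compiling. The fix is to bring
  // the move operations back explicitly:
  StaticRuntime(StaticRuntime&&) = default;
  StaticRuntime& operator=(StaticRuntime&&) = default;

  std::unique_ptr<MemoryPlanner> planner_;
};

StaticRuntime getStaticRuntime() {
  StaticRuntime sr;
  return sr;  // moves (or is elided) instead of hitting the deleted copy ctor
}
```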
Mengwei Liu [Fri, 17 Sep 2021 19:57:48 +0000 (12:57 -0700)]
[PyTorch] Extract parseOperator() into a standalone source file (#65179)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65179
This follows up on https://github.com/pytorch/pytorch/pull/61862. The purpose is to modularize operator parsing so that it can be used as needed without pulling the whole `import.cpp` into the build.
Test Plan: Added a unit test in `test_lite_predictor.cpp` called `ParseOperators`, similar to `ParseBytecode`.
Reviewed By: iseeyuan
Differential Revision:
D31006555
fbshipit-source-id:
c38e221800af4cf72963a353c452c5437f56a0ac
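A sketch of the extraction pattern; the signature and file layout below are hypothetical, shown only to illustrate the header/source split:
```
// parse_operators.h -- standalone declaration; callers depend on this
// small translation unit instead of all of import.cpp.
#pragma once
#include <string>

namespace torch {
namespace jit {
void parseOperator(const std::string& op_name);  // hypothetical signature
}  // namespace jit
}  // namespace torch

// parse_operators.cpp -- the parsing logic moved out of import.cpp.
#include "parse_operators.h"

namespace torch {
namespace jit {
void parseOperator(const std::string& op_name) {
  // ... operator-parsing logic formerly embedded in import.cpp ...
}
}  // namespace jit
}  // namespace torch
```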
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Improve OperatorEntry::getKernelForDispatchKey (#64838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64838
The returned pointer, if present, could never be nullptr, so there is no reason to wrap it in an optional rather than just using the nullptr state. The repeated calls to kernels_.at() were not being optimized away, so just use the perfectly good iterator that find() already gave us.
ghstack-source-id:
138304429
Test Plan: CI
Reviewed By: bdhirsh
Differential Revision:
D30875748
fbshipit-source-id:
9cbb875715b7a582380c7402155fdbe21944dc85
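A minimal sketch of the shape of the change; the real member types in OperatorEntry differ, so everything here beyond the function name is a simplified stand-in:
```
#include <unordered_map>

struct KernelFunction {};
enum class DispatchKey { CPU, CUDA };

struct OperatorEntry {
  std::unordered_map<DispatchKey, KernelFunction> kernels_;

  // Before (sketch): returned c10::optional<const KernelFunction*> and
  // re-looked up the key with kernels_.at() even though find() had already
  // located it. After: a plain pointer where nullptr means "no kernel
  // registered", reusing the iterator that find() returned.
  const KernelFunction* getKernelForDispatchKey(DispatchKey k) const {
    auto it = kernels_.find(k);
    return it == kernels_.end() ? nullptr : &it->second;
  }
};
```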
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
avoid moving Argument in infer_schema (#64822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64822
Turns out the suppressed lint message was trying to tell us something: we can construct our Argument in place rather than create a temporary and move it into the argument vector.
ghstack-source-id:
138304423
Test Plan: CI, profile op registration and observe reduced Argument move ctor and dtor costs
Reviewed By: smessmer
Differential Revision:
D30860718
fbshipit-source-id:
c8da45ab7e61b5df9fa1273301896309bca108b5
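A minimal sketch of the in-place construction the lint was pointing at; this Argument is a simplified stand-in, not the real c10::Argument:
```
#include <string>
#include <utility>
#include <vector>

struct Argument {
  Argument(std::string name, int position)
      : name_(std::move(name)), position_(position) {}
  std::string name_;
  int position_;
};

void appendArguments(std::vector<Argument>& args) {
  // Before: builds a temporary Argument, then moves it into the vector.
  args.push_back(Argument("self", 0));
  // After: constructs the Argument directly in the vector's storage,
  // skipping the temporary's move constructor and destructor.
  args.emplace_back("other", 1);
}
```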
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Fix missing move in Argument ctor (#64821)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64821
Not moving adds excess refcounting overhead.
ghstack-source-id:
138304432
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision:
D30860720
fbshipit-source-id:
de695e5cdfb1fa314b53a8bcb291343ae4eb87a5
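A minimal sketch of this kind of fix, with simplified members; the real c10::Argument holds refcounted TypePtr fields, which is where the excess refcounting came from:
```
#include <memory>
#include <string>
#include <utility>

struct Type {};
using TypePtr = std::shared_ptr<Type>;

struct Argument {
  // Parameters taken by value must be moved into the members: copying a
  // shared_ptr bumps its refcount, and copying a string may reallocate.
  Argument(std::string name, TypePtr type)
      : name_(std::move(name)), type_(std::move(type)) {}

  std::string name_;
  TypePtr type_;
};
```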
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] shrink Argument (#64820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64820
Putting boolean fields next to each other avoids wasting space for padding.
ghstack-source-id:
138304433
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision:
D30860717
fbshipit-source-id:
ad45c37574a7c857958978aad42fd1333c6b29ee
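To illustrate why, a sketch with hypothetical fields (the real Argument's layout differs), assuming a typical 64-bit ABI:
```
#include <string>

// Each bool wedged between 8-byte-aligned members drags along up to
// 7 bytes of padding.
struct Scattered {
  std::string name;   // 32 bytes with a typical 64-bit libstdc++
  bool kwarg_only;    // 1 byte + 7 bytes padding
  std::string alias;  // hypothetical field, for illustration
  bool is_out;        // 1 byte + 7 bytes padding
};

// Grouping the bools lets them share a single padded tail.
struct Packed {
  std::string name;
  std::string alias;
  bool kwarg_only;
  bool is_out;
};

static_assert(sizeof(Packed) < sizeof(Scattered),
              "adjacent bools shrink the struct on common ABIs");
```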
Scott Wolchok [Fri, 17 Sep 2021 19:55:47 +0000 (12:55 -0700)]
[PyTorch] Compare pointers before calling expensive Type comparison (#64784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64784
See code comment for explanation.
ghstack-source-id:
138304431
Test Plan: Reduced overhead in findSchemaDifferences while profiling registration at startup in a case where I forced duplicates to be registered (by looping in RegisterDispatchKey.cpp).
Reviewed By: dhruvbird
Differential Revision:
D30854036
fbshipit-source-id:
568733c3cf449697cdeb74cf57fed0926729fa68
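A minimal sketch of the shortcut; the stand-in Type and free function below are hypothetical, but the idea is that identical pointers are trivially equal, so the deep comparison only runs on a miss:
```
struct Type {
  int kind = 0;
  // Stand-in for a deep, potentially recursive structural comparison.
  bool equals(const Type& other) const { return kind == other.kind; }
};

bool typesEqual(const Type* a, const Type* b) {
  if (a == b) {
    return true;  // same object: trivially equal, skip the deep compare
  }
  return a != nullptr && b != nullptr && a->equals(*b);
}
```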
Jane Xu [Fri, 17 Sep 2021 19:32:11 +0000 (12:32 -0700)]
CI: Consolidate Build and Test naming for better stats collection (#65232)
Summary:
All PyTorch build steps should now be named "Build" and test steps named "Test" for workflows that test PyTorch on Linux and Windows.
I left the binary stuff alone as that build is different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65232
Reviewed By: driazati, seemethere
Differential Revision:
D31024232
Pulled By: janeyx99
fbshipit-source-id:
24b1a1e2b1b25aba70b7adc41603ec8fa4ce7dd6
Rohan Varma [Fri, 17 Sep 2021 19:23:09 +0000 (12:23 -0700)]
Back out "Revert
D30745960: [DDP] Remove SPMD from self.modules_buffers" (#64778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64778
Original commit changeset:
d3f3fb813c45
ghstack-source-id:
138326910
Test Plan: ci
Reviewed By: H-Huang
Differential Revision:
D30849443
fbshipit-source-id:
15dab8a959a29d2e2fefac6ad52b8d8168eacc02