platform/upstream/pytorch.git
2 years ago [acc_ops] Add support for torch variants of squeeze and mul (#65037)
Jordan Fix [Thu, 16 Sep 2021 02:39:41 +0000 (19:39 -0700)]
[acc_ops] Add support for torch variants of squeeze and mul (#65037)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037

att

Test Plan: updated unit tests

Reviewed By: yuhc

Differential Revision: D30952224

fbshipit-source-id: aaf75b27b4fc6c0436ba7bfcf324f761b900171b
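
The idea of handling "torch variants" can be sketched as a dispatch table that maps several surface forms of an op onto one canonical handler. Everything below is illustrative plain Python, not the real `acc_ops`/`acc_tracer` registration API.

```python
# Hypothetical sketch of normalizing several call variants onto one
# canonical acc op; names and semantics here are illustrative only.
CANONICAL_OPS = {}

def register_variants(*variant_names):
    """Register one handler under every variant name."""
    def wrap(fn):
        for name in variant_names:
            CANONICAL_OPS[name] = fn
        return fn
    return wrap

@register_variants("torch.mul", "Tensor.mul", "operator.mul")
def acc_mul(a, b):
    # All multiply variants share one implementation and thus one lowering.
    return a * b

@register_variants("torch.squeeze", "Tensor.squeeze")
def acc_squeeze(x):
    # Placeholder semantics: the real squeeze drops size-1 dims of a tensor.
    return x
```

With this table, both the function form and the method form of an op resolve to the same handler, which is the property the PR adds for squeeze and mul.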

2 years ago Add NNC AOT Compiler executable (#63994)
Priya Ramani [Thu, 16 Sep 2021 02:12:47 +0000 (19:12 -0700)]
Add NNC AOT Compiler executable (#63994)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D30582149

Pulled By: priyaramani

fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6

2 years ago [quant] AO migration of the `_correct_bias.py`, `_equalize.py`, and `_learnable_fake_...
Zafar Takhirov [Thu, 16 Sep 2021 01:13:53 +0000 (18:13 -0700)]
[quant] AO migration of the `_correct_bias.py`, `_equalize.py`, and `_learnable_fake_quantize.py` (#64917)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates from torch.quantization to torch.ao.quantization the following files:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`

**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`

Reviewed By: vkuzo

Differential Revision: D30898565

fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
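
The phase-1 migration pattern described above, where the old import path keeps working while the code lives in the new location, can be sketched with a re-export shim. The module names below are simulated stand-ins, not the real torch files.

```python
import sys
import types

# Simulate the new location (plays the role of torch.ao.quantization._equalize).
new_mod = types.ModuleType("ao_equalize")
new_mod.equalize = lambda model: ("equalized", model)
sys.modules["ao_equalize"] = new_mod

# Simulate the old location (plays torch.quantization._equalize) as a thin
# shim that re-exports every public name from the new module.
old_mod = types.ModuleType("legacy_equalize")
for name, value in vars(new_mod).items():
    if not name.startswith("_"):
        setattr(old_mod, name, value)
sys.modules["legacy_equalize"] = old_mod
```

Both import paths then resolve to the same function objects, so internal callsites can be updated one at a time.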

2 years ago .circleci/.jenkins: Remove 9.2 references in CI (#65024)
Jane Xu [Thu, 16 Sep 2021 01:03:19 +0000 (18:03 -0700)]
.circleci/.jenkins: Remove 9.2 references in CI (#65024)

Summary:
Removes 9.2 references in CI scripts and configs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024

Reviewed By: driazati

Differential Revision: D30945948

Pulled By: janeyx99

fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5

2 years ago .github: GHA add retry for docker run in chown workspace step (#65104)
Jane Xu [Thu, 16 Sep 2021 01:00:24 +0000 (18:00 -0700)]
.github: GHA add retry for docker run in chown workspace step (#65104)

Summary:
This should help prevent further errors in GHA workflows during the Chown Workspace step such as https://github.com/pytorch/pytorch/runs/3614067053

I did not add retries to other steps that use docker run.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104

Reviewed By: seemethere

Differential Revision: D30976330

Pulled By: janeyx99

fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c

2 years ago Revert D30752939: [pytorch][PR] nvfuser update
Eli Uriegas [Thu, 16 Sep 2021 00:37:10 +0000 (17:37 -0700)]
Revert D30752939: [pytorch][PR] nvfuser update

Test Plan: revert-hammer

Differential Revision: D30752939 (https://github.com/pytorch/pytorch/commit/cfaecaf40bd6cabd3f4e0ef0d8c7252655349b61)

Original commit changeset: ce122e80f01b

fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2

2 years ago [quant] AO migration of the `quant_types.py` (phase 1) (#64916)
Zafar Takhirov [Thu, 16 Sep 2021 00:24:09 +0000 (17:24 -0700)]
[quant] AO migration of the `quant_types.py` (phase 1) (#64916)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898422

fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff

2 years ago [quant] AO migration of the `fuse_modules.py` (phase 1) (#64913)
Zafar Takhirov [Thu, 16 Sep 2021 00:24:09 +0000 (17:24 -0700)]
[quant] AO migration of the `fuse_modules.py` (phase 1) (#64913)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913

AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates fuse_module.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually torch.quantization will be deprecated.

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882819

fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4

2 years ago [TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Mikhail Zolotukhin [Thu, 16 Sep 2021 00:13:48 +0000 (17:13 -0700)]
[TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010

This pass ensures all names are legal and not-duplicated.

Fixes #52727.

Test Plan: Imported from OSS

Reviewed By: bertmaher, navahgar

Differential Revision: D30939717

Pulled By: ZolotukhinM

fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
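
A name-sanitization pass of this kind can be sketched in a few lines: make each name a legal identifier and de-duplicate collisions with numeric suffixes. This is plain Python over strings; the real TensorExpr pass rewrites Var and Buf nodes inside a Stmt.

```python
import re

def sanitize_names(names):
    """Make every name a legal identifier and de-duplicate by suffixing."""
    seen = {}
    out = []
    for name in names:
        # Replace illegal characters and avoid a leading digit.
        clean = re.sub(r"\W", "_", name) or "v"
        if clean[0].isdigit():
            clean = "_" + clean
        # Append a numeric suffix on collision with an earlier name.
        if clean in seen:
            seen[clean] += 1
            clean = f"{clean}_{seen[clean]}"
        seen.setdefault(clean, 0)
        out.append(clean)
    return out
```

For example, `["a b", "a_b", "1x"]` comes out as `["a_b", "a_b_1", "_1x"]`: all legal, none duplicated.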

2 years ago .github: Enable only specific workflows for canary (#65099)
Eli Uriegas [Wed, 15 Sep 2021 23:51:34 +0000 (16:51 -0700)]
.github: Enable only specific workflows for canary (#65099)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099

Utilizes ciflow to enable only specific workflows for
pytorch/pytorch-canary to reduce noise on that specific repository

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30973691

Pulled By: seemethere

fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0

2 years ago ci: Disable jit legacy on circleci, enable on gha (#65106)
Eli Uriegas [Wed, 15 Sep 2021 23:09:57 +0000 (16:09 -0700)]
ci: Disable jit legacy on circleci, enable on gha (#65106)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: malfet, janeyx99

Differential Revision: D30976186

Pulled By: seemethere

fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff

2 years ago CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Jane Xu [Wed, 15 Sep 2021 22:59:21 +0000 (15:59 -0700)]
CI: Upgrade windows 10.1 jobs to 10.2 (#65080)

Summary:
These are the first 2 steps in the following task:
1. Upgrade 10.1 to 10.2
2. Migrate force_on_cpu job to GHA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080

Test Plan: https://github.com/pytorch/pytorch/pull/65086

Reviewed By: seemethere

Differential Revision: D30973655

Pulled By: janeyx99

fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c

2 years ago Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Jane Xu [Wed, 15 Sep 2021 22:59:06 +0000 (15:59 -0700)]
Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)

Summary:
As we default to linux CUDA 11.3 on PRs, we should do the same thing with Windows (instead of having 10.2 be the default). This means that 10.2 will now be master only, and 11.3 windows smoke tests will run on every PR.

This also copies over the "run smoke tests only" config--removing that will be in a separate PR once there's more certain decision making.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090

Reviewed By: seemethere

Differential Revision: D30968382

Pulled By: janeyx99

fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94

2 years ago Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Natalia Gimelshein [Wed, 15 Sep 2021 22:38:56 +0000 (15:38 -0700)]
Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp

Test Plan: revert-hammer

Differential Revision: D30883290 (https://github.com/pytorch/pytorch/commit/0e11454d19e106ba6d5819c1147ca540cbce2943)

Original commit changeset: a37570f8d943

fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72

2 years ago [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Charles David Hernandez [Wed, 15 Sep 2021 22:15:02 +0000 (15:15 -0700)]
[quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981

This would have caused errors when observer.py was moved to ao.

see: D30391189
ghstack-source-id: 138118430

Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'

buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'

Reviewed By: supriyar

Differential Revision: D30432008

fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
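
The underlying idea of the fix, deriving the module path from the class instead of hardcoding the string, can be sketched as follows; the `Observer` class here is a stand-in, not the real torch observer.

```python
class Observer:
    """Stand-in for an observer class that may move between modules."""
    pass

def qualified_name(cls):
    # Derive "module.ClassName" from the class itself, so the string stays
    # correct if the file moves (e.g. torch.quantization -> torch.ao.quantization),
    # instead of hardcoding "torch.quantization.observer.Observer".
    return f"{cls.__module__}.{cls.__qualname__}"
```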

2 years ago [Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Scott Wolchok [Wed, 15 Sep 2021 22:12:29 +0000 (15:12 -0700)]
[Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403

No need to copy to the heap here.
ghstack-source-id: 138033019

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30712506

fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1

2 years ago [Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Scott Wolchok [Wed, 15 Sep 2021 22:12:29 +0000 (15:12 -0700)]
[Caffe2] Don't pass vector by value in SqueezeOp (#64400)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400

There appears to be no need to copy this vector.
ghstack-source-id: 138033020

Test Plan: CI

Reviewed By: smacke

Differential Revision: D30711014

fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f

2 years ago Use RDS for build size tracking (#64303)
David Riazati [Wed, 15 Sep 2021 21:46:11 +0000 (14:46 -0700)]
Use RDS for build size tracking (#64303)

Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: mruberry

Differential Revision: D30941182

Pulled By: driazati

fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b
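
A client-side sketch of what an `rds_write` call might validate and send; the real `register_rds_table`/`rds_write` utilities and the rds-proxy lambda are internal to PyTorch CI, so the schema format below is purely illustrative.

```python
import json

def build_rds_payload(table, schema_fields, row):
    """Validate `row` against the registered field set and serialize it.

    Hypothetical sketch: the actual payload shape expected by rds-proxy
    is not part of this commit message.
    """
    unknown = set(row) - set(schema_fields)
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    return json.dumps({"table": table, "values": row}, sort_keys=True)
```

The point of registering the schema once up front is exactly this kind of validation: a write with a field the table does not know about fails loudly instead of silently dropping data.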

2 years ago nvfuser update (#63745)
jiej [Wed, 15 Sep 2021 21:40:18 +0000 (14:40 -0700)]
nvfuser update (#63745)

Summary:
Syncing the nvfuser code base from the devel branch. A few highlights of our development since the last sync:

- Extends support to normalization and reduction kernels.
- Multiple kernel launch for single `CudaFusionGroup`. Hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalar into compile time constants, which are required by the codegen. (e.g. reduction axes).

To keep this PR simple and relatively easy to review, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.

Internal updates are in the following files:
1. updates in nvfuser codegen `torch/csrc/jit/codegen/cuda`
2. added nvfuser specific benchmarks `benchmarks/cpp/nvfuser`
3. nvfuser jit cpp tests `test/cpp/jit/test_gpu.cpp` `test/cpp/jit/test_gpu_shift.cpp` `test/cpp/jit/test_gpu_validator.h`

updates affecting integration:

1. profile_ivalue enabled for nvfuser. related changes are in `torch/csrc/jit/runtime/*`,
2. exposed a few more symbols `aten/src/ATen/core/*` used by codegen

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745

Reviewed By: saketh-are

Differential Revision: D30752939

Pulled By: malfet

fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c

2 years ago Add embedding shape analysis (#64323)
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Add embedding shape analysis (#64323)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738145

Pulled By: eellison

fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9

2 years ago Max Pool with indices (#64121)
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Max Pool with indices (#64121)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121

Add support for aten operators which return multiple outputs

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738142

Pulled By: eellison

fbshipit-source-id: 0d7e51187bd5e3e9b43f0fdb5178366a97aec943

2 years ago Add Maxpool to shape analysis / Opinfo (#63530)
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Add Maxpool to shape analysis / Opinfo (#63530)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530

How to review: pretty much just check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can also double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30738147

Pulled By: eellison

fbshipit-source-id: cf52339e572ee04e0d6167fd95d8a82d58ea7706
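
For ops like max pooling, the shape function being tested boils down to the standard output-size formula. A sketch of the per-dimension computation, following the formula documented for `nn.MaxPool2d` (this plain-Python helper is illustrative, not the shape-analysis code itself):

```python
import math

def max_pool2d_out_dim(size, kernel, stride=None, padding=0, dilation=1,
                       ceil_mode=False):
    """Output length of one spatial dim for max pooling."""
    stride = stride or kernel  # stride defaults to the kernel size
    numer = size + 2 * padding - dilation * (kernel - 1) - 1
    out = (math.ceil if ceil_mode else math.floor)(numer / stride) + 1
    # In ceil_mode, the last window must still start inside the padded input.
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out
```

For example, a length-7 input with kernel 3 and stride 2 yields 3 output positions (windows starting at 0, 2, and 4).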

2 years ago [quant][refactor] Change the structure of the ao migration tests (#64912)
Zafar Takhirov [Wed, 15 Sep 2021 20:11:58 +0000 (13:11 -0700)]
[quant][refactor] Change the structure of the ao migration tests (#64912)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912

The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted out.
ghstack-source-id: 138157450

Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`

Reviewed By: vkuzo

Differential Revision: D30898214

fbshipit-source-id: 017f95995271d35bcdf6ff6a1b3974b837543e84

2 years ago Add retries to ECR login step (#65013)
David Riazati [Wed, 15 Sep 2021 20:10:02 +0000 (13:10 -0700)]
Add retries to ECR login step (#65013)

Summary:
Switch retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and up the number of retries.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013

Reviewed By: zhouzhuojie, mruberry

Differential Revision: D30943292

Pulled By: driazati

fbshipit-source-id: 0a21e9b4eacbb77e6aca22f9256d94cd591b23cd
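
The retry settings referenced above can also be expressed at the CLI level through environment variables; the values below are illustrative, not necessarily what the workflow uses.

```python
import os

# Sketch of the CLI-level equivalent of the change: switch the AWS retry
# mode from the default "legacy" to "standard" and raise the attempt count.
os.environ["AWS_RETRY_MODE"] = "standard"   # replaces the "legacy" retrier
os.environ["AWS_MAX_ATTEMPTS"] = "5"        # total attempts, including the first
```

Any subsequent `aws` CLI invocation in the same environment picks these up, per the AWS retries configuration docs linked in the summary.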

2 years ago To add state dict and load_dict for Chained Scheduler (#65034)
Ilqar Ramazanli [Wed, 15 Sep 2021 20:07:59 +0000 (13:07 -0700)]
To add state dict and load_dict for Chained Scheduler (#65034)

Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958207

Pulled By: datumbox

fbshipit-source-id: 1a587a330d34e0548e891a39f8fb5a3d251b71fa
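
The state_dict()/load_state_dict() pattern for a scheduler that wraps several child schedulers can be sketched in plain Python; `ToyScheduler`/`ToyChained` are illustrative stand-ins, not the torch.optim API.

```python
class ToyScheduler:
    """Stand-in for a single LR scheduler."""
    def __init__(self, factor):
        self.factor = factor
        self.last_epoch = 0

    def state_dict(self):
        return dict(self.__dict__)  # snapshot of this scheduler's state

    def load_state_dict(self, state):
        self.__dict__.update(state)

class ToyChained:
    """Stand-in for a chained scheduler wrapping several children."""
    def __init__(self, schedulers):
        self._schedulers = list(schedulers)

    def state_dict(self):
        # Save each child's state as a list entry.
        return {"_schedulers": [s.state_dict() for s in self._schedulers]}

    def load_state_dict(self, state):
        # Restore each child in order from the saved list.
        for child, saved in zip(self._schedulers, state["_schedulers"]):
            child.load_state_dict(saved)
```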

2 years ago [ONNX] Enhance shape (two changes merged) (#64585)
BowenBao [Wed, 15 Sep 2021 19:56:33 +0000 (12:56 -0700)]
[ONNX] Enhance shape (two changes merged) (#64585)

Summary:
Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585

Reviewed By: ezyang

Differential Revision: D30870418

Pulled By: msaroufim

fbshipit-source-id: 87a294799cb87d649d1d13b6114a5cfbac9be15c

Co-authored-by: jiafatom <jiafa@microsoft.com>

2 years ago [Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Don Jang [Wed, 15 Sep 2021 19:50:22 +0000 (12:50 -0700)]
[Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011

This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.

`MemoryPlanner` performs an independent sub-task: statically analyzing a graph, creating a memory plan, and allocating/deallocating managed Tensors.

This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.

Test Plan: N/A

Reviewed By: mikeiovine

Differential Revision: D30883290

fbshipit-source-id: a37570f8d9430224a6987d2190bcf81cf875043d

2 years ago (torch.distributed.elastic) properly format traceback on error (#65041)
Kiuk Chung [Wed, 15 Sep 2021 19:48:28 +0000 (12:48 -0700)]
(torch.distributed.elastic) properly format traceback on error (#65041)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041

Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception.

Fixes https://github.com/pytorch/pytorch/issues/60910

BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError):
```
**************************************************************************************************************************************************************************************************************************************************
                                                                                                              run_script_path FAILED
==================================================================================================================================================================================================================================================
Root Cause:
[0]:
  time: 2021-09-14_22:01:06
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1092727)
  error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json
  msg:
    {
      "message": "RuntimeError: rasing error since --throw was specified",
      "extraInfo": {
        "py_callstack": [
          "  File \"<string>\", line 1, in <module>\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n    exitcode = _main(fd, parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n    return self._bootstrap(parent_sentinel)\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n    self.run()\n",
          "  File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n    self._target(*self._args, **self._kwargs)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n    fn(i, *args)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n    ret = record(fn)(*args_)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n    error_handler.record_exception(e)\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n    _write_error(e, self._get_error_file_path())\n",
          "  File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n    \"py_callstack\": traceback.format_stack(),\n"
        ],
        "timestamp": "1631682066"
      }
    }

==================================================================================================================================================================================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
**************************************************************************************************************************************************************************************************************************************************
```

AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
                             run_script_path FAILED
================================================================================
Root Cause:
[0]:
  time: 2021-09-14_21:49:25
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1014681)
  error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
  msg: Traceback (most recent call last):
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
      runpy.run_path(sys.argv[0], run_name="__main__")
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
      return _run_module_code(code, init_globals, run_name,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
      _run_code(code, mod_globals, init_globals,
    File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
      exec(code, run_globals)
    File "/home/kiuk/tmp/test.py", line 55, in <module>
      main()
    File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
      return f(*args, **kwargs)
    File "/home/kiuk/tmp/test.py", line 25, in main
      raise RuntimeError("rasing error since --throw was specified")
  RuntimeError: rasing error since --throw was specified

================================================================================
Other Failures:
  <NO_OTHER_FAILURES>
********************************************************************************
```

Test Plan:
(see summary for before and after)

`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record

def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()

@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")

    if args.exit:
        sys.exit(1)

    init_method=args.init_method
    backend=args.backend

    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")

    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)

    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"successfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")

if __name__ == "__main__":
    main()
```

run it as:

```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```

Reviewed By: cbalioglu

Differential Revision: D30953731

fbshipit-source-id: bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
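
The bug class fixed here is easy to reproduce in plain Python: inside an exception handler, `traceback.format_stack()` describes the handler's own call stack, while `traceback.format_exc()` describes the exception actually being handled.

```python
import traceback

def record_exception_message():
    try:
        raise RuntimeError("boom")
    except RuntimeError:
        # Wrong: the handler's own call stack; carries no exception info.
        handler_stack = "".join(traceback.format_stack())
        # Right: the traceback of the exception being handled.
        exc_traceback = traceback.format_exc()
        return handler_stack, exc_traceback
```

Only the second string contains the `RuntimeError` and the frame that raised it, which is what the "AFTER" output above shows.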

2 years ago Remove `run_functional_checks` from `test_autograd` and create necessary OpInfos...
soulitzer [Wed, 15 Sep 2021 19:43:54 +0000 (12:43 -0700)]
Remove `run_functional_checks` from `test_autograd` and create necessary OpInfos (#64993)

Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261

 - Eliminate duplicated testing logic in test_autograd
 - Moved tests that rely on this testing logic to use OpInfos
   - `cat` already has OpInfo (no action needed)
   - Created OpInfo for `block_diag` and `broadcast_tensors`

Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993

Reviewed By: jbschlosser

Differential Revision: D30961736

Pulled By: soulitzer

fbshipit-source-id: e169305384a683acae1178c4e12e9e214a67226a

2 years ago Dispatch.h: Avoid including ivalue (#64165)
Peter Bell [Wed, 15 Sep 2021 19:15:01 +0000 (12:15 -0700)]
Dispatch.h: Avoid including ivalue (#64165)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165

Test Plan: Imported from OSS

Reviewed By: gchanan

Differential Revision: D30728587

Pulled By: ezyang

fbshipit-source-id: d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4

2 years ago To add state_dict and load_state_dict to SequentialLR (#65035)
Ilqar Ramazanli [Wed, 15 Sep 2021 18:55:53 +0000 (11:55 -0700)]
To add state_dict and load_state_dict to SequentialLR (#65035)

Summary:
To add state_dict() and load_state_dict() methods to SequentialLR

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035

Reviewed By: prabhat00155, nateanl

Differential Revision: D30958204

Pulled By: datumbox

fbshipit-source-id: 65114e1b07146526ae2680233f5cd42b2534d67a

2 years ago [CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)
Nikita Shulga [Wed, 15 Sep 2021 18:52:06 +0000 (11:52 -0700)]
[CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)

Summary:
As all of them has been migrated to GHA:
- pytorch_linux_pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)"

"pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071

Reviewed By: driazati, seemethere, janeyx99

Differential Revision: D30963413

Pulled By: malfet

fbshipit-source-id: d9a5188ce7eb2f60547b91b854a5db83af2b10e7

2 years ago Starter Task 1 (#64927)
Samuel Salas [Wed, 15 Sep 2021 18:49:28 +0000 (11:49 -0700)]
Starter Task 1 (#64927)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927

Mypy error corrections

Test Plan: Corrected mypy errors to make the code less prone to bugs by modifying types or adding lines that avoid special undesired cases, e.g. asserting that a variable is not None.

Reviewed By: wushirong

Differential Revision: D30901654

fbshipit-source-id: daae8692603b8b38203a98f673c455749c2fb855

2 years ago [ROCm] Update CI images for ROCm 4.3.1 (#64610)
Kyle Chen [Wed, 15 Sep 2021 18:48:33 +0000 (11:48 -0700)]
[ROCm] Update CI images for ROCm 4.3.1 (#64610)

Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>
reference:
https://github.com/pytorch/pytorch/issues/58017

jithunnair-amd
jeffdaily
arindamroy-eng

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610

Reviewed By: seemethere

Differential Revision: D30964582

Pulled By: malfet

fbshipit-source-id: a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa

2 years ago Port `all` and `any` full reductions to structured kernels. (#64642)
Yukio Siraichi [Wed, 15 Sep 2021 18:05:14 +0000 (11:05 -0700)]
Port `all` and `any` full reductions to structured kernels. (#64642)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642

Tracking issue: #55070

This PR creates out overloads for both `all` and `any` kernels (full reduction overload),
and ports them to structured kernels.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867354

Pulled By: ezyang

fbshipit-source-id: 46bccaf6c94a09ed77cc6c724d1183c82f801751

2 years ago [PyTorch] remove string_view::operator[] bounds check (#64670)
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch] remove string_view::operator[] bounds check (#64670)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670

Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
ghstack-source-id: 138037531

Test Plan: CI

Reviewed By: ezyang, bhosmer

Differential Revision: D30747515

fbshipit-source-id: 1f4374415a82dfdccce76ea2c6885c13cb93d369

2 years ago [PyTorch][easy] Add cbegin/cend to SmallVector (#64682)
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch][easy] Add cbegin/cend to SmallVector (#64682)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682

Looks like it was forked from llvm before cbegin and cend existed.
ghstack-source-id: 138036981

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30814434

fbshipit-source-id: 9740fa8d3df1c90b77298a95ab9f1d0cf8c90320

2 years ago [PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678

We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that.
ghstack-source-id: 138036978

Test Plan: CI

Reviewed By: dhruvbird, tugsbayasgalan

Differential Revision: D30813785

fbshipit-source-id: c67e0100cdef5d894282939fb6d39a57309bc240

2 years ago[quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)
Zafar Takhirov [Wed, 15 Sep 2021 16:37:36 +0000 (09:37 -0700)]
[quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910

This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports.
ghstack-source-id: 138112663

Test Plan: `buck test mode/dev //caffe2/test:quantization`

Reviewed By: vkuzo

Differential Revision: D30882924

fbshipit-source-id: 8652a334a5186c635761ea5e50f978d1f1078c12

2 years ago[Static Runtime] Check if outputs of a node do not overlap with each other (#63013)
Don Jang [Wed, 15 Sep 2021 15:35:57 +0000 (08:35 -0700)]
[Static Runtime] Check if outputs of a node do not overlap with each other (#63013)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013

This change enhances the current memory-overlap check to include outputs: it enforces the constraint that all outputs of a node must NOT overlap with each other, since the node writes all of its outputs at the same time.

This check will detect a problem like T97393697 immediately in debug mode.
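Conceptually the new assertion is a pairwise check that no two output memory ranges intersect. A minimal sketch of that invariant, with hypothetical `(start, end)` byte ranges standing in for tensor storage:

```python
def ranges_overlap(a, b):
    # half-open intervals [start, end) overlap iff each starts before the other ends
    (a0, a1), (b0, b1) = a, b
    return a0 < b1 and b0 < a1

def outputs_disjoint(outs):
    # the invariant the check enforces: no pair of outputs may overlap
    return all(not ranges_overlap(outs[i], outs[j])
               for i in range(len(outs))
               for j in range(i + 1, len(outs)))

print(outputs_disjoint([(0, 16), (16, 32)]))  # True: ranges only touch
print(outputs_disjoint([(0, 16), (8, 24)]))   # False: bytes 8..16 are shared
```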

Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`

- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.

Reviewed By: hlu1

Differential Revision: D30211705

fbshipit-source-id: 994d8dace2422e2498e504eb61452a55739238c0

2 years agoForward fix SkipInfo missing mypy (#65063)
Jane Xu [Wed, 15 Sep 2021 15:28:00 +0000 (08:28 -0700)]
Forward fix SkipInfo missing mypy (#65063)

Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063

Reviewed By: malfet

Differential Revision: D30961556

Pulled By: janeyx99

fbshipit-source-id: 9618e12ba873fb48fe5c846a48d4560ad521eb3e

2 years agoWhen test set_affinity, don't hardcode the CPU ID (#65042)
Hong Xu [Wed, 15 Sep 2021 15:09:49 +0000 (08:09 -0700)]
When test set_affinity, don't hardcode the CPU ID (#65042)

Summary:
The set_affinity test always fails when the number of CPUs is smaller
than 3. Changed the test to pick the CPU ID dynamically based on the
number of CPUs on the system.
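The fix amounts to deriving the target CPU ID from the machine instead of hardcoding one; a standard-library sketch of the idea (the real test exercises PyTorch's affinity API):

```python
import os

num_cpus = os.cpu_count() or 1
# pick a CPU that actually exists instead of a fixed ID like 3
target_cpu = min(2, num_cpus - 1)
assert 0 <= target_cpu < num_cpus
print(f"pinning to CPU {target_cpu} of {num_cpus}")
```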

Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042

Reviewed By: jbschlosser

Differential Revision: D30960554

Pulled By: ejguan

fbshipit-source-id: 55ac12714b4b0964b48c3617b79a7a345d40ebce

2 years ago[DataPipe] Make TarArchiveReader and ZipArchiveReader accept FileStream with attempt...
Kevin Tse [Wed, 15 Sep 2021 14:32:45 +0000 (07:32 -0700)]
[DataPipe] Make TarArchiveReader and ZipArchiveReader accept FileStream with attempt to close and additional warning (#64788)

Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).

This PR removes TarArchiveReader's dependency on FileLoader DataPipe, by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.

The whole stack fixes https://github.com/pytorch/pytorch/issues/64281 - issues related to unclosed buffer stream.
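The unclosed-stream problem this stack addresses can be illustrated with plain `tarfile`: every `extractfile` handle holds a reference into the archive's buffer until it is closed. A standard-library sketch of reading an archive while closing each member stream (independent of the DataPipe API):

```python
import io
import tarfile

# build a tiny in-memory tar archive standing in for a dataset shard
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"hello"
    info = tarfile.TarInfo(name="a.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

# read it back, explicitly closing every member stream
with tarfile.open(fileobj=buf, mode="r") as tar:
    for member in tar:
        stream = tar.extractfile(member)
        try:
            contents = stream.read()
        finally:
            stream.close()  # closing avoids the unclosed-buffer issue described above

print(contents)  # b'hello'
```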

Stack:
* __->__ https://github.com/pytorch/pytorch/issues/64788
* https://github.com/pytorch/pytorch/issues/64786

cc VitalyFedyunin ejguan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64788

Reviewed By: jbschlosser, ejguan

Differential Revision: D30901176

Pulled By: NivekT

fbshipit-source-id: 59746a8d0144fc6d3ce0feb2d76445b82e6d414e

2 years agoadd `OpInfo` for `torch.nn.functional.dropout` (#62315)
Philip Meier [Wed, 15 Sep 2021 14:16:29 +0000 (07:16 -0700)]
add `OpInfo` for `torch.nn.functional.dropout` (#62315)

Summary:
Addresses facebookresearch/functorch#78.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62315

Reviewed By: mruberry

Differential Revision: D30932765

Pulled By: zou3519

fbshipit-source-id: 481c67b59a966b4d640973d252b3e392d8db728e

2 years ago[dnnlowp] reduce num of test cases to avoid time out (#64935)
Jongsoo Park [Wed, 15 Sep 2021 04:30:45 +0000 (21:30 -0700)]
[dnnlowp] reduce num of test cases to avoid time out (#64935)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64935

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D30889157

fbshipit-source-id: 316c808806b084bd2e44c56e1cdb61adf2369a9d

2 years agoGeneric test parametrization functionality (#60753)
Joel Schlosser [Wed, 15 Sep 2021 02:51:32 +0000 (19:51 -0700)]
Generic test parametrization functionality (#60753)

Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)

class TestBlah(TestCase):
    @parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    @parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    @parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [1, 2, 3])
    @parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    @parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    @parametrize("x", [1, 2])
    @parametrize("y", [3, 4])
    @parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x", [1, 2], name_fn=str)
    @parametrize("y", [3, 4], name_fn=str)
    @parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)

class TestDeviceBlah(TestCase):
    @parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    @parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)

instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65

2 years ago[vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)
Sangbaek Park [Wed, 15 Sep 2021 02:33:27 +0000 (19:33 -0700)]
[vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64988

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64968

The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.

1. Use `volk` with `link_style="static"` only on Windows. Use `vulkan_wrapper` for all others (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path

Test Plan:
For Android:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```

On Local OSS repo with `pr/64988` branch:

The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.

Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`

Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`

```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (51 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (60 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (22 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (92 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (15 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (1 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (32 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (46 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN      ] VulkanAPITest.hardshrink
[       OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (83 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (17 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (20 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (12 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (28 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (43 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (73 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (28 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (19 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (63 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (28 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (43 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)

[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[  PASSED  ] 68 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.log_softmax

 1 FAILED TEST
```

Differential Revision: D30925995

fbshipit-source-id: 1b1b7f7f22090064424a5379d2f0559d0da7846a

2 years ago[fix] don't expose unique_dim in torch (#63080)
Kshiteej K [Wed, 15 Sep 2021 01:17:53 +0000 (18:17 -0700)]
[fix] don't expose unique_dim in torch (#63080)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793

This is mostly a quick fix. I think the more correct fix could be updating `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like the following. Tested this locally: it fails currently but passes with the fix.
```python
        def test_wildcard_import(self):
            exec('from torch import *')

```
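The mechanism behind the fix is Python's `__all__`: a wildcard import only picks up the names a module exports. A self-contained sketch with a throwaway module (not the actual torch change):

```python
import sys
import types

# build a throwaway module with a public name and an internal helper
mod = types.ModuleType("fake_torch")
mod.unique = lambda: "public"
mod.unique_dim = lambda: "internal helper"
mod.__all__ = ["unique"]  # unique_dim is hidden from `import *`
sys.modules["fake_torch"] = mod

ns = {}
exec("from fake_torch import *", ns)
print("unique" in ns, "unique_dim" in ns)  # True False
```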

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e

2 years ago[CUDA graphs] moves memory sharing intro paragraph (#64996)
Michael Carilli [Wed, 15 Sep 2021 00:52:27 +0000 (17:52 -0700)]
[CUDA graphs] moves memory sharing intro paragraph (#64996)

Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2

2 years agoRevert D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfun...
Supriya Rao [Wed, 15 Sep 2021 00:32:15 +0000 (17:32 -0700)]
Revert D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo

Test Plan: revert-hammer

Differential Revision:
D30558877 (https://github.com/pytorch/pytorch/commit/382e008fbf5cc91c283fc902bb0dd6cb7d4bbfda)

Original commit changeset: 3e62ff24a935

fbshipit-source-id: 3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf

2 years ago[Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)
Yi Wang [Tue, 14 Sep 2021 23:35:32 +0000 (16:35 -0700)]
[Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885

1) The constructor accepts a local optimizer instance instead of the local optimizer's constructor inputs and class type.
2) The parameters are read from the local optimizer's `param_groups` instead of from a separate input.

Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id: 137865867
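The shape of the change is an optimizer wrapper that takes an already-constructed local optimizer and reuses its `param_groups`; a standalone sketch with stand-in classes (not the real `torch.distributed.optim` code):

```python
class FakeSGD:
    """Stand-in for a local optimizer such as torch.optim.SGD."""
    def __init__(self, params, lr):
        self.param_groups = [{"params": list(params), "lr": lr}]
        self.steps = 0
    def step(self):
        self.steps += 1

class PostLocalSGDOptimizerSketch:
    """New-style API: wrap an optimizer instance instead of (class, ctor args)."""
    def __init__(self, optim):
        self.optim = optim
        self.param_groups = optim.param_groups  # read, not passed separately
    def step(self):
        self.optim.step()
        # model averaging would run here after the local step

local = FakeSGD(params=[0.0, 1.0], lr=0.1)
wrapper = PostLocalSGDOptimizerSketch(local)
wrapper.step()
print(local.steps, wrapper.param_groups[0]["lr"])  # 1 0.1
```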

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30888794

fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2

2 years agoPorted std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)
Heitor Schueroff [Tue, 14 Sep 2021 23:16:47 +0000 (16:16 -0700)]
Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63978

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30558877

Pulled By: heitorschueroff

fbshipit-source-id: 3e62ff24a935784fc93a76a0f46a1deb060ba680

2 years ago[DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)
Erjia Guan [Tue, 14 Sep 2021 22:44:57 +0000 (15:44 -0700)]
[DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64951

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30910035

Pulled By: ejguan

fbshipit-source-id: d687fe10939920a3617a60552fe743e8526438a0

2 years ago[quant][tensorrt] Add tensorrt backend config (#64623)
Jerry Zhang [Tue, 14 Sep 2021 22:26:03 +0000 (15:26 -0700)]
[quant][tensorrt] Add tensorrt backend config (#64623)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623

The config api will change, but we'll add configs gradually for TensorRT to unblock experimentation

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2

2 years ago[PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64277

Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
ghstack-source-id: 137978947

Test Plan: existing CI

Reviewed By: dhruvbird

Differential Revision: D30647666

fbshipit-source-id: c0f4f06c348d36882ec0db802be44d8c7749562f

2 years ago[PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64110

As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
ghstack-source-id: 137978946
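Interning is what turns string comparison into pointer comparison; the same idea in Python, where `sys.intern` maps equal strings to one object (illustrative only, the diff works on pickled bytecode strings):

```python
import sys

# strings built at runtime are distinct objects...
parts = ["aten::", "add"]
a = "".join(parts)
b = "".join(parts)
assert a == b and a is not b        # equal contents, two allocations

# ...but interning maps equal strings to one object, so identity suffices
ia, ib = sys.intern(a), sys.intern(b)
print(ia is ib)  # True: pointer comparison replaces strcmp
```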

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/484557404703261

new mean is 292 ms, down from 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30615052

fbshipit-source-id: 9707625e778388a7920ab72704d71ad57ddaac17

2 years ago[PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63993

This seems to be unused, and it's pretty scary.
ghstack-source-id: 137978949

Test Plan: CI

Reviewed By: lw

Differential Revision: D30560441

fbshipit-source-id: 08b7ce971fd1e2dbeddbf37b02413fef513b4753

2 years ago[PyTorch] Fix SourceRangeDeserializer vector copy (#64031)
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Fix SourceRangeDeserializer vector copy (#64031)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64031

More copies of tuple elements.
ghstack-source-id: 137978948

Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/232361457767293

Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.

Reviewed By: raziel

Differential Revision: D30559545

fbshipit-source-id: e5343abae96b8e80e0ccec482ad316884ae231ea

2 years ago[fx2trt] fix elementwise op converter with one operand being a literal and has differ...
Shiyan Deng [Tue, 14 Sep 2021 19:25:45 +0000 (12:25 -0700)]
[fx2trt] fix elementwise op converter with one operand being a literal and has different type (#65004)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65004

If we have some code like `torch.add(x, 1)` and x is a float tensor, then conversion would fall apart because currently we add a constant layer of int32 dtype for `1`, but we actually need float dtype.

This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer we create.

Also, start to add docstrings for functions.
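A minimal sketch of the dtype-matching rule (dtype strings and function names here are illustrative, not the fx2trt API): the constant built for a literal follows the tensor operand's dtype rather than the literal's Python type:

```python
def add_with_literal(values, tensor_dtype, literal):
    # torch.add(float_tensor, 1) must build a float32 constant, not int32,
    # so cast the literal to match the tensor operand's dtype
    cast = float if tensor_dtype == "float32" else int
    const = cast(literal)
    return [v + const for v in values]

print(add_with_literal([0.5, 1.5], "float32", 1))  # [1.5, 2.5]
```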

Reviewed By: yinghai

Differential Revision: D30852156

fbshipit-source-id: 650ce72d2794093a4616e640ea503dcc1c6b2bc4

2 years ago[PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level...
Salil Desai [Tue, 14 Sep 2021 19:09:45 +0000 (12:09 -0700)]
[PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269

Revert changes in D29826210 (https://github.com/pytorch/pytorch/commit/693d8f2f0767413bb995b895fccad87dfd4f05a7) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (https://github.com/pytorch/pytorch/commit/693d8f2f0767413bb995b895fccad87dfd4f05a7))**

|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403)|

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c

2 years ago[PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level...
Salil Desai [Tue, 14 Sep 2021 19:09:45 +0000 (12:09 -0700)]
[PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [1/2] (#64268)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64268

If the same pair of operator name and number of inputs has been used to add an instruction to the operator table previously (and the operator's schema is not vararg), use the same index as that instruction rather than creating a new one.
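
The de-dup rule above can be sketched as follows; the function and cache names are illustrative, not the actual serializer code:

```python
def emit_operator(op_table, cache, name, num_inputs, is_vararg):
    """Hypothetical sketch of the de-dup: reuse an existing operator-table
    index when the same (name, num_inputs) pair was emitted before and the
    operator's schema is not vararg."""
    key = (name, num_inputs)
    if not is_vararg and key in cache:
        return cache[key]            # reuse the earlier instruction's index
    op_table.append(key)             # otherwise append a new table entry
    index = len(op_table) - 1
    if not is_vararg:
        cache[key] = index
    return index

table, cache = [], {}
first = emit_operator(table, cache, "aten::add", 2, False)
second = emit_operator(table, cache, "aten::add", 2, False)  # same index

```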
ghstack-source-id: 138014905

Test Plan: Phabricator tests, and test performance changes in next diff

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D30615434

fbshipit-source-id: f442f557f12412693a73004ce44733ccef063b82

2 years ago.github: Add render test results step (#64937)
Eli Uriegas [Tue, 14 Sep 2021 18:20:51 +0000 (11:20 -0700)]
.github: Add render test results step (#64937)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64937

Adds CLI output for rendered test results to go alongside test execution; users should be able to quickly diagnose test failures like so:
![fdsfdsfdsfdsf](https://user-images.githubusercontent.com/1700823/133156245-ba939cbf-8aa2-47a7-b1fb-7cc876ca75c4.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30917897

Pulled By: seemethere

fbshipit-source-id: f51ea499462e3cfd64496cb711b84a93971c91bd

2 years agoremove SkipInfo class (#64972)
Natalia Gimelshein [Tue, 14 Sep 2021 18:19:07 +0000 (11:19 -0700)]
remove SkipInfo class (#64972)

Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64972

Reviewed By: mruberry

Differential Revision: D30924598

Pulled By: ngimel

fbshipit-source-id: 1ac1ec8fd50ca27e3cd36c12a588d334e7466899

2 years ago[PyTorch] Don't store multiple kernels per key on mobile (#64447)
Scott Wolchok [Tue, 14 Sep 2021 17:35:04 +0000 (10:35 -0700)]
[PyTorch] Don't store multiple kernels per key on mobile (#64447)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64447

As the code comment says, we needn't worry about Jupyter notebooks on mobile.
ghstack-source-id: 137951718

Test Plan: Profiled startup of //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark on devserver with -niter 0 -nrep 0 and `C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY` defined. Time spent in sherwood_v3_table lookups went way down.

Reviewed By: ezyang, bhosmer

Differential Revision: D30736094

fbshipit-source-id: bcc22cd0d9adceba259a03898c992759d501fe89

2 years ago[fx const fold] fix some cases with deep model hierarchy (#64945)
Shiyan Deng [Tue, 14 Sep 2021 16:41:57 +0000 (09:41 -0700)]
[fx const fold] fix some cases with deep model hierarchy (#64945)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64945

In the const folding pass, we try to create `get_attr` nodes in submod_1 for `get_attr` nodes that are in the main graph, but we don't have the real attributes in submod_1. To fix this, we assign the main module as the owning module of submod_1's graph.

The fix above would cause problems for `call_module` nodes in submod_1, because during splitting, modules get inlined into submod_1 (target changed from "mod.a.b" to "mod_a_b"). Changing the owning module would make those `call_module` nodes unable to find the module they refer to. To fix this, we set the target to point at the main module.
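
The target inlining mentioned above amounts to a simple rename; this one-liner is an illustration inferred from the diff text, not the actual splitter code:

```python
def inline_target(target):
    # Assumption from the diff text: during module splitting, a call_module
    # target such as "mod.a.b" is inlined to "mod_a_b" in the submodule,
    # i.e. the dotted attribute path is flattened into a single name.
    return target.replace(".", "_")
```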

Reviewed By: jfix71

Differential Revision: D30905949

fbshipit-source-id: cd67bc8fe4b8ad4344ae97b8e36753fdce3ece6d

2 years ago[Model Averaging] Revert #63895 (#64903)
Yi Wang [Tue, 14 Sep 2021 16:41:13 +0000 (09:41 -0700)]
[Model Averaging] Revert #63895 (#64903)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64903

Fix the accuracy regression caused by https://github.com/pytorch/pytorch/pull/63895.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30894688

fbshipit-source-id: fe00b8b23b860d9f806f87c1b6caba1d0b807485

2 years agoDrop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)
Nick Kreeger [Tue, 14 Sep 2021 16:40:33 +0000 (09:40 -0700)]
Drop incremental linking on Windows with REL_WITH_DEB_INFO=1. (#64892)

Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b

2 years agoDisable target determination for now (#64921)
Nikita Shulga [Tue, 14 Sep 2021 16:38:34 +0000 (09:38 -0700)]
Disable target determination for now (#64921)

Summary:
There were several reports of target determinator incorrectly skipping
tests, most recent one is https://github.com/pytorch/pytorch/issues/64902

Let's disable it until it can be further stabilized

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde

2 years agoprint_test_stats.py: dedup test report upload name with TEST_CONFIG (#64948)
Jane (Yuan) Xu [Tue, 14 Sep 2021 15:59:15 +0000 (08:59 -0700)]
print_test_stats.py: dedup test report upload name with TEST_CONFIG (#64948)

Summary:
Connected with issue https://github.com/pytorch/pytorch/issues/64845, takeover of https://github.com/pytorch/pytorch/issues/64091

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64948

Reviewed By: malfet, seemethere

Differential Revision: D30908592

Pulled By: janeyx99

fbshipit-source-id: dc31b0bbc9f4e35d23412aa14acbbab7422b4146

2 years agoMake {select,slice,diagonal}_backward primitives wrt autograd (#64933)
Richard Zou [Tue, 14 Sep 2021 15:07:01 +0000 (08:07 -0700)]
Make {select,slice,diagonal}_backward primitives wrt autograd (#64933)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64933

Fixes https://github.com/facebookresearch/functorch/issues/108

This is a short-term fix. A longer-term fix would be to either:
1. have proper {select,slice,diagonal}_embed functions
2. have efficient {select,slice,diagonal}_scatter functions (and
efficient zero tensors).

NB: I didn't use diag_embed because diag_embed is slightly different
from diagonal_backward.

There are no BC concerns because TorchScript (luckily) does not
serialize the backwards graph.

Test Plan:
- run tests
- run benchmarks.
https://gist.github.com/zou3519/e7c0774d1ac97f32aa02ec44d81e60e1.
Surprisingly the instruction count goes down. This is probably because
we create fewer autograd nodes now.

Reviewed By: ezyang

Differential Revision: D30909333

Pulled By: zou3519

fbshipit-source-id: 3b33e13010ba13b4d487b346aa9bee8a0e8c378c

2 years agoReplace composite dispatch with `CompositeExplicitAutograd` (#64641)
Yukio Siraichi [Tue, 14 Sep 2021 14:55:13 +0000 (07:55 -0700)]
Replace composite dispatch with `CompositeExplicitAutograd` (#64641)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64641

`sum`, `mean`, and `norm` were ported to structured kernels in #61642, #61643, and #62711,
respectively. Those PRs changed related overloads into composite kernels. However, their
dispatch section remained the same, when they really should be marked as
`CompositeExplicitAutograd`. This PR fixes this issue.

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867122

Pulled By: ezyang

fbshipit-source-id: b951aee41a3cab9ca546df826a285d60013e3b3a

2 years agoRevert D30711934: [pytorch][PR] Use RDS for build size tracking
Edward Yang [Tue, 14 Sep 2021 13:07:52 +0000 (06:07 -0700)]
Revert D30711934: [pytorch][PR] Use RDS for build size tracking

Test Plan: revert-hammer

Differential Revision:
D30711934 (https://github.com/pytorch/pytorch/commit/1cd0252eed8ddb26e4599ef2b0fec4d8843b8828)

Original commit changeset: 0af808ddf528

fbshipit-source-id: 6f67ed5cbaf333cc55729be2a23e385772e31b10

2 years ago[TensorExpr] Remove 'Placeholder' class. (#64887)
Mikhail Zolotukhin [Tue, 14 Sep 2021 07:19:57 +0000 (00:19 -0700)]
[TensorExpr] Remove 'Placeholder' class. (#64887)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64887

BufHandle has exactly the same functionality and should be used instead.

Differential Revision:
D30889483

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 365fe8e396731b88920535a3de96bd3301aaa3f3

2 years ago[TensorExpr] PyBinds: improve QoL of pybind users. (#64886)
Mikhail Zolotukhin [Tue, 14 Sep 2021 07:19:57 +0000 (00:19 -0700)]
[TensorExpr] PyBinds: improve QoL of pybind users. (#64886)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64886

Bind methods for implicit conversions and constructors to avoid
boilerplate code.

Differential Revision:
D30889193

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Pulled By: ZolotukhinM

fbshipit-source-id: 137c0c98f7f1576e1bb97c8de8a900b28407a30e

2 years agoFix use of deprecated tensor.type() in SegmentReduce.cpp (#64151)
Peter Bell [Tue, 14 Sep 2021 06:15:10 +0000 (23:15 -0700)]
Fix use of deprecated tensor.type() in SegmentReduce.cpp (#64151)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64151

Reviewed By: mruberry

Differential Revision: D30917268

Pulled By: ngimel

fbshipit-source-id: 63427372b651ac495d48ef552eba5fbf0e4378e9

2 years ago[quant] handle empty input in fused_moving_avg_obs_fake_quant op (#64829)
Supriya Rao [Tue, 14 Sep 2021 05:21:05 +0000 (22:21 -0700)]
[quant] handle empty input in fused_moving_avg_obs_fake_quant op (#64829)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64829

If an empty input is passed in, the aminmax operator fails with a runtime error like
```
RuntimeError: aminmax(): cannot compute aminmax over an empty dimension as the operation has no identity.
```

To avoid this during training, we just return the input if we find it to be empty.
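
The guard can be sketched in plain Python (a list stands in for a tensor and a dict for the observer state; the function name and shape of the state are assumptions, not the real op):

```python
def fused_obs_fake_quant(values, observer):
    """Hypothetical sketch of the guard: min/max-style reductions such as
    aminmax have no identity element, so they raise on empty input; return
    the input unchanged instead of trying to observe it."""
    if len(values) == 0:
        return values  # nothing to observe; skip the running min/max update
    observer["min"] = min(observer.get("min", float("inf")), min(values))
    observer["max"] = max(observer.get("max", float("-inf")), max(values))
    return values
```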

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuant

Imported from OSS

Reviewed By: jingsh

Differential Revision: D30870879

fbshipit-source-id: 0cb4b187449a45a37150a77510d2292f93a7d1cd

2 years agoAdd forward AD for torch.linalg.eigh (#62163)
Ivan Yashchuk [Tue, 14 Sep 2021 04:13:56 +0000 (21:13 -0700)]
Add forward AD for torch.linalg.eigh (#62163)

Summary:
This PR adds forward mode differentiation for `torch.linalg.eigh` and a few other functions required for tests to pass.

For some reason running tests for `torch.linalg.eigvalsh` and complex `torch.linalg.eigh` hangs. These tests are skipped for now.

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry heitorschueroff walterddr IvanYashchuk xwang233

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62163

Reviewed By: jbschlosser

Differential Revision: D30903988

Pulled By: albanD

fbshipit-source-id: d6a74adb9e6d2f4be8ac707848ecabf06d629823

2 years ago[THC] remove TensorTypeUtils and TensorInfo (#64965)
Natalia Gimelshein [Tue, 14 Sep 2021 03:34:57 +0000 (20:34 -0700)]
[THC] remove TensorTypeUtils and TensorInfo (#64965)

Summary:
per title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64965

Reviewed By: mruberry

Differential Revision: D30916754

Pulled By: ngimel

fbshipit-source-id: b24020d6a7ce8a05a5ab6c579d176dd94dd3b1d7

2 years agoEmbeddingBag sort thrust->cub (#64498)
Xiang Gao [Tue, 14 Sep 2021 02:49:33 +0000 (19:49 -0700)]
EmbeddingBag sort thrust->cub (#64498)

Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/57505

Also fixes a warning I found when compiling:
```
/home/gaoxiang/pytorch-cub/torch/csrc/distributed/c10d/quantization/quantization_gpu.cu(7): warning: inline qualifier ignored for "__global__" function
```
I also updated the bfloat16 guard to CUDA 11.5

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64498

Reviewed By: mruberry

Differential Revision: D30917077

Pulled By: ngimel

fbshipit-source-id: fb9df08fd469038478a563014b5af7452b4b28c0

2 years agoSpeed up torch.unique_consecutive() (#64835)
Chiang, Yu-Hsun (oToToT) [Tue, 14 Sep 2021 01:59:13 +0000 (18:59 -0700)]
Speed up torch.unique_consecutive() (#64835)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62690

Following the way `unique_consecutive_cpu_template` is implemented, this PR reimplements `_unique_dim_cpu_impl` to get better performance.
Also, because the overhead of `unique_dim_consecutive_cpu` is quite large, we directly call `unique_consecutive_cpu_template` when we know the given input is a 1-D array.
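
For reference, the 1-D semantics being optimized are a single pass over runs of equal adjacent elements. This is a plain-Python illustration of what `unique_consecutive` computes, not the PR's C++ implementation:

```python
def unique_consecutive_1d(seq):
    """Collapse consecutive duplicates in one pass, also producing
    inverse indices (run id per element) and per-run counts."""
    uniques, inverse, counts = [], [], []
    for x in seq:
        if not uniques or x != uniques[-1]:
            uniques.append(x)   # a new run starts here
            counts.append(0)
        counts[-1] += 1
        inverse.append(len(uniques) - 1)
    return uniques, inverse, counts
```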

## Benchmark
### Script
```python
import torch
import time

torch.manual_seed(0)
t = torch.randint(500, (10000000, ))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques2, inverse2, counts2 = torch.unique_consecutive(t, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive() time:", end - start)

t = torch.randint(500, (10000000, 2))
t = torch.sort(t)[0]

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=0, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=0) time:", end - start)

start = time.time()
uniques, inverse, counts = torch.unique_consecutive(t, dim=1, return_inverse=True, return_counts=True)
end = time.time()
print("torch.unique_consecutive(dim=1) time:", end - start)
```

### Before
```
torch.unique_consecutive(dim=0) time: 78.64345622062683
torch.unique_consecutive() time: 0.029544353485107422
torch.unique_consecutive(dim=0) time: 91.49796152114868
torch.unique_consecutive(dim=1) time: 0.30872368812561035
```

### After
```
torch.unique_consecutive(dim=0) time: 0.08256125450134277
torch.unique_consecutive() time: 0.08162403106689453
torch.unique_consecutive(dim=0) time: 35.58408498764038
torch.unique_consecutive(dim=1) time: 1.6258199214935303
```

## System Information
```
Collecting environment information...
PyTorch version: 1.10.0a0+git7f1932e
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.11.0-34-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.0a0+gitbe09195
[conda] Could not collect
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64835

Reviewed By: jbschlosser

Differential Revision: D30894906

Pulled By: ngimel

fbshipit-source-id: 42ab76d638391ce6c4e589d9c71bdf7579310ad9

2 years ago[WIP] Example of DataPipes and DataFrames integration (#60840)
Vitaly Fedyunin [Tue, 14 Sep 2021 01:48:48 +0000 (18:48 -0700)]
[WIP] Example of DataPipes and DataFrames integration (#60840)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/60840

Test Plan: Imported from OSS

Reviewed By: wenleix, ejguan

Differential Revision: D29461080

Pulled By: VitalyFedyunin

fbshipit-source-id: 4909394dcd39e97ee49b699fda542b311b7e0d82

2 years agoRe-land Fix test report uploading (#64958)
driazati [Tue, 14 Sep 2021 01:34:40 +0000 (18:34 -0700)]
Re-land Fix test report uploading (#64958)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64958

This is a re-do of #64846 which was missing a path prefix for windows test reports

Test Plan: Imported from OSS

Reviewed By: seemethere

Differential Revision: D30915253

Pulled By: driazati

fbshipit-source-id: d14d0a64d2f8aabc335db9c4d0d2b63512887c66

2 years ago[iOS][OSS][BE] Add Simulator tests for full JIT (#64851)
Tao Xu [Tue, 14 Sep 2021 01:14:32 +0000 (18:14 -0700)]
[iOS][OSS][BE] Add Simulator tests for full JIT (#64851)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64851

ghstack-source-id: 137970229

Test Plan: CircleCI

Reviewed By: hanton, cccclai

Differential Revision: D30877963

fbshipit-source-id: 7bb8ade1959b85c3902ba9dc0660cdac8f558d64

2 years agoadd acc_ops.max, acc_ops.maximum, consolidate acc_ops.min and acc_ops.minimum
Emad El-Haraty [Tue, 14 Sep 2021 00:59:11 +0000 (17:59 -0700)]
add acc_ops.max, acc_ops.maximum, consolidate acc_ops.min and acc_ops.minimum

Summary:
This diff adds `acc_ops.max` and `acc_ops.maximum` support.
It further consolidates the logic for `acc_ops.min` and `acc_ops.minimum` to match the logic for max.

torch.max has three behaviors:
```
1. max(input)
2. max(input, dim, keepdim=False, *, out=None)
3. max(input, other, *, out=None)
```

Likewise, `torch.min` has three identical behaviors.

I've chosen to implement each as an acc_op, then map to the appropriate one.

the third max function is effectively `torch.maximum`, so I've implemented it as that.
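
The three-way mapping can be sketched over plain Python lists (the routing logic is an illustration of the dispatch described above, not the actual acc_ops code; the 1-D `(values, indices)` handling is simplified):

```python
def dispatch_max(input, dim_or_other=None, keepdim=False):
    """Hypothetical sketch of the three behaviors:
      max(input)           -> full reduction to a single value
      max(input, dim, ...) -> reduction along a dim, returns (values, indices)
      max(input, other)    -> elementwise maximum, i.e. torch.maximum
    """
    if isinstance(dim_or_other, list):       # behavior 3: a second operand
        return [max(a, b) for a, b in zip(input, dim_or_other)]
    if isinstance(dim_or_other, int):        # behavior 2: dim reduction (1-D)
        values = max(input)
        return (values, input.index(values))
    return max(input)                        # behavior 1: full reduction
```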

Reviewed By: yinghai, jfix71, 842974287

Differential Revision: D30551464

fbshipit-source-id: 0a2eec10e5185cbf7d9984eec3fd399b23528b2a

2 years agoAdd BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum...
CaoE [Tue, 14 Sep 2021 00:58:20 +0000 (17:58 -0700)]
Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU (#62454)

Summary:
Add BFloat16 support for cross, tril, triu, tril_indices, triu_indices and cumsum operators on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62454

Reviewed By: albanD

Differential Revision: D30845805

Pulled By: heitorschueroff

fbshipit-source-id: f83836862e38109ec929e83567133e9e88096b8b

2 years agoUse RDS for build size tracking (#64303)
David Riazati [Tue, 14 Sep 2021 00:47:18 +0000 (17:47 -0700)]
Use RDS for build size tracking (#64303)

Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).

It also hooks these up for build size tracking (which previously was not working on GHA)

TODO:
* verify output in logs + clean up prints

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303

Reviewed By: malfet, seemethere

Differential Revision: D30711934

Pulled By: driazati

fbshipit-source-id: 0af808ddf528a24875a378caeb1aa9cb0693f802

2 years agoAdd `skipIfTBB` decorator (#64942)
Nikita Shulga [Tue, 14 Sep 2021 00:10:30 +0000 (17:10 -0700)]
Add `skipIfTBB` decorator (#64942)

Summary:
And replace two existing usages in the codebase with it

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64942

Reviewed By: jbschlosser

Differential Revision: D30906382

Pulled By: malfet

fbshipit-source-id: e7f20f53aff734b0379eded361255543dab4fa4b

2 years agoRaise TypeError on assigned grad with wrong type (#64876)
Victor Quach [Mon, 13 Sep 2021 23:39:55 +0000 (16:39 -0700)]
Raise TypeError on assigned grad with wrong type (#64876)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/64813

Raises a TypeError when the value assigned to a grad is not a Tensor or None.

Adds tests.

cc ezyang gchanan

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64876

Reviewed By: anjali411

Differential Revision: D30901678

Pulled By: soulitzer

fbshipit-source-id: dbb3cb5fd0bbac6918e0b2e2f51d340daa43dee0

2 years agokill SkipInfo (#64878)
Natalia Gimelshein [Mon, 13 Sep 2021 23:31:07 +0000 (16:31 -0700)]
kill SkipInfo (#64878)

Summary:
Per offline discussion, replaces SkipInfo with DecorateInfo. SkipInfo class itself is not removed yet to give functorch time to replace its SkipInfos.
cc zou3519

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64878

Reviewed By: mruberry

Differential Revision: D30908052

Pulled By: ngimel

fbshipit-source-id: 5124180b25c6e32517722883b9f3a2b488e3fe20

2 years agoFix TRTOperatorSupport (#64873)
Shirong Wu [Mon, 13 Sep 2021 22:53:20 +0000 (15:53 -0700)]
Fix TRTOperatorSupport (#64873)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64873

Fix TRTOperatorSupport's key naming to match the key generated by `torch.fx.passes.tools_common.get_node_target`. `get_node_target` is used by `splitter_base` to check, by name, whether an operator is supported.

Test Plan:
Print out the supported operator dict and check the names.
Run TRTSplitter with lrm_split_model_generator and verify the split result is correct, with all supported operators printed.
Current split result:
```
Supported node types in the model:
acc_ops.size: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': torch.float32})
acc_ops.getitem: ((), {'input': None})
acc_ops.reshape: ((), {'input': torch.float32})
acc_ops.unsqueeze: ((), {'input': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32})
acc_ops.linear: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.mul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.cat: ((), {})
acc_ops.add: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.add: ((), {'input': torch.float32})
acc_ops.tanh: ((), {'input': torch.float32})
acc_ops.transpose: ((), {'input': torch.float32})
acc_ops.matmul: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.div: ((), {'input': torch.float32, 'other': torch.float32})
acc_ops.squeeze: ((), {'input': torch.float32})
acc_ops.noop: ((), {'input': torch.float32})
acc_ops.layer_norm: ((), {'input': torch.float32, 'weight': torch.float32, 'bias': torch.float32})
acc_ops.permute: ((), {'input': torch.float32})
acc_ops.sigmoid: ((), {'input': torch.float32})
acc_ops.flatten: ((), {'input': torch.float32})
acc_ops.softmax: ((), {'input': torch.float32})
acc_ops.sum: ((), {'input': torch.float32})

Unsupported node types in the model:
torch.ops.fb.pad_sequence_embeddings: ((), {'embeddings': torch.float32, 'offsets': torch.int32})
acc_ops.linalg_norm: ((), {'input': torch
```

Reviewed By: yinghai

Differential Revision: D30884463

fbshipit-source-id: 22442aa6a69cd148ce9bc8be8f62157dd6d19954

2 years agoRevert D30878101: [pytorch][PR] Fix test report uploading
Eli Uriegas [Mon, 13 Sep 2021 22:21:51 +0000 (15:21 -0700)]
Revert D30878101: [pytorch][PR] Fix test report uploading

Test Plan: revert-hammer

Differential Revision:
D30878101 (https://github.com/pytorch/pytorch/commit/fba40bfc1ab45b4410504ec64b585c4df74b6f47)

Original commit changeset: 0730f17fa3f4

fbshipit-source-id: dad89e68b4daf656dd0b592bc9b2758f00af38c6

2 years agotorch.ao migration: fake_quantize.py, phase 1 (#64814)
Vasiliy Kuznetsov [Mon, 13 Sep 2021 22:20:44 +0000 (15:20 -0700)]
torch.ao migration: fake_quantize.py, phase 1 (#64814)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e

2 years ago[PyTorch] Reduce heap allocations in OperatorName::setNamespaceIfNotSet (#64673)
Scott Wolchok [Mon, 13 Sep 2021 21:31:36 +0000 (14:31 -0700)]
[PyTorch] Reduce heap allocations in OperatorName::setNamespaceIfNotSet (#64673)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64673

We are now guaranteed to allocate at most once in this function.
ghstack-source-id: 137786392

Test Plan: Previous diff adds test coverage for this function.

Reviewed By: dhruvbird

Differential Revision: D30813014

fbshipit-source-id: 17d844a1cc8c30574afcc6b0b41b219e62c0b723

2 years ago[PyTorch] Add test for operator_name (#64672)
Scott Wolchok [Mon, 13 Sep 2021 21:31:36 +0000 (14:31 -0700)]
[PyTorch] Add test for operator_name (#64672)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64672

Just a small struct missing test coverage. Next diff changes it.
ghstack-source-id: 137786388

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D30813013

fbshipit-source-id: 05f39494bb9512a71a928bfe6fcfa710016bdf61

2 years agohandle the case in acc_ops.sum when dim == 0, differentiating it from the case when...
Emad El-Haraty [Mon, 13 Sep 2021 21:22:53 +0000 (14:22 -0700)]
handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None (#64869)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64869

handle the case in acc_ops.sum when dim == 0, differentiating it from the case when dim is None

Reviewed By: 842974287

Differential Revision: D30872739

fbshipit-source-id: 2755d3230804a16ef1c9289f804138c6dd7766b3

2 years agofix build error when system cmake3 version >=3.5 but <=3.10 (#64914)
XiaobingSuper [Mon, 13 Sep 2021 20:21:23 +0000 (13:21 -0700)]
fix build error when system cmake3 version >=3.5 but <=3.10 (#64914)

Summary:
For a PyTorch source build using conda, an error is raised at https://github.com/pytorch/pytorch/blob/8535418a06d75025541370cc656a8b6a0330ca0d/CMakeLists.txt#L1 when the CMake version is < 3.10. This can be fixed by upgrading CMake in the conda env, but CentOS also has cmake3, and PyTorch first checks whether cmake3's version is >= 3.5. So if the user's system cmake3 is >= 3.5 but < 3.10, PyTorch will use the system's cmake3, which will hit a build error like:
```
CMake Error at CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.10 or higher is required.  You are running version 3.6.3

-- Configuring incomplete, errors occurred!
```

We need to check that cmake3 is also >= 3.10; if not, then fall back to checking conda's CMake version.
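
The version check itself reduces to a tuple comparison; this is an illustration of the proposed logic (function names are hypothetical, not the build-script code):

```python
def parse_version(v):
    # "3.6.3" -> (3, 6, 3); tuple comparison orders versions correctly,
    # unlike string comparison, where "3.6" sorts after "3.10"
    return tuple(int(p) for p in v.split("."))

def usable_cmake3(version, minimum="3.10.0"):
    # proposed check: only use the system cmake3 if it meets the minimum
    # required by CMakeLists.txt; otherwise fall back to conda's CMake
    return parse_version(version) >= parse_version(minimum)
```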

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64914

Reviewed By: jbschlosser

Differential Revision: D30901673

Pulled By: ezyang

fbshipit-source-id: 064e2c5bc0b9331d6ecd65cd700e5a42c3403790

2 years agoFix test report uploading (#64846)
driazati [Mon, 13 Sep 2021 20:21:09 +0000 (13:21 -0700)]
Fix test report uploading (#64846)

Summary:
Previously we just weren't uploading Windows test report XML files to S3, only to GitHub Actions. This was different from Linux, where we use both (though maybe we can kill the GHA upload in a follow-up PR, since I don't think it's very useful anymore). This factors it all out into a macro so they both do the same thing. It also fixes the naming of uploaded files to include info about the job name (the full config, so files can be matched to the job visually or by the included job id).

See https://hud.pytorch.org/pr/64846 for results

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64846

Reviewed By: seemethere

Differential Revision: D30878101

Pulled By: driazati

fbshipit-source-id: 0730f17fa3f46a32c131f52669084c3103b0e616

2 years agoPin SciPy to 1.6.3 on Mac (take 2) (#64922)
Nikita Shulga [Mon, 13 Sep 2021 19:46:11 +0000 (12:46 -0700)]
Pin SciPy to 1.6.3 on Mac (take 2) (#64922)

Summary:
It's already pinned via the docker install on Linux

`scipy.stats.`[`poisson`|`geom`|`binom`] returns quite different results between the 1.6.x and 1.7+ versions of SciPy, which causes several distributions tests to fail accuracy thresholds

Reland of https://github.com/pytorch/pytorch/pull/64844, but limited to just the Mac platform
A follow-up PR for Windows is coming as well

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64922

Reviewed By: janeyx99

Differential Revision: D30901257

Pulled By: malfet

fbshipit-source-id: 0543e7bae9d3bbeb8b6be7b3ecf605880f97665f

2 years ago[Deploy] Avoid use-after-free during autograd shutdown (#64620)
Don Jang [Mon, 13 Sep 2021 19:41:50 +0000 (12:41 -0700)]
[Deploy] Avoid use-after-free during autograd shutdown (#64620)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64620

The `autograd` extension module's shutdown logic destructs `PyThreadState` via `pybind11::gil_scoped_acquire`, using the RAII pattern.

The problem is that torch.deploy also destructs `PyThreadState` as part of its shutdown process (https://www.internalfb.com/phabricator/paste/view/P456363738), causing a double destruction, i.e. a use-after-free.

This change adds `defined(USE_DEPLOY)` as a special case, alongside the existing special treatment for `IS_PYTHON_3_9_PLUS`, to avoid destruction of `PyThreadState`.

Test Plan: Added `TorchpyTest.Autograd` unittest to ensure that torch.deploy can create multiple instances that use autograd without causing a crash.

Reviewed By: albanD

Differential Revision: D30779080

fbshipit-source-id: 4de3283cc2d394acc9b8141c17cacbfab5eea052

2 years ago[Pytorch Edge] Quantized Ops Dtype Selective (#63680)
Jacob Szwejbka [Mon, 13 Sep 2021 17:54:08 +0000 (10:54 -0700)]
[Pytorch Edge] Quantized Ops Dtype Selective (#63680)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63680

Quantized ops were not covered by DType selectivity. Add the check, and adjust call sites to be constexpr-friendly.

Test Plan: CI (this covers all model unit tests), verified that segmentation (a model that uses some of these quant ops) still works on instagram.

Reviewed By: dhruvbird, raymondethan

Differential Revision: D30457626

fbshipit-source-id: 5ba850d2b53a18558dfbb1cfaa78d8f53b5dbad8