review.tizen.org Git - platform/upstream/pytorch.git/log


Shen Li [Thu, 12 Aug 2021 18:39:31 +0000 (11:39 -0700)]

Revert D30279364: [codemod][lint][fbcode/c*] Enable BLACK by default

Test Plan: revert-hammer

Differential Revision:
D30279364 (https://github.com/pytorch/pytorch/commit/b0043072529b81276a69df29e00555333117646c)

Original commit changeset: c1ed77dfe43a

fbshipit-source-id: eab50857675c51e0088391af06ec0ecb14e2347e


jiej [Thu, 12 Aug 2021 18:03:32 +0000 (11:03 -0700)]

LayerNorm Support in autodiff: (#50467)

Summary:
1. Extended autodiff by adding an entry for layer_norm in the symbolic script; we now use native_layer_norm_backward.
2. Added the backward function `layernorm_double_backward` for `native_layer_norm_backward`, which preserves double backward support for LayerNorm in autodiff/ScriptModule.
3. Added a Python test to verify autodiff on layer_norm with various configurations of optional tensors (verifies the fix in https://github.com/pytorch/pytorch/issues/49430).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50467

Reviewed By: eellison

Differential Revision: D30232864

Pulled By: jansel

fbshipit-source-id: b9c33075386aff96afff7415df9f94388bfb474a

Co-authored-by: Ryan Spring <rspring@nvidia.com>
Co-authored-by: Jie <jiej@nvidia.com>


Zsolt Dollenstein [Thu, 12 Aug 2021 17:56:55 +0000 (10:56 -0700)]

[codemod][lint][fbcode/c*] Enable BLACK by default

Test Plan: manual inspection & sandcastle

Reviewed By: zertosh

Differential Revision: D30279364

fbshipit-source-id: c1ed77dfe43a3bde358f92737cd5535ae5d13c9a


Kushashwa Ravi Shrimali [Thu, 12 Aug 2021 16:45:17 +0000 (09:45 -0700)]

[reland] OpInfo: `adaptive_avg_pool2d` (#62935)

Summary:
This PR is an attempt to reland https://github.com/pytorch/pytorch/pull/62704.

**What has changed?**

The op has non-deterministic behavior, hence an appropriate `gradcheck` wrapper had to be added.

cc: mruberry zou3519 heitorschueroff kshitij12345

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62935

Reviewed By: anjali411

Differential Revision: D30225095

Pulled By: zou3519

fbshipit-source-id: 644873cc21d44b19c8b68f9edff691913778de0e


Rong Rong (AI Infra) [Thu, 12 Aug 2021 15:13:01 +0000 (08:13 -0700)]

[BE] shorten CI name part2 (#63030)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62357
There's no need to specify the cuDNN version, since the recommended cuDNN version already follows from the CUDA version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63030

Reviewed By: zhouzhuojie, driazati

Differential Revision: D30226354

Pulled By: walterddr

fbshipit-source-id: 7e2dc577810e0ce80ee27569c25a814566250ab1


Rohan Varma [Thu, 12 Aug 2021 07:37:30 +0000 (00:37 -0700)]

Skip zero test on windows (#63087)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63087

Test failed unexpectedly on Windows; see
https://github.com/pytorch/pytorch/issues/63086. Skip for now while we
investigate.
ghstack-source-id: 135631811

Test Plan: CI

Reviewed By: ngimel

Differential Revision: D30251300

fbshipit-source-id: 8acb1ea8863c654c171fe989ac24446c321c085d


Peter Bell [Thu, 12 Aug 2021 06:46:12 +0000 (23:46 -0700)]

BatchNorm: Use resize_output and empty, instead of empty_like (#63084)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62967

This lets each of the three implementations choose which memory format
to use for the output, meaning channels_last can be used in more cases.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63084

Reviewed By: saketh-are

Differential Revision: D30255740

Pulled By: ngimel

fbshipit-source-id: 48d42850952ec910b29521a1c4e530eb2b29df5e


Supriya Rao [Thu, 12 Aug 2021 05:05:30 +0000 (22:05 -0700)]

[quant] Make version 1 the default for get_default_qat_qconfig (#63043)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043

In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.

Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attribute for the observer/fake_quant modules.

There may be some numerical differences between the old observer code in observer.py and the new fused module, which was
rewritten in C++/CUDA to perform observe + fake_quantize.

This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization + updated the test for that.

Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30232222

fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b


Pritam Damania [Thu, 12 Aug 2021 04:41:31 +0000 (21:41 -0700)]

Fix sharded tensor tests. (#63054)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63054

1) Ensure these tests are skipped in environments without any GPUs.
2) Add the test to run_test.py
ghstack-source-id: 135595698

Test Plan: waitforbuildbot

Reviewed By: wanchaol

Differential Revision: D30239159

fbshipit-source-id: 21b543ba72e8d10182bc77e7ae1fd34fd4096509


Meghan Lele [Thu, 12 Aug 2021 04:01:28 +0000 (21:01 -0700)]

Port `log_softmax_backward_data` to structured kernel (#62372)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62372

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240242

Pulled By: SplitInfinity

fbshipit-source-id: 67d5e4b1543c2e43675e905ce18ca49c11e33748


Meghan Lele [Thu, 12 Aug 2021 04:01:28 +0000 (21:01 -0700)]

Port `log_softmax` to structured kernel (#57374)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57374

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30240243

Pulled By: SplitInfinity

fbshipit-source-id: de6617c75d16e26d607a884c25b8752b7b561737


zhouzhuojie [Thu, 12 Aug 2021 00:09:02 +0000 (17:09 -0700)]

Add ciflow_ruleset.json generator along with gha ci (#63097)

Summary:
- Add `.github/generated-ciflow-ruleset.json` for ciflow-bot (so that we can generate better comments)
- The lint job also checks git dirty to make sure that the file is always in sync with ciflow configs

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63097

Reviewed By: saketh-are

Differential Revision: D30263278

Pulled By: zhouzhuojie

fbshipit-source-id: bad68105a228e892ba071b29ecfdf433e1038054


Jiewen Tan [Wed, 11 Aug 2021 23:42:34 +0000 (16:42 -0700)]

Improve IMethod::getArgumentNames to deal with empty argument names list (#62947)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62947

This diff improved IMethod::getArgumentNames to deal with empty argument names list.

Test Plan:
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesValidationMode
buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- PyTorchDeployPredictor.GetEmptyArgumentNamesRealMode

Reviewed By: wconstab

Differential Revision: D30179974

fbshipit-source-id: c7aec35c360a73318867c5b77ebfec3affee47e3


Amy He [Wed, 11 Aug 2021 21:24:06 +0000 (14:24 -0700)]

Fix Nnapi backend execute's dangling pointer (#63092)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63092

Bug discovered while testing NNAPI Delegate on SparkAR.
Using
```
c10::IntArrayRef order = {0, 2, 3, 1};
fixed_inputs.push_back(tensorInp.get(i).permute(order).contiguous());
```
results in a garbage value for order in `permute()`.
Moving order inside the call to `permute()` fixes this issue. Problem is seemingly related to https://github.com/pytorch/pytorch/issues/44409, but luckily the solution in this case is simple.

Bug wasn't caught earlier, since regular unit tests weren't affected by the dangling pointer, and address sanitizer NNAPI tests are turned off due to there being a different failure (T95764916).
ghstack-source-id: 135526129

Test Plan:
Run Unit tests: `python test/test_jit.py`

Build and run SparkAR on an Android phone at the top of this diff stack (D30173959): `buck build --show-output arstudioplayer_arm64_debug -c pt.enable_nnapi=1`

Reviewed By: raziel, iseeyuan

Differential Revision: D30237504

fbshipit-source-id: c946d81feefc453b43d9295d8d6f509cafdcec03


Nikita Shulga [Wed, 11 Aug 2021 21:05:55 +0000 (14:05 -0700)]

Fix warnings (#62930)

Summary:
Add `-Wno-writable-strings` (which is clang's flavor of `-Wwrite-strings`) to the list of warnings ignored while compiling torch_python.
Avoid unnecessary copies in range loops.
Fix a number of signed-unsigned comparisons.

Found while building locally on M1

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62930

Reviewed By: albanD

Differential Revision: D30171981

Pulled By: malfet

fbshipit-source-id: 25bd43dab5675f927ca707e32737ed178b04651e


Tao Xu [Wed, 11 Aug 2021 20:28:09 +0000 (13:28 -0700)]

[iOS][GPU] Consolidate array and non-array kernel for upsampling_nearest2d (#63061)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63061

Clean up the redundant shader code for the upsampling nearest kernel.
ghstack-source-id: 135524349

Test Plan:
- `buck test pp-macos`
- Op tests in PyTorchPlayground app

Reviewed By: husthyc

Differential Revision: D30236905

fbshipit-source-id: e1e001b446452b077e6db719b0519c9070f3300b


Richard Barnes [Wed, 11 Aug 2021 20:12:16 +0000 (13:12 -0700)]

irange-ify 13b (#62476)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62476

Test Plan: Sandcastle

Reviewed By: malfet

Differential Revision: D30001445

fbshipit-source-id: 6f4525338c80e9f929695f47f36ca9c72d96a75d


CaoE [Wed, 11 Aug 2021 19:51:28 +0000 (12:51 -0700)]

Add BFloat16 support for unique and unique_consecutive on CPU (#62559)

Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: saketh-are

Differential Revision: D30250675

Pulled By: ngimel

fbshipit-source-id: 26e48f971d87f3b86db237e8ad3a4b74eb3c1def


Alexander Grund [Wed, 11 Aug 2021 19:42:32 +0000 (12:42 -0700)]

Add Github action to upload full source releases (#63022)

Summary:
Those release tarballs include the submodules.
The action runs on every tag and master-branch push, but does not upload anything in those cases.
This makes sure nothing is broken when an actual release happens.

On created releases, the action runs and uploads the tarball.

Fixes https://github.com/pytorch/pytorch/issues/62708

As I don't have access rights here and testing is obviously hard (as a new release needs to be published), I set up a test at https://github.com/Flamefire/pytorch/releases/tag/testtag
See also the run(s) at https://github.com/Flamefire/pytorch/actions/workflows/create_release.yml

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63022

Reviewed By: saketh-are

Differential Revision: D30256253

Pulled By: seemethere

fbshipit-source-id: ab5fe131452de14ae3768b91c221e68c536cb3aa


Xiang Gao [Wed, 11 Aug 2021 19:34:58 +0000 (12:34 -0700)]

Embedding thrust->cub: unique (#63042)

Summary:
Followup of https://github.com/pytorch/pytorch/pull/62495

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63042

Reviewed By: saketh-are

Differential Revision: D30231084

Pulled By: ngimel

fbshipit-source-id: 03b0a88107e8a2aee3570881d81bf2b676f525cd


Howard Cheng [Wed, 11 Aug 2021 19:32:10 +0000 (12:32 -0700)]

[PyTorch] Add flop count for addmm (#61895)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61895

* Add FLOP count for addmm, should be `2*m*n*k`.

Share the same code path for `addmm` and `mm`.
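
The `2*m*n*k` figure can be sketched in a few lines of Python. This is only an illustration, not the profiler's code: the function names are hypothetical, and ignoring the bias addition is an assumption made so that the count matches the formula above.

```python
def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def addmm_flops(mat1_shape, mat2_shape) -> int:
    """Count addmm the same way as mm (hypothetical helper).

    Assumption of this sketch: the bias addition is not counted,
    so the result equals 2*m*n*k.
    """
    m, k = mat1_shape
    k2, n = mat2_shape
    if k != k2:
        raise ValueError(f"inner dimensions differ: {k} vs {k2}")
    return matmul_flops(m, k, n)


# (4 x 3) @ (3 x 5): 2 * 4 * 5 * 3 = 120
print(addmm_flops((4, 3), (3, 5)))
```

Routing both ops through one helper mirrors the shared code path mentioned above.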

Test Plan:
Imported from OSS

`python test/test_profiler.py`
Run a sample profile and check that FLOPS for `aten::addmm` is correct.

`[chowar@devbig053.frc2 ~/local/pytorch/build] ninja bin/test_jit`
`[chowar@devbig053.frc2 ~/local/pytorch/build] ./bin/test_jit --gtest_filter='ComputeFlopsTest*'`

Reviewed By: dskhudia

Differential Revision: D29785671

fbshipit-source-id: d1512036202d7234a981bda897af1f75808ccbfe


Salil Desai [Wed, 11 Aug 2021 18:51:58 +0000 (11:51 -0700)]

XNNPack Input Pointer Caching Comment (#62818)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62818

Added a comment to explain why we no longer need to manually cache pointers/parameters for convolution, as removed in D29777605 (https://github.com/pytorch/pytorch/commit/f5c6c3947e4618d30ebd68a414f1cfcda27bdcd4)

Test Plan: Sandcastle tests (no code changed)

Reviewed By: kimishpatel

Differential Revision: D30113489

fbshipit-source-id: d697f05816acbd367d59a4aced1925303c683d40


rusty1s [Wed, 11 Aug 2021 18:35:53 +0000 (11:35 -0700)]

`_convert_coo_to_csr` CPP and CUDA functionality (#61838)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/57381 and improves https://github.com/pytorch/pytorch/pull/61340 via dedicated `coo_to_csr` functionalities.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61838

Reviewed By: ezyang

Differential Revision: D30132736

Pulled By: cpuhrsch

fbshipit-source-id: a1fd074c0d70366a524d219a620b94f8bed71d7c


Pritam Damania [Wed, 11 Aug 2021 18:22:48 +0000 (11:22 -0700)]

Add a _RemoteDevice structure for ShardedTensor/ShardingSpec. (#62927)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62927

As part of the ShardedTensor work, we realized we do need some sort of
_RemoteDevice structure that deals with our format of "workername/device" so
that users don't have to worry about parsing this string directly.

Right now this structure is just the bare minimum and is mostly a container for
describing a remote device. It is currently only used in ShardedTensor,
ShardingSpec and RemoteModule.

Once we actually have a consolidated remote device proposal, this class can be
extended appropriately if needed.
ghstack-source-id: 135534086
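
A rough sketch of what parsing the "workername/device" format could look like. The class and function names are hypothetical stand-ins, not the real `_RemoteDevice`, and treating a string without a slash as a local device is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class RemoteDeviceSketch(NamedTuple):
    """Hypothetical container for a remote device; not the real _RemoteDevice."""
    worker_name: Optional[str]
    device: str


def parse_remote_device(spec: str) -> RemoteDeviceSketch:
    """Split a 'workername/device' string into its two parts."""
    worker, sep, device = spec.partition("/")
    if not sep:
        # Assumption: no slash means a local device with no worker name.
        return RemoteDeviceSketch(None, spec)
    if not worker or not device:
        raise ValueError(f"malformed remote device string: {spec!r}")
    return RemoteDeviceSketch(worker, device)


print(parse_remote_device("trainer0/cuda:1"))
```

Keeping the parsing in one place means callers work with named fields instead of splitting the string themselves.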

Test Plan:
1) unit tests
2) waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30170689

fbshipit-source-id: 1ac2e81c7a597dc40bf3fbf2c1168c382c66649f


Jacob Szwejbka [Wed, 11 Aug 2021 18:14:25 +0000 (11:14 -0700)]

[Pytorch Edge] Move RuntimeCompatibilityInfo Factory Method (#63005)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63005

Realized I forgot to move the Runtime half of these functions to be within the struct.

Test Plan: ci

Reviewed By: pavithranrao

Differential Revision: D30205521

fbshipit-source-id: ccd87d7d78450dd0dd23ba493bbb9d87be4640a5


Stephen Macke [Wed, 11 Aug 2021 18:09:02 +0000 (11:09 -0700)]

[easy] add an `inplace` argument to MutableNetProto.to_net() and core.Net() constructor (#63068)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63068

The caffe2 core.Net constructor can accept a caffe2_pb2.NetDef proto, but it always creates a copy. This is wasteful when we can prove that the proto being passed to it will not be used anywhere else. So we add an "inplace" argument to the `core.Net` constructor that allows clients to give away ownership of the passed proto without copying. We default this argument to `False`, ensuring that behavior does not change unless explicitly requested.
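
The copy-versus-adopt trade-off can be illustrated with a small stand-in. `NetSketch` is hypothetical and uses a plain dict in place of a NetDef proto; it is not caffe2's `core.Net`.

```python
import copy


class NetSketch:
    """Hypothetical wrapper showing copy-by-default vs. adopting the argument."""

    def __init__(self, proto: dict, inplace: bool = False):
        # Default: deep-copy, so later edits never leak back to the caller.
        # inplace=True: adopt the caller's object and skip the copy.
        self._proto = proto if inplace else copy.deepcopy(proto)

    def add_op(self, op_type: str) -> None:
        self._proto["op"].append(op_type)


shared = {"name": "net", "op": []}

NetSketch(shared).add_op("Relu")                # works on a private copy
len_after_copy = len(shared["op"])              # caller's object untouched

NetSketch(shared, inplace=True).add_op("Relu")  # mutates `shared` itself
len_after_inplace = len(shared["op"])

print(len_after_copy, len_after_inplace)
```

With `inplace=True` the caller gives up ownership: any later use of `shared` observes the wrapper's edits, which is why the default stays `False`.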

Test Plan: Let CI run.

Differential Revision: D29976510

fbshipit-source-id: 26e13ca76f3431b8ef0de51f08bbf263491d323e


zhouzhuojie [Wed, 11 Aug 2021 16:42:15 +0000 (09:42 -0700)]

Fix gha render-test-result mixed failure passthrough (#63056)

Summary:
To fix something like https://github.com/pytorch/pytorch/actions/runs/1114555082

![image](https://user-images.githubusercontent.com/658840/128956528-86997457-5e18-4ae1-83cc-aa7d0ca03c0e.png)

Not sure why `needs.test.result` didn't capture the `failure` case before, so changed it to `needs.test.result != 'skipped' || failure()`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63056

Reviewed By: walterddr, tktrungna

Differential Revision: D30240112

Pulled By: zhouzhuojie

fbshipit-source-id: d159cc3f79ed5d604ae12583736b37ac28e8d87c


Yida Wang [Wed, 11 Aug 2021 16:36:49 +0000 (09:36 -0700)]

Fix issues with printing certain torch modules (#62447)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/54420

When I tested on master with the testing code below, multiple objects tracked by the garbage collector could not be printed.

Testing code:
```
import torch
import gc
import os
import sys

print(torch.__version__)

a = torch.rand(10)

print(a)

objects = gc.get_objects()

for i in range(len(objects)):
    print(objects[i])
```

### 1
```
print(torch.classes)
```

As SplitInfinity mentioned in the GitHub issue, the solution here is to set `__file__` for `torch.classes` to something. Similar to [_ops.py](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L69), where `__file__` is set to `_ops.py`, we could set `__file__` for torch.classes to `_classes.py`.

### 2
```
print(torch._ops.ops.quantized)
print(torch._ops.ops.atan)
```

When we try to print these two modules, it will call `_OpNamespace::__getattr__`, but the `op_name` is `__file__`. This becomes a problem when `torch._C._jit_get_operation(qualified_op_name)` [(link)](https://github.com/pytorch/pytorch/blob/master/torch/_ops.py#L60) tries to look for an actual op on the native C++ side.

Only when we get the attribute for an actual op, e.g. `print(torch._ops.ops.quantized.elu)`, the `op_name` becomes proper (e.g. `elu`).

My current solution is to return a hardcoded string (i.e. “torch.ops”) if `op_name` is `"__file__"`.
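
The special case described above can be sketched with a toy namespace. `OpNamespaceSketch` and its dict-based registry are hypothetical stand-ins for illustration only; the real `_OpNamespace` resolves operators on the native C++ side.

```python
import types


class OpNamespaceSketch(types.ModuleType):
    """Hypothetical stand-in for a lazily resolving op namespace (not torch._ops)."""

    def __init__(self, name, known_ops):
        super().__init__("torch.ops." + name)
        self._known_ops = known_ops  # stands in for the native op registry

    def __getattr__(self, op_name):
        # Reached only when normal attribute lookup fails.
        if op_name == "__file__":
            # Printing a module asks for __file__; answer without an op lookup.
            return "torch.ops"
        try:
            op = self._known_ops[op_name]
        except KeyError:
            raise AttributeError(f"no operator named {op_name!r}") from None
        setattr(self, op_name, op)  # cache so later lookups skip __getattr__
        return op


quantized = OpNamespaceSketch("quantized", {"elu": lambda x: x})
print(quantized.__file__)  # handled by the special case
print(quantized)           # printing the namespace works
```

The point of the special case is that introspection attributes never reach the operator lookup, so only real op names are resolved there.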

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62447

Reviewed By: saketh-are

Differential Revision: D30234654

Pulled By: yidawang-oss

fbshipit-source-id: de43a8f599739c749fb3307eea015cc61f1da60e


Peter Bell [Wed, 11 Aug 2021 15:44:08 +0000 (08:44 -0700)]

Shard python_functions.cpp (#62186)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62186

This file takes 6 minutes on its own to compile and is the limiting factor for
building `libtorch_python` on a 32-core threadripper. This splits the file into
5 shards which take around 50 seconds each to compile.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D29962046

Pulled By: albanD

fbshipit-source-id: df13cfaebd54296f10609f67ae74a850c329bd37


Sze Wai Celeste Yuen [Wed, 11 Aug 2021 15:38:13 +0000 (08:38 -0700)]

Fix inconsistency between Python and JIT power operation (#62842)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62842

Test Plan:
Wrote unit test TestAtenPow to test behavior of aten::pow when:
1. base is int, exponent is int
2. base is int, exponent is float
3. base is float, exponent is int
4. base is float, exponent is float

Specifically, we test that an error is raised when the base is zero and the exponent is negative. In all other cases, we expect the behavior to match the result returned by Python.

Because the C++ code relies on overloading, we need to make sure all combinations of types give the expected result.
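
For reference, the plain-Python behavior that the test plan compares against can be checked directly. This sketch exercises only Python's own `**` operator, not `aten::pow`; the helper name is hypothetical.

```python
def python_pow(base, exponent):
    """Reference result: whatever plain Python returns for base ** exponent."""
    return base ** exponent


# Zero base with a negative exponent, for all four int/float combinations.
zero_cases = [(0, -1), (0, -1.0), (0.0, -1), (0.0, -1.0)]
raised = []
for base, exponent in zero_cases:
    try:
        python_pow(base, exponent)
    except ZeroDivisionError:
        raised.append((base, exponent))

print(len(raised) == len(zero_cases))  # every combination raises in Python

# Other combinations follow Python's usual type promotion.
print(python_pow(2, 3), python_pow(2, -1), python_pow(2.0, 3), python_pow(2, 3.0))
```

Python raises `ZeroDivisionError` in each zero-base case; which exception type the JIT raises is not stated in the commit message.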

Reviewed By: zhxchen17

Differential Revision: D30146115

Pulled By: szewaiyuen7

fbshipit-source-id: dc661897ad38da286ee454120fbe41314b7f2995


Dmytro Dzhulgakov [Wed, 11 Aug 2021 08:08:45 +0000 (01:08 -0700)]

Fix CUDA_KERNEL_ASSERT ambiguous symbol in NDEBUG mode (#62527)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62527

If NDEBUG is applied inconsistently during compilation, we might get an 'ambiguous declaration' error. Let's make sure that the forward declaration matches glibc, including all specifiers.

Test Plan: sandcastle

Reviewed By: mdschatz

Differential Revision: D30030051

fbshipit-source-id: 9f4d5f1d4e74f0a4eaeeaaaad76b93ee485d8bcd


Pritam Damania [Wed, 11 Aug 2021 05:37:14 +0000 (22:37 -0700)]

[4/N] Enable opt-asan for distributed unit tests. (#62051)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62051

The goal here is to enable opt-asan for "spawn" based unit tests since
this works for "spawn" unlike "dev-asan". As a result, we can run ASAN for
"spawn" unit tests as well.

This means we can completely remove fork unit tests from the code base since
the only purpose for these tests was to run ASAN.
ghstack-source-id: 135523770

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D29854514

fbshipit-source-id: 02a5bfcfae2afc21badecff77082c7a6ad83636b


Lu Fang [Wed, 11 Aug 2021 04:56:41 +0000 (21:56 -0700)]

Back out "[fx] store Tracer class on Graph and GraphModule for package deserialization" (#63053)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63053

Original commit changeset: eca09424ad30

The original diff - D30019214 (https://github.com/pytorch/pytorch/commit/6286d338785c48a3e7a9b969e2bc3bd4d502851d) breaks the publish flow in model saving.

Test Plan: ci

Differential Revision: D30236517

fbshipit-source-id: 3e05db02fc1cbbc2ed262c83bf56d555277abb34


Rishi Puri [Wed, 11 Aug 2021 03:02:07 +0000 (20:02 -0700)]

rebase for autocast updates to include device_type and dtype flags (#61002)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/55374

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61002

Reviewed By: malfet, mruberry

Differential Revision: D30016812

Pulled By: ngimel

fbshipit-source-id: 6e09a29f539d28e9aea5cd9489b1e633cc588033


Wei-Sheng Chin [Wed, 11 Aug 2021 02:46:46 +0000 (19:46 -0700)]

Fix missing element types and shapes when autograd.Function has multiple tensor outputs (#57966)

Summary:
When generating IR for autograd.Function, if the function has multiple outputs, a TupleUnpack may be inserted after the original function node, and PyTorch only assigns proper information (tensor element type and shape) to the TupleUnpack and forgets the original function node. In contrast, if autograd.Function only produces one output, the original function node may have tensor element type and shape in its output schema.

Before this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with two outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor)**, output_1 **(tensor)** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

After this PR:
- (simplified) IR for autograd.Function with one output: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output (tensor, dtype=float32, shape=[4, 5])
- (simplified) IR for autograd.Function with two outputs: input (tensor, dtype=float32, shape=[2, 3]) -> PythonOp -> output_0 **(tensor, dtype=float32, shape=[4, 5])**, output_1 **(tensor, dtype=float32, shape=[6, 7])** -> TupleUnpack output_2 (tensor, dtype=float32, shape=[4, 5]), output_3 (tensor, dtype=float32, shape=[6, 7])

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57966

Reviewed By: zhxchen17

Differential Revision: D30208207

Pulled By: gmagogsfm

fbshipit-source-id: 42a3d1f9c0932133112a85df0c49cf4ea0afa175


Natalia Gimelshein [Wed, 11 Aug 2021 01:39:45 +0000 (18:39 -0700)]

remove dead code (#63031)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63031

Reviewed By: mruberry

Differential Revision: D30225094

Pulled By: ngimel

fbshipit-source-id: 3666a0fa120bea85225cd3ee04f89d64952d2862


Natalia Gimelshein [Wed, 11 Aug 2021 01:23:00 +0000 (18:23 -0700)]

Revert D30199482: [pytorch][PR] Add BFloat16 support for unique and unique_consecutive on CPU

Test Plan: revert-hammer

Differential Revision:
D30199482 (https://github.com/pytorch/pytorch/commit/fc0b8e60337ae46b90ed5d2f6d1f623f0f8d6581)

Original commit changeset: 6f2d9cc1a528

fbshipit-source-id: 39e9f202bcbd978525f792173d4f97b5b329b5b1


Richard Barnes [Wed, 11 Aug 2021 00:57:22 +0000 (17:57 -0700)]

Use `const auto` with irange (#62990)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62990

Test Plan: Sandcastle

Reviewed By: zhouzhuojie

Differential Revision: D30199748

fbshipit-source-id: 284b208ffa3c6c4749e5ac9b1fccb28914590f2c


Eddie Yan [Wed, 11 Aug 2021 00:44:40 +0000 (17:44 -0700)]

change nccl version reporting (#62916)

Summary:
https://github.com/pytorch/pytorch/issues/62295

Previously the packing and unpacking of the NCCL version "integer" was done to have parity with the upstream NCCL version encoding. However, there doesn't seem to be any place where this integer is directly compared with a version integer sourced from upstream NCCL, and syncing the encoding seems to be error-prone (e.g., a recent change where a special case was added for minor versions >= 10 https://github.com/NVIDIA/nccl/blob/7e515921295adaab72adf56ea71a0fafb0ecb5f3/src/nccl.h.in#L22).

This patch changes the reporting to return a tuple of version numbers instead (to preserve ease-of-use for comparisons) and tweaks the passing between C/Python to avoid the digit overflow problem.
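
The digit overflow problem can be illustrated with a toy packing scheme. The scheme below, with one decimal digit reserved for the minor version, is an assumption chosen for illustration; it is not NCCL's actual macro, and the function name is hypothetical.

```python
def pack_version(major: int, minor: int, patch: int) -> int:
    """Toy packing with a single decimal digit for `minor` (assumed scheme)."""
    return major * 1000 + minor * 100 + patch


# A two-digit minor version spills into the major digit:
packed_a = pack_version(2, 10, 3)  # 2000 + 1000 + 3
packed_b = pack_version(3, 0, 3)   # 3000 + 0 + 3
print(packed_a == packed_b)        # the two versions become indistinguishable

# Tuples compare element by element, so no digit budget is needed.
print((2, 10, 3) < (3, 0, 3))
print((2, 10, 3) >= (2, 8, 0))
```

Tuples keep comparisons as convenient as integers while avoiding any fixed per-field width.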

CC ngimel mcarilli

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62916

Reviewed By: anjali411

Differential Revision: D30201069

Pulled By: mrshenli

fbshipit-source-id: 2e4e7c69f001c3f22bd04aa6df6a992e538bea45


tktrungna [Tue, 10 Aug 2021 23:24:57 +0000 (16:24 -0700)]

Update test_torch_deploy (#62838)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62838

Fixes #62380

* update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky; it should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193141

Pulled By: tktrungna

fbshipit-source-id: 72c2bd3a740fca0f72e4803df505240193692c44


tktrungna [Tue, 10 Aug 2021 23:24:57 +0000 (16:24 -0700)]

update test_libtorch (#62797)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62797

Fixes #62380

* update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky; it should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D30193140

Pulled By: tktrungna

fbshipit-source-id: d8e54c403f42abbbbe4556abf40c22a7955df737


tktrungna [Tue, 10 Aug 2021 23:24:57 +0000 (16:24 -0700)]

update test distributed (#62796)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62796

Fixes #62380

* update test functions to call wheel install folder {sitepackages}/torch instead of build/ folder
* add symbolic links for the shared libraries that the tests call (this is a bit hacky; it should instead be fixed by setting the rpath before compiling -- similar to https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/test.sh#L204-L208).

### Test plan
check if all ci workflows pass

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30193142

Pulled By: tktrungna

fbshipit-source-id: 1247f9eda1c11c763c31c7383c77545b1ead1a60


tktrungna [Tue, 10 Aug 2021 23:24:57 +0000 (16:24 -0700)]

update test_vulkan (#62795)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62795

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D30124421

Pulled By: tktrungna

fbshipit-source-id: 235ba166b02f7334e89cb2493024067851bf5b9b


tktrungna [Tue, 10 Aug 2021 23:24:57 +0000 (16:24 -0700)]

update test_rpc (#62781)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62781

Test Plan: Imported from OSS

Reviewed By: walterddr, zhouzhuojie

Differential Revision: D30124391

Pulled By: tktrungna

fbshipit-source-id: 99c275d6c9f23b4f274fd0ca19a16879ed27afd5


Matej Sladek [Tue, 10 Aug 2021 23:19:39 +0000 (16:19 -0700)]

[ONNX] add support for prim::Uninitialized in lower_tuples pass (#56912)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/56911

The code from the issue generates this TorchScript:
```
graph(%self : __torch__.MyModule,
      %t.1 : Tensor):
  %12 : None = prim::Constant()
  %7 : str = prim::Constant[value="Negative input"]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:28
  %3 : int = prim::Constant[value=0]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:15
  %9 : int = prim::Constant[value=5]() # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:31
  %33 : (Tensor, Tensor) = prim::Uninitialized()
  %4 : Tensor = aten::lt(%t.1, %3) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %6 : bool = aten::Bool(%4) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:11
  %34 : (Tensor, Tensor) = prim::If(%6) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:10:8
    block0():
       = prim::RaiseException(%7) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:11:12
      -> (%33)
    block1():
      %11 : int[] = prim::ListConstruct(%9)
      %16 : Tensor = aten::zeros(%11, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:19
      %18 : int[] = prim::ListConstruct(%9)
      %23 : Tensor = aten::zeros(%18, %12, %12, %12, %12) # /mnt/nvdl/usr/msladek/notes/python_code/unitialized.py:13:35
      %24 : (Tensor, Tensor) = prim::TupleConstruct(%16, %23)
      -> (%24)
  return (%34)
```

The problem is that the ONNX exporter's lower_tuples pass doesn't support forwarding of tuples in prim::Uninitialized.
The solution is:
1. Add prim::Uninitialized to supported_op in the lower_tuples pass.
2. As prim::Uninitialized now has multiple outputs, call giveFreshAlias for every output.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56912

Reviewed By: nikithamalgifb

Differential Revision: D29837200

Pulled By: SplitInfinity

fbshipit-source-id: 321fae6fe52b1523df5653dbb9ea73b998ef1cda


Howard Huang [Tue, 10 Aug 2021 22:56:18 +0000 (15:56 -0700)]

Remove process_group_agent and faulty_process_group_agent files (#62985)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62985

Remove the process_group_agent and faulty_process_group_agent code now that PROCESS_GROUP backend has been deprecated for RPC (https://github.com/pytorch/pytorch/issues/55615). Discussed with xush6528 that it was okay to remove ProcessGroupAgentTest and ProcessGroupAgentBench which depended on process_group_agent.

Test Plan: CI tests

Reviewed By: pritamdamania87

Differential Revision: D30195576

fbshipit-source-id: 8b4381cffadb868b19d481198015d0a67b205811

Natalia Gimelshein [Tue, 10 Aug 2021 22:44:09 +0000 (15:44 -0700)]

fix sort and topk with discontiguous out (#63029)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62645 and https://github.com/pytorch/pytorch/issues/62940. The root cause of both bugs is a bad interaction between `collapseDims` and setting the size of the sorting/topK dimension to 1. If all other dimensions happen to be 1, `collapseDims` treats that size-1 dimension as collapsible (even though it was specifically marked to be preserved) and loses its stride information. If the dimension really were of size 1, the stride information would be unimportant, but since the dimension is not actually of size 1 and was only set to 1 for convenience, the loss of stride information results in incorrect outputs.
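
The failure mode can be illustrated with a small, self-contained Python sketch. It is not PyTorch's `collapseDims`: `collapse_naive`, `collapse_preserving`, and `slice_offsets` are hypothetical helpers that only drop size-1 dimensions (they do not merge contiguous ones), and the shapes and strides are made up. That is enough to show how the stride of the sort dimension can get lost.

```python
def collapse_naive(sizes, strides):
    """Drop every size-1 dimension; fall back to a single unit dimension."""
    kept = [(n, s) for n, s in zip(sizes, strides) if n != 1]
    if not kept:
        return [1], [1]  # the strides of all dimensions are discarded here
    return [n for n, _ in kept], [s for _, s in kept]


def collapse_preserving(sizes, strides, keep_dim):
    """Drop size-1 dimensions except keep_dim, whose stride must survive."""
    kept = [
        (n, s)
        for d, (n, s) in enumerate(zip(sizes, strides))
        if n != 1 or d == keep_dim
    ]
    return [n for n, _ in kept], [s for _, s in kept]


def slice_offsets(stride, real_size):
    """Memory offsets visited when walking the real sort dimension."""
    return [i * stride for i in range(real_size)]


# Hypothetical discontiguous output of shape (1, 1, 5) whose last dimension
# has stride 3. The sort dimension (index 2) is temporarily given size 1.
sizes, strides, sort_dim, real_size = [1, 1, 1], [15, 15, 3], 2, 5

_, naive_strides = collapse_naive(sizes, strides)
_, kept_strides = collapse_preserving(sizes, strides, sort_dim)

wrong = slice_offsets(naive_strides[-1], real_size)  # walks with stride 1
right = slice_offsets(kept_strides[-1], real_size)   # walks with stride 3
print(wrong)
print(right)
```

With the naive collapse the walk over the sort dimension touches adjacent elements, while the preserving variant keeps stride 3 and touches every third element, which is what a discontiguous `out` tensor requires.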

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63029

Reviewed By: heitorschueroff

Differential Revision: D30224925

Pulled By: ngimel

fbshipit-source-id: 269dd375c5cd57c6007fe91f729f8c60a2e7a264

Hanton Yang [Tue, 10 Aug 2021 22:15:23 +0000 (15:15 -0700)]

[iOS] enable Metal in the nightly build (#62855)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62855

Test Plan: Test on Private Pod with the [HelloWorld](https://fburl.com/3hiwkkhm) demo

Reviewed By: xta0

Differential Revision: D30174151

Pulled By: hanton

fbshipit-source-id: 22cd8663ac239811bf8ed1c3b6301460d798dbfa

Christian Puhrsch [Tue, 10 Aug 2021 22:14:00 +0000 (15:14 -0700)]

test_cudnn_convolution_relu skipCUDAIfRocm

Summary: Skip the ROCm run of test_cudnn_convolution_relu.

Test Plan: This skips a test

Reviewed By: ngimel

Differential Revision: D30233620

fbshipit-source-id: 31eab8b03c3f15674e0d262a8f55965c1aa6b809

Victor Quach [Tue, 10 Aug 2021 21:58:16 +0000 (14:58 -0700)]

Add docstring for saved tensors default hooks (#62361)

Summary:
Add documentation for the saved tensors default hooks introduced in https://github.com/pytorch/pytorch/issues/61834 / https://github.com/pytorch/pytorch/issues/62563

Sister PR: https://github.com/pytorch/pytorch/issues/62362 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62361

Reviewed By: zou3519

Differential Revision: D30081997

Pulled By: Varal7

fbshipit-source-id: cb923e943e1d96db9669c1d863d693af30910c62

Tao Xu [Tue, 10 Aug 2021 21:32:11 +0000 (14:32 -0700)]

[iOS][CI] Store every version of nightlies in S3 (#63039)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63039

Test Plan: Imported from OSS

Reviewed By: hanton

Differential Revision: D30229385

Pulled By: xta0

fbshipit-source-id: 15b438a6326159258803ab97e67dc9ec5db50d59

Jerry Zhang [Tue, 10 Aug 2021 20:57:14 +0000 (13:57 -0700)]

[quant][graphmode] Reference pattern support for elu (#62607)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62607

Removing the quantize handler for elu since it can be covered by DefaultNodeQuantizeHandler

Test Plan:
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053977

fbshipit-source-id: 426789443e928bb01a88907de616cbda5866f621

kshitij12345 [Tue, 10 Aug 2021 20:55:37 +0000 (13:55 -0700)]

[fix] TestMultiThreadAutograd: propagate exception from child thread to main thread (#63018)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/62895
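
For context, one common way to surface a child-thread failure in the main thread is sketched below. This is a generic pattern using only the standard library, not necessarily what the PR does; `PropagatingThread`, `failing_worker`, and `run_and_report` are hypothetical names.

```python
import threading


class PropagatingThread(threading.Thread):
    """Thread that stores an exception raised in run() and re-raises it on join()."""

    def __init__(self, target, args=()):
        super().__init__()
        self._fn = target
        self._fn_args = args
        self._exc = None

    def run(self):
        try:
            self._fn(*self._fn_args)
        except BaseException as exc:  # captured so the main thread can see it
            self._exc = exc

    def join(self, timeout=None):
        super().join(timeout)
        if self._exc is not None:
            raise self._exc


def failing_worker():
    raise ValueError("failure inside child thread")


def run_and_report():
    t = PropagatingThread(target=failing_worker)
    t.start()
    try:
        t.join()  # re-raises the child's exception in the calling thread
    except ValueError as exc:
        return f"main thread saw: {exc}"
    return "no error"


print(run_and_report())
```

Without such forwarding, an assertion that fails inside a plain `threading.Thread` is only reported by the thread's exception hook and the test in the main thread still passes.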

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63018

Reviewed By: anjali411

Differential Revision: D30225856

Pulled By: Varal7

fbshipit-source-id: b5dd7999de5060e06f8958ea3ce49e0b74110971

Amy He [Tue, 10 Aug 2021 20:36:02 +0000 (13:36 -0700)]

[1/N] Nnapi backend execute and compile (#62272)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62272

Added Android NNAPI delegate implementation of runtime initialization (compilation) and execution.
The delegate's preprocess step was [previously implemented](https://github.com/pytorch/pytorch/pull/62225). Now, the rest of the delegate, which implements client-side execution, is added.

**nnapi_backend_lib.cpp**:
Implementation of delegate's compile and execute.
`execute()` is essentially a C++ implementation of [`NnapiModule`](https://github.com/pytorch/pytorch/blob/master/torch/backends/_nnapi/prepare.py), which wraps an NNAPI Compilation and handles preparation of weights, inputs, and outputs.
- Any steps that can be done before execution are moved to `compile()`.
- `init()` cannot be moved to `compile()` because it requires real inputs for dynamic shaping.
- `shape_compute_module` cannot currently be deserialized in `compile()`, since mobile::Module has no IValue conversion.
- Processed arguments that are modified by `init()` must be kept as member variables. Any other processed arguments are passed through a dictionary, `handles`.
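
The split described in the list above can be pictured with a toy Python class. This is only a sketch under assumptions: `ToyDelegate`, its fields, and the arithmetic are hypothetical and unrelated to the real C++ code in `nnapi_backend_lib.cpp`.

```python
class ToyDelegate:
    """Hypothetical delegate illustrating the compile/execute split."""

    def __init__(self):
        # State that init() modifies is kept as member variables.
        self.initialized = False
        self.out_shape = None

    def compile(self, processed):
        # Work that does not need real inputs is done once here and
        # returned in a plain dictionary of handles.
        return {"weights": [float(w) for w in processed["weights"]]}

    def init(self, inputs):
        # Needs real inputs, because the output shape depends on them.
        self.out_shape = (len(inputs),)
        self.initialized = True

    def execute(self, handles, inputs):
        if not self.initialized:
            self.init(inputs)  # deferred until real inputs are available
        scale = sum(handles["weights"])
        return [x * scale for x in inputs]


delegate = ToyDelegate()
handles = delegate.compile({"weights": [1, 2]})
result = delegate.execute(handles, [1.0, 2.0, 3.0])
print(result, delegate.out_shape)
```

Here `handles` plays the role of the dictionary mentioned above, while `initialized` and `out_shape` stand in for the processed arguments that `init()` changes.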

**nnapi_bind.cpp & nnapi_bind.h**:
Created a header file for `nnapi_bind.cpp`, so that its `NnapiCompilation` class can be used by `nnapi_backend_lib.cpp`.
**test_backend_nnapi.py**:
Enabled execution testing.
ghstack-source-id: 135432844

Test Plan:
Imported from OSS

Tested on devserver.
1. Load and unpack a special devserver build of NNAPI: `jf download GICWmAAzUR0eo20TAPasVts8ObhobsIXAAAz --file "nnapi-host-linux.tar.xz"`
2. `export LIBNEURALNETWORKS_PATH=/path/to/libneuralnetworks.so`
3. Run unittests: `python test/test_jit.py TestNnapiBackend` and `python test/test_nnapi.py`

TODO: test with lite interpreter runtime

Reviewed By: raziel, iseeyuan

Differential Revision: D29944873

fbshipit-source-id: 48967d873e79ef2cce9bcba2aeea3c52f7a18c07

CaoE [Tue, 10 Aug 2021 20:21:22 +0000 (13:21 -0700)]

Add BFloat16 support for unique and unique_consecutive on CPU (#62559)

Summary:
Add BFloat16 support for unique and unique_consecutive on CPU.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62559

Reviewed By: anjali411

Differential Revision: D30199482

Pulled By: ngimel

fbshipit-source-id: 6f2d9cc1a528bea7c723139a4f1b14e4b2213601

Jerry Zhang [Tue, 10 Aug 2021 19:16:00 +0000 (12:16 -0700)]

[quant][refactor] Checking activation_dtype instead of activation_post_process (#62489)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62489

Addressing comment from previous PR: https://github.com/pytorch/pytorch/pull/62374#discussion_r679354145

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS

Reviewed By: iramazanli

Differential Revision: D30053980

fbshipit-source-id: 79c216410282eccd6f0a8f24e38c55c4d18ec0d0

Raghav Kansal [Tue, 10 Aug 2021 17:59:43 +0000 (10:59 -0700)]

LU solve uses cuBLAS and cuSOLVER for matrices with dim > 1024 (#61815)

Summary:
This PR builds off of https://github.com/pytorch/pytorch/issues/59148 and modifies the `lu_solve` routine to avoid MAGMA for `b` or `lu_data` matrices with any dimension > 1024, since MAGMA has a bug when dealing with such matrices (https://bitbucket.org/icl/magma/issues/19/dgesv_batched-dgetrs_batched-fails-for).
Fixes https://github.com/pytorch/pytorch/issues/36921
Fixes https://github.com/pytorch/pytorch/issues/61929
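
The size check described above can be sketched as a small predicate. This is an illustration rather than the actual dispatch code: `must_avoid_magma` is a hypothetical helper, and treating only the trailing two dimensions as matrix dimensions (ignoring batch dimensions) is an assumption of the sketch.

```python
MAGMA_DIM_LIMIT = 1024  # threshold taken from the commit description


def must_avoid_magma(b_shape, lu_shape, limit=MAGMA_DIM_LIMIT):
    """Return True if either matrix has a dimension larger than `limit`.

    Only the trailing two (matrix) dimensions are inspected; leading
    batch dimensions are ignored. That choice is an assumption.
    """
    matrix_dims = tuple(b_shape[-2:]) + tuple(lu_shape[-2:])
    return any(d > limit for d in matrix_dims)


# Batch of 4 systems with 2048 x 2048 LU factors: MAGMA must be avoided.
print(must_avoid_magma((4, 2048, 3), (4, 2048, 2048)))
# Large batch of small 8 x 8 systems: the size check does not trigger.
print(must_avoid_magma((4096, 8, 1), (4096, 8, 8)))
```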

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61815

Reviewed By: anjali411

Differential Revision: D30199618

Pulled By: ngimel

fbshipit-source-id: 06870793f697e9c35aaaa8254b8a8b1a38bd3aa9

Wanchao Liang [Tue, 10 Aug 2021 17:56:41 +0000 (10:56 -0700)]

[sharded_tensor] add default fields to ShardedTensorMetadata (#62867)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62867

This adds default fields to ShardedTensorMetadata, to allow easy construction and later modification.
ghstack-source-id: 135284133
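
As a rough illustration of why defaults help, the sketch below uses a stand-in dataclass. `ExampleMetadata` and its field names are hypothetical and do not reflect the real fields of `ShardedTensorMetadata`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExampleMetadata:
    """Hypothetical stand-in; field names are illustrative only."""

    # Mutable defaults need default_factory so instances do not share state.
    shard_offsets: List[List[int]] = field(default_factory=list)
    size: List[int] = field(default_factory=list)
    requires_grad: bool = False


# Construct with no arguments, then fill in fields afterwards.
meta = ExampleMetadata()
meta.size = [10, 20]
meta.shard_offsets.append([0, 0])

other = ExampleMetadata()  # unaffected by changes made to `meta`
print(meta)
print(other)
```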

Test Plan: ShardedTensorMetadata validity should be guarded by the `init_from_local_shards` API and its tests.

Reviewed By: pritamdamania87

Differential Revision: D30148481

fbshipit-source-id: 0d99f41f23dbeb4201a36109556ba23b9a6c6fb1

Rohan Varma [Tue, 10 Aug 2021 17:46:50 +0000 (10:46 -0700)]

[DDP] Don't set thread local state in reducer autograd hook. (#62996)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62996

No need to set this because the autograd engine already propagates TLS states.
ghstack-source-id: 135438220

Test Plan: CI

Reviewed By: albanD

Differential Revision: D30202078

fbshipit-source-id: e5e917269a03afd7a6b8e61f28b45cdb71ac3e64

Pyre Bot Jr [Tue, 10 Aug 2021 17:22:43 +0000 (10:22 -0700)]

[typing] suppress errors in `fbcode/caffe2` - batch 2

Test Plan: Sandcastle

Differential Revision: D30222378

fbshipit-source-id: 6a0a5d210266f19de63273240a080365c9143eb0
