Jerry Zhang [Wed, 18 Aug 2021 14:36:47 +0000 (07:36 -0700)]
[fx2trt] Add dequantize support (#63448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63448
Only available in TensorRT 8.0 and above
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_dequantize
Reviewed By: 842974287
Differential Revision: D30296863
fbshipit-source-id: 44b9630ef0d210e7f20e650dc81c519f7e41f5f3
Philip Meier [Wed, 18 Aug 2021 14:36:22 +0000 (07:36 -0700)]
add `OpInfo` for `torch.linalg.tensorinv` (#62326)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53739.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62326
Reviewed By: H-Huang
Differential Revision: D30136376
Pulled By: zou3519
fbshipit-source-id: 04ec9450e8866667649af401c7559b96ddc91491
JackCaoG [Wed, 18 Aug 2021 13:42:51 +0000 (06:42 -0700)]
Update cuda amp to also check xla device (#63413)
Summary:
Fixes https://github.com/pytorch/xla/issues/3086. PyTorch/XLA:GPU also uses CUDA AMP. I verified the pt/xla `test_autocast` with this fix, and all tests passed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63413
Reviewed By: ngimel
Differential Revision: D30380785
Pulled By: bdhirsh
fbshipit-source-id: fd1a1de7d224c616fc3fa90b80a688a21f6b1ecc
CodemodService FBSourceClangFormatLinterBot [Wed, 18 Aug 2021 11:18:47 +0000 (04:18 -0700)]
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
Reviewed By: zertosh
Differential Revision: D30391472
fbshipit-source-id: d4eb1e7debea8905e7fee5f026c082bee65e78f3
Michael Dagitses [Wed, 18 Aug 2021 11:04:43 +0000 (04:04 -0700)]
enhance comparison tests for c10::optional (#62887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62887
Reviewed By: VitalyFedyunin
Differential Revision: D30305044
Pulled By: dagitses
fbshipit-source-id: d0a3a9e4ea186915ef087543aaf81a606f943380
Michael Dagitses [Wed, 18 Aug 2021 10:59:51 +0000 (03:59 -0700)]
clarify the documentation of `torch.meshgrid` (#62977)
Summary:
Also warn about the behavior differences from `numpy.meshgrid`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62977
Reviewed By: mruberry, ngimel
Differential Revision: D30220930
Pulled By: dagitses
fbshipit-source-id: ae6587b41792721cae2135376c58121b4634e296
Pritam Damania [Wed, 18 Aug 2021 08:58:05 +0000 (01:58 -0700)]
[5/N] Run opt-asan with detect_leaks=0 (#63361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361
Python multiprocessing doesn't support LSAN and causes false positives
instead. As a result, this disables LSAN for these tests so that we can still run
them with opt-asan.
ghstack-source-id: 135962489
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D30352269
fbshipit-source-id: f6ab5abce7bdef00cd5e1f5977424d2b151174af
Wanchao Liang [Wed, 18 Aug 2021 06:10:48 +0000 (23:10 -0700)]
[sharded_tensor] fix typing issue for placement (#63426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426
placement should be either a string or a _remote_device; this fixes the type annotation to match the behavior.
ghstack-source-id: 136041125
Reviewed By: pritamdamania87
Differential Revision: D30379702
fbshipit-source-id: 34e226494240923b433e3a39cc08c84d42cdad6b
Pavithran Ramachandran [Wed, 18 Aug 2021 05:26:22 +0000 (22:26 -0700)]
[easy][PyTorchEdge] print error message when failing to load model file (#63404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63404
# Context
Loading a model file using `fopen` might fail for multiple reasons. Reproducing the error on devices takes time and effort. Logging the errno will help in debugging and fixing the error quickly.
# Mitigation
Print out the error message of the `fopen` call to help users debug the issue.
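As a rough illustration of the mitigation (in Python rather than the C++ of `file_adapter.cc`), the sketch below reports the errno and its description when opening a file fails. The helper name `open_model_file` is hypothetical, and the message only loosely follows the format seen in the test plan log.

```python
import os


def open_model_file(path):
    """Open a model file, reporting errno and its description on failure.

    Hypothetical helper for illustration only; not part of PyTorch.
    """
    try:
        return open(path, "rb")
    except OSError as exc:
        # exc.errno is the numeric code; os.strerror gives the readable text.
        raise RuntimeError(
            f"open file failed because of errno {exc.errno} on open: "
            f"{os.strerror(exc.errno)}, file path: {path}"
        ) from exc
```

Including both the numeric code and the file path in one message means a single log line is usually enough to tell a missing file from a permissions problem.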
Test Plan:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck run xplat/caffe2/fb/lite_predictor:lite_predictor -- --model=/home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl --use_bundled_input=0
Building: finished in 0.5 sec (100%) 354/354 jobs, 0/354 updated
Total time: 0.6 sec
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
what(): open file failed because of errno 2 on fopen: No such file or directory, file path: /home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:15 (most recent call first):
(no backtrace available)
```
Reviewed By: dhruvbird
Differential Revision: D30372308
fbshipit-source-id: 5346e828f53f6bc5d871b403586566a3332a389a
Jerry Zhang [Wed, 18 Aug 2021 04:35:55 +0000 (21:35 -0700)]
[fx2trt] Add quantize_per_tensor support (#63447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63447
Only available in TRT 8.0 and above
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_quantize_per_tensor
Reviewed By: 842974287
Differential Revision: D30322844
fbshipit-source-id: dfd925e3432de128f2925b1aa55d6125e63359af
Shen Li [Wed, 18 Aug 2021 03:12:51 +0000 (20:12 -0700)]
Fix RPC Python User Function Error Handling (#63406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406
The `RemoteException` will be thrown on the caller side when converting
the response message to IValue. Since it is a Python error, the error
message needs to be extracted explicitly and the `PyErr` cleared.
Test Plan: Imported from OSS
Reviewed By: rohan-varma, ngimel
Differential Revision: D30372741
Pulled By: mrshenli
fbshipit-source-id: 1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
Aliaksandr Ivanou [Wed, 18 Aug 2021 02:54:30 +0000 (19:54 -0700)]
[torch] Set default log level for torch elastic (#63214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63214
The default log level in fb and oss is different: in oss we use WARNING and in fb we use INFO.
Test Plan: unittests, f291441502
Reviewed By: cbalioglu
Differential Revision: D30296298
fbshipit-source-id: 89067352be767255fbc66e790ec333582de64c6c
Rohan Varma [Wed, 18 Aug 2021 00:12:32 +0000 (17:12 -0700)]
[BE] remove _SUPPORTED_OPTIM_MAP from tests (#63383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383
Per title
ghstack-source-id: 135966157
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30358921
fbshipit-source-id: 965e054e525194b1ee55980340df275bab355c9b
Rohan Varma [Wed, 18 Aug 2021 00:12:32 +0000 (17:12 -0700)]
[DDP] Support step_param for AdamW (#63382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382
Per title
ghstack-source-id: 135966156
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30255446
fbshipit-source-id: e6ffbf339db0bc5b4702d02b74a462309df07c75
Jerry Zhang [Tue, 17 Aug 2021 23:54:09 +0000 (16:54 -0700)]
[quant][graphmode][fx][fix] Fix quantization for tuple arguments (#63376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376
Previously, when a tuple was an argument to a quantizable op, it was transformed into a list by mistake;
this PR fixes that.
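The kind of bug described here can be sketched in plain Python. The helper `map_args` below is hypothetical (it is not the FX quantization code); it only shows how a recursive argument transform can keep a tuple a tuple instead of rebuilding it as a list.

```python
def map_args(fn, arg):
    """Apply fn to every leaf of a nested argument, keeping container types.

    Illustrative only; named tuples and dicts are not handled here.
    """
    if isinstance(arg, tuple):
        # Rebuild with tuple(...) so a tuple is not silently turned into a list.
        return tuple(map_args(fn, a) for a in arg)
    if isinstance(arg, list):
        return [map_args(fn, a) for a in arg]
    return fn(arg)
```

Checking for `tuple` and `list` separately is the whole point: a single branch that handles "any sequence" and returns a list is the easy way to introduce the mistake.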
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30357642
fbshipit-source-id: 82d10805d9c00c003cc99983dca68b6455ff7b2e
zhouzhuojie [Tue, 17 Aug 2021 23:53:08 +0000 (16:53 -0700)]
Add more ciflow labels for more workflows (#63410)
Summary:
- Add more ciflow labels and enable them for more workflows.
- Only the 'ciflow/default' workflows are run by default at pull_request time
- Other labels can be triggered manually by adding the labels and unassigning pytorchbot, or by waiting for pytorchbot's comment opt-in rollout
- The label design is a logical `OR`, i.e. adding ('ciflow/cuda' + 'ciflow/win') will trigger the union of their workflows. (design feedback is needed here)
Typical default workflows for normal PRs.
<details>
<summary>Generated label rules</summary>
![image](https://user-images.githubusercontent.com/658840/129779905-eb5e56dd-a696-4040-9eb6-71ecb6487dc1.png)
```
{
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
],
"ciflow/cpu": [
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
],
"ciflow/default": [
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
]
},
"version": "v1"
}
```
</details>
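To make the `OR` semantics concrete, here is a small sketch of how a set of labels could be resolved to workflows. It uses a hand-copied subset of the generated rules; the function name `workflows_for` is hypothetical and not part of the ciflow tooling.

```python
# Subset of the generated "label_rules", copied by hand for illustration.
LABEL_RULES = {
    "ciflow/bazel": ["linux-xenial-py3.6-gcc7-bazel-test"],
    "ciflow/coverage": ["linux-bionic-py3.8-gcc9-coverage"],
    "ciflow/slow": [
        "linux-bionic-cuda10.2-py3.9-gcc7",
        "linux-xenial-cuda10.2-py3.6-gcc7",
    ],
}


def workflows_for(labels, rules=LABEL_RULES):
    """Return the union of workflows selected by any of the given labels."""
    selected = set()
    for label in labels:
        # Unknown labels select nothing rather than raising.
        selected.update(rules.get(label, []))
    return sorted(selected)
```

With these rules, `workflows_for(["ciflow/bazel", "ciflow/coverage"])` yields both the bazel and the coverage workflow, since each label only adds to the selection.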
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63410
Reviewed By: ngimel
Differential Revision: D30378553
Pulled By: zhouzhuojie
fbshipit-source-id: 4e0953740793e5e72b95018f8ab2ce4a6a364c38
Masaki Kozuki [Tue, 17 Aug 2021 23:51:34 +0000 (16:51 -0700)]
`F.avg_pool3d` CUDA backward: gpuAtomicAddNoReturn -> fastAtomicAdd (#63387)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695
In the following two tables, I set `kernel_size` to 3 and `stride` to 2.
In the benchmark, input tensors have shape (N, C, n_features, n_features, n_features).
Tested on RTX3080 w/ CUDA11.4 Update 1.
## This PR
| N | C | n_features | dtype | time |
|----:|----:|-------------:|:--------------|------------:|
| 32 | 3 | 8 | torch.float16 | 7.46846e-05 |
| 32 | 3 | 8 | torch.float32 | 8.18968e-05 |
| 32 | 3 | 32 | torch.float16 | 0.000156748 |
| 32 | 3 | 32 | torch.float32 | 0.000165236 |
| 32 | 3 | 128 | torch.float16 | 0.00549854 |
| 32 | 3 | 128 | torch.float32 | 0.008926 |
## master (6acd87f)
| N | C | n_features | dtype | time |
|----:|----:|-------------:|:--------------|------------:|
| 32 | 3 | 8 | torch.float16 | 7.60436e-05 |
| 32 | 3 | 8 | torch.float32 | 7.55072e-05 |
| 32 | 3 | 32 | torch.float16 | 0.000189292 |
| 32 | 3 | 32 | torch.float32 | 0.000168645 |
| 32 | 3 | 128 | torch.float16 | 0.00699538 |
| 32 | 3 | 128 | torch.float32 | 0.00890226 |
master's time divided by PR's time is as follows:
| N | C | n_features | master / PR |
|---:|---:|---------------:|----------------:|
| 32 | 3 | 8 | 1.018 |
| 32 | 3 | 32 | 1.208 |
| 32 | 3 | 128 | 1.272 |
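The ratio table has no dtype column; its values appear to match the torch.float16 rows of the two timing tables. A quick check of that arithmetic, with the timings copied by hand from the tables:

```python
# torch.float16 timings copied from the "This PR" and "master" tables,
# keyed by n_features.
pr_time = {8: 7.46846e-05, 32: 0.000156748, 128: 0.00549854}
master_time = {8: 7.60436e-05, 32: 0.000189292, 128: 0.00699538}

# master's time divided by the PR's time, rounded as in the ratio table.
speedup = {n: round(master_time[n] / pr_time[n], 3) for n in pr_time}
print(speedup)
```

The float32 rows give different ratios, so the table should be read as a float16 comparison.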
cc: xwang233 ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63387
Reviewed By: mruberry
Differential Revision: D30381434
Pulled By: ngimel
fbshipit-source-id: 3b97aee4b0d457a0277a0d31ac56d4151134c099
Nikita Shulga [Tue, 17 Aug 2021 22:28:45 +0000 (15:28 -0700)]
Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft
Also delete explicit installation of pocketfft during the build as it will be available via submodule
Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5
Partially addresses https://github.com/pytorch/pytorch/issues/62821
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841
Reviewed By: seemethere
Differential Revision: D30140441
Pulled By: malfet
fbshipit-source-id: d1a1cf1b43375321f5ec5b3d0b538f58082f7825
Rohan Varma [Tue, 17 Aug 2021 22:01:21 +0000 (15:01 -0700)]
[wip] Move smallest bucket to end after rebuild buckets (#62279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279
Before rebuilding buckets, `kDefaultFirstBucketBytes` is actually misleading: because we reverse the parameter indices when initializing the reducer, it is actually the size of the last bucket.
Currently, rebuilding buckets sets this to be the first bucket's size; this change checks whether keeping it as the last bucket can help perf.
This is currently experimental only, and we don't plan to land it unless experiments show a clear win.
ghstack-source-id: 135966897
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D29927931
fbshipit-source-id: 55b949986fa2c3bade6fcb4bf5b513461bf0f490
Kevin Tse [Tue, 17 Aug 2021 21:46:22 +0000 (14:46 -0700)]
adding a note to the documentation of polar (#63259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63259
Fix #52919
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D30342536
Pulled By: NivekT
fbshipit-source-id: 4c61a86f96a6370cc64652bf652c4ae25c9f4601
Jerry Zhang [Tue, 17 Aug 2021 21:40:19 +0000 (14:40 -0700)]
[quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608
Insert an extra fixed qparam fake quant at the output of fixed qparam ops in fbgemm (e.g. sigmoid)
so that we can produce reference patterns for these ops
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30053978
fbshipit-source-id: c527944b6e791bb4d45ebe96265af52794203695
Dhruv Matani [Tue, 17 Aug 2021 21:39:04 +0000 (14:39 -0700)]
Revert D30281388: [PyTorch] Avoid using std::regex for device string parsing in Device.cpp
Test Plan: revert-hammer
Differential Revision: D30281388 (https://github.com/pytorch/pytorch/commit/4d6f98ecada2d85b2474b023838debad4305316d)
Original commit changeset: 4d998e9f313e
fbshipit-source-id: 11134b3400cc3e851155c9c1b6fb59308ff1567b
Richard Zou [Tue, 17 Aug 2021 20:39:52 +0000 (13:39 -0700)]
Fix zero-dim handling in torch.matmul (#63359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359
Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
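Why `-1` is ambiguous here: with a zero-sized last dimension the tensor has 0 elements, so any leading size satisfies `n * 0 == 0` and the view cannot infer it. A minimal sketch of computing the folded shape by hand instead (the helper name `folded_shape` is hypothetical, not the code in the PR):

```python
from math import prod


def folded_shape(shape):
    """Collapse all leading dims into one, like view(-1, last) but explicit."""
    *batch, last = shape
    # prod(batch) is well defined even when last == 0, unlike inferring -1.
    return (prod(batch), last)
```

For A with shape [3, 2, 0] this gives (6, 0), which is the shape the ambiguous `A.view(-1, 0)` call was presumably meant to produce.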
Test Plan: - new tests
Reviewed By: ngimel
Differential Revision: D30351583
Pulled By: zou3519
fbshipit-source-id: 7625691fe8b85d96a4073409596a932c303e3e8c
Mikhail Zolotukhin [Tue, 17 Aug 2021 20:39:36 +0000 (13:39 -0700)]
[TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292779
Pulled By: ZolotukhinM
fbshipit-source-id: 150301c7d2df56b608b035827b6a9a87f5e2d9e9
Kushashwa Ravi Shrimali [Tue, 17 Aug 2021 20:35:32 +0000 (13:35 -0700)]
OpInfo fix: `conv_transpose2d` (#63389)
Summary:
Addresses comment: https://github.com/pytorch/pytorch/pull/62882#issuecomment-899679606.
cc: mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63389
Reviewed By: mruberry
Differential Revision: D30377481
Pulled By: ngimel
fbshipit-source-id: 0fa21acc3503c259c9b27463e8555247c43d9e2e
Mike Iovine [Tue, 17 Aug 2021 20:34:44 +0000 (13:34 -0700)]
[Static Runtime] Implement aten::append (#63350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350
Add a native implementation for `aten::append`, the list append op.
Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`
Reviewed By: hlu1
Differential Revision: D30326461
fbshipit-source-id: 0dbdf6cc82e78c7c36db39583256f6b87385e3d3
Ivan Kobzarev [Tue, 17 Aug 2021 20:34:20 +0000 (13:34 -0700)]
[vulkan] Add log_softmax (#63193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63193
Test Plan: Imported from OSS
Reviewed By: SS-JIA
Differential Revision: D30291987
fbshipit-source-id: 89c6560274e5a841e5af249f6963b67ef6826f4c
Supriya Rao [Tue, 17 Aug 2021 18:39:16 +0000 (11:39 -0700)]
[quant][fx] Ensure qconfig works for QAT with multiple modules (#63343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343
The previous implementation had a bug where we were trying to modify an ordered dict value while iterating through it.
This fixes it by creating a copy before modifying it.
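A minimal sketch of the pattern, assuming the problem was inserting keys into an `OrderedDict` while looping over it (which raises `RuntimeError: OrderedDict mutated during iteration`). The function and argument names are hypothetical and do not come from the quantization code.

```python
from collections import OrderedDict


def expand_module_types(qconfig_dict, qat_equivalents):
    """Add an entry for the QAT variant of each module type.

    Iterates over a snapshot of the items so that inserting keys into
    qconfig_dict is safe while the loop is running.
    """
    for module_type, qconfig in list(qconfig_dict.items()):
        qat_type = qat_equivalents.get(module_type)
        if qat_type is not None:
            qconfig_dict[qat_type] = qconfig
    return qconfig_dict
```

Taking `list(...)` of the items is the "copy before modifying" step: the loop walks the snapshot while the live dict is free to grow.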
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30346116
fbshipit-source-id: 0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
Yi Wang [Tue, 17 Aug 2021 18:28:43 +0000 (11:28 -0700)]
Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method (#63388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63388
Context: https://discuss.pytorch.org/t/how-to-use-the-helper-function-consume-prefix-in-state-dict-if-present/129505/3
Make it clear that this method strips the prefix in place rather than returning a new value.
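The in-place contract can be sketched with a plain dict. `strip_prefix_in_place` below is a hypothetical stand-in that mirrors only the documented behavior (mutate the mapping, return `None`); it omits any extra bookkeeping the real helper may do.

```python
def strip_prefix_in_place(state_dict, prefix):
    """Remove prefix from matching keys of state_dict; returns None."""
    # Snapshot the keys first, since the dict is modified inside the loop.
    for key in list(state_dict.keys()):
        if key.startswith(prefix):
            state_dict[key[len(prefix):]] = state_dict.pop(key)
```

A caller that writes `sd = strip_prefix_in_place(sd, "module.")` would end up with `sd` set to `None`, which is the confusion the added return type hint is meant to prevent.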
Additional reformatting is also applied.
ghstack-source-id: 135973393
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D30360931
fbshipit-source-id: 1a0c7967a4c86f729e3c810686c21dec43d1dd7a
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Add handling of ifs to shape propagation (#62914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62914
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30196945
Pulled By: eellison
fbshipit-source-id: 1c0c7f938c4547330fd1dba8ab7dd0b99a79b6a9
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Small shape analysis changes (#62911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62911
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D30196946
Pulled By: eellison
fbshipit-source-id: 2562bab323088d9c1440ae0431e533f9bcc513d3
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Add a few peepholes (#62910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62910
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30196947
Pulled By: eellison
fbshipit-source-id: d88c92616d4de4f47ff4fcf5c1994e629ca20395
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Propagate symbolic dimensions through idioms like x.view(y.size()) (#61975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61975
Propagate symbolic dimensions through size calls. We did this by associating SymbolicSizes with integer inputs by looking through their constructors for `x.size(1)` or `x.size()` nodes.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D30196948
Pulled By: eellison
fbshipit-source-id: 377fc1d2f6d396c52dc0e87fa814b15720f1414e
Jerry Zhang [Tue, 17 Aug 2021 17:41:38 +0000 (10:41 -0700)]
[fx2trt] Refactor linear op to use mm + add
Summary:
Previously, linear was translated to fully_connected, which only works when the weight is a constant.
This diff changes that to mm + add so that the weight can be an ITensor, which lets us have the weight - quantize - dequantize
pattern in the produced TensorRT network
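The decomposition relies on the identity linear(x) = x @ weight^T + bias. A pure-Python sketch of that identity on nested lists follows; it is not TensorRT or fx2trt code, and all function names are illustrative.

```python
def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(x, weight, bias):
    """linear(x) = x @ weight^T + bias, written as a matmul followed by an add."""
    out = matmul(x, transpose(weight))
    return [[v + b for v, b in zip(row, bias)] for row in out]
```

Because the weight enters only as an ordinary matmul operand, nothing in this form requires it to be a constant.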
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_linear
Reviewed By: 842974287
Differential Revision: D30294751
fbshipit-source-id: 596fbd4c81caef8df41a002a2e14fbf22d9d2a80
Mike Ruberry [Tue, 17 Aug 2021 17:37:57 +0000 (10:37 -0700)]
Updates set_default_dtype documentation (#63233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60560.
The description of set_default_dtype is updated to clarify that it affects the interpretation of Python numbers as either float32 (complex64) or float64 (complex128) and that default (floating) dtypes other than float32 or float64 are unsupported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63233
Reviewed By: VitalyFedyunin
Differential Revision: D30306396
Pulled By: mruberry
fbshipit-source-id: bbee62f323c773b23b2fa45cb99122bc28197432
Amy He [Tue, 17 Aug 2021 17:31:02 +0000 (10:31 -0700)]
Remove backend_debug from torch_core srcs and replace with library dependency (#63111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63111
### Problem:
Buck contains at least two libraries which have `backend_debug_info.cpp` as a source, `torch_core` and `backend_interface_lib`. `backend_debug_info.cpp` registers BackendDebugInfo as a class. If targets contain both libraries (e.g. sparkAR debug build with NNAPI delegation), then BackendDebugInfo is registered twice, causing a runtime error.
### Solution:
These changes remove `backend_debug_info.cpp` and `backend_interface.cpp` as sources in `torch_core` and add `backend_interface_lib` as a dependency instead.
**build_variables.bzl:**
- Added a list that excludes `backend_debug_info.cpp` and `backend_interface.cpp` (both srcs already included by `backend_interface_lib`)
**buck:**
- torch_core: Removed `backend_debug_info.cpp` from srcs and added `backend_interface_lib` deps
- backend_interface_lib: Replaced `torch_mobile_core` dep with more specific deps
- to avoid an indirect dep between `torch_core` and `torch_mobile_core`
ghstack-source-id: 135981061
Test Plan:
### Test Plan:
Build and run SparkAR internally with Android NNAPI Delegation (`buck build --show-output arstudioplayer_arm64_debug`)
and internal tests.
Reviewed By: iseeyuan
Differential Revision: D30259034
fbshipit-source-id: 0c14c827732f07fb9b9bd25a999828b51793cdcc
Amy He [Tue, 17 Aug 2021 17:31:02 +0000 (10:31 -0700)]
Move Android Nnapi srcs from aten_native_cpu to aten_cpu (#62919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62919
Move Android NNAPI srcs (nnapi_bind.cpp, nnapi_wrapper.cpp, nnapi_model_loader.cpp) from aten_native_cpu to aten_cpu, so that later the NNAPI delegate's execution library can depend on it.
aten_native_cpu is built selectively per app, but the srcs have no selective components and are required for the NNAPI delegate library in D30259033.
See Buck Dependencies: https://docs.google.com/document/d/17RuWkqWKCO6sc5fKzIDkGeNhhvMk7BvJOqeSnGsHZ8o/edit?usp=sharing
ghstack-source-id: 135981062
Test Plan: `buck build --show-output arstudioplayer_arm64_debug` and internal tests
Reviewed By: iseeyuan
Differential Revision: D30164867
fbshipit-source-id: 0beff481ff250e75664ce8393beabbeb9db66770
Ivan Kobzarev [Tue, 17 Aug 2021 17:12:11 +0000 (10:12 -0700)]
[android][vulkan] Fix model loading for Vulkan backend (#63402)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63402
Test Plan: Imported from OSS
Reviewed By: SS-JIA
Differential Revision: D30370692
Pulled By: IvanKobzarev
fbshipit-source-id: 73311b9b767fe9ed3ae390db59d6aa2c4a98f06d
Peter Bell [Tue, 17 Aug 2021 17:11:05 +0000 (10:11 -0700)]
Advertise USE_PRECOMPILED_HEADERS in CONTRIBUTING.md (#62827)
Summary:
This option was added in https://github.com/pytorch/pytorch/issues/61940 and fits with this section's theme of improving build times.
I've also changed it to a `cmake_dependent_option` instead of `FATAL_ERROR`ing for older CMake versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62827
Reviewed By: astaff
Differential Revision: D30342102
Pulled By: malfet
fbshipit-source-id: 3095b44b7085aee8a884ec95cba9f8998d4442e7
Bradley Davis [Tue, 17 Aug 2021 16:55:25 +0000 (09:55 -0700)]
[fx] persist `tracer_cls` on `fx.Graph` when deep copying (#63353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63353
Custom deepcopy method copies all nodes but does not copy the tracer_cls attribute
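A toy sketch of the issue: a hand-written `__deepcopy__` copies only what it explicitly handles, so any attribute it forgets is silently dropped. The `Graph` class below is a stand-in, not `torch.fx.Graph`, and the attribute name `_tracer_cls` is used here purely for illustration.

```python
import copy


class Graph:
    """Toy stand-in for a graph with a custom deepcopy; only the copy logic matters."""

    def __init__(self, tracer_cls=None):
        self.nodes = []
        self._tracer_cls = tracer_cls

    def __deepcopy__(self, memo):
        # Carry the tracer class over explicitly; it is a class reference,
        # so it is shared rather than copied.
        new_graph = Graph(tracer_cls=self._tracer_cls)
        new_graph.nodes = copy.deepcopy(self.nodes, memo)
        return new_graph
```

Passing the attribute through the constructor keeps the copy consistent even if more initialization logic is later attached to it.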
Reviewed By: houseroad
Differential Revision: D30349424
fbshipit-source-id: 3e98bdac8a8a992eb0b4ec67fe80bb2e5cf3884d
Dhruv Matani [Tue, 17 Aug 2021 16:20:49 +0000 (09:20 -0700)]
[PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63204
Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.
1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Takes marginally longer to build code that uses std::regex vs. code that does not
This change avoids the use of `std::regex` for parsing the device string since we don't need to.
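To show why a regex is not needed for strings of the form `type` or `type:index`, here is a hand-rolled parser sketched in Python. It is not the C++ in `Device.cpp`; the name `parse_device` and the exact set of accepted strings are assumptions made for illustration.

```python
def parse_device(device_string):
    """Split 'type' or 'type:index' into (type, index or None) without a regex."""
    device_type, sep, index_text = device_string.partition(":")
    if not device_type:
        raise ValueError(f"missing device type in {device_string!r}")
    if not sep:
        # No colon at all: a bare device type such as "cpu".
        return device_type, None
    # Require plain ASCII digits so int() below cannot fail.
    if not (index_text.isascii() and index_text.isdigit()):
        raise ValueError(f"invalid device index in {device_string!r}")
    return device_type, int(index_text)
```

A single split on the first colon plus a digit check covers the whole grammar, which is why dropping the regex costs nothing in expressiveness.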
ghstack-source-id: 136006963
Test Plan:
### AI Bench Runs
**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms
**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).
### BSB Results
```
D30281388-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)
messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```
Reviewed By: raziel
Differential Revision: D30281388
fbshipit-source-id: 4d998e9f313e6366d9d89a6a73cd090ddfb059fc
Dhruv Matani [Tue, 17 Aug 2021 16:20:49 +0000 (09:20 -0700)]
[PyTorch] Add Device_test.cpp (#63203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63203
Currently, `c10::Device` isn't being tested - i.e. there's no test to ensure that the device string parsing works as expected. This diff adds very basic tests to assert that the stuff we expect to work works, and the stuff that we don't expect to work doesn't work.
ghstack-source-id: 136006962
Test Plan:
New test. Ran as:
```
cd fbsource/fbcode/
buck test //caffe2/c10:c10_test_0 -- -r '.*DeviceTest.*'
```
Reviewed By: dreiss, raziel
Differential Revision: D30286910
fbshipit-source-id: b5699068dcbba89d5d224dbaf74b175f3f785a00
Taylor Robie [Tue, 17 Aug 2021 16:09:59 +0000 (09:09 -0700)]
change with_callable_args to return a fresh _PartialWrapper (#63374)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326
Currently `get_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, there are all sorts of lifetime issues that crop up. (Details in the linked issue.) So far as I can tell, we only need to make a constructor which is module (and by extension device) aware, so making a fresh one should have the same effect without leaking the last call's module.
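The difference between mutating a shared wrapper and returning a fresh one can be sketched as follows. `PartialWrapper` here is a simplified, hypothetical stand-in for `_PartialWrapper`, not the class from the quantization code.

```python
class PartialWrapper:
    """Simplified stand-in: wraps a callable plus lazily evaluated keyword args."""

    def __init__(self, p, callable_args=None):
        self.p = p
        self.callable_args = dict(callable_args or {})

    def __call__(self, *args, **kwargs):
        # Evaluate the deferred arguments only when the target is constructed.
        extra = {name: fn() for name, fn in self.callable_args.items()}
        return self.p(*args, **{**extra, **kwargs})

    def with_callable_args(self, **kwargs):
        # Return a new wrapper instead of mutating self, so a shared default
        # never keeps references captured for a previous caller.
        return PartialWrapper(self.p, {**self.callable_args, **kwargs})
```

Since the original wrapper is left untouched, a global default can be reused across modules without retaining the closures (and whatever they reference) from the last call.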
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374
Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.
Reviewed By: HDCharles
Differential Revision: D30359360
Pulled By: robieta
fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
Victor Quach [Tue, 17 Aug 2021 15:55:25 +0000 (08:55 -0700)]
Fix flaky test for dp saved tensor hooks (#63324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63324
Fix for https://www.internalfb.com/tasks/?t=98258963
`catch_warnings` seems to trigger only once in certain cases where it
should trigger twice.
This test is only meant to check whether hooks are triggered or not,
so changing it to self.assertGreater is ok.
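A minimal sketch of the relaxed check, using only the standard `warnings` module. The helper `run_and_count_warnings` is hypothetical; the real test uses `self.assertGreater` inside a unittest case.

```python
import warnings


def run_and_count_warnings(fn):
    """Run fn and return how many warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including repeats of the same message.
        warnings.simplefilter("always")
        fn()
    return len(caught)
```

A test that asserts `run_and_count_warnings(fn) > 0` passes whether the hook warning fires once or twice, which is the tolerance the fix introduces.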
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30340833
Pulled By: Varal7
fbshipit-source-id: 1bfb9437befe9e8ab8f95efe5f513337fa9bdc5c
Erjia Guan [Tue, 17 Aug 2021 14:26:08 +0000 (07:26 -0700)]
Add mode to TarArchiveReader (#63332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63332
Add a corresponding PR from [torchdata](https://github.com/facebookexternal/torchdata/pull/101)
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30350151
Pulled By: ejguan
fbshipit-source-id: bced4a1ee1ce89d4e91e678327342e1c095dbb9e
Michael Dagitses [Tue, 17 Aug 2021 11:03:02 +0000 (04:03 -0700)]
add torch.meshgrid() OpInfo (#62720)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62719
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62720
Reviewed By: astaff
Differential Revision: D30344574
Pulled By: dagitses
fbshipit-source-id: ed42d9fe20741df98018efb08e640fca370583fb
Mike Ruberry [Tue, 17 Aug 2021 05:22:15 +0000 (22:22 -0700)]
Extends warning on norm docs (#63310)
Summary:
torch.norm has a couple of documentation issues, like https://github.com/pytorch/pytorch/issues/44552 and https://github.com/pytorch/pytorch/issues/38595, but since it's deprecated this PR simply clarifies that the documentation (and implementation) of torch.norm may be incorrect. This should be additional encouragement for users to migrate to torch.linalg.vector_norm and torch.linalg.matrix_norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63310
Reviewed By: ngimel
Differential Revision: D30337997
Pulled By: mruberry
fbshipit-source-id: 0fdcc438f36e4ab29e21e0a64709e4f35a2467ba
Peter Bell [Tue, 17 Aug 2021 05:09:25 +0000 (22:09 -0700)]
Cleanup dead code (#63328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63328
This code supported the old `at::_fft_with_size` operator which no longer exists.
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30343557
Pulled By: mruberry
fbshipit-source-id: 7a71585e013acb46c98f14fd40e15bdfbf026bac
Peter Bell [Tue, 17 Aug 2021 05:09:25 +0000 (22:09 -0700)]
Workaround for cuFFT bug (#63327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63327
Fixes #63152
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision: D30343558
Pulled By: mruberry
fbshipit-source-id: 68e17a07650f65f397e26efc417e97e2ab302f82
Nikita Shulga [Tue, 17 Aug 2021 03:35:12 +0000 (20:35 -0700)]
Add step to report code coverage from GHA (#63373)
Summary:
Similar to the logic provided in https://github.com/pytorch/pytorch/blob/b2069e7d01814d776c417042e28133c6b0e5082f/.circleci/verbatim-sources/job-specs/pytorch-job-specs.yml#L197-L201
Fixes https://github.com/pytorch/pytorch/issues/63366
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63373
Reviewed By: walterddr
Differential Revision:
D30357737
Pulled By: malfet
fbshipit-source-id:
20b115eb4d6412bd9895680308a9097742d2ae7b
Mikhail Zolotukhin [Tue, 17 Aug 2021 03:34:49 +0000 (20:34 -0700)]
[TensorExpr] Remove test_train from tensorexpr tests. (#63194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63194
This test implements functionality used nowhere, and the author no
longer works on that. This PR also adds test_approx to CMakeLists where
it's been missing before.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30292777
Pulled By: ZolotukhinM
fbshipit-source-id:
ab6d98e729320a16f1b02ea0c69734f5e7fb2554
Don Jang [Tue, 17 Aug 2021 00:30:26 +0000 (17:30 -0700)]
[JIT] Set future's error to current exception as is when `--torch_jit_enable_rethrow_caught_exception=true` (#63348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63348
This change addresses singlaiiit's comment on D30241792 (https://github.com/pytorch/pytorch/commit/61b49c8e41a2faf7fd40278ca72616c5d92963cb), and makes the JIT interpreter's behavior consistent whether or not `future` is set.
Test Plan: Enhanced `EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException` to cover the modified code path.
Reviewed By: singlaiiit
Differential Revision:
D30347782
fbshipit-source-id:
79ce57283154ca4372e5341217d942398db21ac8
Don Jang [Mon, 16 Aug 2021 23:50:30 +0000 (16:50 -0700)]
[Static Runtime] Fix a bug that assigns multiple outputs to single storage (#63012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63012
This change fixes a bug in which the static runtime's memory optimizer assigns multiple outputs of a node to the same storage. Fixing this bug enables the static runtime to run `inline_cvr` with its memory optimizer enabled.
A problematic line from `inline_cvr` was as follows:
```
%7767 : Tensor, %getitem_6419.1 : Tensor = fb::gather_ranges(%tensor74.1, %7764)
```
where enabling the memory optimizer assigns `%7767` and `%getitem_6419.1` to the same storage, which corrupted their data during the 2nd iteration.
This change fixes the bug by marking all inputs and outputs of a node as `alive` during our liveness analysis, so that no inputs or outputs of the same node collide with each other. I believe most ops' implementations already rely on this assumption, but our analysis did not account for it before this change.
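The liveness rule described above can be illustrated with a toy model. This is only a sketch in Python, not the static runtime's C++ implementation: the `Node` type, `live_ranges`, and `can_share_storage` are hypothetical names, and the second node in the example graph is invented for illustration.

```python
from typing import Dict, List, NamedTuple, Tuple


class Node(NamedTuple):
    """Hypothetical stand-in for a graph node: names of its inputs and outputs."""
    inputs: List[str]
    outputs: List[str]


def live_ranges(nodes: List[Node]) -> Dict[str, Tuple[int, int]]:
    """Map each value to the (first, last) node index at which it is alive."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for idx, node in enumerate(nodes):
        # Every input AND output of a node is marked alive at that node,
        # so values touched by the same node always overlap.
        for value in node.inputs + node.outputs:
            first, _ = ranges.get(value, (idx, idx))
            ranges[value] = (first, idx)
    return ranges


def can_share_storage(a: str, b: str, ranges: Dict[str, Tuple[int, int]]) -> bool:
    """Two values may share storage only if their live ranges do not overlap."""
    a_first, a_last = ranges[a]
    b_first, b_last = ranges[b]
    return a_last < b_first or b_last < a_first


# Node 0 mirrors the fb::gather_ranges line; node 1 is an invented consumer.
graph = [
    Node(["%tensor74.1", "%7764"], ["%7767", "%getitem_6419.1"]),
    Node(["%7767"], ["%out"]),
]
ranges = live_ranges(graph)
print(can_share_storage("%7767", "%getitem_6419.1", ranges))  # outputs of one node
print(can_share_storage("%getitem_6419.1", "%out", ranges))   # disjoint lifetimes
```

Under this model the two outputs of the first node can never be assigned the same storage, while values with disjoint lifetimes still can.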
Test Plan: - Added a unittest `StaticRuntime.ValuesShareSameStorageDoesNotContainOutputsFromSameNode` to cover the new code.
Reviewed By: hlu1
Differential Revision:
D30202018
fbshipit-source-id:
10287a1bee9e86be16a5201e9a7cd7c7f046bab9
Yi Wang [Mon, 16 Aug 2021 23:33:21 +0000 (16:33 -0700)]
[Model Averaging] Add a few member methods of PostLocalSGDOptimizer (#63340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63340
Some methods are needed such as accessing optimizer states. These are necessary for integration with PyTorch Lightning.
Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id:
135912246
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD
Reviewed By: rohan-varma
Differential Revision:
D30328794
fbshipit-source-id:
e585b874313bd266fdc7c79936e2af98700c7bad
Hao Lu [Mon, 16 Aug 2021 23:30:53 +0000 (16:30 -0700)]
[PyPer] Skip printing out per node time when do_profile is on (#63256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63256
This suppresses printing out the per node time which is very long when the net has too many ops. It can be easily turned on by setting `--pt_sr_print_per_node_time=1`.
Reviewed By: ajyu, mikeiovine
Differential Revision:
D30298331
fbshipit-source-id:
32b3f93b3fe19d335654168311fda93331a1e706
Amy He [Mon, 16 Aug 2021 22:42:14 +0000 (15:42 -0700)]
Refactor NnapiCompilation registration into it's own file (#63183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63183
Move registration of NnapiCompilation into its own file, so that `nnapi_bind.cpp` (which contains the implementation of NnapiCompilation) can be moved to `aten_cpu`, while maintaining the selectiveness for registration.
`nnapi_bind.cpp` is moved to `aten_cpu` in https://github.com/pytorch/pytorch/pull/62919. See the PR for more details on why it's needed.
ghstack-source-id:
135900318
Test Plan: Nnapi unit tests: `python test/test_nnapi.py`
Reviewed By: iseeyuan
Differential Revision:
D30288708
fbshipit-source-id:
6ed5967fa6bd018075469d18e68f844d413cf265
Richard Zou [Mon, 16 Aug 2021 22:35:05 +0000 (15:35 -0700)]
Add section to CONTRIBUTING.md explaining developer docs (#63228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63228
It is a quick summary and links to a page on the Developer Wiki that has
more detail.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision:
D30347109
Pulled By: zou3519
fbshipit-source-id:
a6242986d275e5279ca3f61ade2294a132d268c4
Eli Uriegas [Mon, 16 Aug 2021 22:30:24 +0000 (15:30 -0700)]
test: Add ability to set CONTINUE_THROUGH_ERROR (#63357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63357
Adds the ability to set CONTINUE_THROUGH_ERROR as an environment
variable so that we can easily set it without having to add the flag
directly
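As a rough sketch of how a script might read such a setting from the environment: the helper name and the set of accepted truthy strings below are assumptions for illustration, not the parsing used by the PyTorch test runner.

```python
import os


def continue_through_error(default: bool = False) -> bool:
    """Read CONTINUE_THROUGH_ERROR from the environment (hypothetical helper).

    The accepted truthy spellings are an assumption made for this sketch.
    """
    raw = os.environ.get("CONTINUE_THROUGH_ERROR")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

With this pattern, `CONTINUE_THROUGH_ERROR=1 python run_test.py` would enable the behavior without passing a command-line flag.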
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: astaff
Differential Revision:
D30351108
Pulled By: seemethere
fbshipit-source-id:
767fa9bd24e1399f359eb24d16f6cc985a2d7173
Bo Wang [Mon, 16 Aug 2021 22:18:01 +0000 (15:18 -0700)]
Add driver function to run test_sharded_tensor.py and test_sharding_spec.py (#63189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63189
Add main --> run_tests func in test file which is needed to launch the real test cases in OSS flow.
Test Plan:
before:
$ python test/distributed/_sharding_spec/test_sharding_spec.py --v ==> nothing happened
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v ==> nothing happened
after:
$ python test/distributed/_sharding_spec/test_sharding_spec.py --v ==>
test_chunked_sharding_spec (__main__.TestShardingSpec) ... ok
test_device_placement (__main__.TestShardingSpec) ... ok
test_enumerable_sharding_spec (__main__.TestShardingSpec) ... ok
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
test_complete_world_size (__main__.TestShardedTensorChunked) ... ok
test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... ok
test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... [W tensorpipe_agent.cpp:699] RPC agent for worker2 encountered error when reading incoming request from worker0: eof (this error originated at tensorpipe/transport/shm/connection_impl.cc:259)
ok
test_invalid_sharding (__main__.TestShardedTensorChunked) ... ok
test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... ok
test_multiple_local_shards (__main__.TestShardedTensorChunked) ... ok
test_new_group (__main__.TestShardedTensorChunked) ... ok
test_partial_world_size (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... ok
test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... ok
test_sharding_columns (__main__.TestShardedTensorChunked) ... ok
test_state_dict (__main__.TestShardedTensorChunked) ... ok
test_state_dict_new_group (__main__.TestShardedTensorChunked) ... ok
test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... ok
test_grid_sharding (__main__.TestShardedTensorEnumerable) ... ok
test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... ok
test_new_group (__main__.TestShardedTensorEnumerable) ... ok
test_partial_world_size (__main__.TestShardedTensorEnumerable) ... ok
test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... ok
test_uneven_shards (__main__.TestShardedTensorEnumerable) ... ok
test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... ok
test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... ok
test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ...
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30294094
fbshipit-source-id:
08f0431a12ea854abe00dc920205b10ba43ae6b6
Shiyan Deng [Mon, 16 Aug 2021 22:16:51 +0000 (15:16 -0700)]
[fx2trt] add unsqueeze converter (#63355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63355
Added converter for acc_ops.unsqueeze. Needed for ig model.
Didn't add support for input that has more than one dynamic dim. This is not needed right now and I feel it would be a rare case.
Test Plan: unit test
Reviewed By: yinghai
Differential Revision:
D30138293
fbshipit-source-id:
899fe8eb68387de83195a2f6e199618d96f09a9e
Mike Iovine [Mon, 16 Aug 2021 21:50:27 +0000 (14:50 -0700)]
[Static Runtime] Implement prim::TupleUnpack (#63243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63243
Add `prim::TupleUnpack` native op to static runtime.
Test Plan: Unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision:
D30306955
fbshipit-source-id:
21923d6cbd5545c144ac051b3d48b37ec6e610cf
Jerry Zhang [Mon, 16 Aug 2021 21:07:43 +0000 (14:07 -0700)]
[fx2trt] Factor out add_matrix_multiply_layer
Summary: Factor out the function so that it can be reused in future diffs
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_matmul
Reviewed By:
842974287
Differential Revision:
D30322823
fbshipit-source-id:
069b945de2c744cdbcca1618b62827692dfb4174
MY_ [Mon, 16 Aug 2021 21:07:06 +0000 (14:07 -0700)]
A re-open PR: Avoid re-creating the random number generator in RandomSampler (#63026)
Summary:
More details can be found in the old pr: https://github.com/pytorch/pytorch/pull/53085
ejguan Thanks for your guidance. I tried to reopen this PR following your instructions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63026
Reviewed By: anjali411
Differential Revision:
D30224920
Pulled By: ejguan
fbshipit-source-id:
2fa83bd4a2661485e553447fe3e57ce723f2716d
Nikita Shulga [Mon, 16 Aug 2021 20:50:44 +0000 (13:50 -0700)]
Improve pip package determination (#63321)
Summary:
Invoking `pip` or `pip3` lists the packages for whichever `pip` alias is first on the path, rather than for the interpreter currently being executed. Changed `get_pip_packages` to use `sys.executable + '-mpip'`
Also, add mypy to the list of packages of interest
Discovered while looking at https://github.com/pytorch/pytorch/issues/63279
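The idea of asking the running interpreter for its own packages can be sketched as follows. This is not the body of `get_pip_packages`; `pip_list_command` is a hypothetical helper, and the choice of `pip list --format=freeze` is an assumption.

```python
import sys
from typing import List


def pip_list_command() -> List[str]:
    """Build a pip invocation bound to the interpreter that is running now.

    Using `sys.executable -m pip` avoids picking up an unrelated `pip`
    that happens to come first on PATH.
    """
    return [sys.executable, "-m", "pip", "list", "--format=freeze"]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` to collect the package list for the current environment.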
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63321
Reviewed By: walterddr
Differential Revision:
D30342099
Pulled By: malfet
fbshipit-source-id:
fc8d17cf2ddcf18236cfde5c1b9edb4e72804ee0
Lucas Kabela [Mon, 16 Aug 2021 20:34:56 +0000 (13:34 -0700)]
[Profiler] Change FLOP/s to Total FLOPs (#62779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62779
Change from floating point operations per second to total floating point operations. This requires removing the division by execution time from the Kineto computed FLOPs and updating the necessary documentation.
Test Plan:
Running the following script:
```
import torch
from torch.profiler import profile
import torchvision.models as models

model = models.resnet18().eval()
inputs = torch.randn(5, 3, 224, 224)
with torch.no_grad():
    with profile(record_shapes=True, with_flops=True) as prof:
        model(inputs)
print(prof.key_averages().table(sort_by="cpu_time_total"))
```
Before diff results in:
{F636640118}
And after diff should be about `(27.78 * 10^9) FLOP/s * .652838 seconds = 18135839640 FLOP = 18.136 GFLOP`. Running the script again yields this answer:
{F636655686}
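The conversion quoted in the test plan is plain arithmetic: a rate multiplied by the elapsed time gives a total. A minimal sketch, with `total_flops` as a hypothetical helper name and the two numbers taken from the text above:

```python
def total_flops(flops_per_second: float, seconds: float) -> float:
    """Convert a FLOP/s rate and an execution time into a total FLOP count."""
    return flops_per_second * seconds


flops = total_flops(27.78e9, 0.652838)
print(f"{flops / 1e9:.3f} GFLOP")  # prints "18.136 GFLOP"
```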
------------------------------------
Reviewed By: gdankel
Differential Revision:
D29972997
fbshipit-source-id:
0f8d9f264b7d9f8f6bb3f10ab7c2c9794291e28b
zhouzhuojie [Mon, 16 Aug 2021 20:32:40 +0000 (13:32 -0700)]
Fix triage workflow when the card already exists in project (#63347)
Summary:
Fixes issues like https://github.com/pytorch/pytorch/runs/3336787242
```
RequestError [HttpError]: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
Error: Unhandled error: HttpError: Validation Failed: {"resource":"ProjectCard","code":"unprocessable","field":"data","message":"Project already has the associated issue"}
at /home/runner/work/_actions/actions/github-script/v2/dist/index.js:7531:23
at processTicksAndRejections (internal/process/task_queues.js:93:5)
at async eval (eval at callAsyncFunction (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:7985:56), <anonymous>:63:1)
at async main (/home/runner/work/_actions/actions/github-script/v2/dist/index.js:8011:20) {
name: 'HttpError',
status: 422,
...
```
The card may already exist, so a `422` status code needs no further handling; any other error is re-thrown.
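The actual workflow runs JavaScript through `actions/github-script`; the Python below is only an analogue of the same control flow. `HttpError` and `add_card_ignoring_duplicates` are hypothetical names chosen to mirror the log above.

```python
from typing import Callable


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def add_card_ignoring_duplicates(create_card: Callable[[], None]) -> bool:
    """Call create_card(); return False if the card already existed (422)."""
    try:
        create_card()
    except HttpError as err:
        if err.status == 422:
            # The project already has a card for this issue: nothing to do.
            return False
        raise  # Any other failure is still surfaced to the caller.
    return True
```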
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63347
Reviewed By: malfet
Differential Revision:
D30348529
Pulled By: zhouzhuojie
fbshipit-source-id:
36647837bfccad43ce01eb5dfe6642e685615037
kshitij12345 [Mon, 16 Aug 2021 20:26:46 +0000 (13:26 -0700)]
[opinfo] nn.functional.pad (#62814)
Summary:
Reference: https://github.com/facebookresearch/functorch/issues/78
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62814
Reviewed By: VitalyFedyunin
Differential Revision:
D30307492
Pulled By: zou3519
fbshipit-source-id:
4f6062eb4a3c91ed1795df1f82846afa0abafcdc
Sam Estep [Mon, 16 Aug 2021 20:20:59 +0000 (13:20 -0700)]
Add expecttest to requirements.txt (#63320)
Summary:
This PR closes the developer environment gap left by https://github.com/pytorch/pytorch/issues/60658 by adding [expecttest](https://github.com/ezyang/expecttest) to `requirements.txt`. Thus it provides a solution to one of the short-term problems that https://github.com/pytorch/pytorch/issues/60697 tries to solve, but does not provide a long-term solution to https://github.com/pytorch/pytorch/issues/61375.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63320
Reviewed By: malfet
Differential Revision:
D30340654
Pulled By: samestep
fbshipit-source-id:
26c8f8c9889cce4a94fafb1bf2f0d6df4c70503f
kyshel [Mon, 16 Aug 2021 19:12:45 +0000 (12:12 -0700)]
add comma to prevent syntax errors (#62492)
Summary:
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62492
Reviewed By: VitalyFedyunin
Differential Revision:
D30304684
Pulled By: ezyang
fbshipit-source-id:
db08ca39bcecbfd79ea50df18536bf4e87f51e15
Bert Maher [Mon, 16 Aug 2021 19:10:50 +0000 (12:10 -0700)]
Retry apt-get during setup_ci_workspace (#63319)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63319
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision:
D30346067
Pulled By: bertmaher
fbshipit-source-id:
2aafa97e78f9297553d772b2524d6f1c0ebaa46e
Nikita Vedeneev [Mon, 16 Aug 2021 18:39:04 +0000 (11:39 -0700)]
Make `torch.lu` differentiable for wide/tall inputs + jit (#61564)
Summary:
As per title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61564
Reviewed By: astaff
Differential Revision:
D30338136
Pulled By: mruberry
fbshipit-source-id:
f01436fc90980544cdfa270feee16bb3dda21b93
Yi Wang [Mon, 16 Aug 2021 17:05:47 +0000 (10:05 -0700)]
[Model Averaging] Allow subgroup to be None in PostLocalSGDState (#63277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63277
`PostLocalSGDState` requires a subgroup. To initialize this subgroup, a global process group must be initialized. However, this imposes a restriction that a hook state can only be provided after distributed environment initialization, which is not compatible with lightning DDP plugin setup where hook state should be provided before distributed environment initialization.
Proposal: https://github.com/pytorch/pytorch/issues/59699
ghstack-source-id:
135848575
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_ddp_hook_parity_post_localSGD
Reviewed By: cbalioglu
Differential Revision:
D30325041
fbshipit-source-id:
7b870166d096d306c3f2f7c69816a705cec0bebd
Meghan Lele [Mon, 16 Aug 2021 16:12:57 +0000 (09:12 -0700)]
Revert "[docs] Update docs for NegativeBinomial (#45693)" (#63192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63192
**Summary**
This reverts commit 402caaeba513929dcfe12df183c764b0ef43f688. As per the
discussion in #62178, this commit was not needed.
**Test Plan**
Continuous integration.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30293202
Pulled By: SplitInfinity
fbshipit-source-id:
91ee7ad0523a9880605d83fe9712c39df67384a8
Erjia Guan [Mon, 16 Aug 2021 13:39:56 +0000 (06:39 -0700)]
Refactor BucketBatch (#63185)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63185
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision:
D30288893
Pulled By: ejguan
fbshipit-source-id:
b88b792d12a83c99d8ea9e516e3b4c54a82100f6
Erjia Guan [Mon, 16 Aug 2021 13:39:56 +0000 (06:39 -0700)]
Replace str by repr for DataChunk (#63184)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63184
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision:
D30288892
Pulled By: ejguan
fbshipit-source-id:
45c88fdd3987e234f2c22ebbbfd8d5044983c34c
Raghavan Raman [Mon, 16 Aug 2021 07:07:51 +0000 (00:07 -0700)]
[nnc] Updated IRMutator and IRSimplifier to perform in-place mutations. (#63246)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63246
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision:
D30309636
Pulled By: navahgar
fbshipit-source-id:
409ea8d6982888cfee9127e6248044dd2ed9d8d4
Supriya Rao [Mon, 16 Aug 2021 05:44:44 +0000 (22:44 -0700)]
[docs][ao] Add overload information for fake_quantize_per_tensor_affine (#63258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63258
This function supports scalar and tensor qparams
Test Plan:
CI
Imported from OSS
Reviewed By: jerryzh168
Differential Revision:
D30316432
fbshipit-source-id:
8b2f5582e7e095fdda22c17d178abcbc89a2d1fc
Supriya Rao [Mon, 16 Aug 2021 05:44:44 +0000 (22:44 -0700)]
[docs][ao] Add missing docstrings for quantized_max_pool1d and quantized_max_pool2d (#63242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63242
These functions are part of the native functions namespace as well as the quantized namespace
Test Plan:
CI
Imported from OSS
Reviewed By: jerryzh168
Differential Revision:
D30316430
fbshipit-source-id:
cd9c839e5c1a961e3c6944e514c16fbc256a2f0c
Supriya Rao [Mon, 16 Aug 2021 05:44:44 +0000 (22:44 -0700)]
[docs][ao] Add missing documentation for torch.quantized_batch_norm (#63240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63240
Op is exposed via torch.quantized_batch_norm to the end user without any existing documentation
Test Plan:
CI
Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30316431
fbshipit-source-id:
bf2dc8b7b6f497cf73528eaa2bedef9f65029d84
Heitor Schueroff [Mon, 16 Aug 2021 01:06:41 +0000 (18:06 -0700)]
[OpInfo] Add expected_failure kwarg to SkipInfo (#62963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62963
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30327199
Pulled By: heitorschueroff
fbshipit-source-id:
45231eca11d1697a4449d79849fb17264d128a6b
Heitor Schueroff [Mon, 16 Aug 2021 01:06:41 +0000 (18:06 -0700)]
Small refactor for OpInfo decorators (#62713)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62713
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30327200
Pulled By: heitorschueroff
fbshipit-source-id:
1899293990c8c0a66da88646714b38f1aae9179d
Kimish Patel [Sun, 15 Aug 2021 23:12:47 +0000 (16:12 -0700)]
[Pytorch Edge] Fix broken test post changes in error reporting format. (#63287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63287
Recent changes in https://github.com/pytorch/pytorch/pull/62419 changed
the way module hierarchy is reported. Now it includes information about
function names as well.
Test Plan:
python test/mobile/test_lite_script_module.py
TestLiteScriptModule.test_save_mobile_module_with_debug_info_with_trace
Imported from OSS
Reviewed By: iseeyuan
Differential Revision:
D30328512
fbshipit-source-id:
ddd6b11b9ab01cc725f4568a35eff7a92f17204b
Ilqar Ramazanli [Sun, 15 Aug 2021 19:30:18 +0000 (12:30 -0700)]
To add warm-up scheduler to optim (#60836)
Summary:
Learning rate warm-up was initially discussed by Priya Goyal et al. in the paper: https://arxiv.org/pdf/1706.02677.pdf .
In section 2.2 of the paper they proposed warming up the learning rate in order to prevent large variance / noise in the learning rate. The idea has been further discussed in the following papers:
* Akilesh Gotmare et al. https://arxiv.org/abs/1810.13243
* Bernstein et al http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
* Liyuan Liu et al: https://arxiv.org/pdf/1908.03265.pdf
There are two popular types of learning rate warm-up:
* Constant warmup (start with a very small constant learning rate)
* Linear warmup (start with a small learning rate and gradually increase it)
In this PR we add warm-up as a learning rate scheduler. Note that learning rate schedulers are chainable, which means that the warm-up scheduler can be combined with any other learning rate scheduler to build a more sophisticated one.
## Linear Warmup
Linear warmup multiplies the learning rate by a predefined constant, warmup_factor, in the first epoch (epoch 0), then increases this multiplier to one over warmup_iters epochs. Hence at the i-th step the multiplier equals:
warmup_factor + (1-warmup_factor) * i / warmup_iters
Moreover, the ratio of this quantity at step i to its value at step i-1 is
1 + (1.0 - warmup_factor) / [warmup_iters*warmup_factor+(i-1)*(1-warmup_factor)]
which is used in the get_lr() method in our implementation. Below is an example showing how to use the linear warmup scheduler and how it behaves.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=10, warmup_method="linear")
for epoch in range(15):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.019000000000000003
2 0.028000000000000008
3 0.03700000000000001
4 0.04600000000000001
5 0.055000000000000014
6 0.06400000000000002
7 0.07300000000000002
8 0.08200000000000003
9 0.09100000000000004
10 0.10000000000000005
11 0.10000000000000005
12 0.10000000000000005
13 0.10000000000000005
14 0.10000000000000005
```
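The two linear warmup formulas quoted in this summary can be cross-checked without PyTorch. The sketch below is not the scheduler's code: `linear_multiplier` and `step_ratio` are hypothetical helper names, and the parameter values simply reuse those from the example (base learning rate 0.1, warmup_factor 0.1, warmup_iters 10).

```python
def linear_multiplier(i: int, warmup_factor: float, warmup_iters: int) -> float:
    """Closed form: multiplier applied to the base learning rate at step i."""
    if i >= warmup_iters:
        return 1.0
    return warmup_factor + (1 - warmup_factor) * i / warmup_iters


def step_ratio(i: int, warmup_factor: float, warmup_iters: int) -> float:
    """Recursive form: multiplier at step i divided by multiplier at step i-1."""
    return 1 + (1.0 - warmup_factor) / (
        warmup_iters * warmup_factor + (i - 1) * (1 - warmup_factor)
    )


base_lr, warmup_factor, warmup_iters = 0.1, 0.1, 10

# Build the schedule by repeatedly applying the per-step ratio.
lrs = [base_lr * warmup_factor]
for i in range(1, warmup_iters + 1):
    lrs.append(lrs[-1] * step_ratio(i, warmup_factor, warmup_iters))

# Compare against the closed form at every step.
for i, lr in enumerate(lrs):
    closed = base_lr * linear_multiplier(i, warmup_factor, warmup_iters)
    print(i, round(lr, 6), round(closed, 6))
```

Both columns agree up to floating-point rounding, which is why a chainable scheduler can apply the ratio incrementally instead of recomputing the closed form.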
## Constant Warmup
Constant warmup is straightforward: multiply the learning rate by warmup_factor until epoch warmup_iters is reached, then leave it unchanged for the following epochs.
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR

model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
for epoch in range(10):
    print(epoch, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```
```
0 0.010000000000000002
1 0.010000000000000002
2 0.010000000000000002
3 0.010000000000000002
4 0.010000000000000002
5 0.10000000000000002
6 0.10000000000000002
7 0.10000000000000002
8 0.10000000000000002
9 0.10000000000000002
```
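The constant warmup rule reduces to a single comparison. A minimal PyTorch-free sketch, assuming the same parameters as the example above; `constant_warmup_lr` is a hypothetical helper, not the scheduler's code:

```python
def constant_warmup_lr(
    epoch: int, base_lr: float, warmup_factor: float, warmup_iters: int
) -> float:
    """Scaled learning rate during warm-up, the unscaled base rate afterwards."""
    if epoch < warmup_iters:
        return base_lr * warmup_factor
    return base_lr


schedule = [constant_warmup_lr(e, 0.1, 0.1, 5) for e in range(10)]
```

Here `schedule` holds the scaled rate for epochs 0 through 4 and the base rate from epoch 5 onward, matching the shape of the printed output up to floating-point rounding.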
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60836
Reviewed By: saketh-are
Differential Revision:
D29537615
Pulled By: iramazanli
fbshipit-source-id:
d910946027acc52663b301f9c56ade686e62cb69
Shiyan Deng [Sun, 15 Aug 2021 18:52:20 +0000 (11:52 -0700)]
Move fx2trt and oss_acc_tracer to oss (#63101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63101
Move internal fx2trt to torch/fx/experimental/fx2trt and merge the two TRT interpreters we have right now. cc: mortzur as this might affect uru exporting script.
Move oss_acc_tracer to torch/fx/experimental/fx_acc.
Test Plan: CI
Reviewed By: jerryzh168
Differential Revision:
D30257909
fbshipit-source-id:
4e374965fbf88d72e91844d9e9b6ff9b98f467d1
Bert Maher [Sun, 15 Aug 2021 18:28:23 +0000 (11:28 -0700)]
Hide all symbols in llvm namespace (#63272)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63272
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision:
D30331695
Pulled By: bertmaher
fbshipit-source-id:
d35130c96f7e2a31fa86d9d80de59002e96301df
anjali411 [Sun, 15 Aug 2021 13:22:53 +0000 (06:22 -0700)]
Add copy button to code snippets in docs (#63149)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63149
Test Plan: Imported from OSS
Reviewed By: navahgar, albanD
Differential Revision:
D30308891
Pulled By: anjali411
fbshipit-source-id:
ad51180ab2f27c4525682b2603bbf753bb8f1ce9
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Edge] Enable kineto profiler on mobile via EdgeKinetoProfiler (#62419)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62419
This diff adds support for the CPU-only Kineto profiler on mobile, thus
enabling Chrome trace generation on mobile. This brings the C++ API for
mobile profiling on par with TorchScript.
This is done via:
1. Utilizing debug handle annotations in KinetoEvent.
2. Adding post-processing capability, via callbacks, to
KinetoThreadLocalState.
3. Creating a new RAII-style profiler, KinetoEdgeCPUProfiler, which can be
used in the surrounding scope of model execution. This will write a Chrome
trace to the location specified in the profiler constructor.
Test Plan:
MobileProfiler.ModuleHierarchy
Imported from OSS
Reviewed By: raziel
Differential Revision:
D29993660
fbshipit-source-id:
0b44f52f9e9c5f5aff81ebbd9273c254c3c03299
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Mobile] Combing instructions and debug hanles in single struct (#62418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62418
Debug handles have a one-to-one correspondence with instructions, so just
combine them into one struct.
Test Plan:
CI
Imported from OSS
Reviewed By: raziel
Differential Revision:
D29993661
fbshipit-source-id:
125c7163174cf66624dd95f110fdc8208fea8a07
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Profiler] Introduce scopes to enableProfiler (#62417)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62417
This diff adds an option to make enableProfiler enable callbacks only
for certain RecordScopes.
Why?
Profiling has some overhead when we repeatedly execute callbacks for
all scopes. On the mobile side, where we often have small quantized models,
this overhead can be large. We observed that by profiling only the top-level
op and skipping the ATen ops called within it, we can limit
this overhead. For example, we profile at::conv2d but skip at::convolution,
at::convolution_, and any further ops such as transpose that are called
within. Of course this limits visibility, but
at least this way we get a choice.
Test Plan: Imported from OSS
Reviewed By: ilia-cher
Differential Revision:
D29993659
fbshipit-source-id:
852d3ae7822f0d94dc6e507bd4019b60d488ef69
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Profiler] Add debug_handles to KinetoEvent (#62228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62228
This diff adds debug handles to events and provides a way to use
RECORD_FUNCTIONs that will pass debug_handles down to profiler, which
will record it in the events.
Why add debug_handles?
For PyTorch mobile, with the lite interpreter, we generate debug handles
that can be used to lazily symbolicate exception traces into a model-level
stack trace, similar to the model-level stack trace you get in
TorchScript models. The debug_handles also enable getting the module
hierarchy for a lite interpreter model, support for which was added to
KinetoProfiler in previous diffs.
Followup plan:
1. Enable scope callbacks such that the lite interpreter can use them to
profile only top-level ops.
2. Enable post-processing callbacks that take KinetoEvents and populate
module hierarchy using debug handles.
This will let us use KinetoProfiler for lite interpreter use cases on
mobile. The aim is to use an RAII guard to similarly generate a Chrome trace for
mobile use cases as well, although only for top-level ops.
Test Plan:
test_misc : RecordDebugHandles.Basic
Imported from OSS
Reviewed By: ilia-cher
Differential Revision:
D29935899
fbshipit-source-id:
4f06dc411b6b5fe0ffaebdd26d3274c96f8f389b
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Profiler] Move start timestamp to end of start callback (#62191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62191
This moves start timestamping to the end of the callback. This way we don't
account for callstack/module hierarchy related overhead in op runtime.
Test Plan:
CI
Imported from OSS
Reviewed By: ilia-cher
Differential Revision:
D29910519
fbshipit-source-id:
f462031a81ae12b3db7993cf482e5ad93a35e096
Kimish Patel [Sat, 14 Aug 2021 04:37:57 +0000 (21:37 -0700)]
[Pytorch Profiler] Add support for adding module hierarchy to (#61792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61792
KinetoEvent
This PR adds module hierarchy information to events.
What is module hierarchy information attached to events?
During profiling a TorchScript module, when events are added, we ask JIT
what is the module hierarchy associated with the node being
executed. At the time of execution of that node, there might be multiple
frames in the stack of interpreter. For each frame, we find
corresponding node and the corresponding module hierarchy is queried.
Module hierarchy corresponding to the node is associated with node's
InlinedCallStack. InlinedCallStack of node tracks the path via which the
node is inlined. Thus during the inlining process we annotate
module information corresponding to the CallMethod nodes being inlined.
With this PR, chrome trace will contain additional metadata:
"Module Hierarchy". This can look like this:
TOP(ResNet)::forward.SELF(ResNet)::_forward_impl.layer1(Sequential)::forward.0(BasicBlock)::forward.conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward
It contains module instance, type name and the method name in the
callstack.
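A "Module Hierarchy" string of the shape shown above can be split into its frames with a small regular expression. This is only a sketch: `parse_module_hierarchy` is a hypothetical helper, and it assumes instance, type, and method names contain no dots or parentheses, which holds for the sample string but is not guaranteed by the commit.

```python
import re
from typing import List, Tuple

# One frame looks like instance(Type)::method; frames are joined by ".".
FRAME = re.compile(r"(\w+)\((\w+)\)::(\w+)")


def parse_module_hierarchy(hierarchy: str) -> List[Tuple[str, str, str]]:
    """Return (module instance, type name, method name) for each frame."""
    return FRAME.findall(hierarchy)


sample = (
    "TOP(ResNet)::forward.SELF(ResNet)::_forward_impl."
    "layer1(Sequential)::forward.0(BasicBlock)::forward."
    "conv1(Conv2d)::forward.SELF(Conv2d)::_conv_forward"
)
for instance, type_name, method in parse_module_hierarchy(sample):
    print(f"{instance:8} {type_name:12} {method}")
```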
Test Plan:
test_profiler
Imported from OSS
Reviewed By: raziel, ilia-cher
Differential Revision:
D29745442
fbshipit-source-id:
dc8dfaf7c5b8ab256ff0b2ef1e5ec265ca366528
leslie-fang-intel [Sat, 14 Aug 2021 03:49:27 +0000 (20:49 -0700)]
add substract of max and testcase (#63132)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/62897, the BF16/non-last-dim Softmax path omits the subtraction of the max value, which causes overflow in the `exp()` calculation when input tensor values are large, such as `1000.0`.
To avoid this issue, this PR adds the max-value subtraction and the corresponding test cases.
Note that without the max-value subtraction (e.g. after accidental reverts or changes), the test case fails with the following error message:
```
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.05 and atol=0.05, found 103984 element(s) (out of 126720) whose difference(s) exceeded the margin of error (including 103984 nan comparisons). The greatest difference was nan (0.0 vs. nan), which occurred at index (0, 0, 0, 1).
```
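The effect of the max subtraction can be reproduced outside the kernel. The following is a minimal pure-Python sketch, not the code changed in this PR; `naive_softmax` and `stable_softmax` are illustrative names. In Python, `math.exp` raises `OverflowError` for large inputs, whereas floating-point tensor kernels typically produce `inf` and then `nan` after normalization, which matches the `nan` comparisons in the message above.

```python
import math
from typing import List


def naive_softmax(xs: List[float]) -> List[float]:
    # exp() of a large input overflows before normalization can help.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stable_softmax(xs: List[float]) -> List[float]:
    # Subtracting the max does not change the mathematical result,
    # but it keeps every exponent <= 0, so exp() stays within (0, 1].
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print(stable_softmax([1000.0, 1000.0]))
    try:
        naive_softmax([1000.0, 1000.0])
    except OverflowError as err:
        print("naive version overflowed:", err)
```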
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63132
Reviewed By: VitalyFedyunin
Differential Revision:
D30280792
Pulled By: cpuhrsch
fbshipit-source-id:
722821debf983bbb4fec878975fa8a4da0d1d866
Kushashwa Ravi Shrimali [Sat, 14 Aug 2021 00:10:07 +0000 (17:10 -0700)]
OpInfo: `nn.functional.conv_transpose2d` (#62882)
Summary:
See https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.
cc: mruberry zou3519 Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62882
Reviewed By: bdhirsh
Differential Revision:
D30280804
Pulled By: zou3519
fbshipit-source-id:
e40cdf43e98c1f11e45df6b8bc13110b4d29c45f
Kefei Lu [Fri, 13 Aug 2021 23:57:47 +0000 (16:57 -0700)]
refactor fx2trt example script so it can be imported as a library (#63262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63262
Just create a `__main__` guard.
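For readers unfamiliar with the pattern, a `__main__` guard looks roughly like the sketch below. `run_example` is a placeholder name and does not reflect the actual contents of the fx2trt example script.

```python
def run_example() -> str:
    # Placeholder for the work the script performs when run directly.
    return "example finished"


if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # not when it is imported as a library.
    print(run_example())
```

With the guard in place, importing the module makes `run_example` available to callers without triggering the script's side effects.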
Test Plan: run linter, sandcastle tests
Reviewed By:
842974287
Differential Revision:
D30263617
fbshipit-source-id:
8044ce5d815b043c3778591384cb13d9a89d0048
Hanton Yang [Fri, 13 Aug 2021 23:20:22 +0000 (16:20 -0700)]
[iOS] Add `LibTorch-Lite-Nightly` pod (#63239)
Summary:
D30090760 (https://github.com/pytorch/pytorch/commit/e182b459d94fe77c1d9f623c94fc2621c8cc55de) was reverted by D30303292 because of a lint issue in `LibTorch-Lite-Nightly.podspec.template`. Resubmit the diff after fixing the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63239
Test Plan: Imported from OSS
Reviewed By: xta0
Differential Revision:
D30315690
Pulled By: hanton
fbshipit-source-id:
f0fa719ffc3b8181ab28c123584ae5c1da8992c0
Sameer Deshmukh [Fri, 13 Aug 2021 23:08:01 +0000 (16:08 -0700)]
Allow TransformerEncoder and TransformerDecoder to accept 0-dim batch sized tensors. (#62800)
Summary:
This PR fixes part of https://github.com/pytorch/pytorch/issues/12013, which is summarized concretely in https://github.com/pytorch/pytorch/issues/38115.
This PR allows TransformerEncoder and TransformerDecoder (along with the inner `Layer` classes) to accept inputs whose batch dimension has size 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62800
Reviewed By: VitalyFedyunin
Differential Revision:
D30303240
Pulled By: jbschlosser
fbshipit-source-id:
8f8082a6f2a9f9d7ce0b22a942d286d5db62bd12
Pruthvi Madugundu [Fri, 13 Aug 2021 21:57:17 +0000 (14:57 -0700)]
[ROCm] Update HIP_VERSION to TORCH_HIP_VERSION (#62786)
Summary:
- The semantic versioning of HIP_VERSION will change in ROCm 4.3. These changes remove the dependency on the HIP_VERSION value provided by the HIP header, so that the code stays compatible with both older and newer versions of ROCm.
- TORCH_HIP_VERSION is derived from HIP_VERSION_MAJOR and HIP_VERSION_MINOR
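
The summary does not spell out how the two components are combined. Purely as an assumption for illustration, the sketch below encodes the version as `major * 100 + minor`, which lets version checks use plain integer comparison; `torch_hip_version` is a hypothetical Python helper, not the actual macro.

```python
def torch_hip_version(major: int, minor: int) -> int:
    # Assumed encoding (not confirmed by the summary above): major * 100 + minor.
    return major * 100 + minor


if __name__ == "__main__":
    # Under this assumption, ROCm 4.3 maps to 403 and compares above 4.2 (402).
    print(torch_hip_version(4, 3), torch_hip_version(4, 3) > torch_hip_version(4, 2))
```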
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62786
Reviewed By: bdhirsh
Differential Revision:
D30281682
Pulled By: seemethere
fbshipit-source-id:
e41e69fb9e13de5ddd1af99ba5bbdcbb7b64b673
Can Balioglu [Fri, 13 Aug 2021 20:47:37 +0000 (13:47 -0700)]
Respect user-set CMAKE_PREFIX_PATH (#61904)
Summary:
Fixes the case where the `CMAKE_PREFIX_PATH` variable gets silently overwritten by a user-specified environment variable.
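The summary does not describe how the two values are reconciled. One plausible approach, shown here only as an assumption and not as the change made in this PR, is to merge the user's environment value with the build's own prefix instead of letting one replace the other. `merged_prefix_path` and its parameters are hypothetical names.

```python
import os
from typing import Mapping


def merged_prefix_path(internal_prefix: str, env: Mapping[str, str]) -> str:
    """Combine a user-set CMAKE_PREFIX_PATH with an internally computed prefix.

    The environment variable uses the platform path separator, while the
    CMake variable is a semicolon-separated list.
    """
    user_value = env.get("CMAKE_PREFIX_PATH", "")
    parts = [p for p in user_value.split(os.pathsep) if p]
    # Keep the user's entries first so they take precedence during lookup.
    if internal_prefix and internal_prefix not in parts:
        parts.append(internal_prefix)
    return ";".join(parts)


if __name__ == "__main__":
    print(merged_prefix_path("/opt/python/site-packages", dict(os.environ)))
```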
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61904
Reviewed By: walterddr, malfet
Differential Revision:
D29792014
Pulled By: cbalioglu
fbshipit-source-id:
babacc8d5a1490bff1e14247850cc00c6ba9e6be
gmagogsfm [Fri, 13 Aug 2021 20:06:08 +0000 (13:06 -0700)]
Remove left-over print in test_diff_graph_inline_threshold (#63231)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63231
Reviewed By: VitalyFedyunin
Differential Revision:
D30305851
Pulled By: gmagogsfm
fbshipit-source-id:
43da3b5f49ad4a6a2d6d174acf792f3ccf41a463