David Riazati [Tue, 7 Sep 2021 22:14:05 +0000 (15:14 -0700)]
Update explicit_ci_jobs to work with GHA (#64598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64598
This adds a filter option rather than an all-or-nothing so it's easier to iterate on a specific job.
```bash
python tools/testing/explicit_ci_jobs.py --filter-gha '*generated-linux-*gcc5.4*'
```
See #64600 for an example usage
NB: If you regenerate the workflows you will need to re-run that command to re-delete everything.
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision:
D30788850
Pulled By: driazati
fbshipit-source-id:
a32c266bbd876c396665bceef9a0a961b4586564
Nikita Shulga [Tue, 7 Sep 2021 22:09:39 +0000 (15:09 -0700)]
Move ParallelTBB to GHA (take 2) (#64193)
Summary:
2nd attempt to do the same
Skip failing `TestTensorCreationCPU.test_trilu_indices_cpu`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64193
Reviewed By: mrshenli
Differential Revision:
D30779469
Pulled By: malfet
fbshipit-source-id:
5c51fcbb383d0823d0e953d7af181b5f22eda9ab
Mike Iovine [Tue, 7 Sep 2021 21:58:34 +0000 (14:58 -0700)]
[Static Runtime] Add first iter metric (#64457)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64457
The first iteration is special since it initializes the memory planner. This change logs and reports first iteration time during benchmarking. It also generates a FAI-PEP output when `generate_ai_pep_output` is set.
Test Plan:
Run any benchmark, and observe:
```
I0902 15:19:32.528977 2492358 impl.cpp:948] PyTorchObserver {"value":6.415958881378174,"unit":"ms","metric":"latency","type":"static_runtime_first_iter"}
...
First iter time: 6.41596 ms
```
Note that this metric is likely to have significantly more noise than the others since we don't have as many data points.
Unit tests: `buck test //caffe2/test:static_runtime`
Reviewed By: d1jang
Differential Revision:
D30740619
fbshipit-source-id:
4dcfccd5629f4fa34254fd355073ef19e151245a
Wenliang Zhao [Tue, 7 Sep 2021 21:09:31 +0000 (14:09 -0700)]
add bundle input into AIBench (#64557)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64557
MaskRCNN speed depends on how many people are detected in the detection stage, and a random input from the dataloader doesn't capture this. In order to standardize the benchmarking, we use 2 standard images, containing 2 and 3 people respectively.
Test Plan: AIBench result: https://www.internalfb.com/intern/aibench/details/
945883114818980
Reviewed By: axitkhurana
Differential Revision:
D30446049
fbshipit-source-id:
a2826fdb69e9f840c0afc566c4cbbcde1c2fba89
Facebook Community Bot [Tue, 7 Sep 2021 21:08:25 +0000 (14:08 -0700)]
Automated submodule update: FBGEMM (#64582)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: https://github.com/pytorch/FBGEMM/commit/3ce04fc664beaa1cba1ae0a072c8db99c4ac91de
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64582
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: mrshenli
Differential Revision:
D30779695
fbshipit-source-id:
22460a4047e2462e672eb4931e44648ae6bde627
haozhe.zhu [Tue, 7 Sep 2021 19:59:00 +0000 (12:59 -0700)]
enable bf16 mkldnn path for gemm (#61891)
Summary:
# Goal: Integrate mkldnn bf16 Gemm to pytorch
## BF16 Support for mm, addmm, bmm, addbmm, baddbmm, mv, addmv, dot (with mkldnn matmul primitive):
https://oneapi-src.github.io/oneDNN/group__dnnl__api__matmul.html
For gemm-related ops, we keep all inputs in plain format, so we do not introduce opaque tensors for these ops, saving a memory copy here.
![mkldnn bf16 gemm integration](https://user-images.githubusercontent.com/54701539/126263077-4b5134e1-52a7-4fad-94fb-19e13a0377f6.png)
The minimal integration would only dispatch to mkldnn in addmm, but for gemm with 3-D input (with an additional dim for "batch") this would call mkldnn gemm "batch" times. Since mkldnn matmul supports inputs with multiple dims, we directly dispatch to mkldnn gemm in {bmm, addbmm, baddbmm} to reduce the time spent creating mkldnn memory descriptors, primitives, etc.
To handle the different definitions of "bias" between mkldnn (which must have shape (1, N)) and pytorch (which can have the same shape as the gemm result (M, N)), we use a fused sum.
## User Case:
The user-facing behavior is exactly the same as before because no opaque tensors are introduced. Since PyTorch already supports the bf16 data type for CPU tensors, we can leverage the existing bf16 gemm unit tests.
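As a rough illustration (not from this PR; the shapes and the batched call are illustrative assumptions), the path can be exercised with plain bf16 CPU tensors:
```python
import torch

# Minimal sketch: plain-format bf16 CPU tensors flowing through the gemm ops
# that this change routes to the mkldnn matmul primitive on supported hardware.
M = torch.randn(128, 128, dtype=torch.bfloat16)
a = torch.randn(128, 128, dtype=torch.bfloat16)
b = torch.randn(128, 128, dtype=torch.bfloat16)

out = torch.addmm(M, a, b)                                     # (M, K) @ (K, N) + (M, N)
batched = torch.bmm(a.expand(4, -1, -1), b.expand(4, -1, -1))  # multi-dim matmul instead of "batch" separate gemms
print(out.dtype, batched.shape)
```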
## Gemm performance gain on CPX 28Cores/Socket:
Note: data is collected using PyTorch operator benchmarks: https://github.com/pytorch/pytorch/tree/master/benchmarks/operator_benchmark (with adding bfloat16 dtype)
### use 1 thread on 1 core
### torch.addmm (M, N) * (N, K) + (M, K)
| impl |16x16x16|32x32x32| 64x64x64 | 128x128x128| 256x256x256| 512x512x512|1024x1024x1024|
|:---:|:---:| :---: | :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 4.115us|4.583us|8.230us|26.972us|211.857us|1.458ms|11.258ms|
| aten-bf16 | 15.812us| 105.087us|801.787us|3.767ms|20.274ms|122.440ms|836.453ms|
| mkldnn-bf16 |20.561us |22.510us|24.551us|37.709us|143.571us|0.835ms|5.76ms|
We can see mkldnn-bf16 is better than aten-bf16, but for smaller shapes mkldnn-bf16 is not better than aten-fp32. This is due to oneDNN overhead, which behaves like a "constant" cost; as problems get larger it becomes negligible. We also continue to optimize kernel efficiency and decrease this overhead.
More shapes
| impl |1x2048x2048|2048x1x2048| 2048x2048x1 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.640ms|3.794ms|0.641ms|
| aten-bf16 | 2.924ms| 3.868ms|23.413ms|
| mkldnn-bf16 |0.335ms |4.490ms|0.368ms|
### use 1 socket (28 thread, 28 core)
| impl | 256x256x256| 512x512x512|1024x1024x1024| 2048x2048x2048|4096x4096x4096|
|:---:| :---: | :---: | :---: | :---: | :---: |
| aten-fp32| 35.943us |140.315us|643.510us|5.827ms|41.761ms|
| mkldnn-bf16 |53.432us|114.716us|421.858us|2.863ms|23.029ms|
More shapes
| impl |128x2048x2048|2048x128x2048| 2048x2048x128 |
|:---:|:---:| :---: | :---: |
| aten-fp32| 0.561ms|0.458ms|0.406ms|
| mkldnn-bf16 |0.369ms |0.331ms|0.239ms|
We do not show aten-bf16 for this case since aten-bf16 always computes single-threaded and its performance is extremely poor. The trend for this case is similar to 1 thread on 1 core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61891
Reviewed By: iramazanli
Differential Revision:
D29998114
Pulled By: VitalyFedyunin
fbshipit-source-id:
459dc5874c638d62f290c96684ca0a694ded4b5a
Anirudh Dagar [Tue, 7 Sep 2021 19:34:15 +0000 (12:34 -0700)]
Array API: Add `torch.linalg.matmul` alias to `torch.matmul` (#63227)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62811
Add `torch.linalg.matmul` alias to `torch.matmul`. Note that the `linalg.matmul` doesn't have a `method` variant.
Also cleaning up `torch/_torch_docs.py` when formatting is not needed.
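A quick usage example of the new alias (the tensors here are just illustrative):
```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
# torch.linalg.matmul is a pure alias of torch.matmul; there is no method variant.
assert torch.equal(torch.linalg.matmul(a, b), torch.matmul(a, b))
assert torch.equal(torch.linalg.matmul(a, b), a @ b)
```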
cc IvanYashchuk Lezcano mruberry rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63227
Reviewed By: mrshenli
Differential Revision:
D30770235
Pulled By: mruberry
fbshipit-source-id:
bfba77dfcbb61fcd44f22ba41bd8d84c21132403
Jane Xu [Tue, 7 Sep 2021 19:30:16 +0000 (12:30 -0700)]
[small BE] .github: refactor concurrency into a common macro (#64587)
Summary:
By using a macro for these concurrency groups, we can edit just one place for the linux and windows workflows (vs 2).
I wanted to loop all the other workflow files in as well, but since those aren't generated, the macros won't work the same way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64587
Reviewed By: mrshenli
Differential Revision:
D30783224
Pulled By: janeyx99
fbshipit-source-id:
ae16ebb12d2d63a563d28f0ce88e280f68ed4b9b
Kevin Tse [Tue, 7 Sep 2021 18:34:27 +0000 (11:34 -0700)]
Fixes issue related to torch.trapezoid broadcasting behavior and documentation (#64054)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64054
Fixes #63608
cc mruberry rgommers heitorschueroff
Test Plan: Imported from OSS
Reviewed By: saketh-are
Differential Revision:
D30617078
Pulled By: NivekT
fbshipit-source-id:
815896ec56d447562790df4d662e94fd13457e2a
Danielle Pintz [Tue, 7 Sep 2021 18:34:08 +0000 (11:34 -0700)]
Add space in Feature Request issue template (#64563)
Summary:
Add space between emoji and text in Feature Request issue template
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64563
Reviewed By: janeyx99
Differential Revision:
D30779429
Pulled By: seemethere
fbshipit-source-id:
3625299923a7022fa66473633524a6620d58188b
Lu Fang [Tue, 7 Sep 2021 18:23:52 +0000 (11:23 -0700)]
Clean up op BC check list (#64584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64584
It has been a while since last clean up. The list is really long.
Test Plan: ci
Reviewed By: hl475
Differential Revision:
D30779350
fbshipit-source-id:
908b47d0b9a16b784aad6a34c5c87f923500c247
Ilqar Ramazanli [Tue, 7 Sep 2021 18:02:11 +0000 (11:02 -0700)]
[doc][hackathon] To add Adam Optimizer to the documentation (#63251)
Summary:
It has been discussed before that adding descriptions of optimization algorithms to the PyTorch Core documentation may result in a nice optimization research tutorial. The following tracking issue lists all the necessary algorithms and links to the originally published papers: https://github.com/pytorch/pytorch/issues/63236.
In this PR we are adding a description of the Adam algorithm to the documentation. For more details, we refer to the paper https://arxiv.org/abs/1412.6980
<img width="442" alt="Screen Shot 2021-08-27 at 6 37 54 PM" src="https://user-images.githubusercontent.com/73658284/131195297-35fce613-3691-4fed-b42d-db234d4fcd7c.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63251
Reviewed By: albanD
Differential Revision:
D30779163
Pulled By: iramazanli
fbshipit-source-id:
319a80fc3952793b0d064d0e641ddc1de3c05a86
Yanli Zhao [Tue, 7 Sep 2021 16:28:30 +0000 (09:28 -0700)]
minor fix for elastic doc (#64531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64531
fix #64530
Test Plan: unit test
Reviewed By: mrshenli
Differential Revision:
D30760879
fbshipit-source-id:
94ed1476e886513427d928a36f5be6b9bfff0826
Philip Meier [Tue, 7 Sep 2021 15:57:43 +0000 (08:57 -0700)]
deprecate dtype getters from `torch.testing` namespace (#63554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63554
Following https://github.com/pytorch/pytorch/pull/61840#issuecomment-884087809, this deprecates all the dtype getters publicly exposed in the `torch.testing` namespace. The reason for this is twofold:
1. If someone is not familiar with the C++ dispatch macros PyTorch uses, the names are misleading. For example `torch.testing.floating_types()` will only give you `float32` and `float64` skipping `float16` and `bfloat16`.
2. The dtype getters provide very minimal functionality that can be easily emulated by downstream libraries.
We thought about [providing a replacement](https://gist.github.com/pmeier/3dfd2e105842ad0de4505068a1a0270a), but ultimately decided against it. The major problem is BC: by keeping it, either the namespace gets messy again after a new dtype is added or we need to somehow version the return values of the getters.
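For illustration, a downstream library can emulate the getters with a few lines of its own (the names below are hypothetical, not a proposed API):
```python
import torch

# Spell out exactly the dtypes the test suite cares about, so nothing is
# silently skipped (or silently added when PyTorch gains a new dtype).
FLOATING_DTYPES = (torch.float32, torch.float64)  # what floating_types() used to return
ALL_FLOATING_DTYPES = FLOATING_DTYPES + (torch.float16, torch.bfloat16)

def run_over_dtypes(op, dtypes=ALL_FLOATING_DTYPES):
    for dtype in dtypes:
        op(torch.ones(4, dtype=dtype))
```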
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision:
D30662206
Pulled By: mruberry
fbshipit-source-id:
a2bdb10ab02ae665df1b5b76e8afa9af043bbf56
Ilqar Ramazanli [Tue, 7 Sep 2021 15:41:09 +0000 (08:41 -0700)]
To change WarmUp Scheduler with ConstantLR and LinearLR (#64395)
Summary:
Partially unblocks https://github.com/pytorch/vision/issues/4281
Previously we added WarmUp schedulers to PyTorch Core in the PR https://github.com/pytorch/pytorch/pull/60836, which had two modes of execution, linear and constant, depending on the warm-up function.
In this PR we change this interface to a more direct form, separating the linear and constant modes into separate schedulers. In particular,
```Python
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="linear")
```
will look like
```Python
scheduler1 = ConstantLR(optimizer, warmup_factor=0.1, warmup_iters=5)
scheduler2 = LinearLR(optimizer, warmup_factor=0.1, warmup_iters=5)
```
correspondingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64395
Reviewed By: datumbox
Differential Revision:
D30753688
Pulled By: iramazanli
fbshipit-source-id:
e47f86d12033f80982ddf1faf5b46873adb4f324
Mike Iovine [Tue, 7 Sep 2021 15:04:50 +0000 (08:04 -0700)]
[JIT] Freeze unrolls constant loops (#63614)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63614
There are a number of optimizations (`RemoveListMutation` in particular) that are tied to loop unrolling in `runOptimizations`. However, these were not invoked from `freeze_module` since the freezing pass should be idempotent.
This diff makes `runOptimizations` run `UnrollConstantLoops` instead of `UnrollLoops`. `freeze_module` is then able to run these optimizations.
Test Plan: Observed that `freeze_module` applies `RemoveListMutation`
Reviewed By: eellison
Differential Revision:
D30437356
fbshipit-source-id:
cba04bd958a48ad51b151aa3264f3d5bbb1fc2a4
Kefei Lu [Tue, 7 Sep 2021 11:00:49 +0000 (04:00 -0700)]
Fix fx2trt SplitterBase non_tensor_input logic (#64286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64286
During graph splitting, `_SplitterBase` supports taking into consideration whether the subnet boundary nodes produce "supported" outputs that will cross the acc/non-acc boundary. Specifically, if the backend only supports Tensor-based data passing across the boundary, then we cannot split the graph at a place where the node output is a non-Tensor type (e.g., `Tuple[Tensor]`).
There's currently a bug in this logic: it does not correctly detect the output type of a Node. Instead of using `Node.meta['tensor_meta']`, we should check `Node.meta['type']`.
`Node.meta['tensor_meta']` is not appropriate because this key will exist if the node output is an iterable and one of the elements is of type `Tensor`. So `Tuple[Tensor]` would be wrongly considered "supported".
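A minimal sketch of the kind of check described above (the helper name is ours, not the actual `_SplitterBase` code):
```python
import torch
from torch.fx.node import Node

def produces_tensor_output(node: Node) -> bool:
    # Look at the recorded Python type of the node's output instead of
    # 'tensor_meta', which can also be present for containers like Tuple[Tensor].
    return node.meta.get('type', None) is torch.Tensor
```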
Test Plan:
arc lint
run CI tests
Reviewed By: yinghai, 842974287
Differential Revision:
D30617147
fbshipit-source-id:
e8ba70dfaddc05cafb8037d58fca73b7ccbb1a49
Ivan Yashchuk [Tue, 7 Sep 2021 07:04:14 +0000 (00:04 -0700)]
Update error messages that use LAPACK error codes (#63864)
Summary:
This PR updates the` batchCheckErrors` and `singleCheckErrors` functions so that the error messages are defined only once.
`batchCheckErrors` function reuses `singleCheckErrors` now.
Fixes https://github.com/pytorch/pytorch/issues/63220, fixes https://github.com/pytorch/pytorch/issues/59779
cc jianyuh nikitaved pearu mruberry heitorschueroff walterddr IvanYashchuk xwang233 Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63864
Reviewed By: ngimel
Differential Revision:
D30672933
Pulled By: mruberry
fbshipit-source-id:
0ba37ff98ef278efdb12c3890aa07d687047da7a
Anirudh Dagar [Tue, 7 Sep 2021 06:55:53 +0000 (23:55 -0700)]
Support `torch.concat` alias, add `cat` OpInfo & remove OpInfo test_out skips {cat, stack, hstack, vstack, dstack} (#62560)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61767
## Changes
- [x] Add `torch.concat` alias to `torch.cat`
- [x] Add OpInfo for `cat`/`concat`
- [x] Fix `test_out` skips (Use `at::native::resize_output` or `at::native::resize_output_check`)
- [x] `cat`/`concat`
- [x] `stack`
- [x] `hstack`
- [x] `dstack`
- [x] `vstack`/`row_stack`
- [x] Remove redundant tests for `cat`/`stack`
~I've not added `cat`/`concat` to OpInfo `op_db` yet, since cat is a little more tricky than other OpInfos (should have a lot of tests) and currently there are no OpInfos for that. I can try to add that in a subsequent PR or maybe here itself, whatever is suggested.~
**Edit**: cat/concat OpInfo has been added.
**Note**: I've added the named tensor support for `concat` alias as well, maybe that's out of spec in `array-api` but it is still useful for consistency in PyTorch.
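A small usage example of the alias (tensors are illustrative):
```python
import torch

a, b = torch.randn(2, 3), torch.randn(4, 3)
# torch.concat is the array-API spelling; it is an alias of torch.cat.
assert torch.equal(torch.concat([a, b], dim=0), torch.cat([a, b], dim=0))
```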
Thanks to krshrimali for guidance on my first PR :))
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62560
Reviewed By: saketh-are
Differential Revision:
D30762069
Pulled By: mruberry
fbshipit-source-id:
6985159d1d9756238890488a0ab3ae7699d94337
Natalia Gimelshein [Tue, 7 Sep 2021 04:24:38 +0000 (21:24 -0700)]
Remove dead code from THC (THCApply.cuh) (#64559)
Summary:
cc peterbell10
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64559
Reviewed By: mruberry
Differential Revision:
D30769526
Pulled By: ngimel
fbshipit-source-id:
034a5c778a2b902cffa57b76511fa0dcdea26825
Nikita Shulga [Mon, 6 Sep 2021 18:37:39 +0000 (11:37 -0700)]
Move ParallelNative and PureTorch to GHA (#64452)
Summary:
The ParallelTBB move is split out into https://github.com/pytorch/pytorch/pull/64193 as it requires some further investigation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64452
Reviewed By: seemethere, janeyx99
Differential Revision:
D30738337
Pulled By: malfet
fbshipit-source-id:
81c46423e903058bd1a3e8553e8a10ce978eeefd
Shen Xu [Sun, 5 Sep 2021 23:44:13 +0000 (16:44 -0700)]
Mark functions in backend header as inline to suppress warning (#64098)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64098
Reviewed By: kimishpatel, iseeyuan
Differential Revision:
D30593104
fbshipit-source-id:
328196b9bc4a89a28ad89bede7e337107976c303
Bert Maher [Sun, 5 Sep 2021 23:06:09 +0000 (16:06 -0700)]
Revert D30745610: [nnc] Make our exceptions c10::Errors, get C++ stacktraces
Test Plan: revert-hammer
Differential Revision: D30745610 (https://github.com/pytorch/pytorch/commit/18b2751ea143374adbb690889427e06a9334da05)
Original commit changeset:
a1cfaa7364ef
fbshipit-source-id:
9b716053b96a65745240ddef1c456c44d5d09671
Sangbaek Park [Sun, 5 Sep 2021 19:52:46 +0000 (12:52 -0700)]
[Vulkan] Code Quality: Remove duplicate code for hardshrink and leaky_relu functions (#64405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64405
Code quality improvement: removed duplicate code for hardshrink and leaky_relu functions.
ghstack-source-id:
137319378
Test Plan:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Reviewed By: SS-JIA
Differential Revision:
D30690251
fbshipit-source-id:
5729d1f32946e42f41df77756a8313f297dd822f
Mike Ruberry [Sun, 5 Sep 2021 09:23:31 +0000 (02:23 -0700)]
Back out "nn.functional.linear OpInfo" (#64517)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64517
Original commit changeset:
ca41dbd98176
Test Plan: PyTorch CI
Reviewed By: ngimel
Differential Revision:
D30758201
fbshipit-source-id:
2d3274293d340373b8af86083336607818019619
Chris Cai [Sun, 5 Sep 2021 03:54:29 +0000 (20:54 -0700)]
Back out "
D30740897 Add fusion enabled apis" (#64500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64500
D30740897 (https://github.com/pytorch/pytorch/commit/39aeb3bf63f61664bc6c4a929a80a660365c2a5e) broke caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage (https://fburl.com/test/mb46jxon) and blocked training_platform_unit_tests
{F660271297}
multisect results confirm
```
multisect --config FBCODE_TEST bisect 844424966128796 --workers 16 revisions --begin 09629edc --end fc86b434
D30740897 (https://github.com/pytorch/pytorch/commit/39aeb3bf63f61664bc6c4a929a80a660365c2a5e)
```
{F660271232}
Test Plan:
```
buck test mode/opt //caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4785074671474181
✓ Pass: caffe2/torch/fb/module_factory/optimizers/tests:test_full_sync_optimizer_needed_coverage - main (3.729)
Summary
Pass: 1
```
Differential Revision:
D30753916
fbshipit-source-id:
302fd4113ef1f3069846be03edc2300d82b66719
Bert Maher [Sun, 5 Sep 2021 03:29:44 +0000 (20:29 -0700)]
[nnc] Make our exceptions c10::Errors, get C++ stacktraces (#64332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64332
With this diff, if a compiler bug occurs (unlikely, I know!) we'll be able to get a c++ stacktrace leading to the exception, rather than just a terse message. E.g.,
```
RuntimeError: UNSUPPORTED DTYPE
Exception raised from compilation_error at ../torch/csrc/jit/tensorexpr/exceptions.h:32 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f966659b2eb in /fsx/users/bertrand/c\
onda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x376f099 (0x7f966a195099 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #2: <unknown function> + 0x3763bf5 (0x7f966a189bf5 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #3: torch::jit::tensorexpr::CudaCodeGen::Initialize() + 0xdd8 (0x7f966a193368 in /fsx/users/bertrand/conda/envs/pytorch/lib/python3.8/site-packages/torch/lib/libtorch_cuda\
.so)
```
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision:
D30745610
Pulled By: bertmaher
fbshipit-source-id:
a1cfaa7364ef4120de834e9cbe57ced1d082ab4e
Peter Bell [Sat, 4 Sep 2021 19:37:09 +0000 (12:37 -0700)]
Ensure num_threads is initialized in get_num_threads (#64486)
Summary:
Possible source of the recent layernorm CI failures. `lazy_init_num_threads` appears at the top of `parallel_for` and can change the number of threads set. So, we need to ensure `num_threads` is initialized during `get_num_threads` calls as well. It's already done this way for OpenMP, but is missing from other parallel backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64486
Reviewed By: mruberry
Differential Revision:
D30752615
Pulled By: ngimel
fbshipit-source-id:
085873ce312edbee1254c0aaae30dec7fcfe2c57
Facebook Community Bot [Sat, 4 Sep 2021 07:43:25 +0000 (00:43 -0700)]
Automated submodule update: FBGEMM (#64338)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: https://github.com/pytorch/FBGEMM/commit/9ccb2714a93e8324119676f6b3dc1c26eef0a703
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64338
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision:
D30690319
fbshipit-source-id:
884d1f950cd1f7d2a77b79affb9215f285d5d0da
Ivan Yashchuk [Sat, 4 Sep 2021 01:48:41 +0000 (18:48 -0700)]
Fix `copy_transpose_valid` condition for `copy_same_type_transpose_` (#64425)
Summary:
Thanks to ngimel for the hint where the problem might be (https://github.com/pytorch/pytorch/issues/64358#issuecomment-910868849)!
I added a test that fails on master to verify the fix. The shape `(60, 60)` was chosen because of `MIN_SZ = 60 * 60` in `copy_transpose_valid`.
Fixes https://github.com/pytorch/pytorch/issues/64358
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64425
Reviewed By: mruberry
Differential Revision:
D30752725
Pulled By: ngimel
fbshipit-source-id:
f40370ea8365c94e30f8e8a3dcab5f3b3462464a
Michael Carilli [Fri, 3 Sep 2021 20:21:23 +0000 (13:21 -0700)]
[CUDA graphs] Error if attempting to capture uncapturable nccl (#64440)
Summary:
NCCL < 2.9.6 is not capturable. Attempting to capture it can cause nasty behavior (for example, I've seen capture succeed, but replay silently hang). PyTorch should preempt this with a friendlier error.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64440
Reviewed By: mruberry
Differential Revision:
D30733884
Pulled By: ngimel
fbshipit-source-id:
5f2df3cf5cc0e5e68f49bf22a80d9f58064dc7ec
Nikita Shulga [Fri, 3 Sep 2021 17:21:01 +0000 (10:21 -0700)]
Fix logical typo in _compare_trilu_indices (#64468)
Summary:
I'm pretty sure that repeating the same call twice is meaningless; the intent was to call `tril`/`tril_indices` in the first case and `triu`/`triu_indices` in the other.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64468
Reviewed By: mruberry
Differential Revision:
D30744978
Pulled By: malfet
fbshipit-source-id:
7cd36789a7ebf1cc263fb2d875e479c05e7588a4
Ansley Ussery [Fri, 3 Sep 2021 13:10:37 +0000 (06:10 -0700)]
Support Union in TorchScript (#64234)
Summary:
This PR is created to replace the https://github.com/pytorch/pytorch/pull/53180 PR stack, which has all the review discussions. The reason for needing a replacement is a messy Sandcastle issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64234
Reviewed By: gmagogsfm
Differential Revision:
D30656444
Pulled By: ansley
fbshipit-source-id:
77536c8bcc88162e2c72636026ca3c16891d669a
Kefei Lu [Fri, 3 Sep 2021 06:03:02 +0000 (23:03 -0700)]
Add fx2trt pass for removing duplicate output args (#64461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64461
Fx2TRT does not support duplicate nodes in the output args tuple.
This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.
This pass will change both the subnets and top level module.
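A simplified sketch of the idea (this is not the actual pass; the function name and the caller-fixup step are assumptions):
```python
import torch.fx as fx

def dedupe_output_args(gm: fx.GraphModule):
    """Collapse duplicate entries in a submodule's output tuple and return a map
    from old output positions to new ones; a separate step (not shown) would use
    the map to rewrite getitem indices in the parent module that calls gm."""
    output_node = next(n for n in gm.graph.nodes if n.op == "output")
    old_args = list(output_node.args[0])  # assume the output is a tuple of Nodes
    position, deduped, old_to_new = {}, [], []
    for arg in old_args:
        if arg not in position:
            position[arg] = len(deduped)
            deduped.append(arg)
        old_to_new.append(position[arg])
    output_node.args = (tuple(deduped),)
    gm.graph.lint()
    gm.recompile()
    return old_to_new
```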
Test Plan:
Run:
```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args
```
Reviewed By: yinghai
Differential Revision:
D30740499
fbshipit-source-id:
98459f7677980b21c7bffda918158001285572db
Elias Ellison [Fri, 3 Sep 2021 05:16:22 +0000 (22:16 -0700)]
Add fusion enabled apis (#64429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64429
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision:
D30740897
Pulled By: eellison
fbshipit-source-id:
446aa63b5d763f1cfffea62547db7294368e3438
Elias Ellison [Fri, 3 Sep 2021 05:16:22 +0000 (22:16 -0700)]
update optimize_for_inference docs (#64428)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64428
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision:
D30740898
Pulled By: eellison
fbshipit-source-id:
b94d2c3deb661a6ba048f19e8c1d5e1799667eeb
James Reed [Fri, 3 Sep 2021 04:11:57 +0000 (21:11 -0700)]
[resubmit][FX] Prototype for guarding against mutable operations in tracing (#64467)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64467
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision:
D30744870
Pulled By: jamesr66a
fbshipit-source-id:
fc652f8b17748f90dbeb83fabf3bd5bb57d6ff1a
Mike Ruberry [Fri, 3 Sep 2021 03:51:38 +0000 (20:51 -0700)]
Skips layer norm OpInfo on tbb platform (#64469)
Summary:
The OpInfo tests appear to be discovering a layer norm x tbb issue that requires investigation. Skipping tests on that platform for now to restore CI signal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64469
Reviewed By: ngimel
Differential Revision:
D30745746
Pulled By: mruberry
fbshipit-source-id:
282484cc00b867fac85b7df61430d64277da6421
Peter Bell [Fri, 3 Sep 2021 00:43:59 +0000 (17:43 -0700)]
THC: Cleanup dead code (#64441)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64441
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30735342
Pulled By: ngimel
fbshipit-source-id:
84ab36f7aec6b8cd7f1f34c19a58a382c06ad68d
driazati [Fri, 3 Sep 2021 00:09:48 +0000 (17:09 -0700)]
Regenerate generated github workflows (#64465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64465
These were out of date and causing master failures
Test Plan: Imported from OSS
Reviewed By: zhouzhuojie
Differential Revision:
D30744594
Pulled By: driazati
fbshipit-source-id:
09a21c3c5d9bc83b368d66cabbafd1ba83302dd3
David Riazati [Thu, 2 Sep 2021 23:58:59 +0000 (16:58 -0700)]
Revert D30732630: [quant] Enable jit tracing on quantizable LSTM
Test Plan: revert-hammer
Differential Revision: D30732630 (https://github.com/pytorch/pytorch/commit/116142143cc2d66c7e582d9f96e00862456fd736)
Original commit changeset:
443e351ebb0e
fbshipit-source-id:
49001392f01366f3b1ccc31139f824c80b86cd40
Zafar Takhirov [Thu, 2 Sep 2021 23:58:36 +0000 (16:58 -0700)]
Revert D30055886: [quant] AO migration of the `quantize.py`
Test Plan: revert-hammer
Differential Revision: D30055886 (https://github.com/pytorch/pytorch/commit/44e3ed88c9a1bd9ee6b0168ba5271a2c6b006cc8)
Original commit changeset:
8ef7470f9fa6
fbshipit-source-id:
c5bd3ead43a2d44b9e56872ec5bd7a195bdac725
Jane Xu [Thu, 2 Sep 2021 23:21:52 +0000 (16:21 -0700)]
[POC] .github: Add event name to concurrency (#64402)
Summary:
This would ensure that manually/API triggered workflows would not cancel other triggered workflows. For example, the manually triggered periodic 11.1 linux job cancelled the scheduled one here, which we may not want:
![image](https://user-images.githubusercontent.com/31798555/131752175-1c99d56e-d344-46e1-b8ac-9c12bba0569a.png).
This would be helpful later as we use more dispatched workflows (e.g., for bisect functionality)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64402
Reviewed By: malfet
Differential Revision:
D30734860
Pulled By: janeyx99
fbshipit-source-id:
220016716094666e9af836fcd716dd529cf23d8a
Garrett Cramer [Thu, 2 Sep 2021 23:11:10 +0000 (16:11 -0700)]
update rpc tensorpipe logic for sparse tensors (#62960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62960
A bug was filed a few years ago for sending sparse tensors over rpc (#30807).
This PR updates rpc/tensorpipe logic for CUDA sparse tensors. During the serialization process, the pickler.cpp implementation breaks down the sparse tensor into two tensors and metadata. torch/csrc/distributed/rpc/tensorpipe_agent.cpp needs to be updated because it does not have logic for sparse tensors. It pushes a single device for a sparse tensor. This is wrong because after the sparse tensor has been serialized, there will be two tensors; the second tensor will not have a device, causing it to end up on the wrong target device. tensorpipe_utils.cpp needs to be updated because deserialization happens after the data is received on the target pipe; it takes the two tensors and metadata sent and rebuilds the sparse tensor. There will be two tpDescriptors but only one tensor after deserialization. The logic is updated to verify the sparse tensor is on the correct device using the first tpDescriptor.
This pr also updates ivalue.cpp and ivalue.h to support more paths for Sparse COO tensors.
I tested these changes by adding sparse tests to rpc_test.py and dist_autograd_test.py.
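For reference, a minimal single-process usage sketch of what this enables (not taken from the added tests; the worker name and addresses are placeholders):
```python
import os
import torch
import torch.distributed.rpc as rpc

def double(t):
    return t * 2  # runs on the callee against the reconstructed sparse tensor

if __name__ == "__main__":
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    rpc.init_rpc("worker0", rank=0, world_size=1)
    sparse = torch.sparse_coo_tensor(
        indices=torch.tensor([[0, 1], [1, 0]]),
        values=torch.tensor([3.0, 4.0]),
        size=(2, 2),
    )
    print(rpc.rpc_sync("worker0", double, args=(sparse,)))
    rpc.shutdown()
```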
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30717285
Pulled By: gcramer23
fbshipit-source-id:
daee9a56764550f56b131f9dd8e74e23113d6714
Eli Uriegas [Thu, 2 Sep 2021 23:06:17 +0000 (16:06 -0700)]
Revert D30675780: [FX] Prototype for guarding against mutable operations in tracing
Test Plan: revert-hammer
Differential Revision: D30675780 (https://github.com/pytorch/pytorch/commit/795387477fe90e03cb598f3077a32222896e65dd)
Original commit changeset:
b2116b51dcc8
fbshipit-source-id:
d4f1173f4989556ea54974f4c2739ef85a705fae
Zafar Takhirov [Thu, 2 Sep 2021 22:56:54 +0000 (15:56 -0700)]
[quant] Enable jit tracing on quantizable LSTM (#64438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64438
The quantizable LSTM didn't support jit tracing because it had several non-traceable paths. We sacrifice some of the user experience to enable tracing.
The main UX feature removed is a user-friendly message when trying to access the backwards path in a bidirectional LSTM: when the bidirectional flag is `False`, we used to throw a nice error message when the user tried accessing backwards weights. Now the message is the default one (removed properties).
Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`
Reviewed By: mtl67
Differential Revision:
D30732630
fbshipit-source-id:
443e351ebb0e2b636c86dea9691b9bf42ffe618f
James Reed [Thu, 2 Sep 2021 22:15:24 +0000 (15:15 -0700)]
[FX] Prototype for guarding against mutable operations in tracing (#64295)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64295
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision:
D30675780
Pulled By: jamesr66a
fbshipit-source-id:
b2116b51dcc87357f0c84192c4c336680875e27a
Eli Uriegas [Thu, 2 Sep 2021 21:49:47 +0000 (14:49 -0700)]
.github: Migrate pytorch_linux_bionic_py_3_6_clang9 to GHA (#64218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64218
Relies on https://github.com/fairinternal/pytorch-gha-infra/pull/11
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra bdhirsh
Test Plan: Imported from OSS
Reviewed By: malfet, H-Huang, janeyx99
Differential Revision:
D30651516
Pulled By: seemethere
fbshipit-source-id:
e5843dfe84f096f2872d88f2e53e9408ad2fe399
Erjia Guan [Thu, 2 Sep 2021 20:35:05 +0000 (13:35 -0700)]
Switch Shuffler to use iter-local buffer (#64195)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64195
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision:
D30642947
Pulled By: ejguan
fbshipit-source-id:
d4b52479b4ae37ad693388b9cdb8eed83a136474
Nikita Shulga [Thu, 2 Sep 2021 20:30:51 +0000 (13:30 -0700)]
Disable CircleCI ROCm build (#64434)
Summary:
Per jithunnair-amd suggestion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64434
Reviewed By: seemethere, janeyx99
Differential Revision:
D30732289
Pulled By: malfet
fbshipit-source-id:
1932d0a7d1e648006f8030c8237b187d0709f688
Kevin Tse [Thu, 2 Sep 2021 20:06:18 +0000 (13:06 -0700)]
[DataPipe] removing filter's inheritance from map (#64404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64404
This PR removes `filter`'s inheritance from `map`. This allows `filter` to not have a `__len__` function, which is the behavior we would like.
cc VitalyFedyunin ejguan
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30713120
Pulled By: NivekT
fbshipit-source-id:
4d5d07555297ee2bd4b49842c0d26cdc00638f6c
Kevin Tse [Thu, 2 Sep 2021 20:06:18 +0000 (13:06 -0700)]
[DataPipe] adding/removing __len__ for different DataPipe (#64398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64398
cc VitalyFedyunin ejguan
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30710437
Pulled By: NivekT
fbshipit-source-id:
524eda43a2faa0db0c1a662bf9bb4283f0ade83c
Erjia Guan [Thu, 2 Sep 2021 19:25:15 +0000 (12:25 -0700)]
Fix test_ind_worker_queue by setting max_num_worker based on system resource (#63779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63779
Fixes #63657
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30494185
Pulled By: ejguan
fbshipit-source-id:
d1bd24299b25d589889604aaf18ad347bdff4df4
Thomas J. Fan [Thu, 2 Sep 2021 19:15:03 +0000 (12:15 -0700)]
ENH Adds test and docs for modules that already support no batch dims (#62729)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62729
Reviewed By: H-Huang
Differential Revision:
D30669546
Pulled By: jbschlosser
fbshipit-source-id:
c771c98c1fd9d28fa984b72893585c738c736505
Rohan Varma [Thu, 2 Sep 2021 18:37:54 +0000 (11:37 -0700)]
[DDP] Fix logging iterations (#64411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64411
These are not actually the training iterations, but are offset by how
frequently DDP stats collection actually runs (default being
kDDPRuntimeLoggingSampleRate = 100). So with this change, they are actually
logged to scuba every:
10, 10 * 100, 40 * 100, etc iterations.
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision:
D30718274
fbshipit-source-id:
146bd2428753c93363bee37e487f40104fce3c18
Eli Uriegas [Thu, 2 Sep 2021 18:23:38 +0000 (11:23 -0700)]
.github: Move squid vars to common vars (#64436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64436
Moves the squid variables to our common jinja template so that when we
have to update them they're all in the same place.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet, zhouzhuojie
Differential Revision:
D30732776
Pulled By: seemethere
fbshipit-source-id:
22e3757c4eec775baa8abbaac2ba2a0c69c2b2a9
Eli Uriegas [Thu, 2 Sep 2021 18:23:38 +0000 (11:23 -0700)]
.github: Move upload-artifact-s3 to common var (#64435)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64435
Move upload-artifact-s3 to a common variable to be used amongst our
jinja templates, this should make it easier in the future to update
these images
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision:
D30732777
Pulled By: seemethere
fbshipit-source-id:
51cd485f5abae134c3c49dfa878e6303ba8e5f25
Richard Zou [Thu, 2 Sep 2021 18:06:34 +0000 (11:06 -0700)]
nn.functional.linear OpInfo (#61971)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/61971
Test Plan: - wait for tests
Reviewed By: heitorschueroff
Differential Revision:
D30013750
Pulled By: zou3519
fbshipit-source-id:
ca41dbd98176c12e50ad1410a658f4b06fe99a1e
Eli Uriegas [Thu, 2 Sep 2021 17:56:57 +0000 (10:56 -0700)]
Revert D30468409: Add fx2trt pass for removing duplicate output args
Test Plan: revert-hammer
Differential Revision: D30468409 (https://github.com/pytorch/pytorch/commit/6da7552a8eaae6b85e271bf3edac2fa2ae9f1148)
Original commit changeset:
b4d91b76ab5d
fbshipit-source-id:
e138dc425fe55ffe3585ea5fac4db476931bafed
Hui Guo [Thu, 2 Sep 2021 17:40:02 +0000 (10:40 -0700)]
[tensorexpr] Wrap error msgs with buildErrorMessages for internal asserts (#64409)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64409
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30717786
Pulled By: huiguoo
fbshipit-source-id:
a3b147d339ff4927f14efa24407cd3b63d80001d
Kefei Lu [Thu, 2 Sep 2021 17:38:43 +0000 (10:38 -0700)]
Add fx2trt pass for removing duplicate output args (#64433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64433
Fx2TRT does not support duplicate nodes in the output args tuple.
This pass removes duplicate output args from the target subnets and fixes their uses in the top level module where the subnets are called. This pass must be called after acc split on the top-level net and subsequent calls to the acc trace on the subnets.
This pass will change both the subnets and top level module.
Test Plan:
Run:
```
buck run mode/opt -c python.package_style=inplace //caffe2/torch/fb/fx2trt/tests/passes/:test_remove_duplicate_output_args
```
Reviewed By: 842974287
Differential Revision:
D30468409
fbshipit-source-id:
b4d91b76ab5d8a5275d68dd48d1327a44c22568e
Jane Xu [Thu, 2 Sep 2021 16:50:56 +0000 (09:50 -0700)]
CI: Enable using labels to control GHA workflows (#64314)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62852
Sets a global environment variable containing a list of PR labels. For this PR, the PR_LABELS variable looks like:
```
[
"cla signed",
"ciflow/default"
]
```
confirmed in a run: https://github.com/pytorch/pytorch/runs/3490072161?check_suite_focus=true
This information can be used in other workflow steps to control the logic. For example, if I want to force a build, I can label my PR with "force-build" and do something like the following in my build script:
```
if [[ "${PR_LABELS}" = *force-build* ]]; then
python setup.py install
else
#use cached wheel or something
fi
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64314
Reviewed By: driazati
Differential Revision:
D30714570
Pulled By: janeyx99
fbshipit-source-id:
80b060ee32643ddd22eb7b8ec548579c7ccf6441
Nicolas Hug [Thu, 2 Sep 2021 16:27:44 +0000 (09:27 -0700)]
Fixes and details to torchhub docs (#63783)
Summary:
This PR:
- adds a few details regarding the newly added `skip_validation` parameter https://github.com/pytorch/pytorch/pull/62139
- uses double-backticks instead of single-backticks since this is rst, not markdown.
- adds a few minor doc nits here and there
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63783
Reviewed By: zou3519
Differential Revision:
D30696658
Pulled By: NicolasHug
fbshipit-source-id:
6f01c7eb3cfcd7e17e4c33c09d193054fa18ad36
Thomas J. Fan [Thu, 2 Sep 2021 16:02:35 +0000 (09:02 -0700)]
TST Adds __repr__ and str to module info (#63737)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935
This PR adds `test_repr` to `test_modules`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63737
Reviewed By: gchanan
Differential Revision:
D30729642
Pulled By: jbschlosser
fbshipit-source-id:
c11a28bc0739abd3ed40727389dd28ed4069edad
Zhaoheng Ni [Thu, 2 Sep 2021 15:59:53 +0000 (08:59 -0700)]
Fix torch.istft length mismatch and window runtime error (#63469)
Summary:
The PR fixes two issues:
- See https://github.com/pytorch/pytorch/issues/62747 and https://github.com/pytorch/audio/issues/1409. There is a length mismatch when the given ``length`` parameter is longer than expected. Padding logic is added, consistent with librosa.
- See https://github.com/pytorch/pytorch/issues/62323. The current implementation checks if the min value of window_envelop.abs() is greater than zero. In librosa they normalize the signal on non-zero values by indexing, like:
```
approx_nonzero_indices = ifft_window_sum > util.tiny(ifft_window_sum)
y[approx_nonzero_indices] /= ifft_window_sum[approx_nonzero_indices]
```
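A minimal round-trip sketch of the ``length`` behavior (the values are illustrative):
```python
import torch

x = torch.randn(4000)
n_fft = 400
window = torch.hann_window(n_fft)
spec = torch.stft(x, n_fft=n_fft, window=window, return_complex=True)
# Asking for a length longer than the reconstructed signal used to mismatch;
# with this fix the output is zero-padded to `length`, consistent with librosa.
y = torch.istft(spec, n_fft=n_fft, window=window, length=4096)
print(y.shape)  # torch.Size([4096])
```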
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63469
Reviewed By: fmassa
Differential Revision:
D30695827
Pulled By: nateanl
fbshipit-source-id:
d034e53f0d65b3fd1dbd150c9c5acf3faf25a164
Mike Iovine [Thu, 2 Sep 2021 15:12:48 +0000 (08:12 -0700)]
[Static Runtime] Add sign/abs/lop1p/mul fusion pass (#64209)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64209
Add a new fusion pass that transforms the following pattern:
```
graph(%input):
%0 : Tensor = aten::sign(%input)
%1 : Tensor = aten::abs(%input)
%2 : Tensor = aten::log1p(%1)
%res : Tensor = aten::mul(%0, %2)
return (%res)
```
Into a single op:
```
graph(%input):
%res : Tensor = static_runtime::signed_log1p(%input)
return (%res)
```
The intent is to reduce the number of passes over the tensor. However, enabling this pass actually causes a performance regression, probably due to a lack of vectorization in the fused implementation. Because of this issue, this diff **does not** enable this pass.
Followup: navahgar will add an NNC kernel which is faster than the unfused version and enable this pass. We still need this version as a fallback since the NNC kernel will not support all dtypes.
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Test passed with new graph pass disabled and enabled.
Reviewed By: hlu1
Differential Revision:
D30559929
fbshipit-source-id:
e4e080cb2e6a705cfdde1fc98bee92b723f8132a
CodemodService FBSourceClangFormatLinterBot [Thu, 2 Sep 2021 15:10:37 +0000 (08:10 -0700)]
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
Reviewed By: zertosh
Differential Revision:
D30710635
fbshipit-source-id:
e8dae05a7e3a19d656067a4f102aab4a3c93ac42
Seth Elliott [Thu, 2 Sep 2021 14:48:47 +0000 (07:48 -0700)]
Fix broken caffe2 test: PlanExecutorTest.BlockingErrorPlan (#64401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64401
PlanExecutorTest.BlockingErrorPlan uses `ASSERT_DEATH` which internally performs a `fork()`. This can cause problems under certain configurations that use threads. This change updates this test to use the "threadsafe" style for GTest death tests in order to improve its quality in multithreaded environments.
Test Plan:
I confirmed that this change fixes the issue on my devvm with the following command:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest.BlockingErrorPlan
```
Reviewed By: praihan
Differential Revision:
D30709447
fbshipit-source-id:
12ffd9ad0371e2e5b43a9873c80568e5ab02d246
Michael Dagitses [Thu, 2 Sep 2021 13:49:09 +0000 (06:49 -0700)]
simplify op name determination into a single forward pass (#64261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64261
Note that this does not preserve byte-for-byte compatibility with
existing names.
Test Plan:
* Rely on CI to catch gross errors.
* Merge after release cut to catch subtle issues.
Reviewed By: albanD
Differential Revision:
D30700647
Pulled By: dagitses
fbshipit-source-id:
7b02f34b8fae3041240cc78fbc6bcae498c3acd4
Vasiliy Kuznetsov [Thu, 2 Sep 2021 13:12:07 +0000 (06:12 -0700)]
fix copy.deepcopy on LinearPackedParams (#64367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64367
This is the same thing as https://github.com/pytorch/pytorch/pull/56154
but for quantized linear. It fixes the behavior of `copy.deepcopy` on
these modules. Before this PR, copied instances of `LinearPackedParams`
were not properly initialized, and inspecting them raised errors of
missing `_modules`. After this PR, inspecting and using the copies
works.
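A minimal sketch of the behavior this enables (the module sizes are illustrative):
```python
import copy
import torch
import torch.nn.quantized as nnq

qlinear = nnq.Linear(4, 4)      # keeps its weight/bias in a LinearPackedParams submodule
qcopy = copy.deepcopy(qlinear)  # with this fix, the copy is fully initialized and usable
x = torch.quantize_per_tensor(torch.randn(2, 4), scale=0.1, zero_point=0, dtype=torch.quint8)
assert torch.equal(qcopy(x).int_repr(), qlinear(x).int_repr())
```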
Test Plan:
```
python test/test_quantization.py TestStaticQuantizedModule.test_linear_api
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision:
D30702667
fbshipit-source-id:
38c26d1e72663416eeb989985b77ffc2052c12b9
Ivan Kobzarev [Thu, 2 Sep 2021 12:27:59 +0000 (05:27 -0700)]
[jit] shape propagation for prepack (#63585)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63585
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision:
D30428905
Pulled By: IvanKobzarev
fbshipit-source-id:
c18f6605a69b2e000bdf14a23e637c5a1c2ec64c
Michael Dagitses [Thu, 2 Sep 2021 11:04:59 +0000 (04:04 -0700)]
extract TestAutogradComplex into its own test file (#63400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63400
This is the first step to break up test_autograd.py for #63205.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision:
D30541499
Pulled By: dagitses
fbshipit-source-id:
8d9d32007938b9eade0e88f95a6a3190e7e2ef01
Michael Dagitses [Thu, 2 Sep 2021 11:04:59 +0000 (04:04 -0700)]
require that `TARGET_DET_LIST` is sorted (and sort it here) (#64102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64102
We sort this list so that we may add comments to indicate the absence
of a file right where that file would need to be put. This makes it
difficult to wrongly add such a file.
The sorting itself was done programmatically to ensure that no entries
were inadvertently removed.
I printed the sorted list with:
```
for p in sorted(TARGET_DET_LIST):
    print(f' "{p}",')
```
Then copied it back into the file.
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision:
D30625076
Pulled By: dagitses
fbshipit-source-id:
cf36fcb3e53e274b76d1f4aae83da1f53c03f9ed
Nicolas Hug [Thu, 2 Sep 2021 10:48:44 +0000 (03:48 -0700)]
Fix list() and help() torchhub functions for Windows (#63773)
Summary:
This PR fixes the help() and list() torchhub functions, which were probably failing on Windows since the `/` OS separator was hardcoded.
Before merging this I need to double check whether the CI actually runs the corresponding tests on Windows or not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63773
Reviewed By: zou3519
Differential Revision:
D30695664
Pulled By: NicolasHug
fbshipit-source-id:
fac328163fd05db804a8186ae28f22b3cc3a6404
Nicolas Hug [Thu, 2 Sep 2021 10:46:59 +0000 (03:46 -0700)]
Remove outdated comment in hub.py (#63757)
Summary:
This PR removes an outdated comment about Python2 that was originally introduced in https://github.com/pytorch/pytorch/pull/25083/files. The code has changed since then, but the comment wasn't removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63757
Reviewed By: zou3519
Differential Revision:
D30695656
Pulled By: NicolasHug
fbshipit-source-id:
431cf414588b9e5a1ad6acdae724ff5af1b16971
Nicolas Hug [Thu, 2 Sep 2021 10:45:06 +0000 (03:45 -0700)]
Update hub.load() signature to avoid polluting kwargs param (#63755)
Summary:
This PR addresses an old comment about Python2 EOL, directly putting some parameters in the function signature instead of in a `**kwargs` dict.
I believe the changes are fully backward compatible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63755
Reviewed By: zou3519
Differential Revision:
D30695634
Pulled By: NicolasHug
fbshipit-source-id:
398f347c5a04bfb58e77e46773a869cb9d0eb225
Kefei Lu [Thu, 2 Sep 2021 08:17:56 +0000 (01:17 -0700)]
Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418
In T99368564, we found that when running a TRT lowered module, the output tensors are out of order compared to the output from the original, non-lowered module. It turns out that in `TRTModule.forward()`, we cannot rely on the `ICudaEngine` bindings' natural order indices to create the output tensors; rather, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
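A rough sketch of the idea (the helper and variable names are ours, not this diff's code; `engine` stands for a `tensorrt.ICudaEngine`):
```python
def gather_outputs_in_order(engine, output_names, tensors_by_binding_index):
    # Build the output tuple from an explicitly supplied list of output binding
    # names rather than from the engine's natural binding order.
    ordered = []
    for name in output_names:                 # the order we want to expose
        idx = engine.get_binding_index(name)  # TensorRT maps a binding name to its index
        ordered.append(tensors_by_binding_index[idx])
    return tuple(ordered)
```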
Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs
Reviewed By: yinghai
Differential Revision:
D30693545
fbshipit-source-id:
32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba
Kushashwa Ravi Shrimali [Thu, 2 Sep 2021 08:08:53 +0000 (01:08 -0700)]
Port `gather` to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.
cc: ysiraichi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312
Reviewed By: iramazanli
Differential Revision:
D30597447
Pulled By: ezyang
fbshipit-source-id:
d36e59835c2f4b38e286032dd2a1111a7e16b7e5
Pavel Belevich [Thu, 2 Sep 2021 07:57:39 +0000 (00:57 -0700)]
Replace std::unordered_map<c10::Device, c10::Device> with DeviceMap (#64393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64393
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision:
D30708384
Pulled By: pbelevich
fbshipit-source-id:
1c565727e4f09cd9e560874dd90aa403470b4a97
Chen Lai [Thu, 2 Sep 2021 07:50:40 +0000 (00:50 -0700)]
[PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; in PR 63651, the flag is turned on.
2. Add two unittests to cover this type of operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540
ghstack-source-id:
137211562
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```
Reviewed By: raziel, iseeyuan, tugsbayasgalan
Differential Revision:
D30414156
fbshipit-source-id:
0f3a219a22aee10ac53184cbd95940726c459d1f
Edward Yang [Thu, 2 Sep 2021 07:48:03 +0000 (00:48 -0700)]
Remove unnecessary resize_output (#64272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64272
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang, bdhirsh
Differential Revision:
D30686941
Pulled By: ezyang
fbshipit-source-id:
de60e6f1115648f8cf7daaa1e652594fe8b06742
Shirong Wu [Thu, 2 Sep 2021 05:09:42 +0000 (22:09 -0700)]
Move graph util to fx2trt (#64064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64064
Move the original util from torch2trt to the fx2trt dir since torch2trt is going to be deprecated. This is a follow-up diff for
D30379124
Test Plan: manual
Reviewed By: yinghai, mikekgfb
Differential Revision:
D30591687
fbshipit-source-id:
ae0e59dfbc2d2e2aa4f3ccea7cff2291c7deb388
Edward Yang [Thu, 2 Sep 2021 04:48:36 +0000 (21:48 -0700)]
Add a warning about DataLoader num_workers > 0 "memory leak" (#64337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64337
See https://github.com/pytorch/pytorch/issues/13246
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision:
D30690320
Pulled By: ezyang
fbshipit-source-id:
2751aca05a94e63d25162599f458855988516fad
Rohan Varma [Thu, 2 Sep 2021 04:07:01 +0000 (21:07 -0700)]
[Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow up to D30496178 (https://github.com/pytorch/pytorch/commit/f4aff3a346a0525e37d6071f318f7a4c54d5e1fb) to move the rest of distributed tests to their own jobs for Linux GHA.
ghstack-source-id:
137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision:
D30662999
fbshipit-source-id:
f7cfbc0d1223aca52120f17f9da987d70fda8de6
Rohan Varma [Thu, 2 Sep 2021 01:12:02 +0000 (18:12 -0700)]
[DDP] Log num threads (#64072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64072
Log gloo threads to DDP logging.
ghstack-source-id:
137119480
Test Plan: CI
Reviewed By: mrshenli
Differential Revision:
D30596083
fbshipit-source-id:
2b4f6e762cb5d850be6056bcc5922029a1af3c91
Zeina Migeed [Thu, 2 Sep 2021 01:04:19 +0000 (18:04 -0700)]
add documentation to shape inference algorithm (#64312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64312
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision:
D30709254
Pulled By: migeed-z
fbshipit-source-id:
3297d26fe6727c5b9ca176625b1683d787f59659
Yi Wang [Thu, 2 Sep 2021 00:32:39 +0000 (17:32 -0700)]
[DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352
as title
ghstack-source-id:
137246253
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision:
D30694089
fbshipit-source-id:
a78110b11d59bb0718f43c99ede23f2fd8ab21d0
Yi Wang [Thu, 2 Sep 2021 00:32:39 +0000 (17:32 -0700)]
[DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344
As title.
Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id:
137170449
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook
Reviewed By: rohan-varma
Differential Revision:
D30693220
fbshipit-source-id:
e17f0d1c6198863cf20a53566f586a6bff602522
Rohan Varma [Thu, 2 Sep 2021 00:04:37 +0000 (17:04 -0700)]
[DDP] Add more logging iterations (#64071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64071
Adding more logging iterations to get additional data.
ghstack-source-id:
137119476
Test Plan: CI
Reviewed By: mrshenli
Differential Revision:
D30579367
fbshipit-source-id:
57195266ada5e5926f0d8eaf4fb4e01dc98924d7
Rohan Varma [Wed, 1 Sep 2021 23:25:00 +0000 (16:25 -0700)]
Fix incorrect DDP test (#64074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64074
Previous PR https://github.com/pytorch/pytorch/pull/63831 did not actually test the error in https://github.com/pytorch/pytorch/issues/63812. Introduce a test
directly from the repro that simulates it.
ghstack-source-id:
137171460
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision:
D30569719
fbshipit-source-id:
fd61250ef6d291c093607663d91d6d2cb5574eb7
Rohan Varma [Wed, 1 Sep 2021 23:21:31 +0000 (16:21 -0700)]
[c10d] Prefer use of torch_check (#63928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63928
Throwing std::invalid_argument results in not getting stacktraces with
TORCH_SHOW_CPP_STACKTRACES=1, so instead prefer torch_check here.
ghstack-source-id:
137135328
Test Plan: CI
Reviewed By: mrshenli
Differential Revision:
D30533955
fbshipit-source-id:
33e5bf4f449e3043dec68da93f8022f6624d9675
anjali411 [Wed, 1 Sep 2021 23:11:38 +0000 (16:11 -0700)]
Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D28898374
Pulled By: anjali411
fbshipit-source-id:
eab0e64d37bb57c18b54cabb8e5c00666338ba04
Yi Wang [Wed, 1 Sep 2021 23:09:46 +0000 (16:09 -0700)]
[DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346
as title
ghstack-source-id:
137170288
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision:
D30693513
fbshipit-source-id:
8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
Jerry Zhang [Wed, 1 Sep 2021 22:48:54 +0000 (15:48 -0700)]
[quant][graphmode][fx] Add fbgemm backend_config_dict (#64288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288
This is just to set up the file structure and unblock experimentation.
The format for backend_config_dict will change in the future
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision:
D30699457
fbshipit-source-id:
28211a4def05d34757850c045a36e311f54760fe
Santiago Castro [Wed, 1 Sep 2021 22:18:14 +0000 (15:18 -0700)]
Make datasets in `ConcatDataset` not need to be sized (#64114)
Summary:
`datasets` needs to be iterable, but also sized because the length is checked. But immediately afterwards it is converted to a list. By changing the order of these 2 lines, it doesn't need to be sized anymore.
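A simplified sketch of the reorder (not the actual `ConcatDataset` source):
```python
from torch.utils.data import Dataset

class TinyConcatDataset(Dataset):
    def __init__(self, datasets):
        self.datasets = list(datasets)  # materialize first, so any iterable works
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
        self.cumulative_sizes, total = [], 0
        for d in self.datasets:
            total += len(d)
            self.cumulative_sizes.append(total)

    def __len__(self):
        return self.cumulative_sizes[-1]

    def __getitem__(self, idx):
        for ds_idx, bound in enumerate(self.cumulative_sizes):
            if idx < bound:
                prev = self.cumulative_sizes[ds_idx - 1] if ds_idx else 0
                return self.datasets[ds_idx][idx - prev]
        raise IndexError(idx)
```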
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114
Reviewed By: H-Huang
Differential Revision:
D30641480
Pulled By: ejguan
fbshipit-source-id:
7e16548c2123afa65b83845f9929271fa07fe1e8
Richard Zou [Wed, 1 Sep 2021 22:12:05 +0000 (15:12 -0700)]
Restore LayerNorm numerics test (#64385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385
It was deleted in https://github.com/pytorch/pytorch/pull/63276.
The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.
Test Plan: - wait for tests.
Reviewed By: ngimel
Differential Revision:
D30702950
Pulled By: zou3519
fbshipit-source-id:
a480e26c45ec38fb628938b70416cdb22d976a46
Jerry Zhang [Wed, 1 Sep 2021 21:56:14 +0000 (14:56 -0700)]
[quant][graphmode][api] Add backend_config_dict to prepare_fx api (#64135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135
We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends
We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developers to define their own way of quantizing operators.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision:
D30699456
fbshipit-source-id:
e3c068da8d3da2270f57719f7159cc71cafa8598
zhouzhuojie [Wed, 1 Sep 2021 21:53:25 +0000 (14:53 -0700)]
Silent rm error for sccache log file (#64388)
Summary:
Sample reporting from dr.ci
![image](https://user-images.githubusercontent.com/658840/131724645-75afa04f-7554-4674-8e7c-cf139c84d994.png)
The `rm` command is not actually running into problems, just need to silent the console output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64388
Reviewed By: walterddr, malfet, seemethere
Differential Revision:
D30704439
Pulled By: zhouzhuojie
fbshipit-source-id:
ecd35531decf05b75cef30d08d46635f81112f67
Yuchen Huang [Wed, 1 Sep 2021 21:48:00 +0000 (14:48 -0700)]
[xplat][metal] Add getters and setters for ivars in Conv2dOpContext (#57395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57395
As title
ghstack-source-id:
137223806
(Note: this ignores all push blocking failures!)
Test Plan:
### Lib Build
- `buck build caffe2:aten_metal_prepack`
### Integration Test
- `arc focus2 pp-ops -a ModelRunner`
- Click "Test Person/Hair Segmentation Model"
{F612831435}
- Image Classification Demo
{F614144868}
Reviewed By: xta0
Differential Revision:
D28132020
fbshipit-source-id:
73560263a9d14e9ecfa39c69deb158a2ed8cb179
Meghan Lele [Wed, 1 Sep 2021 21:24:54 +0000 (14:24 -0700)]
[structured] Preserve computed elements from meta func to impl (#61746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746
**Summary**
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
`native_functions.yaml`, compute them once in the `meta` function and
reuse them again in the `impl`. The names and types of these quantities
are used to generate code for a struct containing them that the `meta`
function must return. In the case of a handful of surveyed kernels
(`all,`, `any`, `avg_pool2d`), these quantities that are used both in
the `meta` and `impl` have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in `native_functions.yaml`. This information is used to unpack
the struct returned by `meta` and pass its contents correctly to the
`impl` function.
The primary goal is to avoid recompute and enhance developer experience
(e.g. sometimes people can forget to compute these elements while
porting a kernel).
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision:
D30407831
Pulled By: SplitInfinity
fbshipit-source-id:
00975525ea373721fe52d06f75cd4ac91f3dc556