Don Jang [Thu, 26 Aug 2021 15:08:53 +0000 (08:08 -0700)]
[Static Runtime] Disable out variant of aten::clone (#63980)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63980
The out variant implementation of `aten::clone` causes a crash, which needs further investigation. This change disables it until the problem gets fixed.
Note that `inline_cvr` doesn't use `aten::clone` as of now, so no perf implication: https://www.internalfb.com/phabricator/paste/view/P446858755?lines=121
Test Plan: N/A
Reviewed By: hlu1
Differential Revision: D30544149
fbshipit-source-id: facb334d67473f622b36862fbdb2633358556fdf
Rong Rong (AI Infra) [Thu, 26 Aug 2021 15:00:48 +0000 (08:00 -0700)]
[CI] move distributed test into its own CI job (#62896)
Summary:
Moving distributed to its own job.
- [x] ensure there should be a distributed test job for every default test job matrix (on GHA)
- [x] ensure that circleci jobs works for distributed as well
- [x] waiting for test distributed to have its own run_test.py launch options, see https://github.com/pytorch/pytorch/issues/63147
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62896
Reviewed By: seemethere
Differential Revision: D30230856
Pulled By: walterddr
fbshipit-source-id: 0cad620f6cd9e56c727c105458d76539a5ae976f
albanD [Thu, 26 Aug 2021 14:48:20 +0000 (07:48 -0700)]
remove special grad_mode tls handling (#63116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63116
This PR removes the special flag to disable grad mode tracking on the ThreadLocalState and replaces it with an explicit setter that users can use.
This allows us to reduce the complexity of ThreadLocalState.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388098
Pulled By: albanD
fbshipit-source-id: 85641b3d711179fb78ff6a41ed077548dc821a2f
Heitor Schueroff [Thu, 26 Aug 2021 14:17:24 +0000 (07:17 -0700)]
Added API tests to ReductionOpInfo and ported amax/amin/nansum tests (#62899)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62899
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D30408816
Pulled By: heitorschueroff
fbshipit-source-id: 6cb0aa7fa7edba93549ef873baa2fb8a003bd91d
Edward Yang [Thu, 26 Aug 2021 13:58:12 +0000 (06:58 -0700)]
Deify opmath_t into its own header, align with accscalar_t (#63986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63986
Fixes #63985
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D30555996
Pulled By: ezyang
fbshipit-source-id: b6e4d56a5658ed028ffc105cc4b479faa6882b65
Heitor Schueroff [Thu, 26 Aug 2021 13:05:28 +0000 (06:05 -0700)]
[OpInfo] Added ReductionOpInfo subclass of OpInfo and ported sum test (#62737)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62737
ReductionOpInfo is a specialization of OpInfo for reduction operators. For now, it is designed to work with reductions that return a single tensor and that reduce all elements along one or more dimensions to a single value. In particular this excludes operators such as `max` and `min` that return multiple tensors and `quantile` that can return multiple values.
fixes https://github.com/pytorch/pytorch/issues/49746
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30406568
Pulled By: heitorschueroff
fbshipit-source-id: 218b1da1902f67bcf4c3681e2a0f0029a25d51f1
Luca Wehrstedt [Thu, 26 Aug 2021 12:43:05 +0000 (05:43 -0700)]
Update TensorPipe submodule
Summary: The bot failed to do it.
Test Plan: D30542677
Reviewed By: beauby
Differential Revision: D30573500
fbshipit-source-id: 50abd6fc415cead0a6b6d9290fa0e5f97d0e4989
Michael Dagitses [Thu, 26 Aug 2021 11:42:36 +0000 (04:42 -0700)]
use `const auto&` as type for grad alias (#63949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63949
This is an extension of the discussion in
https://github.com/pytorch/pytorch/pull/63040#discussion_r687793027.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30546789
Pulled By: dagitses
fbshipit-source-id: 3046aff4f129d5492d73dfb67717a824e16ffee8
Kefei Lu [Thu, 26 Aug 2021 07:51:53 +0000 (00:51 -0700)]
Add logging for _MinimizerBase
Summary: Add logging so we know which nodes are currently being visited
Test Plan: lint & SC tests
Reviewed By: 842974287
Differential Revision: D30509865
fbshipit-source-id: 09e77e44c97c825242e0b24f90463b50f3ca19c6
Rohan Varma [Thu, 26 Aug 2021 06:48:58 +0000 (23:48 -0700)]
Fix issue re: DDP and create_graph=True (#63831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63831
Closes https://github.com/pytorch/pytorch/issues/63812
`at::mul_out` is not supported when `grad` itself requires grad, which is useful for computing higher order derivatives.
In this case, fall back to a mul + copy instead of mul_out.
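A minimal sketch of the fallback logic (the helper name here is hypothetical; the actual fix lives in DDP's reducer):

```python
import torch

def scale_into_bucket(grad: torch.Tensor, factor: float,
                      bucket: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper illustrating the fix: out= variants don't
    support autograd, so when ``grad`` itself requires grad (as with
    backward(create_graph=True)), use mul + copy_ instead of mul_out."""
    if grad.requires_grad:
        # Slow path: keeps the autograd graph intact for higher-order grads.
        bucket.copy_(grad.mul(factor))
    else:
        # Fast path: write the scaled gradient directly into the bucket.
        torch.mul(grad, factor, out=bucket)
    return bucket
```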
ghstack-source-id: 136614644
Test Plan: UT
Reviewed By: SciPioneer
Differential Revision: D30505573
fbshipit-source-id: 83532b6207b3d80116fcc4dff0e5520d73b3454f
Marjan Fariborz [Thu, 26 Aug 2021 06:40:09 +0000 (23:40 -0700)]
Adding BFP16 quantization/dequantization support to OSS (#63059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63059
Supports the BFP16 quantization method in OSS. Currently only CPU is supported.
ghstack-source-id: 136639528
Test Plan: Imported from OSS
Reviewed By: wanchaol
Differential Revision: D30194538
fbshipit-source-id: ac248567ad8028457c2a91b77ef2ce81709fce53
Kiuk Chung [Thu, 26 Aug 2021 05:56:33 +0000 (22:56 -0700)]
(torch.distributed) Add torch.distributed.is_torchelastic_launched() util method + make init_method=tcp:// compatible with torchelastic (#63910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63910
Addresses the current issue that `init_method=tcp://` is not compatible with `torch.distributed.run` and `torch.distributed.launch`. When running with a training script that initializes the process group with `init_method=tcp://localhost:$port` as such:
```
$ python -u -m torch.distributed.run --max_restarts 0 --nproc_per_node 1 --nnodes 1 --master_addr $(hostname) --master_port 6000 ~/tmp/test.py
```
An `Address in use` error is raised since the training script tries to create a TCPStore on port 6000, which is already taken since the elastic agent is already running a TCPStore on that port.
For details see: https://github.com/pytorch/pytorch/issues/63874.
This change does a couple of things:
1. Adds an `is_torchelastic_launched()` check function that users can use in training scripts to see whether the script is launched via torchelastic.
2. Updates the `torch.distributed` docs page to include the new `is_torchelastic_launched()` function.
3. Makes `init_method=tcp://` torchelastic compatible by modifying `_tcp_rendezvous_handler` in `torch.distributed.rendezvous` (this is NOT the elastic rendezvous; it is the old rendezvous module, which is slated for deprecation in future releases) to check `is_torchelastic_launched()` AND `torchelastic_use_agent_store()`, and if so, only create TCPStore clients (no daemons, not even for rank 0).
4. Adds a bunch of unittests to cover the different code paths.
NOTE: the issue mentions that we should fail fast with an assertion on `init_method != env://` when `is_torchelastic_launched()` is `True`. There are three registered init_methods in pytorch: env://, tcp://, and file://. Since this diff makes tcp:// compatible with torchelastic, and I've validated that file:// is compatible as well, there is no need to add assertions. I did update the docs to point out that env:// is the RECOMMENDED init_method. We should probably deprecate the other init_methods in the future, but that is out of scope for this issue.
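As a sketch of how such a check can work (details may differ from the actual implementation; the signal assumed here is that torchelastic exports a `TORCHELASTIC_RUN_ID` environment variable to its workers, and the fallback port is chosen purely for illustration):

```python
import os

def is_torchelastic_launched() -> bool:
    # Sketch: torchelastic sets TORCHELASTIC_RUN_ID in the environment of
    # every worker it spawns, so its presence indicates the script was
    # launched via torch.distributed.run / torch.distributed.launch.
    return os.environ.get("TORCHELASTIC_RUN_ID") is not None

# A training script could then pick a safe default init_method:
init_method = "env://" if is_torchelastic_launched() else "tcp://localhost:29500"
```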
Test Plan: Unittests.
Reviewed By: cbalioglu
Differential Revision: D30529984
fbshipit-source-id: 267aea6d4dad73eb14a2680ac921f210ff547cc5
Joseph Spisak [Thu, 26 Aug 2021 05:49:22 +0000 (22:49 -0700)]
Update persons_of_interest.rst (#63907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63907
Reviewed By: jspisak
Differential Revision: D30534972
Pulled By: dzhulgakov
fbshipit-source-id: ba726fc53e292a362c387cc8b5f7776ca2a2544c
Philip Meier [Thu, 26 Aug 2021 05:04:44 +0000 (22:04 -0700)]
enable equal_nan for complex values in isclose (#63571)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63571
Test Plan: Imported from OSS
Reviewed By: malfet, ngimel
Differential Revision: D30560127
Pulled By: mruberry
fbshipit-source-id: 8958121ca24e7c139d869607903aebbe87bc0740
nikithamalgi [Thu, 26 Aug 2021 04:47:50 +0000 (21:47 -0700)]
Clean up related to type refinements (#62444)
Summary:
Creates a helper function to refine types into a TorchScript-compatible format in the MonkeyType config for profile-directed typing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62444
Reviewed By: malfet
Differential Revision: D30548159
Pulled By: nikithamalgifb
fbshipit-source-id: 7c09ce5f5e043d069313b87112837d7e226ade1f
Zeina Migeed [Thu, 26 Aug 2021 03:42:14 +0000 (20:42 -0700)]
inference for algebraic expressions (#63822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63822
Infers algebraic expressions and adds them to our symbolic inferencer. Works for Conv2d and can be extended to other operations.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D30518469
Pulled By: migeed-z
fbshipit-source-id: b92dfa40b2d834a535177da42b851701b8f7178c
Zafar Takhirov [Thu, 26 Aug 2021 03:37:56 +0000 (20:37 -0700)]
[quant] Fixing the conversion of the quantizable RNN (#63879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63879
The quantizable RNN had a bug where `from_observed` was an instance method instead of a classmethod. This caused `tq.convert` to fail. This fixes the issue by making `from_observed` a classmethod.
The tests were passing before because the unittests were not using the custom module path, but a conventional `from_float`, which is also supported.
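A stripped-down illustration of why the factory must be a classmethod (the class names here are hypothetical, not the actual quantization classes):

```python
class ObservedLSTM:
    pass

class QuantizedLSTM:
    @classmethod
    def from_observed(cls, observed):
        # A generic convert step only has the *class* to call the factory
        # on, never an instance of it, so an instance method would fail here.
        obj = cls()
        obj.source = observed
        return obj

# A convert utility maps observed types to quantized types and calls the
# factory on the mapped class:
mapping = {ObservedLSTM: QuantizedLSTM}
observed = ObservedLSTM()
converted = mapping[type(observed)].from_observed(observed)
assert isinstance(converted, QuantizedLSTM)
```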
Test Plan:
`buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm`
```
buck test mode/dev //caffe2/test:quantization -- test_custom_module_lstm
Parsing buck files: finished in 0.5 sec
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 9.2 sec (100%) 12622/12622 jobs, 2/12622 updated
Total time: 9.7 sec
More details at https://www.internalfb.com/intern/buck/build/0d87b987-649f-4d06-b0e2-97b5077
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: cb99305f-65c9-438b-a99f-a0a2a3089778
Trace available for this run at /tmp/tpx-20210824-115652.540356/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
✓ ListingSuccess: caffe2/test:quantization - main (12.550)
✓ Pass: caffe2/test:quantization - test_custom_module_lstm (quantization.core.test_quantized_op.TestQuantizedOps) (174.867)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5066549645030046
```
Reviewed By: jerryzh168, mtl67
Differential Revision: D30520473
fbshipit-source-id: bc5d0b5bb079fd146e2614dd42526fc7d4d4f3c6
Zhengxu Chen [Thu, 26 Aug 2021 03:09:12 +0000 (20:09 -0700)]
Make frozen symbol name customizable in torch deploy. (#63817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63817
ghstack-source-id: 136699671
Test Plan: eyes
Reviewed By: wconstab
Differential Revision: D29571559
fbshipit-source-id: 8e3caa4932ef8d7c8559f264f0e9bb5474ad2237
Natalia Gimelshein [Thu, 26 Aug 2021 01:17:10 +0000 (18:17 -0700)]
Compute cuda reduction buffer size in elements (#63969)
Summary:
Resubmit of https://github.com/pytorch/pytorch/issues/63885
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63969
Reviewed By: mruberry
Differential Revision: D30549423
Pulled By: ngimel
fbshipit-source-id: b16d25030d44ced789c125a333d72b02a8f45067
Jerry Zhang [Thu, 26 Aug 2021 00:50:48 +0000 (17:50 -0700)]
Back out "Revert D30384746: [fx2trt] Add a test for quantized resnet18" (#63973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63973
Original commit changeset: b93235323e22
Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test
Reviewed By: 842974287
Differential Revision: D30546036
fbshipit-source-id: 2c8302456f072d04da00cf9ad97aa8304bc5e43e
Philip Meier [Wed, 25 Aug 2021 23:42:14 +0000 (16:42 -0700)]
replace `self.assertTrue(torch.allclose(..))` with `self.assertEqual(…)` (#63637)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63565
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63637
Reviewed By: malfet
Differential Revision: D30541266
Pulled By: mruberry
fbshipit-source-id: ab461949782c6908a589ea098fcfcf5c3e081ee6
David Riazati [Wed, 25 Aug 2021 22:54:31 +0000 (15:54 -0700)]
Remove render_test_results job (#63877)
Summary:
This removes the `render_test_results` job, which had been causing some confusion among devs when it failed and isn't really necessary now that we can render test results on the PR HUD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63877
Reviewed By: walterddr, janeyx99
Differential Revision: D30546705
Pulled By: driazati
fbshipit-source-id: 55fdafdb6f80924d941ffc15ee10787cb54f34a1
John Clow [Wed, 25 Aug 2021 22:27:37 +0000 (15:27 -0700)]
[EASY] Update the clang-tidy error message (#63370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63370
As shown by this CI run, the actual thing that is incorrect is the prompt.
https://github.com/pytorch/pytorch/actions/runs/1137298261
The CI runs the below command instead of the original command.
The original command errors out when importing another file on line 1.
Trying to fix the code to work with the original command causes the CI to error out.
We should actually ask the user to run
`python3 -m tools.linter.install.clang_tidy`
Test Plan: Imported from OSS
Reviewed By: janeyx99, heitorschueroff
Differential Revision: D30530216
Pulled By: Gamrix
fbshipit-source-id: 2a2b8d539dcc2839e4000c13e82c207fa89bfc9f
Peter Bell [Wed, 25 Aug 2021 22:05:14 +0000 (15:05 -0700)]
Shard python_torch_functions.cpp (#62187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62187
This file can take 3 minutes on its own to compile, and after
python_functions.cpp is the second limiting factor for compile time of
`libtorch_python` on a 32-core threadripper. This splits it into 3 files that
take around 1 minute each to compile.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D29962048
Pulled By: albanD
fbshipit-source-id: 99016d75912bff483fe21b130cef43a6882f8c0e
Jithun Nair [Wed, 25 Aug 2021 22:00:47 +0000 (15:00 -0700)]
Add note on ifdefing based on CUDA_VERSION for ROCm path (#62850)
Summary:
CUDA_VERSION and HIP_VERSION follow very unrelated versioning schemes, so it does not make sense to use CUDA_VERSION to determine the ROCm path. This note explicitly addresses it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62850
Reviewed By: mruberry
Differential Revision: D30547562
Pulled By: malfet
fbshipit-source-id: 02990fa66a88466c2330ab85f446b25b78545150
John Clow [Wed, 25 Aug 2021 21:49:06 +0000 (14:49 -0700)]
Small fixes to the Contributing.txt (#63385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63385
Corrects a mistake in the PyTorch uninstall instructions and adds an extra note for Darwin.
Test Plan: Imported from OSS
Reviewed By: janeyx99, heitorschueroff
Differential Revision: D30530234
fbshipit-source-id: e0f88a1725eeadabfb4b28c1da11e369ee878ab4
Rong Rong (AI Infra) [Wed, 25 Aug 2021 21:34:40 +0000 (14:34 -0700)]
Back out "Temporary fix for remote gpu execution issue" (#63983)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63983
Test for fixes in D30545351. It should resolve the issue of the remote execution flag being populated incorrectly.
Test Plan: CI
Reviewed By: malfet, seemethere
Differential Revision: D30549443
fbshipit-source-id: b3895909f5cd654ba163b77950872b332fbad3fe
Priya Ramani [Wed, 25 Aug 2021 20:08:12 +0000 (13:08 -0700)]
Shape Propagation Pass: Fix AdaptiveAveragePooling2d (#63629)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63629
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30461727
Pulled By: priyaramani
fbshipit-source-id: 3873d1d636f79185680b82de06174d8de288c941
driazati [Wed, 25 Aug 2021 19:58:24 +0000 (12:58 -0700)]
Move existing target determinator to tools (#63809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63809
This moves out the modulefinder determinator to `tools/testing` since it is supposed to be CI-only. This also simplifies run_test.py a little bit.
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere, janeyx99
Differential Revision: D30497438
Pulled By: driazati
fbshipit-source-id: 1d203037af5af6a20c1e7812da935e7cbb5cd82f
Yi Wang [Wed, 25 Aug 2021 19:46:09 +0000 (12:46 -0700)]
Add a comment on the potential implicit type up-casting (#63905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63905
as title
ghstack-source-id: 136590703
Test Plan: N/A
Reviewed By: mrshenli
Differential Revision: D30527929
fbshipit-source-id: 69402bbfa87cfd8fc166ce313cde9736ee072589
mingfeima [Wed, 25 Aug 2021 18:53:52 +0000 (11:53 -0700)]
add BFloat16 support for bernoulli and Dropout on CPU (#56372)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56372
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D28836792
Pulled By: VitalyFedyunin
fbshipit-source-id: ede951d172a59276e11383fd767778ab959b5a6b
Howard Huang [Wed, 25 Aug 2021 18:53:24 +0000 (11:53 -0700)]
Update torch.distributed.run OMP_NUM_THREADS message to log.warning (#63953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63953
Closes #61138
Test:
`python -m torch.distributed.run --nproc_per_node 2 test.py` still outputs the message.
`LOGLEVEL=ERROR python -m torch.distributed.run --nproc_per_node 2 test.py` no longer outputs the message.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D30542997
Pulled By: H-Huang
fbshipit-source-id: e7da30dcda51516abf4e56f1f510132e44397027
zhouzhuojie [Wed, 25 Aug 2021 18:30:28 +0000 (11:30 -0700)]
Fix ciflow/all label generation (#63954)
Summary:
The `ciflow/all` label is added automatically, but it needs to be added before we call `gen_root_job_condition`.
- fix the order of adding `ciflow/all`
- refactor all the string into global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63954
Reviewed By: malfet
Differential Revision: D30545596
Pulled By: zhouzhuojie
fbshipit-source-id: 83ab668f0234488afb855a72e3ebd4503f7f1a78
driazati [Wed, 25 Aug 2021 18:19:49 +0000 (11:19 -0700)]
Reformat run_test.py (#63808)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63808
`black run_test.py`
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D30497437
Pulled By: driazati
fbshipit-source-id: 41b29b73f41fa4bb15fce5eaa69f8efe614e02f7
Raghavan Raman [Wed, 25 Aug 2021 18:12:57 +0000 (11:12 -0700)]
[Static Runtime] Added caching for the NNC code generated for Logit. (#63840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63840
Added NNC generated code for Logit to the cache.
```
Logit NNC Benchmark       Time (ns)
                       w/o cache   w/ cache
logit_nnc_sleef/64           543        536
logit_nnc_sleef/512         3517       3465
logit_nnc_sleef/8192       88483      85881
logit_nnc_sleef/32768     337016     323090
logit_nnc_fast/64            167        163
logit_nnc_fast/512           866        817
logit_nnc_fast/8192        13069      12801
logit_nnc_fast/32768       53429      52530
logit_nnc_vml/64             164        151
logit_nnc_vml/512            783        769
logit_nnc_vml/8192         11563      11674
logit_nnc_vml/32768        46720      46452
```
Test Plan: Unit tests and inline_cvr model.
Reviewed By: hlu1
Differential Revision: D30405424
fbshipit-source-id: 938b1b74758e2612ae151bac890c5f8ebbc42d50
Raghavan Raman [Wed, 25 Aug 2021 18:12:57 +0000 (11:12 -0700)]
[Static Runtime] Added a variable for clamp in the NNC code for Logit. (#63839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63839
Replaced the use of a constant for clamp in the NNC code for Logit
with a variable. This makes it easier to enable caching for Logit.
There is no performance difference with this change, as shown in the micro-benchmarks below.
```
Logit NNC Benchmark        Time (ns)
                       const-clamp  var-clamp
logit_nnc_sleef/64            550        543
logit_nnc_sleef/512          3514       3517
logit_nnc_sleef/8192        85537      82900
logit_nnc_sleef/32768      347635     337016
logit_nnc_fast/64             173        167
logit_nnc_fast/512            829        866
logit_nnc_fast/8192         13286      13069
logit_nnc_fast/32768        51116      53429
logit_nnc_vml/64              146        164
logit_nnc_vml/512             773        783
logit_nnc_vml/8192          11556      11563
logit_nnc_vml/32768         44815      46720
```
Test Plan: SR unit tests and the inline_cvr model.
Reviewed By: bertmaher
Differential Revision: D30405466
fbshipit-source-id: adb891fdae5746439931ce5f43165291fec08f52
Raghavan Raman [Wed, 25 Aug 2021 18:12:57 +0000 (11:12 -0700)]
[Static Runtime] Moved NNC operator definitions to separate files. (#63838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63838
Refactored NNC operator definitions code into separate files.
Made `TEWrapper` a class with a fixed set of methods and added separate definitions for them based on `TORCH_ENABLE_LLVM` to keep the same functionality as before.
Test Plan: Build and ran Static Runtime tests.
Reviewed By: hlu1
Differential Revision: D30405467
fbshipit-source-id: 606ef852bb820d5e23a0f8af1bf5dc122e90bceb
Aayush Prakash [Wed, 25 Aug 2021 18:11:08 +0000 (11:11 -0700)]
[Reland] Replacing the p.data access in utils with tensor.set_. Passes both test_post_localSGD_optimizer_parity and test_periodic_model_averager tests (#63895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63895
When updating the model parameter, updating `parameter.data` is no longer recommended, because this `data` field will be deprecated in the future.
The replacement is `tensor.set_`.
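A minimal sketch of the substitution (assuming the swap happens under `torch.no_grad()`, since in-place operations on leaf parameters otherwise raise):

```python
import torch

param = torch.nn.Parameter(torch.zeros(3))
averaged = torch.ones(3)

# Old pattern, slated for deprecation:
#   param.data = averaged
# Replacement: swap in the new value without going through .data.
with torch.no_grad():
    param.set_(averaged)
```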
ghstack-source-id: 136593433
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: SciPioneer
Differential Revision: D30526178
fbshipit-source-id: a1ac0ec3665d8623edd5bf94f01c1132daff5c00
albanD [Wed, 25 Aug 2021 18:07:24 +0000 (11:07 -0700)]
clean up engine.cpp thread state (#63115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63115
This actually changes:
- callbacks now run with proper grad mode even in worker threads
- graphtask's Future callbacks now run with proper TLS when erroring out from a worker thread
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388100
Pulled By: albanD
fbshipit-source-id: 7ae9c461c2f0040548dd9e1e314f25e8da0c2e67
Shiyan Deng [Wed, 25 Aug 2021 17:22:17 +0000 (10:22 -0700)]
[fx2trt] Check input device in TRTModule (#63893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63893
Add a check to ensure all the inputs are on cuda device.
Test Plan: CI
Reviewed By: kflu, houseroad
Differential Revision: D30525265
fbshipit-source-id: 6e50b70fd535defc1f802d51e8bb991b2dd73741
riship [Wed, 25 Aug 2021 16:56:41 +0000 (09:56 -0700)]
bf16 Error message cleanup as well as addition of is_bf16_supported (#63798)
Summary:
cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63798
Reviewed By: heitorschueroff
Differential Revision: D30526187
Pulled By: ngimel
fbshipit-source-id: c484aec14638097c96c720095d3491249b6b2d14
Karen Zhou [Wed, 25 Aug 2021 16:55:02 +0000 (09:55 -0700)]
[pruner] add getter for pruned outputs in base pruner (#63520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63520
Rather than having to call `module.parametrizations.weight[0].pruned_outputs` each time we need to access the set of pruned indices, we add a getter `get_module_pruned_outputs` which takes the module as an argument and returns the set.
This is used for testing.
ghstack-source-id: 136561130
Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1N4gK
Reviewed By: z-a-f
Differential Revision: D30374558
fbshipit-source-id: e38dfee0879cadde52b942e899a3d8d7151ee493
Karen Zhou [Wed, 25 Aug 2021 16:55:02 +0000 (09:55 -0700)]
[pruner] add support for pruning BatchNorm2d (#63519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63519
If the pruner should be pruning biases along with weights, then if the model has BatchNorm2d following pruned Conv2d layers, then the corresponding channels of the BatchNorm must also be pruned.
Specifically, they need to zeroed out, rather than fully removed, since in eager mode, the dimensions between layers need to be preserved.
To do this, we add a pruning parametrization called `ZeroesParametrization` which zeroes out pruned channels, rather than removing them.
The user must provide in the config, a tuple of the Conv2d and BatchNorm layers that go together. The `prepare` method will add the tuple to the `module_groups`; then it will add a PruningParametrization to the Conv2d layer, and a ZeroesParametrization to BatchNorm, and then set their pruned sets to be the same set. That way, during `step`, both masks are updated with the same pruned indices.
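A sketch of what such a zeroing parametrization can look like (simplified; the actual class also shares its pruned set with the Conv2d's PruningParametrization, which is omitted here):

```python
import torch
from torch import nn

class ZeroesParametrization(nn.Module):
    """Zero out pruned channels instead of removing them, so the
    dimensions between adjacent layers are preserved in eager mode."""
    def __init__(self, pruned_outputs: set):
        super().__init__()
        # Shared with the preceding Conv2d's mask, so both are updated
        # with the same pruned indices during each step.
        self.pruned_outputs = pruned_outputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = torch.ones_like(x)
        mask[list(self.pruned_outputs)] = 0
        return x * mask
```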
ghstack-source-id: 136562278
Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1N1P6
Reviewed By: z-a-f
Differential Revision: D30349855
fbshipit-source-id: 3199d3688d5a70963f9b32d7a8fdac3962ae6a65
Peter Bell [Wed, 25 Aug 2021 16:35:26 +0000 (09:35 -0700)]
Minor OptionalTensorRef updates (#63611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63611
A few minor updates to `OptionalTensorRef`:
1. use `Tensor`'s `unsafe_borrow_t` constructor, which avoids an unnecessary `nullptr` check.
2. copy constructor cannot defer to the `const Tensor&` constructor because it checks that the tensor is defined, and so would fail for disengaged optionals.
3. use copy-swap idiom to avoid issues with self-assignment. `x = x` should be a no-op, but the old version would clear `x`.
4. Add pointer-like access for consistency with `optional` and `MaybeOwned`
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D30484704
Pulled By: ezyang
fbshipit-source-id: 738f4bd22359eaecd0a519a04e89a4b44d92da5b
Nikita Shulga [Wed, 25 Aug 2021 16:24:27 +0000 (09:24 -0700)]
Update CMake minimum version to 3.10 (#63660)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63660
Test Plan: Imported from OSS
Reviewed By: janeyx99, mruberry
Differential Revision: D30543878
fbshipit-source-id: a7d938807653f39727f2cc7d7ca167200567b6a0
Rong Rong (AI Infra) [Wed, 25 Aug 2021 16:04:28 +0000 (09:04 -0700)]
Temporary fix for remote gpu execution issue (#63899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63899
See: T99020845
Test Plan: sandcastle
Reviewed By: heitorschueroff
Differential Revision: D30527384
fbshipit-source-id: ce9933e5e181322c02d4ed17f3fdaabe4c5ba29e
Ansley Ussery [Wed, 25 Aug 2021 16:01:50 +0000 (09:01 -0700)]
Fix bug in `check_empty_containers` (#63492)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63492
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D30402749
Pulled By: ansley
fbshipit-source-id: 7de533355fe91ca4f45b2bafc3bfb205a028c1ed
Jane Xu [Wed, 25 Aug 2021 16:00:13 +0000 (09:00 -0700)]
Swap CUDA 11.1 and 11.3 in CI to make 11.1 periodic (#63900)
Summary:
Preparing for supporting 11.3 in the next release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63900
Reviewed By: malfet
Differential Revision: D30541437
Pulled By: janeyx99
fbshipit-source-id: a7297da7f7818a4291b1c321d62d76fc2c0f1f90
zhouzhuojie [Wed, 25 Aug 2021 15:50:00 +0000 (08:50 -0700)]
[skip ci] Add generated comment to ruleset json (#63896)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63896
Reviewed By: heitorschueroff
Differential Revision: D30529820
Pulled By: zhouzhuojie
fbshipit-source-id: 7529803af23ea36a7bcb673cd399da80da8e3feb
Alban Desmaison [Wed, 25 Aug 2021 14:15:18 +0000 (07:15 -0700)]
Revert D30526034: [pytorch][PR] compute reduction intermediate buffer size in elements
Test Plan: revert-hammer
Differential Revision: D30526034 (https://github.com/pytorch/pytorch/commit/e69a1398cbe534874060460faf36af21d24ce6e7)
Original commit changeset: 0aca7f887974
fbshipit-source-id: a22472723818d6fe0c11a6e134080df1ac408038
Linbin Yu [Wed, 25 Aug 2021 07:42:03 +0000 (00:42 -0700)]
Revert D30384746: [fx2trt] Add a test for quantized resnet18
Test Plan: revert-hammer
Differential Revision: D30384746 (https://github.com/pytorch/pytorch/commit/10dfa58eba055a1bbc1cc89df033cd2815cbb403)
Original commit changeset: 1a8638777116
fbshipit-source-id: b93235323e229b391f5456f6e3543988062dd0d4
Jerry Zhang [Wed, 25 Aug 2021 04:33:12 +0000 (21:33 -0700)]
[fx2trt] Add a test for quantized resnet18 (#63446)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63446
Add a test for quantized resnet18 running in TensorRT
Test Plan: buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test
Reviewed By: 842974287
Differential Revision: D30384746
fbshipit-source-id: 1a863877711618cd23d887694269ed9e44ee606c
Jerry Zhang [Wed, 25 Aug 2021 04:28:40 +0000 (21:28 -0700)]
[quant][graphmode][fx] Make maxpool and flatten produce the reference pattern (#63501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63501
Currently some ops are considered to work with both float and quantized input, so we may produce patterns like "quant - some_op - dequant". This might not work well with the backend; we may change everything to produce "quant - dequant - some_op - quant - dequant" instead in the future. This PR fixes maxpool and flatten only, to unblock resnet benchmarking on TensorRT.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: mruberry
Differential Revision: D30402788
fbshipit-source-id: 892c5ff6552775070e2c1453f65846590fb12735
Mikhail Zolotukhin [Wed, 25 Aug 2021 04:21:57 +0000 (21:21 -0700)]
[TensorExpr] LLVMCodegen: Use addFnAttr instead of addAttribute which was deleted. (#63886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63886
cc gmagogsfm
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30523135
Pulled By: ZolotukhinM
fbshipit-source-id: 62e125f917b2a0153eb30879d93cf956587a05e0
Jerry Zhang [Wed, 25 Aug 2021 04:05:14 +0000 (21:05 -0700)]
[quant][graphmode][fx] Add a separate lower_to_native_backend function for relu (#62861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62861
This PR adds a lower_to_native_backend function to lower a quantized reference model to a model that uses fbgemm/qnnpack ops. We'll gradually add support and remove the fbgemm/qnnpack-specific handling in quantization_patterns.py.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30165828
fbshipit-source-id: de1149cd7e7c1840c17c251cd4d35004afd015b7
Natalia Gimelshein [Wed, 25 Aug 2021 02:37:54 +0000 (19:37 -0700)]
compute reduction intermediate buffer size in elements (#63885)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63869
`iter` strides are in bytes, and we are additionally multiplying the size computed using those strides by `sizeof(arg_t)`. Computing `output_memory_size` in elements should be enough.
This doesn't fix the underlying problem of allocating a large intermediate tensor, but it typically makes that tensor smaller by a factor of 4.
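The double-counting can be sketched with plain arithmetic (the numbers are hypothetical; the real computation lives in the CUDA reduction kernel setup):

```python
# Strides from `iter` are in BYTES, so a size computed from them is
# already a byte count; multiplying it again by sizeof(arg_t) overcounts.
def buffer_bytes_buggy(numel, itemsize):
    size_from_byte_strides = numel * itemsize   # already in bytes
    return size_from_byte_strides * itemsize    # extra factor of itemsize

def buffer_bytes_fixed(numel, itemsize):
    output_memory_size = numel                  # measured in elements
    return output_memory_size * itemsize        # convert to bytes exactly once
```

For a float32 accumulator (`itemsize == 4`) the buggy size is 4x larger, matching the "factor of 4" above.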
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63885
Reviewed By: mruberry
Differential Revision:
D30526034
Pulled By: ngimel
fbshipit-source-id:
0aca7f887974b7776e380463bbd82d32a5786ee8
Thomas J. Fan [Wed, 25 Aug 2021 02:03:07 +0000 (19:03 -0700)]
TST Adds more modules into common module tests (#62999)
Summary:
This PR moves some modules into `common_modules` to see what it looks like.
While migrating some no-batch-dim modules into `common_modules`, I noticed that `desc` is not used for the name. This means we cannot use `-k` to filter tests. This PR moves the sample generation into `_parametrize_test`, and passes the already generated `module_input` into users of `modules(modules_db)`.
I can see this is a little different from opsinfo and would be happy to revert to the original implementation of `modules`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62999
Reviewed By: heitorschueroff
Differential Revision:
D30522737
Pulled By: jbschlosser
fbshipit-source-id:
7ed1aeb3753fc97a4ad6f1a3c789727c78e1bc73
Joel Schlosser [Wed, 25 Aug 2021 02:00:33 +0000 (19:00 -0700)]
Allow arbitrary objects in state_dicts (#62976)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62094
Introduces functionality for adding arbitrary objects to module state_dicts. To take advantage of this, the following functions can be defined on a module:
* `get_extra_state(self) -> dict` - Returns a dict defining any extra state this module wants to save
* `set_extra_state(self, state)` - Subsumes the given state within the module
In the details, a sub-dictionary is stored in the state_dict under the key `_extra_state` for each module that requires extra state.
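A minimal pure-Python sketch of the hooks described above (the real hooks live on `torch.nn.Module`; this toy class only mirrors the shape of the API):

```python
class MiniModule:
    """Toy stand-in for a module that wants extra state persisted."""

    def __init__(self):
        self.version = 2  # arbitrary extra (non-parameter) state

    def get_extra_state(self):
        # Returns a dict defining any extra state this module wants to save.
        return {"version": self.version}

    def set_extra_state(self, state):
        # Subsumes the given state back into the module on load.
        self.version = state["version"]

    def state_dict(self):
        # Extra state is stored under the "_extra_state" key, as described.
        return {"_extra_state": self.get_extra_state()}

    def load_state_dict(self, sd):
        self.set_extra_state(sd["_extra_state"])
```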
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62976
Reviewed By: heitorschueroff
Differential Revision:
D30518657
Pulled By: jbschlosser
fbshipit-source-id:
5fb35ab8e3d36f35e3e96dcd4498f8c917d1f386
Thomas J. Fan [Wed, 25 Aug 2021 01:55:23 +0000 (18:55 -0700)]
TST Adds pickle testing for ModuleInfo (#63736)
Summary:
Follow up to https://github.com/pytorch/pytorch/pull/61935
This PR adds `test_pickle` to `test_modules`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63736
Reviewed By: heitorschueroff
Differential Revision:
D30522462
Pulled By: jbschlosser
fbshipit-source-id:
a03b66ea0d81c6d0845c4fddf0ddc3714bbf0ab1
Bert Maher [Wed, 25 Aug 2021 01:52:29 +0000 (18:52 -0700)]
Re-apply: [nnc] Support thread level parallelism in fused kernels (#63776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63776
I reverted this out of an abundance of caution because some test
failures occurred, but they were all due to precision issues fixed lower in
this stack. Let's try again.
I've rolled the elimination of the allow-parallelism-in-fusions toggle into
this diff since they're pretty tightly coupled.
ghstack-source-id:
136529847
Test Plan: CI
Reviewed By: huiguoo
Differential Revision:
D30484555
fbshipit-source-id:
38fd33520f710585d1130c365a8c60c9ce794a59
Bert Maher [Wed, 25 Aug 2021 01:52:29 +0000 (18:52 -0700)]
Don't switch executors mid test (#63830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63830
It's really not safe to change the executor out from under models that may have
already been partially compiled.
ghstack-source-id:
136526228
Test Plan:
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=/lib64/libasan.so.5 numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```
Reviewed By: desertfire
Differential Revision:
D30504489
fbshipit-source-id:
188581cb53f0cf5bd3442d1e9d46e8c0c7e124f8
Bert Maher [Wed, 25 Aug 2021 01:52:29 +0000 (18:52 -0700)]
[nnc] Disable erf and erfc (#63775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63775
These introduce small accuracy differences that cause some internal
tests to fail, and it's not worth fixing the tests right now because they're
slower than the ATen ops anyways.
ghstack-source-id:
136526229
Test Plan:
```
buck test mode/dev //aml/eccv/mcm/training:tests -- --exact 'aml/eccv/mcm/training:tests - test_build_torch_script_model (aml.eccv.mcm.training.tests.publish_helper_tests.TransformerPredictorPublishHelperTests)'
```
Reviewed By: navahgar
Differential Revision:
D30484557
fbshipit-source-id:
095a9c810539a499105b76e1d96843dbc61b0079
Peter Bell [Wed, 25 Aug 2021 01:48:25 +0000 (18:48 -0700)]
Migrate THCTensor_copyIgnoringOverlaps to ATen (#63505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63505
This isn't a public operator, just a helper function used in CUDA_tensor_apply.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30441305
Pulled By: ngimel
fbshipit-source-id:
84fabc701cbd8479e02d80f373a3dd62d70df2ce
Jerry Zhang [Wed, 25 Aug 2021 01:20:43 +0000 (18:20 -0700)]
[quant][graphmode][fx] Add reference option support for binary ops (#62698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62698
We also removed the special handling in match_utils for binary ops
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision:
D30093781
fbshipit-source-id:
58cc972de8211a80dd4d111e25dc4ad36057933f
Hao Lu [Wed, 25 Aug 2021 00:06:18 +0000 (17:06 -0700)]
[StaticRuntime] Fix bug in HasInplaceOp (#63842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63842
Reviewed By: mikeiovine
Differential Revision:
D30506914
fbshipit-source-id:
b2e358cfb991dacdb295b61bbc37beb36b73b852
Harut Movsisyan [Tue, 24 Aug 2021 23:20:13 +0000 (16:20 -0700)]
Microbenchmarking matrix mult (einsum, torch.mult, torch.mm) (#63654)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63654
Test Plan:
```
> buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:matrix_mult_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 27.970
# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 41.830
# Benchmarking PyTorch: einsum_bmm
# Mode: Eager
# Name: einsum_bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 499.114
# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B4_M5_N3_K2_cpu
# Input: B: 4, M: 5, N: 3, K: 2, device: cpu
Forward Execution Time (us) : 6.268
# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B32_M25_N20_K30_cpu
# Input: B: 32, M: 25, N: 20, K: 30, device: cpu
Forward Execution Time (us) : 12.676
# Benchmarking PyTorch: bmm
# Mode: Eager
# Name: bmm_B128_M100_N120_K110_cpu
# Input: B: 128, M: 100, N: 120, K: 110, device: cpu
Forward Execution Time (us) : 438.219
# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 7.657
# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 18.523
# Benchmarking PyTorch: einsum_elementwise
# Mode: Eager
# Name: einsum_elementwise_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 55.103
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B4_M5_N3_cpu
# Input: B: 4, M: 5, N: 3, device: cpu
Forward Execution Time (us) : 2.501
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B32_M25_N20_cpu
# Input: B: 32, M: 25, N: 20, device: cpu
Forward Execution Time (us) : 10.589
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_B100_M90_N110_cpu
# Input: B: 100, M: 90, N: 110, device: cpu
Forward Execution Time (us) : 50.102
```
Reviewed By: ajyu
Differential Revision:
D30455179
fbshipit-source-id:
9f2d92b2d2b860f41a8e59be2cc086d75b587f7b
Xiaodong Wang [Tue, 24 Aug 2021 22:45:59 +0000 (15:45 -0700)]
Turn off layer norm in jit symbolic differentiation (#63816)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63816
Test Plan:
Confirmed this can rescue the NE:
https://www.internalfb.com/mast/job/torchx_xdwang-SparseNNApplication_72cf593d
Reviewed By: ngimel
Differential Revision:
D30498746
fbshipit-source-id:
4a387f32ee2f70685de6104459c7f21bfbddc187
Alban Desmaison [Tue, 24 Aug 2021 22:32:42 +0000 (15:32 -0700)]
Add a common autograd TLS state (#63860)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision:
D30513253
Pulled By: albanD
fbshipit-source-id:
97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b
Eli Uriegas [Tue, 24 Aug 2021 21:13:04 +0000 (14:13 -0700)]
.github: Enable with-ssh for Windows (#63440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63440
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision:
D30521460
Pulled By: seemethere
fbshipit-source-id:
e987e170e73fb4f9d9f024bed0e58404ed206848
James Reed [Tue, 24 Aug 2021 20:44:52 +0000 (13:44 -0700)]
[FX] Fix _replicate_for_data_parallel (#63821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63821
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision:
D30502115
Pulled By: jamesr66a
fbshipit-source-id:
0f004f95def6e1ba21ccbeab40cb0a739a0ad20c
soulitzer [Tue, 24 Aug 2021 20:02:27 +0000 (13:02 -0700)]
Do not modify saved variables in-place for spectral norm during power iteration (#62293)
Summary:
Interestingly enough, the original code did have a mechanism that aims to prevent this very issue,
but it performs a clone AFTER modifying u and v in-place.
This doesn't work, because we can later use the cloned u and v in operations that save for backward, and the next time we execute forward, we modify the same cloned u and v in-place.
So if the idea is that we want to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
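The aliasing problem can be sketched in plain Python, with lists standing in for tensors (illustrative only; the real code operates on the `u`/`v` buffers in spectral norm):

```python
def forward_buggy(u, saved):
    # In-place update first; the very object saved last iteration mutates too.
    u[0] += 1.0
    saved.append(u)   # "clone" happens too late / an alias is saved
    return u

def forward_fixed(u, saved):
    # Clone BEFORE the in-place op, so previously saved values stay intact.
    u = list(u)
    u[0] += 1.0
    saved.append(u)
    return u

u, saved_buggy = [0.0], []
u = forward_buggy(u, saved_buggy)
u = forward_buggy(u, saved_buggy)   # corrupts the first saved entry

v, saved_fixed = [0.0], []
v = forward_fixed(v, saved_fixed)
v = forward_fixed(v, saved_fixed)   # first saved entry is untouched
```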
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293
Reviewed By: bdhirsh
Differential Revision:
D30489750
Pulled By: soulitzer
fbshipit-source-id:
cbe8dea885aef97adda8481f7a822e5bd91f7889
Peter Bell [Tue, 24 Aug 2021 19:43:27 +0000 (12:43 -0700)]
Migrate legacy lstsq from THC to ATen (CUDA) (#63504)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63504
Closes gh-24592
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30441304
Pulled By: ngimel
fbshipit-source-id:
ec176596f54bc084af48a73d1dbb0dcb82fec593
Edward Yang [Tue, 24 Aug 2021 19:19:16 +0000 (12:19 -0700)]
Revert
D30513613: Removing tensor.data usage in utils with tensor set_ method
Test Plan: revert-hammer
Differential Revision:
D30513613 (https://github.com/pytorch/pytorch/commit/d08a36f831cbcb4516fc1b68e3e3deff8ab45aba)
Original commit changeset:
402efb9c30fa
fbshipit-source-id:
911c66a9852de77dc5274b5fb373258c0c97739a
Bo Wang [Tue, 24 Aug 2021 18:45:54 +0000 (11:45 -0700)]
Merge common fields from TensorInitParams and ShardedTensorMetadata into TensorProperties (#63731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63731
1) Follow up [PR/63378 last comment](https://github.com/pytorch/pytorch/pull/63378#discussion_r693143053)
2) Also updated the caller side (usage of ShardedTensorMetadata) in fbcode
Ref: [landing workflow 3](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Workflow/Landing/#landing-your-prs-from-gi-1)
Test Plan:
Imported from OSS
OSS: (pytorch).. $ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
FB: fbcode $ buck test mode/dev //aiplatform/modelstore/checkpointing/pyper/tests:checkpoint_utils_test
Reviewed By: wanchaol, heitorschueroff
Differential Revision:
D30472281
fbshipit-source-id:
727fb0e7f10eab4eb7a10476194e9008f2ac1fb5
Aayush Prakash [Tue, 24 Aug 2021 18:19:34 +0000 (11:19 -0700)]
Removing tensor.data usage in utils with tensor set_ method (#63867)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63867
When updating a model parameter, assigning to `parameter.data` is no longer recommended, because the `data` field will be deprecated in the future.
The replacement is `tensor.set_`.
ghstack-source-id:
136531233
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
Reviewed By: SciPioneer
Differential Revision:
D30513613
fbshipit-source-id:
402efb9c30fafc3f285bebc631639f656ceae585
Yi Zhang [Tue, 24 Aug 2021 17:50:57 +0000 (10:50 -0700)]
update readme and contributing.md (#63843)
Summary:
1. In fact, Visual Studio isn't supported as a CMake generator.
2. I was asked many times about the 'Could NOT find OpenMP' error.
3. Add the newly added Best Practices link to contributing.md.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63843
Reviewed By: seemethere, heitorschueroff
Differential Revision:
D30514095
Pulled By: janeyx99
fbshipit-source-id:
76715a1d8c049122546e5a7778cafe54e4dfd5d6
peterjc123 [Tue, 24 Aug 2021 17:44:45 +0000 (10:44 -0700)]
Subprocess encoding fixes for cpp extension (#63756)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63756
Reviewed By: bdhirsh
Differential Revision:
D30485046
Pulled By: ezyang
fbshipit-source-id:
4f0ac383da4e8843e2a602dceae85f389d7434ee
mingfeima [Tue, 24 Aug 2021 17:30:18 +0000 (10:30 -0700)]
add bf16 support for bucketize (#55588)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55588
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision:
D28836796
Pulled By: VitalyFedyunin
fbshipit-source-id:
c9ae5b969c30a45473533be5f29bb497f8da5143
Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] modify base pruner to prune bias by default (#63202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202
By default, the pruner will also prune biases, such that the whole output channel is removed. The user can manually set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
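A toy pure-Python sketch of this behavior (the data layout and function name here are hypothetical, not the pruner's actual API):

```python
def prune_channel(weight, bias, idx, also_prune_bias=True):
    """Remove output channel `idx`: drop the weight row and, by default,
    the matching bias entry, so the whole output channel disappears."""
    weight = [row for i, row in enumerate(weight) if i != idx]
    if also_prune_bias:
        bias = [b for i, b in enumerate(bias) if i != idx]
    return weight, bias

w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 output channels
b = [0.1, 0.2, 0.3]
w2, b2 = prune_channel(w, b, 1)                         # prunes bias too
w3, b3 = prune_channel(w, b, 1, also_prune_bias=False)  # keeps bias intact
```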
ghstack-source-id:
136466671
Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1MV32
modify `fusion_tests` according to API change
`buck test mode/opt //scripts/kazhou:fusion_tests`
https://pxl.cl/1NbKz
Reviewed By: z-a-f
Differential Revision:
D30294494
fbshipit-source-id:
c84655648bee0035559195ca855b98fb7edaa134
Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] amend base pruner API to match base sparsifier (#63178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178
Update base pruner API to match base sparsifier API as defined in
D28970960 / PR58955
Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes model and config instead of constructor
- convert functionality renamed to `squash_mask`; calling `convert` now raises an error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id:
136467595
Test Plan:
Function names updates according to changes
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1MTgH
TODO will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use new API
Reviewed By: z-a-f
Differential Revision:
D30287179
fbshipit-source-id:
d4727bea1873b500f2d4bb784db26d532bf26cce
Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] refactor `ActivationReconstruction` forward hooks (#63158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158
Combined functionality for `ActivationReconstruction` for both Linear and Conv2d in one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
ghstack-source-id:
136467465
Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`
https://pxl.cl/1MSSv
Reviewed By: raghuramank100
Differential Revision:
D30282765
fbshipit-source-id:
08a1e4e0650511019fff85cf52b41dd818b0c7f8
Mike Iovine [Tue, 24 Aug 2021 16:38:25 +0000 (09:38 -0700)]
[Static Runtime] Implement prim::VarStack out variant (#63579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579
Provide a static runtime out variant implementation for the new op introduced in
D30426232 (https://github.com/pytorch/pytorch/commit/1385f9fb12e6607c98d2d9d5edaaaab2bc07386f).
Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`
Reviewed By: navahgar
Differential Revision:
D30410525
fbshipit-source-id:
bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8
Xiang Gao [Tue, 24 Aug 2021 16:24:50 +0000 (09:24 -0700)]
[Reland] Embedding thrust->cub migration (#63806)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63806
Reviewed By: bdhirsh
Differential Revision:
D30498255
Pulled By: ngimel
fbshipit-source-id:
78b7085a92a168cf0163f53dcb712bac922f5235
mingfeima [Tue, 24 Aug 2021 15:54:36 +0000 (08:54 -0700)]
optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv (#55221)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55221
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision:
D28836797
Pulled By: VitalyFedyunin
fbshipit-source-id:
6b79098c902ffe65d228668118ef36fb49bab800
yanbing-j [Tue, 24 Aug 2021 15:32:33 +0000 (08:32 -0700)]
Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)
Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514
Reviewed By: ejguan
Differential Revision:
D30257612
Pulled By: VitalyFedyunin
fbshipit-source-id:
8cc0d1faacd02dcc9827af724a86d95b6952748f
Thomas J. Fan [Tue, 24 Aug 2021 15:26:21 +0000 (08:26 -0700)]
ENH Adds no_batch_dim for NLLLoss (#62651)
Summary:
Towards https://github.com/pytorch/pytorch/issues/60585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62651
Reviewed By: VitalyFedyunin
Differential Revision:
D30303340
Pulled By: jbschlosser
fbshipit-source-id:
7ab478cf63bf6cd1f850cad5fd101e74a2cfe3f5
mingfeima [Tue, 24 Aug 2021 15:22:47 +0000 (08:22 -0700)]
fix batchnorm2d issue when input is non contiguous (#63392)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63392
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30476317
Pulled By: VitalyFedyunin
fbshipit-source-id:
03055a0aec21cf2c029b6f32315da2b09cb722d0
Mike Iovine [Tue, 24 Aug 2021 15:19:38 +0000 (08:19 -0700)]
[JIT] Add variadic stack op (#63578)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578
Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.
Most of the implementation/tests are the same as `prim::VarConcat`.
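The rewrite can be sketched in plain Python (tuples stand in for IR nodes; the names are illustrative):

```python
def stack(tensor_list, dim):
    """Stand-in for aten::stack(list, dim): takes one list argument."""
    return ("stack", tuple(tensor_list), dim)

def var_stack(*args):
    """Stand-in for prim::VarStack(t0, ..., tn, dim): the list is unpacked
    into individual arguments, with dim last, avoiding the intermediate list."""
    *tensors, dim = args
    return ("stack", tuple(tensors), dim)

# The pass rewrites stack(lst, d) into var_stack(lst[0], ..., lst[n], d);
# both forms should be observationally equivalent.
lst = ["t0", "t1", "t2"]
before = stack(lst, 0)
after = var_stack(*lst, 0)
```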
Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`
Reviewed By: navahgar
Differential Revision:
D30426232
fbshipit-source-id:
9829a7db6e0a5038c9b7528c43c25b0c221aa2ce
Rong Rong (AI Infra) [Tue, 24 Aug 2021 15:01:36 +0000 (08:01 -0700)]
[BE] add distributed run_test options (#63147)
Summary:
Currently distributed tests are mixed into test_python.
We would like to run the distributed tests as their own batch, so we need to split them out.
This adds an option to include/exclude distributed tests with CUSTOM_HANDLERS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147
Test Plan:
- locally run with the additional run_test.py options.
- CI
Dependency: found a bug in mpiexec test and need https://github.com/pytorch/pytorch/issues/63580 to fix it first.
Reviewed By: bdhirsh
Differential Revision:
D30496178
Pulled By: walterddr
fbshipit-source-id:
7903a57b619f2425028028f944211938823918a6
Alban Desmaison [Tue, 24 Aug 2021 14:20:56 +0000 (07:20 -0700)]
Revert
D30388099: Add a common autograd TLS state
Test Plan: revert-hammer
Differential Revision:
D30388099 (https://github.com/pytorch/pytorch/commit/83d9bad44a1e1e6202103cd22e4dbd2bd3d7dae0)
Original commit changeset:
8e03f940150f
fbshipit-source-id:
f6d60fec66e8292f5268335bb8a3e7e1a662f23b
Thomas J. Fan [Tue, 24 Aug 2021 13:58:05 +0000 (06:58 -0700)]
ENH Adds no_batch_dim tests/docs for LPPool1d and Identity (#62190)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62190
Reviewed By: ejguan
Differential Revision:
D29942385
Pulled By: jbschlosser
fbshipit-source-id:
00df6f6f01ad039631bb8679f8de94863aac7650
albanD [Tue, 24 Aug 2021 13:52:38 +0000 (06:52 -0700)]
Add a common autograd TLS state (#63114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114
This PR collapses the GradMode and InferenceMode thread local booleans into a single thread local uint8.
This helps reducing the number of thread local variable accesses done when we propagate ThreadLocalStates.
Note that this is even more beneficial as we will add a forward-mode AD TLS (similar to GradMode) higher in this stack, and this new structure should reduce the perf impact of adding that new TLS.
Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.
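The packing described above can be sketched in Python (the bit positions and names here are hypothetical, not the actual C++ layout):

```python
# Two formerly separate thread-local booleans collapsed into one byte of
# packed flags, so one TLS access can read/propagate both at once.
GRAD_MODE_BIT = 1 << 0
INFERENCE_MODE_BIT = 1 << 1

class AutogradTLS:
    def __init__(self):
        self.flags = GRAD_MODE_BIT  # grad enabled, inference mode off

    def set(self, bit, on):
        # Set or clear a single flag without touching the others.
        self.flags = (self.flags | bit) if on else (self.flags & ~bit)

    def get(self, bit):
        return bool(self.flags & bit)

tls = AutogradTLS()
tls.set(INFERENCE_MODE_BIT, True)   # both flags now live in one value
tls.set(GRAD_MODE_BIT, False)
```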
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30388099
Pulled By: albanD
fbshipit-source-id:
8e03f940150ff063c2edd792733663413ae2f486
Marjan Fariborz [Tue, 24 Aug 2021 08:43:33 +0000 (01:43 -0700)]
Separating quantization test from distributed_test (#63058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058
Dedicating separate tests for different quantization methods. Currently supporting FP16 method.
ghstack-source-id:
136499767
Test Plan: uck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test
Reviewed By: wanchaol
Differential Revision:
D30142580
fbshipit-source-id:
3aacec1a231a662067d2b48c001f0c69fefcdd60
Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Nuke KernelArena and KernelScope. (#63587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587
Now that there are no classes using KernelArena for memory management, we
can remove it.
Differential Revision:
D30429115
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id:
375f6f9294d27790645eeb7cb5a8e87047a57544
Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Make 'Tensor' a value type. (#63586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586
This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.
After this change nothing uses KernelScope/KernelArena and they can be
safely removed.
Differential Revision:
D30429114
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id:
f90b859cfe863692b7beffbe9bd0e4143df1e819
Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Switch Exprs and Stmt from kernel-arena to shared_ptr. (#63216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63216
Currently there are three classes managed by KernelArena: Expr, Stmt,
and Tensor (and derived classes). KernelArena has been a long-standing
pain point for NNC devs, and we're moving away from that memory management
model to ref-count based memory model (using shared_ptr). This commit
switches Expr and Stmt to shared_ptr and is the biggest change in this
transition. Later commits will detach Tensor from KernelArena and kill
the arena + scope altogether.
Differential Revision:
D30353195
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id:
9575225ada3d0fb65087ae40435f3dfea4792cae
Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778
This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30487425
Pulled By: ZolotukhinM
fbshipit-source-id:
9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c
mingfeima [Tue, 24 Aug 2021 05:53:35 +0000 (22:53 -0700)]
add channels last for GroupNorm (#49821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49821
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D26007053
Pulled By: VitalyFedyunin
fbshipit-source-id:
34a48d5d3b66a159febf3c3d96748fbaba1b9e31
Jane Xu [Tue, 24 Aug 2021 01:44:46 +0000 (18:44 -0700)]
Add ROCm as a platform for which tests can be disabled (#63813)
Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test. (like how this issue specifies windows https://github.com/pytorch/pytorch/issues/61655)
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813
Reviewed By: seemethere
Differential Revision:
D30498478
Pulled By: janeyx99
fbshipit-source-id:
f1abe8677e1ddd01de3291e1618272ad8e287dc4
Mike Iovine [Tue, 24 Aug 2021 01:43:17 +0000 (18:43 -0700)]
[Static Runtime] SR clones graph input (#63704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63704
Previously SR did not clone the graph. This was leading to subtle bugs in `testStaticRuntime`; static runtime would modify its graph, and the graph used by the JIT interpreter would change as well. The JIT interpreter would then crash if SR-only ops were added!
Cloning the graph is more consistent with the behavior of the `Module` ctor.
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision:
D30463294
fbshipit-source-id:
b771551a1f55f95fde79373b23babcf3e5ddf726