Nicolas Hug [Thu, 2 Sep 2021 10:45:06 +0000 (03:45 -0700)]
Update hub.load() signature to avoid polluting kwargs param (#63755)
Summary:
This PR addresses an old comment about Python2 EOL, directly putting some parameters in the function signature instead of in a `**kwargs` dict.
I believe the changes are fully backward compatible.
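A minimal sketch of the kind of signature change described, with illustrative parameter names (the exact set promoted by the PR is defined there):
```python
# Sketch only -- parameter names here are illustrative, not the exact set
# promoted by the PR.

# Before: hub options funneled through **kwargs and popped out manually
def load(repo_or_dir, model, *args, **kwargs):
    force_reload = kwargs.pop("force_reload", False)
    verbose = kwargs.pop("verbose", True)
    ...

# After: hub options are explicit keyword-only parameters (possible now that
# Python 2 is EOL); **kwargs still flows to the model entrypoint, so existing
# call sites keep working.
def load(repo_or_dir, model, *args, force_reload=False, verbose=True, **kwargs):
    ...
```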
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63755
Reviewed By: zou3519
Differential Revision: D30695634
Pulled By: NicolasHug
fbshipit-source-id: 398f347c5a04bfb58e77e46773a869cb9d0eb225
Kefei Lu [Thu, 2 Sep 2021 08:17:56 +0000 (01:17 -0700)]
Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418
In T99368564, we found that when running the TRT-lowered module, the output tensors are out of order compared to the output from the original, non-lowered module. It turns out that in `TRTModule.forward()`, we cannot rely on the `ICudaEngine` bindings' natural-order indices to create the output tensors; rather, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
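A rough sketch of the idea, with hypothetical attribute names (the actual `TRTModule` code differs):
```python
# Hypothetical sketch: `output_names` is the order we supply, and TensorRT's
# engine.get_binding_index(name) maps a binding name back to its index.
def collect_outputs(engine, output_names, binding_tensors):
    return tuple(
        binding_tensors[engine.get_binding_index(name)]
        for name in output_names
    )
```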
Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs
Reviewed By: yinghai
Differential Revision: D30693545
fbshipit-source-id: 32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba
Kushashwa Ravi Shrimali [Thu, 2 Sep 2021 08:08:53 +0000 (01:08 -0700)]
Port `gather` to structured kernel (#63312)
Summary:
Will add a description once this is ready for review.
cc: ysiraichi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63312
Reviewed By: iramazanli
Differential Revision: D30597447
Pulled By: ezyang
fbshipit-source-id: d36e59835c2f4b38e286032dd2a1111a7e16b7e5
Pavel Belevich [Thu, 2 Sep 2021 07:57:39 +0000 (00:57 -0700)]
Replace std::unordered_map<c10::Device, c10::Device> with DeviceMap (#64393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64393
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D30708384
Pulled By: pbelevich
fbshipit-source-id: 1c565727e4f09cd9e560874dd90aa403470b4a97
Chen Lai [Thu, 2 Sep 2021 07:50:40 +0000 (00:50 -0700)]
[PyTorch Edge] Support default args with out arg, flag off (#63540)
Summary:
1. Allow consuming operators with default arguments and out arguments. The flag is off to keep the same behavior as v6; PR 63651 turns the flag on.
2. Add two unit tests to cover this type of operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63540
ghstack-source-id: 137211562
Test Plan:
```
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsWithOutArg
caffe2/test/cpp/jit:jit - LiteInterpreterTest.DefaultArgsPinvWithOutArg
```
Reviewed By: raziel, iseeyuan, tugsbayasgalan
Differential Revision: D30414156
fbshipit-source-id: 0f3a219a22aee10ac53184cbd95940726c459d1f
Edward Yang [Thu, 2 Sep 2021 07:48:03 +0000 (00:48 -0700)]
Remove unnecessary resize_output (#64272)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64272
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang, bdhirsh
Differential Revision: D30686941
Pulled By: ezyang
fbshipit-source-id: de60e6f1115648f8cf7daaa1e652594fe8b06742
Shirong Wu [Thu, 2 Sep 2021 05:09:42 +0000 (22:09 -0700)]
Move graph util to fx2trt (#64064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64064
Move the original util from torch2trt to the fx2trt dir, since torch2trt is going to be deprecated. This is a follow-up diff for D30379124.
Test Plan: manual
Reviewed By: yinghai, mikekgfb
Differential Revision: D30591687
fbshipit-source-id: ae0e59dfbc2d2e2aa4f3ccea7cff2291c7deb388
Edward Yang [Thu, 2 Sep 2021 04:48:36 +0000 (21:48 -0700)]
Add a warning about DataLoader num_workers > 0 "memory leak" (#64337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64337
See https://github.com/pytorch/pytorch/issues/13246
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D30690320
Pulled By: ezyang
fbshipit-source-id: 2751aca05a94e63d25162599f458855988516fad
Rohan Varma [Thu, 2 Sep 2021 04:07:01 +0000 (21:07 -0700)]
[Dist CI] Move rest of distributed tests to their own CI job (#64253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64253
Follow-up to D30496178 (https://github.com/pytorch/pytorch/commit/f4aff3a346a0525e37d6071f318f7a4c54d5e1fb) to move the rest of the distributed tests to their own jobs for Linux GHA.
ghstack-source-id: 137233785
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30662999
fbshipit-source-id: f7cfbc0d1223aca52120f17f9da987d70fda8de6
Rohan Varma [Thu, 2 Sep 2021 01:12:02 +0000 (18:12 -0700)]
[DDP] Log num threads (#64072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64072
Log the number of gloo threads to DDP logging.
ghstack-source-id: 137119480
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D30596083
fbshipit-source-id: 2b4f6e762cb5d850be6056bcc5922029a1af3c91
Zeina Migeed [Thu, 2 Sep 2021 01:04:19 +0000 (18:04 -0700)]
add documentation to shape inference algorithm (#64312)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64312
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30709254
Pulled By: migeed-z
fbshipit-source-id: 3297d26fe6727c5b9ca176625b1683d787f59659
Yi Wang [Thu, 2 Sep 2021 00:32:39 +0000 (17:32 -0700)]
[DDP Comm Hook] Add debugging communication hooks to ddp_comm_hooks.rst (#64352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64352
as title
ghstack-source-id: 137246253
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision: D30694089
fbshipit-source-id: a78110b11d59bb0718f43c99ede23f2fd8ab21d0
Yi Wang [Thu, 2 Sep 2021 00:32:39 +0000 (17:32 -0700)]
[DDP Comm Hook] Create a noop hook for performance debugging (#64344)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64344
As title.
Additionally, avoid using numpy array in test_ddp_hooks.py.
ghstack-source-id: 137170449
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks -- test_ddp_comm_hook_noop_hook
Reviewed By: rohan-varma
Differential Revision: D30693220
fbshipit-source-id: e17f0d1c6198863cf20a53566f586a6bff602522
Rohan Varma [Thu, 2 Sep 2021 00:04:37 +0000 (17:04 -0700)]
[DDP] Add more logging iterations (#64071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64071
Adding more logging iterations to get additional data.
ghstack-source-id: 137119476
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D30579367
fbshipit-source-id: 57195266ada5e5926f0d8eaf4fb4e01dc98924d7
Rohan Varma [Wed, 1 Sep 2021 23:25:00 +0000 (16:25 -0700)]
Fix incorrect DDP test (#64074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64074
Previous PR https://github.com/pytorch/pytorch/pull/63831 did not actually test the error in https://github.com/pytorch/pytorch/issues/63812. Introduce a test
directly from the repro that simulates it.
ghstack-source-id: 137171460
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D30569719
fbshipit-source-id: fd61250ef6d291c093607663d91d6d2cb5574eb7
Rohan Varma [Wed, 1 Sep 2021 23:21:31 +0000 (16:21 -0700)]
[c10d] Prefer use of torch_check (#63928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63928
Throwing std::invalid_argument results in not getting stack traces with TORCH_SHOW_CPP_STACKTRACES=1, so prefer TORCH_CHECK here.
ghstack-source-id: 137135328
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D30533955
fbshipit-source-id: 33e5bf4f449e3043dec68da93f8022f6624d9675
anjali411 [Wed, 1 Sep 2021 23:11:38 +0000 (16:11 -0700)]
Add fast path for addmm when the inputs are conjugate (#59380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/59380
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28898374
Pulled By: anjali411
fbshipit-source-id: eab0e64d37bb57c18b54cabb8e5c00666338ba04
Yi Wang [Wed, 1 Sep 2021 23:09:46 +0000 (16:09 -0700)]
[DDP Comm Hook] Add bf16 gradient compression to ddp_comm_hooks.rst (#64346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64346
as title
ghstack-source-id: 137170288
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision: D30693513
fbshipit-source-id: 8c64b8404ff3b0322e1bbbd93f6ef051ea91307d
Jerry Zhang [Wed, 1 Sep 2021 22:48:54 +0000 (15:48 -0700)]
[quant][graphmode][fx] Add fbgemm backend_config_dict (#64288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64288
This is just to set up the file structure and unblock experimentation.
The format of backend_config_dict will change in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision: D30699457
fbshipit-source-id: 28211a4def05d34757850c045a36e311f54760fe
Santiago Castro [Wed, 1 Sep 2021 22:18:14 +0000 (15:18 -0700)]
Make datasets in `ConcatDataset` not need to be sized (#64114)
Summary:
`datasets` needs to be iterable, but it also needs to be sized because its length is checked, even though it is converted to a list immediately afterwards. By swapping the order of these two lines, it no longer needs to be sized.
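A sketch of the two-line swap described above:
```python
class ConcatDataset:
    def __init__(self, datasets):
        # After the swap: materialize first, then check the length,
        # so any iterable (not just a Sized one) is accepted.
        self.datasets = list(datasets)
        assert len(self.datasets) > 0, "datasets should not be an empty iterable"
        # Before the swap the order was reversed:
        #     assert len(datasets) > 0     # required `datasets` to be Sized
        #     self.datasets = list(datasets)
```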
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64114
Reviewed By: H-Huang
Differential Revision: D30641480
Pulled By: ejguan
fbshipit-source-id: 7e16548c2123afa65b83845f9929271fa07fe1e8
Richard Zou [Wed, 1 Sep 2021 22:12:05 +0000 (15:12 -0700)]
Restore LayerNorm numerics test (#64385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64385
It was deleted in https://github.com/pytorch/pytorch/pull/63276.
The numerics test was meant to check LayerNorm behavior on large inputs,
but we deleted it without realizing that.
Test Plan: - wait for tests.
Reviewed By: ngimel
Differential Revision: D30702950
Pulled By: zou3519
fbshipit-source-id: a480e26c45ec38fb628938b70416cdb22d976a46
Jerry Zhang [Wed, 1 Sep 2021 21:56:14 +0000 (14:56 -0700)]
[quant][graphmode][api] Add backend_config_dict to prepare_fx api (#64135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64135
We want to start aligning the api with the design in https://github.com/pytorch/pytorch/wiki/Extending-PyTorch-Quantization-to-Custom-Backends
We plan to gradually move things from `prepare_custom_config_dict` and `convert_custom_config_dict`
to `backend_config_dict` and allow custom backend developers to define their own way of quantizing operators.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: zou3519
Differential Revision: D30699456
fbshipit-source-id: e3c068da8d3da2270f57719f7159cc71cafa8598
zhouzhuojie [Wed, 1 Sep 2021 21:53:25 +0000 (14:53 -0700)]
Silent rm error for sccache log file (#64388)
Summary:
Sample reporting from dr.ci
![image](https://user-images.githubusercontent.com/658840/131724645-75afa04f-7554-4674-8e7c-cf139c84d994.png)
The `rm` command is not actually running into problems; we just need to silence the console output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64388
Reviewed By: walterddr, malfet, seemethere
Differential Revision: D30704439
Pulled By: zhouzhuojie
fbshipit-source-id: ecd35531decf05b75cef30d08d46635f81112f67
Yuchen Huang [Wed, 1 Sep 2021 21:48:00 +0000 (14:48 -0700)]
[xplat][metal] Add getters and setters for ivars in Conv2dOpContext (#57395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57395
As title
ghstack-source-id: 137223806
(Note: this ignores all push blocking failures!)
Test Plan:
### Lib Build
- `buck build caffe2:aten_metal_prepack`
### Integration Test
- `arc focus2 pp-ops -a ModelRunner`
- Click "Test Person/Hair Segmentation Model"
{F612831435}
- Image Classification Demo
{F614144868}
Reviewed By: xta0
Differential Revision: D28132020
fbshipit-source-id: 73560263a9d14e9ecfa39c69deb158a2ed8cb179
Meghan Lele [Wed, 1 Sep 2021 21:24:54 +0000 (14:24 -0700)]
[structured] Preserve computed elements from meta func to impl (#61746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61746
**Summary**
This commit introduces a new feature for structured kernels that allows
kernels to declare quantities as "precomputed" in
`native_functions.yaml`, compute them once in the `meta` function and
reuse them again in the `impl`. The names and types of these quantities
are used to generate code for a struct containing them that the `meta`
function must return. In the case of a handful of surveyed kernels
(`all`, `any`, `avg_pool2d`), these quantities that are used both in
the `meta` and `impl` have the same meaning as certain kernel arguments
and in fact supersede them. Accordingly, the correspondence between a
kernel argument and the precomputed elements that supersede it is also
captured in `native_functions.yaml`. This information is used to unpack
the struct returned by `meta` and pass its contents correctly to the
`impl` function.
The primary goal is to avoid recompute and enhance developer experience
(e.g. sometimes people can forget to compute these elements while
porting a kernel).
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D30407831
Pulled By: SplitInfinity
fbshipit-source-id: 00975525ea373721fe52d06f75cd4ac91f3dc556
Mike Iovine [Wed, 1 Sep 2021 21:19:21 +0000 (14:19 -0700)]
[Static Runtime] Make per-op latency readable by FAI-PEP (#64315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64315
Add a new flag `generate_ai_pep_output` to `StaticRuntime::benchmark`. If set, produces per-op-kind average total latency in milliseconds in a JSON format recognized by [Facebook AI performance evaluation platform (FAI-PEP)](https://github.com/facebook/FAI-PEP).
This is useful for observing the impact of changes that make a big difference for a specific op, but do not affect the overall SR latency by more than a few percent.
Reviewed By: hlu1
Differential Revision: D30679352
fbshipit-source-id: c847fa6ea20774aaf1e7949b11db4421d1f70b7e
Salil Desai [Wed, 1 Sep 2021 21:08:02 +0000 (14:08 -0700)]
Update optimize_for_mobile to preserve node's debug information (#63106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63106
Propagate debug info to the re-written nodes in the graph.
Test Plan:
- Clone open source repo and build
- ``` python3 test/test_jit.py TestOptimizeForMobilePreserveDebugInfo ```
- Tests pass
Reviewed By: kimishpatel
Differential Revision: D28654659
fbshipit-source-id: 2d7c87f2fb95a3be53246375f35639bbd97c237e
David Reiss [Wed, 1 Sep 2021 20:41:37 +0000 (13:41 -0700)]
Break up "@generated" string so Phabricator shows changes
Summary: Created from CodeHub with https://fburl.com/edit-in-codehub
Test Plan:
CI
Sandcastle run
Reviewed By: larryliu0820
Differential Revision: D30701781
fbshipit-source-id: 3acab8b65a327c4ec7da90bc855ecf02f801c40a
Alban Desmaison [Wed, 1 Sep 2021 20:34:48 +0000 (13:34 -0700)]
Add forward AD support for custom Functions (#64061)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64061
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D30640868
Pulled By: albanD
fbshipit-source-id: b0e6610430a879074d6d5306443772fc154b431f
Tanvir Zaman [Wed, 1 Sep 2021 20:31:45 +0000 (13:31 -0700)]
Fix bytes_written and bytes_read (#64244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64244
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64040
In operator cost inference functions, in many places we are using sizeof(x.data_type()). Since data_type() returns a 32-bit integer from [this enum](https://www.internalfb.com/code/fbsource/[15e7ffe4073cf08c61077c7c24a4839504b964a2]/fbcode/caffe2/caffe2/proto/caffe2.proto?lines=20), we are basically always getting 4 for sizeof(x.data_type()) no matter what actual data type x has. Big thanks to Jack Langman for specifically pointing to this bug.
We instead use the size in bytes based on the actual data type.
Test Plan:
Added unit tests BatchMatMulMemCostTest:
buck test //caffe2/caffe2/fb/fbgemm:batch_matmul_op_test -- BatchMatMulMemCostTest
Extended existing unit test test_columnwise_concat for different data types:
buck test //caffe2/caffe2/python/operator_test:concat_op_cost_test -- test_columnwise_concat
Reviewed By: CrazySherman
Differential Revision: D30656698
fbshipit-source-id: d42c0c9a0c5b0ddc5dba39e4994f1f85a5e618bf
Scott Wolchok [Wed, 1 Sep 2021 20:24:11 +0000 (13:24 -0700)]
[Caffe2] Create fewer strings during argument fetching (#64285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64285
With C++14 heterogeneous ordered container lookup, it is no longer necessary to create a `std::string` in order to look up elements of a `CaffeMap` keyed by std::string. Accordingly, this diff reworks the argument-getting operator functions to avoid that in favor of `c10::string_view`.
ghstack-source-id: 137139818
Test Plan: buildsizebot iOS apps -- code size win. Fewer strings is probably marginally good for perf, but this only happens at setup time anyway.
Reviewed By: dzhulgakov
Differential Revision: D26826676
fbshipit-source-id: ee653b14dc2c528bae8c90f0fc6a7a419cbca1d6
Kimish Patel [Wed, 1 Sep 2021 19:38:39 +0000 (12:38 -0700)]
Back out "Revert
D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling." (#64307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64307
Original commit changeset: 0b2aa7c57d08
Restores original changes.
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies the various profiling efforts that we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator-level memory profiling (to be added)
- flop counts (to be added)
Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)
Furthermore removes some tests from test_lite_interpreter.cpp which were testing module hierarchy in debug info. They should be covered by test_mobile_profiler.cpp.
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and --print_module_info true (the Operator summary now has module hierarchy information):
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30680354
fbshipit-source-id: b6ba0d59c510c13d13d9935b1d8051cc82ffa4e9
Rohan Varma [Wed, 1 Sep 2021 19:28:23 +0000 (12:28 -0700)]
Add a record scope around autograd::engine::evaluate_function (#63619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63619
Adds a RECORD_FUNCTION with the function that is being evaluated as part of backwards execution. This has been useful in picking up some operations in the backwards pass that otherwise would not show up, for example custom autograd functions that run custom C++ code.
ghstack-source-id: 137041723
Test Plan:
CI
benchmark:
buck run mode/opt //scripts/rvarm1/ddp:bench
Reviewed By: albanD
Differential Revision: D30439492
fbshipit-source-id: 955917770cdf2a2edb0303223ace710b668ba388
Patrick Kan [Wed, 1 Sep 2021 19:20:50 +0000 (12:20 -0700)]
[Bootcamp] Include both python unittest and parser parameters in --help and -h flag (#64297)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45945
Creates a new thread to run -h or --help with unittest.main if the help flag is present, and keeps the add_help default for parameters.
Includes both the python unittest and parser parameters in the --help and -h flags; the output will remain up to date since both messages are displayed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64297
Test Plan:
Imported from GitHub
`python test/test_spectral_ops.py --help`
Output:
```
% python test/test_spectral_ops.py --help
usage: test_spectral_ops.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]
positional arguments:
tests a list of any number of test modules, classes and test methods.
optional arguments:
-h, --help show this help message and exit
-v, --verbose Verbose output
-q, --quiet Quiet output
--locals Show local variables in tracebacks
-f, --failfast Stop on first fail or error
-c, --catch Catch Ctrl-C and display results so far
-b, --buffer Buffer stdout and stderr during tests
-k TESTNAMEPATTERNS Only run tests which match the given substring
Examples:
test_spectral_ops.py - run default set of tests
test_spectral_ops.py MyTestSuite - run suite 'MyTestSuite'
test_spectral_ops.py MyTestCase.testSomething - run MyTestCase.testSomething
test_spectral_ops.py MyTestCase - run all 'test*' test methods
in MyTestCase
usage: test_spectral_ops.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT]
[--test_bailouts] [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX]
[--run-parallel RUN_PARALLEL] [--import-slow-tests [IMPORT_SLOW_TESTS]]
[--import-disabled-tests [IMPORT_DISABLED_TESTS]]
optional arguments:
-h, --help show this help message and exit
--subprocess whether to run each test in a subprocess
--seed SEED
--accept
--jit_executor JIT_EXECUTOR
--repeat REPEAT
--test_bailouts
--save-xml [SAVE_XML]
--discover-tests
--log-suffix LOG_SUFFIX
--run-parallel RUN_PARALLEL
--import-slow-tests [IMPORT_SLOW_TESTS]
--import-disabled-tests [IMPORT_DISABLED_TESTS]
```
Also ran some other tests to make sure they still worked, as well as other tests with the --help or -h flag.
Reviewed By: seemethere
Differential Revision: D30677776
Pulled By: PatrickKan
fbshipit-source-id: eb3d6e3fa677137ec703ec3a23808efb99acc896
Patrick Hu [Wed, 1 Sep 2021 17:49:39 +0000 (10:49 -0700)]
[FX] Fix python code generation for wrapped getattr() with default value (#64271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64271
Closes #60417
Modified emit_node() in fx/graph.py to generate a getattr() call with a default value when len(node.args) != 2, instead of accessing the attribute directly.
Added test_torch_fx_getattr() in test/test_fx.py.
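An illustrative sketch of the code-generation difference (the emitted source below is not the exact output):
```python
# Two args -- attribute access is equivalent, so it can be emitted directly:
#     getattr_1 = x.some_attr                  # from getattr(x, "some_attr")
#
# Three args -- plain attribute access would raise AttributeError instead of
# returning the default, so the getattr() call itself must be emitted:
#     getattr_1 = getattr(x, "some_attr", None)
```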
Test Plan:
pytest test/test_fx.py
Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D30671265
fbshipit-source-id: f2db9ea47e0cb247547e200684f715aab006c374
Raghavan Raman [Wed, 1 Sep 2021 17:28:02 +0000 (10:28 -0700)]
[nnc] Updated generic error message with info about turning off the fuser (#64316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64316
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30683942
Pulled By: navahgar
fbshipit-source-id: d86607563672213f99a1436dcf4f5dc28053b713
Xiang Gao [Wed, 1 Sep 2021 17:17:52 +0000 (10:17 -0700)]
Fixes reduction launch config (#64304)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48573
See also https://github.com/pytorch/pytorch/pull/64194
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64304
Reviewed By: janeyx99
Differential Revision: D30689600
Pulled By: ngimel
fbshipit-source-id: bf2103ca177fd3b6e27bc0324b81925234483a29
Kushashwa Ravi Shrimali [Wed, 1 Sep 2021 15:48:25 +0000 (08:48 -0700)]
OpInfo for `nn.functional.layer_norm` (#63276)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.
Note:
* This PR also adds a reference test inspired by existing tests in `test_nn.py`.
cc: mruberry zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63276
Reviewed By: ejguan
Differential Revision: D30452483
Pulled By: zou3519
fbshipit-source-id: 2578d01ca34e031668a41bd284db60c31ae1fba8
Nima Elyasi [Wed, 1 Sep 2021 15:47:44 +0000 (08:47 -0700)]
fix GradBucket.is_last() logic (#63768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63768
Passed the number of buckets to the GradBucket constructor, so that .is_last() can check whether the index equals num_buckets - 1.
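A minimal sketch of the corrected logic, with illustrative attribute names:
```python
class GradBucket:
    def __init__(self, index, num_buckets):
        self.index = index
        self.num_buckets = num_buckets  # now passed in by the caller

    def is_last(self):
        # the last bucket is the one whose index equals num_buckets - 1
        return self.index == self.num_buckets - 1
```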
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/algorithms/ddp_comm_hooks:test_ddp_hooks
test output: https://www.internalfb.com/intern/testinfra/testconsole/testrun/8162774375985873/
Reviewed By: SciPioneer, mrshenli
Differential Revision: D30455913
fbshipit-source-id: 8c67ca69cbf191d6e189e09248407eb167bb24b6
Richard Zou [Wed, 1 Sep 2021 14:16:55 +0000 (07:16 -0700)]
Revert D29699456: [pytorch][PR] Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA]
Test Plan: revert-hammer
Differential Revision: D29699456 (https://github.com/pytorch/pytorch/commit/ad4848565e1d9f4d408c60614f213acb52035181)
Original commit changeset: 407ae53392ac
fbshipit-source-id: b6c70ba8bb28c0c38de47857030b69792a8470de
James Reed [Wed, 1 Sep 2021 05:20:41 +0000 (22:20 -0700)]
[FX] Rename reduce functions back to their old, public names (#64324)
Summary:
Unfortunately pickle serializes the names of these functions. Also put them under backward-compatibility enforcement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64324
Test Plan: Local repro https://fb.workplace.com/groups/3440841732711443/permalink/4018921611570116/
Reviewed By: SplitInfinity, TailofJune
Differential Revision: D30684185
Pulled By: jamesr66a
fbshipit-source-id: 900701220155d15115cd0c07cf7774a2891bd04f
Yuchen Huang [Wed, 1 Sep 2021 05:00:11 +0000 (22:00 -0700)]
[Metal][GPU] Enable metal for simulators and fix test failures if possible (#64322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64322
As title
ghstack-source-id: 137143877
Test Plan:
- `aibench-cli mobile`
- Select iOS -> `y` -> `1` -> `n` -> "--metal_op_test"
- Select all iPhone 6 + iPhone 7 + iPhone 8 and a iPhone X or 11 or 12
```
Benchmark Submitted. Find more details at: https://our.intern.facebook.com/intern/aibench/details/318120612514604
Benchmark Status:
D10 (https://github.com/pytorch/pytorch/commit/b8256280ce45f02a7e105d3b3db4a547990e683d)AP-12.0.1: DONE
N71mAP-14.3: DONE
DUMMY latency:
D10 (https://github.com/pytorch/pytorch/commit/b8256280ce45f02a7e105d3b3db4a547990e683d)AP-12.0.1: 4319.3
N71mAP-14.3: 8868.51
I0831 16:06:27.210558 605277 ClientSingletonManager.cpp:99] Shutting down Manifold ClientSingletonManager
```
Reviewed By: xta0
Differential Revision: D30147163
fbshipit-source-id: 2de6bbd9bd525e32ca92b2845eb435800855edcc
Michael Carilli [Wed, 1 Sep 2021 04:43:25 +0000 (21:43 -0700)]
[CUDA graphs] hotfix for test_graph_ (#64339)
Summary:
Graphed workloads that try to capture a full backward pass must do warmup on a non-default stream. If warmup happens on the default stream, AccumulateGrad functions might tag themselves to run on the default stream, and therefore won't be capturable.
ngimel and I suspect some test_cuda.py tests run with the default stream as the ambient stream, which breaks `test_graph_grad_scaling` because `test_graph_grad_scaling` does warmup on the ambient stream _assuming_ the ambient stream is a non-default stream.
This PR explicitly sets a side stream for the warmup in `test_graph_grad_scaling`, which is what I should have done all along because it's what the new documentation recommends.
I pushed the PR branch straight to the main pytorch repo because we need to run ci-all on it, and I'm not sure what the requirements are these days.
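A sketch of the warmup pattern the test now follows (`model` and `inp` are hypothetical):
```python
import torch

side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    # warmup off the default stream, so AccumulateGrad functions don't bind
    # themselves to the default (uncapturable) stream
    for _ in range(3):
        model(inp).sum().backward()  # hypothetical model/input
torch.cuda.current_stream().wait_stream(side)
# ...graph capture happens after this warmup
```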
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64339
Reviewed By: mruberry
Differential Revision: D30690711
Pulled By: ngimel
fbshipit-source-id: 91ad75f46a11f311e25bc468ea184e22acdcc25a
gmagogsfm [Wed, 1 Sep 2021 04:27:46 +0000 (21:27 -0700)]
Remove outdated warning about RecursiveScriptModule not being copiable (#64085)
Summary:
RecursiveScriptModule has its own customized `__copy__` and `__deepcopy__` defined. The warning/error that says it is not copyable is outdated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64085
Reviewed By: rohan-varma
Differential Revision: D30598623
Pulled By: gmagogsfm
fbshipit-source-id: 0701d8617f42d818bc7b88244caee4cd47fbe976
Mikhail Zolotukhin [Wed, 1 Sep 2021 03:27:44 +0000 (20:27 -0700)]
[TensorExpr] Wrap error messages with buildErrorMessage call. (#64330)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64330
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30687226
Pulled By: ZolotukhinM
fbshipit-source-id: ade1be2ad6847c6afbba60307ef854696821b4e3
Pritam Damania [Wed, 1 Sep 2021 03:19:55 +0000 (20:19 -0700)]
Fix bug in ShardedTensorMetadata serde. (#63902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63902
The 'memory_format' field was not being serialized correctly and used
the same encoding for different fields.
ghstack-source-id: 137142406
Test Plan: waitforbuildbot
Reviewed By: bowangbj
Differential Revision: D30527324
fbshipit-source-id: f0f223e2d660ef6e4abae9649d9992acc36e1278
Pavel Belevich [Wed, 1 Sep 2021 03:14:08 +0000 (20:14 -0700)]
Delete some dead code from RRefMessageBase (#64298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64298
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse agolynski SciPioneer H-Huang mrzzd cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D30676702
Pulled By: pbelevich
fbshipit-source-id: 77dbc0f8064c3518376454ff573d45ed0274956b
Matti Picus [Wed, 1 Sep 2021 01:54:44 +0000 (18:54 -0700)]
disallow empty named dims list to flatten(names, name) (#61953)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61137 by raising an error if an empty tuple is passed in for the names:
```
>>> torch.empty((2, 3), names=['a', 'b']).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```
or from the original issue:
```
>>> torch.empty((2, 3)).flatten((), 'abc')
RuntimeError: flatten(tensor, dims, out_dim): dims cannot be empty
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61953
Reviewed By: iramazanli
Differential Revision: D30574571
Pulled By: malfet
fbshipit-source-id: e606e84458a8dd66e5da6d0eb1a260f37b4ce91b
Scott Wolchok [Wed, 1 Sep 2021 01:22:23 +0000 (18:22 -0700)]
[caffe2][easy] Save heap allocation in ConcatOp (#63529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63529
Output() takes an IntArrayRef, so we can just use a std::initializer_list (stack-allocated array) instead of std::vector here.
ghstack-source-id: 137085908
Test Plan: existing CI
Reviewed By: mruberry
Differential Revision: D29687400
fbshipit-source-id: 9f2a7c6679f2552c098bb1bf7befaca18e0e5d4d
Edward Yang [Wed, 1 Sep 2021 00:55:23 +0000 (17:55 -0700)]
Convert mul to use opmath_gpu_kernel_with_scalars (#64019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64019
Note that previously the functor operated on scalar_t and
this modifies it to operate on opmath_t, but this is not
a problem as half precision was implemented by performing the
compute in float anyway.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30575282
Pulled By: ezyang
fbshipit-source-id: cc6900ef996e755740afe48f9cb4d0366858dd47
soulitzer [Wed, 1 Sep 2021 00:51:55 +0000 (17:51 -0700)]
Use the correct overloaded name to skip boxed autograd not implemented kernel registration (#64182)
Summary:
Some internal use_count tests are failing for `dequantize_self` because we only compare the skip list against the base name `dequantize`, when we should be comparing against the full name including the overload.
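A minimal sketch of the comparison fix; the helper and the skip-list contents are illustrative, not the actual registration code:
```python
SKIP_LIST = {"dequantize.self"}  # illustrative entry

def should_skip(base_name: str, overload_name: str) -> bool:
    # compare the full overloaded name (e.g. "dequantize.self"),
    # not just the base name ("dequantize")
    full = f"{base_name}.{overload_name}" if overload_name else base_name
    return full in SKIP_LIST
```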
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64182
Reviewed By: albanD
Differential Revision: D30639909
Pulled By: soulitzer
fbshipit-source-id: d4d22dd1a5c8f7180251ce7739830764cce6f151
Ray Peng [Wed, 1 Sep 2021 00:45:50 +0000 (17:45 -0700)]
[Static Runtime] Out version for softmax (#64243)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64243
Test Plan:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
...
V0830 16:35:22.524479 613839 impl.cpp:1410] Switch to out variant for node: %5 : Tensor = aten::softmax(%a.1, %dim.1, %dtype.1)
...
[ OK ] StaticRuntime.IndividualOps_Softmax (803 ms)
```
Reviewed By: hlu1
Differential Revision: D30656149
fbshipit-source-id: 115b7b4a75448fd6a5c526808080ca9a4251302c
Eli Uriegas [Wed, 1 Sep 2021 00:38:42 +0000 (17:38 -0700)]
.circleci: Remove already migrated CUDA configs (#64231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64231
This removes the CUDA 11.1 and CUDA 10.2 configs that we had previously migrated over to GHA.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: zhouzhuojie
Differential Revision: D30683811
Pulled By: seemethere
fbshipit-source-id: 71b0761461557d871c26eb02f665a2e4d9b1d9fb
Eli Uriegas [Wed, 1 Sep 2021 00:38:42 +0000 (17:38 -0700)]
.github: Consolidate linux setup / teardown (#64229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64229
Consolidates linux setup / teardown into easy to use jinja2 macros
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: zhouzhuojie, driazati
Differential Revision: D30683810
Pulled By: seemethere
fbshipit-source-id: 2578630df3e212fb79392a699090553baef44cc2
Nikita Shulga [Wed, 1 Sep 2021 00:33:11 +0000 (17:33 -0700)]
Add ciflow-tracking issue to pytorch-probot (#64125)
Summary:
Doesn't do anything yet...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64125
Reviewed By: zhouzhuojie
Differential Revision: D30620283
Pulled By: malfet
fbshipit-source-id: 91869d35c1b70a55e32261d2c32fb0136ec33960
Mikhail Zolotukhin [Wed, 1 Sep 2021 00:32:00 +0000 (17:32 -0700)]
[TensorExpr] Move declaration of buildErrorMessage to exception.h (#64301)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64301
Test Plan: Imported from OSS
Reviewed By: navahgar, huiguoo
Differential Revision: D30678215
Pulled By: ZolotukhinM
fbshipit-source-id: 599c83b3890450a0fb6526815f037eec9563661c
Jay Leverett [Wed, 1 Sep 2021 00:28:42 +0000 (17:28 -0700)]
Fix redundant class definition in GraphModule singleton constructor (#64274)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64274
Reviewed By: jamesr66a
Differential Revision: D30675970
Pulled By: jayleverett
fbshipit-source-id: e74ef2a28013f0fa7c58d14f38e66cfe48d26b74
Nikita Shulga [Wed, 1 Sep 2021 00:19:11 +0000 (17:19 -0700)]
Discover new tests in run_tests.py (#64246)
Summary:
Introduce a `discover_tests` function that globs for all Python files starting with `test_` in the test folder, excluding subfolders that are executed differently.
Fixes https://github.com/pytorch/pytorch/issues/64178
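A minimal sketch of the described globbing, assuming pathlib; the real helper in run_tests.py handles more exclusions:
```python
from pathlib import Path

def discover_tests(test_dir="test", excluded_subdirs=("distributed",)):
    # excluded names here are illustrative
    return sorted(
        str(p.relative_to(test_dir).with_suffix(""))
        for p in Path(test_dir).rglob("test_*.py")
        if not any(part in excluded_subdirs for part in p.parts)
    )
```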
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64246
Reviewed By: walterddr, seemethere
Differential Revision: D30661652
Pulled By: malfet
fbshipit-source-id: a52e78ec717b6846add267579dd8d9ae75326bf9
Richard Zou [Tue, 31 Aug 2021 21:53:01 +0000 (14:53 -0700)]
Revert D30543236: Add python mode
Test Plan: revert-hammer
Differential Revision: D30543236 (https://github.com/pytorch/pytorch/commit/4bd03b02424d93b72f15e28c542ede13f88ea929)
Original commit changeset: ef5444d96a5a
fbshipit-source-id: b0042ac2c22765fa11d6d00bf751f6a4489eb6d8
Kevin Tse [Tue, 31 Aug 2021 20:55:59 +0000 (13:55 -0700)]
[DataPipe] export fork, mux, demux for public usage (#64279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64279
cc VitalyFedyunin ejguan
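A short usage sketch of the newly exported datapipes, assuming the functional forms registered on IterDataPipe:
```python
from torch.utils.data.datapipes.iter import IterableWrapper

child1, child2 = IterableWrapper(range(10)).fork(num_instances=2)
evens, odds = IterableWrapper(range(10)).demux(
    num_instances=2, classifier_fn=lambda x: x % 2)  # route by returned index
merged = child1.mux(child2)  # yields one element from each input in turn
```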
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30671971
Pulled By: NivekT
fbshipit-source-id: 056ac12ef7183b254d1eec341145594639e47ef6
Kevin Tse [Tue, 31 Aug 2021 20:55:59 +0000 (13:55 -0700)]
[DataPipe] adding description, __len__, tests for mux() (#64224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64224
cc VitalyFedyunin ejguan
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30651551
Pulled By: NivekT
fbshipit-source-id: f8af98ba71a592900b992a8077432062ec57bb48
zhouzhuojie [Tue, 31 Aug 2021 20:48:28 +0000 (13:48 -0700)]
Try the forked checkout action with retry (#64120)
Summary:
The main difference is: https://github.com/zhouzhuojie/checkout/commit/ffc6f93ad4b6e3cdcdd1a34e8c896765002f9b34
Can test multiple times in this PR to see if it works, will make the `retry` number configurable if it's usable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64120
Reviewed By: malfet
Differential Revision: D30656099
Pulled By: zhouzhuojie
fbshipit-source-id: a89932196bb0c44e412a34664ed6a061b02ef92e
Rishi Puri [Tue, 31 Aug 2021 20:47:29 +0000 (13:47 -0700)]
fix syntax error in bfloat16 PR (#64122)
Summary:
Fixes a prior syntax error from an earlier PR. cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64122
Reviewed By: H-Huang
Differential Revision: D30643596
Pulled By: ngimel
fbshipit-source-id: 0a2d5a40fb6dc7339cd03112e57ef0e1bf8a000e
Michael Carilli [Tue, 31 Aug 2021 20:29:39 +0000 (13:29 -0700)]
[CUDA graphs] Prototype API and documentation (#63269)
Summary:
RFC: https://github.com/pytorch/pytorch/issues/61880
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63269
Reviewed By: mruberry
Differential Revision: D30596643
Pulled By: ngimel
fbshipit-source-id: b1f8061406364b667e2c2d4d30fbce1f0d8456be
Rohan Varma [Tue, 31 Aug 2021 19:51:20 +0000 (12:51 -0700)]
Remove ref to test_distributed_fork (#64197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64197
Removes this line, as the test is gone.
ghstack-source-id: 136986275
Test Plan: CI
Reviewed By: walterddr
Differential Revision: D30642929
fbshipit-source-id: a0c7dfdfb35a4a7f7ec1b881dbea53d85136012c
Eli Uriegas [Tue, 31 Aug 2021 19:50:11 +0000 (12:50 -0700)]
.circleci: Remove migrated jobs, move docs builds (#64222)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64222
Removes both backwards_compat as well as docs_test from the general
gcc5.4 config and moves the docs build from being run on every PR to
only being run on master.
We can remove docs builds when we migrate the docs push job (including
all secrets associated with that)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet walterddr lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D30650953
Pulled By: seemethere
fbshipit-source-id: ac11da6a551a6c81f3dc1d47fd81846cbfe9975a
Raghuraman Krishnamoorthi [Tue, 31 Aug 2021 19:22:13 +0000 (12:22 -0700)]
[ao][docs] Clarify operator support for quantization (#63270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63270
Add table to quantization main page showing supported modules
for static and dynamic quantization.
ghstack-source-id: 137087204
Test Plan: Imported from OSS
Reviewed By: HDCharles
Differential Revision: D30658654
fbshipit-source-id: a82c998e1db6370596d5b0ca4c7cc96c1c90f30e
Vasiliy Kuznetsov [Tue, 31 Aug 2021 19:09:59 +0000 (12:09 -0700)]
ns for fx: make layer types more readable (#64270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64270
Before this PR, layer types were populated by doing
`str(module_instance)` and `str(function)`. This resulted
in moderately readable strings for modules, and poorly readable
strings for functions.
This PR switches the logic to use `torch.typename` utility instead.
The results are significantly more readable.
Example function type:
```
# before
'<built-in method linear of PyCapsule object at 0x7fe9b20ce7b0>'
# after
'torch._ops.quantized.PyCapsule.linear'
```
Example module type:
```
# before
"<class 'torch.nn.quantized.modules.conv.Conv2d'>"
# after
'torch.nn.quantized.modules.conv.Conv2d'
```
Test Plan:
Manually inspect NS results for modules and functions, verify they are more readable.
Imported from OSS
Differential Revision: D30669545
Reviewed By: jerryzh168
Pulled By: vkuzo
fbshipit-source-id: 60959e5cafa0a4992b083bf99f5d8260f9acdac0
Shiyan Deng [Tue, 31 Aug 2021 18:29:07 +0000 (11:29 -0700)]
[fx2trt] Add acc_ops.sign and converter for it (#63876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63876
Add `acc_ops.sign` which maps from `torch.sign`.
Add a plugin (dynamic shape not currently supported) for `acc_ops.sign`. The plugin calls `at::sign` directly.
Test Plan: buck test mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 caffe2/torch/fb/fx2trt:test_unary_ops
Reviewed By: yinghai
Differential Revision: D30518081
fbshipit-source-id: a0b9e6c30deac0b04b8cb09a162579e229985330
Saketh Are [Tue, 31 Aug 2021 17:59:57 +0000 (10:59 -0700)]
Use stacklevel for floordiv deprecation warnings (#64034)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60548
`Tensor.__floordiv__` was indirectly deprecated by deprecation of `torch.floor_divide` (see https://github.com/pytorch/pytorch/issues/43874). Deprecating it directly provides clearer feedback.
Repro:
```
import torch
x = torch.tensor(0)
x // 1
```
Before this change, a deprecation warning was triggered within the C++ implementation of floor_divide:
```
UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:571.)
return torch.floor_divide(self, other)
```
After this change, the warning instead cites the user's offending line of Python code:
```
UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
x // 1
```
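The mechanism is the `stacklevel` argument to `warnings.warn`, which attributes the warning to a frame further up the call stack; a minimal sketch (the exact level used in the PR may differ):
```python
import warnings
import torch

def __floordiv__(self, other):
    warnings.warn(
        "__floordiv__ is deprecated ...",
        UserWarning,
        stacklevel=3,  # point at the user's `x // 1`, not this wrapper
    )
    return torch.floor_divide(self, other)
```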
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64034
Reviewed By: mruberry
Differential Revision: D30658010
Pulled By: saketh-are
fbshipit-source-id: b0e6c5008d741897509d102f4a89efb47de4aa2a
Raghuraman Krishnamoorthi [Tue, 31 Aug 2021 16:45:28 +0000 (09:45 -0700)]
[ao][docs] Add description of qconfig and qengine to quantization page (#63582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63582
Current quantization docs do not define qconfig and qengine. Added text to define these concepts before they are used.
ghstack-source-id: 137051719
Test Plan: Imported from OSS
Reviewed By: HDCharles
Differential Revision: D30658656
fbshipit-source-id: a45a0fcdf685ca1c3f5c3506337246a430f8f506
Kushashwa Ravi Shrimali [Tue, 31 Aug 2021 16:45:09 +0000 (09:45 -0700)]
Add OpInfo for `nn.functional.cosine_similarity` (#62959)
Summary:
Please see https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.
Notes:
* Some redundant tests from `test_nn.py` have been removed. I'm unsure whether the precision checks can be removed as well.
* Broadcasting is also checked in the OpInfo for `cosine_similarity`.
cc: mruberry zou3519 Chillee
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62959
Reviewed By: heitorschueroff
Differential Revision: D30520176
Pulled By: zou3519
fbshipit-source-id: 14e902eb4bcce875edab28a1669a2ea021052b9b
Kevin Tse [Tue, 31 Aug 2021 15:07:23 +0000 (08:07 -0700)]
[DataPipe] implementing __len__ for fork (no valid length for demux) (#64215)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64215
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30648672
Pulled By: NivekT
fbshipit-source-id: 4780f2f6a79ae15a4009092475e7d92f96dd09a2
Kevin Tse [Tue, 31 Aug 2021 15:07:23 +0000 (08:07 -0700)]
[DataPipe] implementing demux() (#63650)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63650
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30493944
Pulled By: NivekT
fbshipit-source-id: 0aa06dee8c7fb1744975b8f6a0694b90c11ef80d
Kevin Tse [Tue, 31 Aug 2021 15:07:23 +0000 (08:07 -0700)]
[DataPipe] implementing fork() (#63649)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63649
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30493945
Pulled By: NivekT
fbshipit-source-id: 40db7d4134facd266d86bc0dc2edf2729c4e5842
Kimish Patel [Tue, 31 Aug 2021 14:36:53 +0000 (07:36 -0700)]
Revert D30327514: [Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling.
Test Plan: revert-hammer
Differential Revision: D30327514 (https://github.com/pytorch/pytorch/commit/bc9277dca3a40d99147d4a1a3e0160a4a8e91f9f)
Original commit changeset: 3bb2f2daaaed
fbshipit-source-id: 0b2aa7c57d08de77c9aaa75e546a7d0938610f64
Harut Movsisyan [Tue, 31 Aug 2021 07:49:39 +0000 (00:49 -0700)]
[Static Runtime] Implement aten::nonzero out variant (#64126)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64126
Test Plan:
Confirm out variant is called:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: mikeiovine
Differential Revision: D30617729
fbshipit-source-id: 752749638c8f467815efa57021cb3de5c728ab1b
Facebook Community Bot [Tue, 31 Aug 2021 04:31:11 +0000 (21:31 -0700)]
Automated submodule update: FBGEMM (#64213)
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: https://github.com/pytorch/FBGEMM/commit/9d69998df6236d6714aa37ae6142a2a2d4fb2bf6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64213
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D30647878
fbshipit-source-id: b903b39441b4e28dda7eab226ac874e2227e750a
Kimish Patel [Tue, 31 Aug 2021 03:53:50 +0000 (20:53 -0700)]
[Pytorch lite predictor] Use KinetoEdgeCPUProfiler for operator profiling. (#63367)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63367
This diff changes the way operator profiling is done in lite predictor
benchmarking binary.
Instead of using custom callbacks it uses KinetoEdgeCPUProfiler to profile
events and then generate operator level metric from it.
Since KinetoEvents do not contain cpu clock time, now we report only wallclock
time.
This unifies the various profiling efforts that we have for benchmarking purposes. In production we will still use the observer-based mechanism, but the advantage of using the kineto profiler is that we get a few other things for free, such as:
- chrome trace generation.
- operator-level memory profiling (to be added)
- flop counts (to be added)
Furthermore, we can possibly use a Python post-processing script to parse the chrome trace and generate output similar to torch.profiler. (To be done)
Test Plan:
aibench run
Model without debug info:
https://www.internalfb.com/intern/aibench/details/219598441154763
Model with debug info and `--print_module_info true` (the Operator summary now has module hierarchy information):
https://www.internalfb.com/intern/aibench/details/617154236292985
Reviewed By: raziel
Differential Revision: D30327514
fbshipit-source-id: 3bb2f2daaaedfb04bd6f5d9c91292783f9c4344f
Peter Bell [Tue, 31 Aug 2021 03:17:12 +0000 (20:17 -0700)]
Compile BatchLinearAlgebra without nvcc (#64146)
Summary:
These files only use CUDA library interfaces, so they don't actually need to be compiled with nvcc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64146
Reviewed By: ezyang
Differential Revision: D30633189
Pulled By: ngimel
fbshipit-source-id: c9d0ae5259a10cb49332d31f0da89ad758736ea8
Bert Maher [Tue, 31 Aug 2021 03:08:15 +0000 (20:08 -0700)]
[nnc] Enable fusion of bfloat16 ops (#64196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64196
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30643864
Pulled By: bertmaher
fbshipit-source-id: e95edeaf7089464d713ea1d1f951743d3e5f61c5
James Reed [Tue, 31 Aug 2021 02:54:50 +0000 (19:54 -0700)]
[WIP][FX] BC guarantees for 1.10 (#63888)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63888
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D30523133
Pulled By: jamesr66a
fbshipit-source-id: b04cc0d842a74862f42ecba98b757310cd2ec7b0
leslie-fang-intel [Tue, 31 Aug 2021 02:28:59 +0000 (19:28 -0700)]
add operation list for AutocastCPU (#63534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63534
In this PR:
* We have changed the default dtype of `AutocastCPU` from `float16` to `bfloat16` as discussed here `https://github.com/pytorch/pytorch/pull/61002`
* We also update the operation list which needs casting to `lower_precision_fp` or `float32`.
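A usage sketch under the new default (inputs here are illustrative):
```python
import torch

a = torch.randn(4, 4)  # float32 inputs
b = torch.randn(4, 4)
with torch.cpu.amp.autocast():  # dtype now defaults to torch.bfloat16
    out = torch.mm(a, b)        # matmul-style ops run in lower_precision_fp
print(out.dtype)                # torch.bfloat16 under the new default
```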
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D30644914
Pulled By: ezyang
fbshipit-source-id: 8b93485ba452b3759611e3f0ac88e920fe495ac1
oleshp [Tue, 31 Aug 2021 02:22:05 +0000 (19:22 -0700)]
Update contribution_guide.rst (#64142)
Summary:
Grammatical update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64142
Reviewed By: mruberry
Differential Revision: D30639394
Pulled By: ezyang
fbshipit-source-id: cf1a4dfbd8e34b0772f1b09f5d820278e8ef8574
Santiago Castro [Tue, 31 Aug 2021 02:17:21 +0000 (19:17 -0700)]
Avoid an unnecessary list creation in `DataChunk` (#64111)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64111
Reviewed By: mruberry
Differential Revision: D30639383
Pulled By: ezyang
fbshipit-source-id: 96b243307413c99a67d55d862a71937e1ef210f4
Samantha Andow [Tue, 31 Aug 2021 02:15:16 +0000 (19:15 -0700)]
Add optional tensor arguments to (#63967)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63435
Adds optional tensor arguments to the torch function handling checks. The only function in the functional file I didn't do this for was `multi_head_attention_forward`, since it already took care of some optional tensor arguments but not others, so it seemed like the arguments were specifically chosen.
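A minimal sketch of the pattern using torch.overrides; `my_op` and its parameters are illustrative, not the functions touched by this PR:
```python
import torch
from torch.overrides import has_torch_function, handle_torch_function

def my_op(input, weight, bias=None):
    # include the optional tensor so a Tensor-like `bias` also triggers
    # __torch_function__ dispatch
    tens_ops = (input, weight) if bias is None else (input, weight, bias)
    if has_torch_function(tens_ops):
        return handle_torch_function(my_op, tens_ops, input, weight, bias=bias)
    return input @ weight.t() + (0 if bias is None else bias)
```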
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63967
Reviewed By: albanD
Differential Revision: D30640441
Pulled By: ezyang
fbshipit-source-id: 5ef9554d2fb6c14779f8f45542ab435fb49e5d0f
CaoE [Tue, 31 Aug 2021 02:12:23 +0000 (19:12 -0700)]
add BFloat16 support for fold and unfold on CPU (#62880)
Summary:
Add BFloat16 support for fold and unfold operators on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62880
Reviewed By: iramazanli
Differential Revision: D30576387
Pulled By: zou3519
fbshipit-source-id: c48f6e56702bfea34448db1b3a1634c49c5d8ec8
Edward Yang [Tue, 31 Aug 2021 02:08:45 +0000 (19:08 -0700)]
Add acc_gpu_kernel_with_scalars and port add to use it (#63884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63884
See https://dev-discuss.pytorch.org/t/cuda-loops-case-study-code-generation-vs-templates/302
for explanation of what's going on here.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30545296
Pulled By: ezyang
fbshipit-source-id: f0da52153ae63599fe1d57e90e73f50ca2116939
Erjia Guan [Tue, 31 Aug 2021 01:41:08 +0000 (18:41 -0700)]
Modify inline doc for DataPipe (#64221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64221
List of tasks in this PR
- [x] Add inline doc for DataPipe
- [x] Improve the inline doc
- [x] Expose DataPipe to `datapipes.iter` (`UnBatcher`). Note: `Forker`, `Demux`, and `Mux` are exposed in another PR authored by Kevin
- [x] Add correct typing to DataPipe
- [x] Unify the argument name to `datapipe` rather than `source_datapipe`
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D30650541
Pulled By: ejguan
fbshipit-source-id: c09d1b9742b8097d8e645c15947cef80c876877b
Erjia Guan [Tue, 31 Aug 2021 01:41:08 +0000 (18:41 -0700)]
Replace group_by_key by group_by IterDataPipe (#64220)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64220
Remove `ByKeyGrouperIterDataPipe` due to duplicated functionality.
Fix a bug in `GrouperIterDataPipe` using the existing test.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D30650542
Pulled By: ejguan
fbshipit-source-id: 666b4d28282fb4f49f3ff101b8d08be16a50d836
Richard Zou [Tue, 31 Aug 2021 01:39:50 +0000 (18:39 -0700)]
Add python mode (#63496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63496
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
z = torch.empty([])
assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to
overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: malfet, albanD
Differential Revision: D30543236
Pulled By: zou3519
fbshipit-source-id: ef5444d96a5a957d1657b7e37dce80f9a497d452
Bert Maher [Tue, 31 Aug 2021 01:36:33 +0000 (18:36 -0700)]
[nnc] Fix half2float conversion and re-enable float16 (#64199)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64199
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D30643865
Pulled By: bertmaher
fbshipit-source-id: 9de6adca53bd08839328cbaf6364f7de9550264b
Harut Movsisyan [Mon, 30 Aug 2021 23:16:45 +0000 (16:16 -0700)]
[Static Runtime] Implement aten::cumsum out variant (#64159)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64159
Test Plan:
Confirm out variant is called for both versions:
```
> buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --v=1
```
Reviewed By: mikeiovine
Differential Revision: D30622819
fbshipit-source-id: a2c8c7f969dae5f507718fb3d513e1fb4f026736
Richard Zou [Mon, 30 Aug 2021 22:58:50 +0000 (15:58 -0700)]
OpInfo for nn.functional.interpolate (#61956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61956
Each mode goes through a different implementation so they are listed as
different variants.
Test Plan: - run tests
Reviewed By: malfet
Differential Revision: D30013751
Pulled By: zou3519
fbshipit-source-id: 4253b40b55667d7486ef2d98b441c13d807ab292
Thomas J. Fan [Mon, 30 Aug 2021 22:03:40 +0000 (15:03 -0700)]
BUG Fixes regression for nllloss gradcheck (#64203)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64163
This PR includes the fix and the opinfo from https://github.com/pytorch/pytorch/pull/63854/ for non-regression testing.
cc albanD mruberry jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64203
Reviewed By: albanD
Differential Revision: D30647522
Pulled By: jbschlosser
fbshipit-source-id: 2974d299763505908fa93532aca2bd5d5b71f2e9
Ivan Yashchuk [Mon, 30 Aug 2021 22:03:15 +0000 (15:03 -0700)]
Enable Half, BFloat16, and Complex dtypes for coo-coo sparse matmul [CUDA] (#59980)
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D29699456
Pulled By: cpuhrsch
fbshipit-source-id: 407ae53392acb2f92396a62a57cbaeb0fe6e950b
Alban Desmaison [Mon, 30 Aug 2021 21:56:35 +0000 (14:56 -0700)]
Revert D30561459: Fix bytes_written and bytes_read
Test Plan: revert-hammer
Differential Revision: D30561459 (https://github.com/pytorch/pytorch/commit/e98173ff3423247c597e21c923c8f47470ef07ab)
Original commit changeset: 976fa5167097
fbshipit-source-id: 43f4c234ca400820fe6db5b4f37a25e14dc4b0dd
Alban Desmaison [Mon, 30 Aug 2021 21:46:50 +0000 (14:46 -0700)]
Back out "Added reference tests to ReductionOpInfo" (#64183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64183
Original commit changeset: 6a1f82ac2819
Test Plan: CI
Reviewed By: soulitzer
Differential Revision: D30639835
fbshipit-source-id: e238043c6fbd0453317a9ed219e348298f98aaca
Jerry Zhang [Mon, 30 Aug 2021 21:21:39 +0000 (14:21 -0700)]
[quant][graphmode][fx] Add reference quantized conv module (#63828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63828
Added reference quantized conv module for the custom backend flow, the reference quantized module will
have the following code:
```
w(float) -- quant - dequant \
x(float) ------------- F.conv2d ---
```
In the full model, we will see
```
w(float) -- quant - *dequant \
x -- quant --- *dequant -- *F.conv2d --- *quant - dequant
```
and the backend should be able to fuse the ops with `*` into a quantized conv
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_conv_linear_reference
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D30504749
fbshipit-source-id: e1d8c43a0e0d6d9ea2375b8ca59a9c0f455514fb
Daya Khudia [Mon, 30 Aug 2021 20:58:47 +0000 (13:58 -0700)]
Back out "[JIT] Add aten::slice optimization"
Summary:
Original commit changeset: d12ee39f6828
build-break
overriding_review_checks_triggers_an_audit_and_retroactive_review
Oncall Short Name: dskhudia
Test Plan: Local run succeeds
Differential Revision: D30633990
fbshipit-source-id: 91cf7cc0ad7e47d919347c2a1527688e062e0c62