Tao Xu [Fri, 17 Sep 2021 16:16:39 +0000 (09:16 -0700)]
[CoreML][iOS/MacOS] Add the CoreML executor (#64522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64522
The `PTMCoreMLExecutor` serves as a bridge between the delegate APIs and Core ML runtime.
ghstack-source-id: 138324217
allow-large-files
Test Plan:
iOS:
Run the CoreML tests in the playground app
MacOS:
```
buck test pp-macos
PASS 633ms 1 Passed 0 Skipped 0 Failed CoreMLTests
```
{F657776101}
Reviewed By: raziel, iseeyuan
Differential Revision: D30594042
fbshipit-source-id: a42a5307a24c2f364333829f3a84f7b9a51e1b3e
Elias Ellison [Fri, 17 Sep 2021 15:32:05 +0000 (08:32 -0700)]
Allow extra unused arguments in symbolic shape function (#65095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65095
The reason I didn't do this initially was that I was worried matching one schema to another schema with an extra argument might change semantics, e.g. Add(Tensor, Tensor) vs. Add(Tensor, Tensor, Tensor) might behave differently. However, we don't actually need to worry about this, because the graph schema isn't used for node matching, unlike in symbolic_script.cpp.
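A minimal sketch of the matching rule described above (hypothetical names, not the actual TorchScript matcher): a shape function may declare more parameters than the call site provides, as long as the extras are never used.

```python
def signature_accepts(call_arg_count: int, fn_param_count: int,
                      allow_extra_unused: bool = True) -> bool:
    """Return True if a function with fn_param_count parameters can serve a
    call site passing call_arg_count arguments. With allow_extra_unused, the
    function may declare extra trailing parameters that are simply never bound."""
    if call_arg_count == fn_param_count:
        return True
    return allow_extra_unused and fn_param_count > call_arg_count
```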
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30972081
Pulled By: eellison
fbshipit-source-id: d4089e8feafc330df2ca158866fe779a7da0b073
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Actually deprecate __torch_function__ as plain methods (#64843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64843
Fix for https://github.com/pytorch/pytorch/issues/63767
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991425
Pulled By: albanD
fbshipit-source-id: 1214143b8aea87e6ff406c7fc13096bd15d1a768
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Update fx proxy to use classmethod for __torch_function__ (#64842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64842
Change the `__torch_function__` to follow best guidelines of using classmethods.
I am not sure how to handle the case where multiple tracer objects are given as input, but the previous implementation already picked an arbitrary tracer based on the "self" that was arbitrarily chosen by the torch_function caller, so the new implementation is no worse.
Let me know what you think!
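A toy model of the classmethod-based `__torch_function__` pattern this change moves to (a stand-in dispatcher with hypothetical names, not torch's real one): dispatch keys off the argument *class*, never off a particular instance.

```python
def add(x, y):
    return x + y

class Proxy:
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        # Classmethod form: dispatch depends only on the class, not on which
        # instance happened to be picked as "self" among the arguments.
        return ("intercepted", func.__name__)

def dispatch(func, args):
    # Mini stand-in for torch's dispatch: find argument types that define
    # __torch_function__ and invoke the classmethod on the first such type.
    overloaded = [type(a) for a in args if hasattr(type(a), "__torch_function__")]
    if overloaded:
        return overloaded[0].__torch_function__(func, tuple(overloaded), args, {})
    return func(*args)
```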
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991423
Pulled By: albanD
fbshipit-source-id: d28940df230b543952b278a0eb2d61cf7ae123ce
albanD [Fri, 17 Sep 2021 15:01:33 +0000 (08:01 -0700)]
Use classmethods for overrides (#64841)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64841
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30991424
Pulled By: albanD
fbshipit-source-id: 551e2119768f3a4292713f3bfa83930f5506adbd
Howard Huang [Fri, 17 Sep 2021 14:55:01 +0000 (07:55 -0700)]
Fix port allocation race condition for elastic test (#65149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65149
Fixes #64789
There is a race condition between when the free port is acquired and when it is used to create the store: by the time the store is created, the port may already have been taken. Since this test only checks that a timeout is triggered for TCPStore, we can bind to any port on TCPStore creation.
This only affects the test on the server (since that is where the port is used), but I changed both tests for clarity
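The race-free alternative mentioned above can be sketched in plain Python sockets (an illustration of the general pattern, not the TCPStore code itself): bind to port 0 and let the OS assign a free port atomically, rather than probing for a free port first and binding later.

```python
import socket

def create_listener_on_free_port():
    """Bind to port 0 so the kernel picks an unused port atomically,
    avoiding the acquire-then-bind race described above."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))  # port 0: OS assigns a free port at bind time
    s.listen(1)
    return s, s.getsockname()[1]
```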
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D30993166
Pulled By: H-Huang
fbshipit-source-id: eac4f28d641ac87c4ebee89df83f90955144f2f1
Stephen Jia [Fri, 17 Sep 2021 14:48:04 +0000 (07:48 -0700)]
Small improvements to compare_models_torch binary (#65171)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65171
Add the model comparison binary to BUCK, and also add some quality of life features such as controlling the input range.
Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-ou
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models
# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```
Reviewed By: beback4u
Differential Revision: D30371322
fbshipit-source-id: 5e520aaf119c90985a1d5a135f76e4057148333b
Edward Yang [Fri, 17 Sep 2021 14:40:59 +0000 (07:40 -0700)]
Disable autograd fallback tests on Windows (#65147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65147
I think they trigger an MSVC bug per https://github.com/pytorch/pytorch/issues/48763
ghstack-source-id: 138247203
Test Plan: breakpointed https://www.internalfb.com/intern/sandcastle/job/9007199738584981/ and ssh'ed into the host and ran `buck build arvr/mode/win/opt //xplat/caffe2:autograd_libtorch_test_ovrsource` in `/cygdrive/d/ovrsource-null-hg`
Reviewed By: soulitzer
Differential Revision: D30992685
fbshipit-source-id: 06c6fb2c18d55490f89fc91ee5b7a4c5a7faf1c6
Michael Dagitses [Fri, 17 Sep 2021 14:32:32 +0000 (07:32 -0700)]
implement "xy" indexing for torch.meshgrid (#62724)
Summary:
This is step 4/7 of https://github.com/pytorch/pytorch/issues/50276. This allows the use of `"xy"` indexing but doesn't change any defaults.
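The difference between the two indexing conventions can be illustrated with a toy pure-Python 2-D meshgrid (a sketch of the semantics, not torch's implementation): `"ij"` produces grids of shape `(len(xs), len(ys))`, while `"xy"` produces the transposed `(len(ys), len(xs))` layout.

```python
def meshgrid2d(xs, ys, indexing="ij"):
    """Toy 2-D meshgrid: 'ij' is matrix indexing, 'xy' is Cartesian
    indexing (the transpose of 'ij' in the 2-D case)."""
    X = [[x for _ in ys] for x in xs]  # ij layout: rows vary with xs
    Y = [[y for y in ys] for _ in xs]
    if indexing == "xy":
        X = [list(r) for r in zip(*X)]  # transpose both grids
        Y = [list(r) for r in zip(*Y)]
    return X, Y
```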
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62724
Reviewed By: heitorschueroff
Differential Revision: D30995290
Pulled By: dagitses
fbshipit-source-id: 08a6a6144b20bc019f68bc3c52e3bbf967976d8f
Alban Desmaison [Fri, 17 Sep 2021 13:28:41 +0000 (06:28 -0700)]
Allow parametrization to be nested (#65167)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65163
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65167
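The nesting behavior being enabled can be modeled with a tiny stand-in (hypothetical class, not `torch.nn.utils.parametrize`): each registered parametrization wraps the previous chain, so the effective weight is the composition of all of them.

```python
class Parametrized:
    """Toy model of nested parametrizations: the effective weight is
    f_n(...f_1(raw)...) over all registered parametrizations, in order."""
    def __init__(self, raw):
        self.raw = raw
        self.parametrizations = []

    def register(self, fn):
        # Registering on an already-parametrized value nests: the new
        # parametrization applies on top of the existing chain.
        self.parametrizations.append(fn)

    @property
    def weight(self):
        value = self.raw
        for fn in self.parametrizations:
            value = fn(value)
        return value
```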
Reviewed By: jbschlosser
Differential Revision: D31002318
Pulled By: albanD
fbshipit-source-id: b1f1c6c9efa9e83af9789ed13efc133f777f418e
Nicolas Hug [Fri, 17 Sep 2021 10:27:23 +0000 (03:27 -0700)]
Pass GITHUB_TOKEN to linux CI jobs and avoid skipping torchhub tests (#64807)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64760
This should hopefully put the torchhub tests back.
This also avoids skipping the torchhub tests: currently the tests are skipped if they fail, which pretty much defeats the purpose of having a test in the first place since we're never notified when they do fail.
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra nairbv NicolasHug
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64807
Reviewed By: seemethere
Differential Revision: D30994585
Pulled By: NicolasHug
fbshipit-source-id: 561782c22462b5cfec99cca153eb59623db5660a
Tao Xu [Fri, 17 Sep 2021 07:19:36 +0000 (00:19 -0700)]
[CoreML][fbcode] Add the `preprocess` python APIs (#64521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64521
Add the preprocess part for the coreml delegate. Check out the `example.py` for the usage.
ghstack-source-id: 138324214
Test Plan:
```
(base) [taox@devvm2780.vll0 ~/fbsource/fbcode/caffe2/fb] buck run coreml:example -- --model="/home/taox/mobilenetv2/mobilenetv2.pt" --out="/home/taox/mobilenetv2/mobilenetv2_coreml.pt"
Parsing buck files: finished in 0.5 sec
Downloaded 0/1 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 10.6 sec (100%) 12611/57623 jobs, 1/57623 updated
Total time: 11.1 sec
Converting Frontend ==> MIL Ops: 100%|██████████████████████████████████████████▉| 382/383 [00:00<00:00, 692.58 ops/s]
Running MIL optimization passes: 100%|███████████████████████████████████████████| 18/18 [00:00<00:00, 45.55 passes/s]
Translating MIL ==> MLModel Ops: 100%|███████████████████████████████████████████| 704/704 [00:01<00:00, 468.56 ops/s]
input {
name: "input_0"
type {
multiArrayType {
shape: 1
shape: 3
shape: 224
shape: 224
dataType: FLOAT32
}
}
}
output {
name: "645"
type {
multiArrayType {
dataType: FLOAT32
}
}
}
metadata {
userDefined {
key: "com.github.apple.coremltools.source"
value: "torch==1.10.0a0+fb"
}
userDefined {
key: "com.github.apple.coremltools.version"
value: "4.1"
}
}
{'inputs': '[["input_0", "0", "[1, 3, 224, 224]"]]', 'outputs': '[["645", "0", "[1, 1000]"]]', 'config': '{"spec_ver": "4", "backend": "cpu", "allow_low_precision": "True"}', 'metadata': '{"coremltool_ver": "4.1", "torch_ver": "torch==1.10.0a0+fb"}'}
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0826 13:27:12.690302 2477051 backend_detail.cpp:376] Warning: Backend [coreml] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
graph(%self.1 : torch.jit.LoweredModule.coreml.__torch__.torchvision.models.mobilenetv2.MobileNetV2,
%x.1 : Tensor):
%51 : str = prim::Constant[value="Exception: Backend is not available."]()
%50 : str = prim::Constant[value="AssertionError: "]()
%14 : str = prim::Constant[value="forward"]() # <string>:5:62
%48 : Tensor = prim::Uninitialized()
%44 : Tensor = prim::Uninitialized()
%typed_inputs.1 : Any[] = prim::ListConstruct(%x.1)
%__backend.3 : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
%8 : bool = prim::CallMethod[name="is_available"](%__backend.3) # <string>:4:19
%49 : Tensor = prim::If(%8) # <string>:4:16
block0():
%__backend : __torch__.torch.classes.__backends__.coreml = prim::GetAttr[name="__backend"](%self.1)
%__handles : Dict(str, Any) = prim::GetAttr[name="__handles"](%self.1)
%15 : Any = aten::__getitem__(%__handles, %14) # <string>:5:47
%17 : Any[] = prim::CallMethod[name="execute"](%__backend, %15, %typed_inputs.1) # <string>:5:24
%18 : Any = prim::ListUnpack(%17)
%20 : bool = prim::isinstance[types=[Tensor]](%18)
%39 : Tensor = prim::If(%20) # <string>:6:18
block0():
%22 : Tensor = prim::unchecked_cast(%18)
-> (%22)
block1():
= prim::RaiseException(%50) # <string>:6:18
-> (%44)
-> (%39)
block1():
= prim::RaiseException(%51) # <string>:9:18
-> (%48)
return (%49)
```
Reviewed By: raziel
Differential Revision: D30585154
fbshipit-source-id: 66c7d2e931be6eaa3c43a0ee131ea8046452449d
Don Jang [Fri, 17 Sep 2021 05:32:23 +0000 (22:32 -0700)]
[Static Runtime] Introduce static_runtime::dict_unpack (#64771)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64771
Test Plan:
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithImmutableInputDict`
- Added `StaticRuntime.RemoveImmutableInputDictLookupsWithMutableInputDict`
- TBD: Perf impact measurement
Reviewed By: mikeiovine
Differential Revision: D30685083
fbshipit-source-id: 050a92ef3b3ed0fdc0ab7a13a4b5dbfede9342a9
BowenBao [Fri, 17 Sep 2021 04:39:10 +0000 (21:39 -0700)]
[ONNX] Update submodule to 1.10.1 (#63716) (#64576)
Summary:
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **https://github.com/pytorch/pytorch/issues/64576 [ONNX] Update submodule to 1.10.1 (https://github.com/pytorch/pytorch/issues/63716)**
* [ONNX] Update IR version to 7
* [ONNX] update submodule to 1.10.1
* Disable some tests in caffe2 that fail b/c caffe2 doesn't support the
new ops.
* Update Bazel file.
* Update expect files for new ONNX IR version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64576
Reviewed By: jansel
Differential Revision: D31006896
Pulled By: msaroufim
fbshipit-source-id: f3bf97709f23a5a2cd49c708e7363231f2c1961a
James Reed [Fri, 17 Sep 2021 03:31:03 +0000 (20:31 -0700)]
[FX] Add torch.ops.profiler._record_function_{enter,exit} as stateful ops for DCE (#65180)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65180
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31007115
Pulled By: jamesr66a
fbshipit-source-id: 823b15db712a382a4f2a4fd409983d47bc067150
Zafar Takhirov [Fri, 17 Sep 2021 03:29:05 +0000 (20:29 -0700)]
[quant] AO migration of the `torch/quantization/utils.py` (phase 1) (#64919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64919
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly. This migrates the quantization utilities.
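The backward-compatibility pattern described above (old call sites keep working while code moves to a new module path) can be sketched with stand-in modules; the package names below are illustrative, not the real torch packages.

```python
import sys
import types

# New location of the code.
new_mod = types.ModuleType("pkg.ao.quantization.utils")
new_mod.get_qparam_dict = lambda: {"scale": 1.0}
sys.modules["pkg.ao.quantization.utils"] = new_mod

# Old location becomes a thin shim that re-exports the new module's public
# names, so existing `import pkg.quantization.utils` call sites keep working.
old_mod = types.ModuleType("pkg.quantization.utils")
for name in dir(new_mod):
    if not name.startswith("_"):
        setattr(old_mod, name, getattr(new_mod, name))
sys.modules["pkg.quantization.utils"] = old_mod
```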
ghstack-source-id: 138303325
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: jerryzh168
Differential Revision: D30899082
fbshipit-source-id: 85eb38c419e417147e71758b682cd095308dd0c9
Jordan Fix [Fri, 17 Sep 2021 02:55:46 +0000 (19:55 -0700)]
[acc_utils] Add print_model_info (#65045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65045
This is a useful tool for printing out all of the ops that are found in a model after acc_tracer. It assumes the provided model has no `call_module` or `call_method` nodes, which is generally reasonable assuming the model has been successfully traced by the acc_tracer.
Test Plan:
Tested locally. Sample output:
```
Model Info:
> placeholder: 1184
> get_attr: 655
> output: 2
> torch.fx.experimental.fx_acc.acc_ops.add: 2
> torch.fx.experimental.fx_acc.acc_ops.cat: 23
> torch.fx.experimental.fx_acc.acc_ops.embedding_bag: 576
> torch.fx.experimental.fx_acc.acc_ops.layer_norm: 15
> torch.fx.experimental.fx_acc.acc_ops.linear: 27
> torch.fx.experimental.fx_acc.acc_ops.matmul: 3
> torch.fx.experimental.fx_acc.acc_ops.mul: 17
> torch.fx.experimental.fx_acc.acc_ops.permute: 2
> torch.fx.experimental.fx_acc.acc_ops.reshape: 419
> torch.fx.experimental.fx_acc.acc_ops.sigmoid: 16
> torch.fx.experimental.fx_acc.acc_ops.slice_tensor: 630
> torch.fx.experimental.fx_acc.acc_ops.sum: 4
> torch.fx.experimental.fx_acc.acc_ops.tanh: 315
```
Reviewed By: 842974287
Differential Revision: D30954829
fbshipit-source-id: 5c4f0770667b72859b74099d9f4575284fc48bd2
Yinghai Lu [Fri, 17 Sep 2021 02:26:36 +0000 (19:26 -0700)]
Add back the owning_module fix (#65159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65159
This was a legit fix originally introduced in D30905949 (https://github.com/pytorch/pytorch/commit/446d95a7f64cb464d28d27c4c87c48900a9fde79). But we hesitated and removed it for some reason. Putting it back.
Reviewed By: 842974287
Differential Revision: D30996277
fbshipit-source-id: 3f5eede11dba2072e7cd5ae6ca7ac81d55fb75fa
Rui Zhu [Fri, 17 Sep 2021 01:08:00 +0000 (18:08 -0700)]
Add dropout shape inference as no-op in acc_tracer (#65113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65113
Register dropout as no-op in acc_tracer & Add shape inference for no-op
Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- test_unary_15_dropout_no_op
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_dropout
Reviewed By: jfix71
Differential Revision: D30880679
fbshipit-source-id: 592fe50e17137c94c12727658191dedf08daf8cf
Nikita Shulga [Fri, 17 Sep 2021 00:36:14 +0000 (17:36 -0700)]
Pin SciPy to 1.6.2 on Windows (#65017)
Summary:
Re-enable previously disabled test_distributions
Note: conda does not have SciPy 1.6.3, only 1.6.2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65017
Reviewed By: seemethere
Differential Revision: D31003199
Pulled By: malfet
fbshipit-source-id: 96b9d2a833f703008bb1f4df9361db8ec6f8ccc6
Avery Wang [Thu, 16 Sep 2021 23:37:52 +0000 (16:37 -0700)]
Added logging for the Reducer's non-member functions. (#65023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65023
Added an optional logging parameter for non-member functions `compute_bucket_assignment_by_size` and `verify_replica0_across_processes`. If a logger is provided then `TORCH_CHECK` assertions are replaced with a wrapper that logs the error to the DDP reducer's logger before calling `TORCH_CHECK`. If a logger is not provided `TORCH_CHECK` is still called.
Modified python-side calls to `_compute_bucket_assignment_by_size` and `_verify_model_across_ranks` to include a logger whenever possible. A notable exception is when these non-member functions are called in DDP's constructor - we cannot pass in a logger as they may have not been initialized yet.
We also added 4 new tests: `test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger` which tests the `_compute_bucket_assignment_by_size` function to ensure that sparse tensors are rejected and the errors are logged. `test_verify_model_across_rank_{with, without}_logger` calls `_verify_model_across_ranks` to ensure that ill-formed models (different ranks have different number of parameters compared to rank 0) are rejected and the errors are logged. The test `test_ddp_model_diff_across_ranks` remains unchanged - while it does construct a ill-formed DDP instance which triggers the error in `_verify_model_across_ranks`, we cannot check the logger because this error appears in the constructor.
Lastly, did some cleanup of the `test_ddp_model_diff_across_ranks` function to make the logic of choosing which context manager and error message to use more clean.
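The optional-logger wrapper described above can be sketched in a few lines (a pure-Python stand-in for the C++ `TORCH_CHECK` wrapper; the list-based logger is an illustrative substitute for the DDP reducer's logger).

```python
def checked(condition, message, logger=None):
    """Sketch of the wrapper: if a logger is provided, record the error
    before raising; if not, just raise (plain TORCH_CHECK behavior)."""
    if not condition:
        if logger is not None:
            logger.append(message)  # stand-in for logging to the reducer's logger
        raise RuntimeError(message)
```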
Test Plan:
**Build commands**
`buck build mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --keep-going`
`buck build mode/dev-nosan //caffe2/test/distributed:distributed_gloo_spawn --keep-going`
**Test commands**
Test for `_compute_bucket_assignment_by_size` (Python)/ `compute_bucket_assignment_by_size` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_compute_bucket_assignment_by_size_sparse_error_{with, without}_logger`
Test for `_verify_model_across_ranks` (Python)/`verify_replica0_across_processes` (C++)
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_verify_model_across_ranks_{with, without}_logger`
Test that constructs an ill-formed DDP instance. Only did cleanup of this function.
`BACKEND={nccl, gloo} WORLD_SIZE=2 ../buck-out/dev/gen/caffe2/test/distributed/distributed_{nccl, gloo}_spawn#binary.par -r test_ddp_model_diff_across_ranks`
Reviewed By: rohan-varma
Differential Revision: D30924790
fbshipit-source-id: dae6fa82485a204a6a4b022f2d073417d07ebb2f
kshitij12345 [Thu, 16 Sep 2021 21:20:43 +0000 (14:20 -0700)]
OpInfo: nn.functional.conv2d (#63517)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/54261
Reference: https://github.com/facebookresearch/functorch/issues/78
Mostly inspired from https://github.com/pytorch/pytorch/issues/62882
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63517
Reviewed By: heitorschueroff
Differential Revision: D30993855
Pulled By: zou3519
fbshipit-source-id: 7402f99addb4ef8f19c2ce1a09ed9006e737cc7e
Jane Xu [Thu, 16 Sep 2021 20:22:02 +0000 (13:22 -0700)]
Remove old references to 9.2 in documentation (#65059)
Summary:
Removes references in .rst and README.md and comments in the Dockerfile
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65059
Reviewed By: malfet
Differential Revision: D30961110
Pulled By: janeyx99
fbshipit-source-id: 702a9a81bf08125ec4ac38bc656fc2c128c30018
Kefei Lu [Thu, 16 Sep 2021 20:14:12 +0000 (13:14 -0700)]
Provide function interface for `remove_duplicate_output_args` (#65134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65134
So that its implementation can be abstracted and replaced
Test Plan: Run linter, CI
Reviewed By: 842974287
Differential Revision: D30966916
fbshipit-source-id: 92ec78c7410d0be14faecb0ba1eafdc74bab5a5d
Kefei Lu [Thu, 16 Sep 2021 20:14:12 +0000 (13:14 -0700)]
Add type annotation for `TRTInterpreter.run` (#65135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65135
Opportunistically adding type annotation as I work through fx2trt code base.
Test Plan: run linter and CI
Reviewed By: houseroad, 842974287
Differential Revision: D30903185
fbshipit-source-id: 3f700b57f4433f2d312c1ff2e6b99948e3c8845c
Charles David Hernandez [Thu, 16 Sep 2021 19:55:55 +0000 (12:55 -0700)]
[quant]ao migration for quantization mappings and fuser method mappings hg mv (#64985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64985
moving quantization_mappings.py and fuser_method_mappings.py to the ao folder while retaining backwards compatibility
also added dict test
ghstack-source-id: 138215312
Test Plan:
buck test mode/dev //caffe2/test:quantization
https://www.internalfb.com/intern/testinfra/testrun/7036874471986444
buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization
https://www.internalfb.com/intern/testinfra/testrun/5348024625792701
Reviewed By: z-a-f
Differential Revision: D30982551
fbshipit-source-id: 00f53bd44009d6012a7de852000aad6885131edb
Jane Xu [Thu, 16 Sep 2021 19:53:12 +0000 (12:53 -0700)]
Remove CUDA 9.2 and older references from our cmake (#65065)
Summary:
Removes old CUDA references in our cuda.cmake
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65065
Reviewed By: malfet
Differential Revision: D30992673
Pulled By: janeyx99
fbshipit-source-id: 85b524089ed57e5acbc71720267cf05e24a8c20a
Nikita Shulga [Thu, 16 Sep 2021 19:37:10 +0000 (12:37 -0700)]
Disable ParallelTBB (#65092)
Summary:
As ParallelTBB's `at::get_thread_num` is not compatible with the general model used by OpenMP and ParallelNative (where it is a contiguous thread index within a parallel loop), see https://github.com/pytorch/pytorch/issues/64571#issuecomment-914691883
More examples of similar regressions: https://github.com/pytorch/pytorch/runs/3612142217
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65092
Reviewed By: zhouzhuojie
Differential Revision: D30995936
Pulled By: malfet
fbshipit-source-id: db145b6a850d794f2c954f59f30249b291473e36
Zhengxu Chen [Thu, 16 Sep 2021 18:23:11 +0000 (11:23 -0700)]
Introduce tensorRT as builtin module for torch::deploy. (#63818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63818
ghstack-source-id: 138156957
Test Plan: next diff
Reviewed By: wconstab
Differential Revision: D30499309
fbshipit-source-id: 4ab1bc9896243c0c1503afb18fbfb196fc37404e
David Berard [Thu, 16 Sep 2021 17:44:33 +0000 (10:44 -0700)]
[JIT] Improve BatchMM mutability handling (#65097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65097
Previously, BatchMM would skip any block containing any mutable
operators. Now it will avoid batching any operation whose inputs or
outputs are ever mutated. Specifically: consider a tree of ADD, T,
and MM nodes rooted at an ADD node. If any input or output to any
node in the tree is ever mutated, then the entire tree will be ignored
by BatchMM.
Test Plan: python test/test_jit.py TestBatchMM
Reviewed By: eellison
Differential Revision: D30973515
Pulled By: davidberard98
fbshipit-source-id: 9d836faa1ef0c9e3fefe0ffc0bd265f275471f48
Charles David Hernandez [Thu, 16 Sep 2021 17:31:21 +0000 (10:31 -0700)]
[quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982
migration of qconfig.py and observer.py to torch/ao/quantization using new test format
ghstack-source-id: 138215256
Test Plan:
buck test mode/opt //caffe2/test:quantization
https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/
buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization
https://www.internalfb.com/intern/testinfra/testrun/3940649742829796
Reviewed By: z-a-f
Differential Revision: D30982534
fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
Kushashwa Ravi Shrimali [Thu, 16 Sep 2021 17:12:50 +0000 (10:12 -0700)]
[Fix] Raise error when empty index tensor is passed (gather) (#65006)
Summary:
See https://github.com/pytorch/pytorch/pull/63312#issuecomment-919330081 for context.
cc: ezyang ysiraichi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65006
Reviewed By: mruberry
Differential Revision: D30937730
Pulled By: ezyang
fbshipit-source-id: a8f77b1f40d07e7e3bef6caaafa119685f297638
James Reed [Thu, 16 Sep 2021 17:00:59 +0000 (10:00 -0700)]
[FX] Gate FXGraphDrawer on whether pydot is installed (#65088)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65088
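The gating pattern named in the title can be sketched as follows (an illustration of guarding a feature on an optional dependency; `GraphDrawer` here is a hypothetical stand-in for `FXGraphDrawer`).

```python
# Probe the optional dependency once at import time.
try:
    import pydot  # optional dependency
    HAS_PYDOT = True
except ImportError:
    HAS_PYDOT = False

if HAS_PYDOT:
    class GraphDrawer:
        def draw(self):
            return "dot output"
else:
    # Fail loudly, but only when the feature is actually used.
    class GraphDrawer:
        def __init__(self, *args, **kwargs):
            raise RuntimeError("GraphDrawer requires pydot to be installed")
```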
Test Plan: Imported from OSS
Reviewed By: khabinov
Differential Revision: D30967951
Pulled By: jamesr66a
fbshipit-source-id: dba2f13a47889b3d4187de925b4fe74ee90b7f79
Michael Dagitses [Thu, 16 Sep 2021 16:58:09 +0000 (09:58 -0700)]
add support for indexing to meshgrid (#62722)
Summary:
This is step 3/7 of https://github.com/pytorch/pytorch/issues/50276. It only adds support for the argument but doesn't implement new indexing modes yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62722
Test Plan:
Verified this is not FC breaking by adding logging to both meshgrid
overloads and then called meshgrid twice:
`meshgrid(*tensors)`
and
`meshgrid(*tensors, indexing='ij')`
This confirmed that the former signature triggered the original native
function and the latter signature triggered the new native function.
Reviewed By: H-Huang
Differential Revision: D30394313
Pulled By: dagitses
fbshipit-source-id: e265cb114d8caae414ee2305dc463b34fdb57fa6
Richard Zou [Thu, 16 Sep 2021 16:00:34 +0000 (09:00 -0700)]
[Reland] Add python mode (#64360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64360
This PR adds a (private) enable_python_mode context manager.
(see torch/utils/_python_dispatch.py).
enable_python_mode accepts the type of a __torch_dispatch__ object
as its argument. Whenever an operator gets called inside of the
context manager, it dispatches to the __torch_dispatch__ of
the passed-in type.
Example usage:
```
with enable_python_mode(LoggingTensor):
z = torch.empty([])
assert isinstance(z, LoggingTensor)
```
There are quite a few changes that were made to support this.
First, we added TorchDispatchTypeObject, a C++ struct that represents the
type of a `__torch_dispatch__` object (e.g. LoggingTensor).
It holds both the PyObject* representing the class and a PyInterpreter*
so we know which Python interpreter it came from.
Next, we updated the concrete_dispatch_fn in python_variable.cpp to accept
a `const std::shared_ptr<TorchDispatchTypeObject>&` argument. When this
is null, dispatching happens as usual. When it is non-null, we prepend
the TorchDispatchTypeObject's PyObject* to the overloaded args list so that
it is considered first for dispatch.
To get that to work, we changed how `handle_torch_dispatch_no_python_arg_parser`
works. The "overloaded args list" previously only consisted of Tensor PyObjects,
but now it can have types in addition to Tensors!
- We renamed `append_overloaded_arg` to `append_overloaded_tensor`
- We added a new `append_overloaded_type` that appends a type to overloaded_args
- We added special handling in `handle_torch_dispatch_no_python_arg_parser`
and `append_overloaded_arg` to handle types in addition to Tensors.
Then, there is PythonMode and PythonModeTLS.
- We reuse the DispatchKey::Python dispatch key as a mode key
- We use PythonMode::enter and PythonMode::exit to enable/disable
DispatchKey::Python and set the PythonModeTLS.
- PythonModeTLS stores a TorchDispatchTypeObject as metadata.
- PythonMode is in libtorch_python, and PythonModeTLS is in ATen.
This split is due to the libtorch_python library boundary (because we need
to save TLS in ATen/ThreadLocalState)
- We modify the PythonFallbackKernel to look up
the relevant TorchDispatchTypeObject (if Python Mode is active) and
dispatch using it.
There are two more miscellaneous changes:
- internal_new_from_data (torch/csrc/utils/tensor_new.cpp) gets an
exclude guard. enable_python_mode currently does not handle
torch.tensor and the exclude guard is to prevent a bug.
Future:
- This PR does not allow for the nesting of Python modes. In the future we
should be able to enable this with a more sane no_dispatch API and by changing
the TLS to a stack. For now I did not need this for CompositeImplicitAutograd testing.
Test Plan: - new tests
Reviewed By: ezyang
Differential Revision: D30698082
Pulled By: zou3519
fbshipit-source-id: 7094a90eee6aa51f8b71bc4d91cfb6f49e9691f8
Alban Desmaison [Thu, 16 Sep 2021 13:36:29 +0000 (06:36 -0700)]
Revert D30888794: [Model Averaging] Simplify PostLocalSGD Optimizer API
Test Plan: revert-hammer
Differential Revision: D30888794 (https://github.com/pytorch/pytorch/commit/3d312b3b8ee90f8b289c7d5601a13d0521b46b7e)
Original commit changeset: 21261b480f6b
fbshipit-source-id: 87abb7e8cd9ecaac909ec6c3ee053fa7c4ae1975
Rodrigo Berriel [Thu, 16 Sep 2021 13:33:40 +0000 (06:33 -0700)]
Improve LSTM documentation for proj_size > 0 (#65102)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65053. Although the documentation states that:
https://github.com/pytorch/pytorch/blob/fe0f9d1dafb9791cb08635636a01128850d17538/torch/nn/modules/rnn.py#L500-L506
It seems that the definition of `weight_ih_l[k]` could be improved by specifying what happens when `k > 0` and `proj_size > 0`. As `proj_size` is only used in LSTM, no changes are needed for the other RNNs.
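The clarified shape rule can be written down concretely (a sketch based on the LSTM docs: `weight_ih_l[k]` has `4*hidden_size` rows, and for `k > 0` its input width is the previous layer's output size, which is `num_directions * proj_size` when `proj_size > 0`, else `num_directions * hidden_size`):

```python
def weight_ih_shape(k, input_size, hidden_size, proj_size=0, bidirectional=False):
    """Expected shape of LSTM weight_ih_l[k] per the documented rule."""
    num_directions = 2 if bidirectional else 1
    layer_output = proj_size if proj_size > 0 else hidden_size
    in_features = input_size if k == 0 else num_directions * layer_output
    return (4 * hidden_size, in_features)
```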
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65102
Reviewed By: supriyar
Differential Revision: D30975781
Pulled By: jbschlosser
fbshipit-source-id: 12df06e5e6a8d5de0ad10fb15e33c3e6311c11d3
Scott Wolchok [Thu, 16 Sep 2021 04:43:09 +0000 (21:43 -0700)]
[Static Runtime] Use FastSet instead of std::set everywhere (#65114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65114
There doesn't seem to be any reason to use std::set for sets of pointers, right?
ghstack-source-id: 138198504
Reviewed By: hlu1
Differential Revision: D30978450
fbshipit-source-id: 4599c6249fda3a89959f839d3bf6400c5891f82c
Amr Elshennawy [Thu, 16 Sep 2021 04:19:03 +0000 (21:19 -0700)]
Reduce PyTorch Warnings - Cast fixes from D26624430 (#65015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65015
Split out the existing fixes into a diff we can land separately.
Test Plan:
pooled_embeddings_modules_test
Parsing buck files: finished in 8.3 sec
Creating action graph: finished in 38.3 sec
[RE] Metadata: Session ID=[https://fburl.com/b/reSessionID-9bea421c-875e-4168-9e00-7d67479b1a9f]
[RE] Waiting on 46 remote actions. Completed 905 actions remotely, action cache hit rate: 5.08%.
Downloaded 7002/8869 artifacts, 560.00 Mbytes, 11.6% cache miss (for updated rules)
Building: finished in 13:12.4 min (100%) 31964/31964 jobs, 17344/31964 updated
Total time: 13:59.1 min
More details at https://www.internalfb.com/intern/buck/build/b9a58bba-e0aa-4c2b-8824-a0c4074b0954
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 28cbe2b1-6fbc-450c-91c9-c06a7ff1d53b
Trace available for this run at /tmp/tpx-20210914-114921.005504/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1407375088325000
✓ ListingSuccess: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (23.849)
{emoji:2702} Omit: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
Test output:
> This test was disabled.
To run this test locally, add the command line flag --run-disabled to your test command (prefix with -- if using buck).
To view why this is disabled or re-enable this test in the test console, visit https://our.intern.facebook.com/intern/testinfra/testdetail/562949981577936
↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:
stderr:
↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:
stderr:
✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda) (13.201)
↻ Skip: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
Test output:
> Repro command : $(cat "/tmp/tpx-20210914-114921.005504/dc174692-8d92-4459-8b8f-201643c6ab7d/execution_command")
Skipped: CUDA is not available or no GPUs detected
stdout:
stderr:
✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_compatibility (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu) (13.201)
✓ Pass: caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - main (13.201)
Summary
Pass: 3
Skip: 3
↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
↻ caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation_autograd (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_0_cpu)
Omit: 1
{emoji:2702} caffe2/torch/fb/sparsenn:pooled_embeddings_modules_test - test_permutation (caffe2.torch.fb.sparsenn.tests.pooled_embeddings_modules_test.PooledEmbeddingModulesTest_1_cuda)
ListingSuccess: 1
shape_inference_mode_test
[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:shape_inference_mode_test
Downloaded 6/18 artifacts, 11.69 Kbytes, 53.8% cache miss (for updated rules)
Building: finished in 1.6 sec (100%) 110/110 jobs, 26/110 updated
Total time: 1.8 sec
More details at https://www.internalfb.com/intern/buck/build/0e5f45b2-5777-49e9-a3b0-09bd05687b2b
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 99509108-5ff3-4b1a-b7b3-2f43c4036209
Trace available for this run at /tmp/tpx-20210914-120119.723607/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/6192449502564504
✓ ListingSuccess: caffe2/torch/fb/sparsenn:shape_inference_mode_test - main (0.374)
✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_mode (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.249)
✓ Pass: caffe2/torch/fb/sparsenn:shape_inference_mode_test - test_set_upper_bound_settings (torch.python.fb.shape_inference_mode_test.TestShapeInferenceMode) (0.253)
Summary
Pass: 2
ListingSuccess: 1
test
[amrelshennawy@devvm855.ftw0 /data/users/amrelshennawy/fbsource/fbcode] buck test caffe2/torch/fb/sparsenn:test
Parsing buck files: finished in 1.1 sec
Creating action graph: finished in 38.6 sec
Downloaded 6/30 artifacts, 11.29 Kbytes, 66.7% cache miss (for updated rules)
Building: finished in 41.6 sec (100%) 26783/26783 jobs, 43/26783 updated
Total time: 01:21.4 min
More details at https://www.internalfb.com/intern/buck/build/8f794eb0-3d3c-4ee3-9aec-5ec5cec1b0f4
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: a06164b5-d7d7-444c-a4ff-e312cb9970d9
Trace available for this run at /tmp/tpx-20210914-120428.464799/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066
✓ ListingSuccess: caffe2/torch/fb/sparsenn:test - main (16.637)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_dense_mlp_quantize_ops (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.870)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.922)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_to_dense_caffe2 (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.348)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_simple (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.370)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D_batch (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.516)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_byte_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.515)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.861)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bags (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.873)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.969)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_pad_minf (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.104)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_multiple_runs (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.342)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transform (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.664)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_out_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.745)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.771)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_offsets_to_ranges_empty_batch (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.944)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges_shape_inference_mode (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.245)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_nonbinary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.328)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_8bitfakefused (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.501)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_ranges (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.608)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_inference_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.403)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.025)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_lengths_negatives_tests (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (23.956)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (24.100)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.384)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.672)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_empty_values_scores_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.679)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.726)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_ranges_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.567)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_all_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.036)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_32bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.430)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_transform_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.176)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_dense_feature_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.006)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.555)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_int_nbit_split_embedding_codegen_lookup_function (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.791)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_segments_smaller_max_len (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.737)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_pos (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.212)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_2bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.612)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_prediction_binary (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.858)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.002)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_tracing (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.824)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_counts (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.976)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_recat_embedding_grad_output_mixed_D (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.832)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot_lengths (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.844)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.558)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_non_zeros (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.418)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_prior_correction_calibration_accumulate (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_unsqueeze_vector (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.327)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_xl_embedding_bag_4bit_rowwise_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.772)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.425)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_cat_backward (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.956)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_expand_offsets_tensor (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (19.320)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_gather_ranges (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.923)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_one_hot (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.549)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_deprecated_sigrid_transforms_create (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.932)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_clip_ranges_gather_lengths_to_offsets (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.807)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_length_to_row_idx (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (17.738)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_tracing_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (20.175)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_batch_box_cox_mixed (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.116)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_1d_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.671)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_permute_out (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.002)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_create_sigrid_transforms_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (18.151)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (16.780)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_no_bins (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.185)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_cumsum (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.242)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_le_one (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.876)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_pack_and_unpack_segments (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (19.222)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_self_binning_histogram_quantile_dims (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (20.007)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_sigrid_hash_op (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.959)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_rowwise_prune_op_64bit_indices (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (18.601)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_ranges_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (17.977)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_broadcast_stack (caffe2.torch.fb.sparsenn.tests.sparsenn_operators_test.SparseNNOperatorsTest) (22.588)
✓ Pass: caffe2/torch/fb/sparsenn:test - test_multiple_runs_torch_bind_upper_bound (caffe2.torch.fb.sparsenn.tests.sigrid_transforms_test.SigridTransformsOpsTest) (15.342)
Summary
Pass: 73
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3377699789132066
Did not run (no GPU on my devserver):
gpu_test
cpp_gpu_test
Reviewed By: r-barnes
Differential Revision: D30940399
fbshipit-source-id: d867ca646723340775a49c1b983cdab64f2d67d8
Priya Ramani [Thu, 16 Sep 2021 03:03:38 +0000 (20:03 -0700)]
Bug fix (#65105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65105
Using buildErrorMessage in external_functions.cpp broke the build target nnc_cpu_backend_lib: buildErrorMessage is defined in tensorexpr/kernel.cpp, which is not included in mobile builds, and we don't want to include it there.
Also, buildErrorMessage wraps error messages for the fuser, whereas nnc_aten_conv2d is now only used in the AOT workflow and is not called by the fuser, so wrapping assertion failures with a fuser error message would be misleading for the AOT workflow.
Test Plan:
Before fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Downloading... 3/3 artifacts, 24.81 Kbytes, 0.0% cache miss (for updated rules)
Building... 1.7 sec (99%) 4639/4641 jobs, 3/4641 updated
- //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary... 0.7 sec (running c++ link[0.6 sec])
Command failed with exit code 1.
command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0/bin/clang++, --cc=/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__fbc...
<truncated>
...
stderr: clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>> ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
When running <c++ link>.
When building rule //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc#binary (ovr_config//platform/linux:x86_64-fbcode).
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
ld.lld: error: undefined symbol: torch::jit::tensorexpr::buildErrorMessage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
>>> referenced by external_functions.cpp:69 (xplat/caffe2/torch/csrc/jit/tensorexpr/external_functions.cpp:69)
>>> ../nnc_cpu_backend_lib#compile-external_functions.cpp.o50e02bc2,platform009-clang/torch/csrc/jit/tensorexpr/external_functions.cpp.o:(nnc_aten_conv2d) in archive /data/users/priyaramani/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/nnc_cpu_backend_lib#platform009-clang,static/libnnc_cpu_backend_lib.a
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)
Command failed with exit code 1.
command: [/data/users/priyaramani/fbsource/buck-out/cells/fbcode/gen/aab7ed39/tools/build/buck/wrappers/__ld__/ld.sh, --ld=/data/users/priyaramani/fbsource/fbcode/third-party-buck/platform009/build/llvm-fb/9.0.0[DEBUG kernel.cpp:2766] }
```
After fix:
```
+ buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
Action graph will be rebuilt because files have been added or removed.
clang-9: warning: argument unused during compilation: '-pthread' [-Wunused-command-line-argument]
Downloaded 11/15 artifacts, 78.37 Kbytes, 15.4% cache miss (for updated rules)
Building: finished in 7.4 sec (100%) 4718/4718 jobs, 46/4718 updated
Total time: 7.5 sec
More details at https://www.internalfb.com/intern/buck/build/b87be016-340c-49f8-b832-0c1de70aae9e
```
Reviewed By: ZolotukhinM
Differential Revision: D30975952
fbshipit-source-id: 85c028cc6af63c03b505b51302f5158c23e1a047
Jordan Fix [Thu, 16 Sep 2021 02:39:41 +0000 (19:39 -0700)]
[acc_ops] Add support for torch variants of squeeze and mul (#65037)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65037
att
Test Plan: updated unit tests
Reviewed By: yuhc
Differential Revision: D30952224
fbshipit-source-id: aaf75b27b4fc6c0436ba7bfcf324f761b900171b
Priya Ramani [Thu, 16 Sep 2021 02:12:47 +0000 (19:12 -0700)]
Add NNC AOT Compiler executable (#63994)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63994
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D30582149
Pulled By: priyaramani
fbshipit-source-id: 3bbf085428824c3cb308e006c18bb0a57f50fef6
Zafar Takhirov [Thu, 16 Sep 2021 01:13:53 +0000 (18:13 -0700)]
[quant] AO migration of the `_correct_bias.py`, `_equalize.py`, and `_learnable_fake_quantize.py` (#64917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64917
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the following files from torch.quantization to torch.ao.quantization:
- `_correct_bias.py`
- `_equalize.py`
- `_learnable_fake_quantize.py`
**Note:** These files are migrated completely without any warning. The old location is thus silently deprecated.
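One common way to keep both import locations working during a migration like this is to alias the old module path to the new module; a toy, self-contained Python sketch of the idea (module and function names are invented here, and this is not PyTorch's actual shim mechanism):

```python
import sys
import types

# Toy stand-in for the migrated module at its new location (names invented).
new_mod = types.ModuleType("ao_quantization_equalize")
new_mod.equalize = lambda model: f"equalized:{model}"
sys.modules["ao_quantization_equalize"] = new_mod

# The old location becomes a silent alias, so existing call sites keep
# working without any deprecation warning (matching the note above).
sys.modules["quantization_equalize"] = sys.modules["ao_quantization_equalize"]

import quantization_equalize  # old import path still resolves

print(quantization_equalize.equalize("model"))
```

Call sites importing from the old path are unaffected, which is why the migration can land one file at a time while internal callers are updated.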
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestBiasCorrection`
Reviewed By: vkuzo
Differential Revision: D30898565
fbshipit-source-id: 1d39be2539dd1adfcb42e16bdcc0daf5c8316bbd
Jane Xu [Thu, 16 Sep 2021 01:03:19 +0000 (18:03 -0700)]
.circleci/.jenkins: Remove 9.2 references in CI (#65024)
Summary:
Removes 9.2 references in CI scripts and configs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65024
Reviewed By: driazati
Differential Revision: D30945948
Pulled By: janeyx99
fbshipit-source-id: 77890a00520c61500a934a90a74e3fcca84c09b5
Jane Xu [Thu, 16 Sep 2021 01:00:24 +0000 (18:00 -0700)]
.github: GHA add retry for docker run in chown workspace step (#65104)
Summary:
This should help prevent further errors in GHA workflows during the Chown Workspace step such as https://github.com/pytorch/pytorch/runs/3614067053
I did not add retries to other steps with docker run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65104
Reviewed By: seemethere
Differential Revision: D30976330
Pulled By: janeyx99
fbshipit-source-id: e403008548aa01c9a0a4ccebe56df0e889dd045c
Eli Uriegas [Thu, 16 Sep 2021 00:37:10 +0000 (17:37 -0700)]
Revert D30752939: [pytorch][PR] nvfuser update
Test Plan: revert-hammer
Differential Revision: D30752939 (https://github.com/pytorch/pytorch/commit/cfaecaf40bd6cabd3f4e0ef0d8c7252655349b61)
Original commit changeset: ce122e80f01b
fbshipit-source-id: 57685df8f9946032a06eff1de8a3d1498500d2d2
Zafar Takhirov [Thu, 16 Sep 2021 00:24:09 +0000 (17:24 -0700)]
[quant] AO migration of the `quant_types.py` (phase 1) (#64916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`
Reviewed By: vkuzo
Differential Revision: D30898422
fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
Zafar Takhirov [Thu, 16 Sep 2021 00:24:09 +0000 (17:24 -0700)]
[quant] AO migration of the `fuse_modules.py` (phase 1) (#64913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64913
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the fuse_module.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: vkuzo
Differential Revision: D30882819
fbshipit-source-id: 1926ad6aa49136aceb5b625dcef4bfde3a2860d4
Mikhail Zolotukhin [Thu, 16 Sep 2021 00:13:48 +0000 (17:13 -0700)]
[TensorExpr] Add a method for sanitizing Var and Buf names in Stmt. (#65010)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65010
This pass ensures all names are legal and not duplicated.
Fixes #52727.
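For intuition, the core of such a sanitization pass can be sketched in a few lines of Python (a toy illustration only; the actual pass rewrites Var and Buf nodes inside the TensorExpr Stmt IR):

```python
import re
from itertools import count

def sanitize_names(names):
    """Make every name a legal identifier and unique within the list."""
    seen = set()
    out = []
    for name in names:
        # Replace any non-identifier character; guard against empty names.
        clean = re.sub(r"\W", "_", name) or "_"
        if clean[0].isdigit():
            clean = "_" + clean
        # Deduplicate by appending a numeric suffix until the name is fresh.
        candidate = clean
        for i in count(1):
            if candidate not in seen:
                break
            candidate = f"{clean}_{i}"
        seen.add(candidate)
        out.append(candidate)
    return out

print(sanitize_names(["x", "x", "a.b", "1y"]))
```

The real pass additionally has to keep each renamed Var/Buf consistent across every use site in the statement, which is why it operates on the IR rather than on raw strings.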
Test Plan: Imported from OSS
Reviewed By: bertmaher, navahgar
Differential Revision: D30939717
Pulled By: ZolotukhinM
fbshipit-source-id: 7dbe7f937de41f22ad49137a5e067d698443ed63
Eli Uriegas [Wed, 15 Sep 2021 23:51:34 +0000 (16:51 -0700)]
.github: Enable only specific workflows for canary (#65099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65099
Utilizes ciflow to enable only specific workflows for
pytorch/pytorch-canary to reduce noise on that specific repository
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D30973691
Pulled By: seemethere
fbshipit-source-id: 371765535b42a00bd72c2551c4faebf733d759f0
Eli Uriegas [Wed, 15 Sep 2021 23:09:57 +0000 (16:09 -0700)]
ci: Disable jit legacy on circleci, enable on gha (#65106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65106
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D30976186
Pulled By: seemethere
fbshipit-source-id: 8958f821eab9aa284496c57915894ed70f6b2fff
Jane Xu [Wed, 15 Sep 2021 22:59:21 +0000 (15:59 -0700)]
CI: Upgrade windows 10.1 jobs to 10.2 (#65080)
Summary:
These are the first two steps of the following task:
1. Upgrade 10.1 to 10.2
2. Migrate force_on_cpu job to GHA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65080
Test Plan: https://github.com/pytorch/pytorch/pull/65086
Reviewed By: seemethere
Differential Revision: D30973655
Pulled By: janeyx99
fbshipit-source-id: 67ab69ea99ff9e0336400a7173efef6d7daac07c
Jane Xu [Wed, 15 Sep 2021 22:59:06 +0000 (15:59 -0700)]
Replace windows 10.2 smoke tests on PRs to be 11.3 (#65090)
Summary:
As we default to Linux CUDA 11.3 on PRs, we should do the same for Windows (instead of having 10.2 be the default). This means 10.2 will now be master-only, and 11.3 Windows smoke tests will run on every PR.
This also copies over the "run smoke tests only" config--removing that will be in a separate PR once there's more certain decision making.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65090
Reviewed By: seemethere
Differential Revision: D30968382
Pulled By: janeyx99
fbshipit-source-id: c73f9a2cc800b678909365c4d80627d29fc09f94
Natalia Gimelshein [Wed, 15 Sep 2021 22:38:56 +0000 (15:38 -0700)]
Revert D30883290: [Static Runtime] Move MemoryPlanner out into memory_planner.cpp
Test Plan: revert-hammer
Differential Revision: D30883290 (https://github.com/pytorch/pytorch/commit/0e11454d19e106ba6d5819c1147ca540cbce2943)
Original commit changeset: a37570f8d943
fbshipit-source-id: 65c57a2b0d2e3c7006765195dd519e8cf2472f72
Charles David Hernandez [Wed, 15 Sep 2021 22:15:02 +0000 (15:15 -0700)]
[quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981
This would have caused errors when observer.py was moved to ao.
see: D30391189
ghstack-source-id: 138118430
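The hazard with a hardcoded module path is that it encodes the pre-migration location; deriving the qualified name from the class object itself sidesteps the move entirely. A toy, self-contained sketch (the class name echoes torch's observers, but nothing here imports torch):

```python
class MinMaxObserver:  # toy stand-in for an observer class
    pass

obs = MinMaxObserver()

# Fragile: a string literal like this breaks as soon as the module moves,
# e.g. from torch.quantization.observer to torch.ao.quantization.observer.
hardcoded = "torch.quantization.observer.MinMaxObserver"

# Robust: compute the qualified name from the live class, so it tracks
# whatever module the class actually lives in after any migration.
derived = f"{type(obs).__module__}.{type(obs).__name__}"

print(derived)
```

With the derived form, serialized state and lookups keyed on the qualified name stay valid regardless of where the observer module ends up.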
Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'
Reviewed By: supriyar
Differential Revision: D30432008
fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
Scott Wolchok [Wed, 15 Sep 2021 22:12:29 +0000 (15:12 -0700)]
[Caffe2][easy] Avoid spurious vector copy in TransposeOp (#64403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64403
No need to copy to the heap here.
ghstack-source-id: 138033019
Test Plan: CI
Reviewed By: smacke
Differential Revision: D30712506
fbshipit-source-id: 5f4131b2569ebb1f5092262aaddb17215dea88f1
Scott Wolchok [Wed, 15 Sep 2021 22:12:29 +0000 (15:12 -0700)]
[Caffe2] Don't pass vector by value in SqueezeOp (#64400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64400
There appears to be no need to copy this vector.
ghstack-source-id: 138033020
Test Plan: CI
Reviewed By: smacke
Differential Revision: D30711014
fbshipit-source-id: b9fcf3d496a663b8478aa22d52b2c41f8f85e90f
David Riazati [Wed, 15 Sep 2021 21:46:11 +0000 (14:46 -0700)]
Use RDS for build size tracking (#64303)
Summary:
This adds 2 utilities: `register_rds_table` and `rds_write`. `register_rds_table` needs to be called once with the schema for the data that `rds_write` will write. These go to a lambda called `rds-proxy`, which will write to/read from the DB as necessary. This data can then be arbitrarily queried via `rds-proxy` (for use in CI) or on metrics.pytorch.org (for analysis).
It also hooks these up for build size tracking (which previously was not working on GHA)
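To illustrate the intended call pattern, here is a toy, self-contained sketch; the real `register_rds_table` and `rds_write` live in PyTorch's CI tooling and talk to the `rds-proxy` lambda, and their exact signatures are assumptions here, not the actual API:

```python
# In-memory stand-in for the schema registry that rds-proxy would hold.
_tables = {}

def register_rds_table(name, schema):
    """Declare the table's schema once, before any writes (signature assumed)."""
    _tables[name] = schema

def rds_write(name, rows):
    """Validate each row against the registered schema, then 'send' it.

    The real implementation POSTs to rds-proxy; here we just count rows.
    """
    schema = _tables[name]
    for row in rows:
        assert set(row) == set(schema), f"row keys must match schema: {set(schema)}"
    return len(rows)

# Hypothetical build-size-tracking usage, mirroring the description above.
register_rds_table("binary_size", {"commit": str, "size_bytes": int})
print(rds_write("binary_size", [{"commit": "abc123", "size_bytes": 104857600}]))
```

The register-once/write-many split is what lets rds-proxy validate arbitrary CI metrics against a known schema before they reach the DB.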
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64303
Reviewed By: mruberry
Differential Revision: D30941182
Pulled By: driazati
fbshipit-source-id: 12c5575ddd29902477464fc989ad76a052306b9b
jiej [Wed, 15 Sep 2021 21:40:18 +0000 (14:40 -0700)]
nvfuser update (#63745)
Summary:
Syncing the nvfuser code base from the devel branch. A few of our developments since the last sync:
- Extends support to normalization and reduction kernels.
- Multiple kernel launches for a single `CudaFusionGroup`. The hierarchical caching system has been updated to cache graph segmentation.
- profile_ivalue is enabled to convert dynamic scalars into compile-time constants required by the codegen (e.g., reduction axes).
To keep this PR simple and relatively easy to review, we stripped most external changes and submitted them as separate PRs, so this gigantic PR is easier to handle.
Internal updates are in the following files:
1. updates in nvfuser codegen: `torch/csrc/jit/codegen/cuda`
2. added nvfuser-specific benchmarks: `benchmarks/cpp/nvfuser`
3. nvfuser JIT cpp tests: `test/cpp/jit/test_gpu.cpp`, `test/cpp/jit/test_gpu_shift.cpp`, `test/cpp/jit/test_gpu_validator.h`
Updates affecting integration:
1. profile_ivalue enabled for nvfuser; related changes are in `torch/csrc/jit/runtime/*`
2. exposed a few more symbols in `aten/src/ATen/core/*` used by codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63745
Reviewed By: saketh-are
Differential Revision: D30752939
Pulled By: malfet
fbshipit-source-id: ce122e80f01bcd3865f5bd3c4dfde660665fd84c
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Add embedding shape analysis (#64323)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64323
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision: D30738145
Pulled By: eellison
fbshipit-source-id: be12408330d671bc65cf645aa2c20fafd954e6a9
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Max Pool with indices (#64121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64121
Add support for aten operators which return multiple outputs
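For an op like max pool with indices that returns (values, indices), a shape function simply produces one shape per output. A minimal sketch of the idea in plain Python (a hypothetical helper for illustration, not the actual TorchScript shape function; assumes NCHW input with no padding or dilation):

```python
def max_pool2d_with_indices_shapes(input_shape, kernel, stride):
    # NCHW input; output spatial dims follow floor((in - kernel) / stride) + 1.
    n, c, h, w = input_shape
    oh = (h - kernel) // stride + 1
    ow = (w - kernel) // stride + 1
    out = [n, c, oh, ow]
    # The values tensor and the indices tensor share the same shape,
    # so the shape function returns two shapes.
    return out, list(out)
```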
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision:
D30738142
Pulled By: eellison
fbshipit-source-id:
0d7e51187bd5e3e9b43f0fdb5178366a97aec943
Elias Ellison [Wed, 15 Sep 2021 20:43:12 +0000 (13:43 -0700)]
Add Maxpool to shape analysis / Opinfo (#63530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63530
How to review: check that the generated inputs are a good representation of the op semantics; that should be sufficient for correctness. As a bonus, you can double-check the op size semantics by going to https://codebrowser.bddppq.com/pytorch/pytorch/, typing in native::{op_name}, and looking at the op implementation.
Test Plan: Imported from OSS
Reviewed By: driazati
Differential Revision:
D30738147
Pulled By: eellison
fbshipit-source-id:
cf52339e572ee04e0d6167fd95d8a82d58ea7706
Zafar Takhirov [Wed, 15 Sep 2021 20:11:58 +0000 (13:11 -0700)]
[quant][refactor] Change the structure of the ao migration tests (#64912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64912
The test naming was confusing and ambiguous. The file was changed to reflect the framework that is being migrated ("quantization" instead of "quantize"). Also, the common testing class was extracted out
ghstack-source-id:
138157450
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`
Reviewed By: vkuzo
Differential Revision:
D30898214
fbshipit-source-id:
017f95995271d35bcdf6ff6a1b3974b837543e84
David Riazati [Wed, 15 Sep 2021 20:10:02 +0000 (13:10 -0700)]
Add retries to ECR login step (#65013)
Summary:
Switch retry mode from `legacy` to `standard` (https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-retries.html#cli-usage-retries-configure) and up the number of retries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65013
Reviewed By: zhouzhuojie, mruberry
Differential Revision:
D30943292
Pulled By: driazati
fbshipit-source-id:
0a21e9b4eacbb77e6aca22f9256d94cd591b23cd
Ilqar Ramazanli [Wed, 15 Sep 2021 20:07:59 +0000 (13:07 -0700)]
To add state dict and load_dict for Chained Scheduler (#65034)
Summary:
Adding state_dict() and load_state_dict() methods for Chained Scheduler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65034
Reviewed By: prabhat00155, nateanl
Differential Revision:
D30958207
Pulled By: datumbox
fbshipit-source-id:
1a587a330d34e0548e891a39f8fb5a3d251b71fa
BowenBao [Wed, 15 Sep 2021 19:56:33 +0000 (12:56 -0700)]
[ONNX] Enhance shape (two changes merged) (#64585)
Summary:
Enhanced shape inference by introducing typeReliableMap.
[ONNX] exporter changes for torch hub models (https://github.com/pytorch/pytorch/issues/62856)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64585
Reviewed By: ezyang
Differential Revision:
D30870418
Pulled By: msaroufim
fbshipit-source-id:
87a294799cb87d649d1d13b6114a5cfbac9be15c
Co-authored-by: jiafatom <jiafa@microsoft.com>
Don Jang [Wed, 15 Sep 2021 19:50:22 +0000 (12:50 -0700)]
[Static Runtime] Move MemoryPlanner out into memory_planner.cpp (#65011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65011
This change moves `MemoryPlanner` out of impl.cpp into memory_planner.cpp.
`MemoryPlanner` performs an independent sub-task of static analysis of a graph: creating a memory plan and allocating/deallocating managed Tensors.
This change will reduce merge conflicts as I work on MemoryPlanner more actively for output Tensor support.
Test Plan: N/A
Reviewed By: mikeiovine
Differential Revision:
D30883290
fbshipit-source-id:
a37570f8d9430224a6987d2190bcf81cf875043d
Kiuk Chung [Wed, 15 Sep 2021 19:48:28 +0000 (12:48 -0700)]
(torch.distributed.elastic) properly format traceback on error (#65041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65041
Fixes a bug introduced in https://github.com/pytorch/pytorch/pull/64036 where the traceback of the error handler is printed out rather than the traceback of the actual exception.
Fixes https://github.com/pytorch/pytorch/issues/60910
Closes https://github.com/pytorch/pytorch/issues/60910
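The distinction at the heart of the fix, in stdlib terms: `traceback.format_stack()` returns the current call stack (i.e. where the error handler is), while `traceback.format_exc()` returns the traceback of the exception being handled. A minimal illustration (not the torchelastic code itself):

```python
import traceback

def record_exception():
    try:
        raise RuntimeError("boom")
    except RuntimeError:
        # Call stack of this handler -- what the buggy code printed.
        handler_stack = traceback.format_stack()
        # Traceback of the RuntimeError itself -- what should be printed.
        exc_traceback = traceback.format_exc()
    return handler_stack, exc_traceback
```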
BEFORE (note that the `py_callstack` is NOT the traceback of the RuntimeError):
```
**************************************************************************************************************************************************************************************************************************************************
run_script_path FAILED
==================================================================================================================================================================================================================================================
Root Cause:
[0]:
time: 2021-09-14_22:01:06
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 1092727)
error_file: /tmp/torchelastic_aeyvjbpe/none_8zuih7tj/attempt_0/0/error.json
msg:
{
"message": "RuntimeError: rasing error since --throw was specified",
"extraInfo": {
"py_callstack": [
" File \"<string>\", line 1, in <module>\n",
" File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 116, in spawn_main\n exitcode = _main(fd, parent_sentinel)\n",
" File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/spawn.py\", line 129, in _main\n return self._bootstrap(parent_sentinel)\n",
" File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 315, in _bootstrap\n self.run()\n",
" File \"/usr/local/fbcode/platform009/lib/python3.8/multiprocessing/process.py\", line 108, in run\n self._target(*self._args, **self._kwargs)\n",
" File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/multiprocessing/spawn.py\", line 59, in _wrap\n fn(i, *args)\n",
" File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/api.py\", line 382, in _wrap\n ret = record(fn)(*args_)\n",
" File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py\", line 373, in wrapper\n error_handler.record_exception(e)\n",
" File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 86, in record_exception\n _write_error(e, self._get_error_file_path())\n",
" File \"/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/error_handler.py\", line 26, in _write_error\n \"py_callstack\": traceback.format_stack(),\n"
],
"timestamp": "
1631682066"
}
}
==================================================================================================================================================================================================================================================
Other Failures:
<NO_OTHER_FAILURES>
**************************************************************************************************************************************************************************************************************************************************
```
AFTER (note the traceback is the traceback of the RuntimeError):
```
********************************************************************************
run_script_path FAILED
================================================================================
Root Cause:
[0]:
time: 2021-09-14_21:49:25
rank: 0 (local_rank: 0)
exitcode: 1 (pid: 1014681)
error_file: /tmp/torchelastic_q0zods2c/none_qwmz5dgj/attempt_0/0/error.json
msg: Traceback (most recent call last):
File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
return f(*args, **kwargs)
File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/run.py", line 671, in run_script_path
runpy.run_path(sys.argv[0], run_name="__main__")
File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 265, in run_path
return _run_module_code(code, init_globals, run_name,
File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/usr/local/fbcode/platform009/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/kiuk/tmp/test.py", line 55, in <module>
main()
File "/data/users/kiuk/fbsource/fbcode/buck-out/dev/gen/caffe2/run#link-tree/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper
return f(*args, **kwargs)
File "/home/kiuk/tmp/test.py", line 25, in main
raise RuntimeError("rasing error since --throw was specified")
RuntimeError: rasing error since --throw was specified
================================================================================
Other Failures:
<NO_OTHER_FAILURES>
********************************************************************************
```
Test Plan:
(see summary for before and after)
`test.py` contents:
```
import argparse
import os
import sys

import torch
import torch.distributed as dist
import torch.nn.functional as F

from torch.distributed.elastic.multiprocessing.errors import record


def parse_args(argv):
    parser = argparse.ArgumentParser(description="test script")
    parser.add_argument("--init_method", type=str, default="env://")
    parser.add_argument("--backend", type=str, default="gloo")
    parser.add_argument("--throw", action="store_true", default=False)
    parser.add_argument("--exit", action="store_true", default=False)
    return parser.parse_args()


@record
def main():
    args = parse_args(sys.argv[1:])

    if args.throw:
        raise RuntimeError("rasing error since --throw was specified")
    if args.exit:
        sys.exit(1)

    init_method = args.init_method
    backend = args.backend
    world_size = int(os.environ["WORLD_SIZE"])
    rank = int(os.environ["RANK"])

    print(f"initializing `{backend}` process group with rank={rank}, world_size={world_size} at {init_method}")
    dist.init_process_group(
        backend=backend,
        init_method=init_method,
        world_size=world_size,
        rank=rank)
    print(f"successfully initialized process group with rank={dist.get_rank()}, world_size={dist.get_world_size()}")

    t = F.one_hot(torch.tensor(rank), num_classes=world_size)
    dist.all_reduce(t)
    derived_world_size = torch.sum(t).item()
    if derived_world_size != world_size:
        raise RuntimeError(f"derived world size: {derived_world_size} != actual world size: {world_size}")
    else:
        print(f"sucessfully derived world size: {derived_world_size} (expected: {world_size}). Exiting")


if __name__ == "__main__":
    main()
```
run it as:
```
$ python -m torch.distributed.run --nproc_per_node 2 test.py --throw
```
Reviewed By: cbalioglu
Differential Revision:
D30953731
fbshipit-source-id:
bbea04c59c2aec58969cf44d8e3723d5f8abe8a8
soulitzer [Wed, 15 Sep 2021 19:43:54 +0000 (12:43 -0700)]
Remove `run_functional_checks` from `test_autograd` and create necessary OpInfos (#64993)
Summary:
OpInfo tracker: https://github.com/pytorch/pytorch/issues/54261
- Eliminate duplicated testing logic in test_autograd
- Moved tests that rely on this testing logic to use OpInfos
- `cat` already has OpInfo (no action needed)
- Created OpInfo for `block_diag` and `broadcast_tensors`
Running into some FX errors. Added op to skip-list and created an issue here: https://github.com/pytorch/pytorch/issues/64997
Both `block_diag` and `broadcast_tensors` are variadic, so skipping `test_variant_consistency_jit` (from comments on other OpInfos, it looks like JIT does not support variadic tensors)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64993
Reviewed By: jbschlosser
Differential Revision:
D30961736
Pulled By: soulitzer
fbshipit-source-id:
e169305384a683acae1178c4e12e9e214a67226a
Peter Bell [Wed, 15 Sep 2021 19:15:01 +0000 (12:15 -0700)]
Dispatch.h: Avoid including ivalue (#64165)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64165
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30728587
Pulled By: ezyang
fbshipit-source-id:
d0d2e97491d9d5e2d2fc2d6e51420a4467c1bba4
Ilqar Ramazanli [Wed, 15 Sep 2021 18:55:53 +0000 (11:55 -0700)]
To add state_dict and load_state_dict to SequentialLR (#65035)
Summary:
To add state_dict() and load_state_dict() methods to SequentialLR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65035
Reviewed By: prabhat00155, nateanl
Differential Revision:
D30958204
Pulled By: datumbox
fbshipit-source-id:
65114e1b07146526ae2680233f5cd42b2534d67a
Nikita Shulga [Wed, 15 Sep 2021 18:52:06 +0000 (11:52 -0700)]
[CircleCI] Disable pytorch_linux_xenial_cuda10_2 test jobs (#65071)
Summary:
As all of them have been migrated to GHA:
- pytorch_linux_pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_distributed_test -> "linux-xenial-cuda11.3-py3.6-gcc7 / test (distributed, 1, 1, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 1, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (default, 2, 2, linux.8xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (nogpu_NO_AVX, 1, 1, linux.2xlarge)"
- pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test -> "linux-xenial-cuda10.2-py3.6-gcc7 / test (slow, 1, 1, linux.8xlarge.nvidia.gpu)"
"pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build" is still a holdout due to slow gradchecks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65071
Reviewed By: driazati, seemethere, janeyx99
Differential Revision:
D30963413
Pulled By: malfet
fbshipit-source-id:
d9a5188ce7eb2f60547b91b854a5db83af2b10e7
Samuel Salas [Wed, 15 Sep 2021 18:49:28 +0000 (11:49 -0700)]
Starter Task 1 (#64927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64927
Mypy error corrections
Test Plan: Corrected mypy errors to make the code less prone to bugs by modifying types or adding lines that avoid undesired special cases, e.g. asserting that a variable is not None.
Reviewed By: wushirong
Differential Revision:
D30901654
fbshipit-source-id:
daae8692603b8b38203a98f673c455749c2fb855
Kyle Chen [Wed, 15 Sep 2021 18:48:33 +0000 (11:48 -0700)]
[ROCm] Update CI images for ROCm 4.3.1 (#64610)
Summary:
Signed-off-by: Kyle Chen <kylechen@amd.com>
reference:
https://github.com/pytorch/pytorch/issues/58017
jithunnair-amd
jeffdaily
arindamroy-eng
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64610
Reviewed By: seemethere
Differential Revision:
D30964582
Pulled By: malfet
fbshipit-source-id:
a8335d3d32d7f1557d3cf6cb055ad0f9c49ef7aa
Yukio Siraichi [Wed, 15 Sep 2021 18:05:14 +0000 (11:05 -0700)]
Port `all` and `any` full reductions to structured kernels. (#64642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64642
Tracking issue: #55070
This PR creates out overloads for both `all` and `any` kernels (full reduction overload),
and ports them to structured kernels.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30867354
Pulled By: ezyang
fbshipit-source-id:
46bccaf6c94a09ed77cc6c724d1183c82f801751
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch] remove string_view::operator[] bounds check (#64670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64670
Bounds checking is not required for `std::string_view`, and the checking hoses performance for the following performance prototype diff.
ghstack-source-id:
138037531
Test Plan: CI
Reviewed By: ezyang, bhosmer
Differential Revision:
D30747515
fbshipit-source-id:
1f4374415a82dfdccce76ea2c6885c13cb93d369
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch][easy] Add cbegin/cend to SmallVector (#64682)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64682
Looks like it was forked from llvm before cbegin and cend existed.
ghstack-source-id:
138036981
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision:
D30814434
fbshipit-source-id:
9740fa8d3df1c90b77298a95ab9f1d0cf8c90320
Scott Wolchok [Wed, 15 Sep 2021 16:55:02 +0000 (09:55 -0700)]
[PyTorch] Avoid extra std::vector in parseSchemaOrName (#64678)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64678
We know we only want one declaration, so let's not create an excess std::vector (and thus a heap allocation) for that.
ghstack-source-id:
138036978
Test Plan: CI
Reviewed By: dhruvbird, tugsbayasgalan
Differential Revision:
D30813785
fbshipit-source-id:
c67e0100cdef5d894282939fb6d39a57309bc240
Zafar Takhirov [Wed, 15 Sep 2021 16:37:36 +0000 (09:37 -0700)]
[quant] Removing unnecessary import from torch/quantization/quantize.py (#64910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64910
This bled through from the original location. Removing it is not just refactoring, but also prevents potential recursive imports.
ghstack-source-id:
138112663
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: vkuzo
Differential Revision:
D30882924
fbshipit-source-id:
8652a334a5186c635761ea5e50f978d1f1078c12
Don Jang [Wed, 15 Sep 2021 15:35:57 +0000 (08:35 -0700)]
[Static Runtime] Check if outputs of a node do not overlap with each other (#63013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63013
This change enhances the current memory-overlap check to include outputs: the enhancement enforces the constraint that all outputs of a node should NOT overlap with each other, since they are all written by the node at the same time.
This check will detect a problem like T97393697 immediately in debug mode.
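The invariant being checked can be sketched in plain Python: treating each output as a half-open `[start, start + nbytes)` memory region, every pair must be disjoint. This is a hypothetical sketch for illustration; the real check is C++ inside Static Runtime:

```python
from itertools import combinations

def outputs_do_not_overlap(intervals):
    """intervals: list of (start, nbytes) memory regions for a node's outputs."""
    for (a_start, a_len), (b_start, b_len) in combinations(intervals, 2):
        # Two half-open intervals overlap unless one ends before the other begins.
        if a_start < b_start + b_len and b_start < a_start + a_len:
            return False
    return True
```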
Test Plan:
- Added a unittest `ProcessedNode.VerifyMemoryOverlapWithOverlappingOutputs`
- Ran `inline_cvr` on ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench with this diff and confirmed that the checking condition holds true during the run.
Reviewed By: hlu1
Differential Revision:
D30211705
fbshipit-source-id:
994d8dace2422e2498e504eb61452a55739238c0
Jane Xu [Wed, 15 Sep 2021 15:28:00 +0000 (08:28 -0700)]
Forward fix SkipInfo missing mypy (#65063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65063
Reviewed By: malfet
Differential Revision:
D30961556
Pulled By: janeyx99
fbshipit-source-id:
9618e12ba873fb48fe5c846a48d4560ad521eb3e
Hong Xu [Wed, 15 Sep 2021 15:09:49 +0000 (08:09 -0700)]
When test set_affinity, don't hardcode the CPU ID (#65042)
Summary:
The set_affinity test always fails when the number of CPUs is smaller
than 3 because it hard-codes a CPU ID. Changed the test to pick CPU IDs
dynamically based on the number of CPUs on the system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65042
Reviewed By: jbschlosser
Differential Revision:
D30960554
Pulled By: ejguan
fbshipit-source-id:
55ac12714b4b0964b48c3617b79a7a345d40ebce
Kevin Tse [Wed, 15 Sep 2021 14:32:45 +0000 (07:32 -0700)]
[DataPipe] Make TarArchiveReader and ZipArchiveReader accept FileStream, with an attempt to close and an additional warning (#64788)
Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).
This PR removes TarArchiveReader's dependency on the FileLoader DataPipe by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and stream.
It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.
The whole stack fixes https://github.com/pytorch/pytorch/issues/64281 - issues related to unclosed buffer stream.
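The underlying stdlib behavior these DataPipes build on: `tarfile` can read directly from an already-open binary stream via the `fileobj` argument, and it is the caller's job to close the extracted member streams when done, which is what the unclosed-buffer issue is about. A minimal stdlib illustration, independent of the DataPipe API:

```python
import io
import tarfile

def read_tar_from_stream(stream):
    # Read every member out of an already-open binary stream, making sure
    # the extracted member streams and the archive itself are closed.
    contents = {}
    with tarfile.open(fileobj=stream, mode="r") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            if extracted is not None:  # None for directories etc.
                with extracted:
                    contents[member.name] = extracted.read()
    return contents
```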
Stack:
* __->__ https://github.com/pytorch/pytorch/issues/64788
* https://github.com/pytorch/pytorch/issues/64786
cc VitalyFedyunin ejguan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64788
Reviewed By: jbschlosser, ejguan
Differential Revision:
D30901176
Pulled By: NivekT
fbshipit-source-id:
59746a8d0144fc6d3ce0feb2d76445b82e6d414e
Philip Meier [Wed, 15 Sep 2021 14:16:29 +0000 (07:16 -0700)]
add `OpInfo` for `torch.nn.functional.dropout` (#62315)
Summary:
Addresses facebookresearch/functorch#78.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62315
Reviewed By: mruberry
Differential Revision:
D30932765
Pulled By: zou3519
fbshipit-source-id:
481c67b59a966b4d640973d252b3e392d8db728e
Jongsoo Park [Wed, 15 Sep 2021 04:30:45 +0000 (21:30 -0700)]
[dnnlowp] reduce num of test cases to avoid time out (#64935)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64935
As title
Test Plan: CI
Reviewed By: dskhudia
Differential Revision:
D30889157
fbshipit-source-id:
316c808806b084bd2e44c56e1cdb61adf2369a9d
Joel Schlosser [Wed, 15 Sep 2021 02:51:32 +0000 (19:51 -0700)]
Generic test parametrization functionality (#60753)
Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:
```python
import unittest
from itertools import product

from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)


class TestBlah(TestCase):
    @parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    @parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    @parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [1, 2, 3])
    @parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    @parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    @parametrize("x", [1, 2])
    @parametrize("y", [3, 4])
    @parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x", [1, 2], name_fn=str)
    @parametrize("y", [3, 4], name_fn=str)
    @parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)


class TestDeviceBlah(TestCase):
    @parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    @parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)


instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```
Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```
Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
* Allow stacking of multiple decorators
* Error out with a nice error message if multiple decorators are specified
The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60753
Reviewed By: saketh-are
Differential Revision:
D30606615
Pulled By: jbschlosser
fbshipit-source-id:
a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
Sangbaek Park [Wed, 15 Sep 2021 02:33:27 +0000 (19:33 -0700)]
[vulkan] Use volk to load vulkan libraries and fix Windows build errors (#64988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64988
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64968
The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.
1. Use `volk` with `link_style="static"` only on Windows. Use `vulkan_wrapper` for all others (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path
Test Plan:
For Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```
For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
On Local OSS repo with `pr/64988` branch:
The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.
Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`
Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`
```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN ] VulkanAPITest.adaptive_avg_pool2d
[ OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN ] VulkanAPITest.add
[ OK ] VulkanAPITest.add (51 ms)
[ RUN ] VulkanAPITest.add_broadcast0
[ OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN ] VulkanAPITest.add_broadcast1
[ OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN ] VulkanAPITest.add_broadcast2
[ OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN ] VulkanAPITest.add_
[ OK ] VulkanAPITest.add_ (60 ms)
[ RUN ] VulkanAPITest.add_broadcast0_
[ OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN ] VulkanAPITest.add_broadcast1_
[ OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN ] VulkanAPITest.add_scalar
[ OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN ] VulkanAPITest.add_scalar_
[ OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN ] VulkanAPITest.addmm
[ OK ] VulkanAPITest.addmm (22 ms)
[ RUN ] VulkanAPITest.addmm_expand
[ OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN ] VulkanAPITest.avg_pool2d
[ OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN ] VulkanAPITest.clamp
[ OK ] VulkanAPITest.clamp (92 ms)
[ RUN ] VulkanAPITest.clamp_
[ OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN ] VulkanAPITest.conv2d
[ OK ] VulkanAPITest.conv2d (15 ms)
[ RUN ] VulkanAPITest.conv2d_dw
[ OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN ] VulkanAPITest.conv2d_pw
[ OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN ] VulkanAPITest.conv2d_winograd
[ OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN ] VulkanAPITest.copy
[ OK ] VulkanAPITest.copy (1 ms)
[ RUN ] VulkanAPITest.div
[ OK ] VulkanAPITest.div (32 ms)
[ RUN ] VulkanAPITest.div_broadcast0
[ OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN ] VulkanAPITest.div_broadcast1
[ OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN ] VulkanAPITest.div_broadcast2
[ OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN ] VulkanAPITest.div_
[ OK ] VulkanAPITest.div_ (46 ms)
[ RUN ] VulkanAPITest.div_broadcast0_
[ OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN ] VulkanAPITest.div_broadcast1_
[ OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN ] VulkanAPITest.div_scalar
[ OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN ] VulkanAPITest.div_scalar_
[ OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN ] VulkanAPITest.empty
[ OK ] VulkanAPITest.empty (0 ms)
[ RUN ] VulkanAPITest.hardsigmoid
[ OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN ] VulkanAPITest.hardsigmoid_
[ OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN ] VulkanAPITest.hardshrink
[ OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN ] VulkanAPITest.hardshrink_
[ OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN ] VulkanAPITest.leaky_relu
[ OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN ] VulkanAPITest.leaky_relu_
[ OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN ] VulkanAPITest.hardswish
[ OK ] VulkanAPITest.hardswish (83 ms)
[ RUN ] VulkanAPITest.hardswish_
[ OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN ] VulkanAPITest.max_pool2d
[ OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN ] VulkanAPITest.mean
[ OK ] VulkanAPITest.mean (17 ms)
[ RUN ] VulkanAPITest.mean2d
[ OK ] VulkanAPITest.mean2d (20 ms)
[ RUN ] VulkanAPITest.mm
[ OK ] VulkanAPITest.mm (12 ms)
[ RUN ] VulkanAPITest.mul
[ OK ] VulkanAPITest.mul (28 ms)
[ RUN ] VulkanAPITest.mul_broadcast0
[ OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN ] VulkanAPITest.mul_broadcast1
[ OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN ] VulkanAPITest.mul_broadcast2
[ OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN ] VulkanAPITest.mul_
[ OK ] VulkanAPITest.mul_ (43 ms)
[ RUN ] VulkanAPITest.mul_broadcast0_
[ OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN ] VulkanAPITest.mul_broadcast1_
[ OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN ] VulkanAPITest.mul_scalar
[ OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN ] VulkanAPITest.mul_scalar_
[ OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN ] VulkanAPITest.reflection_pad2d
[ OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN ] VulkanAPITest.reshape
[ OK ] VulkanAPITest.reshape (73 ms)
[ RUN ] VulkanAPITest.reshape_
[ OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN ] VulkanAPITest.sigmoid
[ OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN ] VulkanAPITest.sigmoid_
[ OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN ] VulkanAPITest.softmax
[ OK ] VulkanAPITest.softmax (28 ms)
[ RUN ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
Actual: false
Expected: true
[ FAILED ] VulkanAPITest.log_softmax (19 ms)
[ RUN ] VulkanAPITest.tanh
[ OK ] VulkanAPITest.tanh (63 ms)
[ RUN ] VulkanAPITest.tanh_
[ OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN ] VulkanAPITest.sub
[ OK ] VulkanAPITest.sub (28 ms)
[ RUN ] VulkanAPITest.sub_broadcast0
[ OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN ] VulkanAPITest.sub_broadcast1
[ OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN ] VulkanAPITest.sub_broadcast2
[ OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN ] VulkanAPITest.sub_
[ OK ] VulkanAPITest.sub_ (43 ms)
[ RUN ] VulkanAPITest.sub_broadcast0_
[ OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN ] VulkanAPITest.sub_broadcast1_
[ OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN ] VulkanAPITest.upsample_nearest2d
[ OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN ] VulkanAPITest.mobilenetv2
[ OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)
[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[ PASSED ] 68 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] VulkanAPITest.log_softmax
1 FAILED TEST
```
Differential Revision:
D30925995
fbshipit-source-id:
1b1b7f7f22090064424a5379d2f0559d0da7846a
Kshiteej K [Wed, 15 Sep 2021 01:17:53 +0000 (18:17 -0700)]
[fix] don't expose unique_dim in torch (#63080)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62793
This is mostly a quick fix. I think the more correct fix could be updating `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe something else I am missing.
~~Not sure how to add a test for it.~~ Have tested it locally.
We can add a test like the following. Tested this locally; it fails currently but passes with the fix.
```python
def test_wildcard_import(self):
    exec('from torch import *')
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63080
Reviewed By: gchanan
Differential Revision:
D30738711
Pulled By: zou3519
fbshipit-source-id:
b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
Michael Carilli [Wed, 15 Sep 2021 00:52:27 +0000 (17:52 -0700)]
[CUDA graphs] moves memory sharing intro paragraph (#64996)
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64996
Reviewed By: mruberry
Differential Revision:
D30948619
Pulled By: ngimel
fbshipit-source-id:
5d9dd267b34e9d3fc499d4738377b58a22da1dc2
Supriya Rao [Wed, 15 Sep 2021 00:32:15 +0000 (17:32 -0700)]
Revert
D30558877: Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo
Test Plan: revert-hammer
Differential Revision:
D30558877 (https://github.com/pytorch/pytorch/commit/
382e008fbf5cc91c283fc902bb0dd6cb7d4bbfda)
Original commit changeset:
3e62ff24a935
fbshipit-source-id:
3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf
Yi Wang [Tue, 14 Sep 2021 23:35:32 +0000 (16:35 -0700)]
[Model Averaging] Simplify PostLocalSGD Optimizer API (#64885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64885
1) The constructor accepts a local optimizer instance instead of the local optimizer's class type and constructor inputs.
2) The parameters are read from local optimizer's `param_groups` instead of a separate input.
Proposal: https://github.com/pytorch/pytorch/issues/59699
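The simplified API in points 1) and 2) can be sketched as a plain wrapper class; this is an illustrative stand-in, not the actual `torch.distributed.optim` implementation, and the `averager` protocol is assumed for the example.

```python
class PostLocalSGDOptimizer:
    """Sketch of the simplified API: wrap an already-constructed local
    optimizer, rather than taking the optimizer class plus its
    constructor inputs. (Simplified stand-in for illustration only.)"""

    def __init__(self, optim, averager):
        self.optim = optim
        # Point 2): parameters are read from the wrapped optimizer's
        # param_groups, not passed in as a separate argument.
        self.param_groups = self.optim.param_groups
        self.averager = averager

    def step(self):
        self.optim.step()
        params = [p for group in self.param_groups for p in group["params"]]
        self.averager.average_parameters(params)
```

Accepting a constructed optimizer keeps the wrapper agnostic to every local optimizer's constructor signature.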
ghstack-source-id:
137865867
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity
Reviewed By: rohan-varma
Differential Revision:
D30888794
fbshipit-source-id:
21261b480f6bbb9b2333426020e3f350da3f73c2
Heitor Schueroff [Tue, 14 Sep 2021 23:16:47 +0000 (16:16 -0700)]
Ported std/var to ReductionOpInfo and minimum/maximum to BinaryUfuncInfo (#63978)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63978
Test Plan: Imported from OSS
Reviewed By: saketh-are
Differential Revision:
D30558877
Pulled By: heitorschueroff
fbshipit-source-id:
3e62ff24a935784fc93a76a0f46a1deb060ba680
Erjia Guan [Tue, 14 Sep 2021 22:44:57 +0000 (15:44 -0700)]
[DataPipe] Improve Mapper to accept input/output index when apply fn (#64951)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64951
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision:
D30910035
Pulled By: ejguan
fbshipit-source-id:
d687fe10939920a3617a60552fe743e8526438a0
Jerry Zhang [Tue, 14 Sep 2021 22:26:03 +0000 (15:26 -0700)]
[quant][tensorrt] Add tensorrt backend config (#64623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64623
The config API will change, but we'll add configs gradually for TensorRT to unblock experimentation.
Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py
Imported from OSS
Reviewed By: vkuzo
Differential Revision:
D30800474
fbshipit-source-id:
3c4640de1205a0f19b62943ab84f386d80394ec2
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Add c10::hash<c10::ArrayRef<T>> (#64277)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64277
Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
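The shared shape can be sketched as a container-agnostic hash fold; the snippet below is an illustrative Python analogue (the real code is C++ in c10), using a boost-style `hash_combine` step, which is an assumption about the mixing function rather than a quote of it.

```python
def hash_combine(seed, value):
    # boost-style hash_combine step, truncated to 32 bits for the sketch
    return (seed ^ (value + 0x9E3779B9 + (seed << 6) + (seed >> 2))) & 0xFFFFFFFF

def hash_sequence(items):
    """Fold each element's hash into an accumulator. The point of the PR
    is that the same loop serves both std::vector and ArrayRef, since an
    ArrayRef is just a (pointer, length) view over contiguous elements,
    so the vector version can be expressed in terms of the ArrayRef one."""
    seed = 0
    for item in items:
        seed = hash_combine(seed, hash(item) & 0xFFFFFFFF)
    return seed
```

Because only iteration is required, any contiguous view hashes identically to the container it was taken from.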
ghstack-source-id:
137978947
Test Plan: existing CI
Reviewed By: dhruvbird
Differential Revision:
D30647666
fbshipit-source-id:
c0f4f06c348d36882ec0db802be44d8c7749562f
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Add OpCode cache in ByteCodeDeserializer (#64110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64110
As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
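The interning trick can be illustrated in Python with `sys.intern`; this is only an analogue of the C++ deserializer change (where interned opcode strings allow a pointer comparison instead of strcmp), and the opcode names here are placeholders, not the real bytecode set.

```python
import sys

# Intern the opcode names once up front. If the strings coming out of the
# pickler are interned too, a lookup can rely on object identity (pointer
# equality) rather than a character-by-character comparison.
_OPCODES = {sys.intern(name): idx
            for idx, name in enumerate(["LOADC", "MOVE", "OP", "RET"])}

def parse_opcode(name):
    # sys.intern returns the canonical object for the string, so the dict
    # lookup hits the identity fast path before any full string compare.
    return _OPCODES[sys.intern(name)]
```

The cache pays off precisely because the same small set of opcode strings is parsed over and over during deserialization.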
ghstack-source-id:
137978946
Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/
591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/
484557404703261
new mean is 292 ms, down from 302 ms.
Reviewed By: dhruvbird
Differential Revision:
D30615052
fbshipit-source-id:
9707625e778388a7920ab72704d71ad57ddaac17
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Remove implicit conversion from Tuple to vector reference (#63993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63993
This seems to be unused, and it's pretty scary.
ghstack-source-id:
137978949
Test Plan: CI
Reviewed By: lw
Differential Revision:
D30560441
fbshipit-source-id:
08b7ce971fd1e2dbeddbf37b02413fef513b4753
Scott Wolchok [Tue, 14 Sep 2021 21:18:55 +0000 (14:18 -0700)]
[PyTorch] Fix SourceRangeDeserializer vector copy (#64031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64031
More copies of tuple elements.
ghstack-source-id:
137978948
Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/
724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/
232361457767293
Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.
Reviewed By: raziel
Differential Revision:
D30559545
fbshipit-source-id:
e5343abae96b8e80e0ccec482ad316884ae231ea
Shiyan Deng [Tue, 14 Sep 2021 19:25:45 +0000 (12:25 -0700)]
[fx2trt] fix elementwise op converter with one operand being a literal and has different type (#65004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65004
If we have code like `torch.add(x, 1)` where `x` is a float tensor, conversion would fall apart because we currently add a constant layer of int32 dtype for `1` when we actually need float dtype.
This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer we create.
Also starts adding doc strings for functions.
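The dtype-selection rule being fixed can be sketched as a small helper; the function name and dtype strings below are hypothetical stand-ins for illustration, not the actual fx2trt code.

```python
def get_const_dtype(literal, other_dtype=None):
    """Pick the dtype for the constant layer created from a Python
    literal. If the other operand's dtype is known (e.g. a float tensor
    in torch.add(x, 1)), follow it; otherwise fall back to the literal's
    natural type. Hypothetical sketch of the dtype argument the diff
    adds to get_trt_tensor."""
    if other_dtype is not None:
        # Match the tensor operand: an int literal paired with a float
        # tensor becomes a float32 constant, not int32.
        return other_dtype
    return "float32" if isinstance(literal, float) else "int32"
```

The key point is that the literal's Python type alone is not enough; the peer operand's dtype must win when it is known.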
Reviewed By: yinghai
Differential Revision:
D30852156
fbshipit-source-id:
650ce72d2794093a4616e640ea503dcc1c6b2bc4
Salil Desai [Tue, 14 Sep 2021 19:09:45 +0000 (12:09 -0700)]
[PyTorch Edge][Model Loading] Operator Call De-dup at TorchScript Serialization Level [2/2] (#64269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64269
Revert changes in
D29826210 (https://github.com/pytorch/pytorch/commit/
693d8f2f0767413bb995b895fccad87dfd4f05a7) (we don't need operator lambda caching since there aren't duplicate operators anymore)
This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id:
138014904
Test Plan:
**Speech Transducer v25 model (as in
D29826210 (https://github.com/pytorch/pytorch/commit/
693d8f2f0767413bb995b895fccad87dfd4f05a7))**
|| Before | After |
|Load Time|[229ms](https://www.internalfb.com/intern/aibench/details/
160889436133243)|[200ms](https://www.internalfb.com/intern/aibench/details/
837884532607514)|
|Save File Size|[86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=
658544950)|[86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=
658554403)|
The "after" flamegraph shows significantly less time is spent on `append_operator` than before.
Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```
**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434
Reviewed By: iseeyuan
Differential Revision:
D30615710
fbshipit-source-id:
bb4052f0f16eccab386585e94411056f94bce43c