Akshit Khurana [Mon, 23 Aug 2021 23:33:07 +0000 (16:33 -0700)]
Fix typo in NNAPI tests (#63797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63797
nnapi memory format test has a typo
Test Plan:
pytest test/test_nnapi.py::TestNNAPI
Imported from OSS
Reviewed By: Amyh11325
Differential Revision:
D30495473
fbshipit-source-id:
8edad7c01a080847a64a2797e077ec4d6077552a
Don Jang [Mon, 23 Aug 2021 23:20:27 +0000 (16:20 -0700)]
[Static Runtime] Add an out variant op for aten::abs (#63675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675
This change adds an out variant implementation for `aten::abs`.
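The out-variant pattern can be illustrated with a pure-Python sketch (a conceptual stand-in, not the actual Static Runtime C++ implementation): instead of allocating a fresh output on every call, the op writes into a buffer the caller preallocated, so repeated invocations reuse memory.

```python
def abs_out(values, out):
    # Out variant: write results into the caller-provided buffer
    # instead of allocating a new output on every invocation.
    assert len(out) == len(values), "output buffer must be preallocated"
    for i, v in enumerate(values):
        out[i] = v if v >= 0 else -v
    return out

buf = [0.0] * 3          # allocated once, reused across calls
abs_out([-1.5, 2.0, -3.0], buf)
# buf is now [1.5, 2.0, 3.0]
```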
Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`
- Perf impact: TBD
Reviewed By: hlu1
Differential Revision:
D30461317
fbshipit-source-id:
0c0230bd40afe463ae1ccb222c2a1207ebcf4191
Rong Rong (AI Infra) [Mon, 23 Aug 2021 22:36:59 +0000 (15:36 -0700)]
fix git diff issue (#63408)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/60111, ideally we should merge this before https://github.com/pytorch/pytorch/issues/63360 but we can also test this with https://github.com/pytorch/pytorch/issues/63360 easily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63408
Test Plan:
- This is confirmed working with a local test.sh run by setting PR_NUMBER
- should be validated by GHA CI as well
Concern:
- currently GHA CI is consistently running into a proxy 403 rate-limit-exceeded issue. However, the worst case is not generating any git diff files, which is exactly the same as the current behavior.
- depends on https://github.com/pytorch/pytorch/issues/63770.
Reviewed By: driazati, janeyx99
Differential Revision:
D30489355
Pulled By: walterddr
fbshipit-source-id:
a638b7ae5820f29a7aca6cc40ff390ab253cb174
Eli Uriegas [Mon, 23 Aug 2021 22:02:10 +0000 (15:02 -0700)]
.github: Add ec2 information as a step (#63784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63784
Also creates the common.yml.j2 file as a place to store common code
amongst the templates
Should look like:
![image](https://user-images.githubusercontent.com/1700823/130495226-f18b8c0f-1ea7-4097-8bbb-e998fabb71f2.png)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet, driazati
Differential Revision:
D30490682
Pulled By: seemethere
fbshipit-source-id:
18028b4acff938ef54cd6e4877561b2d830a11cf
Erjia Guan [Mon, 23 Aug 2021 21:32:56 +0000 (14:32 -0700)]
Rename DataPipe to Op-er (#63325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63325
Rename each DataPipe to an operation name ending in -er. The functional API should remain a `verb` such as `read_from_tar`, `shuffle`, ... (Discussed in [here](https://github.com/facebookexternal/torchdata/pull/97#discussion_r688553905))
- Batch -> Batcher
- Collate -> Collator
- Concat -> Concater
- GroupByKey -> ByKeyGrouper?
- ListDirFiles -> FileLister
- LoadFilesFromDisk -> FileLoader
- Map -> Mapper
- ReadFilesFromTar -> TarArchiveReader
- ReadFilesFromZip -> ZipArchiveReader
- ReadLinesFromFile -> LineReader
- Shuffle -> Shuffler
- ToBytes -> StreamReader
- Transforms -> Transformer
- Zip -> Zipper
Let me know if you have a better name for each DataPipe
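The convention above can be summarized in a small sketch; the mapping is taken verbatim from the list in this commit message, and the check encodes the stated rule (agent-noun class names, verb-style functional names):

```python
# Old DataPipe class name -> new operation-style name (from this commit).
RENAMES = {
    "Batch": "Batcher",
    "Collate": "Collator",
    "Concat": "Concater",
    "GroupByKey": "ByKeyGrouper",
    "ListDirFiles": "FileLister",
    "LoadFilesFromDisk": "FileLoader",
    "Map": "Mapper",
    "ReadFilesFromTar": "TarArchiveReader",
    "ReadFilesFromZip": "ZipArchiveReader",
    "ReadLinesFromFile": "LineReader",
    "Shuffle": "Shuffler",
    "ToBytes": "StreamReader",
    "Transforms": "Transformer",
    "Zip": "Zipper",
}

# Every new class name is an agent noun ending in -er/-or,
# while the functional API keeps verb names like `shuffle`.
assert all(new.endswith(("er", "or")) for new in RENAMES.values())
```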
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30466950
Pulled By: ejguan
fbshipit-source-id:
72909dca7b3964ab83b965891f96cc1ecf62d049
Zeina Migeed [Mon, 23 Aug 2021 21:09:10 +0000 (14:09 -0700)]
Add equality constraints for some acc operations for symbolic inference (#63689)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63689
Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
--action=lower_and_run \
--filename=inline_cvr_7x_dec_2020.model \
--print_glow_glog=True
Reviewed By: jamesr66a
Differential Revision:
D30462113
fbshipit-source-id:
0b2a1ce9770561248527d47c07b80112491dc949
Hao Lu [Mon, 23 Aug 2021 19:53:42 +0000 (12:53 -0700)]
[Static Runtime] Remove unused fusion patterns (#63636)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63636
Reviewed By: d1jang
Differential Revision:
D30446573
fbshipit-source-id:
3abb7f697380f3b4e865b98c594de359b5e26b96
Bert Maher [Mon, 23 Aug 2021 19:41:32 +0000 (12:41 -0700)]
[nnc] Re-enable CPU fusion (#63665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665
This reverts commit 125e2d02e575612eb427104e7c67f1c28f090db8.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision:
D30471646
Pulled By: bertmaher
fbshipit-source-id:
4189869566f03b5f9ada78d78830f6a34946eed6
Peter Bell [Mon, 23 Aug 2021 19:05:51 +0000 (12:05 -0700)]
Kill THCUNN (#63429)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63429
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30441308
Pulled By: ngimel
fbshipit-source-id:
3ae342a2f8d5c7f8827b637c4055c5d1b0a1be26
Rong Rong (AI Infra) [Mon, 23 Aug 2021 16:44:09 +0000 (09:44 -0700)]
fix mpi ssh runtime error (#63580)
Summary:
should fix https://github.com/pytorch/pytorch/issues/60756.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63580
Test Plan:
- this CI.
- validated by running on the bionic_cuda container: https://app.circleci.com/pipelines/github/pytorch/pytorch/366632/workflows/478602fb-698f-4210-ac09-d9c61af5c62b/jobs/15472104
Reviewed By: malfet
Differential Revision:
D30486472
Pulled By: walterddr
fbshipit-source-id:
d83ab88d163d4a468f03961a13d891b658668a7f
Rong Rong (AI Infra) [Mon, 23 Aug 2021 16:28:21 +0000 (09:28 -0700)]
hotfix clone issue (#63770)
Summary:
This was discovered during https://github.com/pytorch/pytorch/issues/63408. For some reason only this checkout action does not correctly set fetch-depth.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63770
Reviewed By: malfet, janeyx99
Differential Revision:
D30486110
Pulled By: walterddr
fbshipit-source-id:
a67395cca2487407ed0d49c8c89587935ca5f212
Gary Miguel [Mon, 23 Aug 2021 14:41:33 +0000 (07:41 -0700)]
[ONNX] add test images to repo (#63717)
Summary:
This is better than the status quo:
* Test doesn't download files from the internet -> faster and more
reliable.
* Test doesn't leave the git working directory dirty.
Rather than using the original images, I've copied some images from
the pytorch/vision repo. This will keep the tests in the two repos
in sync, while avoiding adding new assets to the vision repo.
See https://github.com/pytorch/vision/pull/4176.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63717
Reviewed By: janeyx99
Differential Revision:
D30466016
Pulled By: malfet
fbshipit-source-id:
2c56d4c11b5c74db1764576bf1c95ce4ae714574
Alban Desmaison [Mon, 23 Aug 2021 14:05:51 +0000 (07:05 -0700)]
Allow implementing either backward or vjp for Function (#63434)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63434
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30431968
Pulled By: albanD
fbshipit-source-id:
0bb88664283486a9fd3364e6c3d79442a44625c2
Jithun Nair [Mon, 23 Aug 2021 05:29:04 +0000 (22:29 -0700)]
Update ROCm PyTorch persons of interest (#55206)
Summary:
cc jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55206
Reviewed By: VitalyFedyunin
Differential Revision:
D30296584
Pulled By: dzhulgakov
fbshipit-source-id:
6e5c610cc6b7c7fd58b80fa3f9de31f269341a88
Pritam Damania [Mon, 23 Aug 2021 01:55:45 +0000 (18:55 -0700)]
Remove `_fork_processes` from common_distributed.py (#63711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711
This removes `_fork_process` from common_distributed.py and fixes all
other callpoints to use `spawn_process` instead.
ghstack-source-id:
136395719
Test Plan: waitforbuildbot
Reviewed By: xush6528
Differential Revision:
D30463834
fbshipit-source-id:
0c09e8a996d0e5b912c8cdd45488a39951bac4db
Horace He [Sun, 22 Aug 2021 00:13:27 +0000 (17:13 -0700)]
Made FuncTorchBatched decompose CompositeImplicitAutograd (#63616)
Summary:
See https://github.com/facebookresearch/functorch/issues/56
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63616
Reviewed By: zou3519
Differential Revision:
D30438316
Pulled By: Chillee
fbshipit-source-id:
e84446d9f68b87daa0cfff75b3b8a972f36ec85a
jiej [Sat, 21 Aug 2021 16:05:04 +0000 (09:05 -0700)]
BatchNorm autodiff re-enabled (#57321)
Summary:
Turns on BN in autodiff:
1. outputs an empty tensor for running stats to bypass the autodiff issue on None;
2. fixes BN inference backward in cudnn & miopen, where backward now falls back to the native batchnorm kernel instead;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321
Reviewed By: albanD, ngimel
Differential Revision:
D30250419
Pulled By: jansel
fbshipit-source-id:
a62553789c20fb50a820003a056f40d9d642dfaa
Bert Maher [Sat, 21 Aug 2021 10:45:21 +0000 (03:45 -0700)]
Revert D30360382: [nnc] Support thread level parallelism in fused kernels
Test Plan: revert-hammer
Differential Revision:
D30360382 (https://github.com/pytorch/pytorch/commit/d6d86efb1c839ddafd1398d6dab9caa4f31a9f0b)
Original commit changeset:
29acf4e932c6
fbshipit-source-id:
e0531113135d30eabb172dc1537d5dd6d65dc438
Bert Maher [Sat, 21 Aug 2021 10:36:09 +0000 (03:36 -0700)]
Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Test Plan: revert-hammer
Differential Revision:
D30417127 (https://github.com/pytorch/pytorch/commit/6600bc96517269c608ea47b76b6bda9476c7bcef)
Original commit changeset:
b77d7c68364f
fbshipit-source-id:
6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1
Wanchao Liang [Sat, 21 Aug 2021 05:15:55 +0000 (22:15 -0700)]
[sharded_tensor] add readonly tensor properties (#63679)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63679
This PR adds read-only tensor properties to ShardedTensor, to match the torch.Tensor behaviors.
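A minimal sketch of the idea (hypothetical class, not the actual ShardedTensor code): exposing metadata through `@property` with no setter makes the attributes read-only, matching `torch.Tensor` behavior.

```python
class ShardedTensorSketch:
    """Toy stand-in showing read-only metadata properties."""

    def __init__(self, size, dtype):
        self._size = size
        self._dtype = dtype

    @property
    def size(self):          # read-only: no setter is defined
        return self._size

    @property
    def dtype(self):
        return self._dtype

t = ShardedTensorSketch((4, 8), "float32")
assert t.size == (4, 8)
try:
    t.dtype = "int64"        # assigning to a read-only property fails
except AttributeError:
    pass
```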
Test Plan: test_sharded_tensor_metadata
Reviewed By: pritamdamania87
Differential Revision:
D30459343
fbshipit-source-id:
9aec8ecfe76479eed25f3b843495e5719ed2956d
Hao Lu [Sat, 21 Aug 2021 04:41:19 +0000 (21:41 -0700)]
[Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635
Reviewed By: ajyu
Differential Revision:
D30446234
fbshipit-source-id:
1ef014186ff725930a97d0159626f9233ee74030
Akshit Khurana [Sat, 21 Aug 2021 04:08:59 +0000 (21:08 -0700)]
NNAPI: Support const values in binary ops
Summary:
The NNAPI converter previously failed with one const value and one tensor.
Code suggestions from dreiss.
Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary
Imported from OSS
Reviewed By: anshuljain1
Differential Revision:
D28893881
fbshipit-source-id:
59240373fb03c6fdafa4cb2fa4d8408dd20092f6
Peter Bell [Sat, 21 Aug 2021 01:27:33 +0000 (18:27 -0700)]
Migrate thnn_conv2d from THC to ATen (#63428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63428
Closes gh-24644, closes gh-24645
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30441307
Pulled By: ngimel
fbshipit-source-id:
9c3dec469c0525831ae398df261cf41b7df7e373
Bo Wang [Sat, 21 Aug 2021 00:09:35 +0000 (17:09 -0700)]
Extend _sharded_tensor constructor to support other ops like torch.ones (#63378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63378
a) Introduce InitCommonParams to wrap tensor creation params
b) Factor local tensor initialization into common_params so that the tensor value is not hard-coded in the ShardedTensor constructor
c) Add _sharded_tensor.ones(...) to exemplify - note the memory_format arg is not provided, to be consistent with torch.ones
d) Follow-up: more ops like torch.full, torch.zeros, torch.rand, ...
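The factoring in (a)/(b) can be sketched as follows (hypothetical names mirroring the description, not the actual implementation): the fill value lives in a params object, so the constructor never hard-codes it and each creation op is just one instantiation of the common path.

```python
from dataclasses import dataclass

@dataclass
class InitCommonParams:
    fill_value: float  # what to put in each element of the local shard

def create_sharded(params, size):
    # Stand-in for the ShardedTensor constructor: builds the local
    # "shard" from common params instead of a hard-coded value.
    return [params.fill_value] * size

def ones(size):
    return create_sharded(InitCommonParams(fill_value=1.0), size)

def zeros(size):  # the follow-up ops fall out of the same factoring
    return create_sharded(InitCommonParams(fill_value=0.0), size)

assert ones(3) == [1.0, 1.0, 1.0]
```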
Test:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked.test_create_sharded_tensor_with_ones --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorEnumerable.test_create_sharded_tensor_with_ones --v
Test Plan: Imported from OSS
Reviewed By: pritamdamania87, wanchaol
Differential Revision:
D30359245
Pulled By: bowangbj
fbshipit-source-id:
85768fcb36e9d9d40213036884b1266930a91701
driazati [Fri, 20 Aug 2021 23:38:42 +0000 (16:38 -0700)]
[clang-tidy] Enable more folders (#63380)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380
Crosses off some more of #62011, see the test in the stacked PR #63381
Test Plan: Imported from OSS
Reviewed By: malfet, seemethere
Differential Revision:
D30455843
Pulled By: driazati
fbshipit-source-id:
d473545d05ffa0b2476968f0b1c55f3a16a2c755
Yi Zhang [Fri, 20 Aug 2021 23:28:39 +0000 (16:28 -0700)]
enable increment build for build_libtorch (#63074)
Summary:
Since issue https://github.com/pytorch/pytorch/issues/59859 is resolved, rerun_cmake in build_libtorch should no longer be hardcoded.
build_libtorch is necessary to generate the debug version of libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63074
Reviewed By: VitalyFedyunin, seemethere
Differential Revision:
D30306705
Pulled By: malfet
fbshipit-source-id:
f2077d334191f4973da0681560937bc8bab730c1
北海若 [Fri, 20 Aug 2021 22:45:12 +0000 (15:45 -0700)]
[Doc] Deprecation notice for only_inputs argument (#63631)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63544.
Changed docstring accordingly. I'm new here, not sure if the style is okay. Please check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631
Reviewed By: ejguan
Differential Revision:
D30459439
Pulled By: soulitzer
fbshipit-source-id:
8df3c509d1dd39764815b099ab47229550126cbe
driazati [Fri, 20 Aug 2021 22:45:10 +0000 (15:45 -0700)]
Remove breakpad from docker image (#63598)
Summary:
As of https://github.com/pytorch/pytorch/issues/63186 we're doing this properly via a third_party cmake build, so we don't need it here anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63598
Reviewed By: walterddr, malfet
Differential Revision:
D30432250
Pulled By: driazati
fbshipit-source-id:
d0d5db14355cf574e42c0d0ed786bb26230180bd
jiayisun [Fri, 20 Aug 2021 21:54:51 +0000 (14:54 -0700)]
add BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)
Summary:
Added BFloat16 support for range, sinh, cosh, frexp, and nan_to_num on CPU. Collected benchmark data for these ops for the BFloat16 and Float32 data types using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
Number of cores: 1 core, 28 cores(1 socket)
[cosh_sinh_benchmark.txt](https://github.com/pytorch/pytorch/files/6974313/cosh_sinh_benchmark.txt)
[frexp_benchmark.txt](https://github.com/pytorch/pytorch/files/6974315/frexp_benchmark.txt)
[nan_to_num_benchmark.txt](https://github.com/pytorch/pytorch/files/6974317/nan_to_num_benchmark.txt)
[range_benchmark.txt](https://github.com/pytorch/pytorch/files/6974318/range_benchmark.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61826
Reviewed By: saketh-are
Differential Revision:
D30257259
Pulled By: VitalyFedyunin
fbshipit-source-id:
394cd713e6394050a8c90b2160633beb675d71dd
Jeff Daily [Fri, 20 Aug 2021 21:00:20 +0000 (14:00 -0700)]
empty caching allocator before test_avg_pool2d large subtest (#63528)
Summary:
Otherwise, unrecoverable OOM occurs on MI25. Fixes broken ROCm CI test1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63528
Reviewed By: malfet, zhouzhuojie
Differential Revision:
D30459151
Pulled By: walterddr
fbshipit-source-id:
63e205c4f486fcbdd514cfb0ed8e38584f894585
Nikita Shulga [Fri, 20 Aug 2021 20:13:54 +0000 (13:13 -0700)]
Include iostream in ProcessGroupMPI.cpp (#63656)
Summary:
ProcessGroupMPI.cpp uses `std::cerr`, which resulted in a compilation regression after https://github.com/pytorch/pytorch/pull/61500 removed iostream includes from headers
Fixes https://github.com/pytorch/pytorch/issues/63653
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63656
Reviewed By: ejguan
Differential Revision:
D30455824
Pulled By: malfet
fbshipit-source-id:
29f316e7f7fd8e7dcbee2666e7a985f25bf56515
Scott Wolchok [Fri, 20 Aug 2021 19:56:01 +0000 (12:56 -0700)]
[easy]Unbreak caffe2benchmarking build (#63655)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655
ghstack-source-id:
136324310
Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator
Reviewed By: hl475, JacobSzwejbka
Differential Revision:
D30455659
fbshipit-source-id:
b6da6be4f89b6e84753ef0849ffedea04785034a
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Suppport torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62765
Fixes #27723
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision:
D30375181
Pulled By: msaroufim
fbshipit-source-id:
715f4745899757ec405877980cd20c826028eb2c
Co-authored-by: BowenBao <bowbao@microsoft.com>
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62764
Fixes #58733
- Support dynamic interleave for cases with dynamic repeat values
- Moved the repeat_interleave symbolic from opset 11 to opset 13, as sequence output types for loop outputs are needed for this change
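For reference, the semantics being exported can be written as a pure-Python sketch of `torch.repeat_interleave` with per-element repeats; the dynamic-repeats case is exactly when `repeats` is a runtime tensor rather than a compile-time constant.

```python
def repeat_interleave(values, repeats):
    # Each values[i] is repeated repeats[i] times, preserving order.
    out = []
    for v, r in zip(values, repeats):
        out.extend([v] * r)
    return out

assert repeat_interleave([10, 20], [2, 3]) == [10, 10, 20, 20, 20]
```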
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision:
D30375179
Pulled By: msaroufim
fbshipit-source-id:
787f96bf91d124fd0483761088c5f4ae930d96a9
Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763
This PR fixes the issue that the graph inputs might be updated when we export the model in inference mode.
When a model is exported in inference mode, some optimizations are made. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:
1. Conv and BatchNorm op fusion.
2. Constant folding.
If the user sets export_params=False, or sets keep_initializers_as_inputs=True, it's highly likely that the user wants to provide the corresponding parameters or initializers as inputs of the graph.
In such a situation, no matter whether the model is exported in inference mode or training mode, the exporter needs to prevent the above optimizations from adjusting the graph inputs. This way, the graph inputs match the inputs that users provided.
The changes in this PR add a common judgement of whether the above optimizations should be done at all: from the values of the export_params and keep_initializers_as_inputs arguments, infer whether the graph inputs are allowed to be adjusted.
If not, these optimizations are skipped, even if other requirements are met.
Besides these code changes, the comments for some parameters below have been updated so that users have clearer guidance on how to leverage them for different purposes:
1. export_params
2. training
3. do_constant_folding
4. keep_initializers_as_inputs
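The "common judgement" described above can be sketched as a single predicate (hypothetical helper name; the real check lives inside the exporter internals):

```python
def graph_inputs_may_be_adjusted(export_params, keep_initializers_as_inputs):
    # Fusion and constant folding may fold parameters into the graph,
    # changing its inputs. That is only acceptable when the user did NOT
    # ask for params/initializers to stay visible as graph inputs.
    if not export_params:
        return False
    if keep_initializers_as_inputs:
        return False
    return True

# User wants initializers as inputs -> skip input-adjusting optimizations.
assert graph_inputs_may_be_adjusted(True, True) is False
assert graph_inputs_may_be_adjusted(True, False) is True
```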
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision:
D30375183
Pulled By: msaroufim
fbshipit-source-id:
4db8b9695649eb32a3a0fefa950ee2e5651bdba0
Co-authored-by: fatcat-z <jiz@microsoft.com>
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62762
`ONNXShapeTypeInference` for node `n` is skipped if `n` is in a non-ONNX namespace, or if `n` contains any nodes from non-ONNX namespaces. This prevents controlflow nodes containing contrib ops from running `SpecialPostProcess`, which sets up correct node output shape/type information in rare cases.
This PR depends on opset 14 export https://github.com/pytorch/pytorch/pull/59486
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision:
D30375180
Pulled By: msaroufim
fbshipit-source-id:
5deacec39f091deb4d75ddd9e660e12fca7f16c5
Co-authored-by: BowenBao <bowbao@microsoft.com>
Alban Desmaison [Fri, 20 Aug 2021 19:26:58 +0000 (12:26 -0700)]
Revert D30417370: [nnc] Enable CPU fusion
Test Plan: revert-hammer
Differential Revision:
D30417370 (https://github.com/pytorch/pytorch/commit/b9fc656cf26d60127bd695e4e5a7d27622f2563d)
Original commit changeset:
84ce7a578a36
fbshipit-source-id:
cd23774cdc3273fd72f8a05f1900eaf36f373e6b
Pritam Damania [Fri, 20 Aug 2021 19:09:49 +0000 (12:09 -0700)]
[8/N] Remove c10d/ddp fork tests. (#63454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454
Continuation of https://github.com/pytorch/pytorch/pull/63443, this
PR removes all fork tests from torch.distributed.
ghstack-source-id:
136285511
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision:
D30387872
fbshipit-source-id:
f6d6313db126ae7b95b86f78a1e0726887c5c513
Alban Desmaison [Fri, 20 Aug 2021 19:05:32 +0000 (12:05 -0700)]
Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Test Plan: revert-hammer
Differential Revision:
D30426527 (https://github.com/pytorch/pytorch/commit/5a7133b87fe2fd7d025d36855ed4cc06539a9299)
Original commit changeset:
e5905d3364c4
fbshipit-source-id:
794d8a4e9256ccff8cf894aee10eff6adc30d502
Philip Meier [Fri, 20 Aug 2021 18:43:07 +0000 (11:43 -0700)]
Add `BinaryUfuncOpInfo` and broadcasting tests (#61964)
Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964
Reviewed By: ngimel
Differential Revision:
D30407734
Pulled By: mruberry
fbshipit-source-id:
ada28994f43b0635f279f45a02ecba18bc8ee033
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
[nnc] Enable CPU fusion (#63545)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30417370
Pulled By: bertmaher
fbshipit-source-id:
84ce7a578a3678d5562bab99d1dc00330c4f72d1
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
Remove flag to toggle CPU fusion in the presence of parallelism (#63514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30417127
Pulled By: bertmaher
fbshipit-source-id:
b77d7c68364f2af73570740540f3b1152313016e
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
[nnc] Support thread level parallelism in fused kernels (#63386)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30360382
Pulled By: bertmaher
fbshipit-source-id:
29acf4e932c669ce0f35823faea9099bcd8119b6
Aaron Bockover [Fri, 20 Aug 2021 18:11:47 +0000 (11:11 -0700)]
Add support for the ONNX Runtime Eager Mode backend (#58248)
Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.
We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).
The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248
Reviewed By: astaff
Differential Revision:
D30344992
Pulled By: albanD
fbshipit-source-id:
69082b32121246340d686e16653626114b7714b2
Victor Quach [Fri, 20 Aug 2021 18:07:22 +0000 (11:07 -0700)]
Add docs describing saved tensor hooks (#62362)
Summary:
Add section to the Autograd mechanics docs to describe the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing / unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975) and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834)
Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)
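The pack/unpack mechanism being documented can be illustrated in plain Python (a conceptual stand-in, not the torch API): the pack hook runs when autograd saves a tensor and may replace it with a cheap handle (e.g. after moving it to CPU or compressing it), and the unpack hook reverses that when backward needs the value again.

```python
storage = {}

def pack_hook(tensor):
    # Called when autograd would save `tensor` for backward.
    # Could move it to CPU or compress it; here we just stash it.
    key = len(storage)
    storage[key] = tensor
    return key  # autograd keeps only this lightweight handle

def unpack_hook(key):
    # Called during backward to recover the saved tensor.
    return storage[key]

handle = pack_hook([1.0, 2.0])
assert unpack_hook(handle) == [1.0, 2.0]
```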
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362
Reviewed By: soulitzer
Differential Revision:
D30453177
Pulled By: Varal7
fbshipit-source-id:
f5759977b069ff0ef36a47b08856d297691a6caa
Shiyan Deng [Fri, 20 Aug 2021 17:49:21 +0000 (10:49 -0700)]
[fx2trt] Add layernorm plugin for dynamic shape (#63620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63620
Added layernorm dynamic plugin, so that it works when explicit batch dim is required. Needed for ig model.
Changed how we create a plugin layer: instead of instantiating the plugin directly, we now use the plugin creator with `PluginFieldCollection`.
Follow ups:
Another way to convert layernorm is by breaking it down to supported trt layers. T97398182
Test Plan: layernorm unittest
Reviewed By: yinghai
Differential Revision:
D30138205
fbshipit-source-id:
aebe021d8de818e20376634f30e84579b9807f9b
Pavithran Ramachandran [Fri, 20 Aug 2021 16:34:53 +0000 (09:34 -0700)]
[PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62368
# Context
Bundled inputs accept an expression, in the form of the string InflatableArg.fmt, that can be applied to the inputs to inflate them. InflatableArg.fmt provides the flexibility to apply a custom transformation to inflate. When the input arguments to a function are not of Tensor type, TorchScript casts the inputs from type T to Optional[T] and expects the function to handle the Nullable (None) clause as well. This becomes tricky to handle in one-line code or lambda functions.
We propose an alternative that allows InflatableArg to include the text of a TorchScript function that would be defined on the module as a helper, and then use that in its inflation expression. This is provided via InflatableArg.fmt_fn. Please refer to pytorch/test/test_bundled_inputs.py for an example of how to use it.
Also refer to JacobSzwejbka's comment on the same [here](https://github.com/pytorch/pytorch/pull/62368#issuecomment-892012812)
# Mitigation
Allow InflatableArg to include the text of a TorchScript function that would be defined on the module as a helper, then use that in its inflation expression.
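A rough sketch of the two inflation styles (hypothetical, heavily simplified class; not the real bundled-inputs machinery): `fmt` is a one-line expression template, while `fmt_fn` carries the full source of a helper function, which is what makes handling the Optional/None clause possible.

```python
class InflatableArg:
    def __init__(self, value, fmt="{}", fmt_fn=None):
        self.value = value
        self.fmt = fmt        # one-line expression template
        self.fmt_fn = fmt_fn  # or: full source text of a helper function

def inflate(arg):
    if arg.fmt_fn is not None:
        # Multi-line helper: can branch on None, unlike a one-liner.
        namespace = {}
        exec(arg.fmt_fn, namespace)
        return namespace["helper"](arg.value)
    return eval(arg.fmt.format(repr(arg.value)))

helper_src = """
def helper(x):
    if x is None:          # the Optional[T] clause a one-liner can't handle
        return {}
    return dict(x)
"""
assert inflate(InflatableArg(None, fmt_fn=helper_src)) == {}
```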
ghstack-source-id:
135158680
Test Plan:
To run `test_dict_args`
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource/fbcode] buck test //caffe2/test:test_bundled_inputs -- test_dict_args
Action graph will be rebuilt because files have been added or removed.
Building: finished in 5.4 sec (100%) 12180/12180 jobs, 0/12180 updated
Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/fafcf277-1095-4cba-978d-6022f0d391ad
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 5ef9de71-c1b1-406b-a6c0-3321c2368b8d
Trace available for this run at /tmp/tpx-20210727-163946.454212/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
✓ ListingSuccess: caffe2/test:test_bundled_inputs - main (11.365)
✓ Pass: caffe2/test:test_bundled_inputs - test_dict_args (test_bundled_inputs.TestBundledInputs) (12.307)
Summary
Pass: 1
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
```
To check the py code of TS module:
P433043973
Reviewed By: dreiss
Differential Revision:
D29950421
fbshipit-source-id:
c819ec5c94429b7fbf6c4beb0259457f169b08ec
Vitaly Fedyunin [Fri, 20 Aug 2021 16:00:23 +0000 (09:00 -0700)]
Adding DataLoader2 class as future replacement of DataLoader (#63523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523
Supports sharding and batching on loader level
* #63522 Adding IterableAsDataPipe IterDataPipe - useful for tests and simple cases
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30426527
Pulled By: VitalyFedyunin
fbshipit-source-id:
e5905d3364c4880e720dd62fb066f08881c71a6e
albanD [Fri, 20 Aug 2021 15:42:31 +0000 (08:42 -0700)]
Small custom function refactor which doesn't change anything (#63433)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63433
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision:
D30431970
Pulled By: albanD
fbshipit-source-id:
905fa4d2ddeca18005b1bcb13dd6f8a080327e7c
Vitaly Fedyunin [Fri, 20 Aug 2021 15:36:14 +0000 (08:36 -0700)]
Adding IterableAsDataPipe IterDataPipe (#63522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63522
Supports sharding and batching on loader level
* **#63522 Adding IterableAsDataPipe IterDataPipe - useful for tests and simple cases**
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision:
D30426528
Pulled By: VitalyFedyunin
fbshipit-source-id:
535b5cc1505bb58731fcca8170541ac5ee7bd417
Mike Iovine [Fri, 20 Aug 2021 13:14:13 +0000 (06:14 -0700)]
[Static Runtime] Enable RemoveListMutation (#63536)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63536
Enable a pass that transforms sequences like this:
```
li = []
li.append(1)
li.append(2)
```
into this:
```
li = [1, 2]
```
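Conceptually, the pass does something like this pure-Python sketch over a toy IR (hypothetical node format, not the actual TorchScript IR):

```python
def remove_list_mutation(nodes):
    # nodes: ("make_list", name, elems) or ("append", name, value) tuples.
    # Fold appends that immediately follow a list construction into the
    # construction itself, yielding a single list literal.
    out = []
    for node in nodes:
        if (node[0] == "append" and out
                and out[-1][0] == "make_list" and out[-1][1] == node[1]):
            out[-1] = ("make_list", node[1], out[-1][2] + [node[2]])
        else:
            out.append(node)
    return out

program = [("make_list", "li", []), ("append", "li", 1), ("append", "li", 2)]
assert remove_list_mutation(program) == [("make_list", "li", [1, 2])]
```

The real pass additionally has to prove the list is not aliased or mutated elsewhere before folding, which the sketch omits.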
Initially I implemented this pass myself (D30387213), but I discovered that there is an existing pass that does the same thing.
Reviewed By: hlu1
Differential Revision:
D30412970
fbshipit-source-id:
0810ef03480878d5039bd800a40f5fd31c2652ec
Don Jang [Fri, 20 Aug 2021 07:43:40 +0000 (00:43 -0700)]
[Static Runtime] Add native op for aten::detach (#63625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63625
This change adds a static runtime's native op implementation for `aten::detach` op.
See the standard `aten::detach`'s implementation (https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp.html#_ZN2at6native6detachERKNS_6TensorE ) for comparison.
Test Plan:
- Added `StaticRuntime.IndividualOps_Detach`.
- Observed
```
V0819 18:55:33.181188 3092034 impl.cpp:1398] Switch to native impl for node: %a.1 : Tensor = aten::detach(%input.1)
```
Reviewed By: hlu1
Differential Revision:
D30443187
fbshipit-source-id:
d6e0eadb1b817e0a126c4fc97526abc276ee8a17
Nikita Shulga [Fri, 20 Aug 2021 06:42:24 +0000 (23:42 -0700)]
Update protobuf to 3.13.1 (#62571)
Summary:
Update bazel to 4.10.0
Update ASAN_SYMBOLIZER_PATH to llvm-7
Suppress `vptr` ubsan violations in `test_jit`
Fix ProtoBuf patching for ONNX which caused Windows builds to crash while attempting to free `std::string` allocated on stack
Fixes https://github.com/pytorch/pytorch/issues/62569
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571
Reviewed By: walterddr
Differential Revision:
D30048685
Pulled By: malfet
fbshipit-source-id:
6462c1bef9c42318551d2cf906bbab41e1d4e1cd
Raghavan Raman [Fri, 20 Aug 2021 05:50:32 +0000 (22:50 -0700)]
[nnc] Updated sliceTail to do inplace mutation (#63532)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision:
D30412184
Pulled By: navahgar
fbshipit-source-id:
e7669d3b9d24e14501f3feb6505c88d1d42030c6
Raghavan Raman [Fri, 20 Aug 2021 05:50:32 +0000 (22:50 -0700)]
[nnc] Updated sliceHead to do inplace mutation (#63531)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision:
D30412183
Pulled By: navahgar
fbshipit-source-id:
47ee9482a36e606788d28d22eee4edaca45ffa50
Scott Wolchok [Fri, 20 Aug 2021 01:52:33 +0000 (18:52 -0700)]
[PyTorch] Remove unnecessary iostream includes in headers (#61500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500
libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global constructor size overhead to each translation that includes iostream. To reduce the size overhead from that, we can often include ostream instead.
ghstack-source-id:
136163529
Test Plan: buildsizebot some mobile apps
Reviewed By: dhruvbird
Differential Revision:
D29648016
fbshipit-source-id:
9c3139712c71248513cc5032d21e77f3ecbae8fe
Scott Wolchok [Fri, 20 Aug 2021 01:52:33 +0000 (18:52 -0700)]
[PyTorch] Remove unused dump() methods in vec headers (#63533)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63533
These methods don't seem to be used, and they use std::cout, which incurs a small code size overhead on platforms using libstdc++ due to std::__ioinit (see #61500). Seems like we can just delete them?
ghstack-source-id:
136163409
Test Plan:
CI
Reviewers: #sentinel, dhruvbird
Reviewed By: dskhudia
Differential Revision:
D30412269
fbshipit-source-id:
380b9aa2f9aabc4107188b6b209d2afc1769c0ee
Pavithran Ramachandran [Fri, 20 Aug 2021 01:39:50 +0000 (18:39 -0700)]
[PyTorch][Edge] Support backtrace symbolication for Android builds (#63339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63339
# Context
https://fb.workplace.com/groups/pytorch.dev/permalink/900474523864362/?comment_id=901125403799274&reply_comment_id=905023386742809
##### WHAT IS A STACK TRACE?
A stack trace (also called stack backtrace or stack traceback) is a report of the active stack frames at a certain point in time during the execution of a program.
Typically when an exception is thrown, one would expect to see the code (file:line) that threw the exception, and every intermediate frame up to and including the main function.
We are enabling Android stack traces to help debugging on Android devices.
Test Plan:
## Steps to test
```
buck build fbsource//xplat/caffe2/mode/aibench_pytorch_android -c pt.enable_qpl=0 -c pt.has_backtraces=1 fbsource//xplat/caffe2/fb/lite_predictor:lite_predictorAndroid#android-x86_64
one_world android emulator android-28
adb push ~/fbsource/buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictorAndroid#android-x86_64 /data/local/tmp
cd /data/local/tmp
./lite_predictorAndroid#android-x86_64
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
```
## See how model file is not found stack traces is:
### before
```
./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
(no backtrace available)
Aborted
```
### after
```
134|generic_x86_64:/data/local/tmp $ ./lite_predictorAndroid#android-x86_64 --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 2 threads
Run with 2 threads
Loading model...
terminating with uncaught exception of type c10::Error: open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0 c10::get_backtrace(unsigned long, unsigned long, bool)[0x59494274f10e]
frame #1 [0x5949427b1eee]
frame #2 [0x5949427b1eb2]
frame #3 [0x5949427b1cdc]
frame #4 std::__ndk1::function<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > ()>::operator()() const[0x5949427afc34]
frame #5 c10::Error::Error(c10::SourceLocation, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >)[0x5949427b05b1]
frame #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949427aca5f]
frame #7 caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b37b2]
frame #8 caffe2::serialize::FileAdapter::FileAdapter(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x5949426b3903]
frame #9 torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>, std::__ndk1::unordered_map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::hash<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::equal_to<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > > >&)[0x5949422737bd]
frame #10 torch::jit::_load_for_mobile(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, c10::optional<c10::Device>)[0x594942273769]
frame #11 benchmark(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, int, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)[0x59494189b21d]
frame #12 main[0x594941882aff]
frame #13 __libc_init[0x7b699d08578d]
```
### what we get for os:linux
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor --model ./detect.bc --input_dims "1,3,192,192" --input_type float --warmup 20 --iter 5 --report_pep true
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
what(): open file failed, file path: ./detect.bc
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:13 (most recent call first):
frame #0: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb7fe]
frame #1: ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor() [0x20cb6c6]
frame #2: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x54 (0x20ca4e4 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #3: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x57 (0x20ca9a7 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #4: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x7a (0x20c823a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #5: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x96 (0x206f3d6 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #6: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x42 (0x206f502 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #7: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x30 (0x1be826c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #8: torch::jit::_load_for_mobile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x35 (0x1be8214 in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #9: benchmark(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, int, int, int, bool, int, bool, int, double, bool, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x16d (0x12093ad in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #10: main + 0x25c (0x11f933c in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
frame #11: __libc_start_main + 0x105 (0x7fc7b9f2ed95 in /usr/local/fbcode/platform009/lib/libc.so.6)
frame #12: _start + 0x2a (0x11f902a in ./buck-out/gen/xplat/caffe2/fb/lite_predictor/lite_predictor)
Aborted (core dumped)
```
Reviewed By: dhruvbird
Differential Revision:
D30135947
fbshipit-source-id:
f50c634ef4545843305cad4b4a14a8776b1aec76
Nikita Shulga [Thu, 19 Aug 2021 23:46:31 +0000 (16:46 -0700)]
Revert D30359218: [pytorch][PR] [doc] pre-commit fix instructions
Test Plan: revert-hammer
Differential Revision:
D30359218 (https://github.com/pytorch/pytorch/commit/4e1d84ae8fae49995c8966ccbe0f34360978492f)
Original commit changeset:
61771babeac4
fbshipit-source-id:
c2ac0a4a7463fafa03ad0b20bfb0701a8c1476c4
zhouzhuojie [Thu, 19 Aug 2021 22:37:10 +0000 (15:37 -0700)]
Add concurrency group for more workflows (#63606)
Summary:
Fixes unnecessary duplicated workflow runs
![image](https://user-images.githubusercontent.com/658840/130146332-ecf54e49-3538-49c1-88de-b099f1c1e41f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63606
Reviewed By: malfet, mruberry
Differential Revision:
D30436889
Pulled By: zhouzhuojie
fbshipit-source-id:
aafbad1edc45e3ab9bceb00e8f3b4204f18e43d0
Zeina Migeed [Thu, 19 Aug 2021 22:22:52 +0000 (15:22 -0700)]
acc type inference (#63119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63119
Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
--action=lower_and_run \
--filename=inline_cvr_7x_dec_2020.model \
--print_glow_glog=True
Reviewed By: jamesr66a, jfix71, ansley
Differential Revision:
D30235895
fbshipit-source-id:
dab7f96e1799b99eeae0ee519cf0ddd636fddf2e
Sergei Vorobev [Thu, 19 Aug 2021 21:57:00 +0000 (14:57 -0700)]
Replace hardcoded values in IndexKernel.cu (#63372)
Summary:
This is a small change that helps maintain Cruise's PyTorch fork, since we use a different hardcoded value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63372
Reviewed By: mruberry
Differential Revision:
D30396171
Pulled By: ejguan
fbshipit-source-id:
cc0023f58b5922d3d98c7283495e6dc8d35049b6
Adam J. Stewart [Thu, 19 Aug 2021 21:54:26 +0000 (14:54 -0700)]
DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work. It seems like `test/test_dataloader.py` is the correct location but that's a big file.
Fixes https://github.com/pytorch/pytorch/issues/63483
ejguan
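One way the TypeVar approach can look (a minimal stand-alone sketch with hypothetical class names, not the actual `torch.utils.data.Sampler` code): parameterizing the sampler over the element type lets it yield non-integer keys while staying type-checkable.

```python
from typing import Generic, Iterator, Sequence, TypeVar

T_co = TypeVar("T_co", covariant=True)

class Sampler(Generic[T_co]):
    """Minimal stand-in for a generic sampler base class."""
    def __iter__(self) -> Iterator[T_co]:
        raise NotImplementedError

class KeySampler(Sampler[str]):
    """A sampler that yields string keys instead of integer indices."""
    def __init__(self, keys: Sequence[str]) -> None:
        self.keys = keys

    def __iter__(self) -> Iterator[str]:
        return iter(self.keys)

    def __len__(self) -> int:
        return len(self.keys)

sampler = KeySampler(["img_001", "img_002"])
print(list(sampler))  # ['img_001', 'img_002']
```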
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500
Reviewed By: mruberry
Differential Revision:
D30403689
Pulled By: ejguan
fbshipit-source-id:
464e09e5aad3215b94a29cc5e21cb4b10ec136e3
Kimish Patel [Thu, 19 Aug 2021 20:32:26 +0000 (13:32 -0700)]
[Pytorch] Fix callstack pointer serialization bug (#63576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63576
We serialize the function name associated with InlinedCallStackPtr. This is derived by querying the Function* stored in InlinedCallStack. However, this is a raw pointer that is not guaranteed to be valid when serialization happens. On the other hand, we also store the function name separately when constructing InlinedCallStack anyway, so this change uniformly relies on function_name instead of Function*.
Test Plan: Internal build's asan failure + CI
Reviewed By: larryliu0820
Differential Revision:
D30427029
fbshipit-source-id:
de9617482404785920ed2e67b72f38461590fba3
Charles David Hernandez [Thu, 19 Aug 2021 20:04:48 +0000 (13:04 -0700)]
Updating the names of these functions (#63513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513
updating these names per Jerry's nits in the previous PR
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision:
D30406710
fbshipit-source-id:
a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
Natalia Gimelshein [Thu, 19 Aug 2021 20:00:08 +0000 (13:00 -0700)]
Revert embedding thrust->cub migration (#63451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63451
Reviewed By: mruberry
Differential Revision:
D30398482
Pulled By: ngimel
fbshipit-source-id:
e153786d204215555a6571688eabae712facad7e
Philip Meier [Thu, 19 Aug 2021 19:45:32 +0000 (12:45 -0700)]
Updates internal `assert_allclose` callsites in favor of `assert_close` (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841
Redo of #60863.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30408145
Pulled By: mruberry
fbshipit-source-id:
0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
Mike Ruberry [Thu, 19 Aug 2021 19:41:42 +0000 (12:41 -0700)]
Modernizes add and mul documentation (#63309)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39329.
The documentation for torch.add and torch.mul was sorely out of date and even included deprecated references. This PR modernizes their descriptions consistent with torch.sub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63309
Reviewed By: ngimel
Differential Revision:
D30338004
Pulled By: mruberry
fbshipit-source-id:
ee1c2a8106af8341253cafb0003b06e8f652624d
kshitij12345 [Thu, 19 Aug 2021 19:40:37 +0000 (12:40 -0700)]
[special] use __all__ to hide internal imports (#63135)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63135
Reviewed By: ngimel
Differential Revision:
D30364287
Pulled By: mruberry
fbshipit-source-id:
20078668943fafa45ce09610634b1d2c424b1922
Yusuo Hu [Thu, 19 Aug 2021 19:37:58 +0000 (12:37 -0700)]
[BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416
Fix a missing thread local specifier introduced by recent PR
https://github.com/pytorch/pytorch/pull/61002
Test Plan: Unit Tests
Reviewed By: ngimel
Differential Revision:
D30376154
fbshipit-source-id:
c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9
Pritam Damania [Thu, 19 Aug 2021 18:21:26 +0000 (11:21 -0700)]
[7/N] Remove fork tests for RPC. (#63443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443
After https://github.com/pytorch/pytorch/pull/63442, all distributed
tests can run with opt-asan. As a result, we can now remove all of our fork
based tests.
This is the first PR in a stack, which first removes fork based tests from RPC.
ghstack-source-id:
136177744
Test Plan: waitforbuildbot
Reviewed By: lw
Differential Revision:
D30384905
fbshipit-source-id:
86d438aebaa6cb02ae2a966fea244849849a1889
driazati [Thu, 19 Aug 2021 17:38:41 +0000 (10:38 -0700)]
Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision:
D30318404
Pulled By: driazati
fbshipit-source-id:
0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Scott Wolchok [Thu, 19 Aug 2021 17:37:31 +0000 (10:37 -0700)]
[easy] Fix missing move in TupleType::createNamed (#61572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61572
ghstack-source-id:
136161829
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision:
D29672872
fbshipit-source-id:
d8ba2d54f7914dbeb3fc52aa21dd77025951c4b5
Shiyan Deng [Thu, 19 Aug 2021 17:16:26 +0000 (10:16 -0700)]
[hpc] use fx2trt for exploration track (#63535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63535
Reviewed By: yinghai, jianyuh
Differential Revision:
D30272810
fbshipit-source-id:
61f3edf2a2282cd8c268a92acf92feb05a6ae3e1
Shiyan Deng [Thu, 19 Aug 2021 17:16:26 +0000 (10:16 -0700)]
Add permute021 fx2trt converter (#63238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63238
Reviewed By: yinghai
Differential Revision:
D30295373
fbshipit-source-id:
2a189fe485edaa978fd03e4b8d8582edb34ec648
Scott Wolchok [Thu, 19 Aug 2021 16:49:12 +0000 (09:49 -0700)]
[PyTorch] Test IValue move/copy/assign/swap more (#54717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54717
Hit more tags in these tests
ghstack-source-id:
136140508
Test Plan: buck test //caffe2/aten:ivalue_test
Reviewed By: anjali411
Differential Revision:
D27339736
fbshipit-source-id:
610c8e92846bb70ba725ab117440326ab50af5ce
David Esiobu [Thu, 19 Aug 2021 16:15:34 +0000 (09:15 -0700)]
Use linecache.lazycache to cache generated code. (#63453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63453
Instead of patching linecache.getlines, use linecache.lazycache and
parts of the loader protocol described in PEP-302
Test Plan:
python3 test/test_fx.py
Imported from OSS
Reviewed By: suo
Differential Revision:
D30388176
fbshipit-source-id:
92933711ecf3a21a07e1d6b0d1185ab0efd8341c
anjali411 [Thu, 19 Aug 2021 15:41:08 +0000 (08:41 -0700)]
Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915
As much as 45% and 20% perf improvement on CUDA and CPU respectively.
consistent improvement in perf for all cases -- see perf numbers in comments below
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision:
D30404006
Pulled By: anjali411
fbshipit-source-id:
565940da28c7761d993cf43346932c24292e8a4d
Till Hoffmann [Thu, 19 Aug 2021 15:28:55 +0000 (08:28 -0700)]
Poisson zero rate (#61511)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/53485 by allowing zero rates for the Poisson distribution. This implementation is consistent with `scipy.stats.poisson` which admits zero rates. In addition to addressing the aforementioned issue, this PR makes two supporting changes:
1. add a `nonnegative` constraint to enforce non-negative rates for the Poisson distribution.
2. adjust the evaluation of the gradient of `xlogy` such that it is well defined for `x == 0 and y == 0`.
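A scalar sketch of why point 2 matters (hypothetical helper functions, not the actual `torch.xlogy` implementation): defining `xlogy(0, 0) == 0` keeps the Poisson log-probability finite when the rate is zero.

```python
import math

def xlogy(x, y):
    # x * log(y), defined as 0 when x == 0 (the scipy/torch convention),
    # instead of the 0 * (-inf) = nan that naive evaluation would give.
    if x == 0.0:
        return 0.0
    return x * math.log(y)

def poisson_log_prob(k, rate):
    # log p(k) = k*log(rate) - rate - log(k!)
    return xlogy(k, rate) - rate - math.lgamma(k + 1.0)

# A zero-rate Poisson puts all mass on k=0, so log p(0) should be 0, not nan.
print(poisson_log_prob(0, 0.0))  # 0.0
```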
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61511
Reviewed By: ejguan
Differential Revision:
D30352917
Pulled By: albanD
fbshipit-source-id:
f3d33da58360e80d75eb83519f199b93232a2a2d
Jeff Daily [Thu, 19 Aug 2021 14:49:43 +0000 (07:49 -0700)]
add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes current ROCm CI test2 brokenness until tensorpipe is fully supported by ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508
Reviewed By: ejguan
Differential Revision:
D30406450
Pulled By: walterddr
fbshipit-source-id:
c07509271d5d33901f3eaf7ffb916dc3626e1f9a
Ilqar Ramazanli [Thu, 19 Aug 2021 14:15:16 +0000 (07:15 -0700)]
To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092, we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed that
* some learning rate schedulers return their initial learning rates at epoch 0 as
```
return self.base_lrs
```
* This can be a problem when two schedulers are chained as
```
scheduler1.step()
scheduler2.step()
```
in particular, we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 were ineffective at epoch 0, as is the case for many schedulers; however, for schedulers such as warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.
The following code snippet illustrates the problem better
## Reproducing the bug
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(10):
print(epoch, scheduler2.get_last_lr()[0])
optimizer.step()
scheduler1.step()
scheduler2.step()
```
### Current Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```
### Expected Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457
Reviewed By: datumbox
Differential Revision:
D30424160
Pulled By: iramazanli
fbshipit-source-id:
3e15af8d278c872cd6f53406b55f4d3ce5002867
Alban Desmaison [Thu, 19 Aug 2021 13:47:31 +0000 (06:47 -0700)]
Update full backward hook doc with not-same-object note (#63245)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63245
Reviewed By: ejguan
Differential Revision:
D30352656
Pulled By: albanD
fbshipit-source-id:
7000ecb54a80f2da968ec7600b98574b608578ae
Mike Iovine [Thu, 19 Aug 2021 13:37:44 +0000 (06:37 -0700)]
[Static Runtime] Support __getitem__ for lists (#63398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398
This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.
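The semantics such a native implementation must reproduce are just Python list indexing (sketched below in plain Python; the actual op operates on IValue lists):

```python
def list_getitem(lst, idx):
    """Semantics of list __getitem__ (sketch): negative indices wrap
    around, and out-of-range indices raise IndexError."""
    n = len(lst)
    if idx < -n or idx >= n:
        raise IndexError(f"list index out of range: {idx}")
    if idx < 0:
        idx += n
    return lst[idx]

print(list_getitem([10, 20, 30], -1))  # 30
```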
Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision:
D30368464
fbshipit-source-id:
e0e0971508cd5d9bcf6025606993dc24ecbf6764
Alban Desmaison [Thu, 19 Aug 2021 13:19:20 +0000 (06:19 -0700)]
Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer
Differential Revision:
D29399533 (https://github.com/pytorch/pytorch/commit/9477211e7d609ce382c0e22d7721c14c36d083de)
Original commit changeset:
9336b9dc48c0
fbshipit-source-id:
f081c7280203f40328bcbb0c03a7c6a007acedb7
Chen Lai [Thu, 19 Aug 2021 09:12:44 +0000 (02:12 -0700)]
Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499
https://github.com/pytorch/pytorch/pull/62418 combine the instruction and debug handle. This change fix the debugging message.
ghstack-source-id:
136184053
Test Plan: Uncomment and it works
Reviewed By: kimishpatel, raziel
Differential Revision:
D30390699
fbshipit-source-id:
e32b7b297ad3b7d8bffebd025d15519083a244c4
Nikolay Korovaiko [Thu, 19 Aug 2021 05:59:40 +0000 (22:59 -0700)]
layernom inplace (#63437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63437
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30388824
Pulled By: Krovatkin
fbshipit-source-id:
852d19bf238544c5de177ed5854dcd01c7ae5572
Nikolay Korovaiko [Thu, 19 Aug 2021 05:59:40 +0000 (22:59 -0700)]
layernorm (#63436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436
use MKLDNN layernorm
use mkldnn version 2
address Elias feedback
fix build CI errors
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30388825
Pulled By: Krovatkin
fbshipit-source-id:
fb909bfbf53cb8567a43aac40f51c491daeec908
Mikhail Zolotukhin [Thu, 19 Aug 2021 05:56:47 +0000 (22:56 -0700)]
[TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30411411
Pulled By: ZolotukhinM
fbshipit-source-id:
efb14ee57b36537fa4fefa89bdd6bafe7151c012
Mikhail Zolotukhin [Thu, 19 Aug 2021 05:56:47 +0000 (22:56 -0700)]
[TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30411410
Pulled By: ZolotukhinM
fbshipit-source-id:
d9a99afac14d2238b5100c98ae9ed4467f9f05ea
Michael Dagitses [Thu, 19 Aug 2021 04:39:18 +0000 (21:39 -0700)]
support optional comparisons with different but comparable types (#62890)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62565
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62890
Reviewed By: ejguan
Differential Revision:
D30396008
Pulled By: dagitses
fbshipit-source-id:
fca02207509f882973d54484f89c4d116505fc66
Edward Yang [Thu, 19 Aug 2021 03:56:25 +0000 (20:56 -0700)]
Beef up comment in AccumulateType (#63503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63503
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30403160
Pulled By: ezyang
fbshipit-source-id:
6cb24418152d9fb146f86b6f973ec50f1a397a58
Yinbin Ma [Thu, 19 Aug 2021 03:52:17 +0000 (20:52 -0700)]
BF16 allreduce hook (#63260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260
Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.
Reviewed By: SciPioneer
Differential Revision:
D30238317
fbshipit-source-id:
bad35bf7d43f10f1c40997a282b831b61ef592bb
John Clow [Wed, 18 Aug 2021 23:28:02 +0000 (16:28 -0700)]
Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492
Adding code to find common expressions from the two subblocks of an if
operation and hoist them before the if block.
This also allows Dead Code Elimination to
then eliminate some if blocks.
Also eliminated some dead code in the codebase.
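A much-simplified sketch of the idea (prefix-only, on statement strings; the real pass works on IR, also matches non-leading common expressions, and must check side effects and value dependencies before hoisting):

```python
def hoist_common_prefix(true_branch, false_branch):
    """Hoist statements that appear identically at the start of both
    branches of an if block, so they run once before the branch."""
    hoisted = []
    while true_branch and false_branch and true_branch[0] == false_branch[0]:
        hoisted.append(true_branch.pop(0))
        false_branch.pop(0)
    return hoisted, true_branch, false_branch

t = ["y = x + 1", "a = y * 2"]
f = ["y = x + 1", "b = y - 2"]
hoisted, t, f = hoist_common_prefix(t, f)
print(hoisted)  # ['y = x + 1']
```

If both branches become empty after hoisting, dead code elimination can then remove the if block entirely, which is the follow-on win mentioned above.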
Test Plan:
python test_jit.py TestIfHoisting
Imported from OSS
Reviewed By: ngimel
Differential Revision:
D29399533
fbshipit-source-id:
9336b9dc48c02c38862f98f98cd72fc1767a1802
Amy He [Wed, 18 Aug 2021 23:23:48 +0000 (16:23 -0700)]
Nnapi Delegation: Quick improvements (#63489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489
A few quick improvements to the Android NNAPI Delegate, some of which were discussed here https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime
size (nnapi_backend_lib.cpp)
2) weights processing moved from compile to preprocess step, since it can
be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after
`init()`, so they are not needed (nnapi_backend_lib.cpp)
Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here
D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)
Imported from OSS
Reviewed By: raziel, iseeyuan
Differential Revision:
D30398880
fbshipit-source-id:
b6872e1e9ccd583622b80659da00c83fdd82580e
kshitij12345 [Wed, 18 Aug 2021 23:08:48 +0000 (16:08 -0700)]
[fix] tensor_split : non-contiguous indices tensor (#63390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63390
Reviewed By: ejguan
Differential Revision:
D30362649
Pulled By: mruberry
fbshipit-source-id:
3ea3ad02199e4345beb0b580d056babd56112309
Sangbaek Park [Wed, 18 Aug 2021 22:50:33 +0000 (15:50 -0700)]
[Vulkan] Fix incorrect input range for Hardshrink tests (#63515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63515
Fixed the inappropriate input range for the Hardshrink tests:
The range -10 to +10 for input tensors is more appropriate when we use the test set of lambda values {-4.2, -1.0, -0.42, 0.0, 0.42, 1.0, 4.2, 42.42}.
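For reference, Hardshrink zeroes inputs whose magnitude is at most lambda, which is why the input range has to straddle the tested lambda values (scalar sketch, not the Vulkan kernel):

```python
def hardshrink(x, lambd=0.5):
    """Elementwise Hardshrink: keep x where |x| > lambd, else 0."""
    return x if abs(x) > lambd else 0.0

print([hardshrink(v, 1.0) for v in [-4.2, -0.42, 0.0, 0.42, 4.2]])
# [-4.2, 0.0, 0.0, 0.0, 4.2]
```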
ghstack-source-id:
136141416
Test Plan:
```build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Note that the test can fail sporadically due to the precision loss by FP16(Vulkan)/FP32(CPU). This issue will be handled separately after some design discussions.
Reviewed By: SS-JIA
Differential Revision:
D30389646
fbshipit-source-id:
7224bd8ba4e4972f5fc147df8a0cb84808f8c62e
Rong Rong (AI Infra) [Wed, 18 Aug 2021 22:02:05 +0000 (15:02 -0700)]
using PR number instead of IN_PULL_REQUEST (#63360)
Summary:
PR numbers should be available on GHA after this change.
This fixes an issue where the target determinator was not working, discovered when manually running: https://github.com/pytorch/pytorch/issues/63412.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63360
Reviewed By: malfet, zhouzhuojie, seemethere
Differential Revision:
D30374615
Pulled By: walterddr
fbshipit-source-id:
eee8d8bb7aa4308a6a50cfdcd4423a96d846777f
Mike Iovine [Wed, 18 Aug 2021 21:56:51 +0000 (14:56 -0700)]
[Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346
We have seen that we can get significant perf wins essentially for free by implementing native ops for ops that we cannot write out variants for (e.g. TupleUnpack D30306955 (https://github.com/pytorch/pytorch/commit/078b8004a62a51f75e1fbd8d08eea359af6bb1d7), append D30326461 (https://github.com/pytorch/pytorch/commit/9d9e7a8d7294834ddad957ddb1f4cd5a0e741e55)). Therefore, whether or not SR is using a native implementation is valuable information. By capturing this in the benchmarking suite, we can hopefully avoid wasting time profiling or manually inspecting `native_ops.cpp`.
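To illustrate the distinction the benchmark now reports: an "out variant" writes into a caller-provided, reusable output buffer instead of allocating a fresh output on every call. A hypothetical Python sketch of that pattern (the names are invented; Static Runtime's real out variants are C++ implementations):
```python
import array

def abs_out(values, out):
    """Hypothetical out-variant of abs: write |v| into a preallocated
    buffer rather than allocating a new output each call."""
    for i, v in enumerate(values):
        out[i] = abs(v)
    return out

buf = array.array("d", [0.0] * 3)   # allocated once, reused across calls
result = abs_out([-1.5, 2.0, -3.0], buf)
print(list(result))  # [1.5, 2.0, 3.0]
```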
Reviewed By: hlu1
Differential Revision:
D30346752
fbshipit-source-id:
205b090513b6a5a6ce4cb92f75ab0395b15d08f9
Mostafa Elhoushi [Wed, 18 Aug 2021 21:47:40 +0000 (14:47 -0700)]
[FX] make ASTRewriter patch wrapped functions properly (#62987)
Summary:
Reference the same global namespace (instead of copying it) in ASTRewriter so that wrapped functions are patched properly.
Fixes #62071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62987
Test Plan:
To test it, run this snippet and check that the results match the comments:
```
import torch
import torch.fx

@torch.fx.wrap
def to_be_wrapped(x):
    return torch.relu(x)

class Foo(torch.nn.Module):
    def forward(self, x):
        return to_be_wrapped(x)

traced = torch.fx.symbolic_trace(Foo())
print(traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped
"""

from torch.fx.experimental.rewriter import RewritingTracer
rt = RewritingTracer()
graph = rt.trace(Foo())
print(graph)
"""
### AFTER FIX (CORRECT):
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped

### BEFORE FIX (WRONG):
graph():
    %x : [#users=1] = placeholder[target=x]
    %relu : [#users=1] = call_function[target=torch.relu](args = (%x,), kwargs = {})
    return relu
"""
```
Reviewed By: ansley
Differential Revision:
D30396176
Pulled By: mostafaelhoushi
fbshipit-source-id:
f61eddf32e9ef42b5f5c3ce21d559945214ee833
Dhruv Matani [Wed, 18 Aug 2021 21:47:19 +0000 (14:47 -0700)]
[PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63464
This was previously committed as D30281388 (https://github.com/pytorch/pytorch/commit/4d6f98ecada2d85b2474b023838debad4305316d), but was reverted due to t98478641. jnkwok1 confirmed that this change was not the root cause, so we are trying to land it again.
Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons:
1. It increases binary size.
2. It slows down model loading.
3. It potentially uses more memory at runtime.
4. Code that uses `std::regex` takes marginally longer to build than code that does not.
This change avoids the use of `std::regex` for parsing the device string, since we don't need it.
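A hypothetical sketch of regex-free parsing for simple `<type>` / `<type>:<index>` device strings (in Python for brevity; the real implementation is C++ in `Device.cpp`, and this helper name is invented for illustration):
```python
def parse_device(s):
    """Parse '<type>' or '<type>:<index>' without regular expressions,
    using a single split on the first ':'."""
    dev_type, sep, index = s.partition(":")
    if not dev_type or (sep and not index.isdigit()):
        raise ValueError(f"invalid device string: {s!r}")
    return dev_type, int(index) if sep else None

print(parse_device("cuda:1"))  # ('cuda', 1)
print(parse_device("cpu"))     # ('cpu', None)
```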
ghstack-source-id:
136006963
ghstack-source-id:
136081898
Test Plan:
### AI Bench Runs
**Before this change:**
1. Model load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms
**After this change:**
1. Model load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), an approximately 5% reduction for the current model. I suspect the percentage will be larger for smaller models, since this is a fixed-cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).
### BSB Results
```
D30281388-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)
messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```
Reviewed By: raziel, pavithranrao
Differential Revision:
D30388269
fbshipit-source-id:
10942e7aa56f9ea47aa479a8f50187f2ce2899bf