Zeina Migeed [Thu, 19 Aug 2021 22:22:52 +0000 (15:22 -0700)]
acc type inference (#63119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63119
Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
--action=lower_and_run \
--filename=inline_cvr_7x_dec_2020.model \
--print_glow_glog=True
Reviewed By: jamesr66a, jfix71, ansley
Differential Revision: D30235895
fbshipit-source-id: dab7f96e1799b99eeae0ee519cf0ddd636fddf2e
Sergei Vorobev [Thu, 19 Aug 2021 21:57:00 +0000 (14:57 -0700)]
Replace hardcoded values in IndexKernel.cu (#63372)
Summary:
This is a small change that helps maintain the Cruise PyTorch fork, since we use a different hardcoded value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63372
Reviewed By: mruberry
Differential Revision: D30396171
Pulled By: ejguan
fbshipit-source-id: cc0023f58b5922d3d98c7283495e6dc8d35049b6
Adam J. Stewart [Thu, 19 Aug 2021 21:54:26 +0000 (14:54 -0700)]
DataLoader: allow non-integer Samplers (#63500)
Summary:
Not entirely sure how to use TypeVar, but if someone could give me a hint it would be appreciated. Also let me know if you want me to add tests so we can make sure non-integer samplers actually work. `test/test_dataloader.py` seems like the correct location, but that's a big file.
Fixes https://github.com/pytorch/pytorch/issues/63483
ejguan
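For what it's worth, one possible shape for this (purely a sketch; the class and method names below are illustrative, not the final PR signature): parameterizing the sampler over its index type with a covariant TypeVar lets a sampler yield tuples or any other index type, while integer samplers remain type-correct.

```python
from typing import Generic, Iterator, List, Sequence, Tuple, TypeVar

T_co = TypeVar("T_co", covariant=True)

class Sampler(Generic[T_co]):
    """Sketch of a Sampler base class generic over its index type."""
    def __iter__(self) -> Iterator[T_co]:
        raise NotImplementedError

class TupleSampler(Sampler[Tuple[int, int]]):
    """A non-integer sampler: yields (row, col) index pairs."""
    def __init__(self, shape: Sequence[int]) -> None:
        self.shape = shape

    def __iter__(self) -> Iterator[Tuple[int, int]]:
        for r in range(self.shape[0]):
            for c in range(self.shape[1]):
                yield (r, c)

indices: List[Tuple[int, int]] = list(TupleSampler((2, 2)))
print(indices)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```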
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63500
Reviewed By: mruberry
Differential Revision: D30403689
Pulled By: ejguan
fbshipit-source-id: 464e09e5aad3215b94a29cc5e21cb4b10ec136e3
Kimish Patel [Thu, 19 Aug 2021 20:32:26 +0000 (13:32 -0700)]
[Pytorch] Fix callstack pointer serialization bug (#63576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63576
We serialize the function name associated with the InlinedCallStackPtr. This is derived by querying the Function* stored in the InlinedCallStack. However, this is a raw pointer that is not guaranteed to be valid when serialization happens. On the other hand, we also store the function name separately when constructing the InlinedCallStack anyway, so this change uniformly relies on function_name instead of Function*.
Test Plan: Internal build's asan failure + CI
Reviewed By: larryliu0820
Differential Revision: D30427029
fbshipit-source-id: de9617482404785920ed2e67b72f38461590fba3
Charles David Hernandez [Thu, 19 Aug 2021 20:04:48 +0000 (13:04 -0700)]
Updating the names of these functions (#63513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63513
Updating these names per Jerry's nits in the previous PR.
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D30406710
fbshipit-source-id: a9f1577a2b8c4a93f5005e0f6278b7d7348d8b66
Natalia Gimelshein [Thu, 19 Aug 2021 20:00:08 +0000 (13:00 -0700)]
Revert embedding thrust->cub migration (#63451)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63451
Reviewed By: mruberry
Differential Revision: D30398482
Pulled By: ngimel
fbshipit-source-id: e153786d204215555a6571688eabae712facad7e
Philip Meier [Thu, 19 Aug 2021 19:45:32 +0000 (12:45 -0700)]
Updates internal `assert_allclose` callsites in favor of `assert_close` (#61841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61841
Redo of #60863.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30408145
Pulled By: mruberry
fbshipit-source-id: 0b34ebc7f23ba38ecd89640b61d8aca59b7eab58
Mike Ruberry [Thu, 19 Aug 2021 19:41:42 +0000 (12:41 -0700)]
Modernizes add and mul documentation (#63309)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39329.
The documentation for torch.add and torch.mul was sorely out of date and even included deprecated references. This PR modernizes their descriptions consistent with torch.sub.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63309
Reviewed By: ngimel
Differential Revision: D30338004
Pulled By: mruberry
fbshipit-source-id: ee1c2a8106af8341253cafb0003b06e8f652624d
kshitij12345 [Thu, 19 Aug 2021 19:40:37 +0000 (12:40 -0700)]
[special] use __all__ to hide internal imports (#63135)
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63135
Reviewed By: ngimel
Differential Revision: D30364287
Pulled By: mruberry
fbshipit-source-id: 20078668943fafa45ce09610634b1d2c424b1922
Yusuo Hu [Thu, 19 Aug 2021 19:37:58 +0000 (12:37 -0700)]
[BF16] Add a missing thread local specifier to autocast_gpu_dtype (#63416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63416
Fix a missing thread local specifier introduced by the recent PR https://github.com/pytorch/pytorch/pull/61002
Test Plan: Unit Tests
Reviewed By: ngimel
Differential Revision: D30376154
fbshipit-source-id: c70d37ec85c3eba88eb87f766f1c4e7aeff8eaf9
Pritam Damania [Thu, 19 Aug 2021 18:21:26 +0000 (11:21 -0700)]
[7/N] Remove fork tests for RPC. (#63443)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63443
After https://github.com/pytorch/pytorch/pull/63442, all distributed tests can run with opt-asan. As a result, we can now remove all of our fork-based tests.
This is the first PR in a stack; it removes fork-based tests from RPC.
ghstack-source-id: 136177744
Test Plan: waitforbuildbot
Reviewed By: lw
Differential Revision: D30384905
fbshipit-source-id: 86d438aebaa6cb02ae2a966fea244849849a1889
driazati [Thu, 19 Aug 2021 17:38:41 +0000 (10:38 -0700)]
Use CMake for breakpad (#63186)
Summary:
We currently build breakpad from [this fork](https://github.com/driazati/breakpad) to include extra logic to restore signal handlers that were previously present. With some [new additions](https://github.com/google/breakpad/compare/main...driazati:main) this fork now includes a CMake based build, so we can add breakpad as a proper dependency rather than rely on including it in Docker images as a system library which is error prone (we have a bunch of images) and hard to extend to MacOS / Windows. This also includes some changes to the crash handling code to support MacOS / Windows in a similar way to Linux.
```python
import torch
# On Windows this writes crashes to C:\Users\<user>\AppData\pytorch_crashes
# On MacOS/Linux this writes crashes to /tmp/pytorch_crashes
torch.utils._crash_handler.enable_minidumps()
# Easy way to cause a segfault and trigger the handler
torch.bincount(input=torch.tensor([9223372036854775807]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63186
Reviewed By: malfet, seemethere
Differential Revision: D30318404
Pulled By: driazati
fbshipit-source-id: 0d7daf3701cfaba5451cc529a0730272ab1eb1dc
Scott Wolchok [Thu, 19 Aug 2021 17:37:31 +0000 (10:37 -0700)]
[easy] Fix missing move in TupleType::createNamed (#61572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61572
ghstack-source-id: 136161829
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D29672872
fbshipit-source-id: d8ba2d54f7914dbeb3fc52aa21dd77025951c4b5
Shiyan Deng [Thu, 19 Aug 2021 17:16:26 +0000 (10:16 -0700)]
[hpc] use fx2trt for exploration track (#63535)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63535
Reviewed By: yinghai, jianyuh
Differential Revision: D30272810
fbshipit-source-id: 61f3edf2a2282cd8c268a92acf92feb05a6ae3e1
Shiyan Deng [Thu, 19 Aug 2021 17:16:26 +0000 (10:16 -0700)]
Add permute021 fx2trt converter (#63238)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63238
Reviewed By: yinghai
Differential Revision: D30295373
fbshipit-source-id: 2a189fe485edaa978fd03e4b8d8582edb34ec648
Scott Wolchok [Thu, 19 Aug 2021 16:49:12 +0000 (09:49 -0700)]
[PyTorch] Test IValue move/copy/assign/swap more (#54717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54717
Hit more tags in these tests
ghstack-source-id: 136140508
Test Plan: buck test //caffe2/aten:ivalue_test
Reviewed By: anjali411
Differential Revision: D27339736
fbshipit-source-id: 610c8e92846bb70ba725ab117440326ab50af5ce
David Esiobu [Thu, 19 Aug 2021 16:15:34 +0000 (09:15 -0700)]
Use linecache.lazycache to cache generated code. (#63453)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63453
Instead of patching linecache.getlines, use linecache.lazycache and
parts of the loader protocol described in PEP-302
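For reference, a minimal standalone sketch of the lazycache/loader-protocol pattern (the loader class and filename below are made up for illustration; FX's actual loader differs): the source of generated code is registered lazily and only materialized when something asks for a line.

```python
import linecache

class DemoLoader:
    """PEP-302-style loader exposing get_source, as lazycache expects."""
    def __init__(self, src: str) -> None:
        self.src = src

    def get_source(self, fullname: str) -> str:
        return self.src

generated = "def f(x):\n    return x + 1\n"
filename = "torch_fx_generated_demo.py"  # must not look like "<...>" for lazycache

# Register lazily: nothing is stored yet except a callable to fetch the source
linecache.lazycache(filename, {"__name__": filename, "__loader__": DemoLoader(generated)})

# First access triggers the loader and caches the lines
print(linecache.getline(filename, 2).strip())  # return x + 1
```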
Test Plan:
python3 test/test_fx.py
Imported from OSS
Reviewed By: suo
Differential Revision: D30388176
fbshipit-source-id: 92933711ecf3a21a07e1d6b0d1185ab0efd8341c
anjali411 [Thu, 19 Aug 2021 15:41:08 +0000 (08:41 -0700)]
Add fastpath for dot and vdot when the inputs have conj bit set to True (#62915)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62915
As much as 45% and 20% perf improvement on CUDA and CPU respectively, with consistent perf improvements in all cases -- see perf numbers in comments below.
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30404006
Pulled By: anjali411
fbshipit-source-id: 565940da28c7761d993cf43346932c24292e8a4d
Till Hoffmann [Thu, 19 Aug 2021 15:28:55 +0000 (08:28 -0700)]
Poisson zero rate (#61511)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/53485 by allowing zero rates for the Poisson distribution. This implementation is consistent with `scipy.stats.poisson` which admits zero rates. In addition to addressing the aforementioned issue, this PR makes two supporting changes:
1. add a `nonnegative` constraint to enforce non-negative rates for the Poisson distribution.
2. adjust the evaluation of the gradient of `xlogy` such that it is well defined for `x == 0 and y == 0`.
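To make point 2 concrete, here is a pure-Python sketch of the xlogy convention that makes a zero rate well-defined (stand-in functions for illustration, not the PR's actual kernels): with xlogy(0, 0) == 0, the Poisson log-pmf at k == 0 evaluates to 0, i.e. P(k=0) = 1 when the rate is 0, matching `scipy.stats.poisson`.

```python
import math

def xlogy(x: float, y: float) -> float:
    # SciPy/PyTorch convention: xlogy(0, y) == 0 for any y, including y == 0,
    # so the value does not blow up at x == y == 0.
    if x == 0.0:
        return 0.0
    return x * math.log(y)

def poisson_log_prob(k: int, rate: float) -> float:
    # log p(k; rate) = k*log(rate) - rate - log(k!)
    return xlogy(k, rate) - rate - math.lgamma(k + 1)

print(xlogy(0.0, 0.0))                     # 0.0 (plain 0*log(0) would be nan)
print(math.exp(poisson_log_prob(0, 0.0)))  # 1.0: all mass on k == 0 when rate == 0
```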
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61511
Reviewed By: ejguan
Differential Revision: D30352917
Pulled By: albanD
fbshipit-source-id: f3d33da58360e80d75eb83519f199b93232a2a2d
Jeff Daily [Thu, 19 Aug 2021 14:49:43 +0000 (07:49 -0700)]
add distributed/_sharded_tensor/test_sharded_tensor to ROCM_BLOCKLIST (#63508)
Summary:
Fixes current ROCm CI test2 brokenness until tensorpipe is fully supported by ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63508
Reviewed By: ejguan
Differential Revision: D30406450
Pulled By: walterddr
fbshipit-source-id: c07509271d5d33901f3eaf7ffb916dc3626e1f9a
Ilqar Ramazanli [Thu, 19 Aug 2021 14:15:16 +0000 (07:15 -0700)]
To fix the chainability at epoch zero for some schedulers (#63457)
Summary:
It has been discussed in https://github.com/pytorch/pytorch/pull/60836#issuecomment-899084092 that we have observed an obstacle to chaining some types of learning rate schedulers. In particular, we observed that
* some of the learning rate schedulers return their initial learning rates at epoch 0 as
```
return self.base_lrs
```
* This can be a problem when two schedulers are chained as
```
scheduler1.step()
scheduler2.step()
```
because then we completely ignore the effect of scheduler1 at epoch 0. This would not be an issue if scheduler1 were ineffective at epoch 0, as many schedulers are; however, for warm-up schedulers, whose multiplicative value at epoch 0 is smaller than 1, this can lead to undesired behavior.
The following code snippet illustrates the problem better
## Reproducing the bug
```python
import torch
from torch.nn import Parameter
from torch.optim import SGD
from torch.optim.lr_scheduler import WarmUpLR, ExponentialLR
model = [Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 1.0)
scheduler1 = WarmUpLR(optimizer, warmup_factor=0.1, warmup_iters=5, warmup_method="constant")
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
for epoch in range(10):
    print(epoch, scheduler2.get_last_lr()[0])
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
```
### Current Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 5.904900000000001
6 5.314410000000001
7 4.782969000000001
8 4.304672100000001
9 3.874204890000001
```
### Expected Result
```
0 1.0
1 0.9
2 0.81
3 0.7290000000000001
4 0.6561000000000001
5 0.5904900000000001
6 0.5314410000000001
7 0.4782969000000001
8 0.4304672100000001
9 0.3874204890000001
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63457
Reviewed By: datumbox
Differential Revision: D30424160
Pulled By: iramazanli
fbshipit-source-id: 3e15af8d278c872cd6f53406b55f4d3ce5002867
Alban Desmaison [Thu, 19 Aug 2021 13:47:31 +0000 (06:47 -0700)]
Update full backward hook doc with not-same-object note (#63245)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61446
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63245
Reviewed By: ejguan
Differential Revision: D30352656
Pulled By: albanD
fbshipit-source-id: 7000ecb54a80f2da968ec7600b98574b608578ae
Mike Iovine [Thu, 19 Aug 2021 13:37:44 +0000 (06:37 -0700)]
[Static Runtime] Support __getitem__ for lists (#63398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63398
This change provides a native `__getitem__` implementation for lists to avoid overhead associated with falling back to the JIT interpreter.
Test Plan: Unit tests: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D30368464
fbshipit-source-id: e0e0971508cd5d9bcf6025606993dc24ecbf6764
Alban Desmaison [Thu, 19 Aug 2021 13:19:20 +0000 (06:19 -0700)]
Revert D29399533: Hoisting common expressions out of If blocks
Test Plan: revert-hammer
Differential Revision: D29399533 (https://github.com/pytorch/pytorch/commit/9477211e7d609ce382c0e22d7721c14c36d083de)
Original commit changeset: 9336b9dc48c0
fbshipit-source-id: f081c7280203f40328bcbb0c03a7c6a007acedb7
Chen Lai [Thu, 19 Aug 2021 09:12:44 +0000 (02:12 -0700)]
Fix interpreter debug logging message (#63499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63499
https://github.com/pytorch/pytorch/pull/62418 combined the instruction and debug handle. This change fixes the debugging message.
ghstack-source-id: 136184053
Test Plan: Uncomment and it works
Reviewed By: kimishpatel, raziel
Differential Revision: D30390699
fbshipit-source-id: e32b7b297ad3b7d8bffebd025d15519083a244c4
Nikolay Korovaiko [Thu, 19 Aug 2021 05:59:40 +0000 (22:59 -0700)]
layernorm inplace (#63437)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63437
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388824
Pulled By: Krovatkin
fbshipit-source-id: 852d19bf238544c5de177ed5854dcd01c7ae5572
Nikolay Korovaiko [Thu, 19 Aug 2021 05:59:40 +0000 (22:59 -0700)]
layernorm (#63436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63436
* use MKLDNN layernorm
* use mkldnn version 2
* address Elias's feedback
* fix CI build errors
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30388825
Pulled By: Krovatkin
fbshipit-source-id: fb909bfbf53cb8567a43aac40f51c491daeec908
Mikhail Zolotukhin [Thu, 19 Aug 2021 05:56:47 +0000 (22:56 -0700)]
[TensorExpr] Make CacheReplacer and IndexFlattener mutate stmts/exprs inplace. (#63527)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63527
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30411411
Pulled By: ZolotukhinM
fbshipit-source-id: efb14ee57b36537fa4fefa89bdd6bafe7151c012
Mikhail Zolotukhin [Thu, 19 Aug 2021 05:56:47 +0000 (22:56 -0700)]
[TensorExpr] Speedup ExternalCall.ComputeInterop test by reducing tensor sizes. (#63526)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63526
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30411410
Pulled By: ZolotukhinM
fbshipit-source-id: d9a99afac14d2238b5100c98ae9ed4467f9f05ea
Michael Dagitses [Thu, 19 Aug 2021 04:39:18 +0000 (21:39 -0700)]
support optional comparisons with different but comparable types (#62890)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62565
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62890
Reviewed By: ejguan
Differential Revision: D30396008
Pulled By: dagitses
fbshipit-source-id: fca02207509f882973d54484f89c4d116505fc66
Edward Yang [Thu, 19 Aug 2021 03:56:25 +0000 (20:56 -0700)]
Beef up comment in AccumulateType (#63503)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63503
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D30403160
Pulled By: ezyang
fbshipit-source-id: 6cb24418152d9fb146f86b6f973ec50f1a397a58
Yinbin Ma [Thu, 19 Aug 2021 03:52:17 +0000 (20:52 -0700)]
BF16 allreduce hook (#63260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63260
Add BF16 all-reduce communication hook. Skip if CUDA version < 11 or NCCL version < 2.9.7.
Reviewed By: SciPioneer
Differential Revision: D30238317
fbshipit-source-id: bad35bf7d43f10f1c40997a282b831b61ef592bb
John Clow [Wed, 18 Aug 2021 23:28:02 +0000 (16:28 -0700)]
Hoisting common expressions out of If blocks (#59492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59492
Adds code to find common expressions in the two subblocks of an if operation and hoist them before the if block. This also allows dead code elimination to then eliminate some if blocks. Also eliminated some dead code in the codebase.
Test Plan:
python test_jit.py TestIfHoisting
Imported from OSS
Reviewed By: ngimel
Differential Revision: D29399533
fbshipit-source-id: 9336b9dc48c02c38862f98f98cd72fc1767a1802
Amy He [Wed, 18 Aug 2021 23:23:48 +0000 (16:23 -0700)]
Nnapi Delegation: Quick improvements (#63489)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63489
A few quick improvements to the Android NNAPI Delegate, some of which were discussed here https://github.com/pytorch/pytorch/pull/62272:
1) `throw std::exception` replaced with `TORCH_CHECK` to reduce runtime
size (nnapi_backend_lib.cpp)
2) weights processing moved from compile to preprocess step, since it can
be done AOT (nnapi_backend_lib.cpp & nnapi_backend_preprocess.cpp)
3) `ser_model_` and `shape_compute_module_` member variables removed, since they are never used after
`init()`, so they are not needed (nnapi_backend_lib.cpp)
Test Plan:
Unit tests: `python test/test_jit.py TestNnapiBackend`
Run SparkAR segmentation with delegated NNAPI as done here
D30259033 (can use `jf download GAekdAwsyGKXhggFALN4LnSBTzcubsIXAAAz --file "v303-nnd-mod.ptl"` to get a preprocessed model from these changes)
Imported from OSS
Reviewed By: raziel, iseeyuan
Differential Revision: D30398880
fbshipit-source-id: b6872e1e9ccd583622b80659da00c83fdd82580e
kshitij12345 [Wed, 18 Aug 2021 23:08:48 +0000 (16:08 -0700)]
[fix] tensor_split : non-contiguous indices tensor (#63390)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63281
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63390
Reviewed By: ejguan
Differential Revision: D30362649
Pulled By: mruberry
fbshipit-source-id: 3ea3ad02199e4345beb0b580d056babd56112309
Sangbaek Park [Wed, 18 Aug 2021 22:50:33 +0000 (15:50 -0700)]
[Vulkan] Fix incorrect input range for Hardshrink tests (#63515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63515
Fixed inappropriate input range for Hardshrink tests:
The range -10 ~ +10 for input tensors is more proper when we use the test set of lambda {-4.2, -1.0, -0.42, 0.0, 0.42, 1.0, 4.2, 42.42}.
ghstack-source-id: 136141416
Test Plan:
```
build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Note that the test can fail sporadically due to the precision loss by FP16(Vulkan)/FP32(CPU). This issue will be handled separately after some design discussions.
Reviewed By: SS-JIA
Differential Revision: D30389646
fbshipit-source-id: 7224bd8ba4e4972f5fc147df8a0cb84808f8c62e
Rong Rong (AI Infra) [Wed, 18 Aug 2021 22:02:05 +0000 (15:02 -0700)]
using PR number instead of IN_PULL_REQUEST (#63360)
Summary:
PR numbers should be available on GHA after this.
This fixes a target determinator issue discovered when manually running https://github.com/pytorch/pytorch/issues/63412.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63360
Reviewed By: malfet, zhouzhuojie, seemethere
Differential Revision: D30374615
Pulled By: walterddr
fbshipit-source-id: eee8d8bb7aa4308a6a50cfdcd4423a96d846777f
Mike Iovine [Wed, 18 Aug 2021 21:56:51 +0000 (14:56 -0700)]
[Static Runtime] Benchmark reports native nodes (#63346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63346
We have seen that we can get significant perf wins essentially for free by implementing native ops for ops that we cannot write out variants for (e.g. TupleUnpack D30306955 (https://github.com/pytorch/pytorch/commit/078b8004a62a51f75e1fbd8d08eea359af6bb1d7), append D30326461 (https://github.com/pytorch/pytorch/commit/9d9e7a8d7294834ddad957ddb1f4cd5a0e741e55)). Therefore, whether or not SR is using a native implementation is valuable information. By capturing this in the benchmarking suite, we can hopefully avoid wasting time profiling/manually inspecting `native_ops.cpp`
Reviewed By: hlu1
Differential Revision: D30346752
fbshipit-source-id: 205b090513b6a5a6ce4cb92f75ab0395b15d08f9
Mostafa Elhoushi [Wed, 18 Aug 2021 21:47:40 +0000 (14:47 -0700)]
[FX] make ASTRewriter patch wrapped functions properly (#62987)
Summary:
Reference the same global namespace (instead of copying it) in ASTRewriter to patch wrapped functions properly.
Fixes #62071
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62987
Test Plan:
To test it you may write this snippet and ensure the results are as shown in the comments:
```
import torch
import torch.fx

@torch.fx.wrap
def to_be_wrapped(x):
    return torch.relu(x)

class Foo(torch.nn.Module):
    def forward(self, x):
        return to_be_wrapped(x)

traced = torch.fx.symbolic_trace(Foo())
print(traced.graph)
"""
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped
"""

from torch.fx.experimental.rewriter import RewritingTracer
rt = RewritingTracer()
graph = rt.trace(Foo())
print(graph)
"""
### AFTER FIX (CORRECT):
graph():
    %x : [#users=1] = placeholder[target=x]
    %to_be_wrapped : [#users=1] = call_function[target=__main__.to_be_wrapped](args = (%x,), kwargs = {})
    return to_be_wrapped

### BEFORE FIX (WRONG):
graph():
    %x : [#users=1] = placeholder[target=x]
    %relu : [#users=1] = call_function[target=torch.relu](args = (%x,), kwargs = {})
    return relu
"""
```
Reviewed By: ansley
Differential Revision: D30396176
Pulled By: mostafaelhoushi
fbshipit-source-id: f61eddf32e9ef42b5f5c3ce21d559945214ee833
Dhruv Matani [Wed, 18 Aug 2021 21:47:19 +0000 (14:47 -0700)]
[PyTorch] Avoid using std::regex for device string parsing in Device.cpp (#63464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63464
This was previously committed as D30281388 (https://github.com/pytorch/pytorch/commit/4d6f98ecada2d85b2474b023838debad4305316d), but was reverted due to t98478641. jnkwok1 confirmed that this change was not the root cause, so trying to land it again.
Currently, `std::regex` is used for parsing device strings. This is undesirable for a few reasons.
1. Increases binary size
2. Slows down model loading
3. Potentially uses more memory at runtime
4. Marginally increases build time for code that uses std::regex versus code that does not
This change avoids the use of `std::regex` for parsing the device string since we don't need to.
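As a rough illustration of the idea, in Python rather than the C++ of Device.cpp (the function and the set of device types below are illustrative, not the real parser): device strings have a simple enough shape that one split plus hand validation can replace the regex.

```python
from typing import Optional, Tuple

def parse_device(s: str) -> Tuple[str, Optional[int]]:
    """Parse strings like "cpu", "cuda", or "cuda:1" without a regex:
    split once on ':' and validate each piece by hand."""
    known_types = {"cpu", "cuda"}  # illustrative subset of device types
    name, sep, index = s.partition(":")
    if name not in known_types:
        raise ValueError(f"unknown device type in {s!r}")
    if not sep:
        return (name, None)  # no index given, e.g. "cuda"
    if not index.isdigit():
        raise ValueError(f"bad device index in {s!r}")
    return (name, int(index))

print(parse_device("cuda:1"))  # ('cuda', 1)
print(parse_device("cpu"))     # ('cpu', None)
```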
ghstack-source-id: 136006963
ghstack-source-id: 136081898
Test Plan:
### AI Bench Runs
**Before this change:**
1. Model Load time: [252ms](https://www.internalfb.com/intern/aibench/details/332471502816548)
2. Model unload time: 3.5ms
**After this change:**
1. Model Load time: [240ms](https://www.internalfb.com/intern/aibench/details/652195589031318), which is an approx 5% reduction for the current model. I suspect percentage wise, it will be larger for smaller models since this is a fixed cost reduction.
2. Model unload time: 3.3ms (probably too small to be meaningfully impactful to an end user).
### BSB Results
```
D30281388 (https://github.com/pytorch/pytorch/commit/4d6f98ecada2d85b2474b023838debad4305316d)-V1 (https://www.internalfb.com/intern/diff/D30281388/?dest_number=135713848)
messenger-pika-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -7.1 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -17.6 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:551399955987465@base/bsb:551399955987465@diff/
```
Reviewed By: raziel, pavithranrao
Differential Revision: D30388269
fbshipit-source-id: 10942e7aa56f9ea47aa479a8f50187f2ce2899bf
Mikhail Zolotukhin [Wed, 18 Aug 2021 21:46:25 +0000 (14:46 -0700)]
[TensorExpr] IRSimplifier: sort terms in polynomials, terms, minterms, maxterms. (#63197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63197
This solves non-determinism from using hash values in sort methods.
Changes in tests are mostly mechanical.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292776
Pulled By: ZolotukhinM
fbshipit-source-id: 74f57b53c3afc9d4be45715fd74781271373e055
Mikhail Zolotukhin [Wed, 18 Aug 2021 21:46:25 +0000 (14:46 -0700)]
[TensorExpr] Add debug logging to LoopNest::computeInline. (#63196)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63196
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30292778
Pulled By: ZolotukhinM
fbshipit-source-id: d8a111b75466a9354f6d048119cc6f814c9d5abb
Michael Dagitses [Wed, 18 Aug 2021 20:43:54 +0000 (13:43 -0700)]
clarify that `torch.finfo.tiny` is the smallest normal number (#63241)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63241
This is a common source of confusion, but it matches the NumPy
behavior.
Fixes #44010
Fixes #59526
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D30307646
Pulled By: dagitses
fbshipit-source-id: d848140ba267560387d83f3e7acba8c3cdc53d82
Alexander Grund [Wed, 18 Aug 2021 20:33:36 +0000 (13:33 -0700)]
Fix segmentation fault due to access to destroyed CudaIPCGlobalEntities instance (#56141)
Summary:
There is an instance of the static destruction order fiasco where cuda_ipc_global_entities may be accessed after it is destroyed. See https://github.com/pytorch/pytorch/issues/51961
This change uses a flag and avoids accesses to the destroyed class when it is set to false.
Fixes https://github.com/pytorch/pytorch/issues/51961
This removes the function to clear shared_blocks introduced by https://github.com/pytorch/pytorch/issues/53080, which had multiple issues: unprotected access to a shared structure, and modification of the vector being cleared by the destructors of the objects contained in it.
I.e. what happened was:
- `CudaIPCSentDataLimbo_.clear_shared_blocks();` is called from the destructor of CudaIPCGlobalEntities as of your PR
- This deletes instances of `CudaIPCSentData` which hold `at::DataPtr` created by `GetNewRefCountedSentData`
- This means `CudaIPCSentDataDelete` is called with still active pointers
- Hence `CudaIPCSentDataLimbo_.add` is called adding a new value to `shared_blocks_`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56141
Reviewed By: ejguan
Differential Revision: D30397279
Pulled By: VitalyFedyunin
fbshipit-source-id: ce4b8b90fa1c90d275e5eca93ba84321cbc6140a
Charles David Hernandez [Wed, 18 Aug 2021 20:30:35 +0000 (13:30 -0700)]
Bugfix for fuse qconfig comparison (#63384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63384
In some cases the changes to the qconfig on a module would cause fusions to fail. This bugfix solves that problem by adding a qconfig function comparison that compares the functions within the qconfig rather than the modules the qconfigs are on. The comparison looks at the partial object within QConfig.activation/weight.p and compares args, keywords, and func. This has to be done manually because partial doesn't implement __eq__, so == falls back to identity (is).
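A small standalone sketch of that comparison (the helper and function names below are illustrative, not the actual code added in this PR): because functools.partial has no __eq__, two structurally identical partials compare unequal, so the fields func, args, and keywords must be compared explicitly.

```python
from functools import partial

def partial_equals(p: partial, q: partial) -> bool:
    # partial does not implement __eq__, so == falls back to identity (is);
    # compare the wrapped function, positional args, and keywords instead.
    return p.func == q.func and p.args == q.args and p.keywords == q.keywords

def fake_observer(dtype=None, qscheme=None):
    pass

a = partial(fake_observer, dtype="qint8")
b = partial(fake_observer, dtype="qint8")
print(a == b)                # False: two distinct partial objects
print(partial_equals(a, b))  # True: same func, args, and keywords
```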
Test Plan:
python test/test_quantization.py
TestFuseFx.test_problematic_fuse_example
Imported from OSS
Reviewed By: supriyar, ejguan
Differential Revision: D30386264
fbshipit-source-id: 51e358c021c39d6f48dc12ad2a82b2838677b9de
BowenBao [Wed, 18 Aug 2021 20:25:19 +0000 (13:25 -0700)]
[ONNX] Fix for batchnorm training op mode (#52758) (#62760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62760
* Rebase
# Conflicts:
# torch/csrc/jit/passes/onnx/eval_peephole.cpp
# Conflicts:
# test/onnx/test_utility_funs.py
# torch/onnx/symbolic_opset9.py
* Update symbolic_opset12.py
* Update test.sh
# Conflicts:
# .jenkins/caffe2/test.sh
* Merge
* Fix utility tests
# Conflicts:
# test/onnx/test_pytorch_onnx_onnxruntime.py
# test/onnx/test_utility_funs.py
* Fix for comment
* Enable BN tests
* Fix for test
* Update test_pytorch_onnx_onnxruntime.py
* Update test_pytorch_onnx_onnxruntime.py
* Update test_utility_funs.py
* Update test_pytorch_onnx_onnxruntime.py
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30349060
Pulled By: msaroufim
fbshipit-source-id: 93312c17607974731c17099ae181acb6e4c1c409
BowenBao [Wed, 18 Aug 2021 20:25:19 +0000 (13:25 -0700)]
[ONNX] Remove aten parameter (#61652) (#62759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62759
* remove aten argument in export()
* add export_to_pretty_string default value OperatorExportTypes.ONNX
* add DPYTORCH_ONNX_CAFFE2_BUNDLE description
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30349062
Pulled By: msaroufim
fbshipit-source-id: d9738f3aa8b80eac54548d0b9494f9f1e544f20f
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
BowenBao [Wed, 18 Aug 2021 20:25:19 +0000 (13:25 -0700)]
[ONNX] Add support for opset14 in PT-ONNX exporter (#59486) (#62758)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62758
* Add initial changes for opset14
* Fixed flake
* Add onnx submodule changes and removed utility func tests
* Add updated batchNorm symbolic
* Add triu/tril symbolics
* Fix lint
* Fixed test failures
* Add reshape with allowzero
* Added tests/refactored opset versioning
* Bump onnxruntime version
* Fix clang/lint failures
* Add reshape shape inference for opset 14
* Changes for allowzero
* Fix lint/clang and test failures
* Updated PR
* Flake fixes
* Fix flake
* Remove new_jit_api tests
* Add opset14 models
* Update allowzero
* Fix test failures
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30349063
Pulled By: msaroufim
fbshipit-source-id: 54724246149b01a2f627c43d7396253a7e9c9eb9
Co-authored-by: Shubham Bhokare <sbhokare@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
BowenBao [Wed, 18 Aug 2021 20:25:19 +0000 (13:25 -0700)]
[ONNX] Support lstm_cell symbolic (#61476) (#62757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62757
Support lstm_cell symbolic
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D30349061
Pulled By: msaroufim
fbshipit-source-id: f236177e3e5c62a30b7e4d91a623bcaef21b5eb1
Co-authored-by: jiafatom <jiafa@microsoft.com>
James Reed [Wed, 18 Aug 2021 20:16:01 +0000 (13:16 -0700)]
[FX] Fix GraphModule deepcopy to use deepcopied graph (#63090)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63090
Test Plan: Imported from OSS
Reviewed By: ansley
Differential Revision:
D30252471
Pulled By: jamesr66a
fbshipit-source-id:
cafd7d7917935a5ea6ffa2a7fe9e9b2a9578b3e3
Basil Hosmer [Wed, 18 Aug 2021 19:06:53 +0000 (12:06 -0700)]
MaybeOwned page for dev wiki (#63450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63450
Brief guide to understanding `MaybeOwned<Tensor>`, aimed at C++ PT devs who are obliged to interact with existing uses of it, rather than encouraging new usage.
For reviewers: I haven't yet added a link to this page from anywhere. I'm thinking the right place is the [dev wiki main page C++ section](https://github.com/pytorch/pytorch/wiki#c) but happy to put it wherever makes sense, suggestions welcome.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30402313
Pulled By: bhosmer
fbshipit-source-id:
69b15909ecafcd8d88e44f664f88c3ad4eb26d84
peterjc123 [Wed, 18 Aug 2021 18:41:42 +0000 (11:41 -0700)]
Disable RDYNAMIC check with MSVC (#62949)
Summary:
When testing with clang-cl, the flag is added though it is unsupported and that generates a few warnings. Tried a few alternatives like https://cmake.org/cmake/help/latest/module/CheckLinkerFlag.html, but they just don't work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62949
Reviewed By: zhouzhuojie, driazati
Differential Revision:
D30359206
Pulled By: malfet
fbshipit-source-id:
1bd27ad5772fe6757fa8c3a4bddf904f88d70b7b
Michael Dagitses [Wed, 18 Aug 2021 18:39:12 +0000 (11:39 -0700)]
document why wrappers exist in `torch.functional` (#62847)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62844.
These wrappers are not super obvious, but ultimately stem from the lack of support for functions with variadic args in native_functions.yaml. https://github.com/pytorch/pytorch/issues/62845 tracks that issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62847
Reviewed By: VitalyFedyunin
Differential Revision:
D30305016
Pulled By: dagitses
fbshipit-source-id:
716fcecb0417b770bc92cfd8c54f7ead89070896
Rohan Varma [Wed, 18 Aug 2021 18:38:11 +0000 (11:38 -0700)]
[DDP] Add a debug check in cpp fp16 compress (#63379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63379
This codepath has been prone to bugs, as seen in the diff below. This adds a basic sanity check to guard against changes/refactors that touch it. The check is enabled only in debug builds so it does not affect perf.
ghstack-source-id:
136056093
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision:
D30358440
fbshipit-source-id:
e1b3893a223722c2593ceed8696a09c7d07d47c1
Rohan Varma [Wed, 18 Aug 2021 18:38:11 +0000 (11:38 -0700)]
[DDP][Grad compression] Fix fp16 cpp hook (#63375)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63375
`tensor.copy_(tensor.to(torch::kFloat16))` will keep the tensor as float32, since `copy_` casts the source back to the destination's dtype.
Tested by add the following line:
```
LOG(INFO) << "Type is: " << compressed_tensor.scalar_type();
```
before:
```
I0816 17:03:09.823688 364141 default_comm_hooks.cpp:21] Type is: Float
```
after:
```
I0816 17:01:16.779052 353924 default_comm_hooks.cpp:21] Type is: Half
```
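The dtype pitfall above can be illustrated with a pure-Python mock (a sketch, not real torch code; `MockTensor` is a hypothetical stand-in for the `copy_`/`to` semantics):

```python
# Illustrative mock: copy_ converts the source into the destination's
# existing dtype, so copying a half-precision source into a float32
# buffer silently keeps float32 -- the bug this diff fixes.
class MockTensor:
    def __init__(self, data, dtype):
        self.data = list(data)
        self.dtype = dtype

    def to(self, dtype):
        # Returns a new tensor in the requested dtype.
        return MockTensor(self.data, dtype)

    def copy_(self, src):
        # In-place copy: values come from src, dtype stays the destination's.
        self.data = list(src.data)
        return self


buf = MockTensor([1.0, 2.0], dtype="float32")
buf.copy_(buf.to("float16"))
print(buf.dtype)  # still "float32"
```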
ghstack-source-id:
136056092
Test Plan: ci
Reviewed By: SciPioneer
Differential Revision:
D30356256
fbshipit-source-id:
8208a705acd7628541cd43c8bf61d007dfdd2435
Stas Bekman [Wed, 18 Aug 2021 18:37:07 +0000 (11:37 -0700)]
[doc] pre-commit fix instructions (#61717)
Summary:
fix invalid instruction
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61717
Reviewed By: zhouzhuojie, driazati
Differential Revision:
D30359218
Pulled By: malfet
fbshipit-source-id:
61771babeac4d34425a61ce49f38a7099b521eec
Heitor Schueroff [Wed, 18 Aug 2021 18:30:44 +0000 (11:30 -0700)]
Make SkipInfo with expected_failure an XFAIL (#63481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63481
This PR changes the SkipInfo decorators to use unittest.expectedFailure so that the test reports as XFAIL as opposed to PASSED.
Note that changing the expectedFailure here https://github.com/pytorch/pytorch/blob/30e1c74dc19ae2b622b46ebcdb7972c42775ac80/torch/testing/_internal/common_device_type.py#L879 to an XFAIL is not possible because the decision of whether to decorate is delayed until the wrapper function is called.
fixes https://github.com/pytorch/pytorch/issues/63363
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision:
D30397154
Pulled By: heitorschueroff
fbshipit-source-id:
c5e4911969ad8667763eec4203dbbc6a51178592
soulitzer [Wed, 18 Aug 2021 18:29:51 +0000 (11:29 -0700)]
Improve custom function docs (#60312)
Summary:
- Adds some code examples for `ctx` methods and make requirements of arguments more clear
- Type annotations for `save_for_backward`, `mark_dirty`, `mark_non_differentiable`, and `set_materialize_grads` (BC-breaking?)
- Refactor `torch.autograd.Function` doc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60312
Reviewed By: VitalyFedyunin
Differential Revision:
D30314961
Pulled By: soulitzer
fbshipit-source-id:
a284314b65662e26390417bd2b6b12cd85e68dc8
Pritam Damania [Wed, 18 Aug 2021 17:46:09 +0000 (10:46 -0700)]
[6/N] Enable opt-asan for elastic and launcher tests. (#63442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63442
Continuation of https://github.com/pytorch/pytorch/pull/62051, I've
enabled elastic and launcher tests to run in opt-asan mode which is supported
with spawn multiprocessing.
This allows us to completely get rid of fork based tests from torch.distributed
and have all tests run in spawn mode.
ghstack-source-id:
136057123
Test Plan: waitforbuildbot
Reviewed By: cbalioglu
Differential Revision:
D30384267
fbshipit-source-id:
ad3447cfb9d6e31e7ec8332d64c8ff1054858dcb
Shirong Wu [Wed, 18 Aug 2021 17:39:53 +0000 (10:39 -0700)]
Add validation check in fx2trt interpreter (#63424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63424
Add a validation check in fx2trt for missing converter operators. If any op is missing, interpreter init will report the missing operators.
Test Plan:
for call_function and call_method:
manual test with feeds benchmark and verify init failed with expected message.
{
F642390780}
for call_module:
specify a module as a leaf node so acc_tracer traces it as a single node; then in fx2trt.py, during the CONVERTER initialization stage, skip recording all modules; initialize the interpreter and call the validator function, verifying that the output includes the missing module name (return value printed in the screenshot below).
{
F643458718}
Reviewed By:
842974287
Differential Revision:
D30294832
fbshipit-source-id:
243dca3fdfc6a174ded65248938e2a234aec19c6
John Shen [Wed, 18 Aug 2021 17:35:55 +0000 (10:35 -0700)]
[pytorch] Make qconv forward() thread safe (#63432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63432
There's a race condition in quantized models when multiple threads call forward() due to qnnpack packing the weights the first time the operator is called. This locks the entire apply_impl function.
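The locking pattern can be sketched in plain Python (an illustrative model of the fix, not the actual C++ qnnpack code; `LazyPackedConv` and its fake packing step are hypothetical):

```python
import threading


class LazyPackedConv:
    """Packs weights on first forward(); a lock makes that thread safe."""

    def __init__(self, weight):
        self.weight = weight
        self.packed = None
        self.pack_count = 0
        self._lock = threading.Lock()

    def forward(self, x):
        with self._lock:
            # First caller packs; later callers reuse the packed weights.
            if self.packed is None:
                self.pack_count += 1
                self.packed = tuple(self.weight)  # stand-in for real packing
        return [xi * sum(self.packed) for xi in x]


conv = LazyPackedConv([1, 2, 3])
threads = [threading.Thread(target=conv.forward, args=([1.0],)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(conv.pack_count)  # 1 -- packing happened exactly once
```

Without the lock, two threads could both observe `packed is None` and pack concurrently, which is the crash the commit describes.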
Test Plan:
https://github.com/pytorch/pytorch/issues/58055
Ran the script before and after, original crashes went away
Reviewed By: kimishpatel
Differential Revision:
D30229520
fbshipit-source-id:
d06cabe24199a80325cd57f24a7fd60624be2cf7
Masaki Kozuki [Wed, 18 Aug 2021 16:42:14 +0000 (09:42 -0700)]
Use `fastAtomicAdd` in EmbeddingBag (mode "max") backward (#63298)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695
### This PR
| n_tokens | num_embeddings | embedding_dim | mode | bwd_fp32 | bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
| 4096 | 4096 | 4096 | max | 0.000326228 | 0.000181448 |
| 4096 | 4096 | 16384 | max | 0.00102805 | 0.000618136 |
| 4096 | 16384 | 4096 | max | 0.000907326 | 0.000530422 |
| 4096 | 16384 | 16384 | max | 0.00334988 | 0.00264645 |
| 16384 | 4096 | 4096 | max | 0.000366449 | 0.000320232 |
| 16384 | 4096 | 16384 | max | 0.00126421 | 0.00104183 |
| 16384 | 16384 | 4096 | max | 0.00087738 | 0.00065068 |
| 16384 | 16384 | 16384 | max | 0.00379229 | 0.00298201 |
### Original
| n_tokens | num_embeddings | embedding_dim | mode | bwd_fp32 | bwd_fp16 |
|-----------:|-----------------:|----------------:|:-------|------------:|------------:|
| 4096 | 4096 | 4096 | max | 0.00032407 | 0.000188231 |
| 4096 | 4096 | 16384 | max | 0.00104356 | 0.000624001 |
| 4096 | 16384 | 4096 | max | 0.000902069 | 0.000527382 |
| 4096 | 16384 | 16384 | max | 0.00302202 | 0.00255153 |
| 16384 | 4096 | 4096 | max | 0.000384343 | 0.000403249 |
| 16384 | 4096 | 16384 | max | 0.00126445 | 0.00135069 |
| 16384 | 16384 | 4096 | max | 0.000880814 | 0.000825679 |
| 16384 | 16384 | 16384 | max | 0.00337611 | 0.00319515 |
cc xwang233 ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63298
Reviewed By: mruberry
Differential Revision:
D30383583
Pulled By: ngimel
fbshipit-source-id:
14dd9d67002c53a153721812709033c198f68c1e
Rishi Puri [Wed, 18 Aug 2021 16:41:37 +0000 (09:41 -0700)]
Reverting launch bounds change in topK that induced a regression in perf (#63431)
Summary:
[topkwsyncs.zip](https://github.com/pytorch/pytorch/files/7003077/topkwsyncs.zip)
Running this script on nvidia containers 21.08 vs 21.07 we see the following perf drops:
topk(input=(dtype=torch.float16,shape=[60, 201600]), k=2000, dim=1, sorted=True) - 0.63
topk(input=(dtype=torch.float32,shape=[120000]), k=12000, dim=0, sorted=False) - 0.55
topk(input=(dtype=torch.float16,shape=[5, 201600]), k=2000, dim=1, sorted=True) - 0.55
topk(input=(dtype=torch.float32,shape=[1, 10000]), k=1000, dim=1, sorted=False) - 0.33
The relative perf drop is reported as (21.08_time - 21.07_time) / 21.07_time
I narrowed down the source of the regression to this commit: https://github.com/pytorch/pytorch/pull/60314
which reduced launch bounds from 1024 to 512.
The perf did not appear to regress in the original evidence used to justify changing 1024 to 512, because the benchmark input shapes there were much smaller than the tensor shapes in which I am seeing the regression. I suggest reverting to 1024: with 512 there was no considerable perf improvement for small inputs and a major perf regression for large tensors.
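The relative drop figures above follow directly from the stated formula; a quick sanity check (illustrative numbers, not from the PR):

```python
def relative_drop(t_new, t_old):
    # (21.08_time - 21.07_time) / 21.07_time
    return (t_new - t_old) / t_old


# A kernel that took 1.0 units in 21.07 and 1.63 units in 21.08
# shows up as a 0.63 relative drop, matching how the table is reported.
print(round(relative_drop(1.63, 1.0), 2))
```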
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63431
Reviewed By: mruberry
Differential Revision:
D30384087
Pulled By: ngimel
fbshipit-source-id:
11eecbba82a069b1d4579d674c3f644ab8060ad2
Erjia Guan [Wed, 18 Aug 2021 15:47:27 +0000 (08:47 -0700)]
Make DataChunk support list in-place ops (#63422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63422
Fixes #63095
Make `DataChunk` delegate to list method. Then it will support in-place operations:
- `sort`
- `reverse`
- `append`
- `extend`
- `random.shuffle`
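The delegation can be sketched roughly as follows (a hypothetical simplification of `DataChunk`, assuming it wraps a plain list):

```python
import random


class DataChunk:
    def __init__(self, items):
        self.items = list(items)

    # Delegate everything else (sort, reverse, append, extend, ...) to the list.
    def __getattr__(self, name):
        return getattr(self.items, name)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.items[i]

    def __setitem__(self, i, value):
        self.items[i] = value  # needed by random.shuffle


chunk = DataChunk([3, 1, 2])
chunk.sort()
print(chunk.items)       # [1, 2, 3]
chunk.append(0)
chunk.reverse()
print(chunk.items)       # [0, 3, 2, 1]
random.shuffle(chunk)    # works via __len__/__getitem__/__setitem__
```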
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision:
D30379027
Pulled By: ejguan
fbshipit-source-id:
d176bd0cc8b89b915c7bb184ff243ab1f605616d
cyy [Wed, 18 Aug 2021 15:04:08 +0000 (08:04 -0700)]
A tiny fix in MT19937RNGEngine (#63219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63219
Reviewed By: VitalyFedyunin
Differential Revision:
D30341484
Pulled By: ezyang
fbshipit-source-id:
0ff4499d0f4a3dfeb991c0f10fe3248c6ca1c992
Edward Yang [Wed, 18 Aug 2021 14:45:45 +0000 (07:45 -0700)]
Implement subclass priority for __torch_dispatch__ (#63411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63411
In order to get this behavior, you have to use append_overloaded,
which I forgot to use in the previous implementation. I exposed
an internal helper function which is more appropriate for dispatch
to Python where we know that an argument is definitely a Tensor (and
this test no longer needs to be done).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision:
D30374489
Pulled By: ezyang
fbshipit-source-id:
43b08c00d1958c9b26d82a025d19f0b67bb85590
Jerry Zhang [Wed, 18 Aug 2021 14:36:47 +0000 (07:36 -0700)]
[fx2trt] Add dequantize support (#63448)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63448
Only available after TensorRT 8.0
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_dequantize
Reviewed By:
842974287
Differential Revision:
D30296863
fbshipit-source-id:
44b9630ef0d210e7f20e650dc81c519f7e41f5f3
Philip Meier [Wed, 18 Aug 2021 14:36:22 +0000 (07:36 -0700)]
add `OpInfo` for `torch.linalg.tensorinv` (#62326)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53739.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62326
Reviewed By: H-Huang
Differential Revision:
D30136376
Pulled By: zou3519
fbshipit-source-id:
04ec9450e8866667649af401c7559b96ddc91491
JackCaoG [Wed, 18 Aug 2021 13:42:51 +0000 (06:42 -0700)]
Update cuda amp to also check xla device (#63413)
Summary:
Fixes https://github.com/pytorch/xla/issues/3086. PyTorch/XLA:GPU also uses CUDA amp. I verified the PT/XLA `test_autocast` with this fix and all tests passed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63413
Reviewed By: ngimel
Differential Revision:
D30380785
Pulled By: bdhirsh
fbshipit-source-id:
fd1a1de7d224c616fc3fa90b80a688a21f6b1ecc
CodemodService FBSourceClangFormatLinterBot [Wed, 18 Aug 2021 11:18:47 +0000 (04:18 -0700)]
[AutoAccept][Codemod][FBSourceClangFormatLinter] Daily `arc lint --take CLANGFORMAT`
Reviewed By: zertosh
Differential Revision:
D30391472
fbshipit-source-id:
d4eb1e7debea8905e7fee5f026c082bee65e78f3
Michael Dagitses [Wed, 18 Aug 2021 11:04:43 +0000 (04:04 -0700)]
enhance comparison tests for c10::optional (#62887)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62887
Reviewed By: VitalyFedyunin
Differential Revision:
D30305044
Pulled By: dagitses
fbshipit-source-id:
d0a3a9e4ea186915ef087543aaf81a606f943380
Michael Dagitses [Wed, 18 Aug 2021 10:59:51 +0000 (03:59 -0700)]
clarify the documentation of `torch.meshgrid` (#62977)
Summary:
Also warn about the behavior differences from `numpy.meshgrid`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62977
Reviewed By: mruberry, ngimel
Differential Revision:
D30220930
Pulled By: dagitses
fbshipit-source-id:
ae6587b41792721cae2135376c58121b4634e296
Pritam Damania [Wed, 18 Aug 2021 08:58:05 +0000 (01:58 -0700)]
[5/N] Run opt-asan with detect_leaks=0 (#63361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63361
Python multiprocessing doesn't support LSAN and causes false positives instead. As a result, LSAN is disabled for these tests so that we can still run them with opt-asan.
ghstack-source-id:
135962489
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision:
D30352269
fbshipit-source-id:
f6ab5abce7bdef00cd5e1f5977424d2b151174af
Wanchao Liang [Wed, 18 Aug 2021 06:10:48 +0000 (23:10 -0700)]
[sharded_tensor] fix typing issue for placement (#63426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63426
placement should either be a string or a _remote_device, this fixes the type to match the behaviors
ghstack-source-id:
136041125
Reviewed By: pritamdamania87
Differential Revision:
D30379702
fbshipit-source-id:
34e226494240923b433e3a39cc08c84d42cdad6b
Pavithran Ramachandran [Wed, 18 Aug 2021 05:26:22 +0000 (22:26 -0700)]
[easy][PyTorchEdge] print error message when failing to load model file (#63404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63404
# Context
Loading a model file using `fopen` might error out for multiple reasons. Reproducing the error on devices takes time and effort. Logging the errno will help in debugging and fixing the error quickly.
# Mitigation
Print out the error message from `fopen` to help users debug the issue.
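The same errno reporting can be sketched in Python (illustrative, not the C++ RAIIFile code; the path and `open_model` helper are made up, and the message wording mirrors the log in the test plan):

```python
import os


def open_model(path):
    try:
        return open(path, "rb")
    except OSError as e:
        # Surface errno and its human-readable message instead of a bare failure.
        raise RuntimeError(
            f"open file failed because of errno {e.errno} on fopen: "
            f"{os.strerror(e.errno)}, file path: {path}"
        ) from e


try:
    open_model("/no/such/model.ptl")
except RuntimeError as err:
    print(err)  # mentions errno 2: No such file or directory
```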
Test Plan:
```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource] buck run xplat/caffe2/fb/lite_predictor:lite_predictor -- --model=/home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl --use_bundled_input=0
Building: finished in 0.5 sec (100%) 354/354 jobs, 0/354 updated
Total time: 0.6 sec
Run with 24 threads
Run with 24 threads
Loading model...
terminate called after throwing an instance of 'c10::Error'
what(): open file failed because of errno 2 on fopen: No such file or directory, file path: /home/pavithran/models/prod/GAaNhAoTIV6cIvgJAHn30m8NR1QgbmQwAAAA.ptl
Exception raised from RAIIFile at xplat/caffe2/caffe2/serialize/file_adapter.cc:15 (most recent call first):
(no backtrace available)
```
Reviewed By: dhruvbird
Differential Revision:
D30372308
fbshipit-source-id:
5346e828f53f6bc5d871b403586566a3332a389a
Jerry Zhang [Wed, 18 Aug 2021 04:35:55 +0000 (21:35 -0700)]
[fx2trt] Add quantize_per_tensor support (#63447)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63447
Only available in TRT 8.0 and above
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_quantize_per_tensor
Reviewed By:
842974287
Differential Revision:
D30322844
fbshipit-source-id:
dfd925e3432de128f2925b1aa55d6125e63359af
Shen Li [Wed, 18 Aug 2021 03:12:51 +0000 (20:12 -0700)]
Fix RPC Python User Function Error Handling (#63406)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63406
The `RemoteException` will be thrown on the caller side when converting
the response message to IValue. Since it is a Python error, the error
message needs to be extracted explicitly and clear the `PyErr`.
Test Plan: Imported from OSS
Reviewed By: rohan-varma, ngimel
Differential Revision:
D30372741
Pulled By: mrshenli
fbshipit-source-id:
1f72a7ee0c39cc2ef070f99884c142f7b3e0543d
Aliaksandr Ivanou [Wed, 18 Aug 2021 02:54:30 +0000 (19:54 -0700)]
[torch] Set default log level for torch elastic (#63214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63214
The default log level in fb and oss is different: in oss we use WARNING and in fb we use INFO.
Test Plan: unittests,
f291441502
Reviewed By: cbalioglu
Differential Revision:
D30296298
fbshipit-source-id:
89067352be767255fbc66e790ec333582de64c6c
Rohan Varma [Wed, 18 Aug 2021 00:12:32 +0000 (17:12 -0700)]
[BE] remove _SUPPORTED_OPTIM_MAP from tests (#63383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63383
Per title
ghstack-source-id:
135966157
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision:
D30358921
fbshipit-source-id:
965e054e525194b1ee55980340df275bab355c9b
Rohan Varma [Wed, 18 Aug 2021 00:12:32 +0000 (17:12 -0700)]
[DDP] Support step_param for AdamW (#63382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63382
Per title
ghstack-source-id:
135966156
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision:
D30255446
fbshipit-source-id:
e6ffbf339db0bc5b4702d02b74a462309df07c75
Jerry Zhang [Tue, 17 Aug 2021 23:54:09 +0000 (16:54 -0700)]
[quant][graphmode][fx][fix] Fix quantization for tuple arguments (#63376)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63376
Previously, when a tuple was an argument to a quantizable op, it would be mistakenly transformed into a list; this PR fixes that.
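The fix amounts to preserving the container type when mapping over arguments; a generic sketch (hypothetical `map_arg` helper, not the actual FX code):

```python
def map_arg(a, fn):
    """Apply fn to leaves while preserving tuple vs. list container types."""
    if isinstance(a, tuple):
        return tuple(map_arg(x, fn) for x in a)  # stays a tuple
    if isinstance(a, list):
        return [map_arg(x, fn) for x in a]
    return fn(a)


out = map_arg((1, [2, 3]), lambda x: x * 10)
print(out)        # (10, [20, 30])
print(type(out))  # <class 'tuple'> -- not silently turned into a list
```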
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_preserve_tuple
Imported from OSS
Reviewed By: raghuramank100
Differential Revision:
D30357642
fbshipit-source-id:
82d10805d9c00c003cc99983dca68b6455ff7b2e
zhouzhuojie [Tue, 17 Aug 2021 23:53:08 +0000 (16:53 -0700)]
Add more ciflow labels for more workflows (#63410)
Summary:
- Add more ciflow labels and enable it for more workflows.
- Only the 'ciflow/default' workflows are run by default on pull_request time
- Other labels can be manually triggered by adding the labels and unassigning pytorchbot, OR by waiting for pytorchbot's comment opt-in rollout
- The label design is a logical operator `OR`, i.e. adding ('ciflow/cuda' + 'ciflow/win') will trigger the union of them. (design feedback is needed here)
Typical default workflows for normal PRs.
<details>
<summary>Generated label rules</summary>
![image](https://user-images.githubusercontent.com/658840/129779905-eb5e56dd-a696-4040-9eb6-71ecb6487dc1.png)
```
{
"label_rules": {
"ciflow/all": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
],
"ciflow/bazel": [
"linux-xenial-py3.6-gcc7-bazel-test"
],
"ciflow/coverage": [
"linux-bionic-py3.8-gcc9-coverage"
],
"ciflow/cpu": [
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3"
],
"ciflow/cuda": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
],
"ciflow/default": [
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3"
],
"ciflow/libtorch": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7"
],
"ciflow/linux": [
"libtorch-linux-xenial-cuda10.2-py3.6-gcc7",
"libtorch-linux-xenial-cuda11.1-py3.6-gcc7",
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-bionic-py3.8-gcc9-coverage",
"linux-xenial-cuda10.2-py3.6-gcc7",
"linux-xenial-cuda11.1-py3.6-gcc7",
"linux-xenial-py3.6-gcc5.4",
"linux-xenial-py3.6-gcc7-bazel-test",
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7"
],
"ciflow/scheduled": [
"periodic-libtorch-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-linux-xenial-cuda11.3-py3.6-gcc7",
"periodic-win-vs2019-cuda11.3-py3"
],
"ciflow/slow": [
"linux-bionic-cuda10.2-py3.9-gcc7",
"linux-xenial-cuda10.2-py3.6-gcc7"
],
"ciflow/win": [
"periodic-win-vs2019-cuda11.3-py3",
"win-vs2019-cpu-py3",
"win-vs2019-cuda10.1-py3",
"win-vs2019-cuda11.1-py3"
]
},
"version": "v1"
}
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63410
Reviewed By: ngimel
Differential Revision:
D30378553
Pulled By: zhouzhuojie
fbshipit-source-id:
4e0953740793e5e72b95018f8ab2ce4a6a364c38
Masaki Kozuki [Tue, 17 Aug 2021 23:51:34 +0000 (16:51 -0700)]
`F.avg_pool3` CUDA backward: gpuAtomicAddNoReturn -> fastAtomicAdd (#63387)
Summary:
Rel: https://github.com/pytorch/pytorch/issues/62695
In the following two tables, I set `kernel_size` to 3 and `stride` to 2.
In benchmark, input tensors have the shape of (N, C, n_features, n_features, n_features).
Tested on RTX3080 w/ CUDA11.4 Update 1.
## This PR
| N | C | n_features | dtype | time |
|----:|----:|-------------:|:--------------|------------:|
| 32 | 3 | 8 | torch.float16 | 7.46846e-05 |
| 32 | 3 | 8 | torch.float32 | 8.18968e-05 |
| 32 | 3 | 32 | torch.float16 | 0.000156748 |
| 32 | 3 | 32 | torch.float32 | 0.000165236 |
| 32 | 3 | 128 | torch.float16 | 0.00549854 |
| 32 | 3 | 128 | torch.float32 | 0.008926 |
## master (6acd87f)
| N | C | n_features | dtype | time |
|----:|----:|-------------:|:--------------|------------:|
| 32 | 3 | 8 | torch.float16 | 7.60436e-05 |
| 32 | 3 | 8 | torch.float32 | 7.55072e-05 |
| 32 | 3 | 32 | torch.float16 | 0.000189292 |
| 32 | 3 | 32 | torch.float32 | 0.000168645 |
| 32 | 3 | 128 | torch.float16 | 0.00699538 |
| 32 | 3 | 128 | torch.float32 | 0.00890226 |
master's time divided by PR's time is as follows:
| N | C | n_features | master / PR |
|---:|---:|---------------:|----------------:|
| 32 | 3 | 8 | 1.018 |
| 32 | 3 | 32 | 1.208 |
| 32 | 3 | 128 | 1.272|
cc: xwang233 ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63387
Reviewed By: mruberry
Differential Revision:
D30381434
Pulled By: ngimel
fbshipit-source-id:
3b97aee4b0d457a0277a0d31ac56d4151134c099
Nikita Shulga [Tue, 17 Aug 2021 22:28:45 +0000 (15:28 -0700)]
Add pocketfft as submodule (#62841)
Summary:
Using https://github.com/mreineck/pocketfft
Also delete explicit installation of pocketfft during the build as it will be available via submodule
Limit PocketFFT support to cmake-3.10 or newer, as `set_source_files_properties` does not seem to work as expected with cmake-3.5
Partially addresses https://github.com/pytorch/pytorch/issues/62821
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62841
Reviewed By: seemethere
Differential Revision:
D30140441
Pulled By: malfet
fbshipit-source-id:
d1a1cf1b43375321f5ec5b3d0b538f58082f7825
Rohan Varma [Tue, 17 Aug 2021 22:01:21 +0000 (15:01 -0700)]
[wip] Move smallest bucket to end after rebuild buckets (#62279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62279
Before rebuild buckets, `kDefaultFirstBucketBytes` is actually misleading because we reverse the parameter indices when initialize reducer so it is actually the size of the last bucket.
Currently rebuild buckets sets this to be the first bucket size, but seeing if keeping it as last can help perf.
This is currently experimental only and don't plan to land it unless experiments show a clear win.
ghstack-source-id:
135966897
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision:
D29927931
fbshipit-source-id:
55b949986fa2c3bade6fcb4bf5b513461bf0f490
Kevin Tse [Tue, 17 Aug 2021 21:46:22 +0000 (14:46 -0700)]
adding a note to the documentation of polar (#63259)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63259
Fix #52919
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision:
D30342536
Pulled By: NivekT
fbshipit-source-id:
4c61a86f96a6370cc64652bf652c4ae25c9f4601
Jerry Zhang [Tue, 17 Aug 2021 21:40:19 +0000 (14:40 -0700)]
[quant][graphmode][fx][bc-breaking] Support for reference pattern for fixqparam ops in eval mode (#62608)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62608
Insert extra fixeqparam fake quant in the output of fixed qparam ops in fbgemm e.g. sigmoid
so that we can produce reference patterns for these ops
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: iramazanli
Differential Revision:
D30053978
fbshipit-source-id:
c527944b6e791bb4d45ebe96265af52794203695
Dhruv Matani [Tue, 17 Aug 2021 21:39:04 +0000 (14:39 -0700)]
Revert D30281388: [PyTorch] Avoid using std::regex for device string parsing in Device.cpp
Test Plan: revert-hammer
Differential Revision:
D30281388 (https://github.com/pytorch/pytorch/commit/4d6f98ecada2d85b2474b023838debad4305316d)
Original commit changeset:
4d998e9f313e
fbshipit-source-id:
11134b3400cc3e851155c9c1b6fb59308ff1567b
Richard Zou [Tue, 17 Aug 2021 20:39:52 +0000 (13:39 -0700)]
Fix zero-dim handling in torch.matmul (#63359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63359
Fixes #63352. The problem was that in e.g. `torch.matmul(A, B)` with A,
B having shapes [3, 2, 0] and [0, 2], the code attempts to call
`A.view(-1, 0)` which fails due to "-1 being ambiguous". The solution is
to manually compute what we want the shape of the view to be.
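The workaround can be sketched with shapes alone (plain Python; `folded_shape` is a hypothetical helper, not the actual ATen code):

```python
import math


def folded_shape(shape):
    """Shape for folding all but the last dim, even when numel is zero.

    view(-1, 0) fails because -1 cannot be inferred when the tensor has
    zero elements, so compute the leading dimension explicitly instead.
    """
    if len(shape) < 2:
        return shape
    return (math.prod(shape[:-1]), shape[-1])


print(folded_shape((3, 2, 0)))  # (6, 0) -- view(-1, 0) would raise here
```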
Test Plan: - new tests
Reviewed By: ngimel
Differential Revision:
D30351583
Pulled By: zou3519
fbshipit-source-id:
7625691fe8b85d96a4073409596a932c303e3e8c
Mikhail Zolotukhin [Tue, 17 Aug 2021 20:39:36 +0000 (13:39 -0700)]
[TensorExpr] Add a wrapper for all expr and stmt pointers. (#63195)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63195
This helps us to later switch from using KernelArena with raw pointers
to shared pointers without having to change all our source files at
once.
The changes are mechanical and should not affect any functionality.
With this PR, we're changing the following:
* `Add*` --> `AddPtr`
* `new Add(...)` --> `alloc<Add>(...)`
* `dynamic_cast<Add*>` --> `to<Add>`
* `static_cast<Add*>` --> `static_to<Add>`
Due to some complications with args forwarding, some places became more
verbose, e.g.:
* `new Block({})` --> `new Block(std::vector<ExprPtr>())`
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision:
D30292779
Pulled By: ZolotukhinM
fbshipit-source-id:
150301c7d2df56b608b035827b6a9a87f5e2d9e9
Kushashwa Ravi Shrimali [Tue, 17 Aug 2021 20:35:32 +0000 (13:35 -0700)]
OpInfo fix: `conv_transpose2d` (#63389)
Summary:
Addresses comment: https://github.com/pytorch/pytorch/pull/62882#issuecomment-899679606.
cc: mruberry ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63389
Reviewed By: mruberry
Differential Revision:
D30377481
Pulled By: ngimel
fbshipit-source-id:
0fa21acc3503c259c9b27463e8555247c43d9e2e
Mike Iovine [Tue, 17 Aug 2021 20:34:44 +0000 (13:34 -0700)]
[Static Runtime] Implement aten::append (#63350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63350
Add a native implementation for `aten::append`, the list append op.
Test Plan: New unit test: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Append`
Reviewed By: hlu1
Differential Revision:
D30326461
fbshipit-source-id:
0dbdf6cc82e78c7c36db39583256f6b87385e3d3
Ivan Kobzarev [Tue, 17 Aug 2021 20:34:20 +0000 (13:34 -0700)]
[vulkan] Add log_softmax (#63193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63193
Test Plan: Imported from OSS
Reviewed By: SS-JIA
Differential Revision:
D30291987
fbshipit-source-id:
89c6560274e5a841e5af249f6963b67ef6826f4c
Supriya Rao [Tue, 17 Aug 2021 18:39:16 +0000 (11:39 -0700)]
[quant][fx] Ensure qconfig works for QAT with multiple modules (#63343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63343
The previous implementation had a bug where we were trying to modify an ordered dict value while iterating through it.
This fixes it by creating a copy before modifying it.
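The bug pattern and the fix, in miniature (an illustrative dict, not the real qconfig data):

```python
from collections import OrderedDict

qconfig_dict = OrderedDict(conv="qat_qconfig", linear="qat_qconfig")

# Buggy: adding keys to an (ordered) dict while iterating over it directly
# raises RuntimeError. Fixed: iterate over a copy of the keys, then mutate.
for name in list(qconfig_dict):
    qconfig_dict[name + "_fused"] = qconfig_dict[name]

print(sorted(qconfig_dict))  # ['conv', 'conv_fused', 'linear', 'linear_fused']
```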
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qconfig_qat_module_type
Imported from OSS
Reviewed By: raghuramank100
Differential Revision:
D30346116
fbshipit-source-id:
0e33dad1163e8bff3fd363bfd04de8f7114d7a3a
Yi Wang [Tue, 17 Aug 2021 18:28:43 +0000 (11:28 -0700)]
Add return type hint and improve the docstring of consume_prefix_in_state_dict_if_present method (#63388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63388
Context: https://discuss.pytorch.org/t/how-to-use-the-helper-function-consume-prefix-in-state-dict-if-present/129505/3
Make it clear that this method strips the prefix in place rather than returns a new value.
Additional reformatting is also applied.
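A minimal in-place sketch of the behavior (simplified and hypothetical; the real helper also preserves key order and handles metadata):

```python
def consume_prefix(state_dict, prefix):
    """Strip prefix from matching keys in place; returns None like the helper."""
    for key in list(state_dict.keys()):
        if key.startswith(prefix):
            state_dict[key[len(prefix):]] = state_dict.pop(key)


sd = {"module.weight": 1, "module.bias": 2, "other": 3}
consume_prefix(sd, "module.")
print(sd)  # {'other': 3, 'weight': 1, 'bias': 2} -- mutated, nothing returned
```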
ghstack-source-id:
135973393
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision:
D30360931
fbshipit-source-id:
1a0c7967a4c86f729e3c810686c21dec43d1dd7a
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Add handling of ifs to shape propagation (#62914)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62914
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30196945
Pulled By: eellison
fbshipit-source-id:
1c0c7f938c4547330fd1dba8ab7dd0b99a79b6a9
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Small shape analysis changes (#62911)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62911
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision:
D30196946
Pulled By: eellison
fbshipit-source-id:
2562bab323088d9c1440ae0431e533f9bcc513d3
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Add a few peepholes (#62910)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/62910
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30196947
Pulled By: eellison
fbshipit-source-id:
d88c92616d4de4f47ff4fcf5c1994e629ca20395
Elias Ellison [Tue, 17 Aug 2021 18:21:50 +0000 (11:21 -0700)]
Propagate symbolic dimensions through idioms like x.view(y.size()) (#61975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61975
Propagate symbolic dimensions through size calls. We did this by associating SymbolicSizes with integer inputs by looking through their constructors for `x.size(1)` or `x.size()` nodes.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision:
D30196948
Pulled By: eellison
fbshipit-source-id:
377fc1d2f6d396c52dc0e87fa814b15720f1414e
Jerry Zhang [Tue, 17 Aug 2021 17:41:38 +0000 (10:41 -0700)]
[fx2trt] Refactor linear op to use mm + add
Summary:
Previously, linear was translated to fully_connected, which only works when the weight is a constant. This diff changes that to mm + add so the weight can be an ITensor, allowing the weight - quantize - dequantize pattern in the produced TensorRT network.
pattern in the produced TensorRT network
Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_linear
Reviewed By:
842974287
Differential Revision:
D30294751
fbshipit-source-id:
596fbd4c81caef8df41a002a2e14fbf22d9d2a80