platform/upstream/pytorch.git
Xiaodong Wang [Tue, 24 Aug 2021 22:45:59 +0000 (15:45 -0700)]
Turn off layer norm in jit symbolic differentiation (#63816)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63816

Test Plan:
Confirmed this can rescue the NE:

https://www.internalfb.com/mast/job/torchx_xdwang-SparseNNApplication_72cf593d

Reviewed By: ngimel

Differential Revision: D30498746

fbshipit-source-id: 4a387f32ee2f70685de6104459c7f21bfbddc187

Alban Desmaison [Tue, 24 Aug 2021 22:32:42 +0000 (15:32 -0700)]
Add a common autograd TLS state (#63860)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63860

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Differential Revision: D30513253

Pulled By: albanD

fbshipit-source-id: 97d76ed54dfbdf4ba3fc7051ce3b9bb636cefb4b

Eli Uriegas [Tue, 24 Aug 2021 21:13:04 +0000 (14:13 -0700)]
.github: Enable with-ssh for Windows (#63440)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63440

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: janeyx99

Differential Revision: D30521460

Pulled By: seemethere

fbshipit-source-id: e987e170e73fb4f9d9f024bed0e58404ed206848

James Reed [Tue, 24 Aug 2021 20:44:52 +0000 (13:44 -0700)]
[FX] Fix _replicate_for_data_parallel (#63821)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63821

Test Plan: Imported from OSS

Reviewed By: suo

Differential Revision: D30502115

Pulled By: jamesr66a

fbshipit-source-id: 0f004f95def6e1ba21ccbeab40cb0a739a0ad20c

soulitzer [Tue, 24 Aug 2021 20:02:27 +0000 (13:02 -0700)]
Do not modify saved variables in-place for spectral norm during power iteration (#62293)

Summary:
Interestingly enough, the original code did have a mechanism intended to prevent this very issue, but it performs the clone AFTER modifying u and v in-place.
That doesn't work, because the cloned u and v can later be used in operations that save them for backward, and the next forward pass then modifies those same clones in-place.
So, to avoid modifying a saved variable in-place, we should clone it BEFORE the in-place operation.
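A minimal pure-Python sketch of why the order matters, assuming a hypothetical `Vec` class that stands in for a tensor and mimics autograd's version counter. The real fix operates on PyTorch tensors inside spectral norm; `forward_buggy`, `forward_fixed`, `save_for_backward`, and `is_still_valid` are illustrative names only.

```python
class Vec:
    """Hypothetical stand-in for a tensor with an autograd-style version counter."""

    def __init__(self, data):
        self.data = list(data)
        self.version = 0

    def mul_(self, k):
        # In-place update: mutates this object and bumps its version.
        for i in range(len(self.data)):
            self.data[i] *= k
        self.version += 1
        return self

    def clone(self):
        # A fresh object with its own storage and version 0.
        return Vec(self.data)


def save_for_backward(v):
    # Record the object together with the version seen at save time.
    return (v, v.version)


def is_still_valid(saved):
    # Autograd-style check: the saved object must not have changed since.
    v, version_at_save = saved
    return v.version == version_at_save


def forward_buggy(state):
    state["u"].mul_(0.5)             # in-place update first...
    state["u"] = state["u"].clone()  # ...then clone (too late)
    return save_for_backward(state["u"])


def forward_fixed(state):
    u = state["u"].clone()           # clone first...
    u.mul_(0.5)                      # ...then update the private copy
    state["u"] = u
    return save_for_backward(u)
```

Calling either forward twice and then checking the value saved by the first call shows the difference: with the buggy ordering the second call mutates the clone that the first call saved, while the fixed ordering leaves it untouched.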

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62293

Reviewed By: bdhirsh

Differential Revision: D30489750

Pulled By: soulitzer

fbshipit-source-id: cbe8dea885aef97adda8481f7a822e5bd91f7889

Peter Bell [Tue, 24 Aug 2021 19:43:27 +0000 (12:43 -0700)]
Migrate legacy lstsq from THC to ATen (CUDA) (#63504)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63504

Closes gh-24592

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441304

Pulled By: ngimel

fbshipit-source-id: ec176596f54bc084af48a73d1dbb0dcb82fec593

Edward Yang [Tue, 24 Aug 2021 19:19:16 +0000 (12:19 -0700)]
Revert D30513613: Removing tensor.data usage in utils with tensor set_ method

Test Plan: revert-hammer

Differential Revision:
D30513613 (https://github.com/pytorch/pytorch/commit/d08a36f831cbcb4516fc1b68e3e3deff8ab45aba)

Original commit changeset: 402efb9c30fa

fbshipit-source-id: 911c66a9852de77dc5274b5fb373258c0c97739a

Bo Wang [Tue, 24 Aug 2021 18:45:54 +0000 (11:45 -0700)]
Merge common fields from TensorInitParams and ShardedTensorMetadata into TensorProperties (#63731)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63731
1) Follow-up to the [last comment on PR/63378](https://github.com/pytorch/pytorch/pull/63378#discussion_r693143053)
2) Also updated the caller side (usages of ShardedTensorMetadata) in fbcode

Ref: [landing workflow 3](https://www.internalfb.com/intern/wiki/PyTorch/PyTorchDev/Workflow/Landing/#landing-your-prs-from-gi-1)

Test Plan:
Imported from OSS

OSS: (pytorch).. $ python test/distributed/_sharded_tensor/test_sharded_tensor.py --v
FB:  fbcode $ buck test mode/dev //aiplatform/modelstore/checkpointing/pyper/tests:checkpoint_utils_test

Reviewed By: wanchaol, heitorschueroff

Differential Revision: D30472281

fbshipit-source-id: 727fb0e7f10eab4eb7a10476194e9008f2ac1fb5

Aayush Prakash [Tue, 24 Aug 2021 18:19:34 +0000 (11:19 -0700)]
Removing tensor.data usage in utils with tensor set_ method (#63867)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63867

When updating a model parameter, writing to `parameter.data` is no longer recommended, because the `data` field will be deprecated in the future.

The replacement is `tensor.set_`.

ghstack-source-id: 136531233

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager

Reviewed By: SciPioneer

Differential Revision: D30513613

fbshipit-source-id: 402efb9c30fafc3f285bebc631639f656ceae585

Yi Zhang [Tue, 24 Aug 2021 17:50:57 +0000 (10:50 -0700)]
update readme and contributing.md (#63843)

Summary:
1. Visual Studio isn't actually supported as a CMake generator.
2. I have been asked many times why the error 'Could NOT find OpenMP' appears.
3. Added a link to the newly added Best Practices page in CONTRIBUTING.md.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63843

Reviewed By: seemethere, heitorschueroff

Differential Revision: D30514095

Pulled By: janeyx99

fbshipit-source-id: 76715a1d8c049122546e5a7778cafe54e4dfd5d6

peterjc123 [Tue, 24 Aug 2021 17:44:45 +0000 (10:44 -0700)]
Subprocess encoding fixes for cpp extension (#63756)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/63584

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63756

Reviewed By: bdhirsh

Differential Revision: D30485046

Pulled By: ezyang

fbshipit-source-id: 4f0ac383da4e8843e2a602dceae85f389d7434ee

mingfeima [Tue, 24 Aug 2021 17:30:18 +0000 (10:30 -0700)]
add bf16 support for bucketize (#55588)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55588

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836796

Pulled By: VitalyFedyunin

fbshipit-source-id: c9ae5b969c30a45473533be5f29bb497f8da5143

Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] modify base pruner to prune bias by default (#63202)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63202

By default, the pruner will also prune biases, so that the whole output channel is removed. The user can set `also_prune_bias` to False when calling `prepare` if they don't want the bias to be pruned.
ghstack-source-id: 136466671
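Why the bias matters: if only a channel's weights are zeroed, that output channel still emits its bias, so it is not really removed. A minimal pure-Python sketch for a linear layer; only the flag name `also_prune_bias` comes from the commit, while `prune_output_channels` and `linear` are hypothetical helpers, not the `torch.ao` pruner implementation.

```python
def prune_output_channels(weight, bias, pruned, also_prune_bias=True):
    """Zero out whole output channels of a linear layer.

    weight: list of rows, one row per output channel
    bias: list with one entry per output channel, or None
    pruned: set of output-channel indices to prune
    """
    new_weight = [
        [0.0] * len(row) if ch in pruned else list(row)
        for ch, row in enumerate(weight)
    ]
    new_bias = None
    if bias is not None:
        new_bias = [
            0.0 if (also_prune_bias and ch in pruned) else b
            for ch, b in enumerate(bias)
        ]
    return new_weight, new_bias


def linear(weight, bias, x):
    # y[ch] = sum_i weight[ch][i] * x[i] + bias[ch]
    out = []
    for ch, row in enumerate(weight):
        y = sum(w * xi for w, xi in zip(row, x))
        out.append(y + (bias[ch] if bias is not None else 0.0))
    return out
```

With `also_prune_bias=False`, a pruned channel's output equals its bias value instead of zero.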

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MV32

Modified `fusion_tests` according to the API change
`buck test mode/opt //scripts/kazhou:fusion_tests`

https://pxl.cl/1NbKz

Reviewed By: z-a-f

Differential Revision: D30294494

fbshipit-source-id: c84655648bee0035559195ca855b98fb7edaa134

Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] amend base pruner API to match base sparsifier (#63178)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63178

Update base pruner API to match base sparsifier API as defined in D28970960 / PR58955

Changes include:
- `enable_mask_update = True` in `__init__`
- `prepare` takes model and config instead of constructor
- convert functionality renamed to `squash_mask`; calling `convert` now raises an error
- `activation_handles` and `bias_handles` initialized in `_prepare` instead of the constructor
ghstack-source-id: 136467595

Test Plan:
Updated function names according to the changes

`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MTgH

TODO: will need to modify `fbcode/scripts/kazhou/fusion_tests.py` to use the new API

Reviewed By: z-a-f

Differential Revision: D30287179

fbshipit-source-id: d4727bea1873b500f2d4bb784db26d532bf26cce

Karen Zhou [Tue, 24 Aug 2021 17:17:28 +0000 (10:17 -0700)]
[pruner] refactor `ActivationReconstruction` forward hooks (#63158)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63158

Combined functionality for `ActivationReconstruction` for both Linear and Conv2d in one class. The only difference between the old classes was the size and indexing of the reconstructed tensor -- that logic can be generalized by iterating over the size of `output`.
ghstack-source-id: 136467465
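The generalization can be sketched in pure Python: if channels always sit on dim 1, the same loop works whether each channel is a scalar (Linear, shape `(N, C)`) or an `H x W` plane (Conv2d, shape `(N, C, H, W)`). This sketch assumes pruned channels are re-inserted as zeros and that at least one channel was kept; `reconstruct_output` and `zeros_like` are hypothetical helpers operating on nested lists, not the actual hook.

```python
def zeros_like(x):
    # Recursively build a zero-filled structure with the same nested shape.
    if isinstance(x, list):
        return [zeros_like(v) for v in x]
    return 0.0


def reconstruct_output(output, kept_channels, num_channels):
    """Re-insert pruned channels (as zeros) along dim 1.

    output: nested list of shape (N, len(kept_channels), ...), where "..."
            is empty for Linear and (H, W) for Conv2d
    kept_channels: original indices of the channels that survived pruning
    num_channels: number of output channels before pruning
    """
    result = []
    for sample in output:
        # Any kept channel gives the per-channel shape (scalar, or H x W).
        zero_channel = zeros_like(sample[0])
        full = [zeros_like(zero_channel) for _ in range(num_channels)]
        for src, dst in enumerate(kept_channels):
            full[dst] = sample[src]
        result.append(full)
    return result
```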

Test Plan:
`buck test mode/dev-nosan //caffe2/test:ao -- TestBasePruner`

https://pxl.cl/1MSSv

Reviewed By: raghuramank100

Differential Revision: D30282765

fbshipit-source-id: 08a1e4e0650511019fff85cf52b41dd818b0c7f8

Mike Iovine [Tue, 24 Aug 2021 16:38:25 +0000 (09:38 -0700)]
[Static Runtime] Implement prim::VarStack out variant (#63579)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63579

Provide a static runtime out variant implementation for the new op introduced in D30426232 (https://github.com/pytorch/pytorch/commit/1385f9fb12e6607c98d2d9d5edaaaab2bc07386f).

Test Plan: `buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- IndividualOps_VarStack`

Reviewed By: navahgar

Differential Revision: D30410525

fbshipit-source-id: bc59a3d8ad23e3d94561ec2dca9cc20687dbadf8

Xiang Gao [Tue, 24 Aug 2021 16:24:50 +0000 (09:24 -0700)]
[Reland] Embedding thrust->cub migration (#63806)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/63427

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63806

Reviewed By: bdhirsh

Differential Revision: D30498255

Pulled By: ngimel

fbshipit-source-id: 78b7085a92a168cf0163f53dcb712bac922f5235

mingfeima [Tue, 24 Aug 2021 15:54:36 +0000 (08:54 -0700)]
optimize BFloat16 elemwise operators CPU: sigmoid, sigmoid_backward, tanh_backward, addcmul, addcdiv (#55221)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55221

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28836797

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b79098c902ffe65d228668118ef36fb49bab800

yanbing-j [Tue, 24 Aug 2021 15:32:33 +0000 (08:32 -0700)]
Enable BFloat16 LeakyReLU and RReLU in CPU path (#61514)

Summary:
Enable and optimize BFloat16 LeakyReLU and RReLU in CPU path.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61514

Reviewed By: ejguan

Differential Revision: D30257612

Pulled By: VitalyFedyunin

fbshipit-source-id: 8cc0d1faacd02dcc9827af724a86d95b6952748f

Thomas J. Fan [Tue, 24 Aug 2021 15:26:21 +0000 (08:26 -0700)]
ENH Adds no_batch_dim for NLLLoss (#62651)

Summary:
Towards https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62651

Reviewed By: VitalyFedyunin

Differential Revision: D30303340

Pulled By: jbschlosser

fbshipit-source-id: 7ab478cf63bf6cd1f850cad5fd101e74a2cfe3f5

mingfeima [Tue, 24 Aug 2021 15:22:47 +0000 (08:22 -0700)]
fix batchnorm2d issue when input is non contiguous (#63392)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63392

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30476317

Pulled By: VitalyFedyunin

fbshipit-source-id: 03055a0aec21cf2c029b6f32315da2b09cb722d0

Mike Iovine [Tue, 24 Aug 2021 15:19:38 +0000 (08:19 -0700)]
[JIT] Add variadic stack op (#63578)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63578

Added a new op `prim::VarStack` and a pass that transforms instances of `aten::stack(list, dim)` into `prim::VarStack(list[0], ..., list[n], dim)`. Also provided a JIT interpreter implementation.

Most of the implementation/tests are the same as `prim::VarConcat`.
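The rewrite can be sketched on a toy IR in pure Python. Because `prim::VarConcat` follows the same shape, the pass is written once and parameterized by op names. The `Node` class and `use_variadic_op` are hypothetical; the sketch assumes each list is built by `prim::ListConstruct`, is not mutated, and is not a graph output, conditions a real JIT pass would have to check explicitly.

```python
from dataclasses import dataclass


@dataclass
class Node:
    kind: str     # e.g. "aten::stack"
    inputs: list  # names of input values
    output: str   # name of the single output value


def use_variadic_op(nodes, op, variadic_op):
    """Rewrite op(ListConstruct(a, ..., z), dim) into variadic_op(a, ..., z, dim)."""
    producers = {n.output: n for n in nodes}
    rewritten = []
    for n in nodes:
        if n.kind == op:
            list_value, dim = n.inputs
            src = producers.get(list_value)
            if src is not None and src.kind == "prim::ListConstruct":
                n = Node(variadic_op, [*src.inputs, dim], n.output)
        rewritten.append(n)
    # Drop list constructions that no longer have any users.
    used = {v for n in rewritten for v in n.inputs}
    return [
        n for n in rewritten
        if not (n.kind == "prim::ListConstruct" and n.output not in used)
    ]
```

For example, `use_variadic_op(graph, "aten::cat", "prim::VarConcat")` applies the same transformation to concatenation.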

Test Plan: `buck test caffe2/test/cpp/jit:jit -- TestStackOpt`

Reviewed By: navahgar

Differential Revision: D30426232

fbshipit-source-id: 9829a7db6e0a5038c9b7528c43c25b0c221aa2ce

Rong Rong (AI Infra) [Tue, 24 Aug 2021 15:01:36 +0000 (08:01 -0700)]
[BE] add distributed run_test options (#63147)

Summary:
Currently, distributed tests are mixed in with test_python.
We would like to run the distributed tests as their own batch, so they need to be split out.

This adds an option to include/exclude distributed tests with CUSTOM_HANDLERS.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63147

Test Plan:
- ran locally with the additional run_test.py options.
- CI

Dependency: found a bug in the mpiexec test; https://github.com/pytorch/pytorch/issues/63580 needs to land first to fix it.

Reviewed By: bdhirsh

Differential Revision: D30496178

Pulled By: walterddr

fbshipit-source-id: 7903a57b619f2425028028f944211938823918a6

Alban Desmaison [Tue, 24 Aug 2021 14:20:56 +0000 (07:20 -0700)]
Revert D30388099: Add a common autograd TLS state

Test Plan: revert-hammer

Differential Revision:
D30388099 (https://github.com/pytorch/pytorch/commit/83d9bad44a1e1e6202103cd22e4dbd2bd3d7dae0)

Original commit changeset: 8e03f940150f

fbshipit-source-id: f6d60fec66e8292f5268335bb8a3e7e1a662f23b

Thomas J. Fan [Tue, 24 Aug 2021 13:58:05 +0000 (06:58 -0700)]
ENH Adds no_batch_dim tests/docs for LPPool1d and Identity (#62190)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/60585

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62190

Reviewed By: ejguan

Differential Revision: D29942385

Pulled By: jbschlosser

fbshipit-source-id: 00df6f6f01ad039631bb8679f8de94863aac7650

albanD [Tue, 24 Aug 2021 13:52:38 +0000 (06:52 -0700)]
Add a common autograd TLS state (#63114)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63114

This PR collapses the GradMode and InferenceMode thread local booleans into a single thread local uint8.
This helps reduce the number of thread-local variable accesses done when we propagate ThreadLocalStates.

Note that this is even more beneficial as we will add a forward mode AD TLS (similar to GradMode) higher in this stack and this new structure should reduce the perf impact of adding this new TLS.
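The idea of packing several thread-local booleans into one small integer can be sketched in pure Python. The bit positions and function names here are hypothetical, and the real C++ layout may differ; the point is that reading or restoring all autograd flags becomes one thread-local access instead of one per flag.

```python
import threading

# Hypothetical bit positions; the real C++ layout may differ.
GRAD_MODE = 1 << 0
INFERENCE_MODE = 1 << 1
DEFAULT_STATE = GRAD_MODE  # grad mode on, inference mode off

_tls = threading.local()


def get_state():
    # One thread-local read returns every autograd flag at once.
    return getattr(_tls, "bits", DEFAULT_STATE)


def set_state(bits):
    # Restoring a snapshot (e.g. when propagating state to another thread)
    # is a single thread-local write instead of one per flag.
    _tls.bits = bits & 0xFF  # keep the value within a uint8


def is_enabled(flag):
    return bool(get_state() & flag)


def set_enabled(flag, enabled):
    bits = get_state()
    set_state(bits | flag if enabled else bits & ~flag)
```

In this sketch, propagating state to a worker thread amounts to passing `get_state()` from the parent and calling `set_state(...)` in the child; adding another flag (such as a forward-AD mode) only claims another bit.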

Here is the full benchmark result between master and the top of this stack: https://gist.github.com/albanD/e421101e9ed344e94999bef3a54bf0f3
tl;dr: it gives a benefit in most cases and is never detrimental.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30388099

Pulled By: albanD

fbshipit-source-id: 8e03f940150ff063c2edd792733663413ae2f486

Marjan Fariborz [Tue, 24 Aug 2021 08:43:33 +0000 (01:43 -0700)]
Separating quantization test from distributed_test (#63058)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63058

Dedicates separate tests to different quantization methods. Currently only the FP16 method is supported.
ghstack-source-id: 136499767

Test Plan: buck test mode/dev //caffe2/test/distributed/algorithms/quantization:quantization_gloo_fork -- name_of_the_test

Reviewed By: wanchaol

Differential Revision: D30142580

fbshipit-source-id: 3aacec1a231a662067d2b48c001f0c69fefcdd60

Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Nuke KernelArena and KernelScope. (#63587)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63587

Now that no classes use KernelArena for memory management, we
can remove it.

Differential Revision:
D30429115

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 375f6f9294d27790645eeb7cb5a8e87047a57544

Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Make 'Tensor' a value type. (#63586)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63586

This is another commit in transition from KernelArena memory management.
Tensor is essentially just a pair of <BufPtr, StmtPtr> and we don't need
to dynamically allocate it at all - it's cheap to pass it by value, and
that's what we're switching to in this commit.

After this change nothing uses KernelScope/KernelArena and they can be
safely removed.

Differential Revision:
D30429114

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: f90b859cfe863692b7beffbe9bd0e4143df1e819

Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] Switch Exprs and Stmt from kernel-arena to shared_ptr. (#63216)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63216

Currently there are three classes managed by KernelArena: Expr, Stmt,
and Tensor (and derived classes). KernelArena has been a long-standing
pain point for NNC devs and we're moving away from that memory management
model to ref-count based memory model (using shared_ptr). This commit
switches Expr and Stmt to shared_ptr and is the biggest change in this
transition. Later commits will detach Tensor from KernelArena and kill
the arena + scope altogether.

Differential Revision:
D30353195

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 9575225ada3d0fb65087ae40435f3dfea4792cae

Mikhail Zolotukhin [Tue, 24 Aug 2021 07:29:22 +0000 (00:29 -0700)]
[TensorExpr] More NFC changes like Expr* -> ExprPtr. (#63778)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63778

This is a preparation for a switch from raw pointers to shared pointers
as a memory model for TE expressions and statements.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30487425

Pulled By: ZolotukhinM

fbshipit-source-id: 9cbe817b7d4e5fc2f150b29bb9b3bf578868f20c

mingfeima [Tue, 24 Aug 2021 05:53:35 +0000 (22:53 -0700)]
add channels last for GroupNorm (#49821)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49821

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D26007053

Pulled By: VitalyFedyunin

fbshipit-source-id: 34a48d5d3b66a159febf3c3d96748fbaba1b9e31

Jane Xu [Tue, 24 Aug 2021 01:44:46 +0000 (18:44 -0700)]
Add ROCm as a platform for which tests can be disabled (#63813)

Summary:
Realized we were missing ROCm as a platform on which one could disable a flaky test (similar to how this issue specifies Windows: https://github.com/pytorch/pytorch/issues/61655).

cc jeffdaily sunway513 jithunnair-amd ROCmSupport

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63813

Reviewed By: seemethere

Differential Revision: D30498478

Pulled By: janeyx99

fbshipit-source-id: f1abe8677e1ddd01de3291e1618272ad8e287dc4

Mike Iovine [Tue, 24 Aug 2021 01:43:17 +0000 (18:43 -0700)]
[Static Runtime] SR clones graph input (#63704)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63704

Previously SR did not clone the graph. This was leading to subtle bugs in `testStaticRuntime`; static runtime would modify its graph, and the graph used by the JIT interpreter would change as well. The JIT interpreter would then crash if SR-only ops were added!

Cloning the graph is more consistent with the behavior of the `Module` ctor.
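The aliasing problem can be reproduced with plain Python objects: if a runtime rewrites a graph it was handed without copying it, the caller's graph changes too. A minimal sketch using `copy.deepcopy`; `ToyStaticRuntime` and the op name `static_runtime::fused_op` are hypothetical stand-ins, not Static Runtime's actual classes or ops.

```python
import copy


class ToyStaticRuntime:
    """Hypothetical stand-in for Static Runtime's graph handling."""

    def __init__(self, graph, clone_graph=True):
        # Cloning gives the runtime a private copy it can freely rewrite.
        self.graph = copy.deepcopy(graph) if clone_graph else graph
        self._optimize()

    def _optimize(self):
        # Hypothetical SR-only rewrite that another interpreter couldn't run.
        self.graph["nodes"] = [
            "static_runtime::fused_op" if op == "aten::add" else op
            for op in self.graph["nodes"]
        ]
```

Without the clone, the caller's graph dict ends up containing the SR-only op, which is the kind of leak that made the JIT interpreter crash in `testStaticRuntime`.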

Test Plan: `buck test caffe2/benchmarks/static_runtime/...`

Reviewed By: hlu1

Differential Revision: D30463294

fbshipit-source-id: b771551a1f55f95fde79373b23babcf3e5ddf726

Shiyan Deng [Tue, 24 Aug 2021 01:17:20 +0000 (18:17 -0700)]
[fx2trt] Add acc op and converter for torch.pow (#63795)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63795

As titled.

Test Plan: buck run mode/opt caffe2/torch/fb/fx2trt:test_binary_ops

Reviewed By: jackm321, wushirong

Differential Revision: D30492488

fbshipit-source-id: 6d615770567b13720316f06fd2f866ea2fdc2995

Vitaly Fedyunin [Tue, 24 Aug 2021 01:07:37 +0000 (18:07 -0700)]
Adding DataLoader2 class as future replacement of DataLoader (#63742)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63742

Supports sharding and batching at the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30494506

Pulled By: VitalyFedyunin

fbshipit-source-id: 6648e09d955055ac38e3a4e3973f701acefca762

Rohan Varma [Tue, 24 Aug 2021 00:45:39 +0000 (17:45 -0700)]
[BE] Enable PostLocalSGD tests on windows (#63463)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63463

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, the local SGD optimizer can be used on Windows.
ghstack-source-id: 136437632

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358922

fbshipit-source-id: 9b56aebf1075f026637296d338805ad8851c9d40

Rohan Varma [Tue, 24 Aug 2021 00:45:39 +0000 (17:45 -0700)]
[BE] Enable functional optim tests for windows (#63462)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63462

Now that `torch.distributed.optim` gates DistributedOptimizer on RPC availability, these tests can be run on Windows.
ghstack-source-id: 136437635

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30358923

fbshipit-source-id: 36739bdfe7214789f17de652d30c62c2bc124c73

Shiyan Deng [Tue, 24 Aug 2021 00:41:38 +0000 (17:41 -0700)]
[fx_acc] Add mapper for torch.log1p (#63792)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63792

Map `torch.log1p` to `acc_ops.add` + `acc_ops.log`.
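The mapping relies on the identity log1p(x) = log(1 + x). One floating-point trade-off worth keeping in mind: for |x| much smaller than 1, computing `1 + x` first rounds away low-order bits of `x`, so the decomposed form can be noticeably less accurate than a dedicated log1p. A minimal sketch using Python's `math` module as a stand-in for the acc ops (`log1p_via_add_log` is a hypothetical name); whether this matters for the target backend is not stated in the commit.

```python
import math


def log1p_via_add_log(x):
    # Mirrors the decomposition: add(x, 1) followed by log(...).
    return math.log(x + 1.0)


# Agrees closely for moderate x...
moderate_err = abs(log1p_via_add_log(0.5) - math.log1p(0.5))
# ...but loses relative accuracy when x is tiny.
tiny_rel_err = abs(log1p_via_add_log(1e-12) - math.log1p(1e-12)) / math.log1p(1e-12)
```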

Test Plan: buck test mode/opt glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_log1p

Reviewed By: wushirong

Differential Revision: D30491706

fbshipit-source-id: bcbeddf06131113185d2019cfd7cf5e9193a8a78

Peter Bell [Tue, 24 Aug 2021 00:39:50 +0000 (17:39 -0700)]
Fix pocketfft include path in mobile build (#63714)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63714

PocketFFT was disabled for CMake < 3.9, but CMake 3.11 is the first version to support `INCLUDE_DIRECTORIES` as a source file property, so updating to CMake 3.10 caused the mobile builds to fail. Instead of limiting the CMake support, this just adds the include directory to the entire target.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498369

Pulled By: malfet

fbshipit-source-id: 83372e29c477c97e7015763b7c29d6d7e456bcef

Peter Bell [Tue, 24 Aug 2021 00:39:45 +0000 (17:39 -0700)]
Simplify ccache instructions in CONTRIBUTING.md (#62549)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62549

When building CUDA files with native CMake support, it will respect the
`CMAKE_CUDA_COMPILER_LAUNCHER` setting. So, there's no need for symlinks.

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D30498488

Pulled By: malfet

fbshipit-source-id: 71c2ae9d4570cfac2a64d777bc95cda3764332a0

driazati [Tue, 24 Aug 2021 00:30:51 +0000 (17:30 -0700)]
Skip archiving useless build artifacts (#63785)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63785

We currently zip up everything in `build/`, which includes a lot of cruft (`.o` files, random things copied in from dependencies, etc.). This makes the artifact bigger (slower upload/download times), and archiving takes about 1.5 minutes. With this change, archiving takes ~15 seconds, and the 50-second upload-to-GitHub step is removed, since it isn't as useful now that the HUD PR page lists all artifacts.
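A minimal sketch of selective archiving with Python's standard `zipfile` module. The exclusion rule (`.o` and `.obj` suffixes) is a hypothetical filter based on the cruft named above; the actual workflow's include/exclude rules are not shown in this commit message, and `archive_build` is an illustrative name.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical filter: object files are the cruft named in the commit.
EXCLUDED_SUFFIXES = {".o", ".obj"}


def archive_build(build_dir, zip_path):
    """Zip build_dir into zip_path, skipping excluded files. Returns archived names."""
    build_dir = Path(build_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(build_dir.rglob("*")):
            if path.is_file() and path.suffix not in EXCLUDED_SUFFIXES:
                name = path.relative_to(build_dir).as_posix()
                zf.write(path, arcname=name)
                archived.append(name)
    return archived


if __name__ == "__main__":
    # Small demo in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        build = Path(tmp, "build")
        (build / "lib").mkdir(parents=True)
        (build / "foo.o").write_bytes(b"object code")
        (build / "lib" / "libfoo.so").write_bytes(b"shared library")
        print(archive_build(build, Path(tmp, "artifacts.zip")))
```

Skipping files before compression, rather than deleting them afterwards, is what saves the archiving time.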

Test Plan: Imported from OSS

Reviewed By: seemethere, janeyx99

Differential Revision: D30494444

Pulled By: driazati

fbshipit-source-id: 93202dba7387daeb4859a938110b02ff2dc2ccc4

Bert Maher [Tue, 24 Aug 2021 00:28:33 +0000 (17:28 -0700)]
Fix some memory bugs in onnx passes (#63754)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63754

Running the ONNX tests with ASAN uncovers several memory errors. These two are caused by (1) iterating over the uses list of a node after it has been mutated, and (2) accessing the `blocks` attribute of a possibly deleted node.

To reproduce (this is on a CentOS 7 box):
```
DEBUG=1 CFLAGS="-fsanitize=address" CXXFLAGS="-fsanitize=address" USE_LLVM=$(realpath ../llvm-project/install) CMAKE_PREFIX_PATH=$CONDA_PREFIX python setup.py install
LD_PRELOAD=$(realpath /lib64/libasan.so.5) numactl -C3 pytest -v --cov --cov-report xml:test/coverage.xml --cov-append onnx/test_pytorch_onnx_onnxruntime.py::TestONNXRuntime_opset11 -s
```
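The first bug class, iterating over a list while the loop body mutates it, has a direct Python analog. In C++ it shows up as iterator invalidation or use-after-free (which is what ASAN reports); in Python it silently skips elements. A minimal sketch with hypothetical function names, showing the usual fix of iterating over a snapshot:

```python
def drop_even_buggy(values):
    # Mutating the list while iterating over it shifts later elements,
    # so the element right after each removal is never visited.
    for v in values:
        if v % 2 == 0:
            values.remove(v)
    return values


def drop_even_fixed(values):
    # Iterate over a snapshot; mutate the original freely.
    for v in list(values):
        if v % 2 == 0:
            values.remove(v)
    return values
```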

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30493939

Pulled By: bertmaher

fbshipit-source-id: e16e19dc9b4c9896e102ca8bf04c8bedfdde87af

Mike Iovine [Tue, 24 Aug 2021 00:26:27 +0000 (17:26 -0700)]
[JIT] Move UseVariadicCat internals (#63577)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63577

Since other variadic ops will have an almost identical implementation, we can generalize the `UseVariadicCat` implementation and put it in a common folder.

Also moved some test utilities that other variadic op tests will likely need.

Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOptTest`

Reviewed By: navahgar

Differential Revision: D30409937

fbshipit-source-id: 925c11c27b58ce98cb8368d2a205e26ba66d3db9

Akshit Khurana [Mon, 23 Aug 2021 23:33:07 +0000 (16:33 -0700)]
Fix typo in NNAPI tests (#63797)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63797

The NNAPI memory format test has a typo.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI

Imported from OSS

Reviewed By: Amyh11325

Differential Revision: D30495473

fbshipit-source-id: 8edad7c01a080847a64a2797e077ec4d6077552a

Don Jang [Mon, 23 Aug 2021 23:20:27 +0000 (16:20 -0700)]
[Static Runtime] Add an out variant op for aten::abs (#63675)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63675

This change adds an out variant implementation for `aten::abs`.
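The difference between a functional op and an out variant can be sketched with Python lists standing in for tensors: the functional form allocates a new result on every call, while the out form writes into a caller-owned buffer that a runtime can keep and reuse across iterations. Both function names are hypothetical; this is not Static Runtime's implementation.

```python
def abs_functional(xs):
    # Allocates a new output on every call.
    return [abs(x) for x in xs]


def abs_out(xs, out):
    # Writes into a caller-owned buffer, resizing only when needed, so the
    # same buffer can be reused across runs.
    if len(out) != len(xs):
        out[:] = [0.0] * len(xs)
    for i, x in enumerate(xs):
        out[i] = abs(x)
    return out
```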

Test Plan:
- Observed `V0820 14:14:08.880342 101788 impl.cpp:1394] Switch to out variant for node: %3 : Tensor = aten::abs(%a.1)`

- Perf impact: TBD

Reviewed By: hlu1

Differential Revision: D30461317

fbshipit-source-id: 0c0230bd40afe463ae1ccb222c2a1207ebcf4191

Rong Rong (AI Infra) [Mon, 23 Aug 2021 22:36:59 +0000 (15:36 -0700)]
fix git diff issue (#63408)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/60111. Ideally this should be merged before https://github.com/pytorch/pytorch/issues/63360, but it can also easily be tested together with https://github.com/pytorch/pytorch/issues/63360.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63408

Test Plan:
- Confirmed working with a local test.sh run by setting PR_NUMBER
- should be validated by GHA CI as well

Concern:
- GHA CI is currently running into a proxy 403 "rate limit exceeded" error consistently. However, the worst case is that no git diff files are generated, which is exactly the current behavior.
- depends on https://github.com/pytorch/pytorch/issues/63770.

Reviewed By: driazati, janeyx99

Differential Revision: D30489355

Pulled By: walterddr

fbshipit-source-id: a638b7ae5820f29a7aca6cc40ff390ab253cb174

Eli Uriegas [Mon, 23 Aug 2021 22:02:10 +0000 (15:02 -0700)]
.github: Add ec2 information as a step (#63784)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63784

Also creates the common.yml.j2 file as a place to store common code
amongst the templates

Should look like:
![image](https://user-images.githubusercontent.com/1700823/130495226-f18b8c0f-1ea7-4097-8bbb-e998fabb71f2.png)

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS

Reviewed By: malfet, driazati

Differential Revision: D30490682

Pulled By: seemethere

fbshipit-source-id: 18028b4acff938ef54cd6e4877561b2d830a11cf

Erjia Guan [Mon, 23 Aug 2021 21:32:56 +0000 (14:32 -0700)]
Rename DataPipe to Op-er (#63325)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63325

Rename each DataPipe to an operation name ending in -er. The functional API names should remain verbs, such as `read_from_tar`, `shuffle`, ... (discussed [here](https://github.com/facebookexternal/torchdata/pull/97#discussion_r688553905))
- Batch -> Batcher
- Collate -> Collator
- Concat -> Concater
- GroupByKey -> ByKeyGrouper (?)
- ListDirFiles -> FileLister
- LoadFilesFromDisk -> FileLoader
- Map -> Mapper
- ReadFilesFromTar -> TarArchiveReader
- ReadFilesFromZip -> ZipArchiveReader
- ReadLinesFromFile -> LineReader
- Shuffle -> Shuffler
- ToBytes -> StreamReader
- Transforms -> Transformer
- Zip -> Zipper

Let me know if you have a better name for any DataPipe.

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30466950

Pulled By: ejguan

fbshipit-source-id: 72909dca7b3964ab83b965891f96cc1ecf62d049

Zeina Migeed [Mon, 23 Aug 2021 21:09:10 +0000 (14:09 -0700)]
Add equality constraints for some acc operations for symbolic inference (#63689)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63689

Test Plan:
buck run mode/opt-clang caffe2/torch/fb/model_transform/experimental:fx_ir_lower_inline_cvr -- \
    --action=lower_and_run \
    --filename=inline_cvr_7x_dec_2020.model \
    --print_glow_glog=True

Reviewed By: jamesr66a

Differential Revision: D30462113

fbshipit-source-id: 0b2a1ce9770561248527d47c07b80112491dc949

3 years ago[Static Runtime] Remove unused fusion patterns (#63636)
Hao Lu [Mon, 23 Aug 2021 19:53:42 +0000 (12:53 -0700)]
[Static Runtime] Remove unused fusion patterns (#63636)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63636

Reviewed By: d1jang

Differential Revision: D30446573

fbshipit-source-id: 3abb7f697380f3b4e865b98c594de359b5e26b96

3 years ago[nnc] Re-enable CPU fusion" (#63665)
Bert Maher [Mon, 23 Aug 2021 19:41:32 +0000 (12:41 -0700)]
[nnc] Re-enable CPU fusion" (#63665)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63665

This reverts commit 125e2d02e575612eb427104e7c67f1c28f090db8.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30471646

Pulled By: bertmaher

fbshipit-source-id: 4189869566f03b5f9ada78d78830f6a34946eed6

3 years agoKill THCUNN (#63429)
Peter Bell [Mon, 23 Aug 2021 19:05:51 +0000 (12:05 -0700)]
Kill THCUNN (#63429)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63429

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441308

Pulled By: ngimel

fbshipit-source-id: 3ae342a2f8d5c7f8827b637c4055c5d1b0a1be26

3 years agofix mpi ssh runtime error (#63580)
Rong Rong (AI Infra) [Mon, 23 Aug 2021 16:44:09 +0000 (09:44 -0700)]
fix mpi ssh runtime error (#63580)

Summary:
Should fix https://github.com/pytorch/pytorch/issues/60756.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63580

Test Plan:
- this CI.
- validated by running on the bionic_cuda container: https://app.circleci.com/pipelines/github/pytorch/pytorch/366632/workflows/478602fb-698f-4210-ac09-d9c61af5c62b/jobs/15472104

Reviewed By: malfet

Differential Revision: D30486472

Pulled By: walterddr

fbshipit-source-id: d83ab88d163d4a468f03961a13d891b658668a7f

3 years agohotfix clone issue (#63770)
Rong Rong (AI Infra) [Mon, 23 Aug 2021 16:28:21 +0000 (09:28 -0700)]
hotfix clone issue (#63770)

Summary:
This was discovered during https://github.com/pytorch/pytorch/issues/63408. For some reason, this was the only checkout action that did not set fetch-depth correctly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63770

Reviewed By: malfet, janeyx99

Differential Revision: D30486110

Pulled By: walterddr

fbshipit-source-id: a67395cca2487407ed0d49c8c89587935ca5f212

3 years ago[ONNX] add test images to repo (#63717)
Gary Miguel [Mon, 23 Aug 2021 14:41:33 +0000 (07:41 -0700)]
[ONNX] add test images to repo (#63717)

Summary:
This is better than the status quo:
* Test doesn't download files from the internet -> faster and more
  reliable.
* Test doesn't leave the git working directory dirty.

Rather than using the original images, I've copied some images from
the pytorch/vision repo. This will keep the tests in the two repos
in sync, while avoiding adding new assets to the vision repo.

See https://github.com/pytorch/vision/pull/4176.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63717

Reviewed By: janeyx99

Differential Revision: D30466016

Pulled By: malfet

fbshipit-source-id: 2c56d4c11b5c74db1764576bf1c95ce4ae714574

3 years agoAllow implementing either backward or vjp for Function (#63434)
Alban Desmaison [Mon, 23 Aug 2021 14:05:51 +0000 (07:05 -0700)]
Allow implementing either backward or vjp for Function (#63434)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63434

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30431968

Pulled By: albanD

fbshipit-source-id: 0bb88664283486a9fd3364e6c3d79442a44625c2

3 years agoUpdate ROCm PyTorch persons of interest (#55206)
Jithun Nair [Mon, 23 Aug 2021 05:29:04 +0000 (22:29 -0700)]
Update ROCm PyTorch persons of interest (#55206)

Summary:
cc jeffdaily sunway513

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55206

Reviewed By: VitalyFedyunin

Differential Revision: D30296584

Pulled By: dzhulgakov

fbshipit-source-id: 6e5c610cc6b7c7fd58b80fa3f9de31f269341a88

3 years agoRemove `_fork_processes` from common_distributed.py (#63711)
Pritam Damania [Mon, 23 Aug 2021 01:55:45 +0000 (18:55 -0700)]
Remove `_fork_processes` from common_distributed.py (#63711)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63711

This removes `_fork_processes` from common_distributed.py and fixes all
other call sites to use `spawn_process` instead.
ghstack-source-id: 136395719

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D30463834

fbshipit-source-id: 0c09e8a996d0e5b912c8cdd45488a39951bac4db

3 years agoMade FuncTorchBatched decompose CompositeImplicitAutograd (#63616)
Horace He [Sun, 22 Aug 2021 00:13:27 +0000 (17:13 -0700)]
Made FuncTorchBatched decompose CompositeImplicitAutograd (#63616)

Summary:
See https://github.com/facebookresearch/functorch/issues/56

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63616

Reviewed By: zou3519

Differential Revision: D30438316

Pulled By: Chillee

fbshipit-source-id: e84446d9f68b87daa0cfff75b3b8a972f36ec85a

3 years agoBatchNorm autodiff re-enabled (#57321)
jiej [Sat, 21 Aug 2021 16:05:04 +0000 (09:05 -0700)]
BatchNorm autodiff re-enabled (#57321)

Summary:
Turns on BN in autodiff:

1. Outputs an empty tensor for running stats to bypass the autodiff issue with None;
2. Fixes BN inference backward in cuDNN and MIOpen, where backward falls back to the native batchnorm kernel instead.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57321

Reviewed By: albanD, ngimel

Differential Revision: D30250419

Pulled By: jansel

fbshipit-source-id: a62553789c20fb50a820003a056f40d9d642dfaa

3 years agoRevert D30360382: [nnc] Support thread level parallelism in fused kernels
Bert Maher [Sat, 21 Aug 2021 10:45:21 +0000 (03:45 -0700)]
Revert D30360382: [nnc] Support thread level parallelism in fused kernels

Test Plan: revert-hammer

Differential Revision:
D30360382 (https://github.com/pytorch/pytorch/commit/d6d86efb1c839ddafd1398d6dab9caa4f31a9f0b)

Original commit changeset: 29acf4e932c6

fbshipit-source-id: e0531113135d30eabb172dc1537d5dd6d65dc438

3 years agoRevert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism
Bert Maher [Sat, 21 Aug 2021 10:36:09 +0000 (03:36 -0700)]
Revert D30417127: Remove flag to toggle CPU fusion in the presence of parallelism

Test Plan: revert-hammer

Differential Revision:
D30417127 (https://github.com/pytorch/pytorch/commit/6600bc96517269c608ea47b76b6bda9476c7bcef)

Original commit changeset: b77d7c68364f

fbshipit-source-id: 6b52fb83a84fe241945e3cb3eeb71050d1d9c8f1

3 years ago[sharded_tensor] add readonly tensor properties (#63679)
Wanchao Liang [Sat, 21 Aug 2021 05:15:55 +0000 (22:15 -0700)]
[sharded_tensor] add readonly tensor properties (#63679)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63679

This PR adds read-only tensor properties to sharded tensor to match torch.Tensor behavior.
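In Python terms, a read-only tensor property is an attribute exposed through `@property` with no setter, so assigning to it raises `AttributeError`. The sketch below shows this. `ShardedTensorSketch` and its fields are hypothetical and only mimic the idea; this is not the real ShardedTensor implementation.

```python
from typing import Optional, Tuple, Union


class ShardedTensorSketch:
    """Hypothetical stand-in holding the global metadata of a sharded tensor."""

    def __init__(self, size: Tuple[int, ...], dtype: str):
        self._size = tuple(size)
        self._dtype = dtype

    @property
    def dtype(self) -> str:
        # No setter is defined, so `obj.dtype = ...` raises AttributeError.
        return self._dtype

    @property
    def shape(self) -> Tuple[int, ...]:
        return self._size

    def size(self, dim: Optional[int] = None) -> Union[int, Tuple[int, ...]]:
        # Mirrors torch.Tensor.size(): the whole shape, or one dimension.
        return self._size if dim is None else self._size[dim]
```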

Test Plan: test_sharded_tensor_metadata

Reviewed By: pritamdamania87

Differential Revision: D30459343

fbshipit-source-id: 9aec8ecfe76479eed25f3b843495e5719ed2956d

3 years ago[Static Runtime] Implement out variant for fb::quantized_linear (#63635)
Hao Lu [Sat, 21 Aug 2021 04:41:19 +0000 (21:41 -0700)]
[Static Runtime] Implement out variant for fb::quantized_linear (#63635)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63635

Reviewed By: ajyu

Differential Revision: D30446234

fbshipit-source-id: 1ef014186ff725930a97d0159626f9233ee74030

3 years agoNNAPI: Support const values in binary ops
Akshit Khurana [Sat, 21 Aug 2021 04:08:59 +0000 (21:08 -0700)]
NNAPI: Support const values in binary ops

Summary:
Previously, the NNAPI converter failed on binary ops with one constant value and one tensor.
Code suggestions from dreiss.

Test Plan:
pytest test/test_nnapi.py::TestNNAPI::test_pointwise_binary

Imported from OSS

Reviewed By: anshuljain1

Differential Revision: D28893881

fbshipit-source-id: 59240373fb03c6fdafa4cb2fa4d8408dd20092f6

3 years agoMigrate thnn_conv2d from THC to ATen (#63428)
Peter Bell [Sat, 21 Aug 2021 01:27:33 +0000 (18:27 -0700)]
Migrate thnn_conv2d from THC to ATen (#63428)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63428

Closes gh-24644, closes gh-24645

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30441307

Pulled By: ngimel

fbshipit-source-id: 9c3dec469c0525831ae398df261cf41b7df7e373

3 years agoExtend _sharded_tensor constructor to support other ops like torch.ones (#63378)
Bo Wang [Sat, 21 Aug 2021 00:09:35 +0000 (17:09 -0700)]
Extend _sharded_tensor constructor to support other ops like torch.ones (#63378)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63378

a) Introduce InitCommonParams to wrap tensor creation params.
b) Factor local tensor initialization into common_params so that the tensor value is not hard-coded in the ShardedTensor constructor.
c) Add _sharded_tensor.ones(...) as an example. Note that the memory_format arg is not provided, for consistency with torch.ones.
d) Follow-up: more ops like torch.full, torch.zeros, torch.rand, etc.
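The parameter-wrapping idea in (a)-(c) can be sketched in pure Python. This is an illustration under stated assumptions only:

- `InitCommonParamsSketch`, `create_local_shard`, and `ones` are hypothetical names.
- Nested lists stand in for tensors.
- The real InitCommonParams fields may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class InitCommonParamsSketch:
    """Hypothetical bundle of tensor-creation arguments shared by factory ops."""
    dtype: Callable[[Any], Any] = float  # converter standing in for a torch dtype
    requires_grad: bool = False          # carried along; unused in this sketch


def create_local_shard(rows: int, cols: int, fill: Callable[[], Any],
                       params: InitCommonParamsSketch) -> List[List[Any]]:
    # The fill value comes from the caller, so the constructor no longer
    # hard-codes what the local shard contains.
    return [[params.dtype(fill()) for _ in range(cols)] for _ in range(rows)]


def ones(rows: int, cols: int,
         params: Optional[InitCommonParamsSketch] = None) -> List[List[Any]]:
    """Hypothetical ones(); like torch.ones, it takes no memory_format argument."""
    params = params or InitCommonParamsSketch()
    return create_local_shard(rows, cols, lambda: 1, params)
```

A `zeros` or `full` follow-up, as in (d), would call `create_local_shard` with a different `fill`.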

Test:
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestCreateTensorFromParams --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorChunked.test_create_sharded_tensor_with_ones --v
$ python test/distributed/_sharded_tensor/test_sharded_tensor.py TestShardedTensorEnumerable.test_create_sharded_tensor_with_ones --v

Test Plan: Imported from OSS

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D30359245

Pulled By: bowangbj

fbshipit-source-id: 85768fcb36e9d9d40213036884b1266930a91701

3 years ago[clang-tidy] Enable more folders (#63380)
driazati [Fri, 20 Aug 2021 23:38:42 +0000 (16:38 -0700)]
[clang-tidy] Enable more folders (#63380)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63380

Crosses off some more of #62011; see the test in the stacked PR #63381.

Test Plan: Imported from OSS

Reviewed By: malfet, seemethere

Differential Revision: D30455843

Pulled By: driazati

fbshipit-source-id: d473545d05ffa0b2476968f0b1c55f3a16a2c755

3 years agoenable increment build for build_libtorch (#63074)
Yi Zhang [Fri, 20 Aug 2021 23:28:39 +0000 (16:28 -0700)]
enable increment build for build_libtorch (#63074)

Summary:
Now that issue https://github.com/pytorch/pytorch/issues/59859 is resolved, rerun_cmake in build_libtorch should no longer be hardcoded.

build_libtorch is needed to generate a debug version of libtorch.
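"Not hardcoded" means the rerun decision comes from a command-line option instead of a constant `True`. The sketch below shows one way to do that. The `--rerun-cmake` flag name and the `parse_build_args` helper are assumptions for illustration, not taken from the actual build_libtorch script.

```python
import argparse
from typing import List, Optional


def parse_build_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical argument parser for a libtorch build script."""
    parser = argparse.ArgumentParser(description="Build libtorch (sketch)")
    # Off by default, so repeated builds can reuse the existing CMake cache
    # and build incrementally; pass the flag to force a fresh configure step.
    parser.add_argument("--rerun-cmake", action="store_true",
                        help="rerun CMake even if a cache already exists")
    return parser.parse_args(argv)
```

The build step would then receive `args.rerun_cmake` rather than a literal `True`.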

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63074

Reviewed By: VitalyFedyunin, seemethere

Differential Revision: D30306705

Pulled By: malfet

fbshipit-source-id: f2077d334191f4973da0681560937bc8bab730c1

3 years ago[Doc] Deprecation notice for only_inputs argument (#63631)
北海若 [Fri, 20 Aug 2021 22:45:12 +0000 (15:45 -0700)]
[Doc] Deprecation notice for only_inputs argument (#63631)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/63544.

Changed the docstring accordingly. I'm new here and not sure if the style is okay; please check.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63631

Reviewed By: ejguan

Differential Revision: D30459439

Pulled By: soulitzer

fbshipit-source-id: 8df3c509d1dd39764815b099ab47229550126cbe

3 years agoRemove breakpad from docker image (#63598)
driazati [Fri, 20 Aug 2021 22:45:10 +0000 (15:45 -0700)]
Remove breakpad from docker image (#63598)

Summary:
As of https://github.com/pytorch/pytorch/issues/63186 we're doing this properly via a third_party cmake build, so we don't need it here anymore.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63598

Reviewed By: walterddr, malfet

Differential Revision: D30432250

Pulled By: driazati

fbshipit-source-id: d0d5db14355cf574e42c0d0ed786bb26230180bd

3 years agoadd BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)
jiayisun [Fri, 20 Aug 2021 21:54:51 +0000 (14:54 -0700)]
add BFloat16 operators on CPU: range, sinh, cosh, frexp, nan_to_num (#61826)

Summary:
Added BFloat16 support for range, sinh, cosh, frexp, and nan_to_num on CPU. Also collected benchmark data for these ops with the BFloat16 and Float32 data types, using PyTorch's operator_benchmark tool on an Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz.

Number of cores: 1 core, 28 cores (1 socket)
[cosh_sinh_benchmark.txt](https://github.com/pytorch/pytorch/files/6974313/cosh_sinh_benchmark.txt)
[frexp_benchmark.txt](https://github.com/pytorch/pytorch/files/6974315/frexp_benchmark.txt)
[nan_to_num_benchmark.txt](https://github.com/pytorch/pytorch/files/6974317/nan_to_num_benchmark.txt)
[range_benchmark.txt](https://github.com/pytorch/pytorch/files/6974318/range_benchmark.txt)
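BFloat16 keeps float32's 8-bit exponent but has only 7 explicit mantissa bits. A float32 value can therefore be converted by rounding away its low 16 bits. The pure-Python sketch below does this with round-to-nearest-even and handles NaN separately. It illustrates the number format only and is not the kernel code added in this PR. It assumes the input is representable as a float32.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round a float32-representable value to the nearest bfloat16 (ties to even)."""
    if math.isnan(x):
        return x  # keep NaN as NaN; rounding its bit pattern could yield inf
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the discarded range, plus 1 if the kept LSB is odd,
    # so exact ties round to the even neighbor; then drop the low 16 bits.
    lsb = (bits >> 16) & 1
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For instance, `to_bfloat16(math.pi)` returns `3.140625`, which shows how much precision BFloat16 gives up relative to Float32.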

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61826

Reviewed By: saketh-are

Differential Revision: D30257259

Pulled By: VitalyFedyunin

fbshipit-source-id: 394cd713e6394050a8c90b2160633beb675d71dd

3 years agoempty caching allocator before test_avg_pool2d large subtest (#63528)
Jeff Daily [Fri, 20 Aug 2021 21:00:20 +0000 (14:00 -0700)]
empty caching allocator before test_avg_pool2d large subtest (#63528)

Summary:
Otherwise, an unrecoverable OOM occurs on MI25. Fixes broken ROCm CI test1.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63528

Reviewed By: malfet, zhouzhuojie

Differential Revision: D30459151

Pulled By: walterddr

fbshipit-source-id: 63e205c4f486fcbdd514cfb0ed8e38584f894585

3 years agoInclude iostream in ProcessGroupMPI.cpp (#63656)
Nikita Shulga [Fri, 20 Aug 2021 20:13:54 +0000 (13:13 -0700)]
Include iostream in ProcessGroupMPI.cpp (#63656)

Summary:
ProcessGroupMPI.cpp uses `std::cerr` but no longer gets `<iostream>` transitively after https://github.com/pytorch/pytorch/pull/61500, which caused a compilation regression.
Fixes https://github.com/pytorch/pytorch/issues/63653

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63656

Reviewed By: ejguan

Differential Revision: D30455824

Pulled By: malfet

fbshipit-source-id: 29f316e7f7fd8e7dcbee2666e7a985f25bf56515

3 years ago[easy]Unbreak caffe2benchmarking build (#63655)
Scott Wolchok [Fri, 20 Aug 2021 19:56:01 +0000 (12:56 -0700)]
[easy]Unbreak caffe2benchmarking build (#63655)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63655

ghstack-source-id: 136324310

Test Plan: buck build //fbobjc/Apps/Internal/Caffe2Benchmarking:Caffe2Benchmarking fbobjc/mode/iphonesimulator

Reviewed By: hl475, JacobSzwejbka

Differential Revision: D30455659

fbshipit-source-id: b6da6be4f89b6e84753ef0849ffedea04785034a

3 years ago[ONNX] Support torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Support torch.dot and torch.nn.utils.spectral_norm (#62596) (#62765)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62765

Fixes #27723

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375181

Pulled By: msaroufim

fbshipit-source-id: 715f4745899757ec405877980cd20c826028eb2c

Co-authored-by: BowenBao <bowbao@microsoft.com>
3 years ago[ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Update repeat_interleave for dynamic repeats (#59979) (#62764)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62764

Fixes #58733

- Support repeat_interleave with dynamic repeat values
- Moved the repeat_interleave symbolic from opset 11 to opset 13, since this change requires sequence types as loop outputs
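`repeat_interleave` repeats each element `i` of the input `repeats[i]` times. "Dynamic" repeats means these counts are only known at run time. Below is a pure-Python reference sketch of the 1-D, per-element case; the helper name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repeat_interleave_1d(values: Sequence[T], repeats: Sequence[int]) -> List[T]:
    """Reference semantics of 1-D repeat_interleave with per-element repeats."""
    if len(values) != len(repeats):
        raise ValueError("repeats must have one entry per input element")
    out: List[T] = []
    for value, count in zip(values, repeats):
        if count < 0:
            raise ValueError("repeats must be non-negative")
        # Each output chunk has a data-dependent length, which is why an
        # exporter may need a loop that produces a sequence of chunks.
        out.extend([value] * count)
    return out
```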

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375179

Pulled By: msaroufim

fbshipit-source-id: 787f96bf91d124fd0483761088c5f4ae930d96a9

Co-authored-by: Shubham Bhokare <shubhambhokare@gmail.com>
3 years ago[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (...
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Fix an issue that optimizations might adjust graph inputs unexpectedly. (#61280) (#62763)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62763

This PR fixes an issue where the graph inputs might be modified when a model is exported in inference mode.

When a model is exported in inference mode, some optimizations are applied. One side effect of these optimizations is that the graph inputs might be adjusted. Such optimizations include:

1. Conv and BatchNorm op fusion.
2. Constant folding.

If the user sets export_params=False or keep_initializers_as_inputs=True, the user most likely wants to provide the corresponding parameters or initializers as graph inputs.
In that situation, the exporter needs to prevent the above optimizations from adjusting the graph inputs, regardless of whether the model is exported in inference or training mode. That way, the graph inputs match the inputs the user provides.

This PR adds a common check that decides whether the above optimizations should run. It uses the values of the export_params and keep_initializers_as_inputs arguments to infer whether the graph inputs may be adjusted.
If not, these optimizations are skipped, even if their other requirements are met.
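The decision described above can be sketched as a small predicate. The function names are hypothetical. How the flags combine is a hedged reading of the description here, not the exporter's actual code.

```python
def graph_inputs_adjustable(export_params: bool,
                            keep_initializers_as_inputs: bool) -> bool:
    """Hypothetical check: may optimizations change the graph's inputs?

    If parameters are not exported (export_params=False) or initializers are
    kept as inputs (keep_initializers_as_inputs=True), the user probably
    intends to feed those values as graph inputs, so inputs must not change.
    """
    return export_params and not keep_initializers_as_inputs


def apply_input_changing_optimization(other_requirements_met: bool,
                                      export_params: bool,
                                      keep_initializers_as_inputs: bool) -> bool:
    # Even when an optimization's usual requirements are met, skip it if it
    # could adjust graph inputs that the user expects to provide.
    return other_requirements_met and graph_inputs_adjustable(
        export_params, keep_initializers_as_inputs)
```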

In addition to these code changes, the documentation of the following parameters has been updated to help users decide how to use them for different purposes:

1. export_params
2. training
3. do_constant_folding
4. keep_initializers_as_inputs

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375183

Pulled By: msaroufim

fbshipit-source-id: 4db8b9695649eb32a3a0fefa950ee2e5651bdba0

Co-authored-by: fatcat-z <jiz@microsoft.com>
3 years ago[ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)
BowenBao [Fri, 20 Aug 2021 19:44:29 +0000 (12:44 -0700)]
[ONNX] Fix controlflow shape inference with contrib op (#60707) (#62762)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62762

`ONNXShapeTypeInference` for node `n` is skipped if `n` is not in the ONNX namespace, or if `n` contains any nodes outside the ONNX namespace. This prevents control-flow nodes containing contrib ops from running `SpecialPostProcess`, which, in rare cases, sets up the correct node output shape/type information.
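The skip condition in the first sentence can be pictured as a recursive walk over a node and its sub-blocks. It shows why a control-flow node containing a single contrib op was skipped entirely. In this sketch, `Node`, its `"namespace::op"` kind strings (including the made-up `custom_domain` namespace), and `blocks` are hypothetical simplifications of the TorchScript IR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical IR node: kind is "namespace::op"; blocks hold sub-graphs."""
    kind: str
    blocks: List[List["Node"]] = field(default_factory=list)

    @property
    def namespace(self) -> str:
        return self.kind.split("::", 1)[0]


def has_non_onnx_node(node: Node) -> bool:
    """True if the node, or any node nested in its blocks, is outside 'onnx'."""
    if node.namespace != "onnx":
        return True
    return any(has_non_onnx_node(inner) for block in node.blocks for inner in block)
```

With this check, an `onnx::Loop` whose body contains one contrib op reports `True`, so the whole loop node would be skipped.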

This PR depends on opset 14 export https://github.com/pytorch/pytorch/pull/59486

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D30375180

Pulled By: msaroufim

fbshipit-source-id: 5deacec39f091deb4d75ddd9e660e12fca7f16c5

Co-authored-by: BowenBao <bowbao@microsoft.com>
3 years agoRevert D30417370: [nnc] Enable CPU fusion
Alban Desmaison [Fri, 20 Aug 2021 19:26:58 +0000 (12:26 -0700)]
Revert D30417370: [nnc] Enable CPU fusion

Test Plan: revert-hammer

Differential Revision:
D30417370 (https://github.com/pytorch/pytorch/commit/b9fc656cf26d60127bd695e4e5a7d27622f2563d)

Original commit changeset: 84ce7a578a36

fbshipit-source-id: cd23774cdc3273fd72f8a05f1900eaf36f373e6b

3 years ago[8/N] Remove c10d/ddp fork tests. (#63454)
Pritam Damania [Fri, 20 Aug 2021 19:09:49 +0000 (12:09 -0700)]
[8/N] Remove c10d/ddp fork tests. (#63454)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63454

Continuation of https://github.com/pytorch/pytorch/pull/63443; this
PR removes all fork tests from torch.distributed.
ghstack-source-id: 136285511

Test Plan: waitforbuildbot

Reviewed By: SciPioneer

Differential Revision: D30387872

fbshipit-source-id: f6d6313db126ae7b95b86f78a1e0726887c5c513

3 years agoRevert D30426527: Adding DataLoader2 class as future replacement of DataLoader
Alban Desmaison [Fri, 20 Aug 2021 19:05:32 +0000 (12:05 -0700)]
Revert D30426527: Adding DataLoader2 class as future replacement of DataLoader

Test Plan: revert-hammer

Differential Revision:
D30426527 (https://github.com/pytorch/pytorch/commit/5a7133b87fe2fd7d025d36855ed4cc06539a9299)

Original commit changeset: e5905d3364c4

fbshipit-source-id: 794d8a4e9256ccff8cf894aee10eff6adc30d502

3 years agoAdd `BinaryUfuncOpInfo` and broadcasting tests (#61964)
Philip Meier [Fri, 20 Aug 2021 18:43:07 +0000 (11:43 -0700)]
Add `BinaryUfuncOpInfo` and broadcasting tests (#61964)

Summary:
As proof of concept, this PR uses the new `BinaryUfuncOpInfo` in broadcasting tests for `add`, `sub`, `mul`, `div`, `floor_div`, and `true_div`.
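The broadcasting tests exercise the NumPy-style rule that binary ufuncs such as `add` and `mul` follow. Shapes are aligned from the right, and each pair of dimensions must be equal or contain a 1. Below is a small pure-Python sketch of that shape rule. The helper is hypothetical and is not the `BinaryUfuncOpInfo` code.

```python
from itertools import zip_longest
from typing import Sequence, Tuple


def broadcast_shapes(a: Sequence[int], b: Sequence[int]) -> Tuple[int, ...]:
    """Result shape of broadcasting shapes a and b; ValueError if incompatible."""
    result = []
    # Walk both shapes from the last dimension; missing leading dims count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} are not broadcastable")
    return tuple(reversed(result))
```

PyTorch itself exposes the same computation as `torch.broadcast_shapes`.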

Pull Request resolved: https://github.com/pytorch/pytorch/pull/61964

Reviewed By: ngimel

Differential Revision: D30407734

Pulled By: mruberry

fbshipit-source-id: ada28994f43b0635f279f45a02ecba18bc8ee033

3 years ago[nnc] Enable CPU fusion (#63545)
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
[nnc] Enable CPU fusion (#63545)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63545

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417370

Pulled By: bertmaher

fbshipit-source-id: 84ce7a578a3678d5562bab99d1dc00330c4f72d1

3 years agoRemove flag to toggle CPU fusion in the presence of parallelism (#63514)
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
Remove flag to toggle CPU fusion in the presence of parallelism (#63514)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63514

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30417127

Pulled By: bertmaher

fbshipit-source-id: b77d7c68364f2af73570740540f3b1152313016e

3 years ago[nnc] Support thread level parallelism in fused kernels (#63386)
Bert Maher [Fri, 20 Aug 2021 18:11:49 +0000 (11:11 -0700)]
[nnc] Support thread level parallelism in fused kernels (#63386)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63386

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D30360382

Pulled By: bertmaher

fbshipit-source-id: 29acf4e932c669ce0f35823faea9099bcd8119b6

3 years agoAdd support for the ONNX Runtime Eager Mode backend (#58248)
Aaron Bockover [Fri, 20 Aug 2021 18:11:47 +0000 (11:11 -0700)]
Add support for the ONNX Runtime Eager Mode backend (#58248)

Summary:
This PR implements the necessary hooks/stubs/enums/etc for complete ONNX Runtime (ORT) Eager Mode integration. The actual extension will live out of tree at https://github.com/pytorch/ort.

We have been [working on this at Microsoft](https://github.com/microsoft/onnxruntime-pytorch/tree/eager-ort/torch_onnxruntime) for the last few months, and are finally ready to contribute the PyTorch core changes upstream (nothing major or exciting, just the usual boilerplate for adding new backends).

The ORT backend will allow us to ferry [almost] all torch ops into granular ONNX kernels that ORT will eagerly execute against any devices it supports (therefore, we only need a single ORT backend from a PyTorch perspective).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58248

Reviewed By: astaff

Differential Revision: D30344992

Pulled By: albanD

fbshipit-source-id: 69082b32121246340d686e16653626114b7714b2

3 years agoAdd docs describing saved tensor hooks (#62362)
Victor Quach [Fri, 20 Aug 2021 18:07:22 +0000 (11:07 -0700)]
Add docs describing saved tensor hooks (#62362)

Summary:
Add a section to the Autograd mechanics docs describing the recently
exposed saved tensors (https://github.com/pytorch/pytorch/issues/52451), how to register packing/unpacking
hooks (https://github.com/pytorch/pytorch/issues/60975), and how to use default hooks (https://github.com/pytorch/pytorch/issues/61834).

Sister PR: https://github.com/pytorch/pytorch/issues/62361 (will add a link from autograd.rst to notes/autograd in whatever PR does not land first)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62362

Reviewed By: soulitzer

Differential Revision: D30453177

Pulled By: Varal7

fbshipit-source-id: f5759977b069ff0ef36a47b08856d297691a6caa

3 years ago[fx2trt] Add layernorm plugin for dynamic shape (#63620)
Shiyan Deng [Fri, 20 Aug 2021 17:49:21 +0000 (10:49 -0700)]
[fx2trt] Add layernorm plugin for dynamic shape (#63620)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63620

Added a dynamic layernorm plugin so that it works when an explicit batch dimension is required. Needed for the ig model.

Changed how we create the plugin layer: instead of instantiating the plugin directly, we now use the plugin creator with a `PluginFieldCollection`.

Follow ups:
Another way to convert layernorm is to break it down into supported TRT layers. T97398182

Test Plan: layernorm unittest

Reviewed By: yinghai

Differential Revision: D30138205

fbshipit-source-id: aebe021d8de818e20376634f30e84579b9807f9b

3 years ago[PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)
Pavithran Ramachandran [Fri, 20 Aug 2021 16:34:53 +0000 (09:34 -0700)]
[PyTorch][Edge] Improve InflatableArgs for Bundled Inputs (#62368)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62368

# Context
Bundled inputs accept an expression, given as the string InflatableArg.fmt, that is applied to the inputs to inflate them. InflatableArg.fmt provides the flexibility to use a custom inflation transformation. When a function's input arguments are not of Tensor type, TorchScript casts them from type T to Optional[T] and expects the function to handle the None case as well. This is tricky to handle in one-line code or lambda functions.

We propose an alternative that allows InflatableArg to include the text of a TorchScript function. That function is defined on the module as a helper and then used in the inflation expression. The function text is provided via InflatableArg.fmt_fn. See pytorch/test/test_bundled_inputs.py for an example of how to use it.

Also see JacobSzwejbka's comment on this [here](https://github.com/pytorch/pytorch/pull/62368#issuecomment-892012812).

# Mitigation
Allow InflatableArg to include the text of a TorchScript function that is defined on the module as a helper, then use that function in its inflation expression.
ghstack-source-id: 135158680

Test Plan:
To run `test_dict_args`

```
(base) [pavithran@devvm1803.vll0 /data/users/pavithran/fbsource/fbcode] buck test //caffe2/test:test_bundled_inputs -- test_dict_args
Action graph will be rebuilt because files have been added or removed.
Building: finished in 5.4 sec (100%) 12180/12180 jobs, 0/12180 updated
  Total time: 5.8 sec
More details at https://www.internalfb.com/intern/buck/build/fafcf277-1095-4cba-978d-6022f0d391ad
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 5ef9de71-c1b1-406b-a6c0-3321c2368b8d
Trace available for this run at /tmp/tpx-20210727-163946.454212/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
    ✓ ListingSuccess: caffe2/test:test_bundled_inputs - main (11.365)
    ✓ Pass: caffe2/test:test_bundled_inputs - test_dict_args (test_bundled_inputs.TestBundledInputs) (12.307)
Summary
  Pass: 1
  ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/7036874465805934
```

To check the py code of TS module:
P433043973

Reviewed By: dreiss

Differential Revision: D29950421

fbshipit-source-id: c819ec5c94429b7fbf6c4beb0259457f169b08ec

3 years agoAdding DataLoader2 class as future replacement of DataLoader (#63523)
Vitaly Fedyunin [Fri, 20 Aug 2021 16:00:23 +0000 (09:00 -0700)]
Adding DataLoader2 class as future replacement of DataLoader (#63523)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63523

* Supports sharding and batching on loader level
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Supports sharding and batching at the loader level.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426527

Pulled By: VitalyFedyunin

fbshipit-source-id: e5905d3364c4880e720dd62fb066f08881c71a6e

3 years agoSmall custom function refactor which doesn't change anything (#63433)
albanD [Fri, 20 Aug 2021 15:42:31 +0000 (08:42 -0700)]
Small custom function refactor which doesn't change anything (#63433)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63433

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D30431970

Pulled By: albanD

fbshipit-source-id: 905fa4d2ddeca18005b1bcb13dd6f8a080327e7c

3 years agoAdding IterableAsDataPipe IterDataPipe (#63522)
Vitaly Fedyunin [Fri, 20 Aug 2021 15:36:14 +0000 (08:36 -0700)]
Adding IterableAsDataPipe IterDataPipe (#63522)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63522

* Supports sharding and batching on loader level
* #63522 Adding IterableAsDataPipe IterDataPipe, useful for tests and simple cases

Useful for tests and simple cases.

Test Plan: Imported from OSS

Reviewed By: ejguan

Differential Revision: D30426528

Pulled By: VitalyFedyunin

fbshipit-source-id: 535b5cc1505bb58731fcca8170541ac5ee7bd417

3 years ago[Static Runtime] Enable RemoveListMutation (#63536)
Mike Iovine [Fri, 20 Aug 2021 13:14:13 +0000 (06:14 -0700)]
[Static Runtime] Enable RemoveListMutation (#63536)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63536

Enable a pass that transforms sequences like this:
```
li = []
li.append(1)
li.append(2)
```
into this:
```
li = [1, 2]
```
Initially I implemented this pass myself (D30387213), but I discovered that there is an existing pass that does the same thing.
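To make the rewrite concrete, here is a toy version over a hypothetical list of statement tuples. It handles only the straight-line pattern shown above: a list literal followed by consecutive `append` calls on the same name. It is not the TorchScript RemoveListMutation pass itself, which works on the JIT graph and must also reason about aliasing.

```python
from typing import Any, List, Tuple

# Hypothetical statement forms:
#   ("assign_list", name, items)  ->  name = [*items]
#   ("append", name, value)       ->  name.append(value)
#   anything else is left untouched.
Stmt = Tuple[Any, ...]


def fold_list_appends(stmts: List[Stmt]) -> List[Stmt]:
    """Fold `name = [...]` followed by `name.append(v)` calls into one literal."""
    out: List[Stmt] = []
    for stmt in stmts:
        prev = out[-1] if out else None
        if (stmt[0] == "append" and prev is not None
                and prev[0] == "assign_list" and prev[1] == stmt[1]):
            # Only fold into the immediately preceding literal: any statement
            # in between could observe the list, so folding stops there.
            out[-1] = ("assign_list", prev[1], prev[2] + [stmt[2]])
        else:
            out.append(stmt)
    return out
```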

Reviewed By: hlu1

Differential Revision: D30412970

fbshipit-source-id: 0810ef03480878d5039bd800a40f5fd31c2652ec

3 years ago[Static Runtime] Add native op for aten::detach (#63625)
Don Jang [Fri, 20 Aug 2021 07:43:40 +0000 (00:43 -0700)]
[Static Runtime] Add native op for aten::detach (#63625)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63625

This change adds a Static Runtime native op implementation for the `aten::detach` op.

See the standard `aten::detach` implementation (https://codebrowser.bddppq.com/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp.html#_ZN2at6native6detachERKNS_6TensorE) for comparison.

Test Plan:
- Added `StaticRuntime.IndividualOps_Detach`.

- Observed

```
V0819 18:55:33.181188 3092034 impl.cpp:1398] Switch to native impl for node: %a.1 : Tensor = aten::detach(%input.1)
```

Reviewed By: hlu1

Differential Revision: D30443187

fbshipit-source-id: d6e0eadb1b817e0a126c4fc97526abc276ee8a17

3 years agoUpdate protobuf to 3.13.1 (#62571)
Nikita Shulga [Fri, 20 Aug 2021 06:42:24 +0000 (23:42 -0700)]
Update protobuf to 3.13.1 (#62571)

Summary:
Update bazel to 4.10.0

Update ASAN_SYMBOLIZER_PATH to llvm-7
Suppress `vptr` ubsan violations in `test_jit`
Fix ProtoBuf patching for ONNX, which caused Windows builds to crash while attempting to free a `std::string` allocated on the stack

Fixes https://github.com/pytorch/pytorch/issues/62569

Pull Request resolved: https://github.com/pytorch/pytorch/pull/62571

Reviewed By: walterddr

Differential Revision: D30048685

Pulled By: malfet

fbshipit-source-id: 6462c1bef9c42318551d2cf906bbab41e1d4e1cd

3 years ago[nnc] Updated sliceTail to do inplace mutation (#63532)
Raghavan Raman [Fri, 20 Aug 2021 05:50:32 +0000 (22:50 -0700)]
[nnc] Updated sliceTail to do inplace mutation (#63532)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63532

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412184

Pulled By: navahgar

fbshipit-source-id: e7669d3b9d24e14501f3feb6505c88d1d42030c6

3 years ago[nnc] Updated sliceHead to do inplace mutation (#63531)
Raghavan Raman [Fri, 20 Aug 2021 05:50:32 +0000 (22:50 -0700)]
[nnc] Updated sliceHead to do inplace mutation (#63531)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/63531

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D30412183

Pulled By: navahgar

fbshipit-source-id: 47ee9482a36e606788d28d22eee4edaca45ffa50

3 years ago[PyTorch] Remove unnecessary iostream includes in headers (#61500)
Scott Wolchok [Fri, 20 Aug 2021 01:52:33 +0000 (18:52 -0700)]
[PyTorch] Remove unnecessary iostream includes in headers (#61500)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61500

libstdc++ defines a static variable called `std::__ioinit` in iostream that adds global constructor size overhead to each translation unit that includes iostream. To reduce that overhead, we can often include ostream instead.
ghstack-source-id: 136163529

Test Plan: buildsizebot some mobile apps

Reviewed By: dhruvbird

Differential Revision: D29648016

fbshipit-source-id: 9c3139712c71248513cc5032d21e77f3ecbae8fe