platform/upstream/pytorch.git
5 years ago: Use return names in JIT operators
Christian Puhrsch [Fri, 8 Mar 2019 07:31:00 +0000 (23:31 -0800)]
Use return names in JIT operators

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17638

Differential Revision: D14295606

Pulled By: cpuhrsch

fbshipit-source-id: 62040ac65434411357808735f0fe6cd33cc1c30f

5 years ago: Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize ...
Jerry Zhang [Fri, 8 Mar 2019 02:31:33 +0000 (18:31 -0800)]
Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#17764)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764

Original commit changeset: f1923fdca4a1

Reverting the int8 ops fixes the original runtime regression.
We'll ignore the memory regression since it is flaky; see D14228484

Reviewed By: dzhulgakov

Differential Revision: D13885233

fbshipit-source-id: ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195

5 years ago: Clean up some old ScalarType stuff
Roy Li [Fri, 8 Mar 2019 00:16:43 +0000 (16:16 -0800)]
Clean up some old ScalarType stuff

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17755

Differential Revision: D14377135

Pulled By: li-roy

fbshipit-source-id: 35305760a1621340ba66c61a193ff61cfedfa7e8

5 years ago: add reference to flake8-mypy in contributing.md
Elias Ellison [Thu, 7 Mar 2019 23:23:16 +0000 (15:23 -0800)]
add reference to flake8-mypy in contributing.md

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17759

Differential Revision: D14376813

Pulled By: eellison

fbshipit-source-id: cca1128e967ef7368633b94a3fa3c8e76a4a16f4

5 years ago: Move lerp to ATen, add functionality for tensor weights (#17348)
vishwakftw [Thu, 7 Mar 2019 22:01:47 +0000 (14:01 -0800)]
Move lerp to ATen, add functionality for tensor weights (#17348)

Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
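As a sketch of the semantics only (not the ATen implementation): `lerp` applies linear interpolation elementwise, and with this change the weight may itself be a tensor rather than a scalar.

```python
def lerp(start, end, weight):
    """Linear interpolation: start + weight * (end - start).

    `weight` may be a scalar or a per-element list ("tensor weight"),
    mirroring the functionality this commit adds to torch.lerp.
    Illustrative pure-Python sketch, operating on plain lists.
    """
    if isinstance(weight, (int, float)):
        weight = [weight] * len(start)
    return [s + w * (e - s) for s, e, w in zip(start, end, weight)]

print(lerp([0.0, 0.0], [10.0, 4.0], 0.5))           # scalar weight
print(lerp([0.0, 0.0], [10.0, 4.0], [0.25, 0.75]))  # tensor weight
```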
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348

Differential Revision: D14355845

Pulled By: soumith

fbshipit-source-id: eaede4c09ee589d77ba6cf52583510ea8e3a2fcf

5 years ago: Refactor dispatcher (#17753)
Iurii Zdebskyi [Thu, 7 Mar 2019 21:38:59 +0000 (13:38 -0800)]
Refactor dispatcher (#17753)

Summary:
This is a side PR for the bool tensor feature. The idea for this change came from feedback received in this [PR](https://github.com/pytorch/pytorch/pull/17376).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17753

Differential Revision: D14367989

Pulled By: izdeby

fbshipit-source-id: 4fa380e56e20f18e480be68920170dbc3a4eb91c

5 years ago: add layernorm to AD
Wanchao Liang [Thu, 7 Mar 2019 21:31:55 +0000 (13:31 -0800)]
add layernorm to AD

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17702

Differential Revision: D14368472

Pulled By: wanchaol

fbshipit-source-id: 8db390e39444078258ad1d34ba74d6ddafa5d02b

5 years ago: move half<->float conversions to oss operators (#17548)
Hector Yuen [Thu, 7 Mar 2019 20:52:54 +0000 (12:52 -0800)]
move half<->float conversions to oss operators (#17548)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17548

Expose the half-float operators to OSS.

common/math/Float16.h is the original implementation;
it is substituted by caffe2/c10/util/Half.h.

From the comments, it seems that neither implementation handles denormals.
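As an illustrative sketch only (not the Half.h code): a float32-to-float16 bit conversion that truncates the mantissa and flushes denormals to zero, mirroring the limitation noted above.

```python
import struct

def f32_to_f16_bits(x):
    """Convert a Python float to IEEE half-precision bits.

    Illustrative sketch: truncates the mantissa and flushes
    denormals/underflow to signed zero, which mirrors the
    limitation noted above. Not the actual Half.h code.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:                      # inf / nan
        return sign | 0x7C00 | (0x200 if frac else 0)
    e = exp - 127 + 15                   # rebias exponent (fp32 -> fp16)
    if e >= 0x1F:                        # overflow -> infinity
        return sign | 0x7C00
    if e <= 0:                           # denormal range -> flush to zero
        return sign
    return sign | (e << 10) | (frac >> 13)
```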

Reviewed By: jspark1105

Differential Revision: D14244200

fbshipit-source-id: f90ba28c5bf6a2b451b429cc4925b8cc376ac651

5 years ago: Fix the update ONNX expect files (#17767)
Lu Fang [Thu, 7 Mar 2019 20:51:09 +0000 (12:51 -0800)]
Fix the update ONNX expect files (#17767)

Summary:
Fix the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17767

Reviewed By: zrphercule

Differential Revision: D14370483

Pulled By: houseroad

fbshipit-source-id: e7b0bbde0797c41f5a010fa206fab80fe2792eb7

5 years ago: Cleanup testFusion/testOne: there are unused arguments.
Mikhail Zolotukhin [Thu, 7 Mar 2019 19:13:48 +0000 (11:13 -0800)]
Cleanup testFusion/testOne: there are unused arguments.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17737

Differential Revision: D14366584

Pulled By: ZolotukhinM

fbshipit-source-id: 3c2dd2aabfecca475909e4eec4a077d900795da9

5 years ago: Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (#17676)
Lu Fang [Thu, 7 Mar 2019 19:03:57 +0000 (11:03 -0800)]
Automatic update of fbcode/onnx to 96c58ceeacf0f2b73d752e413e4fd78787a12da3 (#17676)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17676

Previous import was e18bb41d255a23daf368ffd62a2645db55db4c72

Included changes:
- **[96c58ce](https://github.com/onnx/onnx/commit/96c58ce)**: Fix shape inference when auto_pad is notset again (#1830) <Li-Wen Chang>
- **[873ddbb](https://github.com/onnx/onnx/commit/873ddbb)**: More extendable Runner (#1809) <Michał Karzyński>

Reviewed By: zrphercule

Differential Revision: D14321241

fbshipit-source-id: 12de9021afc61f5435f1b719cccf7b0f4ad73a84

5 years ago: Set the default ONNX opset to the latest stable opset (i.e., 9) (#17736)
Lu Fang [Thu, 7 Mar 2019 18:51:29 +0000 (10:51 -0800)]
Set the default ONNX opset to the latest stable opset (i.e., 9) (#17736)

Summary:
1) The changes in the new opset won't affect the internal pipeline.
2) The CI won't be affected by the ONNX changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17736

Reviewed By: zrphercule

Differential Revision: D14358710

Pulled By: houseroad

fbshipit-source-id: 4ef15d2246b50f6875ee215ce37ecf92d555ca6a

5 years ago: Add module attributes (#17309)
David Riazati [Thu, 7 Mar 2019 18:41:13 +0000 (10:41 -0800)]
Add module attributes (#17309)

Summary:
Similar to `nn.Parameter`s, this PR lets you store any `IValue` on a module as an attribute on a `ScriptModule` (only from the Python front-end currently). To mark something as an attribute, it should wrapped in `jit.Attribute(value, type)` (ex. `self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])`)

Followup Work:
* (de)serializing for use in C++
* change `self.training` to be a `bool` attribute instead of a buffer
* mutable attributes
* string frontend support
* documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17309
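A conceptual model of the mechanism described above (a hypothetical pure-Python sketch, not the real torch.jit API): `jit.Attribute` pairs a value with the TorchScript type it should be stored as, and the module front-end registers wrapped values as typed attributes.

```python
from typing import Any, NamedTuple

class Attribute(NamedTuple):
    """Conceptual stand-in for torch.jit.Attribute: a value paired
    with the TorchScript type it should be stored as."""
    value: Any
    type: str

class Module:
    """Toy model of the ScriptModule front-end's attribute handling."""
    def __init__(self):
        self._attributes = {}

    def __setattr__(self, name, value):
        # Wrapped values are registered as typed attributes
        # rather than stored as plain Python fields.
        if isinstance(value, Attribute):
            self.__dict__.setdefault("_attributes", {})[name] = value
        else:
            object.__setattr__(self, name, value)

m = Module()
m.table = Attribute({"w": [1.0, 2.0]}, "Dict[str, Tensor]")
```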

Differential Revision: D14354316

Pulled By: driazati

fbshipit-source-id: 67e08ab5229366b67fbc837e67b58831a4fb3318

5 years ago: - refactoring serialization of ONNX initializers to be name-based (#17420)
Spandan Tiwari [Thu, 7 Mar 2019 18:06:17 +0000 (10:06 -0800)]
- refactoring serialization of ONNX initializers to be name-based (#17420)

Summary:
Currently, serialization of model parameters in ONNX export depends on the order in which they are stored in a container (`list` on Python side and `std::vector` on C++ side). This has worked fine till now, but if we need to do any pass on that graph that mutates the parameter list, then strictly order-based serialization may not work.

This PR is the first in a set to bring in more passes (such as constant folding) related to ONNX export. This PR lays the groundwork by moving the serialization in ONNX export from order-based to name based approach, which is more amenable to some of the passes.

houseroad - As discussed this change uses a map for export, and removes the code from `export.cpp` that relies on the order to compute initializer names.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17420
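The contrast can be sketched in a few lines (a hypothetical illustration of the idea, not the export.cpp code): with order-based serialization, a pass that reorders or prunes parameters breaks the pairing between graph inputs and initializer data, while a name-keyed map survives such passes.

```python
# Hypothetical parameter store, keyed by name rather than position.
params = {"conv.weight": [1, 2], "conv.bias": [3], "fc.weight": [4, 5]}

def serialize_by_name(graph_inputs, params):
    """Emit (name, data) initializers by looking each input up by name,
    so reordering or removing inputs cannot mispair data."""
    return [(name, params[name]) for name in graph_inputs if name in params]

# A pass removed "conv.bias" and reordered the remaining inputs;
# name-based lookup still pairs each input with the right data.
inits = serialize_by_name(["fc.weight", "conv.weight"], params)
```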

Differential Revision: D14361993

Pulled By: houseroad

fbshipit-source-id: da93e945d55755c126de06641f35df87d1648cc4

5 years ago: ONNX Export for Max and Average Pooling in CEIL_MODE
Lara Haidar-Ahmad [Thu, 7 Mar 2019 17:59:28 +0000 (09:59 -0800)]
ONNX Export for Max and Average Pooling in CEIL_MODE

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16769

Differential Revision: D14362175

Pulled By: houseroad

fbshipit-source-id: 65cfb1dfba6a43d39cc85374add368fe8e4e5645

5 years ago: use flake8-mypy (#17721)
Elias Ellison [Thu, 7 Mar 2019 17:12:35 +0000 (09:12 -0800)]
use flake8-mypy (#17721)

Summary:
Use flake8 installed with mypy checks so that our linter matches fbcode. Mypy type errors also provide valuable signal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17721

Differential Revision: D14357778

Pulled By: eellison

fbshipit-source-id: d8c9ea3fe3b5f550c3b70fe259e0eabf95e4c92d

5 years ago: use fp16<->fp32 intrinsic (#17496)
Jongsoo Park [Thu, 7 Mar 2019 10:17:42 +0000 (02:17 -0800)]
use fp16<->fp32 intrinsic (#17496)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17496

As title.

Reviewed By: hyuen

Differential Revision: D14222907

fbshipit-source-id: d5d6c032e725ca8b52aca2be7401ec3c59f6a242

5 years ago: Implement a Caffe2 standalone LSTM operator (#17726)
Ahmed Aly [Thu, 7 Mar 2019 09:03:51 +0000 (01:03 -0800)]
Implement a Caffe2 standalone LSTM operator (#17726)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17726

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17725

Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no way to easily use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14351575

fbshipit-source-id: 3b99b53212cf593c7a49e45580b5a07b90809e64

5 years ago: caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (#17729)
Sebastian Messmer [Thu, 7 Mar 2019 07:50:14 +0000 (23:50 -0800)]
caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (#17729)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17729

When doing "import torch" in fbcode, previously the caffe2 cuda kernels weren't loaded because libcaffe2_gpu.so wasn't loaded.
Once you also did "from caffe2.python import workspace", then the cuda kernels were loaded because that triggered a runtime mechanism for loading libcaffe2_gpu.so.

We want the cuda kernels to always be available, so this diff adds a dependency from caffe2:libtorch_cuda to caffe2:caffe2_gpu.

Reviewed By: ezyang

Differential Revision: D14353498

fbshipit-source-id: 76a9fe69f231b308ab40eac393bb216c6fad3658

5 years ago: add tensor and cost inference functions (#17684)
Jongsoo Park [Thu, 7 Mar 2019 07:26:27 +0000 (23:26 -0800)]
add tensor and cost inference functions (#17684)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17684

Adding tensor and cost inference functions to more int8 operators.

Reviewed By: yinghai

Differential Revision: D14174746

fbshipit-source-id: dfad975fa75899565c8fb61f1b7747a9206ebd22

5 years ago: ONNX Export Narrow op
Lara Haidar [Thu, 7 Mar 2019 06:35:12 +0000 (22:35 -0800)]
ONNX Export Narrow op

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17550

Differential Revision: D14350401

Pulled By: houseroad

fbshipit-source-id: 4d88079bb7a8bbd270b0272009826eb3b202cc33

5 years ago: Keep the dim_type of hinted shape as BATCH if possible (#17734)
Yinghai Lu [Thu, 7 Mar 2019 03:55:39 +0000 (19:55 -0800)]
Keep the dim_type of hinted shape as BATCH if possible (#17734)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17734

If the input is not BATCH, we will skip adjusting its batch size during the onnxifi transformation. So when we take hints, we take them as CONSTANT but later need to change them to BATCH if possible.

Reviewed By: jackm321

Differential Revision: D14355983

fbshipit-source-id: 63eb54a44afb1565c71486fdd73db07ca0ac4fd4

5 years ago: fix different round behavior on CPU and GPU #16498 (#17443)
jwu [Thu, 7 Mar 2019 03:37:03 +0000 (19:37 -0800)]
fix different round behavior on CPU and GPU #16498 (#17443)

Summary:
xxtemp, colesbury, bhushan23, zou3519: convert GPU round behavior to half-to-even, consistent with the Torch CPU version and NumPy. Your feedback is welcome.
See #16498
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17443
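Half-to-even ("banker's") rounding, which this change makes the GPU match, is also what Python's built-in `round` and NumPy use: ties go to the nearest even integer, so rounding is unbiased over many values.

```python
# Half-to-even rounding: ties land on the nearest even integer.
# Python's round() follows the same rule as numpy.round and the
# Torch CPU implementation this commit aligns the GPU with.
values = [0.5, 1.5, 2.5, 3.5, -0.5, -1.5]
rounded = [round(v) for v in values]
print(rounded)
```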

Differential Revision: D14261786

Pulled By: VitalyFedyunin

fbshipit-source-id: 98156436b545d72769831a89e2775d43ad913ebc

5 years ago: Warn about memory overlaps on expanded tensors (#17576)
zou3519 [Thu, 7 Mar 2019 01:37:13 +0000 (17:37 -0800)]
Warn about memory overlaps on expanded tensors (#17576)

Summary:
Eventually we should remove these when we're certain that all our ops
handle memory overlaps correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17576
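Why expanded tensors overlap: `expand` produces a view with stride 0 along the broadcast dimension, so distinct indices map to the same storage location. A brute-force pure-Python sketch of the idea (not the ATen heuristic):

```python
from itertools import product

def has_internal_overlap(shape, strides):
    """Return True if two distinct indices hit the same storage offset.

    An expanded tensor, e.g. shape (4, 3) with strides (0, 1) obtained
    by expanding a (1, 3) tensor, overlaps because the zero stride
    repeats offsets. Brute-force sketch for illustration only.
    """
    offsets = [sum(i * s for i, s in zip(idx, strides))
               for idx in product(*map(range, shape))]
    return len(set(offsets)) < len(offsets)
```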

Differential Revision: D14349990

Pulled By: zou3519

fbshipit-source-id: c3a09f6113b9b1bf93e7f13c0b426c45b2cdf21f

5 years ago: fix exp fam. formula
Tongzhou Wang [Wed, 6 Mar 2019 23:35:25 +0000 (15:35 -0800)]
fix exp fam. formula

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17719

Differential Revision: D14349029

Pulled By: soumith

fbshipit-source-id: cf016756a9319436f7379e8377f8bd1e1b672b40

5 years ago: refactor caffe2 operator constructors - 10/9 (#17659)
Sebastian Messmer [Wed, 6 Mar 2019 23:08:44 +0000 (15:08 -0800)]
refactor caffe2 operator constructors - 10/9 (#17659)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17659

clangr codemod

Reviewed By: ezyang

Differential Revision: D14304675

fbshipit-source-id: 45fbd84c50651a70ae29bf46df3322715e99d225

5 years ago: Improve ONNX symbolic for logsoftmax and softmax (#17672)
Lu Fang [Wed, 6 Mar 2019 22:59:16 +0000 (14:59 -0800)]
Improve ONNX symbolic for logsoftmax and softmax (#17672)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17672

support dtype in the onnx symbolic

Reviewed By: zrphercule

Differential Revision: D14313987

fbshipit-source-id: e9364621b3f795191d880599711dfbcb220d0e31

5 years ago: Enable using CMD when building cpp extensions on Windows
peter [Wed, 6 Mar 2019 22:40:05 +0000 (14:40 -0800)]
Enable using CMD when building cpp extensions on Windows

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17706

Differential Revision: D14346482

Pulled By: ezyang

fbshipit-source-id: 7c85e51c701f6c0947ad324ef19fafda40ae1cb9

5 years ago: Do not rename net boundary inputs/outputs during ssaRewrite. (#17545)
Yinghai Lu [Wed, 6 Mar 2019 22:24:02 +0000 (14:24 -0800)]
Do not rename net boundary inputs/outputs during ssaRewrite. (#17545)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17545

This diff avoids renaming the boundary inputs of the net during the onnxifi transform.
It also removes adding mappings for the initializers during onnxifi op creation.
Thus it gets rid of the mapped workspace creation during onnxifi op creation.

Reviewed By: zrphercule

Differential Revision: D14243161

fbshipit-source-id: 6eafa920c45f6a6bfacbbb443e8e84cf9778644c

5 years ago: Reapply D14078519 (#17596)
Sebastian Messmer [Wed, 6 Mar 2019 21:47:27 +0000 (13:47 -0800)]
Reapply D14078519 (#17596)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17596

Was reverted before, now fixed version.

Reviewed By: ezyang

Differential Revision: D14270288

fbshipit-source-id: c72490b5d02cc6098cb60145fa9a842b3c9a24c5

5 years ago: Batch of expect file removals (#17581)
eellison [Wed, 6 Mar 2019 21:41:13 +0000 (13:41 -0800)]
Batch of expect file removals (#17581)

Summary:
Another batch of removing expect files.

One note - I removed the Batched expect files without adding equivalent tests since they are already being tested in other ways, and we are no longer actively maintaining that project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17581

Differential Revision: D14343578

Pulled By: eellison

fbshipit-source-id: ce0b1fd2b5b4ec80ad9003bab1b58f41645d3da6

5 years ago: (#14267)
jiej [Wed, 6 Mar 2019 21:36:14 +0000 (13:36 -0800)]
(#14267)

Summary:
- Summary:

Added synchronized batch normalization, allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)

- User-facing api:

1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)

2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)

- supported use case:
DistributedDataParallel with ***single-gpu multi-process***

a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:

  1. use layers directly:

     torch.nn.SyncBatchNorm(...)

     similar API as with torch.nn.BatchNormXd(...)
     with added argument `process_group` which is used to limit the scope of
     synchronization within each process group. Default value is None, which
     implies synchronization across all GPUs

  2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)

     recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
     preserving values of parameters/buffers.
     the utility function also allows user to specify process_group value to all
     converted layers.

b. user wraps their model with
   `torch.nn.parallel.DistributedDataParallel`; from this point, the user
   should follow the general guidelines for DDP usage

- Error checking

For use cases not supported, we error out:

1. Application launched without ddp:
   > import torch
   > sbn = torch.nn.SyncBatchNorm(10).cuda()
   > inp = torch.randn(5, 10, 3, 3).cuda()
   > sbn(inp) --> Error!
   > AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel

2. Application launched using DDP with multi-GPU per-process:
   > ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
   > ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267
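The conversion utility described in item 2 above can be sketched conceptually (a hypothetical pure-Python model with toy stand-in classes, not the torch.nn implementation): walk the module tree and swap every BatchNorm layer for a SyncBatchNorm, copying its configuration and propagating the process group.

```python
# Toy stand-ins for torch.nn modules, for illustration only.
class BatchNorm2d:
    def __init__(self, num_features):
        self.num_features = num_features

class SyncBatchNorm:
    def __init__(self, num_features, process_group=None):
        self.num_features = num_features
        self.process_group = process_group

class Sequential:
    def __init__(self, *children):
        self.children = list(children)

def convert_sync_batchnorm(module, process_group=None):
    """Recursively replace BatchNorm layers with SyncBatchNorm,
    mirroring the behavior described for the conversion utility."""
    if isinstance(module, BatchNorm2d):
        return SyncBatchNorm(module.num_features, process_group)
    if isinstance(module, Sequential):
        module.children = [convert_sync_batchnorm(c, process_group)
                           for c in module.children]
    return module

model = Sequential(BatchNorm2d(10), Sequential(BatchNorm2d(20)))
model = convert_sync_batchnorm(model)
```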

Differential Revision: D14270035

Pulled By: ezyang

fbshipit-source-id: 4956d8fa565c32e9df5408d53719ff9f945f4d6d

5 years ago: Update ModuleDict doc about order
Tongzhou Wang [Wed, 6 Mar 2019 21:06:41 +0000 (13:06 -0800)]
Update ModuleDict doc about order

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17717

Differential Revision: D14346557

Pulled By: ezyang

fbshipit-source-id: 2484c7d8105f9aa8bce5567d1fa2d4f587cc9cc2

5 years ago: Update CODEOWNERS (#17720)
Pieter Noordhuis [Wed, 6 Mar 2019 20:30:05 +0000 (12:30 -0800)]
Update CODEOWNERS (#17720)

Summary:
teng-li is passing the baton to mrshenli. Thanks for all your work on distributed, teng-li!! :tada:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17720

Differential Revision: D14350120

Pulled By: pietern

fbshipit-source-id: edfe784520c54630203cc8fbb296455d3dbf341b

5 years ago: ONNX Export Argmin and Argmax ops
Lara Haidar-Ahmad [Wed, 6 Mar 2019 20:05:48 +0000 (12:05 -0800)]
ONNX Export Argmin and Argmax ops

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17382

Differential Revision: D14338811

Pulled By: houseroad

fbshipit-source-id: be07548d8063d1aa94f1801c18137738365b85fb

5 years ago: Turn atol to 1e-5 when comparing the end to end results (#17708)
Lu Fang [Wed, 6 Mar 2019 20:02:34 +0000 (12:02 -0800)]
Turn atol to 1e-5 when comparing the end to end results (#17708)

Summary:
Result differences smaller than 1e-5 don't make sense.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17708
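The comparison being tuned is the usual absolute/relative tolerance test (cf. `numpy.allclose`); with atol=1e-5, element differences below 1e-5 are treated as equal. A minimal sketch of the check:

```python
def allclose(a, b, rtol=1e-7, atol=1e-5):
    """Elementwise |a - b| <= atol + rtol * |b|, the standard
    tolerance test. Sketch for illustration; tolerance defaults
    here are assumptions, not the test suite's exact values."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))
```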

Differential Revision: D14348893

Pulled By: houseroad

fbshipit-source-id: 5e07c38e5b58b27b61fae63bfc3c21e2fe5629fe

5 years ago: remove loop expects (#17695)
Elias Ellison [Wed, 6 Mar 2019 19:42:19 +0000 (11:42 -0800)]
remove loop expects (#17695)

Summary:
Replace loop unrolling expect files with assertions on the output IR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17695

Differential Revision: D14347105

Pulled By: eellison

fbshipit-source-id: 1703b4ca32bc1c67c01fc4330b0e6eb66feaa103

5 years ago: typo fix
youkaichao [Wed, 6 Mar 2019 19:31:50 +0000 (11:31 -0800)]
typo fix

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653

Differential Revision: D14302003

Pulled By: ezyang

fbshipit-source-id: 8ad90985a392b07127c7e315d4e74ce77962b573

5 years ago: omit group conv NHWC test for GPU (#17715)
Deepali Chourasia [Wed, 6 Mar 2019 19:25:26 +0000 (11:25 -0800)]
omit group conv NHWC test for GPU (#17715)

Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:

```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])

You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example generated by hypothesis has `group=2, order='NHWC' and dc=[, device_type: 1])`.
I think this example should be skipped.

I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715

Differential Revision: D14346642

Pulled By: ezyang

fbshipit-source-id: b1f1fef09f625fdb43d31c7213854e61a96381ba

5 years ago: fix tuple matching (#17687)
Elias Ellison [Wed, 6 Mar 2019 19:21:09 +0000 (11:21 -0800)]
fix tuple matching (#17687)

Summary:
Check for tuple matching in isSubvalueOf, since tuples may contain container types that need to be recursed into within isSubvalueOf.

Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687
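Conceptually, the fix makes the containment check recurse into tuple elements. A hypothetical pure-Python model of the idea (not the C++ isSubvalueOf):

```python
def is_subvalue_of(value, container_type):
    """Hypothetical model of the recursive check: a tuple matches a
    tuple type only if every element matches the corresponding
    element type, recursing through nested containers."""
    kind, *elem_types = container_type
    if kind == "Tuple":
        return (isinstance(value, tuple)
                and len(value) == len(elem_types)
                and all(is_subvalue_of(v, t)
                        for v, t in zip(value, elem_types)))
    if kind == "List":
        return (isinstance(value, list)
                and all(is_subvalue_of(v, elem_types[0]) for v in value))
    if kind == "Int":
        return isinstance(value, int)
    return False

# A tuple containing a list must be checked recursively:
ok = is_subvalue_of((1, [2, 3]), ("Tuple", ("Int",), ("List", ("Int",))))
```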

Differential Revision: D14324642

Pulled By: eellison

fbshipit-source-id: 7f1e019875286b2640a3b9c003d1635dda8cf543

5 years ago: Temporarily disable Upsample operator tests in pytorch-onnx tests (#17696)
Spandan Tiwari [Wed, 6 Mar 2019 19:03:32 +0000 (11:03 -0800)]
Temporarily disable Upsample operator tests in pytorch-onnx tests (#17696)

Summary:
As discussed with houseroad: the Upsample op is being updated in ONNX (https://github.com/onnx/onnx/pull/1773) and these tests are blocking that change. These tests will be updated once the ONNX PR goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17696

Differential Revision: D14338845

Pulled By: houseroad

fbshipit-source-id: cfaf8cf1ab578ae69dd3bf21b1c0681b572b9b6f

5 years ago: Add check for x64 Python before setup (#17707)
peter [Wed, 6 Mar 2019 18:41:20 +0000 (10:41 -0800)]
Add check for x64 Python before setup (#17707)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/17657.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17707

Differential Revision: D14346705

Pulled By: ezyang

fbshipit-source-id: 5daafacdb99eb9a9c6517263d10f20c79f920d24

5 years ago: Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Edward Yang [Wed, 6 Mar 2019 18:32:38 +0000 (10:32 -0800)]
Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623

Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices.  Rename it to something that more
clearly spells out its applicability.

I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in-line with how the Caffe2 codebase is currently written.  More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.

This is a respin of D13156470 (#14284)

Reviewed By: dzhulgakov

Differential Revision: D14285504

fbshipit-source-id: 93b8ab938b064572b3b010c307e1261fde0fff3d

5 years ago: Remove nomscheduler (#17693)
Duc Ngo [Wed, 6 Mar 2019 18:31:00 +0000 (10:31 -0800)]
Remove nomscheduler (#17693)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17693

Remove nomscheduler tool

Reviewed By: yinghai

Differential Revision: D14328168

fbshipit-source-id: 674d0e18596a4dc2bbb6b8d321f4066c4fc454ab

5 years ago: index operation support for torch.HalfTensor (#17645)
bhushan [Wed, 6 Mar 2019 18:28:49 +0000 (10:28 -0800)]
index operation support for torch.HalfTensor (#17645)

Summary:
- Test cases added
1. indexing for half tensor
2. setting for half tensor

fixes #17161
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645

Differential Revision: D14302069

Pulled By: ezyang

fbshipit-source-id: 100f141c07046f200c904e27c5882a9417bccda0

5 years ago: Revert D14160172: Implement a Caffe2 standalone LSTM operator
Soumith Chintala [Wed, 6 Mar 2019 16:41:42 +0000 (08:41 -0800)]
Revert D14160172: Implement a Caffe2 standalone LSTM operator

Differential Revision:
D14160172

Original commit changeset: c33e3f9e8aea

fbshipit-source-id: cffe35d93f0ac75ca93aa98a3b82af3d372f2fc1

5 years ago: fix typo in hub doc
Tongzhou Wang [Wed, 6 Mar 2019 07:14:25 +0000 (23:14 -0800)]
fix typo in hub doc

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17705

Differential Revision: D14338380

Pulled By: ailzhang

fbshipit-source-id: d53eece30bede88a642e718ee6f829ba29c7d1c4

5 years ago: fix dropout AD & rename range to rangelist (#17691)
Ailing Zhang [Wed, 6 Mar 2019 04:47:02 +0000 (20:47 -0800)]
fix dropout AD & rename range to rangelist (#17691)

Summary:
fixes #17669
Address apaszke 's comments in #17523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17691

Differential Revision: D14328083

Pulled By: ailzhang

fbshipit-source-id: 9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44

5 years ago: enable use of MIOpen for depthwise convolutions (#17685)
Chaitanya Sri Krishna Lolla [Wed, 6 Mar 2019 02:41:20 +0000 (18:41 -0800)]
enable use of MIOpen for depthwise convolutions (#17685)

Summary:
* added miopen conv mode to be used for setConvDescriptor
* added miopen depthwise convolutions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17685

Differential Revision: D14327811

Pulled By: bddppq

fbshipit-source-id: d5bdc1abafd5f39694fadf3f9275b9d880c5b115

5 years ago: Implement a Caffe2 standalone LSTM operator (#17461)
Ahmed Aly [Wed, 6 Mar 2019 01:31:51 +0000 (17:31 -0800)]
Implement a Caffe2 standalone LSTM operator (#17461)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461

Implementing a standalone LSTM operator in Caffe2, adopted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. Also, there was no way to easily use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.

Two things missing:

- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch

Reviewed By: dzhulgakov

Differential Revision: D14160172

fbshipit-source-id: c33e3f9e8aeae578b64d97593cb031a251216029

5 years ago: Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)
Soumith Chintala [Tue, 5 Mar 2019 22:26:20 +0000 (14:26 -0800)]
Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/15508
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17328

Differential Revision: D14322629

Pulled By: soumith

fbshipit-source-id: 7d02f372be78794782c18affcfc109ce30b1e91c

5 years ago: Add '--hip-clang-launch' to favor <<<>>>-based launch. (#17686)
Johannes M Dieterich [Tue, 5 Mar 2019 20:49:25 +0000 (12:49 -0800)]
Add '--hip-clang-launch' to favor <<<>>>-based launch. (#17686)

Summary:
hip-clang uses triple chevron kernel dispatch syntax. Add an option to the hipification script to skip translating triple chevron to hipLaunchKernelGGL.

Once we switch to hip-clang, this option will be the default and subsequently removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17686

Differential Revision: D14327810

Pulled By: bddppq

fbshipit-source-id: 5e1512325077dd3ebb8fb9b5bf35fd1f8d9a4dc3

5 years ago: Improve caching allocator for Pascal and newer GPUs. (#17120)
Sam Gross [Tue, 5 Mar 2019 17:38:23 +0000 (09:38 -0800)]
Improve caching allocator for Pascal and newer GPUs. (#17120)

Summary:
```
NVIDIA changed the CUDA allocation behavior on Pascal GPUs. The
page size increased from 1MB to 2MB and allocations larger than 1MB
are now always page-aligned. Previously, allocations larger than 1MB
were aligned to 128KB boundaries.

This interacted poorly with the caching allocator. The remaining
memory in a page could only be filled by small cudaMalloc calls, but
the caching allocator never cudaMalloc's a chunk smaller than 1MB.
This behavior could also cause a large discrepancy between the memory
usage reported by nvidia-smi and the memory usage reported by
PyTorch, because nvidia-smi counts a partially used page as "full",
while PyTorch only counts the actual memory requested.

This PR makes a few changes to the caching allocator to better support
Pascal and Volta GPUs:

 - All cudaMalloc calls are now multiples of 2MB (the page size)
 - Requests between 1-10MB allocate (and split) a 20MB block to
   reduce wasted space due to rounding
 - Small requests are now packed into 2MB blocks (instead of 1MB)

This improves Mask R-CNN memory usage by 10-20% in internal tests on
Volta GPUs. Maxwell performance seems to be largely unchanged, but
it's possible that some use cases suffer slightly.
```
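The rounding policy described above can be summarized in a few lines (an illustration of the stated policy, with the block-splitting bookkeeping omitted; not the actual allocator code):

```python
MB = 1024 * 1024

def alloc_size(request):
    """Size actually cudaMalloc'd for a request, per the policy above:
    small requests are packed into 2MB blocks, 1-10MB requests split a
    20MB block, and everything is a multiple of the 2MB page size."""
    if request <= 1 * MB:
        return 2 * MB        # small requests share a 2MB block
    if request <= 10 * MB:
        return 20 * MB       # allocate (and split) a 20MB block
    # otherwise round up to the 2MB page size
    return (request + 2 * MB - 1) // (2 * MB) * (2 * MB)
```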
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17120

Differential Revision: D14301536

Pulled By: colesbury

fbshipit-source-id: a8282315ea8f7b8ca149b5066fdeaecd0d404edf

5 years ago: Turn the Half::from_bits into a constexpr function to avoid unresolve… (#17661)
Davide Libenzi [Tue, 5 Mar 2019 15:24:27 +0000 (07:24 -0800)]
Turn the Half::from_bits into a constexpr function to avoid unresolve… (#17661)

Summary:
Turn Half::from_bits into a constexpr function to avoid unresolved symbol errors when building in DEBUG mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17661

Differential Revision: D14319610

Pulled By: soumith

fbshipit-source-id: 6c508a37155e29260f403d7174f343aa1ff32385

5 years ago: Remove Expect Files from python / tracing / script interop
Elias Ellison [Tue, 5 Mar 2019 06:38:41 +0000 (22:38 -0800)]
Remove Expect Files from python / tracing / script interop

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17622

Differential Revision: D14308307

Pulled By: eellison

fbshipit-source-id: bda249d38ac2570000a12b0ca328c26233ecefe8

5 years ago: Enable apex on Windows
peterjc123 [Tue, 5 Mar 2019 05:50:53 +0000 (21:50 -0800)]
Enable apex on Windows

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17675

Differential Revision: D14320473

Pulled By: soumith

fbshipit-source-id: cb696984f5196f9b8b50722b4fe927bb6407c322

5 years ago: bump docker build to upgrade magma to 2.5.0 (#17674)
Soumith Chintala [Tue, 5 Mar 2019 04:28:06 +0000 (20:28 -0800)]
bump docker build to upgrade magma to 2.5.0 (#17674)

Summary:
upgrades magma in docker build.

vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17674

Differential Revision: D14320187

Pulled By: soumith

fbshipit-source-id: 7887f65fb703b802fc6231408b55ad9c4039882b

5 years agorefactor caffe2 operator constructors - 1/9 (#17082)
Sebastian Messmer [Mon, 4 Mar 2019 23:56:21 +0000 (15:56 -0800)]
refactor caffe2 operator constructors - 1/9 (#17082)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17082

clangr codemod

Reviewed By: ezyang

Differential Revision: D14078498

fbshipit-source-id: f7f65d6d81c7942293f53fdaa61f756d8b7360c1

5 years agoExpose cuda kernel for caffe2::GenerateProposals
Sebastian Messmer [Mon, 4 Mar 2019 22:53:55 +0000 (14:53 -0800)]
Expose cuda kernel for caffe2::GenerateProposals

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17066

Reviewed By: ezyang, wat3rBro

Differential Revision: D14071130

fbshipit-source-id: 6fe26503f6069c36ec31d6c09b549b932d5db242

5 years agoprint warnings when DNNLOWP_16 or DNNLOWP_ROWWISE_16 engine is used (#17176)
Jongsoo Park [Mon, 4 Mar 2019 22:25:19 +0000 (14:25 -0800)]
print warnings when DNNLOWP_16 or DNNLOWP_ROWWISE_16 engine is used (#17176)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17176

As title

Reviewed By: csummersea

Differential Revision: D14111616

fbshipit-source-id: 1282cb2452c4ad385fd2dc6d3f8c19e9fec715ff

5 years agoFix XOutput/XOutputTensor for ivalue based c2 operators (#17599)
Sebastian Messmer [Mon, 4 Mar 2019 22:17:11 +0000 (14:17 -0800)]
Fix XOutput/XOutputTensor for ivalue based c2 operators (#17599)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17599

XOutput/XOutputTensor was broken for ivalue based operators. This diff fixes that.

Reviewed By: ezyang

Differential Revision: D14274003

fbshipit-source-id: b99f020244c66c4e2551dbd32ae0f665cc91b338

5 years agoFix InputSize/OutputSize for ivalue based operators (#17579)
Sebastian Messmer [Mon, 4 Mar 2019 22:17:11 +0000 (14:17 -0800)]
Fix InputSize/OutputSize for ivalue based operators (#17579)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579

These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.

Reviewed By: dzhulgakov

Differential Revision: D14253094

fbshipit-source-id: 72bfdcf6da291a4ab80d1e0ceb20984b86edc408

5 years agoFix clamp fusion on missing limits (#17533)
Wanchao Liang [Mon, 4 Mar 2019 21:04:53 +0000 (13:04 -0800)]
Fix clamp fusion on missing limits (#17533)

Summary:
Fixes #17449

Context: before #17186, we did not fuse `clamp` when its `min`/`max` inputs were missing, because those were `prim::None` nodes. After #17186, None became a `prim::Constant` node, which enables fusion for `clamp`. But codegen.cpp did not handle the case where a `prim::Constant` is not a Double/Int/Bool. This PR makes missing inputs handled correctly, in the following way:

1. emit nothing when you see `type? = prim::Constant()`
2. when emitRHS, do special casing for aten::clamp
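The semantics the fuser has to reproduce are those of `clamp` with optional bounds; a plain-Python sketch of those semantics (not the codegen itself):

```python
def clamp(x, min=None, max=None):
    """clamp with optional bounds: a missing bound (None, i.e. an
    optional-typed prim::Constant with no value) simply drops that
    side of the comparison."""
    if min is not None and x < min:
        return min
    if max is not None and x > max:
        return max
    return x

assert clamp(5, min=0, max=3) == 3
assert clamp(-2, min=0) == 0   # max missing
assert clamp(7, max=4) == 4    # min missing
assert clamp(1) == 1           # both missing: identity
```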
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17533

Differential Revision: D14238450

Pulled By: wanchaol

fbshipit-source-id: 61a272154754b13e89021bb86002927f02cde19c

5 years agoint32 indexing for Tensor Iterator Reduction (#17428)
Jie [Mon, 4 Mar 2019 21:02:40 +0000 (13:02 -0800)]
int32 indexing for Tensor Iterator Reduction (#17428)

Summary:
1. Enabling int32 indexing for cases where TI cannot accumulate in output due to
incompatible data types (e.g. Welford).
2. Updating Welford kernel to use int32 instead of int64 indexing on GPU.

This change improves performance for torch.var / torch.std

Implementation:
1. Allocated extra buffer to handle accumulation between sub Tensor Iterators.
2. Removed int64 indexing in gpu_reduce_kernel
3. WelfordOps now takes the index type / combination type as a template parameter:
the GPU implementation uses int32_t and float, while the CPU implementation uses int64_t and double.
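Welford's algorithm (the reduction affected here) carries a running count, mean, and M2, so its accumulator type differs from the output type — which is why TI cannot accumulate directly in the output and needs the extra buffer. A reference implementation in Python:

```python
def welford_var(xs):
    """Single-pass mean/variance via Welford's algorithm. `count`
    plays the role of the index/combination type that the kernel
    now carries as int32 on GPU (int64 on CPU)."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in xs:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return mean, m2 / (count - 1)  # sample variance, like torch.var

mean, var = welford_var([1.0, 2.0, 3.0, 4.0])
assert mean == 2.5
assert abs(var - 5.0 / 3.0) < 1e-12
```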
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17428

Differential Revision: D14264608

Pulled By: umanwizard

fbshipit-source-id: 3eb54451de925b469dbc1127e5ea7443c4431036

5 years agoRemoved all usages of TH_Index_Base (#17591)
Iurii Zdebskyi [Mon, 4 Mar 2019 20:43:28 +0000 (12:43 -0800)]
Removed all usages of TH_Index_Base (#17591)

Summary:
TH_Index_Base is hard coded to 0 and can be removed from the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17591

Differential Revision: D14269273

Pulled By: izdeby

fbshipit-source-id: d844e261f4af7297bad8a81e7d6dcf0a391b94e6

5 years agoPyTorch/Caffe2 tensor interop in Python (#17190)
Dmytro Dzhulgakov [Mon, 4 Mar 2019 19:30:43 +0000 (11:30 -0800)]
PyTorch/Caffe2 tensor interop in Python (#17190)

Summary:
Because of two separate Python extensions with different pybind
instances, I have to go through a void* conversion. Since it's hidden from
the user, it's fine.

New APIs added on C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)

Works on CPU and GPU.

The only glitches are with resizing because of variable/tensor split.
But data sharing works properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190

Reviewed By: ezyang

Differential Revision: D14163882

Pulled By: dzhulgakov

fbshipit-source-id: d18e5b8fcae026f393c842a1149e972515732de2

5 years agoFixed typo in aten/src/ATen/native_parse.py (#17641)
wkcn [Mon, 4 Mar 2019 18:08:04 +0000 (10:08 -0800)]
Fixed typo in aten/src/ATen/native_parse.py (#17641)

Summary:
Hi, there.
There is a typo in aten/src/ATen/native_parse.py, and I fixed it.
`std::aray` -> `std::array`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17641

Differential Revision: D14301981

Pulled By: ezyang

fbshipit-source-id: a37859cdedcbf6c29333b954486dfa086d6c2176

5 years agoRemove GPU dependency from ProfileObserver (#17592)
Martin Schatz [Mon, 4 Mar 2019 17:55:05 +0000 (09:55 -0800)]
Remove GPU dependency from ProfileObserver (#17592)

Summary:
Remove GPU dependency and register ProfileObserver.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17592

Reviewed By: ezyang

Differential Revision: D14265801

Pulled By: mdschatz

fbshipit-source-id: f98c0c32653c64a8b087c58ece4f864dfbe1d4b8

5 years agoDon't make factory methods create a tensor and then immediately copy it (#17565)
Brennan Vincent [Mon, 4 Mar 2019 06:13:27 +0000 (22:13 -0800)]
Don't make factory methods create a tensor and then immediately copy it (#17565)

Summary:
Create a `make_variable` override that moves out of a tensor instead of going through `shallow_copy_and_detach`. Call this override from factory methods like `empty` that create a brand new tensor, do nothing with it, and then copy it into a variable.

Will update this with actual numbers, but it seems to get rid of around 20-40% of the overhead of calling `torch.empty(0)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17565

Differential Revision: D14266130

Pulled By: umanwizard

fbshipit-source-id: f57d5f2ca3f80ee8ee96d50f905e852fd10db941

5 years agoFixed typo in torch/functional.py w/r/t broadcast_tensors (#17642)
Jack Richter-Powell [Sun, 3 Mar 2019 18:05:36 +0000 (10:05 -0800)]
Fixed typo in torch/functional.py w/r/t broadcast_tensors (#17642)

Summary:
In reference to #17574
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17642

Differential Revision: D14297177

Pulled By: ezyang

fbshipit-source-id: 968176ea3b46a0153da0fd9e6b40db314d29e51c

5 years agoChange fake tqdm constructor to match real tqdm (#17636)
Bryan He [Sun, 3 Mar 2019 09:01:26 +0000 (01:01 -0800)]
Change fake tqdm constructor to match real tqdm (#17636)

Summary:
Currently, the fake tqdm implementation requires an iterable argument (whereas real tqdm does not).

This caused a problem in torchvision (https://github.com/pytorch/vision/pull/770), and seems likely to cause minor irritations elsewhere.
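Real tqdm can be constructed with no arguments (the iterable is optional, and progress can be driven manually via `update`); a minimal no-op fake matching that calling convention might look like this (a sketch under that assumption, not the actual shim in the codebase):

```python
class FakeTqdm:
    """No-op progress bar whose constructor matches real tqdm:
    the iterable is optional, extra keyword args are accepted
    and ignored."""
    def __init__(self, iterable=None, **kwargs):
        self.iterable = iterable
        self.n = 0          # manual progress counter, like tqdm.n
    def __iter__(self):
        for item in self.iterable or ():
            yield item
    def update(self, n=1):
        self.n += n
    def close(self):
        pass

bar = FakeTqdm()            # no-arg construction, like real tqdm
bar.update(10)
assert bar.n == 10
assert list(FakeTqdm([1, 2, 3])) == [1, 2, 3]
```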
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17636

Differential Revision: D14296530

Pulled By: ezyang

fbshipit-source-id: bc077d898773c93dab34c985a7b30525a43e558a

5 years agoMark native_functions as matched if uncaptured by JIT (#17631)
Christian Puhrsch [Sun, 3 Mar 2019 02:14:02 +0000 (18:14 -0800)]
Mark native_functions as matched if uncaptured by JIT (#17631)

Summary:
Various functions aren't used by the JIT, so they're jit-compliant w.r.t. their schema by default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17631

Differential Revision: D14295559

Pulled By: cpuhrsch

fbshipit-source-id: a2ecdcb5df47eb67c54ec642d88d42e985515142

5 years agoBan std::array from native_functions.yaml
Christian Puhrsch [Sat, 2 Mar 2019 03:18:47 +0000 (19:18 -0800)]
Ban std::array from native_functions.yaml

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17629

Differential Revision: D14292941

Pulled By: cpuhrsch

fbshipit-source-id: 3c3eed57a5505a4e1da3aea682092677ab0e73e3

5 years agoRemove more usages of BoolTensor and IndexTensor from native_functions.yaml
Christian Puhrsch [Sat, 2 Mar 2019 03:12:08 +0000 (19:12 -0800)]
Remove more usages of BoolTensor and IndexTensor from native_functions.yaml

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16468

Differential Revision: D14095405

Pulled By: cpuhrsch

fbshipit-source-id: ea4d6bb7a4e81c05fe9861190ddbf52201612bbf

5 years agoImplement kthvalue in ATen (#17544)
Thomas Viehmann [Sat, 2 Mar 2019 02:57:02 +0000 (18:57 -0800)]
Implement kthvalue in ATen (#17544)

Summary:
The CPU version is based on the TH version.
The GPU version is based on #8406 by Pararth Shah (thank you).

CPU quickselect is based on that in TH's THTensorMoreMath.cpp, but in C++ (quickselectnoindex will be achieved by a different swap)
CPU kthvalue is based on the THTensor function in the same file.
The dim_apply function is a C++ replacement for TH_TENSOR_DIM_APPLYx macros.
The CUDA kernel uses functions adapted from the THCTensorSortK implementation.
In particular radixSelect is from THCTensorTopK.cuh.
The CUDA launcher code replaces a bunch of macros with C++. It will be re-used in one of the following patches.

Plan for further PRs:
- This
- Sort
- TopK + Mode + Median in any order
- Rip out THC stuff.

There may be utility functions / structs in the SortingCommon.cuh that come into
relevance only with sort.
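kthvalue reduces to a selection problem; the CPU path uses quickselect, which can be sketched in Python as follows (1-based k, like `torch.kthvalue`; the ATen version additionally works along a dimension, partitions in place, and returns indices):

```python
import random

def kthvalue(xs, k):
    """Return the k-th smallest element (k is 1-based) via
    quickselect: partition around a pivot and descend into the
    side that contains rank k. Expected O(n) per call."""
    xs = list(xs)
    rank = k - 1                      # 0-based rank
    while True:
        pivot = random.choice(xs)
        less = [x for x in xs if x < pivot]
        equal_count = sum(1 for x in xs if x == pivot)
        if rank < len(less):
            xs = less                 # answer is in the lower part
        elif rank < len(less) + equal_count:
            return pivot              # rank falls on the pivot run
        else:
            rank -= len(less) + equal_count
            xs = [x for x in xs if x > pivot]

data = [3, 1, 4, 1, 5, 9, 2, 6]
assert kthvalue(data, 1) == 1   # minimum
assert kthvalue(data, 4) == 3   # sorted: [1,1,2,3,4,5,6,9]
assert kthvalue(data, 8) == 9   # maximum
```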
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17544

Differential Revision: D14286934

Pulled By: ezyang

fbshipit-source-id: 35dbea050b097e88777ac5fa5c0f499d5e23c738

5 years agoChange vml.h to support sizes greater than 2**32 - 1
Christian Puhrsch [Sat, 2 Mar 2019 00:53:23 +0000 (16:53 -0800)]
Change vml.h to support sizes greater than 2**32 - 1

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17280

Differential Revision: D14154997

Pulled By: cpuhrsch

fbshipit-source-id: c19b15d18da59c9ee87e82765d3244d2a4ef6729

5 years agomsvc_fixes (#17201)
Grigory Arutyunov [Fri, 1 Mar 2019 23:07:18 +0000 (15:07 -0800)]
msvc_fixes (#17201)

Summary:
Fixing MSVC errors

```
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(144): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:\pytorch-scripts\caffe2_builders\v141\pytorch\aten\src\THC/THCReduce.cuh(259): error C4002: too many actual paramet
ers for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caffe2_gpu.vcxp
roj]
  D:/pytorch-scripts/caffe2_builders/v141/pytorch/aten/src/THCUNN/SpatialDilatedMaxPooling.cu(51): error C4002: too man
y actual parameters for macro 'C10_LAUNCH_BOUNDS_1' [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2
\caffe2_gpu.vcxproj]
```

on variadic C10_LAUNCH_BOUNDS as well as Debug linking issues with at::Half in pool_op_cudnn.cc like this one

```
pool_op_cudnn.obj : error LNK2019: unresolved external symbol "public: bool __cdecl caffe2::MaxPoolFunctor<class caff
e2::CUDAContext>::GlobalPoolingBackward<struct c10::Half,2>(int,int,int,struct c10::Half const *,struct c10::Half const
 ,struct c10::Half const ,struct c10::Half ,class caffe2::CUDAContext )const " (??$GlobalPoolingBackward@UHalf@c10@
@$01@?$MaxPoolFunctor@VCUDAContext@caffe2@@caffe2@QEBA_NHHHPEBUHalf@c10@00PEAU23@PEAVCUDAContext@1@Z) referenced in
 function "public: bool __cdecl caffe2::`anonymous namespace'::CuDNNMaxPoolFunctor::GlobalPoolingBackward<struct c10::H
alf,2>(int,int,int,struct c10::Half const ,struct c10::Half const ,struct c10::Half const ,struct c10::Half ,class
caffe2::CUDAContext *)const " (??$GlobalPoolingBackward@UHalf@c10@@$01@CuDNNMaxPoolFunctor@?A0xb936404a@caffe2@QEBA_NH
HHPEBUHalf@c10@00PEAU34@PEAVCUDAContext@2@Z) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\Debug\caffe2\caff
e2_gpu.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17201

Differential Revision: D14165732

Pulled By: ezyang

fbshipit-source-id: 875fd9a5b2db6f83fc483f6d750d2c011260eb8b

5 years agoHipify fixes for Masquerade logic (#17598)
Jithun Nair [Fri, 1 Mar 2019 23:00:30 +0000 (15:00 -0800)]
Hipify fixes for Masquerade logic (#17598)

Summary:
ezyang Please review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17598

Differential Revision: D14287724

Pulled By: ezyang

fbshipit-source-id: 46e5083854a827370bb4c81b82e5a4ede511e473

5 years agoRename prim::Undefined to prim::AutogradZero (#17611)
Wanchao Liang [Fri, 1 Mar 2019 23:00:01 +0000 (15:00 -0800)]
Rename prim::Undefined to prim::AutogradZero (#17611)

Summary:
supersedes #17245
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17611

Differential Revision: D14283581

Pulled By: wanchaol

fbshipit-source-id: 8022d02b8a021ea2fee9a18a2c8920eb123200c5

5 years agoAdd python test for extension backend tensor.device (#17602)
Roy Li [Fri, 1 Mar 2019 22:18:58 +0000 (14:18 -0800)]
Add python test for extension backend tensor.device (#17602)

Summary:
Adding a test for #17361
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17602

Differential Revision: D14287373

Pulled By: li-roy

fbshipit-source-id: 544ecf17eb310aed22ba0ea5f86f46b8e3bb69b5

5 years agoRevert D13935403: Call c10 cuda op from test_torch
Edward Yang [Fri, 1 Mar 2019 22:14:02 +0000 (14:14 -0800)]
Revert D13935403: Call c10 cuda op from test_torch

Differential Revision:
D13935403

Original commit changeset: b2915ec8a366

fbshipit-source-id: 0f3409d5c102d719bc1f0483695aee93e7d613c9

5 years agoadd command line option to use hive filler; add README (#17619)
Amy Yang [Fri, 1 Mar 2019 21:53:11 +0000 (13:53 -0800)]
add command line option to use hive filler; add README (#17619)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17619

--filler hive --iter -1 will let the debugger exhaust all batches from a hive partition before exiting.
Add a README that summarizes command line options and usage.

Reviewed By: yinghai

Differential Revision: D14220166

fbshipit-source-id: daa23b7e8a9184481c6d7b67acf1599e5c99d74a

5 years agoRemove TH(CU)NN Sparse Linear (#17610)
Thomas Viehmann [Fri, 1 Mar 2019 20:32:47 +0000 (12:32 -0800)]
Remove TH(CU)NN Sparse Linear (#17610)

Summary:
Sparse Linear in TH(CU)NN implements sparse linear layers without
using sparse matrices.
It is currently not documented in PyTorch and there is no functional or
module interface. This means it is unused from a PyTorch point of view.

The reason for removing it is twofold:
- The module uses sort, which I would like to move to ATen.
- When we implement a SparseLinear layer, we would want to do it
  using sparse tensors, so it's not all that useful, anyway.

I checked this on slack with soumith, I hope the above is an accurate
representation. All bad ideas are my own.

This is part of the ongoing work to move
sort/topk/mode/median/kthvalue to ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17610

Differential Revision: D14280663

Pulled By: gchanan

fbshipit-source-id: 289231d2c20626855ce2ceecd4f204b460c32378

5 years agoCorrect docstring of vision/init functions
ZhuBaohe [Fri, 1 Mar 2019 19:26:54 +0000 (11:26 -0800)]
Correct docstring of vision/init functions

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17351

Differential Revision: D14276355

Pulled By: soumith

fbshipit-source-id: 9b572b6a04eeb1e44cd93961edac76ed10f7b24e

5 years agoCall c10 cuda op from test_torch
Sebastian Messmer [Fri, 1 Mar 2019 18:51:34 +0000 (10:51 -0800)]
Call c10 cuda op from test_torch

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16692

Reviewed By: ezyang

Differential Revision: D13935403

fbshipit-source-id: b2915ec8a3664bb6e918ed357908cc33d8f9449a

5 years agoRevert #17191 and #17215 that no longer apply on Windows (#17567)
peter [Fri, 1 Mar 2019 18:33:58 +0000 (10:33 -0800)]
Revert #17191 and #17215 that no longer apply on Windows (#17567)

Summary:
They were previously merged to resolve #17051. However, since that was resolved upstream, and they were causing some issues like https://github.com/abjer/tsds/issues/8, I think it's time to revert these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17567

Differential Revision: D14265241

Pulled By: kostmo

fbshipit-source-id: 7fa2b7dd4ebc5148681acb439cf82d983898694e

5 years agousertype -> class (#17528)
Michael Suo [Fri, 1 Mar 2019 18:00:19 +0000 (10:00 -0800)]
usertype -> class (#17528)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17528

as title. register_prim_ops is messy because someone ruined clang-format, but I figured it's okay to include here since this is such a mechanical change

Reviewed By: driazati

Differential Revision: D14236943

fbshipit-source-id: c2b22845837b7f830015510e48ec2ee5202fa407

5 years agoalias analysis refactor take 2 (#17594)
Michael Suo [Fri, 1 Mar 2019 18:00:19 +0000 (10:00 -0800)]
alias analysis refactor take 2 (#17594)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17594

The original version of this broke things because a concurrent change raced with it in CI.

Reviewed By: ezyang

Differential Revision: D14266663

fbshipit-source-id: e8ac5dfcb7349b4f2c425d9f0eabbfc964314063

5 years agoFix the missing Windows CPU job in the build status section (#17608)
peter [Fri, 1 Mar 2019 17:56:43 +0000 (09:56 -0800)]
Fix the missing Windows CPU job in the build status section (#17608)

Summary:
It would be better to split the CPU job on CI, but unfortunately we are out of Windows machines.
cc davidbrownellWork yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17608

Differential Revision: D14281393

Pulled By: soumith

fbshipit-source-id: ae9a6140b7207ce56cfb2da3d812bc3fe060764a

5 years agoUpdate magma to 2.5.0 for Windows
peter [Fri, 1 Mar 2019 17:45:38 +0000 (09:45 -0800)]
Update magma to 2.5.0 for Windows

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17607

Differential Revision: D14281291

Pulled By: yf225

fbshipit-source-id: 51209c5540932871e45e54ba6d61b3b7d264aa8c

5 years agoAdding support for 0-d tensor for transpose (.t()) (#17535)
bhushan [Fri, 1 Mar 2019 16:38:06 +0000 (08:38 -0800)]
Adding support for 0-d tensor for transpose (.t()) (#17535)

Summary:
- Test updates
1. test_torch: added 0-d test case and t_() test cases
2. test_jit  : updated error message for TestAsync.test_async_script_error

- Updating documentation for torch.t()
Adding information regarding the new support for 0-D and 1-D tensors

Fixes #17520
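The documented behavior being added — `t()` is a no-op on 0-D and 1-D tensors and a dimension swap on 2-D — can be modeled on plain shape tuples (a sketch of the semantics only, not the ATen implementation):

```python
def t_shape(shape):
    """Shape of tensor.t() under the new rules: 0-D and 1-D
    tensors are returned as-is; 2-D tensors have their two dims
    swapped; higher dimensions are an error, as in torch.t()."""
    if len(shape) <= 1:
        return shape
    if len(shape) == 2:
        return (shape[1], shape[0])
    raise ValueError("t() expects a tensor with <= 2 dimensions")

assert t_shape(()) == ()           # 0-D: unchanged
assert t_shape((5,)) == (5,)       # 1-D: unchanged
assert t_shape((2, 3)) == (3, 2)   # 2-D: transposed
```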
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17535

Differential Revision: D14269984

Pulled By: gchanan

fbshipit-source-id: 38b723f31484be939261c88edb33575d242eca65

5 years agoUpdating submodules
svcscm [Fri, 1 Mar 2019 09:33:59 +0000 (01:33 -0800)]
Updating submodules

Reviewed By: yns88

fbshipit-source-id: 05fafcfb34c76f425ac5c8ef24a5f920641c2cf7

5 years agoMark cudaGetLastError return value unused in C10_CUDA_CHECK
Junjie Bai [Fri, 1 Mar 2019 08:02:56 +0000 (00:02 -0800)]
Mark cudaGetLastError return value unused in C10_CUDA_CHECK

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17605

Reviewed By: xw285cornell

Differential Revision: D14277586

Pulled By: bddppq

fbshipit-source-id: 38879208f2ab83cf39d8a8a61b288cd09fcafd9a

5 years agoadd dropout during eval (#17549)
Huan Gui [Fri, 1 Mar 2019 07:17:35 +0000 (23:17 -0800)]
add dropout during eval (#17549)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17549

Currently dropout is only enabled during training; this adds the option of applying dropout during eval as well.

This follows [1]. The functionality will be used for uncertainty estimation in the exploration project.

[1] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a bayesian approximation: Representing model uncertainty in deep learning." international conference on machine learning. 2016.
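Eval-time dropout is the mechanism behind MC dropout [1]: keep sampling dropout masks at inference and use the spread of the predictions as an uncertainty estimate. A framework-free sketch (toy model and helper names are illustrative, not the Caffe2 operator):

```python
import random

def dropout(xs, p, rng):
    """Inverted dropout: zero each unit with probability p and
    scale survivors by 1/(1-p) to preserve the expectation."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in xs]

def mc_predict(model, x, p, samples, seed=0):
    """Monte Carlo dropout: leave dropout ON at eval time, run the
    model `samples` times, and report the mean prediction plus the
    variance across samples as an uncertainty estimate."""
    rng = random.Random(seed)
    outs = [model(dropout(x, p, rng)) for _ in range(samples)]
    mean = sum(outs) / samples
    var = sum((o - mean) ** 2 for o in outs) / samples
    return mean, var

# toy "model": a fixed linear layer over the (dropped-out) inputs
weights = [0.5, -1.0, 2.0]
model = lambda h: sum(w * v for w, v in zip(weights, h))
mean, var = mc_predict(model, [1.0, 1.0, 1.0], p=0.5, samples=200)
assert var > 0.0   # nonzero spread across samples: the uncertainty signal
```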

Reviewed By: Wakeupbuddy

Differential Revision: D14216216

fbshipit-source-id: 87c8c9cc522a82df467b685805f0775c86923d8b

5 years agoAdjust launch_bounds annotation for AMD hardware. (#17555)
Johannes M Dieterich [Fri, 1 Mar 2019 06:53:34 +0000 (22:53 -0800)]
Adjust launch_bounds annotation for AMD hardware. (#17555)

Summary:
The max pooling backwards kernel is currently annotated with launch bounds (256,8).

Adjust the number of waves to 4 (4 times 64 is 256) for ROCm. This improves training performance for torchvision models by up to 15% (AlexNet) on a gfx906 GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17555

Differential Revision: D14277744

Pulled By: bddppq

fbshipit-source-id: 2a62088f7b8a87d1e350c432bf655288967c7883

5 years agoFix verbose compiler warning in flat_hash_map (#17562)
Sebastian Messmer [Fri, 1 Mar 2019 00:26:49 +0000 (16:26 -0800)]
Fix verbose compiler warning in flat_hash_map (#17562)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17562

fixes https://github.com/pytorch/pytorch/issues/17332

Reviewed By: ezyang

Differential Revision: D14254499

fbshipit-source-id: 9d5d7408c2ce510ac20cd438c6514dc2bbe3a854

5 years agoFix diagnostic pragmas (#17561)
Sebastian Messmer [Fri, 1 Mar 2019 00:26:49 +0000 (16:26 -0800)]
Fix diagnostic pragmas (#17561)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17561

The push at the top of the file was missing a corresponding pop

Reviewed By: ezyang

Differential Revision: D14254500

fbshipit-source-id: ff20359b563d6d6dcc68273dc754ab31aa8fad12

5 years agoAllow dispatch based on tensor list args (#17522)
Sebastian Messmer [Fri, 1 Mar 2019 00:25:37 +0000 (16:25 -0800)]
Allow dispatch based on tensor list args (#17522)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17522

Dispatch is still based on the first tensor arg, but that first "tensor arg" is now allowed to be a tensor list.
That is, the first argument that is either Tensor or TensorList will be the deciding factor for dispatch.
If it is a TensorList, then that TensorList must not be empty or dispatch will fail.
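The rule described above — scan the arguments for the first Tensor or TensorList and key dispatch on it — can be sketched like this (hypothetical `Tensor` type and device-string keys for illustration; the real c10 dispatcher keys on type ids, not this toy code):

```python
class Tensor:
    def __init__(self, device):
        self.device = device

def dispatch_key(args):
    """Return the dispatch key (here: a device string) of the
    first Tensor or list-of-Tensor argument. An empty tensor
    list at that position is an error, mirroring the diff."""
    for arg in args:
        if isinstance(arg, Tensor):
            return arg.device
        if isinstance(arg, list) and all(isinstance(t, Tensor) for t in arg):
            if not arg:
                raise RuntimeError("cannot dispatch on an empty tensor list")
            return arg[0].device
    raise RuntimeError("no tensor argument to dispatch on")

assert dispatch_key([1.5, Tensor("cuda")]) == "cuda"
# a tensor list in the first position decides dispatch
assert dispatch_key([[Tensor("cpu"), Tensor("cpu")], Tensor("cuda")]) == "cpu"
```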

Reviewed By: ezyang

Differential Revision: D14235840

fbshipit-source-id: 266c18912d56ce77aa84306c5605c4191f3d882b

5 years agoAllow exposing caffe2 operators with variable number of input tensors to c10 (#17491)
Sebastian Messmer [Fri, 1 Mar 2019 00:25:37 +0000 (16:25 -0800)]
Allow exposing caffe2 operators with variable number of input tensors to c10 (#17491)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17491

Before, there was no way to expose a caffe2 operator that had a variable number of inputs.
Now, this is allowed by giving the operator one tensor list input.
Note that the tensor list must be the first input, and that any other tensor inputs will be ignored and inaccessible in this case.

Reviewed By: ezyang

Differential Revision: D14220705

fbshipit-source-id: 7f921bfb581caf46b229888c409bbcc40f7dda80

5 years agoblacklist fft algorithms for strided dgrad (#17016)
Syed Tousif Ahmed [Fri, 1 Mar 2019 00:17:37 +0000 (16:17 -0800)]
blacklist fft algorithms for strided dgrad (#17016)

Summary:
Applies https://github.com/pytorch/pytorch/pull/16626 from v1.0.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17016

Differential Revision: D14270100

Pulled By: ezyang

fbshipit-source-id: 1137899dd1551d33d16f39e8dde76cad8192af46