platform/upstream/pytorch.git
Peter Goldsborough [Mon, 26 Nov 2018 17:37:04 +0000 (09:37 -0800)]
Allow torch.utils.cpp_extension.load to load shared libraries that aren't Python modules (#13941)

Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extension machinery is generally meant to build a Python module and import it. This PR gives `torch.utils.cpp_extension.load` an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can reuse `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to write even their custom ops inline. I think t-vi and fmassa will appreciate this.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941

Differential Revision: D13110592

Pulled By: goldsborough

fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20

Adam Paszke [Mon, 26 Nov 2018 17:18:43 +0000 (09:18 -0800)]
Batch more matrix multiplies (#13456)

Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.
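A pure-Python toy of the idea (not the actual JIT pass): pre-multiplying all timestep inputs with the shared weight in one batched matmul is mathematically equivalent to one matmul per timestep, but amortizes per-op overhead. All names here are illustrative.

```python
# Toy sketch: batching several matmuls against a shared weight matrix.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

W = [[1, 2], [3, 4]]                        # (2 x 2) weight, shared across timesteps
xs = [[[1], [0]], [[0], [1]], [[2], [3]]]   # three (2 x 1) timestep inputs

# Unbatched: one matmul per timestep.
per_step = [matmul(W, x) for x in xs]

# Batched: concatenate timesteps along columns, do a single matmul, split.
concat = [[x[i][0] for x in xs] for i in range(2)]          # (2 x 3)
batched = matmul(W, concat)
split = [[[batched[i][j]] for i in range(2)] for j in range(len(xs))]

assert split == per_step
```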

cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456

Differential Revision: D12920339

Pulled By: zou3519

fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443

Gregory Chanan [Mon, 26 Nov 2018 15:56:43 +0000 (07:56 -0800)]
Enable native wrappers for the remainder of nn functions.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14290

Differential Revision: D13162562

Pulled By: gchanan

fbshipit-source-id: 615e1727988bfeeade48f9b38162333a2e298f7b

Huan Gui [Sat, 24 Nov 2018 10:41:25 +0000 (02:41 -0800)]
Add Recency Weighted into SparseLookup (#14291)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14291

Add RecencyWeighted into SparseLookup.

Reviewed By: Wakeupbuddy

Differential Revision: D13147738

fbshipit-source-id: de5dc3aaee8ce7d41c6d30d2ff47e9786a7fa4da

Shuichi KITAGUCHI [Sat, 24 Nov 2018 05:32:10 +0000 (21:32 -0800)]
quote NUMPY_INCLUDE_DIR (#14341)

Summary:
When NUMPY_INCLUDE_DIR contains a space character (e.g. "C:\Program Files (x86)\Microsoft Visual Studio\..."), cmake cannot receive the correct path name.
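An illustration of the underlying problem (in Python, not the CMake fix itself): an unquoted path containing spaces splits into several arguments, while a quoted path survives as a single token.

```python
import shlex

# Shell-style lexing of a flag built from a path with spaces.
path = "C:/Program Files (x86)/Microsoft Visual Studio/include"

unquoted = shlex.split("-I" + path)
quoted = shlex.split("-I" + shlex.quote(path))

assert len(unquoted) == 5      # the path broke apart at the spaces
assert quoted == ["-I" + path]  # quoting keeps it one argument
```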
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14341

Differential Revision: D13188408

Pulled By: soumith

fbshipit-source-id: b62127d90e53da94fe6af5d3bdd2ea4fd6546210

Michael Suo [Fri, 23 Nov 2018 19:22:22 +0000 (11:22 -0800)]
shape analysis fix (#14325)

Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298

peter [Fri, 23 Nov 2018 16:15:28 +0000 (08:15 -0800)]
Some minor fixes for Windows build script (#14218)

Summary:
1. Fix execution failure when some of the paths are not defined
2. Users can now optionally override install dir by setting `CMAKE_INSTALL_PREFIX`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14218

Differential Revision: D13180350

Pulled By: soumith

fbshipit-source-id: 8c9680d1285dbf08b49380af1ebfa43ede99babc

Michael Carilli [Fri, 23 Nov 2018 16:08:35 +0000 (08:08 -0800)]
Allow dataloader to accept a custom memory pinning function (#14171)

Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
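A simplified pure-Python sketch of the dispatch logic described above (the real `pin_memory_batch` recurses over tensors, dicts, and sequences; the class and function names here are invented for illustration). A custom batch type is opaque to the default logic, so without a `pin_fn` it comes back unpinned.

```python
class CustomBatch:
    """Stand-in for a custom batch type produced by a custom collate_fn."""
    def __init__(self, data):
        self.data = data
        self.pinned = False

def pin_memory_batch(batch, pin_fn=None):
    # Recurse into known containers; defer unknown types to pin_fn if given.
    if isinstance(batch, (list, tuple)):
        return type(batch)(pin_memory_batch(b, pin_fn) for b in batch)
    if pin_fn is not None:
        return pin_fn(batch)
    return batch  # unrecognized type: returned without pinning

def my_pin_fn(batch):
    batch.pinned = True  # a real pin_fn would call .pin_memory() on its tensors
    return batch

default = pin_memory_batch(CustomBatch([1, 2]))
custom = pin_memory_batch(CustomBatch([1, 2]), pin_fn=my_pin_fn)
batch_list = pin_memory_batch([CustomBatch([1]), CustomBatch([2])], pin_fn=my_pin_fn)
assert not default.pinned and custom.pinned
```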
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab

Michael Carilli [Fri, 23 Nov 2018 16:07:51 +0000 (08:07 -0800)]
Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)

Summary:
This issue was noticed, and fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.
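The RNG-juggling idea can be shown with Python's stdlib `random` module (a conceptual analog only; the real code stashes and restores CUDA/CPU generator states): saving the RNG state before the forward segment and restoring it before the recomputation makes the recomputed dropout bitwise identical.

```python
import random

def dropout_mask(n):
    # Stand-in for a dropout invocation that consumes RNG state.
    return [random.random() < 0.5 for _ in range(n)]

random.seed(0)

# Forward pass: stash the RNG state before running the segment.
saved_state = random.getstate()
forward_mask = dropout_mask(8)

# ... other work advances the RNG further ...
random.random()

# Backward pass: rerun the segment. Restoring the stashed state makes
# the recomputed dropout bitwise identical to the forward pass.
random.setstate(saved_state)
recomputed_mask = dropout_mask(8)

assert recomputed_mask == forward_mask
```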

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit: It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag, perhaps?).

**Batchnorm may still be an issue, but that's a battle for another day.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7

svcscm [Fri, 23 Nov 2018 05:58:28 +0000 (21:58 -0800)]
Updating submodules

Reviewed By: yns88

fbshipit-source-id: e92b0c24a56b588dcf30542692cb4bdc2d474825

Sebastian Messmer [Thu, 22 Nov 2018 19:55:07 +0000 (11:55 -0800)]
Remove individual "using c10:xxx" statements (#13168)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13168

We now have a "using namespace c10" in the at and caffe2 namespaces, so we don't need the individual ones anymore.

Reviewed By: ezyang

Differential Revision: D11669870

fbshipit-source-id: fc2bb1008e533906914188da4b6eb30e7db6acc1

Yinghai Lu [Thu, 22 Nov 2018 08:28:51 +0000 (00:28 -0800)]
Make sure we bind input/output of Onnxifi op positionally (#14214)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214

This picks up the residual task of T36325466 to make sure that the input/output binding of the c2 Onnxifi op is positional.

Reviewed By: dzhulgakov

Differential Revision: D13134470

fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810

Wanchao Liang [Thu, 22 Nov 2018 07:42:24 +0000 (23:42 -0800)]
Convert gumbel_softmax, lp pooling weak functions and modules (#14232)

Summary:
1. Support `Optional[BroadcastingList1[int]]`-style type annotations that accept an int or a list[int]
2. Convert the gumbel_softmax and lp pooling weak functions and modules
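The broadcasting behavior item 1 enables can be sketched in plain Python (the helper name is invented; `BroadcastingList1` itself is a JIT annotation): a bare int is accepted wherever a one-element list is expected.

```python
from typing import List, Optional, Union

def normalize_size(size: Optional[Union[int, List[int]]]) -> Optional[List[int]]:
    """Accept an int or a list[int], broadcasting a bare int to a list."""
    if size is None or isinstance(size, list):
        return size
    return [size]

assert normalize_size(3) == [3]
assert normalize_size([3]) == [3]
```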
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700

Sebastian Messmer [Thu, 22 Nov 2018 07:04:43 +0000 (23:04 -0800)]
Use ADL to find toString (#14021)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14021

I'm planning to move at::Scalar to c10, and there's an at::toString(Scalar) defined.
Unfortunately, we call it by specifying at::toString() explicitly instead of relying on ADL.
This diff changes that to prepare for the actual move.

Reviewed By: ezyang

Differential Revision: D13015239

fbshipit-source-id: f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870

Sebastian Messmer [Thu, 22 Nov 2018 07:04:42 +0000 (23:04 -0800)]
Fix include paths for intrusive_ptr (#13692)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13692

This now lives in c10/util, not ATen/core anymore.

Reviewed By: ezyang

Differential Revision: D12937091

fbshipit-source-id: ea2d420a15e7941a38d0b4c75e20ca18437c73f8

Sebastian Messmer [Thu, 22 Nov 2018 07:04:42 +0000 (23:04 -0800)]
Move intrusive_ptr to c10/util

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13691

Reviewed By: ezyang

Differential Revision: D12937090

fbshipit-source-id: fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9

Joel Marcey [Thu, 22 Nov 2018 06:28:20 +0000 (22:28 -0800)]
ignore generated caffe2 docs and virtualenvs

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14309

Reviewed By: soumith

Differential Revision: D13166626

Pulled By: JoelMarcey

fbshipit-source-id: 4f11228d8b5da85cec222bf11282722a7319581b

svcscm [Thu, 22 Nov 2018 05:59:40 +0000 (21:59 -0800)]
Updating submodules

Reviewed By: yns88

fbshipit-source-id: 20976d595e68a08d746d8806fd0205d810656366

Jongsoo Park [Thu, 22 Nov 2018 05:36:16 +0000 (21:36 -0800)]
removing quantization utility functions moved to fbgemm (#14301)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14301

This diff removes quantization utility functions copied to fbgemm

Reviewed By: Maratyszcza

Differential Revision: D13159299

fbshipit-source-id: a7f3cd2af0aa241a8578d532a70a157da70d9289

Achal Shah [Thu, 22 Nov 2018 05:00:22 +0000 (21:00 -0800)]
Cuda version comparison with CUDA_VERSION_STRING (#14302)

Summary:
CUDA headers include the CUDA version in major.minor form, but when we do find_package(CUDA), the CUDA_VERSION variable includes the patch number as well, which fails the following condition.

`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})`

**For example:**
I have CUDA 10.0 installed. My nvcc output looks like this:
`Cuda compilation tools, release 10.0, V10.0.130`

If I compile my application with caffe2, it gives me the following error:

```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
  FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
  headers say the version is 10.0.  This often occurs when you set both
  CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
  setting PATH to point to the correct nvcc.  Perhaps, try re-running this
  command again with PATH=/usr/local/cuda/bin:$PATH.  See above log messages
  for more diagnostics, and see
  https://github.com/pytorch/pytorch/issues/8092 for more details.
```

**In this case, it failed because:**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (came from nvcc)

`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})`

**Fix:**
We should compare the header version in **major.minor format**, which is given by CUDA_VERSION_STRING.
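The mismatch and the fix can be sketched in a few lines of Python (version strings taken from the example above):

```python
def major_minor(version):
    """Reduce a version string to its major.minor prefix."""
    return ".".join(version.split(".")[:2])

cuda_version_from_header = "10.0"    # major.minor from the headers
cuda_version_from_nvcc = "10.0.130"  # CUDA_VERSION includes the patch number

# Naive full-string comparison spuriously fails:
assert cuda_version_from_header != cuda_version_from_nvcc

# Comparing major.minor (what CUDA_VERSION_STRING provides) succeeds:
assert major_minor(cuda_version_from_nvcc) == cuda_version_from_header
```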
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14302

Differential Revision: D13166485

Pulled By: soumith

fbshipit-source-id: 1b74e756a76c4cc5aa09978f5850f763ed5469b6

svcscm [Thu, 22 Nov 2018 04:51:26 +0000 (20:51 -0800)]
Updating submodules

Reviewed By: yns88

fbshipit-source-id: ee60b4dddf688608ef80043b1dc336d120a045d0

svcscm [Thu, 22 Nov 2018 04:29:22 +0000 (20:29 -0800)]
Updating submodules

Reviewed By: yns88

fbshipit-source-id: 366c29d09bec53459e2a4890c7fe8d10f45ff5c3

Teng Li [Thu, 22 Nov 2018 02:21:55 +0000 (18:21 -0800)]
Robust NCCL barrier improvement to cover all devices combinations (#14271)

Summary:
This covers the edge case where we run the same NCCL process group with multiple GPU combinations instead of only the last GPU combination. We always keep track of which GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each GPU's NCCL stream.

A test is included as well, run on an 8-GPU machine.
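The bookkeeping described above, as a pure-Python toy (the class and its method behavior are invented for illustration; the real code synchronizes CUDA streams): the process group remembers every device combination it has used, and barrier() covers all of them, not just the last.

```python
class FakeProcessGroup:
    """Toy stand-in for an NCCL process group's device tracking."""
    def __init__(self):
        self.used_devices = set()
        self.synced = []

    def allreduce(self, devices):
        # Record every GPU combination this group has run an op on.
        self.used_devices.update(devices)

    def barrier(self):
        # Synchronize each previously used device's stream.
        self.synced = sorted(self.used_devices)

pg = FakeProcessGroup()
pg.allreduce([0, 1])
pg.allreduce([2, 3])   # a different GPU combination
pg.barrier()
assert pg.synced == [0, 1, 2, 3]
```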
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271

Differential Revision: D13164993

Pulled By: teng-li

fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1

Michael Suo [Thu, 22 Nov 2018 01:46:46 +0000 (17:46 -0800)]
alias analysis (#14018)

Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info of the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.

2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".

3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.

- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
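A heavily simplified model of the "what nodes write to an alias set used by this node" query (all names invented; the real `AliasDb` is C++ and far richer):

```python
class AliasDb:
    """Toy alias database: values map to alias set ids, nodes to the sets they write."""
    def __init__(self):
        self.alias_set = {}   # value name -> alias set id
        self.writes = {}      # node name -> set of alias set ids it writes to

    def may_move_past(self, node, other_reads):
        """May `node` be reordered past a node reading alias sets `other_reads`?"""
        return not (self.writes.get(node, set()) & other_reads)

db = AliasDb()
db.alias_set["x"] = 0
db.alias_set["y"] = 0          # y aliases x (same alias set)
db.writes["append_node"] = {0}  # a node that mutates x's alias set

# A writer to x's alias set cannot be reordered past a reader of y:
assert not db.may_move_past("append_node", {db.alias_set["y"]})
# But it can move past a node that only reads an unrelated alias set:
assert db.may_move_past("append_node", {1})
```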
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018

Differential Revision: D13144906

Pulled By: suo

fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a

Ilia Cherniavskii [Thu, 22 Nov 2018 01:19:37 +0000 (17:19 -0800)]
Remove extra include

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14206

Reviewed By: dzhulgakov

Differential Revision: D13131318

fbshipit-source-id: 559b55b8d98cdf6b7d1d3e31237c5473edc5e462

Teng Li [Thu, 22 Nov 2018 00:54:36 +0000 (16:54 -0800)]
Removed redundant allreduce options in DDP (#14208)

Summary:
This somehow was not cleaned up after the C++ migration. It is unused and can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14208

Differential Revision: D13132492

Pulled By: teng-li

fbshipit-source-id: 0f05b6368174664ebb2560c037347c8eb45f7c38

David Riazati [Thu, 22 Nov 2018 00:30:43 +0000 (16:30 -0800)]
Add list inequality operator (#14129)

Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f

Yinghai Lu [Wed, 21 Nov 2018 23:43:10 +0000 (15:43 -0800)]
Add onnxifi support to SparseLengthsWeightedSum (#14210)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14210

We left `SparseLengthsWeightedSum` out, as the benchmark was not testing it due to an fp16 filler issue. It was flushed out by unit tests. Hence we add the support here.

Reviewed By: bddppq

Differential Revision: D13132320

fbshipit-source-id: b21c30c185c9e1fbf3980641bc3cdc39e85af2e1

Gu, Jinghui [Wed, 21 Nov 2018 23:42:29 +0000 (15:42 -0800)]
Add "axis" and "axis_w" arguments in FC to support a customized axis to reduce dim. (#12971)

Summary:
Add "axis" and "axis_w" arguments in FC to support a customized axis to reduce dim.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971

Reviewed By: bddppq

Differential Revision: D12850675

Pulled By: yinghai

fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019

Viswanath Sivakumar [Wed, 21 Nov 2018 21:42:04 +0000 (13:42 -0800)]
IDEEP fallback for ResizeNearest op (#14212)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14212

TSIA

Reviewed By: yinghai

Differential Revision: D13134134

fbshipit-source-id: e3c5c9c8756d6e25b213f8dde9d809a44373d7a3

zrphercule [Wed, 21 Nov 2018 21:12:18 +0000 (13:12 -0800)]
Fix ONNX_ATEN mode (#14239)

Summary:
Fix ONNX_ATEN mode by adding it to the validateBlock method.
Before this PR, validateBlock would throw an exception when using this mode.

I will add related test cases for ONNX_ATEN mode in a different PR once this is merged, since we don't have any currently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14239

Differential Revision: D13145443

Pulled By: zrphercule

fbshipit-source-id: 60e7942aa126acfe67bdb428ef231ac3066234b1

Pieter Noordhuis [Wed, 21 Nov 2018 19:25:42 +0000 (11:25 -0800)]
Bump gloo (#14281)

Summary:
Includes more robust error handling and timeout support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14281

Differential Revision: D13158232

Pulled By: pietern

fbshipit-source-id: e80432799a020576d5abdcd9a21d66b629479caf

Jongsoo Park [Wed, 21 Nov 2018 17:37:58 +0000 (09:37 -0800)]
fix comment on dnnlowp op arguments (#14265)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14265

Fix comment

Reviewed By: hx89

Differential Revision: D13152106

fbshipit-source-id: fbe98906963cbd5cb20a583a737a792fbc38292e

Gregory Chanan [Wed, 21 Nov 2018 17:04:59 +0000 (09:04 -0800)]
native NN wrappers, including with buffers.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14256

Differential Revision: D13148783

Pulled By: gchanan

fbshipit-source-id: 4b6179033cf1df26061b6731eaaa4e008692e592

Pieter Noordhuis [Wed, 21 Nov 2018 16:43:14 +0000 (08:43 -0800)]
Remove header generated at configuration time (#14244)

Summary:
The build was picking up the empty stub header instead of the generated
one. Because of the large number of include paths we end up passing to
the compiler, it is brittle to have both an empty stub file and a
generated file and expect the compiler to pick up the right one.

With the recent change to compile everything from a single CMake run, we
can now use native CMake facilities to propagate macros that indicate
backend support. The `target_compile_definitions` stanzas with the
INTERFACE flag ensure that these macros are set only for downstream
consumers of the c10d target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14244

Reviewed By: teng-li

Differential Revision: D13144293

Pulled By: pietern

fbshipit-source-id: f49324220db689c68c126b159f4f00a8b9bc1252

Zachary DeVito [Wed, 21 Nov 2018 14:36:26 +0000 (06:36 -0800)]
Address jittering issues in python_print (#14064)

Summary:
export - print a method with python_print
import - import a method with import_method

We want to ensure:

    export(g) == export(import(export(g)))

That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.

This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
  that they always parse in the same way
* when creating loop-carried dependencies, sort them
  by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
  in loops that are dead both after the loop and inside the body of the
  loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
  are probably rare in user code, and we need a way to communicate
  that we do not care about a variable name when re-parsing the graph.
  Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
  and family.
* Allow re-treeing to work as long as the only thing in its way is a
  constant node. These do not have side effects but are sometimes
  inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used
  (or, if they are inlined, ensure they get assigned CONSTANT.cX numbers
  in a consistent order). Clean up tuples (this is done in the compiler,
  but not in the tracer, leading to some tuple indexing jitter if not
  done).
* use strtod_l, not std::stod which can throw exceptions

Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
  cmake files. Threading it into setup.py allows us to turn on
  debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
  ~6 seconds to total build time but tests printing for every graph.
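The double-precision point above can be illustrated in Python: printing a float with too few digits does not parse back to the same value, while `repr()` emits enough digits for an exact round-trip. (This is an analogy; the C++ code uses its own formatting and strtod_l.)

```python
x = 0.1 + 0.2  # 0.30000000000000004

lossy = float("%.6g" % x)  # truncated printing: the value drifts
exact = float(repr(x))     # enough precision: bitwise round-trip

assert lossy != x
assert exact == x

# The export/import property being enforced is idempotence:
#     export(g) == export(import(export(g)))
# i.e. once a graph has round-tripped, further round-trips are stable.
```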
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064

Differential Revision: D13094637

Pulled By: zdevito

fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d

svcscm [Wed, 21 Nov 2018 10:16:29 +0000 (02:16 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 27838fb2dad82c78906faf3cc2d124557c30e88f

svcscm [Wed, 21 Nov 2018 08:25:17 +0000 (00:25 -0800)]
Updating submodules

Reviewed By: cdelahousse

fbshipit-source-id: 3c17e12a579245a84e9a56b1d8a1641232150675

Lu Fang [Wed, 21 Nov 2018 07:33:30 +0000 (23:33 -0800)]
Add tensor table in ModelDef and use it for jit script serialization and deserialization (#13861)

Summary:
As we discussed, the tensors in the TorchScript module will be associated with the tensor data in the serialized file. So let's add a table of tensors (actually a repeated TensorProto field) in the ModelDef. TensorProto.name will be the id.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13861

Reviewed By: dzhulgakov

Differential Revision: D13036940

Pulled By: zrphercule

fbshipit-source-id: ecb91b062ac4bc26af2a8d6d12c91d5614efd559

Tongzhou Wang [Wed, 21 Nov 2018 07:27:16 +0000 (23:27 -0800)]
c10d Automatically retry on EINTR (#14180)

Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/14170

Actually I probably shouldn't retry all `SYSCHECK` calls. I'll leave it to the reviewers to decide.
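The retry idea in a pure-Python sketch (the real SYSCHECK is a C++ macro, so this is only an illustration): a system call interrupted by a signal fails with EINTR, and the usual remedy is simply to retry it.

```python
import errno

def retry_on_eintr(syscall):
    """Call syscall(), retrying as long as it fails with EINTR."""
    while True:
        try:
            return syscall()
        except OSError as e:
            if e.errno != errno.EINTR:
                raise  # only EINTR is retried; real errors propagate

attempts = []

def flaky_recv():
    # Fails with EINTR twice, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError(errno.EINTR, "Interrupted system call")
    return b"payload"

result = retry_on_eintr(flaky_recv)
assert result == b"payload" and len(attempts) == 3
```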
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14180

Reviewed By: pietern

Differential Revision: D13144741

Pulled By: SsnL

fbshipit-source-id: d73288f76b18cae14b1b43dad4e5e8d010a96d95

Teng Li [Wed, 21 Nov 2018 05:10:18 +0000 (21:10 -0800)]
Make NCCL backend support barrier op (#14142)

Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573

As the title says, this PR makes NCCL backend support barrier op.

There are a couple of scenarios that need to be addressed:
(1) When a NCCL op has already happened, we need to record which GPU device(s) the previous op happened on and queue the allreduce barrier op on the same GPU device(s).
(2) When there is no NCCL op yet, we will try to use a single GPU, assigning a separate GPU to each process as a best effort.

As for the async work, during wait, we would like to not just wait on the NCCL kernel to be completed, but also block the thread until the current stream and the NCCL stream return.

`test_distributed` should cover the test. I also manually tested both scenarios.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142

Differential Revision: D13113391

Pulled By: teng-li

fbshipit-source-id: 96c33d4d129e2977e6892d85d0fc449424c35499

Yinghai Lu [Wed, 21 Nov 2018 02:00:14 +0000 (18:00 -0800)]
Fix memory leakage in onnxifi transformer (#14245)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14245

TSIA

Reviewed By: bddppq, rdzhabarov

Differential Revision: D13144783

fbshipit-source-id: 5e07bb7ab883ba1af68547a26272cd320967b9e3

David Riazati [Wed, 21 Nov 2018 00:42:00 +0000 (16:42 -0800)]
Allow undefined tensors as constants (#14120)

Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:

```python
@torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

@torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569

Wanchao Liang [Tue, 20 Nov 2018 22:09:27 +0000 (14:09 -0800)]
Export BatchNorm functional and module, add necessary JIT support (#14016)

Summary:
This PR does three things:

1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, it adds the necessary compiler support for in-place op augmented assignment
3. It changes the test_jit behavior in add_module_test to utilize a single rng state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91

Thomas Viehmann [Tue, 20 Nov 2018 20:43:23 +0000 (12:43 -0800)]
Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)

Summary:
- Move handling of the environment variable up from CPU only to all backends
- Introduce two levels to be enabled with PYTORCH_FUSION_DEBUG=n:
  1: print C source
  2: also print CPU assembly (the previous effect of PYTORCH_FUSION_DEBUG)
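The two-level switch can be sketched in Python (illustrative only; the actual parsing happens in C++, and the helper name here is invented):

```python
import os

def fusion_debug_level(env=os.environ):
    """Parse PYTORCH_FUSION_DEBUG as an integer level, defaulting to 0."""
    try:
        return int(env.get("PYTORCH_FUSION_DEBUG", "0"))
    except ValueError:
        return 0

fake_env = {"PYTORCH_FUSION_DEBUG": "2"}
level = fusion_debug_level(fake_env)

dump_c_source = level >= 1      # level 1: print C source
dump_cpu_assembly = level >= 2  # level 2: also print CPU assembly

assert dump_c_source and dump_cpu_assembly
```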

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213

Differential Revision: D13135393

Pulled By: soumith

fbshipit-source-id: befa4ebea3b3c97e471393a9f6402b93a6b24031

Tugrul Ates [Tue, 20 Nov 2018 20:23:14 +0000 (12:23 -0800)]
Delete backwards compatibility StorageImpl.h and TensorImpl.h (#14230)

Summary:
Since they directly include the real ones in core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14230

Differential Revision: D13140323

Pulled By: tugrulates

fbshipit-source-id: d7e3b94e891b2d7fa273d01c0b7edfebdbd7e368

Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
remove unused parameters from caffe2_dnnlowp_utils.cc (#14164)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14164

See title

Reviewed By: csummersea

Differential Revision: D13115470

fbshipit-source-id: d754f558cd06e5f4c1cd00315e912cdb7b50731a

Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
use pragma once (#14163)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14163

Some of the names we were using to guard the header files were too short (e.g. DYNAMIC_HISTOGRAM_H).

Reviewed By: csummersea

Differential Revision: D13115451

fbshipit-source-id: cef8c84c62922616ceea17effff7bdf8d67302a2

Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
format python files (#14161)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14161

Formatting using Nuclide

Reviewed By: hx89

Differential Revision: D13115348

fbshipit-source-id: 7432ce6072a1822d7287b4ebcfcb6309282e15ac

Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
clang-format (#14160)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14160

clang-format of C++ files

Reviewed By: hx89

Differential Revision: D13115201

fbshipit-source-id: d2ad65f66209e00578ef90f87f41272de2d24aa9

Hui Wu [Tue, 20 Nov 2018 06:54:19 +0000 (22:54 -0800)]
Add sigmoid op based on MKL-DNN

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097

Differential Revision: D13105366

Pulled By: yinghai

fbshipit-source-id: d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4

Daya S Khudia [Tue, 20 Nov 2018 06:45:00 +0000 (22:45 -0800)]
OSS build fix (#14192)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14192

We can only use C10_* macros in OSS. The build is only broken if built with USE_FBGEMM=ON.

Reviewed By: jianyuh

Differential Revision: D13121781

fbshipit-source-id: f0ee9a75997766e63e1da8a53de7ddb98296a171

Lu Fang [Tue, 20 Nov 2018 06:12:16 +0000 (22:12 -0800)]
Make EncodeMethod in jit script serialization return a string (#14167)

Summary:
Nit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14167

Reviewed By: ezyang

Differential Revision: D13116584

Pulled By: dzhulgakov

fbshipit-source-id: c0e7e71a81004031564bd2fc59f393041e1283d5

Jongsoo Park [Tue, 20 Nov 2018 05:44:29 +0000 (21:44 -0800)]
Create README.md of caffe2/quantization/server

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14217

Reviewed By: csummersea

Differential Revision: D13135086

Pulled By: jspark1105

fbshipit-source-id: bddf4f1c2dc5ec8ea6ebe9e265956f367e082d52

5 years agoCircleCI: fix NCCL install (#14172)
Will Feng [Tue, 20 Nov 2018 05:28:29 +0000 (21:28 -0800)]
CircleCI: fix NCCL install (#14172)

Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR fixes the issue.

This replaces https://github.com/pytorch/pytorch/pull/14124.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14172

Differential Revision: D13135087

Pulled By: yf225

fbshipit-source-id: 42fff3926734778713d483d74ba0a89e5502dd9e

5 years agoFix a bug in test case of onnx::If
zrphercule [Tue, 20 Nov 2018 02:43:58 +0000 (18:43 -0800)]
Fix a bug in test case of onnx::If

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14209

Differential Revision: D13132607

Pulled By: zrphercule

fbshipit-source-id: b7f7ccc6a6cbdeb57a7f88a1971d15dd81e6fc81

5 years agoTensor type checking and informative error messages for torch.distributed (#14204)
Teng Li [Tue, 20 Nov 2018 02:25:00 +0000 (18:25 -0800)]
Tensor type checking and informative error messages for torch.distributed (#14204)

Summary:
This will address https://github.com/pytorch/pytorch/issues/13574

This error message should be more informative to the user for all the non-multi-GPU ops, since we always python-bind to the multi-GPU ops.

test_distributed should cover all of them. Also tested both RuntimeErrors by hand:

```
>>> a = torch.ByteTensor([])
>>> b = [a, a]
>>> dist.all_reduce(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 809, in all_reduce
    _check_single_tensor(tensor, "tensor")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 207, in _check_single_tensor
    "to be a torch.Tensor type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor to be a torch.Tensor type

>>> b = ["b"]
>>> dist.all_gather(b, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 1006, in all_gather
    _check_tensor_list(tensor_list, "tensor_list")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 225, in _check_tensor_list
    "to be a List[torch.Tensor] type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor_list to be a List[torch.Tensor] type
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14204

Differential Revision: D13131526

Pulled By: teng-li

fbshipit-source-id: bca3d881e41044a013a6b90fa187e722b9dd45f2

5 years agoMove stream functions from CUDAContext to CUDAStream (#14110)
Edward Yang [Tue, 20 Nov 2018 01:01:34 +0000 (17:01 -0800)]
Move stream functions from CUDAContext to CUDAStream (#14110)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14110

I'm planning to move CUDAStream to c10/cuda, without also moving
CUDAContext, and so it's most convenient if these definitions
are in the actual header file in question.

Reviewed By: smessmer

Differential Revision: D13104693

fbshipit-source-id: 23ce492003091adadaa5ca6a17124213005046c2

5 years agoMove CUDAStreamInternals inside detail namespace. (#14109)
Edward Yang [Tue, 20 Nov 2018 01:01:34 +0000 (17:01 -0800)]
Move CUDAStreamInternals inside detail namespace. (#14109)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14109

Previously it was at the top level, because the author was under
the impression that you could only refer to top-level C++ names
from C, but this is not true; you just need to make a stub struct
conditioned on __cplusplus.

Reviewed By: smessmer

Differential Revision: D13104694

fbshipit-source-id: ecb7ae6dcfa4ab4e062aad7a886937dca15fd1b2

5 years agoDelete dependencies from CUDAStream; remove synchronize_with (#13920)
Edward Yang [Tue, 20 Nov 2018 01:01:33 +0000 (17:01 -0800)]
Delete dependencies from CUDAStream; remove synchronize_with (#13920)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920

I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now).  To do this, I need to eliminate those dependencies.

There's a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h

Reviewed By: smessmer

Differential Revision: D13047468

fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e

5 years agoFix race in AtomicFetchAdd. (#13479)
Yavuz Yetim [Mon, 19 Nov 2018 23:57:28 +0000 (15:57 -0800)]
Fix race in AtomicFetchAdd. (#13479)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479

Increases the lock scope to above Output() calls.

These calls potentially allocate the underlying blob/tensor
objects and multiple invocations race each other over the
same output blobs/tensors.

Reviewed By: bwasti

Differential Revision: D12891629

fbshipit-source-id: a6015cfdb08e352521a1f062eb9d94a971cfbdb0

5 years agoRemove API macros from intrusive_ptr (#14137)
Sebastian Messmer [Mon, 19 Nov 2018 23:35:18 +0000 (15:35 -0800)]
Remove API macros from intrusive_ptr (#14137)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14137

This is a templated header-only class and shouldn't need export/import macros.

Reviewed By: ezyang

Differential Revision: D13111712

fbshipit-source-id: c8c958e75b090d011d25156af22f37f9ca605196

5 years agoTensor construction: combine Resize+mutable_data - 1/4 (#13942)
Jerry Zhang [Mon, 19 Nov 2018 23:29:45 +0000 (15:29 -0800)]
Tensor construction: combine Resize+mutable_data - 1/4 (#13942)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13942

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13054770

fbshipit-source-id: a9e86e5dfcb4f7cebf5243e1d359fad064561bed

5 years agoTensor construction: combine Resize+mutable_data - 3/4 (#13944)
Jerry Zhang [Mon, 19 Nov 2018 23:25:43 +0000 (15:25 -0800)]
Tensor construction: combine Resize+mutable_data - 3/4 (#13944)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054836

fbshipit-source-id: 5de07a156687f1ee607d0450410881d9176a87a7

5 years agoStore the optimize flag in module (#14166)
Lu Fang [Mon, 19 Nov 2018 22:29:31 +0000 (14:29 -0800)]
Store the optimize flag in module (#14166)

Summary:
When saving/loading a script module, we store the optimize flag in the module instead of encoding it in each method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14166

Reviewed By: ezyang

Differential Revision: D13117577

Pulled By: dzhulgakov

fbshipit-source-id: dc322948bda0ac5809d8ef9a345497ebb8f33a61

5 years agoCleanup caffe2 hipify exclude patterns (#14198)
Junjie Bai [Mon, 19 Nov 2018 22:21:20 +0000 (14:21 -0800)]
Cleanup caffe2 hipify exclude patterns (#14198)

Summary:
depthwise_3x3_conv_op.cu does not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14198

Differential Revision: D13127479

Pulled By: bddppq

fbshipit-source-id: ec6bd434055a49ea405c4b399bde8c074114f955

5 years agoSupport 'python_module' of 'nn' in native functions. (#14126)
Gregory Chanan [Mon, 19 Nov 2018 22:10:47 +0000 (14:10 -0800)]
Support 'python_module' of 'nn' in native functions. (#14126)

Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126

Reviewed By: ezyang

Differential Revision: D13109975

Pulled By: gchanan

fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec

5 years agoUse onnx proto_utils to support using protobuf-lite
Junjie Bai [Mon, 19 Nov 2018 21:25:32 +0000 (13:25 -0800)]
Use onnx proto_utils to support using protobuf-lite

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14150

Differential Revision: D13115586

Pulled By: bddppq

fbshipit-source-id: d6b6935a8deac60f6f58d62a71f6840182a72a51

5 years agoUse fbgemm revision file added by shipit (#14105)
Daya S Khudia [Mon, 19 Nov 2018 20:08:35 +0000 (12:08 -0800)]
Use fbgemm revision file added by shipit (#14105)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14105

Pull Request resolved: https://github.com/facebook/fbshipit/pull/62

Use the fbgemm revision file created by ShipIt to update the fbgemm revision for pytorch. We no longer have to update the submodule manually.

Reviewed By: yns88

Differential Revision: D13072074

fbshipit-source-id: bef9eabad50f7140179c370a60bd9ca73067b9b5

5 years agoSetup sccache for PyTorch ROCm CI (#14153)
Your Name [Mon, 19 Nov 2018 19:26:38 +0000 (11:26 -0800)]
Setup sccache for PyTorch ROCm CI (#14153)

Summary:
Discovered a huge build time difference between the caffe2 rocm build and the pytorch rocm build (6min vs. 30min); it turns out the sccache setup present in the caffe2 docker images was missing from the pytorch build script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14153

Differential Revision: D13115097

Pulled By: bddppq

fbshipit-source-id: 88414f164b980f0e667c8e138479b4a75ab7692e

5 years agoallow empty index for scatter_* methods (#14077)
Ailing Zhang [Mon, 19 Nov 2018 17:45:28 +0000 (09:45 -0800)]
allow empty index for scatter_* methods (#14077)

Summary:
Fixes #2027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077
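The intended semantics can be sketched with plain lists (this models the behavior only, not the real tensor kernel): scatter along a dimension writes `src[i]` to `dest[index[i]]`, so an empty index should simply be a no-op instead of an error.

```python
def scatter_1d(dest, index, src):
    # Simplified 1-D model of Tensor.scatter_(0, index, src).
    for i, idx in enumerate(index):
        dest[idx] = src[i]
    return dest

print(scatter_1d([0, 0, 0], [2, 0], [7, 9]))  # [9, 0, 7]
print(scatter_1d([0, 0, 0], [], []))          # [0, 0, 0]: empty index is a no-op
```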

Differential Revision: D13095788

Pulled By: ailzhang

fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3

5 years agouse at::Device throughout JIT (#14181)
ArmenAg [Mon, 19 Nov 2018 17:18:45 +0000 (09:18 -0800)]
use at::Device throughout JIT (#14181)

Summary:
zdevito soumith

Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master.

fixes #13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181

Differential Revision: D13117688

Pulled By: soumith

fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f

5 years agoSupport named return arguments in native_functions. (#14100)
Gregory Chanan [Mon, 19 Nov 2018 16:18:47 +0000 (08:18 -0800)]
Support named return arguments in native_functions. (#14100)

Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100

Differential Revision: D13101810

Pulled By: gchanan

fbshipit-source-id: 1c80574cd4e8263764fc65126427b122fe36df35

5 years agoSplit out CUDAMultiStreamGuard from CUDAGuard (#13912)
Edward Yang [Mon, 19 Nov 2018 16:13:08 +0000 (08:13 -0800)]
Split out CUDAMultiStreamGuard from CUDAGuard (#13912)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13912

The implementation and API of CUDAMultiStreamGuard are less mature,
and it cannot be implemented generically (yet) in c10_cuda.  This
might be a reasonable thing to do eventually, but not for now.

Reviewed By: smessmer

Differential Revision: D13046500

fbshipit-source-id: 4ea39ca1344f1ad5ae7c82c98617aa348c327848

5 years agoMove AT_CUDA_CHECK to c10
Edward Yang [Mon, 19 Nov 2018 16:13:08 +0000 (08:13 -0800)]
Move AT_CUDA_CHECK to c10

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13910

Reviewed By: smessmer

Differential Revision: D13046201

fbshipit-source-id: 8d360a0e4d6c2edf070d130e600c6b04f0ee0058

5 years agoAdd c10 cuda library. (#13900)
Edward Yang [Mon, 19 Nov 2018 16:13:07 +0000 (08:13 -0800)]
Add c10 cuda library. (#13900)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13900

Add c10 cuda library.

Right now, this is not used by anything, and only tests if the CUDA
headers are available (and not, e.g., that linking works.)

Extra changes:
- cmake/public/cuda.cmake now is correctly include guarded, so you
  can include it multiple times without trouble.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Reviewed By: smessmer

Differential Revision: D13025313

fbshipit-source-id: fda85b4c35783ffb48ddd6bbb98dbd9154119d86

5 years agoSwitch Int8Add operator to QNNPACK (#14089)
Marat Dukhan [Mon, 19 Nov 2018 07:55:01 +0000 (23:55 -0800)]
Switch Int8Add operator to QNNPACK (#14089)

Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously was parallelized across images in a batch and pixels only, now within channels as well)
- Slightly different results due to a different implementation of fixed-point arithmetic (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089

Differential Revision: D13110135

Pulled By: Maratyszcza

fbshipit-source-id: 1f149394af5c16940f79a3fd36e183bba1be2497

5 years agoNo more -werror for c10d (#14155)
Teng Li [Sun, 18 Nov 2018 21:51:15 +0000 (13:51 -0800)]
No more -werror for c10d (#14155)

Summary:
As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14155

Differential Revision: D13115769

Pulled By: teng-li

fbshipit-source-id: 278deba090364544d92fa603621604ce37fa974e

5 years agoAdd ultra low precision options (#14133)
Summer Deng [Sun, 18 Nov 2018 20:49:39 +0000 (12:49 -0800)]
Add ultra low precision options (#14133)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14133

Experiment with ultra low precisions on the Resnext-101 URU trunk model

Reviewed By: jspark1105

Differential Revision: D10108518

fbshipit-source-id: f04d74fbe1c9e75efafcd9845719bdb2efbbfe9c

5 years agoAdds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)
Soumith Chintala [Sun, 18 Nov 2018 17:20:29 +0000 (09:20 -0800)]
Adds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)

Summary:
Adds symbolic diff and tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13888

Differential Revision: D13115548

Pulled By: soumith

fbshipit-source-id: ba75b01a95a5715a7761724dda018168b6188917

5 years agoPrint warning when ROCm memory leaking is detected in pytorch tests (#14151)
Your Name [Sun, 18 Nov 2018 08:09:25 +0000 (00:09 -0800)]
Print warning when ROCm memory leaking is detected in pytorch tests (#14151)

Summary:
We keep seeing random failures in CI because of ROCm memory leaks, e.g.:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3102//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3080//console

To make the CI more stable, turn it into a warning instead of a failure.

iotamudelta please help investigate the memory leak
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14151

Differential Revision: D13115096

Pulled By: bddppq

fbshipit-source-id: a13b68274ecba363d9d8436aa6a62ac40a77d78c

5 years agoRemove debugging code in test_cholesky_batched (#14156)
vishwakftw [Sun, 18 Nov 2018 06:25:39 +0000 (22:25 -0800)]
Remove debugging code in test_cholesky_batched (#14156)

Summary:
The debug statements didn't turn up in my tests because I use pytest, which doesn't print debug output if the tests pass.

Differential Revision: D13115227

Pulled By: soumith

fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b

5 years agoBack out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data...
Jerry Zhang [Sun, 18 Nov 2018 03:42:42 +0000 (19:42 -0800)]
Back out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4" (#14154)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14154

Original commit changeset: e89c2e692178

Reviewed By: amateurcoffee

Differential Revision: D13115023

fbshipit-source-id: 8f9fb55842ae6c8139d5cd88ec6d0abb0c5cc5e7

5 years agoCostInference for 1D conv (#14009)
Martin Schatz [Sun, 18 Nov 2018 01:26:09 +0000 (17:26 -0800)]
CostInference for 1D conv (#14009)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14009

As title

Reviewed By: yinghai

Differential Revision: D13078718

fbshipit-source-id: 081e7b13ad6741c635ef413915b555f10f93bd33

5 years agoBatched cholesky decomposition (#14017)
vishwakftw [Sat, 17 Nov 2018 18:47:17 +0000 (10:47 -0800)]
Batched cholesky decomposition (#14017)

Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with dedicated batched `tril` and `triu` ops; their absence also impedes the autograd support.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017
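What "batched" means here, in miniature: the factorization is applied independently over a leading batch dimension. Below is a hand-rolled 2x2 Cholesky mapped over a batch, purely as a sketch of the semantics (the real kernels handle arbitrary sizes and run on GPU):

```python
import math

def chol2x2(m):
    # Cholesky of a symmetric positive-definite [[a, b], [b, c]].
    (a, b), (_, c) = m
    l00 = math.sqrt(a)
    l10 = b / l00
    l11 = math.sqrt(c - l10 * l10)
    return [[l00, 0.0], [l10, l11]]

def batched_chol2x2(batch):
    # Batching: map the per-matrix routine over the leading dimension.
    return [chol2x2(m) for m in batch]

print(batched_chol2x2([[[4.0, 2.0], [2.0, 3.0]]]))
# [[[2.0, 0.0], [1.0, 1.4142135623730951]]]
```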

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e

5 years agoremove unnecessary file from avx2 list (#14012)
Jongsoo Park [Sat, 17 Nov 2018 18:26:56 +0000 (10:26 -0800)]
remove unnecessary file from avx2 list (#14012)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14012

conv_dnnlowp_op.cc doesn't need avx2 anymore.

Reviewed By: dskhudia

Differential Revision: D13079665

fbshipit-source-id: dbfe8d2213de4969b6334d54de81d51149268cbd

5 years agoChange from using enum to int to store data_type
Your Name [Sat, 17 Nov 2018 17:22:09 +0000 (09:22 -0800)]
Change from using enum to int to store data_type

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14140

Differential Revision: D13112937

Pulled By: bddppq

fbshipit-source-id: 124d9546bfbd1f9c207a21e40eb3646f7739bd58

5 years agoRevert "CircleCI: fix NCCL install (#14124)" (#14146)
Junjie Bai [Sat, 17 Nov 2018 08:20:44 +0000 (00:20 -0800)]
Revert "CircleCI: fix NCCL install (#14124)" (#14146)

Summary:
This reverts commit a1fa9d8cf9b2b0e7373ec420c2487d4dfd0e587c.

[pytorch_linux_trusty_py2_7_9_build](https://circleci.com/gh/pytorch/pytorch/270206?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console):
```
Nov 17 07:37:27 + sudo apt-get -qq update
Nov 17 07:37:30 W: Ignoring Provides line with DepCompareOp for package gdb-minimal
Nov 17 07:37:30 W: You may want to run apt-get update to correct these problems
Nov 17 07:37:30 + sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
Nov 17 07:37:30 E: Command line option --allow-downgrades is not understood
Nov 17 07:37:30 + cleanup
Nov 17 07:37:30 + retcode=100
Nov 17 07:37:30 + set +x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14146

Differential Revision: D13113912

Pulled By: bddppq

fbshipit-source-id: cd9d371cf72159f03d12a8b56ed5bd2060ebbe59

5 years agoRevert D10428917: [Caffe2] Add cost into profile observer
Junjie Bai [Sat, 17 Nov 2018 07:26:12 +0000 (23:26 -0800)]
Revert D10428917: [Caffe2] Add cost into profile observer

Differential Revision:
D10428917

Original commit changeset: 7c100e551bdd

fbshipit-source-id: 5164d9ba61cc103eccfdeb91a5cc140cea31a819

5 years agoRevert D10439558: Add cost for non-linear ops
Junjie Bai [Sat, 17 Nov 2018 07:26:12 +0000 (23:26 -0800)]
Revert D10439558: Add cost for non-linear ops

Differential Revision:
D10439558

Original commit changeset: 9aeb05bac8b5

fbshipit-source-id: f00977b4f95bdd500d254eb44fb5b0c816506ee4

5 years agoUpdate FXdiv submodule (#14128)
Marat Dukhan [Sat, 17 Nov 2018 05:57:42 +0000 (21:57 -0800)]
Update FXdiv submodule (#14128)

Summary:
Use the most recent version that disables inline assembly.
I suspect inline assembly causes miscompilation on some versions of gcc7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14128

Reviewed By: bddppq

Differential Revision: D13112370

Pulled By: Maratyszcza

fbshipit-source-id: 36cc95dc51390a293b72c18ae982c3a515a11981

5 years agoRename neon2sse.h to NEON_2_SSE.h to match upstream repo
Marat Dukhan [Sat, 17 Nov 2018 05:21:40 +0000 (21:21 -0800)]
Rename neon2sse.h to NEON_2_SSE.h to match upstream repo

Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- The upstream repo provides the NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fixes incompatibilities between the internal and upstream versions

Reviewed By: hlu1

Differential Revision: D13096755

fbshipit-source-id: 65e1df9a2a5e74bd52c9aee9be27469ba938cd8c

5 years agoDisable QNNPACK for multi-architecture iOS builds (#14125)
Marat Dukhan [Sat, 17 Nov 2018 05:02:37 +0000 (21:02 -0800)]
Disable QNNPACK for multi-architecture iOS builds (#14125)

Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125

Differential Revision: D13112366

Pulled By: Maratyszcza

fbshipit-source-id: b369083045b440e41d506667a92e41139c11a971

5 years agoRegister caffe2 layer norm with c10 dispatcher (#13693)
Sebastian Messmer [Sat, 17 Nov 2018 04:10:31 +0000 (20:10 -0800)]
Register caffe2 layer norm with c10 dispatcher (#13693)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13693

We can't call the caffe2::Operator class directly from c10 because that class hasn't been de-protobuffed yet.
Instead, we factor out the kernel into a reusable static method, call it from the caffe2::Operator, and
also register it with c10.

Reviewed By: ezyang

Differential Revision: D12912242

fbshipit-source-id: c57502f14cea7a8be281f9787b175bb6e402d00c

5 years agoAdd c10/core/ to cmake build (#14111)
Sebastian Messmer [Sat, 17 Nov 2018 04:10:30 +0000 (20:10 -0800)]
Add c10/core/ to cmake build (#14111)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14111

It was already in TARGETs, but we forgot it in cmake.

Reviewed By: ezyang

Differential Revision: D13105166

fbshipit-source-id: f09549e98ebca751339b5ada1150e00cc4cd9540

5 years agoUpdate atol scale in dnnlowp test (#14135)
Haixin Liu [Sat, 17 Nov 2018 03:08:49 +0000 (19:08 -0800)]
Update atol scale in dnnlowp test (#14135)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14135

Update the atol scale of the dnnlowp test. Can't reproduce the flaky test error locally, even after setting the same seed value, but according to the comments in check_quantized_results_close(), atol_scale should be 1/1.9 = 0.526315789473684, which is larger than the current value of 0.51. So increase atol_scale to 0.53.

Reviewed By: jspark1105

Differential Revision: D13108415

fbshipit-source-id: 1e8840659fdf0092f51b439cf499858795f9706a

5 years agofix sparse_adagrad param_size overflow error (#14049)
Jongsoo Park [Sat, 17 Nov 2018 02:49:08 +0000 (18:49 -0800)]
fix sparse_adagrad param_size overflow error (#14049)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14049

param_size should be passed as int64_t

Reviewed By: hyuen

Differential Revision: D13090511

fbshipit-source-id: 7892d315d7c82c7d7ca103fb36d30cdf1fe24785

5 years agoAdd cost for non-linear ops (#13327)
Haixin Liu [Sat, 17 Nov 2018 02:30:49 +0000 (18:30 -0800)]
Add cost for non-linear ops (#13327)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13327

Add a cost inference function to the non-linear ops. Since the actual flops of a non-linear operator depend on the implementation, we use the number of non-linear operations as a proxy for the analytical flops of non-linear operators.

Reviewed By: jspark1105

Differential Revision: D10439558

fbshipit-source-id: 9aeb05bac8b5c7ae5d351ebf365e0a81cf4fc227

5 years agoAdd cost into profile observer (#12793)
Haixin Liu [Sat, 17 Nov 2018 02:30:49 +0000 (18:30 -0800)]
Add cost into profile observer (#12793)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12793

Add analytical cost into the profile observer. It includes op-level cost information for each op run and net-level cost information aggregated by op type.

It outputs the following information:
1. analytical flops
2. analytical bytes_read
3. analytical bytes_written

Example output at op level:
```I1017 14:58:14.245978 3686541 profile_observer_gpu.cc:26] --------- Starting operator FC op#24 ---------
I1017 14:58:14.246049 3686541 profile_observer_gpu.cc:33] Input 0: Tensor model1/embedded_encoder_inputs of type float. Dims: (17,1,256,):
I1017 14:58:14.246109 3686541 profile_observer_gpu.cc:33] Input 1: Tensor model1/encoder/layer0/fw/milstm/i2h_w of type float. Dims: (2048,256,):
I1017 14:58:14.246176 3686541 profile_observer_gpu.cc:33] Input 2: Tensor model1/encoder/layer0/fw/milstm/i2h_b of type float. Dims: (2048,):
I1017 14:58:14.246217 3686541 profile_observer_gpu.cc:44] Argument 0: name: "use_cudnn" i: 1
I1017 14:58:14.246271 3686541 profile_observer_gpu.cc:44] Argument 1: name: "cudnn_exhaustive_search" i: 0
I1017 14:58:14.246338 3686541 profile_observer_gpu.cc:44] Argument 2: name: "order" s: "NHWC"
I1017 14:58:14.246372 3686541 profile_observer_gpu.cc:44] Argument 3: name: "axis" i: 2
I1017 14:58:14.246418 3686541 profile_observer_gpu.cc:44] Argument 4: name: "quantization_scheme" i: 1
I1017 14:58:14.246470 3686541 profile_observer_gpu.cc:53] Output 0: Tensor model1/encoder/layer0/fw/milstm/i2h of type float. Dims: (17,1,2048,):
I1017 14:58:14.246596 3686541 profile_observer_gpu.cc:61] Cost (flops, bytes_read, bytes_written):
I1017 14:58:14.246649 3686541 profile_observer_gpu.cc:62]        17860608 2122752 139264
I1017 14:58:14.246677 3686541 profile_observer_gpu.cc:64] --------- Finished operator FC in 0.764221 ms ---------
```
Example output at net level:
```
I1017 11:13:44.675585 3146691 profile_observer_gpu.cc:165] ================ Detailed stats for net model0/encoder/layer0/bw/milstm ================
I1017 11:13:44.675662 3146691 profile_observer_gpu.cc:167] Cost (flops, bytes_read, bytes_written) per operator type:
I1017 11:13:44.675706 3146691 profile_observer_gpu.cc:169]        20992000 42045440 81920 FC
I1017 11:13:44.675745 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Mul
I1017 11:13:44.675824 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Sum
I1017 11:13:44.675878 3146691 profile_observer_gpu.cc:169]               0 0 0 ElementwiseLinear
I1017 11:13:44.675909 3146691 profile_observer_gpu.cc:169]               0 0 0 LSTMUnit
I1017 11:13:44.675958 3146691 profile_observer_gpu.cc:169]               0 0 0 rnn_internal_apply_link
```
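The op-level numbers above are reproducible from the shapes in the log. Below is a sketch of the analytical FC cost, assuming float32 and a multiply-add-plus-bias flop count; the formulas are inferred from the example output, not copied from the source:

```python
def fc_cost(M, N, K, itemsize=4):
    # FC: (M, K) input x (N, K) weight + (N,) bias -> (M, N) output.
    flops = M * N * (2 * K + 1)                  # mul+add per K, plus bias add
    bytes_read = (M * K + N * K + N) * itemsize  # input + weight + bias
    bytes_written = M * N * itemsize             # output
    return flops, bytes_read, bytes_written

# Input (17, 1, 256) with axis=2 flattens to M=17, K=256; weight is (2048, 256).
print(fc_cost(17, 2048, 256))  # (17860608, 2122752, 139264), matching the log
```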

Reviewed By: mdschatz

Differential Revision: D10428917

fbshipit-source-id: 7c100e551bdd3ac8d7c09be12c72d70a2d67cae1

5 years agoCircleCI: fix NCCL install (#14124)
Will Feng [Sat, 17 Nov 2018 02:28:55 +0000 (18:28 -0800)]
CircleCI: fix NCCL install (#14124)

Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR is trying to figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14124

Reviewed By: teng-li

Differential Revision: D13112483

Pulled By: yf225

fbshipit-source-id: 5f65997586648805cf52217a261389625b5535e1