Yinghai Lu [Thu, 22 Nov 2018 08:28:51 +0000 (00:28 -0800)]
Make sure we bind input/output of Onnxifi op positionally (#14214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214
This is to pick up the residual task of T36325466 to make sure that the input/output binding of the c2 Onnxifi op is positional.
Reviewed By: dzhulgakov
Differential Revision:
D13134470
fbshipit-source-id:
d1b916dade65c79133b86507cd54ea5166fa6810
Wanchao Liang [Thu, 22 Nov 2018 07:42:24 +0000 (23:42 -0800)]
Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support `Optional[BroadcastingList1[int]]`-style type annotations to accept an int or a List[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
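For reference, a minimal sketch of the kind of signature this enables (the function here is hypothetical, not the actual gumbel_softmax/lp_pool conversion):
```python
import torch
from typing import List, Optional
from torch import Tensor

@torch.jit.script
def reshape_or_flatten(x: Tensor, size: Optional[List[int]] = None) -> Tensor:
    # With a BroadcastingList1[int] annotation a caller may pass a bare int
    # where a list is expected; Optional additionally allows omitting it.
    if size is None:
        return x.flatten()
    return x.view(size)
```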
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232
Differential Revision:
D13164506
Pulled By: wanchaol
fbshipit-source-id:
6c2a2b9a0613bfe907dbb5934122656ce2b05700
Sebastian Messmer [Thu, 22 Nov 2018 07:04:43 +0000 (23:04 -0800)]
Use ADL to find toString (#14021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14021
I'm planning to move at::Scalar to c10, and there's an at::toString(Scalar) defined.
Unfortunately, we call it by specifying at::toString() explicitly instead of relying on ADL.
This diff changes that to prepare for the actual move.
Reviewed By: ezyang
Differential Revision:
D13015239
fbshipit-source-id:
f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870
Sebastian Messmer [Thu, 22 Nov 2018 07:04:42 +0000 (23:04 -0800)]
Fix include paths for intrusive_ptr (#13692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13692
This now lives in c10/util, not ATen/core anymore.
Reviewed By: ezyang
Differential Revision:
D12937091
fbshipit-source-id:
ea2d420a15e7941a38d0b4c75e20ca18437c73f8
Sebastian Messmer [Thu, 22 Nov 2018 07:04:42 +0000 (23:04 -0800)]
Move intrusive_ptr to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13691
Reviewed By: ezyang
Differential Revision:
D12937090
fbshipit-source-id:
fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9
Joel Marcey [Thu, 22 Nov 2018 06:28:20 +0000 (22:28 -0800)]
ignore generated caffe2 docs and virtualenvs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14309
Reviewed By: soumith
Differential Revision:
D13166626
Pulled By: JoelMarcey
fbshipit-source-id:
4f11228d8b5da85cec222bf11282722a7319581b
svcscm [Thu, 22 Nov 2018 05:59:40 +0000 (21:59 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
20976d595e68a08d746d8806fd0205d810656366
Jongsoo Park [Thu, 22 Nov 2018 05:36:16 +0000 (21:36 -0800)]
removing quantization utility functions moved to fbgemm (#14301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14301
This diff removes quantization utility functions copied to fbgemm
Reviewed By: Maratyszcza
Differential Revision:
D13159299
fbshipit-source-id:
a7f3cd2af0aa241a8578d532a70a157da70d9289
Achal Shah [Thu, 22 Nov 2018 05:00:22 +0000 (21:00 -0800)]
Cuda version comparison with CUDA_VERSION_STRING (#14302)
Summary:
CUDA headers include the CUDA version in major.minor form, but when we do find_package(CUDA), the CUDA_VERSION variable includes the patch number as well, which fails the following condition:
`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})`
**For example:**
I have CUDA 10.0 installed. My nvcc output looks like this:
`Cuda compilation tools, release 10.0, V10.0.130`
If I compile my application with Caffe2, it gives me the following error:
```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
headers say the version is 10.0. This often occurs when you set both
CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
setting PATH to point to the correct nvcc. Perhaps, try re-running this
command again with PATH=/usr/local/cuda/bin:$PATH. See above log messages
for more diagnostics, and see
https://github.com/pytorch/pytorch/issues/8092 for more details.
```
**In this case, it failed because:**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (came from nvcc)
`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})`
**Fix:**
We should compare the header version against the **major.minor** version, which is given by CUDA_VERSION_STRING.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14302
Differential Revision:
D13166485
Pulled By: soumith
fbshipit-source-id:
1b74e756a76c4cc5aa09978f5850f763ed5469b6
svcscm [Thu, 22 Nov 2018 04:51:26 +0000 (20:51 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
ee60b4dddf688608ef80043b1dc336d120a045d0
svcscm [Thu, 22 Nov 2018 04:29:22 +0000 (20:29 -0800)]
Updating submodules
Reviewed By: yns88
fbshipit-source-id:
366c29d09bec53459e2a4890c7fe8d10f45ff5c3
Teng Li [Thu, 22 Nov 2018 02:21:55 +0000 (18:21 -0800)]
Robust NCCL barrier improvement to cover all devices combinations (#14271)
Summary:
This covers the edge case when we run the same NCCL process group with multiple GPU combinations rather than only the last GPU combination. We always keep track of which GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each such GPU's NCCL stream.
Test covered as well; tested on an 8-GPU machine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271
Differential Revision:
D13164993
Pulled By: teng-li
fbshipit-source-id:
81e04352740ea50b5e943369e74cfcba40bb61c1
Michael Suo [Thu, 22 Nov 2018 01:46:46 +0000 (17:46 -0800)]
alias analysis (#14018)
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info of the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.
2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node" (see the toy sketch after this list).
3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.
- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
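To make item 2 concrete, here is a toy Python sketch of that query. It is purely illustrative (all names are hypothetical), not the C++ implementation:
```python
from collections import defaultdict

class ToyAliasDb:
    def __init__(self):
        self.alias_set = {}              # value -> alias set id
        self.writers = defaultdict(set)  # alias set id -> nodes that write it

    def set_alias(self, value, set_id):
        self.alias_set[value] = set_id

    def record_write(self, node, value):
        self.writers[self.alias_set[value]].add(node)

    def has_other_writer(self, node, values):
        # A move past `node` is unsafe if some other node writes to an
        # alias set that `node` inputs or outputs.
        return any(self.writers[self.alias_set[v]] - {node} for v in values)
```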
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018
Differential Revision:
D13144906
Pulled By: suo
fbshipit-source-id:
1bc964f9121a504c237cef6dfeea6b233694de6a
Ilia Cherniavskii [Thu, 22 Nov 2018 01:19:37 +0000 (17:19 -0800)]
Remove extra include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14206
Reviewed By: dzhulgakov
Differential Revision:
D13131318
fbshipit-source-id:
559b55b8d98cdf6b7d1d3e31237c5473edc5e462
Teng Li [Thu, 22 Nov 2018 00:54:36 +0000 (16:54 -0800)]
Removed redundant allreduce options in DDP (#14208)
Summary:
This somehow was not cleaned up after the C++ migration. It is unused and can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14208
Differential Revision:
D13132492
Pulled By: teng-li
fbshipit-source-id:
0f05b6368174664ebb2560c037347c8eb45f7c38
David Riazati [Thu, 22 Nov 2018 00:30:43 +0000 (16:30 -0800)]
Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
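A small sketch of what the new operator enables in TorchScript (the function itself is hypothetical):
```python
import torch
from typing import List

@torch.jit.script
def lists_differ(a: List[int], b: List[int]) -> bool:
    # `!=` on lists now lowers to the new aten::neq operator
    return a != b

print(lists_differ([1, 2], [1, 3]))  # True
```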
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129
Differential Revision:
D13123894
Pulled By: driazati
fbshipit-source-id:
8c1edf7c163217ec00eb653f95d196db3998613f
Yinghai Lu [Wed, 21 Nov 2018 23:43:10 +0000 (15:43 -0800)]
Add onnxifi support to SparseLengthsWeightedSum (#14210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14210
We left out `SparseLengthsWeightedSum` because the benchmark was not exercising it due to an fp16 filler issue. It was flushed out by unit tests, hence we add the support here.
Reviewed By: bddppq
Differential Revision:
D13132320
fbshipit-source-id:
b21c30c185c9e1fbf3980641bc3cdc39e85af2e1
Gu, Jinghui [Wed, 21 Nov 2018 23:42:29 +0000 (15:42 -0800)]
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim. (#12971)
Summary:
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971
Reviewed By: bddppq
Differential Revision:
D12850675
Pulled By: yinghai
fbshipit-source-id:
f1cde163201bd7add53b8475329db1f038a73019
Viswanath Sivakumar [Wed, 21 Nov 2018 21:42:04 +0000 (13:42 -0800)]
IDEEP fallback for ResizeNearest op (#14212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14212
TSIA
Reviewed By: yinghai
Differential Revision:
D13134134
fbshipit-source-id:
e3c5c9c8756d6e25b213f8dde9d809a44373d7a3
zrphercule [Wed, 21 Nov 2018 21:12:18 +0000 (13:12 -0800)]
Fix ONNX_ATEN mode (#14239)
Summary:
Fix ONNX_ATEN mode by adding it to the validateBlock method.
Before this PR, validateBlock would throw an exception when using this mode.
I will add related test cases for ONNX_ATEN mode in a different PR once this is merged, since we don't have any currently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14239
Differential Revision:
D13145443
Pulled By: zrphercule
fbshipit-source-id:
60e7942aa126acfe67bdb428ef231ac3066234b1
Pieter Noordhuis [Wed, 21 Nov 2018 19:25:42 +0000 (11:25 -0800)]
Bump gloo (#14281)
Summary:
Includes more robust error handling and timeout support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14281
Differential Revision:
D13158232
Pulled By: pietern
fbshipit-source-id:
e80432799a020576d5abdcd9a21d66b629479caf
Jongsoo Park [Wed, 21 Nov 2018 17:37:58 +0000 (09:37 -0800)]
fix comment on dnnlowp op arguments (#14265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14265
Fix comment
Reviewed By: hx89
Differential Revision:
D13152106
fbshipit-source-id:
fbe98906963cbd5cb20a583a737a792fbc38292e
Gregory Chanan [Wed, 21 Nov 2018 17:04:59 +0000 (09:04 -0800)]
native NN wrappers, including with buffers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14256
Differential Revision:
D13148783
Pulled By: gchanan
fbshipit-source-id:
4b6179033cf1df26061b6731eaaa4e008692e592
Pieter Noordhuis [Wed, 21 Nov 2018 16:43:14 +0000 (08:43 -0800)]
Remove header generated at configuration time (#14244)
Summary:
The build was picking up the empty stub header instead of the generated
one. Because of the large number of include paths we end up passing to
the compiler it is brittle to have both an empty stub file and a
generated file and expect the compiler to pick up the right one.
With the recent change to compile everything from a single CMake run we
can now use native CMake facilities to propagate macros that indicate
backend support. The stanzas target_compile_definitions with the
INTERFACE flag ensure that these macros are set only for downstream
consumers of the c10d target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14244
Reviewed By: teng-li
Differential Revision:
D13144293
Pulled By: pietern
fbshipit-source-id:
f49324220db689c68c126b159f4f00a8b9bc1252
Zachary DeVito [Wed, 21 Nov 2018 14:36:26 +0000 (06:36 -0800)]
Address jittering issues in python_print (#14064)
Summary:
export - print a method with python_print
import - import a method with import_method
We want to ensure:
export(g) == export(import(export(g)))
That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.
This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
that they always parse in the same way (illustrated after this list)
* when creating loop-carried dependencies, sort them
by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
in loops that are dead both after the loop and inside the body of the
loop.
* Do not set uniqueName for variables whose names are _[0-9]+, these
are probably rare in user code, and we need a way to communicate
that we do not care about a variable name when re-parsing the graph.
Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
and family.
* Allow re-treeing to work as long as the only thing in its way is a
constant node. These do not have side effects but are sometimes
inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used
(or, if they are inlined, ensure they get assigned CONSTANT.cX number
in a consistent order). Cleanup tuples (this is done in the compiler,
but not in the tracer, leading to some tuple indexing jitter if not
done).
* use strtod_l, not std::stod which can throw exceptions
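As a plain-Python illustration of the floating-point fix above: 17 significant digits are always enough to round-trip an IEEE-754 double, so printing with at least that precision guarantees the number re-parses to the exact same value.
```python
x = 0.1 + 0.2
s = f"{x:.17g}"       # '0.30000000000000004'
assert float(s) == x  # exact round trip
```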
Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
cmake files. Threading it into setup.py allows us to turn on
debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064
Differential Revision:
D13094637
Pulled By: zdevito
fbshipit-source-id:
0a1c6912194d965f15d6b0c6cf838ccc551f161d
svcscm [Wed, 21 Nov 2018 10:16:29 +0000 (02:16 -0800)]
Updating submodules
Reviewed By: cdelahousse
fbshipit-source-id:
27838fb2dad82c78906faf3cc2d124557c30e88f
svcscm [Wed, 21 Nov 2018 08:25:17 +0000 (00:25 -0800)]
Updating submodules
Reviewed By: cdelahousse
fbshipit-source-id:
3c17e12a579245a84e9a56b1d8a1641232150675
Lu Fang [Wed, 21 Nov 2018 07:33:30 +0000 (23:33 -0800)]
Add tensor table in ModelDef and use it for jit script serialization and deserialization (#13861)
Summary:
As we discussed, the tensors in the torch script will be associated with the tensor data in the serialized file. So let's add a table of tensors (actually a repeated TensorProto field) in the ModelDef. TensorProto.name will be the id.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13861
Reviewed By: dzhulgakov
Differential Revision:
D13036940
Pulled By: zrphercule
fbshipit-source-id:
ecb91b062ac4bc26af2a8d6d12c91d5614efd559
Tongzhou Wang [Wed, 21 Nov 2018 07:27:16 +0000 (23:27 -0800)]
c10d Automatically retry on EINTR (#14180)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/14170
Actually I probably shouldn't retry all `SYSCHECK` calls. I'll leave it to the reviewers to decide.
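For reference, a Python analogue of the retry behavior (the actual change wraps the C++ `SYSCHECK` macro; this helper is hypothetical):
```python
import errno

def retry_on_eintr(syscall):
    # Re-issue the call whenever it is interrupted by a signal (EINTR)
    # instead of surfacing the failure to the caller.
    while True:
        try:
            return syscall()
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
```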
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14180
Reviewed By: pietern
Differential Revision:
D13144741
Pulled By: SsnL
fbshipit-source-id:
d73288f76b18cae14b1b43dad4e5e8d010a96d95
Teng Li [Wed, 21 Nov 2018 05:10:18 +0000 (21:10 -0800)]
Make NCCL backend support barrier op (#14142)
Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573
As the title says, this PR makes NCCL backend support barrier op.
There are a couple scenarios that need to be addressed:
(1) When a NCCL op has already happened, we record what GPU device(s) the previous op ran on and queue the allreduce barrier op on the same GPU device(s).
(2) When there is no NCCL op yet, we try to use a single, separate GPU for each process as a best effort.
As for the async work, during wait we not only wait on the NCCL kernel to complete, but also block the thread until the current stream and the NCCL stream return.
`test_distributed` should cover this. I also manually tested both scenarios.
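A hedged usage sketch of scenario (1) (assuming an env:// rendezvous; the launcher and environment setup are omitted):
```python
import torch
import torch.distributed as dist

def run(rank: int, world_size: int) -> None:
    dist.init_process_group("nccl", init_method="env://",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank % torch.cuda.device_count())
    t = torch.ones(1, device="cuda")
    dist.all_reduce(t)   # a prior NCCL op pins the device(s) in the group
    dist.barrier()       # now synchronizes on that device's NCCL stream
```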
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142
Differential Revision:
D13113391
Pulled By: teng-li
fbshipit-source-id:
96c33d4d129e2977e6892d85d0fc449424c35499
Yinghai Lu [Wed, 21 Nov 2018 02:00:14 +0000 (18:00 -0800)]
Fix memory leakage in onnxifi transformer (#14245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14245
tsia
Reviewed By: bddppq, rdzhabarov
Differential Revision:
D13144783
fbshipit-source-id:
5e07bb7ab883ba1af68547a26272cd320967b9e3
David Riazati [Wed, 21 Nov 2018 00:42:00 +0000 (16:42 -0800)]
Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:
```python
@torch.jit.script
def fn(x=None):
# type: (Optional[Tensor]) -> Tensor
return torch.jit._unwrap_optional(x)
@torch.jit.script
def fn2():
# type: () -> Tensor
return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120
Differential Revision:
D13124625
Pulled By: driazati
fbshipit-source-id:
9eaa82e478c49c503f68ed89d8c770e8273ea569
Wanchao Liang [Tue, 20 Nov 2018 22:09:27 +0000 (14:09 -0800)]
Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR did three things:
1. It exports the BatchNorm functional and module, and rewrites some of the components to stay aligned with the currently supported JIT features
2. In the process of exporting, it adds the necessary compiler support for in-place op augmented assignment
3. It changes the test_jit behavior in add_module_test to utilize a single rng state during module initialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016
Differential Revision:
D13112064
Pulled By: wanchaol
fbshipit-source-id:
31e3aee5fbb509673c781e7dbb6d8884cfa55d91
Thomas Viehmann [Tue, 20 Nov 2018 20:43:23 +0000 (12:43 -0800)]
Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)
Summary:
- Move up handling the environment variable from CPU only to all
- Introduce two levels to be enabled with PYTORCH_FUSION_DEBUG=n:
1: print C source
2: print CPU assembly, too (previous effect of PYTORCH_FUSION_DEBUG)
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213
Differential Revision:
D13135393
Pulled By: soumith
fbshipit-source-id:
befa4ebea3b3c97e471393a9f6402b93a6b24031
Tugrul Ates [Tue, 20 Nov 2018 20:23:14 +0000 (12:23 -0800)]
Delete backwards compatibility StorageImpl.h and TensorImpl.h (#14230)
Summary:
Since they directly include the real ones in core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14230
Differential Revision:
D13140323
Pulled By: tugrulates
fbshipit-source-id:
d7e3b94e891b2d7fa273d01c0b7edfebdbd7e368
Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
remove unused parameters from caffe2_dnnlowp_utils.cc (#14164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14164
See title
Reviewed By: csummersea
Differential Revision:
D13115470
fbshipit-source-id:
d754f558cd06e5f4c1cd00315e912cdb7b50731a
Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
use pragma once (#14163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14163
Some of the names we were using to guard the header files were too short (e.g. DYNAMIC_HISTOGRAM_H).
Reviewed By: csummersea
Differential Revision:
D13115451
fbshipit-source-id:
cef8c84c62922616ceea17effff7bdf8d67302a2
Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
format python files (#14161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14161
Formatting using Nuclide
Reviewed By: hx89
Differential Revision:
D13115348
fbshipit-source-id:
7432ce6072a1822d7287b4ebcfcb6309282e15ac
Jongsoo Park [Tue, 20 Nov 2018 08:53:29 +0000 (00:53 -0800)]
clang-format (#14160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14160
clang-format of C++ files
Reviewed By: hx89
Differential Revision:
D13115201
fbshipit-source-id:
d2ad65f66209e00578ef90f87f41272de2d24aa9
Hui Wu [Tue, 20 Nov 2018 06:54:19 +0000 (22:54 -0800)]
Add sigmoid op based on MKL-DNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097
Differential Revision:
D13105366
Pulled By: yinghai
fbshipit-source-id:
d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4
Daya S Khudia [Tue, 20 Nov 2018 06:45:00 +0000 (22:45 -0800)]
OSS build fix (#14192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14192
We can only use C10_* in OSS. The build is only broken if built with USE_FBGEMM=ON
Reviewed By: jianyuh
Differential Revision:
D13121781
fbshipit-source-id:
f0ee9a75997766e63e1da8a53de7ddb98296a171
Lu Fang [Tue, 20 Nov 2018 06:12:16 +0000 (22:12 -0800)]
Make EncodeMethod in jit script serialization return a string (#14167)
Summary:
Nit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14167
Reviewed By: ezyang
Differential Revision:
D13116584
Pulled By: dzhulgakov
fbshipit-source-id:
c0e7e71a81004031564bd2fc59f393041e1283d5
Jongsoo Park [Tue, 20 Nov 2018 05:44:29 +0000 (21:44 -0800)]
Create README.md of caffe2/quantization/server
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14217
Reviewed By: csummersea
Differential Revision:
D13135086
Pulled By: jspark1105
fbshipit-source-id:
bddf4f1c2dc5ec8ea6ebe9e265956f367e082d52
Will Feng [Tue, 20 Nov 2018 05:28:29 +0000 (21:28 -0800)]
CircleCI: fix NCCL install (#14172)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not `build.sh`, this PR fixes the issue.
This replaces https://github.com/pytorch/pytorch/pull/14124.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14172
Differential Revision:
D13135087
Pulled By: yf225
fbshipit-source-id:
42fff3926734778713d483d74ba0a89e5502dd9e
zrphercule [Tue, 20 Nov 2018 02:43:58 +0000 (18:43 -0800)]
Fix a bug in test case of onnx::If
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14209
Differential Revision:
D13132607
Pulled By: zrphercule
fbshipit-source-id:
b7f7ccc6a6cbdeb57a7f88a1971d15dd81e6fc81
Teng Li [Tue, 20 Nov 2018 02:25:00 +0000 (18:25 -0800)]
Tensor type checking and informative error messages for torch.distributed (#14204)
Summary:
This will address https://github.com/pytorch/pytorch/issues/13574
This error message should be more informative to the user for all the non-multi-GPU ops, since we always python-bind to the multi-GPU ops.
test_distributed should cover all. I also tested both RuntimeErrors manually:
```
>>> a = torch.ByteTensor([])
>>> b = [a, a]
>>> dist.all_reduce(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 809, in all_reduce
_check_single_tensor(tensor, "tensor")
File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 207, in _check_single_tensor
"to be a torch.Tensor type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor to be a torch.Tensor type
>>> b = ["b"]
>>> dist.all_gather(b, a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 1006, in all_gather
_check_tensor_list(tensor_list, "tensor_list")
File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 225, in _check_tensor_list
"to be a List[torch.Tensor] type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor_list to be a List[torch.Tensor] type
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14204
Differential Revision:
D13131526
Pulled By: teng-li
fbshipit-source-id:
bca3d881e41044a013a6b90fa187e722b9dd45f2
Edward Yang [Tue, 20 Nov 2018 01:01:34 +0000 (17:01 -0800)]
Move stream functions from CUDAContext to CUDAStream (#14110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14110
I'm planning to move CUDAStream to c10/cuda, without also moving
CUDAContext, and so it's most convenient if these definitions
are in the actual header file in question.
Reviewed By: smessmer
Differential Revision:
D13104693
fbshipit-source-id:
23ce492003091adadaa5ca6a17124213005046c2
Edward Yang [Tue, 20 Nov 2018 01:01:34 +0000 (17:01 -0800)]
Move CUDAStreamInternals inside detail namespace. (#14109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14109
Previously it was at the top level, because the author was under
the impression that you could only refer to top-level C++ names
from C, but this is not true; you just need to make a stub struct
conditioned on __cplusplus.
Reviewed By: smessmer
Differential Revision:
D13104694
fbshipit-source-id:
ecb7ae6dcfa4ab4e062aad7a886937dca15fd1b2
Edward Yang [Tue, 20 Nov 2018 01:01:33 +0000 (17:01 -0800)]
Delete dependencies from CUDAStream; remove synchronize_with (#13920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920
I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now). To do this, I need to eliminate those dependencies.
There's a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h
Reviewed By: smessmer
Differential Revision:
D13047468
fbshipit-source-id:
7ed9d5e660f95805ab39d7af25892327edae050e
Yavuz Yetim [Mon, 19 Nov 2018 23:57:28 +0000 (15:57 -0800)]
Fix race in AtomicFetchAdd. (#13479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479
Increases the lock scope to above Output() calls.
These calls potentially allocate the underlying blob/tensor
objects and multiple invocations race each other over the
same output blobs/tensors.
Reviewed By: bwasti
Differential Revision:
D12891629
fbshipit-source-id:
a6015cfdb08e352521a1f062eb9d94a971cfbdb0
Sebastian Messmer [Mon, 19 Nov 2018 23:35:18 +0000 (15:35 -0800)]
Remove API macros from intrusive_ptr (#14137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14137
This is a templated header-only class and shouldn't need export/import macros.
Reviewed By: ezyang
Differential Revision:
D13111712
fbshipit-source-id:
c8c958e75b090d011d25156af22f37f9ca605196
Jerry Zhang [Mon, 19 Nov 2018 23:29:45 +0000 (15:29 -0800)]
Tensor construction: combine Resize+mutable_data - 1/4 (#13942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13942
Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407
Reviewed By: smessmer
Differential Revision:
D13054770
fbshipit-source-id:
a9e86e5dfcb4f7cebf5243e1d359fad064561bed
Jerry Zhang [Mon, 19 Nov 2018 23:25:43 +0000 (15:25 -0800)]
Tensor construction: combine Resize+mutable_data - 3/4 (#13944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13944
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854
Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407
Reviewed By: ezyang
Differential Revision:
D13054836
fbshipit-source-id:
5de07a156687f1ee607d0450410881d9176a87a7
Lu Fang [Mon, 19 Nov 2018 22:29:31 +0000 (14:29 -0800)]
Store the optimize flag in module (#14166)
Summary:
For save/load of a script module, we store the optimize flag in the module instead of encoding it in each method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14166
Reviewed By: ezyang
Differential Revision:
D13117577
Pulled By: dzhulgakov
fbshipit-source-id:
dc322948bda0ac5809d8ef9a345497ebb8f33a61
Junjie Bai [Mon, 19 Nov 2018 22:21:20 +0000 (14:21 -0800)]
Cleanup caffe2 hipify exclude patterns (#14198)
Summary:
depthwise_3x3_conv_op.cu does not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14198
Differential Revision:
D13127479
Pulled By: bddppq
fbshipit-source-id:
ec6bd434055a49ea405c4b399bde8c074114f955
Gregory Chanan [Mon, 19 Nov 2018 22:10:47 +0000 (14:10 -0800)]
Support 'python_module' of 'nn' in native functions. (#14126)
Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126
Reviewed By: ezyang
Differential Revision:
D13109975
Pulled By: gchanan
fbshipit-source-id:
0b29dc8cf222d25db14da7532d8dc096a988a0ec
Junjie Bai [Mon, 19 Nov 2018 21:25:32 +0000 (13:25 -0800)]
Use onnx proto_utils to support using protobuf-lite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14150
Differential Revision:
D13115586
Pulled By: bddppq
fbshipit-source-id:
d6b6935a8deac60f6f58d62a71f6840182a72a51
Daya S Khudia [Mon, 19 Nov 2018 20:08:35 +0000 (12:08 -0800)]
Use fbgemm revision file added by shipit (#14105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14105
Pull Request resolved: https://github.com/facebook/fbshipit/pull/62
Use fbgemm revision file created by ShipIt for updating fbgemm revision for pytorch. We don't have to manually update submodule now.
Reviewed By: yns88
Differential Revision:
D13072074
fbshipit-source-id:
bef9eabad50f7140179c370a60bd9ca73067b9b5
Your Name [Mon, 19 Nov 2018 19:26:38 +0000 (11:26 -0800)]
Setup sccache for PyTorch ROCm CI (#14153)
Summary:
Discovered a huge build time difference between the caffe2 ROCm build and the pytorch ROCm build (6 min vs. 30 min); it turns out the sccache setup present in the caffe2 docker images is not in the pytorch build script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14153
Differential Revision:
D13115097
Pulled By: bddppq
fbshipit-source-id:
88414f164b980f0e667c8e138479b4a75ab7692e
Ailing Zhang [Mon, 19 Nov 2018 17:45:28 +0000 (09:45 -0800)]
allow empty index for scatter_* methods (#14077)
Summary:
Fixes #2027
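A minimal sketch of a newly allowed call (shapes here are arbitrary):
```python
import torch

x = torch.zeros(3, 5)
index = torch.empty(0, 5, dtype=torch.long)  # empty index tensor
src = torch.empty(0, 5)
x.scatter_(0, index, src)  # previously raised; now simply a no-op
```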
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077
Differential Revision:
D13095788
Pulled By: ailzhang
fbshipit-source-id:
ad2c8bbf83d36e07940782b9206fbdcde8905fd3
ArmenAg [Mon, 19 Nov 2018 17:18:45 +0000 (09:18 -0800)]
use at::Device throughout JIT (#14181)
Summary:
zdevito soumith
Sorry about the previous PR, had some git issues. This is the exact same code as the previous PR but updated w.r.t. pytorch/master.
fixes #13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181
Differential Revision:
D13117688
Pulled By: soumith
fbshipit-source-id:
044840b2c7a0101ef43dd16655fd9a0f9981f53f
Gregory Chanan [Mon, 19 Nov 2018 16:18:47 +0000 (08:18 -0800)]
Support named return arguments in native_functions. (#14100)
Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100
Differential Revision:
D13101810
Pulled By: gchanan
fbshipit-source-id:
1c80574cd4e8263764fc65126427b122fe36df35
Edward Yang [Mon, 19 Nov 2018 16:13:08 +0000 (08:13 -0800)]
Split out CUDAMultiStreamGuard from CUDAGuard (#13912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13912
The implementation and API of CUDAMultiStreamGuard is less mature,
and it cannot be implemented generically (yet) in c10_cuda. This
might be a reasonable thing to do eventually, but not for now.
Reviewed By: smessmer
Differential Revision:
D13046500
fbshipit-source-id:
4ea39ca1344f1ad5ae7c82c98617aa348c327848
Edward Yang [Mon, 19 Nov 2018 16:13:08 +0000 (08:13 -0800)]
Move AT_CUDA_CHECK to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13910
Reviewed By: smessmer
Differential Revision:
D13046201
fbshipit-source-id:
8d360a0e4d6c2edf070d130e600c6b04f0ee0058
Edward Yang [Mon, 19 Nov 2018 16:13:07 +0000 (08:13 -0800)]
Add c10 cuda library. (#13900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13900
Add c10 cuda library.
Right now, this is not used by anything, and only tests if the CUDA
headers are available (and not, e.g., that linking works.)
Extra changes:
- cmake/public/cuda.cmake now is correctly include guarded, so you
can include it multiple times without trouble.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Reviewed By: smessmer
Differential Revision:
D13025313
fbshipit-source-id:
fda85b4c35783ffb48ddd6bbb98dbd9154119d86
Marat Dukhan [Mon, 19 Nov 2018 07:55:01 +0000 (23:55 -0800)]
Switch Int8Add operator to QNNPACK (#14089)
Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously was parallelized across images in a batch and pixels only, now within channels as well)
- Slightly different result due to different implementation of fixed-point arithmetics (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089
Differential Revision:
D13110135
Pulled By: Maratyszcza
fbshipit-source-id:
1f149394af5c16940f79a3fd36e183bba1be2497
Teng Li [Sun, 18 Nov 2018 21:51:15 +0000 (13:51 -0800)]
No more -werror for c10d (#14155)
Summary:
As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14155
Differential Revision:
D13115769
Pulled By: teng-li
fbshipit-source-id:
278deba090364544d92fa603621604ce37fa974e
Summer Deng [Sun, 18 Nov 2018 20:49:39 +0000 (12:49 -0800)]
Add ultra low precision options (#14133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14133
Experiment with ultra low precisions on the Resnext-101 URU trunk model
Reviewed By: jspark1105
Differential Revision:
D10108518
fbshipit-source-id:
f04d74fbe1c9e75efafcd9845719bdb2efbbfe9c
Soumith Chintala [Sun, 18 Nov 2018 17:20:29 +0000 (09:20 -0800)]
Adds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)
Summary:
Adds symbolic diff and tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13888
Differential Revision:
D13115548
Pulled By: soumith
fbshipit-source-id:
ba75b01a95a5715a7761724dda018168b6188917
Your Name [Sun, 18 Nov 2018 08:09:25 +0000 (00:09 -0800)]
Print warning when ROCm memory leaking is detected in pytorch tests (#14151)
Summary:
We keep seeing random failures in CI because of ROCm memory leaking, e.g:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3102//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3080//console
To make the CI more stable, turn it into a warning instead of a failure.
iotamudelta please help investigate the memory leaking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14151
Differential Revision:
D13115096
Pulled By: bddppq
fbshipit-source-id:
a13b68274ecba363d9d8436aa6a62ac40a77d78c
vishwakftw [Sun, 18 Nov 2018 06:25:39 +0000 (22:25 -0800)]
Remove debugging code in test_cholesky_batched (#14156)
Summary:
They didn't turn up in my tests because I use pytest, which doesn't
print debug statements if the tests pass
Differential Revision:
D13115227
Pulled By: soumith
fbshipit-source-id:
46a7d47da7412d6b071158a23ab21e7fb0c6e11b
Jerry Zhang [Sun, 18 Nov 2018 03:42:42 +0000 (19:42 -0800)]
Back out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4" (#14154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14154
Original commit changeset:
e89c2e692178
Reviewed By: amateurcoffee
Differential Revision:
D13115023
fbshipit-source-id:
8f9fb55842ae6c8139d5cd88ec6d0abb0c5cc5e7
Martin Schatz [Sun, 18 Nov 2018 01:26:09 +0000 (17:26 -0800)]
CostInference for 1D conv (#14009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14009
As title
Reviewed By: yinghai
Differential Revision:
D13078718
fbshipit-source-id:
081e7b13ad6741c635ef413915b555f10f93bd33
vishwakftw [Sat, 17 Nov 2018 18:47:17 +0000 (10:47 -0800)]
Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition (see the sketch after the list below).
Performance could be improved with dedicated batched `tril` and `triu` ops, whose absence also impedes the autograd operations.
Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
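A short sketch of the batched call (assuming the `torch.cholesky` entry point):
```python
import torch

a = torch.randn(4, 3, 3)
spd = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)  # batch of SPD matrices
l = torch.cholesky(spd)  # batched input (4, 3, 3) -> batched factor
assert torch.allclose(l @ l.transpose(-1, -2), spd, atol=1e-4)
```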
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017
Differential Revision:
D13087945
Pulled By: ezyang
fbshipit-source-id:
2386db887140295475ffc247742d5e9562a42f6e
Jongsoo Park [Sat, 17 Nov 2018 18:26:56 +0000 (10:26 -0800)]
remove unnecessary file from avx2 list (#14012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14012
conv_dnnlowp_op.cc doesn't need avx2 anymore.
Reviewed By: dskhudia
Differential Revision:
D13079665
fbshipit-source-id:
dbfe8d2213de4969b6334d54de81d51149268cbd
Your Name [Sat, 17 Nov 2018 17:22:09 +0000 (09:22 -0800)]
Change from using enum to int to store data_type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14140
Differential Revision:
D13112937
Pulled By: bddppq
fbshipit-source-id:
124d9546bfbd1f9c207a21e40eb3646f7739bd58
Junjie Bai [Sat, 17 Nov 2018 08:20:44 +0000 (00:20 -0800)]
Revert "CircleCI: fix NCCL install (#14124)" (#14146)
Summary:
This reverts commit a1fa9d8cf9b2b0e7373ec420c2487d4dfd0e587c.
[pytorch_linux_trusty_py2_7_9_build](https://circleci.com/gh/pytorch/pytorch/270206?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console):
```
Nov 17 07:37:27 + sudo apt-get -qq update
Nov 17 07:37:30 W: Ignoring Provides line with DepCompareOp for package gdb-minimal
Nov 17 07:37:30 W: You may want to run apt-get update to correct these problems
Nov 17 07:37:30 + sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
Nov 17 07:37:30 E: Command line option --allow-downgrades is not understood
Nov 17 07:37:30 + cleanup
Nov 17 07:37:30 + retcode=100
Nov 17 07:37:30 + set +x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14146
Differential Revision:
D13113912
Pulled By: bddppq
fbshipit-source-id:
cd9d371cf72159f03d12a8b56ed5bd2060ebbe59
Junjie Bai [Sat, 17 Nov 2018 07:26:12 +0000 (23:26 -0800)]
Revert D10428917: [Caffe2] Add cost into profile observer
Differential Revision:
D10428917
Original commit changeset:
7c100e551bdd
fbshipit-source-id:
5164d9ba61cc103eccfdeb91a5cc140cea31a819
Junjie Bai [Sat, 17 Nov 2018 07:26:12 +0000 (23:26 -0800)]
Revert D10439558: Add cost for non-linear ops
Differential Revision:
D10439558
Original commit changeset:
9aeb05bac8b5
fbshipit-source-id:
f00977b4f95bdd500d254eb44fb5b0c816506ee4
Marat Dukhan [Sat, 17 Nov 2018 05:57:42 +0000 (21:57 -0800)]
Update FXdiv submodule (#14128)
Summary:
Use the most recent version that disables inline assembly.
I suspect inline assembly causes miscompilation on some versions of gcc7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14128
Reviewed By: bddppq
Differential Revision:
D13112370
Pulled By: Maratyszcza
fbshipit-source-id:
36cc95dc51390a293b72c18ae982c3a515a11981
Marat Dukhan [Sat, 17 Nov 2018 05:21:40 +0000 (21:21 -0800)]
Rename neon2sse.h to NEON_2_SSE.h to match upstream repo
Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- The upstream repo provides the NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fixes incompatibilities between the internal and upstream versions
Reviewed By: hlu1
Differential Revision:
D13096755
fbshipit-source-id:
65e1df9a2a5e74bd52c9aee9be27469ba938cd8c
Marat Dukhan [Sat, 17 Nov 2018 05:02:37 +0000 (21:02 -0800)]
Disable QNNPACK for multi-architecture iOS builds (#14125)
Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125
Differential Revision:
D13112366
Pulled By: Maratyszcza
fbshipit-source-id:
b369083045b440e41d506667a92e41139c11a971
Sebastian Messmer [Sat, 17 Nov 2018 04:10:31 +0000 (20:10 -0800)]
Register caffe2 layer norm with c10 dispatcher (#13693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13693
We can't directly call the caffe2::Operator class from c10 yet because that class isn't deprotobuffed yet.
Instead, we factor out the kernel into a reusable static method and call it from the caffe2::Operator and
also register it with c10.
Reviewed By: ezyang
Differential Revision:
D12912242
fbshipit-source-id:
c57502f14cea7a8be281f9787b175bb6e402d00c
Sebastian Messmer [Sat, 17 Nov 2018 04:10:30 +0000 (20:10 -0800)]
Add c10/core/ to cmake build (#14111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14111
It was already in TARGETs, but we forgot it in cmake.
Reviewed By: ezyang
Differential Revision:
D13105166
fbshipit-source-id:
f09549e98ebca751339b5ada1150e00cc4cd9540
Haixin Liu [Sat, 17 Nov 2018 03:08:49 +0000 (19:08 -0800)]
Update atol scale in dnnlowp test (#14135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14135
Update atol scale of dnnlowp test. Can't reproduce the flaky test error in the task locally even after setting the same seed value, but found according to comments in check_quantized_results_close(), atol_scale should be 1/1.9=0.
526315789473684, which is larger than current value 0.51. So increase the atol_scale to 0.53.
Reviewed By: jspark1105
Differential Revision:
D13108415
fbshipit-source-id:
1e8840659fdf0092f51b439cf499858795f9706a
Jongsoo Park [Sat, 17 Nov 2018 02:49:08 +0000 (18:49 -0800)]
fix sparse_adagrad param_size overflow error (#14049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14049
param_size should be passed as int64_t
Reviewed By: hyuen
Differential Revision:
D13090511
fbshipit-source-id:
7892d315d7c82c7d7ca103fb36d30cdf1fe24785
Haixin Liu [Sat, 17 Nov 2018 02:30:49 +0000 (18:30 -0800)]
Add cost for non-linear ops (#13327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13327
Add cost inference functions to non-linear ops. Since the actual flops of a non-linear operator depend on the implementation, we use the number of non-linear operations as a proxy for the analytical flops of non-linear operators.
Reviewed By: jspark1105
Differential Revision:
D10439558
fbshipit-source-id:
9aeb05bac8b5c7ae5d351ebf365e0a81cf4fc227
Haixin Liu [Sat, 17 Nov 2018 02:30:49 +0000 (18:30 -0800)]
Add cost into profile observer (#12793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12793
Add analytical cost into profile observer. It includes the op level cost information for each op run and net level aggregated cost information for each op type.
It outputs the following information:
1. analytical flops
2. analytical bytes_read
3. analytical bytes_written
Example output at op level:
```I1017 14:58:14.245978 3686541 profile_observer_gpu.cc:26] --------- Starting operator FC op#24 ---------
I1017 14:58:14.246049 3686541 profile_observer_gpu.cc:33] Input 0: Tensor model1/embedded_encoder_inputs of type float. Dims: (17,1,256,):
I1017 14:58:14.246109 3686541 profile_observer_gpu.cc:33] Input 1: Tensor model1/encoder/layer0/fw/milstm/i2h_w of type float. Dims: (2048,256,):
I1017 14:58:14.246176 3686541 profile_observer_gpu.cc:33] Input 2: Tensor model1/encoder/layer0/fw/milstm/i2h_b of type float. Dims: (2048,):
I1017 14:58:14.246217 3686541 profile_observer_gpu.cc:44] Argument 0: name: "use_cudnn" i: 1
I1017 14:58:14.246271 3686541 profile_observer_gpu.cc:44] Argument 1: name: "cudnn_exhaustive_search" i: 0
I1017 14:58:14.246338 3686541 profile_observer_gpu.cc:44] Argument 2: name: "order" s: "NHWC"
I1017 14:58:14.246372 3686541 profile_observer_gpu.cc:44] Argument 3: name: "axis" i: 2
I1017 14:58:14.246418 3686541 profile_observer_gpu.cc:44] Argument 4: name: "quantization_scheme" i: 1
I1017 14:58:14.246470 3686541 profile_observer_gpu.cc:53] Output 0: Tensor model1/encoder/layer0/fw/milstm/i2h of type float. Dims: (17,1,2048,):
I1017 14:58:14.246596 3686541 profile_observer_gpu.cc:61] Cost (flops, bytes_read, bytes_written):
I1017 14:58:14.246649 3686541 profile_observer_gpu.cc:62]
17860608 2122752 139264
I1017 14:58:14.246677 3686541 profile_observer_gpu.cc:64] --------- Finished operator FC in 0.764221 ms ---------
```
Example output at net level:
```
I1017 11:13:44.675585 3146691 profile_observer_gpu.cc:165] ================ Detailed stats for net model0/encoder/layer0/bw/milstm ================
I1017 11:13:44.675662 3146691 profile_observer_gpu.cc:167] Cost (flops, bytes_read, bytes_written) per operator type:
I1017 11:13:44.675706 3146691 profile_observer_gpu.cc:169]
20992000 42045440 81920 FC
I1017 11:13:44.675745 3146691 profile_observer_gpu.cc:169] 20480 163840 81920 Mul
I1017 11:13:44.675824 3146691 profile_observer_gpu.cc:169] 20480 163840 81920 Sum
I1017 11:13:44.675878 3146691 profile_observer_gpu.cc:169] 0 0 0 ElementwiseLinear
I1017 11:13:44.675909 3146691 profile_observer_gpu.cc:169] 0 0 0 LSTMUnit
I1017 11:13:44.675958 3146691 profile_observer_gpu.cc:169] 0 0 0 rnn_internal_apply_link
```
Reviewed By: mdschatz
Differential Revision:
D10428917
fbshipit-source-id:
7c100e551bdd3ac8d7c09be12c72d70a2d67cae1
Will Feng [Sat, 17 Nov 2018 02:28:55 +0000 (18:28 -0800)]
CircleCI: fix NCCL install (#14124)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not `build.sh`, this PR is trying to figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14124
Reviewed By: teng-li
Differential Revision:
D13112483
Pulled By: yf225
fbshipit-source-id:
5f65997586648805cf52217a261389625b5535e1
Teng Li [Sat, 17 Nov 2018 02:02:13 +0000 (18:02 -0800)]
Fixed MPI build with higher version of GCC (#14122)
Summary:
This appeared when I enabled -Werror in the c10d build. Good to catch this and fix it.
Should fix https://github.com/pytorch/pytorch/issues/14078 and https://github.com/pytorch/pytorch/issues/13962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14122
Differential Revision:
D13110678
Pulled By: teng-li
fbshipit-source-id:
f4c19e16976d65debbd33ed59e17ddbaa19f765a
Teng Li [Sat, 17 Nov 2018 01:49:56 +0000 (17:49 -0800)]
multiprocessing.spawn python version check (#14039)
Summary:
This will be super helpful to the user
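For context, a minimal usage sketch of the entry point being guarded:
```python
import torch.multiprocessing as mp

def worker(rank: int) -> None:
    print(f"hello from process {rank}")

if __name__ == "__main__":
    # spawn requires a sufficiently recent Python; the added version check
    # reports this clearly instead of failing obscurely.
    mp.spawn(worker, nprocs=2)
```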
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039
Differential Revision:
D13089200
Pulled By: teng-li
fbshipit-source-id:
29e7507bd8fe5a0c58a85c52f976bfca282b4c1b
Gregory Chanan [Sat, 17 Nov 2018 00:47:00 +0000 (16:47 -0800)]
Don't python bind _thnn_ functions. (#14101)
Summary:
This is needed for moving nn functions to native functions, but since some functions are already named
this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101
Differential Revision:
D13102219
Pulled By: gchanan
fbshipit-source-id:
6bbcca33a03ab1bf648f1b73cadfe84339fa3050
Peter Goldsborough [Fri, 16 Nov 2018 22:53:19 +0000 (14:53 -0800)]
Fix docs/cpp/requirements.txt (#14121)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14121
Differential Revision:
D13108063
Pulled By: goldsborough
fbshipit-source-id:
35cf65ba776e8826c5cab7ae6d3a2d446f87e7cc
Thomas Viehmann [Fri, 16 Nov 2018 21:59:31 +0000 (13:59 -0800)]
Allow cooperative structured objects to be passed modules in tracing (#13961)
Summary:
Before this patch, the JIT does not allow Module's forward to take
structured objects.
This patch allows cooperative objects to do so.
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes
(a list/tuple of) the flattened_input (potentially more than it needs)
and returns itself (updated) and the unconsumed flattened_inputs.
This is then used in the _unflatten mechanism.
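A sketch of such a cooperative object (the class and its fields are hypothetical; the protocol is exactly the two methods above):
```python
class BoxListLike:
    def __init__(self, tensor, mode="xyxy"):
        self.tensor = tensor
        self.mode = mode

    def _jit_unwrap(self):
        # Expose the contained tensors; these feed _iter_tensors.
        return (self.tensor,)

    def _jit_wrap(self, flattened_input):
        # Consume what we need from the flattened inputs and return the
        # updated self plus the unconsumed remainder (for _unflatten).
        self.tensor = flattened_input[0]
        return self, flattened_input[1:]
```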
This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.
I'll push a model working with this patch in
https://github.com/facebookresearch/maskrcnn-benchmark/pull/138
I must admit I haven't fully checked whether there are ONNX changes needed before it, too, can profit, but I would be hopeful that anything currently usable remains so.
fmassa zdevito
So the main downside that I'm aware of is that people will later want to use more elaborate mechanisms, but I think this could be done by just amending what wrap/unwrap are returning / consuming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13961
Differential Revision:
D13103927
Pulled By: soumith
fbshipit-source-id:
2cbc724cc4b53197388b662f75d9e601a495c087
Peter Goldsborough [Fri, 16 Nov 2018 21:01:25 +0000 (13:01 -0800)]
Add SharedDataset (#13800)
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.
jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800
Differential Revision:
D13075610
Pulled By: goldsborough
fbshipit-source-id:
4ffdfd7959d49b042c0e254110085f62a0bfeb6c
jjsjann123 [Fri, 16 Nov 2018 20:59:01 +0000 (12:59 -0800)]
remove dynamic initialization warning (#13913) (#13967)
Summary:
Removed assignment in the default constructor.
Removed static shared memory and used dynamic shared memory instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13967
Differential Revision:
D13089996
Pulled By: soumith
fbshipit-source-id:
2a218b909c849bed39636b45a02d10ebc279a0b0
Peter Goldsborough [Fri, 16 Nov 2018 20:12:01 +0000 (12:12 -0800)]
Missing .decode() after check_output in cpp_extensions (#13935)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13935
Differential Revision:
D13090852
Pulled By: goldsborough
fbshipit-source-id:
47da269d074fd1e7220e90580692d6ee489ec78b
ArutyunovG [Fri, 16 Nov 2018 20:06:21 +0000 (12:06 -0800)]
Windows shared build (#13550)
Summary:
Hi guys,
I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.
What is disappointing is that the c10/experimental ops don't build with this Visual Studio generator; I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to deal with it in build_windows_shared.bat.
After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550
Reviewed By: ezyang
Differential Revision:
D13042597
Pulled By: orionr
fbshipit-source-id:
f313f909f599cd582a1d000eff766eef3a9fc4fc
Freddie Mendoza [Fri, 16 Nov 2018 20:05:27 +0000 (12:05 -0800)]
Make JOIN_TIMEOUT longer for ppc64le (#14107)
Summary:
This should resolve the issue on ppc64le where we get FAIL: test_proper_exit (__main__.TestDataLoader). This only happens when the CI build machine is very busy and the test fails with a timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14107
Differential Revision:
D13103859
Pulled By: soumith
fbshipit-source-id:
268be80b59840853c5025f3211af272f68608fe5
Ilia Cherniavskii [Fri, 16 Nov 2018 20:01:01 +0000 (12:01 -0800)]
Log error from the net's run (#14035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14035
Log the error message in case of a net's run failure
Reviewed By: andrewwdye
Differential Revision:
D13085431
fbshipit-source-id:
d79f76782410cd3a5bd2d8d7f5fb1e535d821051