HE, Tao [Sun, 10 Mar 2019 16:21:13 +0000 (09:21 -0700)]
When openblas exists, "OpenBLAS_FOUND" is defined, rather than "OPENBLAS_FOUND". (#17841)
Summary:
See https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindOpenBLAS.cmake#L36
This typo caused cmake to fail to detect OpenBLAS on Ubuntu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17841
Differential Revision:
D14400261
Pulled By: soumith
fbshipit-source-id:
287e019e122230cf6b70ab1ea94e5c514f429c88
bhushan [Sun, 10 Mar 2019 16:20:30 +0000 (09:20 -0700)]
Passing indices as a list to Subset instead of Tensor (#17649)
Summary:
Indices in Subset were previously stored as tensors.
random_split now passes them as a list to ensure integer indexing.
fixes: #17466
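A minimal sketch of the idea, using only the standard library. The `Subset` and `random_split` below are hypothetical stand-ins written for illustration, not the torch.utils.data implementations; the `seed` parameter is an assumption added to make the example reproducible.

```python
import random


class Subset:
    """Hypothetical stand-in: a view of `dataset` selected by integer indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = indices  # a plain list of Python ints

    def __getitem__(self, idx):
        # Each stored index is an int, so any dataset that supports integer
        # indexing works here.
        return self.dataset[self.indices[idx]]

    def __len__(self):
        return len(self.indices)


def random_split(dataset, lengths, seed=0):
    """Split `dataset` into non-overlapping Subsets of the given lengths."""
    if sum(lengths) != len(dataset):
        raise ValueError("lengths must sum to the dataset size")
    perm = list(range(len(dataset)))
    random.Random(seed).shuffle(perm)
    subsets, offset = [], 0
    for length in lengths:
        subsets.append(Subset(dataset, perm[offset:offset + length]))
        offset += length
    return subsets


data = ["a", "b", "c", "d", "e"]
train, val = random_split(data, [3, 2])
```

Because the indices are ordinary ints, `train[0]` works even when the underlying dataset is a plain Python list.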
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17649
Differential Revision:
D14400250
Pulled By: soumith
fbshipit-source-id:
cd20a959f33773c4babf8e861ea37ec61c2713a0
James Reed [Sun, 10 Mar 2019 07:10:26 +0000 (23:10 -0800)]
Clarify JIT docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17846
Differential Revision:
D14400363
Pulled By: jamesr66a
fbshipit-source-id:
862316b5fd95526b6edebeca19d2cc522779df11
Pritam Damania [Sun, 10 Mar 2019 05:31:42 +0000 (21:31 -0800)]
Add metadata for torch jit TracedModules. (#17640)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17640
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17311
I've extended our model metadata framework in this diff to support
traced modules as well. Re-used a lot of components from the previous
implementation of ScriptModule metadata.
Tracing is a little different from Scripting since you can't just create a
subclass of TopLevelTracedModule (type returned by torch.jit.trace) and attach
metadata the way we did for ScriptModule. As a result, I've introduced a
separate API torch.fb.jit_trace which returns an instance of
TracedModuleWithMetadata, which is a subclass of TopLevelTracedModule. We can
now attach metadata to this instance.
Reviewed By: dzhulgakov
Differential Revision:
D14117966
fbshipit-source-id:
3eee5eef733cb8d6a219c02e2f41d08698eca326
Konstantin Lopuhin [Sun, 10 Mar 2019 04:06:57 +0000 (20:06 -0800)]
Fix PySlice_Unpack not available on PyPy 3.6 yet (#17836)
Summary:
This is one of the fixes needed to support compilation on PyPy 3.6, see https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17836
Differential Revision:
D14399404
Pulled By: soumith
fbshipit-source-id:
ca650a6e2066aed86ddd3314a95d0cb3c515c633
Ronan Lamy [Sat, 9 Mar 2019 19:38:05 +0000 (11:38 -0800)]
PyPy compatibility: let unmodified slots be inherited in the standard way (#17837)
Summary:
This is needed to fix a segfault on PyPy 3.6, see https://bitbucket.org/pypy/pypy/issues/2968/segfault-calling-cpyext_tp_new_tuple and https://github.com/pytorch/pytorch/issues/17835
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17837
Differential Revision:
D14399408
Pulled By: soumith
fbshipit-source-id:
75328a30018313d3223dd3e3eef9240a416c049b
Junjie Bai [Sat, 9 Mar 2019 05:50:20 +0000 (21:50 -0800)]
Run fp16 resnet50 training in bench script (#17831)
Summary:
cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17831
Differential Revision:
D14398532
Pulled By: bddppq
fbshipit-source-id:
37c03cc2eebe3a6083e05631cb6ff03474e4a8a2
Summer Deng [Sat, 9 Mar 2019 03:00:43 +0000 (19:00 -0800)]
Int8 FC performance debugging (#17700)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17700
Add performance debugging utilities in DNNLOWP FC operator and the python script
Reviewed By: amylittleyang
Differential Revision:
D14321299
fbshipit-source-id:
50dbd7b352a1da5d2ecb659d8003e71e70750063
Xiaomeng Yang [Sat, 9 Mar 2019 01:35:17 +0000 (17:35 -0800)]
Optimize LayerNormOp (#17604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17604
Optimize LayerNormOp
i-am-not-moving-c2-to-c10
Reviewed By: houseroad
Differential Revision:
D14274175
fbshipit-source-id:
a7aa263a1b0eb109682d2be99306e7b2cdcc0faf
Roy Li [Sat, 9 Mar 2019 00:39:04 +0000 (16:39 -0800)]
Remove some simple use cases of Type::ScalarType()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17529
Reviewed By: ezyang
Differential Revision:
D14237932
fbshipit-source-id:
be633a1fc19215d53cfe083fdd7196acf2b7dd2f
Roy Li [Sat, 9 Mar 2019 00:39:04 +0000 (16:39 -0800)]
Change Dispatch.h to use ScalarType over Type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17527
Reviewed By: zou3519
Differential Revision:
D14235395
fbshipit-source-id:
3f53e33f6794f1f14c2edf79014b8ef8397822c5
Lu Fang [Sat, 9 Mar 2019 00:27:00 +0000 (16:27 -0800)]
Revert
D14361993: [pytorch][PR] [Onnx] - refactoring serialization of ONNX initializers to be name-based
Differential Revision:
D14361993
Original commit changeset:
da93e945d557
fbshipit-source-id:
15eea001fbcd059ac13903405aeb9ea182c6ee8b
James Reed [Fri, 8 Mar 2019 23:33:34 +0000 (15:33 -0800)]
Open registration for c10 thread pool (#17788)
Summary:
1. Move ATen threadpool & open registration mechanism to C10
2. Move the `global_work_queue` to use this open registration mechanism, to allow users to substitute in their own
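An open registration mechanism of this kind can be sketched as a name-to-factory registry. This is an illustration in Python only; every name below (`register_thread_pool`, `create_thread_pool`, `ThreadedPool`, `InlinePool`) is hypothetical and does not correspond to the C10 API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry: maps a name to a factory that builds a pool.
_POOL_FACTORIES = {}


def register_thread_pool(name, factory):
    """Make `factory(num_threads)` available under `name`."""
    _POOL_FACTORIES[name] = factory


def create_thread_pool(name, num_threads):
    """Build a pool from whichever factory was registered under `name`."""
    if name not in _POOL_FACTORIES:
        raise KeyError(f"no thread pool registered under {name!r}")
    return _POOL_FACTORIES[name](num_threads)


class ThreadedPool:
    """Default implementation backed by worker threads."""

    def __init__(self, num_threads):
        self._executor = ThreadPoolExecutor(max_workers=num_threads)

    def run(self, fn, *args):
        # Submit the task and block until its result is available.
        return self._executor.submit(fn, *args).result()

    def close(self):
        self._executor.shutdown()


class InlinePool:
    """User-supplied substitute that runs each task on the calling thread."""

    def __init__(self, num_threads):
        self.num_threads = num_threads

    def run(self, fn, *args):
        return fn(*args)

    def close(self):
        pass


register_thread_pool("threaded", ThreadedPool)
register_thread_pool("inline", InlinePool)
```

Callers ask for a pool by name, so substituting a custom implementation only requires registering another factory.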
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17788
Reviewed By: zdevito
Differential Revision:
D14379707
Pulled By: jamesr66a
fbshipit-source-id:
949662d0024875abf09907d97db927f160c54d45
David Riazati [Fri, 8 Mar 2019 23:26:25 +0000 (15:26 -0800)]
Cast nn.Upsample.scale_factor to a float (#17732)
Summary:
Fixes #17106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17732
Differential Revision:
D14388192
Pulled By: driazati
fbshipit-source-id:
d9c9e87a7c6db63c1de3ddebbb8dcf619f0dc34d
Edward Yang [Fri, 8 Mar 2019 22:30:15 +0000 (14:30 -0800)]
Fix lint in run_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17815
Reviewed By: eellison
Differential Revision:
D14390308
fbshipit-source-id:
22efd62a1bbd1fc8155a942d7160d5b7d3158e6b
Edward Yang [Fri, 8 Mar 2019 22:19:23 +0000 (14:19 -0800)]
Fix lint in test/common_utils.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17814
Reviewed By: eellison
Differential Revision:
D14390194
fbshipit-source-id:
b4b3bbe20a15d0b9ed127b255e01c0d6d0832c1b
Roy Li [Fri, 8 Mar 2019 22:05:01 +0000 (14:05 -0800)]
Replace tensor.type().scalarType() calls with tensor.scalar_type()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17515
Reviewed By: ezyang
Differential Revision:
D14233250
fbshipit-source-id:
6c7af8d2291c0c2b148001b30cf03834f34366c0
Yinghai Lu [Fri, 8 Mar 2019 21:15:05 +0000 (13:15 -0800)]
Catch exceptions in bound_shape_inference (#17775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17775
Handles user input shape hints properly.
Reviewed By: zrphercule
Differential Revision:
D14368735
fbshipit-source-id:
504cd96589e47aa432617e56362aa6b01a25ba9b
Sebastian Messmer [Fri, 8 Mar 2019 20:33:31 +0000 (12:33 -0800)]
refactor caffe2 operator constructors - 11/9 (#17722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17722
clangr codemod
Reviewed By: ezyang
Differential Revision:
D14350584
fbshipit-source-id:
adef54cedc9409b4fb365f6644e2621a9e47b2ff
Edward Yang [Fri, 8 Mar 2019 20:15:49 +0000 (12:15 -0800)]
Suppress C408 lint (don't use dict constructor) (#17813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17813
We have a lot of manually written out dict() constructors,
and (1) I don't think use of curly brace syntax is much
of an improvement and (2) it seems like a waste of time to
fix them all.
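For reference, the two spellings in question build the same mapping. The keys and values here are made up for illustration; C408 is the check reported by the flake8-comprehensions plugin.

```python
# Keyword-argument constructor form, which C408 would flag.
via_constructor = dict(lr=0.1, momentum=0.9)

# Curly-brace literal form that the lint suggests instead.
via_literal = {"lr": 0.1, "momentum": 0.9}
```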
Reviewed By: eellison
Differential Revision:
D14390136
fbshipit-source-id:
6199bef4dea75b6079bcb9d9e8acf20a2e1a86e1
Christian Puhrsch [Fri, 8 Mar 2019 19:37:01 +0000 (11:37 -0800)]
Add matches_jit_signature to recent native functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17805
Differential Revision:
D14388004
Pulled By: cpuhrsch
fbshipit-source-id:
c50580b6fe1e9cfefed91aaa526376325d9f9c0d
peterjc123 [Fri, 8 Mar 2019 18:39:06 +0000 (10:39 -0800)]
Add /MD to prevent linking errors on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17799
Differential Revision:
D14385777
Pulled By: ezyang
fbshipit-source-id:
8c1d9f80c48399087f5fae4474690e6d80d740e6
Dmytro Dzhulgakov [Fri, 8 Mar 2019 18:36:29 +0000 (10:36 -0800)]
Change message on unknown db type to be friendly (#17795)
Summary:
CreateDB actually returns nullptr when db type is unknown and throws when the file is missing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17795
Reviewed By: ezyang
Differential Revision:
D14383226
Pulled By: dzhulgakov
fbshipit-source-id:
1dcf75a6b4ba8b64a24d4e5daf02db3189d56b7b
David Riazati [Fri, 8 Mar 2019 18:29:51 +0000 (10:29 -0800)]
Trace rnn max_batch_size (#17727)
Summary:
This causes the tracer to record the select / cast to int operation instead of just an int constant
Fixes #15319 but relies on a fix for #17583 first
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17727
Differential Revision:
D14377886
Pulled By: driazati
fbshipit-source-id:
59453def54ba72756303f723993844dbeb5d2f8b
Sebastian Messmer [Fri, 8 Mar 2019 18:19:49 +0000 (10:19 -0800)]
Remove legacy way of exposing caffe2 operators to PyTorch (#17742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17742
This path isn't used anymore, and is incompatible with the changes stacked on top of this diff.
Removing it.
cc bwasti to check and confirm these can really be deleted
Reviewed By: ezyang
Differential Revision:
D14362426
fbshipit-source-id:
32cdc19f28c2a981ae1e204901420998367ee588
Gregory Chanan [Fri, 8 Mar 2019 17:41:33 +0000 (09:41 -0800)]
Remove 'Tensor' key from ATen codegen. (#17782)
Summary:
We used to have different ATen Tensor types, but we don't anymore. This was just being maintained by a codegen'ed comment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17782
Reviewed By: ezyang
Differential Revision:
D14378004
Pulled By: gchanan
fbshipit-source-id:
1bbf276393a391252d372cc385230c784bd78588
Gregory Chanan [Fri, 8 Mar 2019 17:39:03 +0000 (09:39 -0800)]
Remove ProcessorSpecificPlugin. (#17789)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17789
Reviewed By: ezyang
Differential Revision:
D14382423
Pulled By: gchanan
fbshipit-source-id:
0ac3236c48979a1b2bcd615e307e55f10fd8eb77
Gregory Chanan [Fri, 8 Mar 2019 17:38:48 +0000 (09:38 -0800)]
Remove THPPlugin. (#17790)
Summary:
It doesn't seem to be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17790
Reviewed By: ezyang
Differential Revision:
D14380897
Pulled By: gchanan
fbshipit-source-id:
3c3884a08c3b6c1489347d439509b19e079c5861
Edward Yang [Fri, 8 Mar 2019 15:23:16 +0000 (07:23 -0800)]
Replace tens with hundreds.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17752
Differential Revision:
D14366743
fbshipit-source-id:
39f6ac08180d780866e284024918d9abd197d239
Tim Khatkevich [Fri, 8 Mar 2019 13:43:17 +0000 (05:43 -0800)]
Support failback for more operators in ideep (#17747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17747
RMACRegions, Normalize and RoIPooling
Reviewed By: dskhudia
Differential Revision:
D14365096
fbshipit-source-id:
dafcb7077515e03c2880832a442015b70fc7140d
Mikhail Zolotukhin [Fri, 8 Mar 2019 09:08:17 +0000 (01:08 -0800)]
Cleanup include files in jit/passes/common_subexpression_elimination.h.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17784
Differential Revision:
D14381529
Pulled By: ZolotukhinM
fbshipit-source-id:
e32e17ee644ef888a6d56a8ee3648e7ac21758bf
Christian Puhrsch [Fri, 8 Mar 2019 07:31:00 +0000 (23:31 -0800)]
Use return names in JIT operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17638
Differential Revision:
D14295606
Pulled By: cpuhrsch
fbshipit-source-id:
62040ac65434411357808735f0fe6cd33cc1c30f
Jerry Zhang [Fri, 8 Mar 2019 02:31:33 +0000 (18:31 -0800)]
Change ConvPoolOp<Context>::SetOutputSize to ConvPoolOp<Context>::GetOutputSize (#17764)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17764
Original commit changeset:
f1923fdca4a1
Reverting the int8 ops fixes the original runtime regression.
We'll ignore the memory regression since it is flaky, see
D14228484
Reviewed By: dzhulgakov
Differential Revision:
D13885233
fbshipit-source-id:
ccbe4b94acb44b7b4cb3ae4d73e3f6091e1e1195
Roy Li [Fri, 8 Mar 2019 00:16:43 +0000 (16:16 -0800)]
Clean up some old ScalarType stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17755
Differential Revision:
D14377135
Pulled By: li-roy
fbshipit-source-id:
35305760a1621340ba66c61a193ff61cfedfa7e8
Elias Ellison [Thu, 7 Mar 2019 23:23:16 +0000 (15:23 -0800)]
add reference to flake8-mypy in contributing.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17759
Differential Revision:
D14376813
Pulled By: eellison
fbshipit-source-id:
cca1128e967ef7368633b94a3fa3c8e76a4a16f4
vishwakftw [Thu, 7 Mar 2019 22:01:47 +0000 (14:01 -0800)]
Move lerp to ATen, add functionality for tensor weights (#17348)
Summary:
Changelog:
- Remove TH/THC bindings
- Add tensor weights for `lerp`
- Modify derivatives appropriately
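A small sketch of what a tensor-valued weight means for `lerp`, written with plain Python lists rather than tensors. It assumes the usual interpolation formula start + weight * (end - start); the list-based helper is for illustration and is not the ATen kernel.

```python
def lerp(start, end, weight):
    """Elementwise start + weight * (end - start).

    `weight` may be a single number or a sequence with one weight per element.
    """
    if isinstance(weight, (int, float)):
        weight = [weight] * len(start)
    if not (len(start) == len(end) == len(weight)):
        raise ValueError("start, end and weight must have the same length")
    return [s + w * (e - s) for s, e, w in zip(start, end, weight)]


start = [0.0, 0.0, 0.0]
end = [10.0, 10.0, 10.0]

scalar_result = lerp(start, end, 0.5)               # one weight for all elements
tensor_result = lerp(start, end, [0.0, 0.5, 1.0])   # one weight per element
```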
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17348
Differential Revision:
D14355845
Pulled By: soumith
fbshipit-source-id:
eaede4c09ee589d77ba6cf52583510ea8e3a2fcf
Iurii Zdebskyi [Thu, 7 Mar 2019 21:38:59 +0000 (13:38 -0800)]
Refactor dispatcher (#17753)
Summary:
This is a side PR for a bool tensor feature. The idea for this change came from feedback received in this [PR](https://github.com/pytorch/pytorch/pull/17376).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17753
Differential Revision:
D14367989
Pulled By: izdeby
fbshipit-source-id:
4fa380e56e20f18e480be68920170dbc3a4eb91c
Wanchao Liang [Thu, 7 Mar 2019 21:31:55 +0000 (13:31 -0800)]
add layernorm to AD
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17702
Differential Revision:
D14368472
Pulled By: wanchaol
fbshipit-source-id:
8db390e39444078258ad1d34ba74d6ddafa5d02b
Hector Yuen [Thu, 7 Mar 2019 20:52:54 +0000 (12:52 -0800)]
move half<->float conversions to oss operators (#17548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17548
expose half float operators to OSS
common/math/Float16.h is the original implementation
this is substituted by caffe2/c10/util/Half.h
from the comments, it seems that neither implementation handles denormals
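To make the denormal remark concrete, here is a sketch using Python's `struct` module, whose `e` format is IEEE 754 binary16. It is not the Float16.h or Half.h code. What exactly those implementations do with denormals is not stated here, so the flush-to-zero helper is only an assumption showing one common behavior.

```python
import struct

SMALLEST_NORMAL = 2.0 ** -14    # smallest positive normal half value
SMALLEST_DENORMAL = 2.0 ** -24  # smallest positive denormal half value


def to_half_and_back(x):
    """Round a Python float to half precision, then widen it again."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def flush_denormal(x):
    """Assumed behavior: treat any denormal half result as zero."""
    h = to_half_and_back(x)
    return 0.0 if abs(h) < SMALLEST_NORMAL else h


exact = to_half_and_back(1.0)            # representable, survives unchanged
rounded = to_half_and_back(0.1)          # not representable, gets rounded
kept = to_half_and_back(SMALLEST_DENORMAL)
flushed = flush_denormal(SMALLEST_DENORMAL)
```

A conversion with full denormal support returns `kept` unchanged, while a flush-to-zero conversion returns 0.0 for the same input.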
Reviewed By: jspark1105
Differential Revision:
D14244200
fbshipit-source-id:
f90ba28c5bf6a2b451b429cc4925b8cc376ac651
Lu Fang [Thu, 7 Mar 2019 20:51:09 +0000 (12:51 -0800)]
Fix the update ONNX expect files (#17767)
Summary:
Fix the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17767
Reviewed By: zrphercule
Differential Revision:
D14370483
Pulled By: houseroad
fbshipit-source-id:
e7b0bbde0797c41f5a010fa206fab80fe2792eb7
Mikhail Zolotukhin [Thu, 7 Mar 2019 19:13:48 +0000 (11:13 -0800)]
Cleanup testFusion/testOne: there are unused arguments.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17737
Differential Revision:
D14366584
Pulled By: ZolotukhinM
fbshipit-source-id:
3c2dd2aabfecca475909e4eec4a077d900795da9
Lu Fang [Thu, 7 Mar 2019 19:03:57 +0000 (11:03 -0800)]
update of fbcode/onnx to
96c58ceeacf0f2b73d752e413e4fd78787a12da3 (#17676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17676
Previous import was
e18bb41d255a23daf368ffd62a2645db55db4c72
Included changes:
- **[96c58ce](https://github.com/onnx/onnx/commit/96c58ce)**: Fix shape inference when auto_pad is notset again (#1830) <Li-Wen Chang>
- **[873ddbb](https://github.com/onnx/onnx/commit/873ddbb)**: More extendable Runner (#1809) <Michał Karzyński>
Reviewed By: zrphercule
Differential Revision:
D14321241
fbshipit-source-id:
12de9021afc61f5435f1b719cccf7b0f4ad73a84
Lu Fang [Thu, 7 Mar 2019 18:51:29 +0000 (10:51 -0800)]
Set the default ONNX opset to the latest stable opset (i.e., 9) (#17736)
Summary:
1) The changes in the new opset won't affect internal pipeline.
2) The CI won't be affected by the ONNX changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17736
Reviewed By: zrphercule
Differential Revision:
D14358710
Pulled By: houseroad
fbshipit-source-id:
4ef15d2246b50f6875ee215ce37ecf92d555ca6a
David Riazati [Thu, 7 Mar 2019 18:41:13 +0000 (10:41 -0800)]
Add module attributes (#17309)
Summary:
Similar to `nn.Parameter`s, this PR lets you store any `IValue` on a module as an attribute on a `ScriptModule` (only from the Python front-end currently). To mark something as an attribute, it should be wrapped in `jit.Attribute(value, type)` (ex. `self.table = torch.jit.Attribute(table, Dict[str, torch.Tensor])`)
Followup Work:
* (de)serializing for use in C++
* change `self.training` to be a `bool` attribute instead of a buffer
* mutable attributes
* string frontend support
* documentation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17309
Differential Revision:
D14354316
Pulled By: driazati
fbshipit-source-id:
67e08ab5229366b67fbc837e67b58831a4fb3318
Spandan Tiwari [Thu, 7 Mar 2019 18:06:17 +0000 (10:06 -0800)]
- refactoring serialization of ONNX initializers to be name-based (#17420)
Summary:
Currently, serialization of model parameters in ONNX export depends on the order in which they are stored in a container (`list` on Python side and `std::vector` on C++ side). This has worked fine till now, but if we need to do any pass on that graph that mutates the parameter list, then strictly order-based serialization may not work.
This PR is the first in a set to bring in more passes (such as constant folding) related to ONNX export. This PR lays the groundwork by moving the serialization in ONNX export from order-based to name based approach, which is more amenable to some of the passes.
houseroad - As discussed this change uses a map for export, and removes the code from `export.cpp` that relies on the order to compute initializer names.
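The failure mode described above can be sketched with plain Python containers. The parameter names, values, and the pass that drops an input are all hypothetical; this is not the code in `export.cpp`.

```python
# Parameter values as the exporter holds them, in container order.
param_names = ["conv.weight", "conv.bias", "fc.weight"]
param_values = [[1.0, 2.0], [0.5], [3.0, 4.0]]


def initializers_by_order(graph_inputs, values):
    """Order-based: the i-th graph input receives the i-th stored value."""
    return dict(zip(graph_inputs, values))


def initializers_by_name(graph_inputs, values_by_name):
    """Name-based: each graph input looks up its own value."""
    return {name: values_by_name[name] for name in graph_inputs}


# A hypothetical pass drops "conv.bias" from the graph inputs but leaves the
# stored values untouched.
graph_inputs_after_pass = ["conv.weight", "fc.weight"]

by_order = initializers_by_order(graph_inputs_after_pass, param_values)
by_name = initializers_by_name(
    graph_inputs_after_pass, dict(zip(param_names, param_values))
)
```

After the pass, the order-based pairing hands the bias value to `fc.weight`, while the name-based lookup still returns the right tensor.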
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17420
Differential Revision:
D14361993
Pulled By: houseroad
fbshipit-source-id:
da93e945d55755c126de06641f35df87d1648cc4
Lara Haidar-Ahmad [Thu, 7 Mar 2019 17:59:28 +0000 (09:59 -0800)]
ONNX Export for Max and Average Pooling in CEIL_MODE
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/16769
Differential Revision:
D14362175
Pulled By: houseroad
fbshipit-source-id:
65cfb1dfba6a43d39cc85374add368fe8e4e5645
Elias Ellison [Thu, 7 Mar 2019 17:12:35 +0000 (09:12 -0800)]
use flake8-mypy (#17721)
Summary:
Use flake8 installed with mypy checks so that our linter matches fbcode. Mypy type errors also provide valuable signal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17721
Differential Revision:
D14357778
Pulled By: eellison
fbshipit-source-id:
d8c9ea3fe3b5f550c3b70fe259e0eabf95e4c92d
Jongsoo Park [Thu, 7 Mar 2019 10:17:42 +0000 (02:17 -0800)]
use fp16<->fp32 intrinsic (#17496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17496
As title.
Reviewed By: hyuen
Differential Revision:
D14222907
fbshipit-source-id:
d5d6c032e725ca8b52aca2be7401ec3c59f6a242
Ahmed Aly [Thu, 7 Mar 2019 09:03:51 +0000 (01:03 -0800)]
Implement a Caffe2 standalone LSTM operator (#17726)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17726
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17725
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461
Implementing a standalone LSTM Operator in Caffe2 adapted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. There was also no easy way to use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.
Two things missing:
- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch
Reviewed By: dzhulgakov
Differential Revision:
D14351575
fbshipit-source-id:
3b99b53212cf593c7a49e45580b5a07b90809e64
Sebastian Messmer [Thu, 7 Mar 2019 07:50:14 +0000 (23:50 -0800)]
caffe2:libtorch_cuda depends on caffe2:caffe2_gpu (#17729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17729
When doing "import torch" in fbcode, previously the caffe2 cuda kernels weren't loaded because libcaffe2_gpu.so wasn't loaded.
Once you also did "from caffe2.python import workspace", then the cuda kernels were loaded because that triggered a runtime mechanism for loading libcaffe2_gpu.so.
We want the cuda kernels to always be available, so this diff adds a dependency from caffe2:libtorch_cuda to caffe2:caffe2_gpu.
Reviewed By: ezyang
Differential Revision:
D14353498
fbshipit-source-id:
76a9fe69f231b308ab40eac393bb216c6fad3658
Jongsoo Park [Thu, 7 Mar 2019 07:26:27 +0000 (23:26 -0800)]
add tensor and cost inference functions (#17684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17684
Adding tensor and cost inference functions to more int8 operators.
Reviewed By: yinghai
Differential Revision:
D14174746
fbshipit-source-id:
dfad975fa75899565c8fb61f1b7747a9206ebd22
Lara Haidar [Thu, 7 Mar 2019 06:35:12 +0000 (22:35 -0800)]
ONNX Export Narrow op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17550
Differential Revision:
D14350401
Pulled By: houseroad
fbshipit-source-id:
4d88079bb7a8bbd270b0272009826eb3b202cc33
Yinghai Lu [Thu, 7 Mar 2019 03:55:39 +0000 (19:55 -0800)]
Keep the dim_type of hinted shape as BATCH if possible (#17734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17734
If the input is not BATCH, we skip adjusting its batch size during the onnxifi transformation. So when we take hints, we take them as CONSTANT but later need to change them to BATCH if possible.
Reviewed By: jackm321
Differential Revision:
D14355983
fbshipit-source-id:
63eb54a44afb1565c71486fdd73db07ca0ac4fd4
jwu [Thu, 7 Mar 2019 03:37:03 +0000 (19:37 -0800)]
fix different round behavior on CPU and GPU #16498 (#17443)
Summary:
xxtemp, colesbury, bhushan23, zou3519: convert GPU round behavior to half-to-even, consistent with the torch CPU version and numpy. Your feedback is welcome.
See #16498
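For readers unfamiliar with the term, half-to-even rounding sends ties to the nearest even integer. The sketch below uses Python's `decimal` module. The half-away-from-zero variant is included only for contrast and is an assumption, since the previous GPU behavior is not spelled out above.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_half_even(x):
    """Round to the nearest integer, sending ties to the even neighbour."""
    return float(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))


def round_half_away(x):
    """Round to the nearest integer, sending ties away from zero."""
    return float(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


ties = [0.5, 1.5, 2.5, -2.5]
half_even = [round_half_even(t) for t in ties]
half_away = [round_half_away(t) for t in ties]
```

Python's built-in `round` also uses half-to-even, so `round(2.5)` gives 2.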
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17443
Differential Revision:
D14261786
Pulled By: VitalyFedyunin
fbshipit-source-id:
98156436b545d72769831a89e2775d43ad913ebc
zou3519 [Thu, 7 Mar 2019 01:37:13 +0000 (17:37 -0800)]
Warn about memory overlaps on expanded tensors (#17576)
Summary:
Eventually we should remove these when we're certain that all our ops
handle memory overlaps correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17576
Differential Revision:
D14349990
Pulled By: zou3519
fbshipit-source-id:
c3a09f6113b9b1bf93e7f13c0b426c45b2cdf21f
Tongzhou Wang [Wed, 6 Mar 2019 23:35:25 +0000 (15:35 -0800)]
fix exp fam. formula
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17719
Differential Revision:
D14349029
Pulled By: soumith
fbshipit-source-id:
cf016756a9319436f7379e8377f8bd1e1b672b40
Sebastian Messmer [Wed, 6 Mar 2019 23:08:44 +0000 (15:08 -0800)]
refactor caffe2 operator constructors - 10/9 (#17659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17659
clangr codemod
Reviewed By: ezyang
Differential Revision:
D14304675
fbshipit-source-id:
45fbd84c50651a70ae29bf46df3322715e99d225
Lu Fang [Wed, 6 Mar 2019 22:59:16 +0000 (14:59 -0800)]
Improve ONNX symbolic for logsoftmax and softmax (#17672)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17672
support dtype in the onnx symbolic
Reviewed By: zrphercule
Differential Revision:
D14313987
fbshipit-source-id:
e9364621b3f795191d880599711dfbcb220d0e31
peter [Wed, 6 Mar 2019 22:40:05 +0000 (14:40 -0800)]
Enable using CMD when building cpp extensions on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17706
Differential Revision:
D14346482
Pulled By: ezyang
fbshipit-source-id:
7c85e51c701f6c0947ad324ef19fafda40ae1cb9
Yinghai Lu [Wed, 6 Mar 2019 22:24:02 +0000 (14:24 -0800)]
Do not rename net boundary inputs/outputs during ssaRewrite. (#17545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17545
This diff avoids renaming boundary inputs of net during onnxifi transform.
It also removes adding mappings for the initializer during onnxifi op creation.
This gets rid of the mapped ws creation during onnxifi op creation.
Reviewed By: zrphercule
Differential Revision:
D14243161
fbshipit-source-id:
6eafa920c45f6a6bfacbbb443e8e84cf9778644c
Sebastian Messmer [Wed, 6 Mar 2019 21:47:27 +0000 (13:47 -0800)]
Reapply
D14078519 (#17596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17596
Was reverted before, now fixed version.
Reviewed By: ezyang
Differential Revision:
D14270288
fbshipit-source-id:
c72490b5d02cc6098cb60145fa9a842b3c9a24c5
eellison [Wed, 6 Mar 2019 21:41:13 +0000 (13:41 -0800)]
Batch of expect file removals (#17581)
Summary:
Another batch of removing expect files.
One note - I removed the Batched expect files without adding equivalent tests since they are already being tested in other ways, and we are no longer actively maintaining that project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17581
Differential Revision:
D14343578
Pulled By: eellison
fbshipit-source-id:
ce0b1fd2b5b4ec80ad9003bab1b58f41645d3da6
jiej [Wed, 6 Mar 2019 21:36:14 +0000 (13:36 -0800)]
(#14267)
Summary:
- Summary:
Added synchronized batch normalization, allows synchronization of stats across mini-batches between processes within a process group.
Current implementation uses a mixture of extended ATen native functions (cpp cuda extension) + torch.nn.modules (c10d python API)
- User-facing api:
1. torch.nn.utils.convert_sync_batchnorm(modules, process_group=None)
2. torch.nn.SyncBatchNorm(num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True, ***process_group=None***)
- supported use case:
DistributedDataParallel with ***single-gpu multi-process***
a. User creates model containing `torch.nn.SyncBatchNorm` layers through one of the ways listed below:
1. use layers directly:
torch.nn.SyncBatchNorm(...)
similar API as with torch.nn.BatchNormXd(...)
with added argument `process_group` which is used to limit the scope of
synchronization within each process group. Default value is None, which
implies synchronization across all GPUs
2. use torch.nn.utils.convert_sync_batchnorm(modules, process_group)
recursively convert all `torch.nn.BatchNormXd` into `torch.nn.SyncBatchNorm`
preserving values of parameters/buffers.
the utility function also allows user to specify process_group value to all
converted layers.
b. user wraps their model with
`torch.nn.parallel.DistributedDataParallel`, from this point, user
should follow the general DDP usage guidelines
- Error checking
For use cases not supported, we error out:
1. Application launched without ddp:
> import torch
> sbn = torch.nn.SyncBatchNorm(10).cuda()
> inp = torch.randn(5, 10, 3, 3).cuda()
> sbn(inp) --> Error!
> AttributeError: SyncBatchNorm is only supported within torch.nn.parallel.DistributedDataParallel
2. Application launched using DDP with multi-GPU per-process:
> ddp_module = nn.parallel.DistributedDataParallel(module, device_ids=device_ids, output_device=args.local_rank)
> ValueError: SyncBatchNorm is only supported for DDP with single GPU per process
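How the per-process statistics are combined is not described above. As a rough sketch, assuming each process contributes its element count, mean, and biased variance, global batch statistics can be recovered as follows. The helper names are hypothetical and this is not the ATen/c10d implementation.

```python
def local_stats(values):
    """Count, mean, and biased variance of one process's mini-batch."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return n, mean, var


def combine_stats(per_process):
    """Merge (count, mean, var) triples into a global mean and variance."""
    total = sum(n for n, _, _ in per_process)
    mean = sum(n * m for n, m, _ in per_process) / total
    # E[x^2] per process is var + mean^2; average those, weighted by count.
    second_moment = sum(n * (v + m * m) for n, m, v in per_process) / total
    return mean, second_moment - mean * mean


batch_rank0 = [1.0, 2.0, 3.0]
batch_rank1 = [4.0, 5.0, 6.0, 7.0]
global_mean, global_var = combine_stats(
    [local_stats(batch_rank0), local_stats(batch_rank1)]
)
```

The combined values match what a single process would compute over the concatenated batch, which is the point of synchronizing the statistics.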
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14267
Differential Revision:
D14270035
Pulled By: ezyang
fbshipit-source-id:
4956d8fa565c32e9df5408d53719ff9f945f4d6d
Tongzhou Wang [Wed, 6 Mar 2019 21:06:41 +0000 (13:06 -0800)]
Update ModuleDict doc about order
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17717
Differential Revision:
D14346557
Pulled By: ezyang
fbshipit-source-id:
2484c7d8105f9aa8bce5567d1fa2d4f587cc9cc2
Pieter Noordhuis [Wed, 6 Mar 2019 20:30:05 +0000 (12:30 -0800)]
Update CODEOWNERS (#17720)
Summary:
teng-li is passing the baton to mrshenli. Thanks for all your work on distributed teng-li!! :tada:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17720
Differential Revision:
D14350120
Pulled By: pietern
fbshipit-source-id:
edfe784520c54630203cc8fbb296455d3dbf341b
Lara Haidar-Ahmad [Wed, 6 Mar 2019 20:05:48 +0000 (12:05 -0800)]
ONNX Export Argmin and Argmax ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17382
Differential Revision:
D14338811
Pulled By: houseroad
fbshipit-source-id:
be07548d8063d1aa94f1801c18137738365b85fb
Lu Fang [Wed, 6 Mar 2019 20:02:34 +0000 (12:02 -0800)]
Turn atol to 1e-5 when comparing the end to end results (#17708)
Summary:
results smaller than 1e-5 don't make sense.
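A sketch of why the absolute tolerance matters for values near zero, assuming the common test |actual - expected| <= atol + rtol * |expected|. The helper name and the `rtol` default are illustrative assumptions, not taken from the test code.

```python
def all_close(actual, expected, rtol=1e-3, atol=1e-5):
    """True when every pair satisfies |a - e| <= atol + rtol * |e|."""
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same length")
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


expected = [0.0, 1.0]
actual = [3e-6, 1.0]  # differs from the reference only by noise near zero

strict = all_close(actual, expected, atol=1e-7)
relaxed = all_close(actual, expected, atol=1e-5)
```

When the expected value is 0, the relative term vanishes, so only `atol` decides whether noise of this size counts as a mismatch.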
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17708
Differential Revision:
D14348893
Pulled By: houseroad
fbshipit-source-id:
5e07c38e5b58b27b61fae63bfc3c21e2fe5629fe
Elias Ellison [Wed, 6 Mar 2019 19:42:19 +0000 (11:42 -0800)]
remove loop expects (#17695)
Summary:
Replace loop unrolling expect files with assertions on the output IR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17695
Differential Revision:
D14347105
Pulled By: eellison
fbshipit-source-id:
1703b4ca32bc1c67c01fc4330b0e6eb66feaa103
youkaichao [Wed, 6 Mar 2019 19:31:50 +0000 (11:31 -0800)]
typo fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17653
Differential Revision:
D14302003
Pulled By: ezyang
fbshipit-source-id:
8ad90985a392b07127c7e315d4e74ce77962b573
Deepali Chourasia [Wed, 6 Mar 2019 19:25:26 +0000 (11:25 -0800)]
omit group conv NHWC test for GPU (#17715)
Summary:
Observed the test `TestGroupConvolution.test_group_convolution` to fail with the following error:
```
Falsifying example: test_group_convolution(self=<caffe2.python.operator_test.group_conv_test.TestGroupConvolution testMethod=test_group_convolution>, stride=3, pad=0, kernel=5, size=8, group=4, input_channels_per_group=7, output_channels_per_group=8, batch_size=2, order='NHWC', engine='', use_bias=False, gc=, dc=[, device_type: 1])
You can reproduce this example by temporarily adding reproduce_failure('3.59.1', b'AAAA') as a decorator on your test case
```
This example generated by hypothesis has `group=2, order='NHWC' and dc=[, device_type: 1])`.
I think this example should be skipped.
I have mimicked the change corresponding to [PR#13554](https://github.com/pytorch/pytorch/pull/13554) to skip this example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17715
Differential Revision:
D14346642
Pulled By: ezyang
fbshipit-source-id:
b1f1fef09f625fdb43d31c7213854e61a96381ba
Elias Ellison [Wed, 6 Mar 2019 19:21:09 +0000 (11:21 -0800)]
fix tuple matching (#17687)
Summary:
Check for Tuple Matching in isSubvalueOf, since they may contain container types that need to be recursed within isSubvalueOf
Fix for https://github.com/pytorch/pytorch/issues/17650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17687
Differential Revision:
D14324642
Pulled By: eellison
fbshipit-source-id:
7f1e019875286b2640a3b9c003d1635dda8cf543
Spandan Tiwari [Wed, 6 Mar 2019 19:03:32 +0000 (11:03 -0800)]
Temporarily disable Upsample operator tests in pytorch-onnx tests (#17696)
Summary:
Per discussion with houseroad: the Upsample op is being updated in ONNX (https://github.com/onnx/onnx/pull/1773) and these tests are blocking it. These tests will be updated once the ONNX PR goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17696
Differential Revision:
D14338845
Pulled By: houseroad
fbshipit-source-id:
cfaf8cf1ab578ae69dd3bf21b1c0681b572b9b6f
peter [Wed, 6 Mar 2019 18:41:20 +0000 (10:41 -0800)]
Add check for x64 Python before setup (#17707)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17657.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17707
Differential Revision:
D14346705
Pulled By: ezyang
fbshipit-source-id:
5daafacdb99eb9a9c6517263d10f20c79f920d24
Edward Yang [Wed, 6 Mar 2019 18:32:38 +0000 (10:32 -0800)]
Replace caffe2::DeviceGuard with c10::cuda::CUDAGuard (#17623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17623
Despite its generic-sounding name, caffe2::DeviceGuard actually
only worked on CUDA devices. Rename it to something that more
clearly spells out its applicability.
I'm not sure if it's the right call, but in this patch I added
'using CUDAGuard = c10::cuda::CUDAGuard', as this seems to be more
in line with how the Caffe2 codebase is currently written. More
idiomatic c10 namespace style would be to say cuda::CUDAGuard.
Willing to change this if people shout.
This is a respin of
D13156470 (#14284)
Reviewed By: dzhulgakov
Differential Revision:
D14285504
fbshipit-source-id:
93b8ab938b064572b3b010c307e1261fde0fff3d
Duc Ngo [Wed, 6 Mar 2019 18:31:00 +0000 (10:31 -0800)]
Remove nomscheduler (#17693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17693
Remove nomscheduler tool
Reviewed By: yinghai
Differential Revision:
D14328168
fbshipit-source-id:
674d0e18596a4dc2bbb6b8d321f4066c4fc454ab
bhushan [Wed, 6 Mar 2019 18:28:49 +0000 (10:28 -0800)]
index operation support for torch.HalfTensor (#17645)
Summary:
- Test cases added
1. indexing for half tensor
2. setting for half tensor
fixes #17161
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17645
Differential Revision:
D14302069
Pulled By: ezyang
fbshipit-source-id:
100f141c07046f200c904e27c5882a9417bccda0
Soumith Chintala [Wed, 6 Mar 2019 16:41:42 +0000 (08:41 -0800)]
Revert
D14160172: Implement a Caffe2 standalone LSTM operator
Differential Revision:
D14160172
Original commit changeset:
c33e3f9e8aea
fbshipit-source-id:
cffe35d93f0ac75ca93aa98a3b82af3d372f2fc1
Tongzhou Wang [Wed, 6 Mar 2019 07:14:25 +0000 (23:14 -0800)]
fix typo in hub doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17705
Differential Revision:
D14338380
Pulled By: ailzhang
fbshipit-source-id:
d53eece30bede88a642e718ee6f829ba29c7d1c4
Ailing Zhang [Wed, 6 Mar 2019 04:47:02 +0000 (20:47 -0800)]
fix dropout AD & rename range to rangelist (#17691)
Summary:
fixes #17669
Address apaszke's comments in #17523
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17691
Differential Revision:
D14328083
Pulled By: ailzhang
fbshipit-source-id:
9ec4a54f13bfd1aaf4b1821dd00c31793ac07a44
Chaitanya Sri Krishna Lolla [Wed, 6 Mar 2019 02:41:20 +0000 (18:41 -0800)]
enable use of MIOpen for depthwise convolutions (#17685)
Summary:
* added miopen conv mode to be used for setConvDescriptor
* added miopen depthwise convolutions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17685
Differential Revision:
D14327811
Pulled By: bddppq
fbshipit-source-id:
d5bdc1abafd5f39694fadf3f9275b9d880c5b115
Ahmed Aly [Wed, 6 Mar 2019 01:31:51 +0000 (17:31 -0800)]
Implement a Caffe2 standalone LSTM operator (#17461)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17461
Implementing a standalone LSTM operator in Caffe2, adapted from this ATen implementation: diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/RNN.cpp. The trickiest part of this exercise was that caffe2::Tensor has no copy constructor, which made it necessary to implement a custom templated copy constructor for the different Tensor containers used in the code. There was also no easy way to use off-the-shelf C2 operators in my code, so I had to copy some code that does basic matmul, cat, split, transpose and linear as utility functions.
Two things missing:
- Profiling this implementation against the current ONNXified LSTM op
- Make this operator available to use in PyTorch
Reviewed By: dzhulgakov
Differential Revision:
D14160172
fbshipit-source-id:
c33e3f9e8aeae578b64d97593cb031a251216029
Soumith Chintala [Tue, 5 Mar 2019 22:26:20 +0000 (14:26 -0800)]
Fix nll_loss crash on cpu where ignore_index is out of bounds (#17328)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15508
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17328
Differential Revision:
D14322629
Pulled By: soumith
fbshipit-source-id:
7d02f372be78794782c18affcfc109ce30b1e91c
Johannes M Dieterich [Tue, 5 Mar 2019 20:49:25 +0000 (12:49 -0800)]
Add '--hip-clang-launch' to favor <<<>>>-based launch. (#17686)
Summary:
hip-clang uses triple chevron kernel dispatch syntax. Add an option to the hipification script to skip translating triple chevron to hipLaunchKernelGGL.
Once we switch to hip-clang, this option will be default and subsequently removed.
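To illustrate what the option toggles, here is a minimal sketch of a launch-syntax translation pass. It is not the actual hipification script: `hipify_launch` and its `hip_clang_launch` parameter are hypothetical names, and the regex only handles simple two-argument launches without template arguments or nested parentheses.

```python
import re

# Matches simple launches such as: kernel<<<grid, block>>>(a, b, n)
# Limitation of this sketch: no template arguments, no nested parentheses.
_LAUNCH = re.compile(
    r"(\w+)\s*<<<\s*([^,>]+?)\s*,\s*([^,>]+?)\s*>>>\s*\(([^)]*)\)"
)


def hipify_launch(source, hip_clang_launch=False):
    """Rewrite triple-chevron launches to hipLaunchKernelGGL calls.

    When hip_clang_launch is True (mirroring '--hip-clang-launch'),
    the source is returned untouched so the <<<>>> syntax is preserved.
    """
    if hip_clang_launch:
        return source

    def replace(match):
        kernel, grid, block, args = match.groups()
        # A two-argument <<<grid, block>>> launch uses 0 bytes of dynamic
        # shared memory and the default stream, so both are spelled out as 0.
        call = f"hipLaunchKernelGGL({kernel}, dim3({grid}), dim3({block}), 0, 0"
        if args.strip():
            call += ", " + args.strip()
        return call + ")"

    return _LAUNCH.sub(replace, source)


line = "add<<<grid, block>>>(a, b, n);"
print(hipify_launch(line))
print(hipify_launch(line, hip_clang_launch=True))
```

With the flag set, the pass becomes a no-op for kernel launches, which is the behavior the summary describes as the future default.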
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17686
Differential Revision:
D14327810
Pulled By: bddppq
fbshipit-source-id:
5e1512325077dd3ebb8fb9b5bf35fd1f8d9a4dc3
Sam Gross [Tue, 5 Mar 2019 17:38:23 +0000 (09:38 -0800)]
Improve caching allocator for Pascal and newer GPUs. (#17120)
Summary:
```
NVIDIA changed the CUDA allocation behavior on Pascal GPUs. The
page size increased from 1MB to 2MB and allocations larger than 1MB
are now always page-aligned. Previously, allocations larger than 1MB
were aligned to 128KB boundaries.
This interacted poorly with the caching allocator. The remaining
memory in a page could only be filled by small cudaMalloc calls, but
the caching allocator never cudaMalloc's a chunk smaller than 1MB.
This behavior could also cause a large discrepancy between the memory
usage reported by nvidia-smi and the memory usage reported by
PyTorch, because nvidia-smi counts a partially used page as "full",
while PyTorch only counts the actual memory requested.
This PR makes a few changes to the caching allocator to better support
Pascal and Volta GPUs:
- All cudaMalloc calls are now multiples of 2MB (the page size)
- Requests between 1-10MB allocate (and split) a 20MB block to
reduce wasted space due to rounding
- Small requests are now packed into 2MB blocks (instead of 1MB)
This improves Mask R-CNN memory usage by 10-20% in internal tests on
Volta GPUs. Maxwell performance seems to be largely unchanged, but
it's possible that some use cases suffer slightly.
```
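The sizing rules listed above can be sketched roughly as follows. This is an illustration only, not the allocator's code: the function name and constants are hypothetical, and the exact cutoffs (treating requests of at most 1MB as "small", and requests of 10MB or more as "large") are assumptions, since the summary does not state how the boundaries are handled.

```python
MB = 1024 * 1024

PAGE_SIZE = 2 * MB      # page size on Pascal and newer, per the summary
SMALL_LIMIT = 1 * MB    # assumed cutoff for "small" requests
MEDIUM_LIMIT = 10 * MB  # assumed upper bound of the 1-10MB range
MEDIUM_BLOCK = 20 * MB  # block that medium requests are carved out of


def cuda_malloc_size(request_bytes):
    """Return the size of the block that would be requested from cudaMalloc."""
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    if request_bytes <= SMALL_LIMIT:
        # Small requests are packed into a single 2MB block.
        return PAGE_SIZE
    if request_bytes < MEDIUM_LIMIT:
        # Medium requests allocate a 20MB block that is split afterwards.
        return MEDIUM_BLOCK
    # Larger requests are rounded up to a multiple of the page size.
    pages = -(-request_bytes // PAGE_SIZE)  # ceiling division
    return pages * PAGE_SIZE


for size in (512, 3 * MB, 11 * MB):
    print(size, "->", cuda_malloc_size(size) // MB, "MB")
```

Every value returned is a multiple of 2MB, so no allocation leaves a partially used page that only smaller cudaMalloc calls could fill.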
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17120
Differential Revision:
D14301536
Pulled By: colesbury
fbshipit-source-id:
a8282315ea8f7b8ca149b5066fdeaecd0d404edf
Davide Libenzi [Tue, 5 Mar 2019 15:24:27 +0000 (07:24 -0800)]
Turn the Half::from_bits into a constexpr function to avoid unresolve… (#17661)
Summary:
…d symbol errors when building in DEBUG mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17661
Differential Revision:
D14319610
Pulled By: soumith
fbshipit-source-id:
6c508a37155e29260f403d7174f343aa1ff32385
Elias Ellison [Tue, 5 Mar 2019 06:38:41 +0000 (22:38 -0800)]
Remove Expect Files from python / tracing / script interop
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17622
Differential Revision:
D14308307
Pulled By: eellison
fbshipit-source-id:
bda249d38ac2570000a12b0ca328c26233ecefe8
peterjc123 [Tue, 5 Mar 2019 05:50:53 +0000 (21:50 -0800)]
Enable apex on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17675
Differential Revision:
D14320473
Pulled By: soumith
fbshipit-source-id:
cb696984f5196f9b8b50722b4fe927bb6407c322
Soumith Chintala [Tue, 5 Mar 2019 04:28:06 +0000 (20:28 -0800)]
bump docker build to upgrade magma to 2.5.0 (#17674)
Summary:
upgrades magma in docker build.
vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17674
Differential Revision:
D14320187
Pulled By: soumith
fbshipit-source-id:
7887f65fb703b802fc6231408b55ad9c4039882b
Sebastian Messmer [Mon, 4 Mar 2019 23:56:21 +0000 (15:56 -0800)]
refactor caffe2 operator constructors - 1/9 (#17082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17082
clangr codemod
Reviewed By: ezyang
Differential Revision:
D14078498
fbshipit-source-id:
f7f65d6d81c7942293f53fdaa61f756d8b7360c1
Sebastian Messmer [Mon, 4 Mar 2019 22:53:55 +0000 (14:53 -0800)]
Expose cuda kernel for caffe2::GenerateProposals
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17066
Reviewed By: ezyang, wat3rBro
Differential Revision:
D14071130
fbshipit-source-id:
6fe26503f6069c36ec31d6c09b549b932d5db242
Jongsoo Park [Mon, 4 Mar 2019 22:25:19 +0000 (14:25 -0800)]
print warnings when DNNLOWP_16 or DNNLOWP_ROWWISE_16 engine is used (#17176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17176
As title
Reviewed By: csummersea
Differential Revision:
D14111616
fbshipit-source-id:
1282cb2452c4ad385fd2dc6d3f8c19e9fec715ff
Sebastian Messmer [Mon, 4 Mar 2019 22:17:11 +0000 (14:17 -0800)]
Fix XOutput/XOutputTensor for ivalue based c2 operators (#17599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17599
XOutput/XOutputTensor was broken for ivalue based operators. This diff fixes that.
Reviewed By: ezyang
Differential Revision:
D14274003
fbshipit-source-id:
b99f020244c66c4e2551dbd32ae0f665cc91b338
Sebastian Messmer [Mon, 4 Mar 2019 22:17:11 +0000 (14:17 -0800)]
Fix InputSize/OutputSize for ivalue based operators (#17579)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17579
These methods previously just returned 0 when it was not a legacy operator,
making it impossible to convert some operators.
Reviewed By: dzhulgakov
Differential Revision:
D14253094
fbshipit-source-id:
72bfdcf6da291a4ab80d1e0ceb20984b86edc408
Wanchao Liang [Mon, 4 Mar 2019 21:04:53 +0000 (13:04 -0800)]
Fix clamp fusion on missing limits (#17533)
Summary:
Fixes #17449
Context: before #17186, we did not fuse `clamp` when `min/max` were missing inputs, because they were `prim::None` nodes. After #17186, None became a `prim::Constant` node, which enables fusion for `clamp`. However, codegen.cpp does not handle the case where a `prim::Constant` is not a Double/Int/Bool. This PR makes sure missing inputs are handled correctly, in the following way:
1. emit nothing when you see `type? = prim::Constant()`
2. when emitRHS, do special casing for aten::clamp
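As a rough illustration of the second step, the sketch below builds the right-hand-side expression for a clamp whose limits may be absent. It is not the code in codegen.cpp: `emit_clamp_rhs` and its parameters are hypothetical, and the emitted ternary expressions are just one plausible form.

```python
def emit_clamp_rhs(value, min_name=None, max_name=None):
    """Build a C-like expression for clamp(value, min, max).

    A limit passed as None stands for a missing input, so no comparison
    is emitted for it.
    """
    expr = value
    if min_name is not None:
        expr = f"({expr} < {min_name} ? {min_name} : {expr})"
    if max_name is not None:
        expr = f"({expr} > {max_name} ? {max_name} : {expr})"
    return expr


print(emit_clamp_rhs("x"))
print(emit_clamp_rhs("x", min_name="lo"))
print(emit_clamp_rhs("x", max_name="hi"))
```

The point of the special case is that a missing limit contributes nothing to the generated expression, instead of being treated as a numeric constant.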
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17533
Differential Revision:
D14238450
Pulled By: wanchaol
fbshipit-source-id:
61a272154754b13e89021bb86002927f02cde19c
Jie [Mon, 4 Mar 2019 21:02:40 +0000 (13:02 -0800)]
int32 indexing for Tensor Iterator Reduction (#17428)
Summary:
1. Enabling int32 indexing for cases where TI cannot accumulate in output due to
incompatible data types (e.g. Welford).
2. Updating Welford kernel to use int32 instead of int64 indexing on GPU.
This change improves performance for torch.var / torch.std
Implementation:
1. Allocated extra buffer to handle accumulation between sub Tensor Iterators.
2. Removed int64 indexing in gpu_reduce_kernel
3. WelfordOps now supports index type / combination type as a template parameter.
While GPU uses int32_t and float, CPU implementation uses int64_t and double.
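For readers unfamiliar with the Welford reduction, here is a minimal sketch of the running update and of the step that combines two partial results, which is the kind of accumulation needed between sub-iterators. The names below are hypothetical and this is not the WelfordOps implementation; it only shows the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class WelfordState:
    mean: float = 0.0
    m2: float = 0.0   # sum of squared deviations from the mean
    count: int = 0


def welford_update(state, x):
    """Fold one value into the running state."""
    count = state.count + 1
    delta = x - state.mean
    mean = state.mean + delta / count
    m2 = state.m2 + delta * (x - mean)
    return WelfordState(mean, m2, count)


def welford_combine(a, b):
    """Merge two partial states computed over disjoint chunks."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / count
    return WelfordState(mean, m2, count)


def welford_reduce(values):
    state = WelfordState()
    for x in values:
        state = welford_update(state, x)
    return state


def variance(state, unbiased=True):
    divisor = state.count - 1 if unbiased else state.count
    return state.m2 / divisor


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
merged = welford_combine(welford_reduce(data[:4]), welford_reduce(data[4:]))
print(variance(merged))
```

Because the state is a (mean, m2, count) triple rather than a single value of the output type, partial results cannot be accumulated directly in the output tensor, which is why a separate buffer is needed.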
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17428
Differential Revision:
D14264608
Pulled By: umanwizard
fbshipit-source-id:
3eb54451de925b469dbc1127e5ea7443c4431036
Iurii Zdebskyi [Mon, 4 Mar 2019 20:43:28 +0000 (12:43 -0800)]
Removed all usages of TH_Index_Base (#17591)
Summary:
TH_Index_Base is hard coded to 0 and can be removed from the code base.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17591
Differential Revision:
D14269273
Pulled By: izdeby
fbshipit-source-id:
d844e261f4af7297bad8a81e7d6dcf0a391b94e6
Dmytro Dzhulgakov [Mon, 4 Mar 2019 19:30:43 +0000 (11:30 -0800)]
PyTorch/Caffe2 tensor interop in Python (#17190)
Summary:
Because there are two separate python extensions with different pybind
instances, I have to go through void* conversion. Since it's hidden from
the user, it's fine.
New APIs added on C2 side:
- workspace.FetchTorch('blob')
- workspace.Workspace.current.blobs['blob'].to_torch()
- workspace.FeedBlob('blob', pytorch_tensor)
Works on CPU and GPU.
The only glitches are with resizing because of variable/tensor split.
But data sharing works properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17190
Reviewed By: ezyang
Differential Revision:
D14163882
Pulled By: dzhulgakov
fbshipit-source-id:
d18e5b8fcae026f393c842a1149e972515732de2
wkcn [Mon, 4 Mar 2019 18:08:04 +0000 (10:08 -0800)]
Fixed typo in aten/src/ATen/native_parse.py (#17641)
Summary:
Hi, there.
There is a typo in aten/src/ATen/native_parse.py, and this PR fixes it.
`std::aray` -> `std::array`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17641
Differential Revision:
D14301981
Pulled By: ezyang
fbshipit-source-id:
a37859cdedcbf6c29333b954486dfa086d6c2176
Martin Schatz [Mon, 4 Mar 2019 17:55:05 +0000 (09:55 -0800)]
Remove GPU dependency from ProfileObserver (#17592)
Summary:
Remove GPU dependency and register ProfileObserver.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17592
Reviewed By: ezyang
Differential Revision:
D14265801
Pulled By: mdschatz
fbshipit-source-id:
f98c0c32653c64a8b087c58ece4f864dfbe1d4b8
Brennan Vincent [Mon, 4 Mar 2019 06:13:27 +0000 (22:13 -0800)]
Don't make factory methods create a tensor and then immediately copy it (#17565)
Summary:
Create a `make_variable` override that moves out of a tensor instead of going through `shallow_copy_and_detach`. Call this override from factory methods like `empty` that create a brand new tensor, do nothing with it, and then copy it into a variable.
Will update this with actual numbers, but it seems to get rid of around 20-40% of the overhead of calling `torch.empty(0)`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17565
Differential Revision:
D14266130
Pulled By: umanwizard
fbshipit-source-id:
f57d5f2ca3f80ee8ee96d50f905e852fd10db941